NetNewsWire/Frameworks/RSParser
2017-11-28 21:29:09 -08:00
Feeds Fix two typos in JSONFeedParser which kept the parser from getting the feed’s favicon and icon URLs. Also added a test for this. 2017-11-25 10:34:48 -08:00
HTML Prefer Apple touch icons to other feed icons. 2017-11-26 20:40:07 -08:00
OPML Fix OPML importing. 2017-10-21 21:00:21 -07:00
RSParser.xcodeproj Fix bug detecting Macworld’s RSS feed as an RSS feed. The feed doesn’t start with the standard XML header. 2017-11-28 21:29:09 -08:00
RSParserTests Fix bug detecting Macworld’s RSS feed as an RSS feed. The feed doesn’t start with the standard XML header. 2017-11-28 21:29:09 -08:00
SAX Fix builder errors, mostly in RSParser. 2017-10-04 13:28:48 -07:00
Utilities Fix bug detecting Macworld’s RSS feed as an RSS feed. The feed doesn’t start with the standard XML header. 2017-11-28 21:29:09 -08:00
Info.plist Start work on turning RSXML.framework into RSParser.framework. 2017-06-20 21:18:46 -07:00
ParserData.h Fix builder errors, mostly in RSParser. 2017-10-04 13:28:48 -07:00
ParserData.m Fix builder errors, mostly in RSParser. 2017-10-04 13:28:48 -07:00
README.md Move feedType function to FeedType.swift. Add a few more cases to FeedParserTypeTests. 2017-06-26 19:37:30 -07:00
RSParser.h Parse Open Graph images when parsing metadata from an HTML page. 2017-11-26 11:38:03 -08:00

RSParser

(Note: Tests are still incomplete. It's possible that none of this works.)

(Also note: this framework is intended to supersede my RSXML framework. Use this one instead. Well, once it's working, that is.)

What's inside

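This framework includes parsers for RSS, Atom, JSON Feed, and RSS-in-JSON feeds, as well as OPML, internet dates, HTML links and metadata, and HTML entities.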

It also includes Objective-C wrappers for libXML2's XML SAX and HTML SAX parsers. You can write your own parsers on top of these.

This framework builds for macOS. It could be made to build for iOS also, but I haven't gotten around to it yet.

How to parse feeds

To get the type of a feed, even with partial data, call FeedParser.feedType(parserData), which will return a FeedType.
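
Detecting a feed type from partial data generally means looking only at the first bytes of the download. The following is a minimal, self-contained sketch of that idea, not RSParser's actual implementation: `SketchFeedType` and `sketchFeedType` are hypothetical names invented for illustration, and the real `FeedType` may have different cases and use different heuristics.

```swift
import Foundation

// Hypothetical enum for this sketch; RSParser's own FeedType may differ.
enum SketchFeedType {
    case rss, atom, jsonFeed, unknown
}

// Guess the feed type from the beginning of the data only,
// so a truncated download can still be classified.
func sketchFeedType(_ data: Data) -> SketchFeedType {
    // Decode at most the first 4 KB; invalid UTF-8 is replaced, not fatal.
    let head = String(decoding: data.prefix(4096), as: UTF8.self)
    let trimmed = head.trimmingCharacters(in: .whitespacesAndNewlines)

    // JSON documents start with "{"; JSON Feed names its spec in "version".
    if trimmed.hasPrefix("{") {
        return head.contains("jsonfeed.org") ? .jsonFeed : .unknown
    }

    // Do not require an XML declaration: some feeds omit it.
    if head.contains("<rss") || head.contains("<rdf:RDF") {
        return .rss
    }
    if head.contains("<feed") {
        return .atom
    }
    return .unknown
}
```

Because the sketch only does substring checks on a prefix, it can misclassify unusual documents; treat it as an illustration of the approach rather than a replacement for FeedParser.feedType.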

To parse a feed, call FeedParser.parseFeed(parserData), which will return a ParsedFeed. Also see related structs: ParsedAuthor, ParsedItem, ParsedAttachment, and ParsedHub.

You do not need to know the type of feed when calling FeedParser.parseFeed — it will figure it out and use the correct concrete parser.

However, if you do want to use a concrete parser directly, see RSSInJSONParser, JSONFeedParser, RSSParser, and AtomParser.

(Note: if you want to write a feed reader app, please do! You have my blessing and encouragement. Let me know when it's shipping so I can check it out.)

How to parse OPML

Call +[RSOPMLParser parseOPMLWithParserData:error:], which returns an RSOPMLDocument. See related objects: RSOPMLItem, RSOPMLAttributes, RSOPMLFeedSpecifier, and RSOPMLError.

How to parse dates

Call RSDateWithString or RSDateWithBytes (see RSDateParser). These handle the common internet date formats. You don't need to know which format.
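
To illustrate what "you don't need to know which format" means in practice, here is a small sketch that tries the two formats feeds most commonly use. It is built on Foundation's formatters rather than on RSDateParser, so it says nothing about how RSDateWithString works internally; `sketchDate(from:)` is a hypothetical name, and the list of formats is an assumption, not a complete one.

```swift
import Foundation

// Try a few common internet date formats in turn and return the first match.
func sketchDate(from string: String) -> Date? {
    let trimmed = string.trimmingCharacters(in: .whitespacesAndNewlines)

    // ISO 8601 / RFC 3339 style, e.g. "2017-11-29T05:29:09Z".
    if let date = ISO8601DateFormatter().date(from: trimmed) {
        return date
    }

    // RFC 822 style, e.g. "Tue, 28 Nov 2017 21:29:09 -0800".
    let formatter = DateFormatter()
    // A fixed locale keeps month and weekday names in English.
    formatter.locale = Locale(identifier: "en_US_POSIX")
    for format in ["EEE, dd MMM yyyy HH:mm:ss Z", "dd MMM yyyy HH:mm:ss Z"] {
        formatter.dateFormat = format
        if let date = formatter.date(from: trimmed) {
            return date
        }
    }
    return nil
}
```

Creating formatters on every call is slow; a parser that works directly on bytes avoids that cost, which is presumably why RSDateWithBytes exists.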

How to parse HTML

To get an array of <a href=… links from an HTML document, call +[RSHTMLLinkParser htmlLinksWithParserData:]. It returns an array of RSHTMLLink.

To parse the metadata in an HTML document, call +[RSHTMLMetadataParser HTMLMetadataWithParserData:]. It returns an RSHTMLMetadata object.

To write your own HTML parser, see RSSAXHTMLParser. The two parsers above can serve as examples.

How to parse HTML entities

When you have a string with things like &#8212; and &euml; and you want to turn those into the correct characters, call -[NSString rsparser_stringByDecodingHTMLEntities]. (See NSString+RSParser.h.)
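
As a rough picture of what entity decoding involves, the sketch below handles decimal and hexadecimal numeric entities plus a handful of named ones. It is not the code behind rsparser_stringByDecodingHTMLEntities: the function names and the tiny entity table are hypothetical, and a real decoder needs the full HTML named-entity list.

```swift
import Foundation

// Deliberately tiny, hypothetical table; the real list has thousands of names.
let sketchNamedEntities: [String: Character] = [
    "amp": "&", "lt": "<", "gt": ">", "quot": "\"", "euml": "ë"
]

// Decode the text between "&" and ";" into a character, if it is recognized.
func sketchDecodeEntity(_ name: Substring) -> Character? {
    if name.hasPrefix("#x") || name.hasPrefix("#X") {
        // Hexadecimal numeric entity, e.g. "#x2014".
        guard let value = UInt32(name.dropFirst(2), radix: 16),
              let scalar = Unicode.Scalar(value) else { return nil }
        return Character(scalar)
    }
    if name.hasPrefix("#") {
        // Decimal numeric entity, e.g. "#8212".
        guard let value = UInt32(name.dropFirst(1), radix: 10),
              let scalar = Unicode.Scalar(value) else { return nil }
        return Character(scalar)
    }
    return sketchNamedEntities[String(name)]
}

func sketchDecodeHTMLEntities(_ input: String) -> String {
    var output = ""
    var rest = Substring(input)
    while let amp = rest.firstIndex(of: "&") {
        // Copy everything before the ampersand unchanged.
        output.append(contentsOf: rest[..<amp])
        let afterAmp = rest.index(after: amp)
        // Look for a terminating ";" within a short window after the "&".
        if let semi = rest[afterAmp...].prefix(10).firstIndex(of: ";"),
           let decoded = sketchDecodeEntity(rest[afterAmp..<semi]) {
            output.append(decoded)
            rest = rest[rest.index(after: semi)...]
        } else {
            // Not a recognized entity: keep the "&" literally and move on.
            output.append("&")
            rest = rest[afterAmp...]
        }
    }
    output.append(contentsOf: rest)
    return output
}
```

Unrecognized or unterminated entities are left in place, so a bare ampersand such as the one in "AT&T" survives untouched.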

How to parse XML

If you need to parse some XML that isn't RSS, Atom, or OPML, you can use RSSAXParser. Don't subclass it; instead, create an RSSAXParserDelegate. See RSRSSParser, RSAtomParser, and RSOPMLParser as examples.
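
The delegate approach can be illustrated with Foundation's XMLParser, which is also event-driven. This sketch does not use RSSAXParser or RSSAXParserDelegate, whose method names are not shown here; `SketchTitleCollector` and `sketchItemTitles` are hypothetical names. It collects the title of each item element and shows the kind of state a delegate has to track between callbacks.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives here on non-Apple platforms
#endif

// The parser is not subclassed; a separate delegate object receives the events.
final class SketchTitleCollector: NSObject, XMLParserDelegate {
    private(set) var titles: [String] = []
    private var insideItem = false      // are we between <item> and </item>?
    private var currentTitle: String?   // non-nil only while inside an item's <title>

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "item" {
            insideItem = true
        } else if elementName == "title" && insideItem {
            currentTitle = ""
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Text may arrive in several chunks, so accumulate it.
        currentTitle?.append(string)
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title", let title = currentTitle {
            titles.append(title)
            currentTitle = nil
        } else if elementName == "item" {
            insideItem = false
        }
    }
}

func sketchItemTitles(in xml: Data) -> [String] {
    let collector = SketchTitleCollector()
    let parser = XMLParser(data: xml)
    parser.delegate = collector
    parser.parse()
    return collector.titles
}
```

The channel-level title is ignored because insideItem is false when it is seen; that flag is a small example of the state management event-driven parsing requires.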

Why use libXML2's SAX API?

SAX is kind of a pain because of all the state you have to manage.

An alternative is to use NSXMLParser, which is event-driven like SAX. However, RSSAXParser was written to avoid allocating Objective-C objects except when absolutely needed. You'll note the use of things like memcpy and strncmp.

Normally I avoid this kind of thing strenuously. I prefer to work at the highest level possible.

But my more-than-a-decade of experience parsing XML has led me to this solution, which (last time I checked, which was, admittedly, a few years ago) was not only the fastest but also used the least memory. (The two things are related, of course: creating objects is bad for performance, so this code attempts to create as few as possible.)

All that low-level stuff is encapsulated, however. If you just want to parse one of the popular feed formats, see FeedParser, which makes it easy and Swift-y.

Thread safety

Everything here is thread-safe.

Everything's pretty fast, too, so you could probably just use the main thread/queue. But it's totally a-okay to use a non-serial background queue.