NetNewsWire/Frameworks/RSParser
Brent Simmons 1e90237e7e Fix bug decoding &apos; entities.
1. Fix typo in the entities dictionary — add the actual ' character.
2. Add EntityDecodingTests.swift and a test for this.
2017-12-30 10:24:44 -08:00
Feeds Parse Atom authors. Fix #260. 2017-12-19 13:24:19 -08:00
HTML Prefer Apple touch icons to other feed icons. 2017-11-26 20:40:07 -08:00
JSON Make JSONTypes public. Add JSONUtilities. 2017-12-10 13:53:00 -08:00
OPML Fix OPML importing. 2017-10-21 21:00:21 -07:00
RSParser.xcodeproj Fix bug decoding &apos; entities. 2017-12-30 10:24:44 -08:00
RSParserTests Fix bug decoding &apos; entities. 2017-12-30 10:24:44 -08:00
SAX Fix builder errors, mostly in RSParser. 2017-10-04 13:28:48 -07:00
Utilities Fix bug decoding &apos; entities. 2017-12-30 10:24:44 -08:00
Info.plist Start work on turning RSXML.framework into RSParser.framework. 2017-06-20 21:18:46 -07:00
LICENSE Add license and Readme from RSParser’s separate open source project. Just part of keeping these both in sync. 2017-12-30 10:24:04 -08:00
ParserData.h Fix builder errors, mostly in RSParser. 2017-10-04 13:28:48 -07:00
ParserData.m Fix builder errors, mostly in RSParser. 2017-10-04 13:28:48 -07:00
README.md Add license and Readme from RSParser’s separate open source project. Just part of keeping these both in sync. 2017-12-30 10:24:04 -08:00
RSParser.h Support multiple authors in RSS and Atom feeds. 2017-12-19 13:03:05 -08:00

RSParser

This framework was developed for Evergreen and is made available here for developers who just need the parsing code. It has no dependencies that aren't provided by the system.

What's inside

This framework includes parsers for RSS, Atom, JSON Feed, RSS-in-JSON, and OPML.

It also includes Objective-C wrappers for libXML2's XML SAX and HTML SAX parsers. You can write your own parsers on top of these.

This framework builds for macOS. It could be made to build for iOS also, but I haven't gotten around to it yet.

How to parse feeds

To get the type of a feed, even with partial data, call FeedParser.feedType(parserData), which will return a FeedType.
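
To make the idea of detecting a feed type from partial data concrete, here is a minimal sketch. It is not RSParser's implementation: SketchFeedType and sketchFeedType(of:) are hypothetical names, and the string checks are assumptions about what the first few kilobytes of each format usually look like.

```swift
import Foundation

// Hypothetical stand-in for illustration; not RSParser's FeedType.
enum SketchFeedType {
    case rss, atom, jsonFeed, rssInJSON, notAFeed, unknown
}

// Guess the feed type from the first bytes of a (possibly incomplete) download.
func sketchFeedType(of data: Data) -> SketchFeedType {
    // Only the beginning is inspected, so partial data is fine.
    let head = String(decoding: data.prefix(4096), as: UTF8.self)

    // Skip leading whitespace and a byte order mark, if any.
    guard let first = head.first(where: { !$0.isWhitespace && $0 != "\u{FEFF}" }) else {
        return .unknown
    }

    if first == "{" {
        // JSON Feed documents carry a version URL on jsonfeed.org.
        if head.contains("jsonfeed.org") { return .jsonFeed }
        // Assumption: RSS-in-JSON has a top-level "rss" key near the start.
        if head.contains("\"rss\"") { return .rssInJSON }
        return .unknown
    }
    if first == "<" {
        if head.contains("<rss") { return .rss }
        if head.contains("<feed") { return .atom }
        if head.lowercased().contains("<html") { return .notAFeed }
        // XML-ish, but the root element has not arrived yet.
        return .unknown
    }
    return .notAFeed
}
```

Returning unknown when the root element has not shown up yet lets a caller ask again once more bytes have arrived.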

To parse a feed, call FeedParser.parse(parserData), which will return a ParsedFeed. Also see related structs: ParsedAuthor, ParsedItem, ParsedAttachment, and ParsedHub.

You do not need to know the type of feed when calling FeedParser.parse — it will figure it out and use the correct concrete parser.

However, if you do want to use a concrete parser directly, see RSSInJSONParser, JSONFeedParser, RSSParser, and AtomParser.

(Note: if you want to write a feed reader app, please do! You have my blessing and encouragement. Let me know when it's shipping so I can check it out.)

How to parse OPML

Call +[RSOPMLParser parseOPMLWithParserData:error:], which returns an RSOPMLDocument. See related objects: RSOPMLItem, RSOPMLAttributes, RSOPMLFeedSpecifier, and RSOPMLError.

How to parse dates

Call RSDateWithString or RSDateWithBytes (see RSDateParser). These handle the common internet date formats. You don't need to know which format.
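
For a sense of what "you don't need to know which format" means, here is a rough sketch that tries two common internet date formats in turn. It uses Foundation's formatters rather than RSDateParser, so it shows the idea only, not the framework's byte-level approach; sketchDate(from:) is a hypothetical name.

```swift
import Foundation

// Hypothetical helper: try a couple of common internet date formats in turn.
func sketchDate(from string: String) -> Date? {
    let trimmed = string.trimmingCharacters(in: .whitespacesAndNewlines)

    // ISO 8601 / RFC 3339, e.g. "2017-12-30T10:24:44-08:00" (common in Atom).
    if let date = ISO8601DateFormatter().date(from: trimmed) {
        return date
    }

    // RFC 822 style, e.g. "Sat, 30 Dec 2017 10:24:44 -0800" (common in RSS).
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX") // fixed locale for parsing
    formatter.dateFormat = "EEE, dd MMM yyyy HH:mm:ss Z"
    return formatter.date(from: trimmed)
}

// Both spellings of the same instant should parse to equal dates.
let atomStyle = sketchDate(from: "2017-12-30T10:24:44-08:00")
let rssStyle = sketchDate(from: "Sat, 30 Dec 2017 10:24:44 -0800")
```

Creating formatters on every call is slow; a real caller would cache them, or use the framework's own functions.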

How to parse HTML

To get an array of <a href=… links from an HTML document, call +[RSHTMLLinkParser htmlLinksWithParserData:]. It returns an array of RSHTMLLink.

To parse the metadata in an HTML document, call +[RSHTMLMetadataParser HTMLMetadataWithParserData:]. It returns an RSHTMLMetadata object.

To write your own HTML parser, see RSSAXHTMLParser. The two parsers above can serve as examples.

How to parse HTML entities

When you have a string with things like &#8212; and &euml; and you want to turn those into the correct characters, call -[NSString rsparser_stringByDecodingHTMLEntities]. (See NSString+RSParser.h.)
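
As an illustration of what entity decoding involves, here is a small self-contained sketch. It is not the code behind rsparser_stringByDecodingHTMLEntities: the function names are hypothetical, and the named-entity table is deliberately tiny (a real decoder needs the full HTML entity list).

```swift
import Foundation

// Hypothetical, deliberately tiny table of named entities.
let sketchNamedEntities: [String: String] = [
    "amp": "&", "lt": "<", "gt": ">", "quot": "\"", "apos": "'", "euml": "\u{EB}"
]

// Turn a code point into a one-character string, or nil if it is not a valid scalar.
func sketchScalarString(_ value: UInt32?) -> String? {
    guard let value = value, let scalar = Unicode.Scalar(value) else { return nil }
    return String(Character(scalar))
}

// Decode the text between "&" and ";" (for example "#8212", "#x2014", or "euml").
func sketchDecodeEntityName(_ name: Substring) -> String? {
    if name.hasPrefix("#x") || name.hasPrefix("#X") {
        return sketchScalarString(UInt32(name.dropFirst(2), radix: 16))
    }
    if name.hasPrefix("#") {
        return sketchScalarString(UInt32(name.dropFirst(), radix: 10))
    }
    return sketchNamedEntities[String(name)]
}

func sketchDecodeEntities(_ input: String) -> String {
    var output = ""
    var rest = Substring(input)
    while let ampersand = rest.firstIndex(of: "&") {
        output.append(contentsOf: rest[..<ampersand])
        let afterAmpersand = rest.index(after: ampersand)
        // An entity ends at the next semicolon; stop scanning if there is none.
        guard let semicolon = rest[afterAmpersand...].firstIndex(of: ";") else {
            rest = rest[ampersand...]
            break
        }
        if let decoded = sketchDecodeEntityName(rest[afterAmpersand..<semicolon]) {
            output.append(decoded)
            rest = rest[rest.index(after: semicolon)...]
        } else {
            // Not a recognized entity: keep the ampersand literally and move on.
            output.append("&")
            rest = rest[afterAmpersand...]
        }
    }
    return output + String(rest)
}
```

Unrecognized sequences are passed through unchanged, so text such as "Q&A; more" survives intact.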

How to parse XML

If you need to parse some XML that isn't RSS, Atom, or OPML, you can use RSSAXParser. Don't subclass it; instead, create an RSSAXParserDelegate. See RSRSSParser, RSAtomParser, and RSOPMLParser as examples.

Why use libXML2's SAX API?

SAX is kind of a pain because of all the state you have to manage.
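
To show the kind of state an event-driven parser makes you track, here is a small sketch that collects item titles from RSS-like XML. It uses Foundation's XMLParser (NSXMLParser) rather than RSSAXParser, whose delegate methods are not reproduced here; SketchTitleCollector and sketchItemTitles(in:) are hypothetical names.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here on non-Apple platforms
#endif

// Collects the text of every <title> that appears inside an <item>.
final class SketchTitleCollector: NSObject, XMLParserDelegate {
    private(set) var titles: [String] = []

    // The state the parser events force us to keep by hand.
    private var insideItem = false
    private var insideTitle = false
    private var buffer = ""

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        if elementName == "item" {
            insideItem = true
        } else if elementName == "title" && insideItem {
            insideTitle = true
            buffer = ""
        }
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Text can arrive in several chunks, so accumulate it.
        if insideTitle { buffer += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        if elementName == "title" && insideTitle {
            titles.append(buffer)
            insideTitle = false
        } else if elementName == "item" {
            insideItem = false
        }
    }
}

func sketchItemTitles(in xml: String) -> [String] {
    let collector = SketchTitleCollector()
    let parser = XMLParser(data: Data(xml.utf8))
    parser.delegate = collector
    parser.parse()
    return collector.titles
}
```

Even this tiny case needs two flags and a buffer to tell an item's title apart from the channel's title.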

An alternative is to use NSXMLParser, which is event-driven like SAX. However, RSSAXParser was written to avoid allocating Objective-C objects except when absolutely needed. You'll note use of things like memcpy and strncmp.

Normally I avoid this kind of thing strenuously. I prefer to work at the highest level possible.

But my more-than-a-decade of experience parsing XML has led me to this solution, which, the last time I checked (admittedly a few years ago), was not only the fastest but also used the least memory. (The two things are related, of course: creating objects is bad for performance, so this code attempts to do the minimum possible.)

All that low-level stuff is encapsulated, however. If you just want to parse one of the popular feed formats, see FeedParser, which makes it easy and Swift-y.

Thread safety

Everything here is thread-safe.

Everything's pretty fast, too, so you probably could just use the main thread/queue. But it's totally a-okay to use a non-serial background queue.
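
A minimal sketch of the non-serial background case, assuming a hypothetical stand-in sketchParse(_:) in place of a real parser call. The lock only protects the shared results array in this sketch; it says nothing about how the framework itself achieves thread safety.

```swift
import Foundation

// Hypothetical stand-in for a real parse call; it just counts bytes.
func sketchParse(_ data: Data) -> Int {
    return data.count
}

// Run the stand-in parser over many inputs concurrently, preserving input order.
func sketchParseConcurrently(_ inputs: [Data]) -> [Int] {
    var results = [Int](repeating: 0, count: inputs.count)
    let lock = NSLock()

    // concurrentPerform spreads iterations across threads and returns when all are done.
    DispatchQueue.concurrentPerform(iterations: inputs.count) { index in
        let parsed = sketchParse(inputs[index])
        lock.lock()
        results[index] = parsed
        lock.unlock()
    }
    return results
}
```

Because each input is parsed independently, the only shared state to guard is the place where results are collected.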