Refactoring to v.2.0

relikd
2018-12-27 21:11:59 +01:00
parent f9e672661a
commit 62c5bef463
50 changed files with 2574 additions and 3128 deletions


@@ -1,6 +1,7 @@
MIT License MIT License
Copyright (c) 2016 brentsimmons Original work: Copyright (c) 2016 Brent Simmons
Modified work: Copyright (c) 2018 Oleg Geier
Permission is hereby granted, free of charge, to any person obtaining a copy Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal of this software and associated documentation files (the "Software"), to deal


@@ -1,39 +1,66 @@
# RSXML # RSXML
This is utility code for parsing XML and HTML using libXML2's SAX parser. This is utility code for parsing XML and HTML using libXML2's SAX parser. It does not depend on any other third-party frameworks and builds two targets: one for Mac, one for iOS.
It builds two framework targets: one for Mac, one for iOS. It does not depend on any other third-party frameworks. The code is Objective-C with ARC. **Note:** This is an actively maintained fork of the [RSXML library by Brent Simmons](https://github.com/brentsimmons/RSXML). The original library seems to be inactive in favor of the new version [RSParser](https://github.com/brentsimmons/RSParser) which is written with Swift support in mind. If you prefer Swift you should go ahead and work with that project. However, the reason for this fork is to keep a version alive which is Objective-C only.
#### The gist
To parse XML, create an `RSSAXParserDelegate`. (There are examples in the framework that you can crib from.)
To parse HTML, create an `RSSAXHTMLParserDelegate`. (There are examples for this too.) ### Why use libXML2's SAX API?
#### Goodies and Extras Brent Simmons put much value on low memory footprint and fast parsing. With his own words: "RSXML was written to avoid allocating Objective-C objects except when absolutely needed. You'll note use of things like `memcmp` and `strncmp`". This promise will not be broken in future development.
There are three XML parsers included, for OPML, RSS, and Atom. To parse OPML, see `RSOPMLParser`. To parse RSS and Atom, see `RSFeedParser`.
These parsers may or may not be complete enough for your needs. You could, in theory, start writing an RSS reader just with these. (And, if you want to, go for it, with my blessing.)
There are two HTML parsers included. `RSHTMLMetadataParser` pulls metadata from the head section of an HTML document. `RSHTMLLinkParser` pulls all the links (anchors, <a href=…> tags) from an HTML document. ### Refactoring v.2.0
Other possibly interesting things: The refactoring that led to version 2.0 changed many things. With nearly all files touched, I would say roughly 80% of the code was updated. The parser architecture was rewritten and every parser is now a subclass of `RSXMLParser`. The parsing interface uses generic return types and some of the returned documents have changed as well.
`RSDateParser` makes it easy to parse dates in the formats found in various types of feeds. In general, the performance did not change but if so only to get slightly better. However, the performance of the HTML metadata parser improved by 80% to 90% (by canceling the parse after the head tag). At the same time, heap allocations dropped to 50% to 30% for the test cases (same reason).
In the previous version, the test case for parsing a non-opml file (with `RSOPMLParser`) took 13 seconds, whereas now, the parser cancels after a few milliseconds.
## Usage
```
RSXMLData *xmlData = [[RSXMLData alloc] initWithData:d urlString:@"https://www.example.org"];
// TODO: check xmlData.parserError
RSFeedParser *parser = [[RSFeedParser alloc] initWithXMLData:xmlData];
// TODO: check [parser canParse]
// TODO: alternatively check error after parseSync:
NSError *parseError;
RSParsedFeed *document = [parser parseSync:&parseError];
```
`RSXMLData` will return an error in `.parserError` if the provided data is not in XML format (see `RSXMLError` for possible reasons). The other point of failure is after initializing a parser with the `RSXMLData`. This will set an error if the parser does not match the underlying data (e.g., if you try to parse an `.opml` file with an Atom or RSS parser).
If you don't care about the parser used to decode the data, `[xmlData getParser]` will return the most suitable parser. You can use that parser right away to call `parseSync:`. Anyway, you can also parse the XML file asynchronously with `parseAsync:`.
```
[[xmlData getParser] parseAsync:^(RSParsedFeed *parsedDocument, NSError *error) {
// process feed items ...
}];
```
### Available parsers
This library includes parsers for RSS, Atom, OPML, and HTML metadata. The latter will return links to feed URLs, icon files, or generally all anchor tags linking to whatever. Use `RSFeedParser` to parse a feed regardless of type (Atom: `RSAtomParser`, RSS: `RSRSSParser`). To parse `.opml` files use `RSOPMLParser`, and for `.html` files there are two available `RSHTMLMetadataParser` (icons and feed links) and `RSHTMLLinkParser` (all anchor tags).
Depending on the parser the return value of `parseSync`/`parseAsync` is: `RSParsedFeed`, `RSOPMLItem`, `RSHTMLMetadata`, or `RSHTMLMetadataAnchor`.
You can define the parser type by declaring it like this: `RSXMLData<RSFeedParser*> xmlData`. That won't force the selection of the parser, though. But `[xmlData getParser]` will return the correct type; which in turn will return the appropriate document type (same as using a specific parser in the first place).
### Extras
`RSDateParser` makes it easy to parse dates from various formats found in different feed types.
`NSString+RSXML` decodes HTML entities. `NSString+RSXML` decodes HTML entities.
Also note: there are some unit tests. Also note: there are some unit tests.
#### Why use libXML2's SAX API?
SAX is kind of a pain because of all the state you have to manage. But it's fastest and uses the least amount of memory.
An alternative is to use `NSXMLParser`, which is event-driven like SAX. However, RSXML was written to avoid allocating Objective-C objects except when absolutely needed. You'll note use of things like `memcp` and `strncmp`.
Normally I avoid this kind of thing *strenuously*. I prefer to work at the highest level possible.
But my more-than-a-decade of experience parsing XML has led me to this solution, which — last time I checked, which was, admittedly, a few years ago — was not only fastest but also uses the least memory. (The two things are related, of course: creating objects is bad for performance, so this code attempts to do the minimum possible.)
All that low-level stuff is encapsulated, however. If you parse a feed, for instance, the caller gets an `RSParsedFeed` which contains `RSParsedArticle`s, and they're standard Objective-C objects. It's only inside your `RSSAXParserDelegate` and `RSSAXHTMLParserDelegate` where you'll need to deal with C.
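To illustrate what "avoid allocating Objective-C objects" can look like in practice, here is a minimal sketch in plain C (which also compiles as Objective-C). It compares a raw byte range, such as the text a SAX callback hands over, against a known name using `memcmp` and `strncmp` instead of building an `NSString` first. The helper names `bytes_equal_name` and `bytes_have_prefix` are hypothetical and are not taken from RSXML; the actual parsers may do this differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of RSXML): returns true when the first
   `length` bytes of `bytes` are exactly the NUL-terminated `name`.
   `bytes` does not need to be NUL-terminated and nothing is allocated. */
static bool bytes_equal_name(const unsigned char *bytes, size_t length, const char *name) {
    size_t name_length = strlen(name);
    return length == name_length && memcmp(bytes, name, name_length) == 0;
}

/* Hypothetical helper (not part of RSXML): returns true when the buffer
   starts with `prefix`. The length check runs first, so strncmp never
   reads past the end of `bytes`. */
static bool bytes_have_prefix(const unsigned char *bytes, size_t length, const char *prefix) {
    size_t prefix_length = strlen(prefix);
    return length >= prefix_length
        && strncmp((const char *)bytes, prefix, prefix_length) == 0;
}
```

Inside a parser callback this would be used along the lines of `if (bytes_equal_name(chars, len, "title")) { ... }`, so that an Objective-C object is only created once the parser knows the value is actually needed.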

View File

@@ -7,6 +7,14 @@
objects = { objects = {
/* Begin PBXBuildFile section */ /* Begin PBXBuildFile section */
54702A9821D407A00050A741 /* RSXMLParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 54702A9621D4079F0050A741 /* RSXMLParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
54702A9921D407A00050A741 /* RSXMLParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 54702A9621D4079F0050A741 /* RSXMLParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
54702A9A21D407A00050A741 /* RSXMLParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 54702A9721D407A00050A741 /* RSXMLParser.m */; };
54702A9B21D407A00050A741 /* RSXMLParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 54702A9721D407A00050A741 /* RSXMLParser.m */; };
54C707DB21D42B710029BFF1 /* NSDictionary+RSXML.m in Sources */ = {isa = PBXBuildFile; fileRef = 54C707D921D42B710029BFF1 /* NSDictionary+RSXML.m */; };
54C707DC21D42B710029BFF1 /* NSDictionary+RSXML.m in Sources */ = {isa = PBXBuildFile; fileRef = 54C707D921D42B710029BFF1 /* NSDictionary+RSXML.m */; };
54C707DD21D42B710029BFF1 /* NSDictionary+RSXML.h in Headers */ = {isa = PBXBuildFile; fileRef = 54C707DA21D42B710029BFF1 /* NSDictionary+RSXML.h */; };
54C707DE21D42B710029BFF1 /* NSDictionary+RSXML.h in Headers */ = {isa = PBXBuildFile; fileRef = 54C707DA21D42B710029BFF1 /* NSDictionary+RSXML.h */; };
54FCE5F421493B5E00FABB65 /* Resources in Resources */ = {isa = PBXBuildFile; fileRef = 54FCE5F321493B5E00FABB65 /* Resources */; }; 54FCE5F421493B5E00FABB65 /* Resources in Resources */ = {isa = PBXBuildFile; fileRef = 54FCE5F321493B5E00FABB65 /* Resources */; };
8400B0F01B8C20A9004C4CFF /* RSXMLData.h in Headers */ = {isa = PBXBuildFile; fileRef = 8400B0EE1B8C20A9004C4CFF /* RSXMLData.h */; settings = {ATTRIBUTES = (Public, ); }; }; 8400B0F01B8C20A9004C4CFF /* RSXMLData.h in Headers */ = {isa = PBXBuildFile; fileRef = 8400B0EE1B8C20A9004C4CFF /* RSXMLData.h */; settings = {ATTRIBUTES = (Public, ); }; };
8400B0F11B8C20A9004C4CFF /* RSXMLData.m in Sources */ = {isa = PBXBuildFile; fileRef = 8400B0EF1B8C20A9004C4CFF /* RSXMLData.m */; }; 8400B0F11B8C20A9004C4CFF /* RSXMLData.m in Sources */ = {isa = PBXBuildFile; fileRef = 8400B0EF1B8C20A9004C4CFF /* RSXMLData.m */; };
@@ -21,21 +29,16 @@
842D515B1B52E81B00E63D52 /* RSRSSParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51591B52E81B00E63D52 /* RSRSSParser.m */; }; 842D515B1B52E81B00E63D52 /* RSRSSParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51591B52E81B00E63D52 /* RSRSSParser.m */; };
842D51631B53058B00E63D52 /* RSParsedArticle.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51611B53058B00E63D52 /* RSParsedArticle.h */; settings = {ATTRIBUTES = (Public, ); }; }; 842D51631B53058B00E63D52 /* RSParsedArticle.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51611B53058B00E63D52 /* RSParsedArticle.h */; settings = {ATTRIBUTES = (Public, ); }; };
842D51641B53058B00E63D52 /* RSParsedArticle.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51621B53058B00E63D52 /* RSParsedArticle.m */; }; 842D51641B53058B00E63D52 /* RSParsedArticle.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51621B53058B00E63D52 /* RSParsedArticle.m */; };
842D516F1B5308BD00E63D52 /* FeedParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D516D1B5308BD00E63D52 /* FeedParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
842D51761B530BF200E63D52 /* RSParsedFeed.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51741B530BF200E63D52 /* RSParsedFeed.h */; settings = {ATTRIBUTES = (Public, ); }; }; 842D51761B530BF200E63D52 /* RSParsedFeed.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51741B530BF200E63D52 /* RSParsedFeed.h */; settings = {ATTRIBUTES = (Public, ); }; };
842D51771B530BF200E63D52 /* RSParsedFeed.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51751B530BF200E63D52 /* RSParsedFeed.m */; }; 842D51771B530BF200E63D52 /* RSParsedFeed.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51751B530BF200E63D52 /* RSParsedFeed.m */; };
842D517A1B5311AD00E63D52 /* RSOPMLParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51781B5311AD00E63D52 /* RSOPMLParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 842D517A1B5311AD00E63D52 /* RSOPMLParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51781B5311AD00E63D52 /* RSOPMLParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
842D517B1B5311AD00E63D52 /* RSOPMLParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51791B5311AD00E63D52 /* RSOPMLParser.m */; }; 842D517B1B5311AD00E63D52 /* RSOPMLParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51791B5311AD00E63D52 /* RSOPMLParser.m */; };
843819001C8CB00400E2A1DD /* RSSAXHTMLParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 843818FE1C8CB00400E2A1DD /* RSSAXHTMLParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
843819011C8CB00400E2A1DD /* RSSAXHTMLParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 843818FF1C8CB00400E2A1DD /* RSSAXHTMLParser.m */; };
8475C4081D57AB4C0076751E /* RSHTMLLinkParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 8475C4081D57AB4C0076751E /* RSHTMLLinkParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
8475C4091D57AB4C0076751E /* RSHTMLLinkParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */; }; 8475C4091D57AB4C0076751E /* RSHTMLLinkParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */; };
8486F1151BB646140092794F /* NSString+RSXML.h in Headers */ = {isa = PBXBuildFile; fileRef = 8486F1131BB646140092794F /* NSString+RSXML.h */; settings = {ATTRIBUTES = (Public, ); }; }; 8486F1151BB646140092794F /* NSString+RSXML.h in Headers */ = {isa = PBXBuildFile; fileRef = 8486F1131BB646140092794F /* NSString+RSXML.h */; settings = {ATTRIBUTES = (Public, ); }; };
8486F1161BB646140092794F /* NSString+RSXML.m in Sources */ = {isa = PBXBuildFile; fileRef = 8486F1141BB646140092794F /* NSString+RSXML.m */; }; 8486F1161BB646140092794F /* NSString+RSXML.m in Sources */ = {isa = PBXBuildFile; fileRef = 8486F1141BB646140092794F /* NSString+RSXML.m */; };
84AD0BF51E11A6FB00B38510 /* RSDateParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 84AD0BF31E11A6FB00B38510 /* RSDateParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0BF51E11A6FB00B38510 /* RSDateParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 84AD0BF31E11A6FB00B38510 /* RSDateParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0BF61E11A6FB00B38510 /* RSDateParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0BF41E11A6FB00B38510 /* RSDateParser.m */; }; 84AD0BF61E11A6FB00B38510 /* RSDateParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0BF41E11A6FB00B38510 /* RSDateParser.m */; };
84AD0BFA1E11A9A700B38510 /* RSXMLInternal.h in Headers */ = {isa = PBXBuildFile; fileRef = 84AD0BF81E11A9A700B38510 /* RSXMLInternal.h */; };
84AD0BFB1E11A9A700B38510 /* RSXMLInternal.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0BF91E11A9A700B38510 /* RSXMLInternal.m */; };
84AD0C0D1E11B8BE00B38510 /* RSXML.h in Headers */ = {isa = PBXBuildFile; fileRef = 84F22C101B52DDEA000060CE /* RSXML.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C0D1E11B8BE00B38510 /* RSXML.h in Headers */ = {isa = PBXBuildFile; fileRef = 84F22C101B52DDEA000060CE /* RSXML.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C0E1E11B8CA00B38510 /* RSXMLError.h in Headers */ = {isa = PBXBuildFile; fileRef = 84E4BE431C8B8FE400A90B41 /* RSXMLError.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C0E1E11B8CA00B38510 /* RSXMLError.h in Headers */ = {isa = PBXBuildFile; fileRef = 84E4BE431C8B8FE400A90B41 /* RSXMLError.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C0F1E11B8CA00B38510 /* RSXMLError.m in Sources */ = {isa = PBXBuildFile; fileRef = 84E4BE441C8B8FE400A90B41 /* RSXMLError.m */; }; 84AD0C0F1E11B8CA00B38510 /* RSXMLError.m in Sources */ = {isa = PBXBuildFile; fileRef = 84E4BE441C8B8FE400A90B41 /* RSXMLError.m */; };
@@ -53,7 +56,6 @@
84AD0C1D1E11B8CF00B38510 /* RSOPMLItem.m in Sources */ = {isa = PBXBuildFile; fileRef = 8429D1B51C83A03100F97695 /* RSOPMLItem.m */; }; 84AD0C1D1E11B8CF00B38510 /* RSOPMLItem.m in Sources */ = {isa = PBXBuildFile; fileRef = 8429D1B51C83A03100F97695 /* RSOPMLItem.m */; };
84AD0C221E11B8D400B38510 /* RSFeedParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51501B52E80100E63D52 /* RSFeedParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C221E11B8D400B38510 /* RSFeedParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51501B52E80100E63D52 /* RSFeedParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C231E11B8D400B38510 /* RSFeedParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51511B52E80100E63D52 /* RSFeedParser.m */; }; 84AD0C231E11B8D400B38510 /* RSFeedParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51511B52E80100E63D52 /* RSFeedParser.m */; };
84AD0C241E11B8D400B38510 /* FeedParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D516D1B5308BD00E63D52 /* FeedParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C251E11B8D400B38510 /* RSAtomParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D514A1B52E7FC00E63D52 /* RSAtomParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C251E11B8D400B38510 /* RSAtomParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D514A1B52E7FC00E63D52 /* RSAtomParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C261E11B8D400B38510 /* RSAtomParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D514B1B52E7FC00E63D52 /* RSAtomParser.m */; }; 84AD0C261E11B8D400B38510 /* RSAtomParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D514B1B52E7FC00E63D52 /* RSAtomParser.m */; };
84AD0C271E11B8D400B38510 /* RSRSSParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51581B52E81B00E63D52 /* RSRSSParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C271E11B8D400B38510 /* RSRSSParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51581B52E81B00E63D52 /* RSRSSParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
@@ -62,16 +64,12 @@
84AD0C2A1E11B8D400B38510 /* RSParsedFeed.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51751B530BF200E63D52 /* RSParsedFeed.m */; }; 84AD0C2A1E11B8D400B38510 /* RSParsedFeed.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51751B530BF200E63D52 /* RSParsedFeed.m */; };
84AD0C2B1E11B8D400B38510 /* RSParsedArticle.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51611B53058B00E63D52 /* RSParsedArticle.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C2B1E11B8D400B38510 /* RSParsedArticle.h in Headers */ = {isa = PBXBuildFile; fileRef = 842D51611B53058B00E63D52 /* RSParsedArticle.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C2C1E11B8D400B38510 /* RSParsedArticle.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51621B53058B00E63D52 /* RSParsedArticle.m */; }; 84AD0C2C1E11B8D400B38510 /* RSParsedArticle.m in Sources */ = {isa = PBXBuildFile; fileRef = 842D51621B53058B00E63D52 /* RSParsedArticle.m */; };
84AD0C2D1E11B8DA00B38510 /* RSSAXHTMLParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 843818FE1C8CB00400E2A1DD /* RSSAXHTMLParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C2E1E11B8DA00B38510 /* RSSAXHTMLParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 843818FF1C8CB00400E2A1DD /* RSSAXHTMLParser.m */; };
84AD0C2F1E11B8DA00B38510 /* RSHTMLMetadataParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 84BF3E141C8CDD1A005562D8 /* RSHTMLMetadataParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C2F1E11B8DA00B38510 /* RSHTMLMetadataParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 84BF3E141C8CDD1A005562D8 /* RSHTMLMetadataParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C301E11B8DA00B38510 /* RSHTMLMetadataParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 84BF3E151C8CDD1A005562D8 /* RSHTMLMetadataParser.m */; }; 84AD0C301E11B8DA00B38510 /* RSHTMLMetadataParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 84BF3E151C8CDD1A005562D8 /* RSHTMLMetadataParser.m */; };
84AD0C311E11B8DA00B38510 /* RSHTMLMetadata.h in Headers */ = {isa = PBXBuildFile; fileRef = 84BF3E1A1C8CDD6D005562D8 /* RSHTMLMetadata.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C311E11B8DA00B38510 /* RSHTMLMetadata.h in Headers */ = {isa = PBXBuildFile; fileRef = 84BF3E1A1C8CDD6D005562D8 /* RSHTMLMetadata.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C321E11B8DA00B38510 /* RSHTMLMetadata.m in Sources */ = {isa = PBXBuildFile; fileRef = 84BF3E1B1C8CDD6D005562D8 /* RSHTMLMetadata.m */; }; 84AD0C321E11B8DA00B38510 /* RSHTMLMetadata.m in Sources */ = {isa = PBXBuildFile; fileRef = 84BF3E1B1C8CDD6D005562D8 /* RSHTMLMetadata.m */; };
84AD0C331E11B8DA00B38510 /* RSHTMLLinkParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */; settings = {ATTRIBUTES = (Public, ); }; }; 84AD0C331E11B8DA00B38510 /* RSHTMLLinkParser.h in Headers */ = {isa = PBXBuildFile; fileRef = 8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */; settings = {ATTRIBUTES = (Public, ); }; };
84AD0C341E11B8DA00B38510 /* RSHTMLLinkParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */; }; 84AD0C341E11B8DA00B38510 /* RSHTMLLinkParser.m in Sources */ = {isa = PBXBuildFile; fileRef = 8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */; };
84AD0C351E11B8DD00B38510 /* RSXMLInternal.h in Headers */ = {isa = PBXBuildFile; fileRef = 84AD0BF81E11A9A700B38510 /* RSXMLInternal.h */; };
84AD0C361E11B8DD00B38510 /* RSXMLInternal.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0BF91E11A9A700B38510 /* RSXMLInternal.m */; };
84AD0C391E11BAA800B38510 /* libxml2.2.tbd in Frameworks */ = {isa = PBXBuildFile; fileRef = 84AD0C381E11BAA800B38510 /* libxml2.2.tbd */; }; 84AD0C391E11BAA800B38510 /* libxml2.2.tbd in Frameworks */ = {isa = PBXBuildFile; fileRef = 84AD0C381E11BAA800B38510 /* libxml2.2.tbd */; };
84AD0C3B1E11C2D500B38510 /* RSEntityTests.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0C3A1E11C2D500B38510 /* RSEntityTests.m */; }; 84AD0C3B1E11C2D500B38510 /* RSEntityTests.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0C3A1E11C2D500B38510 /* RSEntityTests.m */; };
84AD0C3D1E11D75400B38510 /* RSDateParserTests.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0C3C1E11D75400B38510 /* RSDateParserTests.m */; }; 84AD0C3D1E11D75400B38510 /* RSDateParserTests.m in Sources */ = {isa = PBXBuildFile; fileRef = 84AD0C3C1E11D75400B38510 /* RSDateParserTests.m */; };
@@ -101,9 +99,13 @@
/* End PBXContainerItemProxy section */ /* End PBXContainerItemProxy section */
/* Begin PBXFileReference section */ /* Begin PBXFileReference section */
54702A9621D4079F0050A741 /* RSXMLParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSXMLParser.h; sourceTree = "<group>"; };
54702A9721D407A00050A741 /* RSXMLParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSXMLParser.m; sourceTree = "<group>"; };
54C707D921D42B710029BFF1 /* NSDictionary+RSXML.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = "NSDictionary+RSXML.m"; sourceTree = "<group>"; };
54C707DA21D42B710029BFF1 /* NSDictionary+RSXML.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = "NSDictionary+RSXML.h"; sourceTree = "<group>"; };
54FCE5F321493B5E00FABB65 /* Resources */ = {isa = PBXFileReference; lastKnownFileType = folder; path = Resources; sourceTree = "<group>"; }; 54FCE5F321493B5E00FABB65 /* Resources */ = {isa = PBXFileReference; lastKnownFileType = folder; path = Resources; sourceTree = "<group>"; };
8400B0EE1B8C20A9004C4CFF /* RSXMLData.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSXMLData.h; path = RSXML/RSXMLData.h; sourceTree = "<group>"; }; 8400B0EE1B8C20A9004C4CFF /* RSXMLData.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSXMLData.h; sourceTree = "<group>"; };
8400B0EF1B8C20A9004C4CFF /* RSXMLData.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = RSXMLData.m; path = RSXML/RSXMLData.m; sourceTree = "<group>"; }; 8400B0EF1B8C20A9004C4CFF /* RSXMLData.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSXMLData.m; sourceTree = "<group>"; };
8429D1B41C83A03100F97695 /* RSOPMLItem.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSOPMLItem.h; sourceTree = "<group>"; }; 8429D1B41C83A03100F97695 /* RSOPMLItem.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSOPMLItem.h; sourceTree = "<group>"; };
8429D1B51C83A03100F97695 /* RSOPMLItem.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSOPMLItem.m; sourceTree = "<group>"; }; 8429D1B51C83A03100F97695 /* RSOPMLItem.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSOPMLItem.m; sourceTree = "<group>"; };
8429D1C21C83BCCB00F97695 /* RSOPMLTests.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSOPMLTests.m; sourceTree = "<group>"; }; 8429D1C21C83BCCB00F97695 /* RSOPMLTests.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSOPMLTests.m; sourceTree = "<group>"; };
@@ -115,21 +117,16 @@
842D51591B52E81B00E63D52 /* RSRSSParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; path = RSRSSParser.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; }; 842D51591B52E81B00E63D52 /* RSRSSParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; path = RSRSSParser.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; };
842D51611B53058B00E63D52 /* RSParsedArticle.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSParsedArticle.h; sourceTree = "<group>"; }; 842D51611B53058B00E63D52 /* RSParsedArticle.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSParsedArticle.h; sourceTree = "<group>"; };
842D51621B53058B00E63D52 /* RSParsedArticle.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSParsedArticle.m; sourceTree = "<group>"; }; 842D51621B53058B00E63D52 /* RSParsedArticle.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSParsedArticle.m; sourceTree = "<group>"; };
842D516D1B5308BD00E63D52 /* FeedParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = FeedParser.h; sourceTree = "<group>"; };
842D51741B530BF200E63D52 /* RSParsedFeed.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSParsedFeed.h; sourceTree = "<group>"; }; 842D51741B530BF200E63D52 /* RSParsedFeed.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSParsedFeed.h; sourceTree = "<group>"; };
842D51751B530BF200E63D52 /* RSParsedFeed.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSParsedFeed.m; sourceTree = "<group>"; }; 842D51751B530BF200E63D52 /* RSParsedFeed.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSParsedFeed.m; sourceTree = "<group>"; };
842D51781B5311AD00E63D52 /* RSOPMLParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSOPMLParser.h; sourceTree = "<group>"; }; 842D51781B5311AD00E63D52 /* RSOPMLParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSOPMLParser.h; sourceTree = "<group>"; };
842D51791B5311AD00E63D52 /* RSOPMLParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSOPMLParser.m; sourceTree = "<group>"; }; 842D51791B5311AD00E63D52 /* RSOPMLParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSOPMLParser.m; sourceTree = "<group>"; };
843818FE1C8CB00400E2A1DD /* RSSAXHTMLParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSSAXHTMLParser.h; path = RSXML/RSSAXHTMLParser.h; sourceTree = "<group>"; };
843818FF1C8CB00400E2A1DD /* RSSAXHTMLParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = RSSAXHTMLParser.m; path = RSXML/RSSAXHTMLParser.m; sourceTree = "<group>"; };
8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSHTMLLinkParser.h; path = RSXML/RSHTMLLinkParser.h; sourceTree = "<group>"; }; 8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSHTMLLinkParser.h; path = RSXML/RSHTMLLinkParser.h; sourceTree = "<group>"; };
8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; name = RSHTMLLinkParser.m; path = RSXML/RSHTMLLinkParser.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; }; 8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; name = RSHTMLLinkParser.m; path = RSXML/RSHTMLLinkParser.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; };
8486F1131BB646140092794F /* NSString+RSXML.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = "NSString+RSXML.h"; path = "RSXML/NSString+RSXML.h"; sourceTree = "<group>"; }; 8486F1131BB646140092794F /* NSString+RSXML.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = "NSString+RSXML.h"; sourceTree = "<group>"; };
8486F1141BB646140092794F /* NSString+RSXML.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = "NSString+RSXML.m"; path = "RSXML/NSString+RSXML.m"; sourceTree = "<group>"; }; 8486F1141BB646140092794F /* NSString+RSXML.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = "NSString+RSXML.m"; sourceTree = "<group>"; };
84AD0BF31E11A6FB00B38510 /* RSDateParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSDateParser.h; path = RSXML/RSDateParser.h; sourceTree = "<group>"; }; 84AD0BF31E11A6FB00B38510 /* RSDateParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSDateParser.h; sourceTree = "<group>"; };
84AD0BF41E11A6FB00B38510 /* RSDateParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = RSDateParser.m; path = RSXML/RSDateParser.m; sourceTree = "<group>"; }; 84AD0BF41E11A6FB00B38510 /* RSDateParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSDateParser.m; sourceTree = "<group>"; };
84AD0BF81E11A9A700B38510 /* RSXMLInternal.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSXMLInternal.h; path = RSXML/RSXMLInternal.h; sourceTree = "<group>"; };
84AD0BF91E11A9A700B38510 /* RSXMLInternal.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = RSXMLInternal.m; path = RSXML/RSXMLInternal.m; sourceTree = "<group>"; };
84AD0C051E11B7D200B38510 /* RSXML.framework */ = {isa = PBXFileReference; explicitFileType = wrapper.framework; includeInIndex = 0; path = RSXML.framework; sourceTree = BUILT_PRODUCTS_DIR; }; 84AD0C051E11B7D200B38510 /* RSXML.framework */ = {isa = PBXFileReference; explicitFileType = wrapper.framework; includeInIndex = 0; path = RSXML.framework; sourceTree = BUILT_PRODUCTS_DIR; };
84AD0C081E11B7D200B38510 /* Info.plist */ = {isa = PBXFileReference; lastKnownFileType = text.plist.xml; path = Info.plist; sourceTree = "<group>"; }; 84AD0C081E11B7D200B38510 /* Info.plist */ = {isa = PBXFileReference; lastKnownFileType = text.plist.xml; path = Info.plist; sourceTree = "<group>"; };
84AD0C381E11BAA800B38510 /* libxml2.2.tbd */ = {isa = PBXFileReference; lastKnownFileType = "sourcecode.text-based-dylib-definition"; name = libxml2.2.tbd; path = Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.2.sdk/usr/lib/libxml2.2.tbd; sourceTree = DEVELOPER_DIR; }; 84AD0C381E11BAA800B38510 /* libxml2.2.tbd */ = {isa = PBXFileReference; lastKnownFileType = "sourcecode.text-based-dylib-definition"; name = libxml2.2.tbd; path = Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.2.sdk/usr/lib/libxml2.2.tbd; sourceTree = DEVELOPER_DIR; };
@@ -139,8 +136,8 @@
84BF3E151C8CDD1A005562D8 /* RSHTMLMetadataParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; name = RSHTMLMetadataParser.m; path = RSXML/RSHTMLMetadataParser.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; }; 84BF3E151C8CDD1A005562D8 /* RSHTMLMetadataParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; name = RSHTMLMetadataParser.m; path = RSXML/RSHTMLMetadataParser.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; };
84BF3E1A1C8CDD6D005562D8 /* RSHTMLMetadata.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSHTMLMetadata.h; path = RSXML/RSHTMLMetadata.h; sourceTree = "<group>"; }; 84BF3E1A1C8CDD6D005562D8 /* RSHTMLMetadata.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSHTMLMetadata.h; path = RSXML/RSHTMLMetadata.h; sourceTree = "<group>"; };
84BF3E1B1C8CDD6D005562D8 /* RSHTMLMetadata.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; name = RSHTMLMetadata.m; path = RSXML/RSHTMLMetadata.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; }; 84BF3E1B1C8CDD6D005562D8 /* RSHTMLMetadata.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; lineEnding = 0; name = RSHTMLMetadata.m; path = RSXML/RSHTMLMetadata.m; sourceTree = "<group>"; xcLanguageSpecificationIdentifier = xcode.lang.objc; };
84E4BE431C8B8FE400A90B41 /* RSXMLError.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSXMLError.h; path = RSXML/RSXMLError.h; sourceTree = "<group>"; }; 84E4BE431C8B8FE400A90B41 /* RSXMLError.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSXMLError.h; sourceTree = "<group>"; };
84E4BE441C8B8FE400A90B41 /* RSXMLError.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = RSXMLError.m; path = RSXML/RSXMLError.m; sourceTree = "<group>"; }; 84E4BE441C8B8FE400A90B41 /* RSXMLError.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSXMLError.m; sourceTree = "<group>"; };
84E4BE471C8B989D00A90B41 /* RSHTMLTests.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSHTMLTests.m; sourceTree = "<group>"; }; 84E4BE471C8B989D00A90B41 /* RSHTMLTests.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSHTMLTests.m; sourceTree = "<group>"; };
84F22C0D1B52DDEA000060CE /* RSXML.framework */ = {isa = PBXFileReference; explicitFileType = wrapper.framework; includeInIndex = 0; path = RSXML.framework; sourceTree = BUILT_PRODUCTS_DIR; }; 84F22C0D1B52DDEA000060CE /* RSXML.framework */ = {isa = PBXFileReference; explicitFileType = wrapper.framework; includeInIndex = 0; path = RSXML.framework; sourceTree = BUILT_PRODUCTS_DIR; };
84F22C101B52DDEA000060CE /* RSXML.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = RSXML.h; path = RSXML/RSXML.h; sourceTree = "<group>"; }; 84F22C101B52DDEA000060CE /* RSXML.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = RSXML.h; path = RSXML/RSXML.h; sourceTree = "<group>"; };
@@ -148,8 +145,8 @@
84F22C171B52DDEA000060CE /* RSXMLTests.xctest */ = {isa = PBXFileReference; explicitFileType = wrapper.cfbundle; includeInIndex = 0; path = RSXMLTests.xctest; sourceTree = BUILT_PRODUCTS_DIR; }; 84F22C171B52DDEA000060CE /* RSXMLTests.xctest */ = {isa = PBXFileReference; explicitFileType = wrapper.cfbundle; includeInIndex = 0; path = RSXMLTests.xctest; sourceTree = BUILT_PRODUCTS_DIR; };
84F22C1C1B52DDEA000060CE /* RSXMLTests.m */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.objc; path = RSXMLTests.m; sourceTree = "<group>"; }; 84F22C1C1B52DDEA000060CE /* RSXMLTests.m */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.objc; path = RSXMLTests.m; sourceTree = "<group>"; };
84F22C1E1B52DDEA000060CE /* Info.plist */ = {isa = PBXFileReference; lastKnownFileType = text.plist.xml; path = Info.plist; sourceTree = "<group>"; }; 84F22C1E1B52DDEA000060CE /* Info.plist */ = {isa = PBXFileReference; lastKnownFileType = text.plist.xml; path = Info.plist; sourceTree = "<group>"; };
84F22C271B52DDFE000060CE /* RSSAXParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = RSSAXParser.h; path = RSXML/RSSAXParser.h; sourceTree = "<group>"; }; 84F22C271B52DDFE000060CE /* RSSAXParser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = RSSAXParser.h; sourceTree = "<group>"; };
84F22C281B52DDFE000060CE /* RSSAXParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; name = RSSAXParser.m; path = RSXML/RSSAXParser.m; sourceTree = "<group>"; }; 84F22C281B52DDFE000060CE /* RSSAXParser.m */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.objc; path = RSSAXParser.m; sourceTree = "<group>"; };
84F22C451B52DF90000060CE /* libxml2.2.tbd */ = {isa = PBXFileReference; lastKnownFileType = "sourcecode.text-based-dylib-definition"; name = libxml2.2.tbd; path = usr/lib/libxml2.2.tbd; sourceTree = SDKROOT; }; 84F22C451B52DF90000060CE /* libxml2.2.tbd */ = {isa = PBXFileReference; lastKnownFileType = "sourcecode.text-based-dylib-definition"; name = libxml2.2.tbd; path = usr/lib/libxml2.2.tbd; sourceTree = SDKROOT; };
/* End PBXFileReference section */ /* End PBXFileReference section */
@@ -181,12 +178,33 @@
/* End PBXFrameworksBuildPhase section */ /* End PBXFrameworksBuildPhase section */
/* Begin PBXGroup section */ /* Begin PBXGroup section */
54702A9521D407520050A741 /* General */ = {
isa = PBXGroup;
children = (
84F22C271B52DDFE000060CE /* RSSAXParser.h */,
84F22C281B52DDFE000060CE /* RSSAXParser.m */,
84E4BE431C8B8FE400A90B41 /* RSXMLError.h */,
84E4BE441C8B8FE400A90B41 /* RSXMLError.m */,
84AD0BF31E11A6FB00B38510 /* RSDateParser.h */,
84AD0BF41E11A6FB00B38510 /* RSDateParser.m */,
8486F1131BB646140092794F /* NSString+RSXML.h */,
8486F1141BB646140092794F /* NSString+RSXML.m */,
54C707DA21D42B710029BFF1 /* NSDictionary+RSXML.h */,
54C707D921D42B710029BFF1 /* NSDictionary+RSXML.m */,
8400B0EE1B8C20A9004C4CFF /* RSXMLData.h */,
8400B0EF1B8C20A9004C4CFF /* RSXMLData.m */,
54702A9621D4079F0050A741 /* RSXMLParser.h */,
54702A9721D407A00050A741 /* RSXMLParser.m */,
);
name = General;
path = RSXML;
sourceTree = "<group>";
};
842D515E1B52E83100E63D52 /* Feeds */ = { 842D515E1B52E83100E63D52 /* Feeds */ = {
isa = PBXGroup; isa = PBXGroup;
children = ( children = (
842D51501B52E80100E63D52 /* RSFeedParser.h */, 842D51501B52E80100E63D52 /* RSFeedParser.h */,
842D51511B52E80100E63D52 /* RSFeedParser.m */, 842D51511B52E80100E63D52 /* RSFeedParser.m */,
842D516D1B5308BD00E63D52 /* FeedParser.h */,
842D514A1B52E7FC00E63D52 /* RSAtomParser.h */, 842D514A1B52E7FC00E63D52 /* RSAtomParser.h */,
842D514B1B52E7FC00E63D52 /* RSAtomParser.m */, 842D514B1B52E7FC00E63D52 /* RSAtomParser.m */,
842D51581B52E81B00E63D52 /* RSRSSParser.h */, 842D51581B52E81B00E63D52 /* RSRSSParser.h */,
@@ -231,14 +249,12 @@
84E4BE4D1C8B98E400A90B41 /* HTML */ = { 84E4BE4D1C8B98E400A90B41 /* HTML */ = {
isa = PBXGroup; isa = PBXGroup;
children = ( children = (
843818FE1C8CB00400E2A1DD /* RSSAXHTMLParser.h */,
843818FF1C8CB00400E2A1DD /* RSSAXHTMLParser.m */,
84BF3E141C8CDD1A005562D8 /* RSHTMLMetadataParser.h */, 84BF3E141C8CDD1A005562D8 /* RSHTMLMetadataParser.h */,
84BF3E151C8CDD1A005562D8 /* RSHTMLMetadataParser.m */, 84BF3E151C8CDD1A005562D8 /* RSHTMLMetadataParser.m */,
84BF3E1A1C8CDD6D005562D8 /* RSHTMLMetadata.h */,
84BF3E1B1C8CDD6D005562D8 /* RSHTMLMetadata.m */,
8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */, 8475C4061D57AB4C0076751E /* RSHTMLLinkParser.h */,
8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */, 8475C4071D57AB4C0076751E /* RSHTMLLinkParser.m */,
84BF3E1A1C8CDD6D005562D8 /* RSHTMLMetadata.h */,
84BF3E1B1C8CDD6D005562D8 /* RSHTMLMetadata.m */,
); );
name = HTML; name = HTML;
sourceTree = "<group>"; sourceTree = "<group>";
@@ -247,21 +263,10 @@
isa = PBXGroup; isa = PBXGroup;
children = ( children = (
84F22C101B52DDEA000060CE /* RSXML.h */, 84F22C101B52DDEA000060CE /* RSXML.h */,
84E4BE431C8B8FE400A90B41 /* RSXMLError.h */, 54702A9521D407520050A741 /* General */,
84E4BE441C8B8FE400A90B41 /* RSXMLError.m */,
84F22C271B52DDFE000060CE /* RSSAXParser.h */,
84F22C281B52DDFE000060CE /* RSSAXParser.m */,
8400B0EE1B8C20A9004C4CFF /* RSXMLData.h */,
8400B0EF1B8C20A9004C4CFF /* RSXMLData.m */,
8486F1131BB646140092794F /* NSString+RSXML.h */,
8486F1141BB646140092794F /* NSString+RSXML.m */,
84AD0BF31E11A6FB00B38510 /* RSDateParser.h */,
84AD0BF41E11A6FB00B38510 /* RSDateParser.m */,
842D517C1B5311B000E63D52 /* OPML */, 842D517C1B5311B000E63D52 /* OPML */,
842D515E1B52E83100E63D52 /* Feeds */, 842D515E1B52E83100E63D52 /* Feeds */,
84E4BE4D1C8B98E400A90B41 /* HTML */, 84E4BE4D1C8B98E400A90B41 /* HTML */,
84AD0BF81E11A9A700B38510 /* RSXMLInternal.h */,
84AD0BF91E11A9A700B38510 /* RSXMLInternal.m */,
84F22C121B52DDEA000060CE /* Info.plist */, 84F22C121B52DDEA000060CE /* Info.plist */,
84F22C1B1B52DDEA000060CE /* RSXMLTests */, 84F22C1B1B52DDEA000060CE /* RSXMLTests */,
84AD0C061E11B7D200B38510 /* RSXMLiOS */, 84AD0C061E11B7D200B38510 /* RSXMLiOS */,
@@ -302,20 +307,19 @@
isa = PBXHeadersBuildPhase; isa = PBXHeadersBuildPhase;
buildActionMask = 2147483647; buildActionMask = 2147483647;
files = ( files = (
84AD0C351E11B8DD00B38510 /* RSXMLInternal.h in Headers */,
84AD0C1C1E11B8CF00B38510 /* RSOPMLItem.h in Headers */, 84AD0C1C1E11B8CF00B38510 /* RSOPMLItem.h in Headers */,
84AD0C271E11B8D400B38510 /* RSRSSParser.h in Headers */, 84AD0C271E11B8D400B38510 /* RSRSSParser.h in Headers */,
84AD0C161E11B8CA00B38510 /* RSDateParser.h in Headers */, 84AD0C161E11B8CA00B38510 /* RSDateParser.h in Headers */,
54702A9921D407A00050A741 /* RSXMLParser.h in Headers */,
84AD0C101E11B8CA00B38510 /* RSSAXParser.h in Headers */, 84AD0C101E11B8CA00B38510 /* RSSAXParser.h in Headers */,
84AD0C331E11B8DA00B38510 /* RSHTMLLinkParser.h in Headers */, 84AD0C331E11B8DA00B38510 /* RSHTMLLinkParser.h in Headers */,
84AD0C2F1E11B8DA00B38510 /* RSHTMLMetadataParser.h in Headers */, 84AD0C2F1E11B8DA00B38510 /* RSHTMLMetadataParser.h in Headers */,
84AD0C251E11B8D400B38510 /* RSAtomParser.h in Headers */, 84AD0C251E11B8D400B38510 /* RSAtomParser.h in Headers */,
84AD0C121E11B8CA00B38510 /* RSXMLData.h in Headers */, 84AD0C121E11B8CA00B38510 /* RSXMLData.h in Headers */,
84AD0C311E11B8DA00B38510 /* RSHTMLMetadata.h in Headers */, 84AD0C311E11B8DA00B38510 /* RSHTMLMetadata.h in Headers */,
84AD0C241E11B8D400B38510 /* FeedParser.h in Headers */,
84AD0C221E11B8D400B38510 /* RSFeedParser.h in Headers */, 84AD0C221E11B8D400B38510 /* RSFeedParser.h in Headers */,
84AD0C2D1E11B8DA00B38510 /* RSSAXHTMLParser.h in Headers */,
84AD0C0E1E11B8CA00B38510 /* RSXMLError.h in Headers */, 84AD0C0E1E11B8CA00B38510 /* RSXMLError.h in Headers */,
54C707DE21D42B710029BFF1 /* NSDictionary+RSXML.h in Headers */,
84AD0C2B1E11B8D400B38510 /* RSParsedArticle.h in Headers */, 84AD0C2B1E11B8D400B38510 /* RSParsedArticle.h in Headers */,
84AD0C291E11B8D400B38510 /* RSParsedFeed.h in Headers */, 84AD0C291E11B8D400B38510 /* RSParsedFeed.h in Headers */,
84AD0C181E11B8CF00B38510 /* RSOPMLParser.h in Headers */, 84AD0C181E11B8CF00B38510 /* RSOPMLParser.h in Headers */,
@@ -331,6 +335,7 @@
8486F1151BB646140092794F /* NSString+RSXML.h in Headers */, 8486F1151BB646140092794F /* NSString+RSXML.h in Headers */,
8475C4081D57AB4C0076751E /* RSHTMLLinkParser.h in Headers */, 8475C4081D57AB4C0076751E /* RSHTMLLinkParser.h in Headers */,
84AD0BF51E11A6FB00B38510 /* RSDateParser.h in Headers */, 84AD0BF51E11A6FB00B38510 /* RSDateParser.h in Headers */,
54702A9821D407A00050A741 /* RSXMLParser.h in Headers */,
8400B0F01B8C20A9004C4CFF /* RSXMLData.h in Headers */, 8400B0F01B8C20A9004C4CFF /* RSXMLData.h in Headers */,
842D51631B53058B00E63D52 /* RSParsedArticle.h in Headers */, 842D51631B53058B00E63D52 /* RSParsedArticle.h in Headers */,
842D517A1B5311AD00E63D52 /* RSOPMLParser.h in Headers */, 842D517A1B5311AD00E63D52 /* RSOPMLParser.h in Headers */,
@@ -338,15 +343,13 @@
842D51761B530BF200E63D52 /* RSParsedFeed.h in Headers */, 842D51761B530BF200E63D52 /* RSParsedFeed.h in Headers */,
84BF3E1C1C8CDD6D005562D8 /* RSHTMLMetadata.h in Headers */, 84BF3E1C1C8CDD6D005562D8 /* RSHTMLMetadata.h in Headers */,
8429D1B61C83A03100F97695 /* RSOPMLItem.h in Headers */, 8429D1B61C83A03100F97695 /* RSOPMLItem.h in Headers */,
843819001C8CB00400E2A1DD /* RSSAXHTMLParser.h in Headers */,
842D51521B52E80100E63D52 /* RSFeedParser.h in Headers */, 842D51521B52E80100E63D52 /* RSFeedParser.h in Headers */,
54C707DD21D42B710029BFF1 /* NSDictionary+RSXML.h in Headers */,
84E4BE451C8B8FE400A90B41 /* RSXMLError.h in Headers */, 84E4BE451C8B8FE400A90B41 /* RSXMLError.h in Headers */,
842D516F1B5308BD00E63D52 /* FeedParser.h in Headers */,
84F22C111B52DDEA000060CE /* RSXML.h in Headers */, 84F22C111B52DDEA000060CE /* RSXML.h in Headers */,
842D514C1B52E7FC00E63D52 /* RSAtomParser.h in Headers */, 842D514C1B52E7FC00E63D52 /* RSAtomParser.h in Headers */,
842D515A1B52E81B00E63D52 /* RSRSSParser.h in Headers */, 842D515A1B52E81B00E63D52 /* RSRSSParser.h in Headers */,
84F22C291B52DDFE000060CE /* RSSAXParser.h in Headers */, 84F22C291B52DDFE000060CE /* RSSAXParser.h in Headers */,
84AD0BFA1E11A9A700B38510 /* RSXMLInternal.h in Headers */,
); );
runOnlyForDeploymentPostprocessing = 0; runOnlyForDeploymentPostprocessing = 0;
}; };
@@ -482,7 +485,6 @@
buildActionMask = 2147483647; buildActionMask = 2147483647;
files = ( files = (
84AD0C231E11B8D400B38510 /* RSFeedParser.m in Sources */, 84AD0C231E11B8D400B38510 /* RSFeedParser.m in Sources */,
84AD0C361E11B8DD00B38510 /* RSXMLInternal.m in Sources */,
84AD0C171E11B8CA00B38510 /* RSDateParser.m in Sources */, 84AD0C171E11B8CA00B38510 /* RSDateParser.m in Sources */,
84AD0C191E11B8CF00B38510 /* RSOPMLParser.m in Sources */, 84AD0C191E11B8CF00B38510 /* RSOPMLParser.m in Sources */,
84AD0C341E11B8DA00B38510 /* RSHTMLLinkParser.m in Sources */, 84AD0C341E11B8DA00B38510 /* RSHTMLLinkParser.m in Sources */,
@@ -493,11 +495,12 @@
84AD0C2A1E11B8D400B38510 /* RSParsedFeed.m in Sources */, 84AD0C2A1E11B8D400B38510 /* RSParsedFeed.m in Sources */,
84AD0C261E11B8D400B38510 /* RSAtomParser.m in Sources */, 84AD0C261E11B8D400B38510 /* RSAtomParser.m in Sources */,
84AD0C1D1E11B8CF00B38510 /* RSOPMLItem.m in Sources */, 84AD0C1D1E11B8CF00B38510 /* RSOPMLItem.m in Sources */,
54702A9B21D407A00050A741 /* RSXMLParser.m in Sources */,
84AD0C131E11B8CA00B38510 /* RSXMLData.m in Sources */, 84AD0C131E11B8CA00B38510 /* RSXMLData.m in Sources */,
84AD0C321E11B8DA00B38510 /* RSHTMLMetadata.m in Sources */, 84AD0C321E11B8DA00B38510 /* RSHTMLMetadata.m in Sources */,
54C707DC21D42B710029BFF1 /* NSDictionary+RSXML.m in Sources */,
84AD0C111E11B8CA00B38510 /* RSSAXParser.m in Sources */, 84AD0C111E11B8CA00B38510 /* RSSAXParser.m in Sources */,
84AD0C0F1E11B8CA00B38510 /* RSXMLError.m in Sources */, 84AD0C0F1E11B8CA00B38510 /* RSXMLError.m in Sources */,
84AD0C2E1E11B8DA00B38510 /* RSSAXHTMLParser.m in Sources */,
); );
runOnlyForDeploymentPostprocessing = 0; runOnlyForDeploymentPostprocessing = 0;
}; };
@@ -516,12 +519,12 @@
8475C4091D57AB4C0076751E /* RSHTMLLinkParser.m in Sources */, 8475C4091D57AB4C0076751E /* RSHTMLLinkParser.m in Sources */,
8429D1B71C83A03100F97695 /* RSOPMLItem.m in Sources */, 8429D1B71C83A03100F97695 /* RSOPMLItem.m in Sources */,
84BF3E1D1C8CDD6D005562D8 /* RSHTMLMetadata.m in Sources */, 84BF3E1D1C8CDD6D005562D8 /* RSHTMLMetadata.m in Sources */,
54702A9A21D407A00050A741 /* RSXMLParser.m in Sources */,
842D51531B52E80100E63D52 /* RSFeedParser.m in Sources */, 842D51531B52E80100E63D52 /* RSFeedParser.m in Sources */,
84F22C2A1B52DDFE000060CE /* RSSAXParser.m in Sources */, 84F22C2A1B52DDFE000060CE /* RSSAXParser.m in Sources */,
54C707DB21D42B710029BFF1 /* NSDictionary+RSXML.m in Sources */,
8400B0F11B8C20A9004C4CFF /* RSXMLData.m in Sources */, 8400B0F11B8C20A9004C4CFF /* RSXMLData.m in Sources */,
842D51771B530BF200E63D52 /* RSParsedFeed.m in Sources */, 842D51771B530BF200E63D52 /* RSParsedFeed.m in Sources */,
843819011C8CB00400E2A1DD /* RSSAXHTMLParser.m in Sources */,
84AD0BFB1E11A9A700B38510 /* RSXMLInternal.m in Sources */,
); );
runOnlyForDeploymentPostprocessing = 0; runOnlyForDeploymentPostprocessing = 0;
}; };


@@ -1,24 +0,0 @@
//
// FeedParser.h
// RSXML
//
// Created by Brent Simmons on 7/12/15.
// Copyright © 2015 Ranchero Software, LLC. All rights reserved.
//
@import Foundation;
@class RSParsedFeed;
@class RSXMLData;
@protocol FeedParser <NSObject>
+ (BOOL)canParseFeed:(RSXMLData * _Nonnull)xmlData;
- (nonnull instancetype)initWithXMLData:(RSXMLData * _Nonnull)xmlData;
- (nullable RSParsedFeed *)parseFeed;
@end


@@ -0,0 +1,28 @@
//
// MIT License (MIT)
//
// Copyright (c) 2016 Brent Simmons
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <Foundation/Foundation.h>
@interface NSDictionary (RSXML)
- (nullable id)rsxml_objectForCaseInsensitiveKey:(NSString *)key;
@end


@@ -0,0 +1,41 @@
//
// MIT License (MIT)
//
// Copyright (c) 2016 Brent Simmons
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "NSDictionary+RSXML.h"
@implementation NSDictionary (RSXML)
- (nullable id)rsxml_objectForCaseInsensitiveKey:(NSString *)key {
id obj = self[key];
if (obj) {
return obj;
}
for (NSString *oneKey in self.allKeys) {
if ([oneKey isKindOfClass:[NSString class]] && [key caseInsensitiveCompare:oneKey] == NSOrderedSame) {
return self[oneKey];
}
}
return nil;
}
@end
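
The lookup strategy in `rsxml_objectForCaseInsensitiveKey:` above (try an exact match first, then fall back to a linear case-insensitive scan) can be sketched in plain C, which compiles unchanged inside an Objective-C file. This is only an illustration of the technique, not code from this commit: the `rsxml_pair` type and `rsxml_lookup` function are hypothetical names, and the array of pairs stands in for the dictionary.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical key/value pair standing in for one NSDictionary entry. */
typedef struct {
    const char *key;
    const char *value;
} rsxml_pair;

/* Return the value for `key`: exact match first, then a case-insensitive
 * linear scan. Returns NULL when no key matches either way. */
static const char *rsxml_lookup(const rsxml_pair *pairs, size_t count, const char *key) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(pairs[i].key, key) == 0) {
            return pairs[i].value;
        }
    }
    for (size_t i = 0; i < count; i++) {
        if (strcasecmp(pairs[i].key, key) == 0) {
            return pairs[i].value;
        }
    }
    return NULL;
}
```

As in the Objective-C method, an exact match wins even when a differently cased key appears earlier, and the fallback scan costs time linear in the number of keys.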


@@ -1,16 +1,34 @@
// //
// NSString+RSXML.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 9/25/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
@interface NSString (RSXML) @interface NSString (RSXML)
- (NSString *)rs_stringByDecodingHTMLEntities; - (NSString *)rs_stringByDecodingHTMLEntities;
- (nonnull NSString *)rsxml_md5HashString;
- (nullable NSString *)absoluteURLWithBase:(nonnull NSURL *)baseURL;
@end @end
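
`rsxml_md5HashString`, declared above, returns the MD5 digest as a hexadecimal string; the implementation in this commit does so with sixteen `%02x` format specifiers. A minimal sketch of just that hex-encoding step, written as a loop in plain C, could look as follows. The function name `rsxml_hex_encode` is hypothetical, and computing the digest itself (done with `CC_MD5` in the commit) is left out.

```c
#include <stddef.h>
#include <stdio.h>

/* Write `len` bytes as lowercase hex into `out`.
 * `out` must have room for 2 * len + 1 characters. */
static void rsxml_hex_encode(const unsigned char *bytes, size_t len, char *out) {
    for (size_t i = 0; i < len; i++) {
        /* Each byte becomes exactly two hex digits; snprintf also writes a
         * terminating NUL, which the next iteration overwrites. */
        snprintf(out + 2 * i, 3, "%02x", bytes[i]);
    }
    out[2 * len] = '\0';
}
```

For a 16-byte MD5 digest the output buffer needs 33 characters. The loop form works for any digest length, whereas the fixed format string is tied to exactly 16 bytes.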


@@ -1,23 +1,66 @@
// //
// NSString+RSXML.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 9/25/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "NSString+RSXML.h" #import "NSString+RSXML.h"
#import <CommonCrypto/CommonDigest.h>
@interface NSScanner (RSXML) @interface NSScanner (RSXML)
- (BOOL)rs_scanEntityValue:(NSString * _Nullable * _Nullable)decodedEntity; - (BOOL)rs_scanEntityValue:(NSString * _Nullable * _Nullable)decodedEntity;
@end @end
#pragma mark - NSString
@implementation NSString (RSXML) @implementation NSString (RSXML)
- (NSData *)rsxml_md5Hash {
NSData *data = [self dataUsingEncoding:NSUTF8StringEncoding];
unsigned char hash[CC_MD5_DIGEST_LENGTH];
CC_MD5(data.bytes, (CC_LONG)data.length, hash);
return [NSData dataWithBytes:(const void *)hash length:CC_MD5_DIGEST_LENGTH];
}
- (NSString *)rsxml_md5HashString {
NSData *md5Data = [self rsxml_md5Hash];
const Byte *bytes = md5Data.bytes;
return [NSString stringWithFormat:@"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x", bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], bytes[5], bytes[6], bytes[7], bytes[8], bytes[9], bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15]];
}
- (NSString *)absoluteURLWithBase:(NSURL *)baseURL {
if (baseURL && ![[self lowercaseString] hasPrefix:@"http"]) {
NSURL *resolvedURL = [NSURL URLWithString:self relativeToURL:baseURL];
if (resolvedURL.absoluteString) {
return resolvedURL.absoluteString;
}
}
return self;
}
- (NSString *)rs_stringByDecodingHTMLEntities { - (NSString *)rs_stringByDecodingHTMLEntities {
@autoreleasepool { @autoreleasepool {
@@ -106,16 +149,18 @@ static NSString *RSXMLStringWithValue(unichar value);
@end @end
#pragma mark - NSScanner
@implementation NSScanner (RSXML) @implementation NSScanner (RSXML)
- (BOOL)rs_scanEntityValue:(NSString * _Nullable * _Nullable)decodedEntity { - (BOOL)rs_scanEntityValue:(NSString * _Nullable * _Nullable)decodedEntity {
NSString *s = self.string; NSString *s = self.string;
NSUInteger initialScanLocation = self.scanLocation; NSUInteger initialScanLocation = self.scanLocation;
static NSUInteger maxEntityLength = 20; // It's probably smaller, but this is just for sanity. static NSUInteger maxEntityLength = 20; // It's probably smaller, but this is just for sanity.
while (true) { while (true) {
unichar ch = [s characterAtIndex:self.scanLocation]; unichar ch = [s characterAtIndex:self.scanLocation];
if ([NSCharacterSet.whitespaceAndNewlineCharacterSet characterIsMember:ch]) { if ([NSCharacterSet.whitespaceAndNewlineCharacterSet characterIsMember:ch]) {
break; break;
@@ -138,12 +183,15 @@ static NSString *RSXMLStringWithValue(unichar value);
break; break;
} }
} }
return NO; return NO;
} }
@end @end
#pragma mark - C Functions
static NSString *RSXMLStringWithValue(unichar value) { static NSString *RSXMLStringWithValue(unichar value) {
return [[NSString alloc] initWithFormat:@"%C", value]; return [[NSString alloc] initWithFormat:@"%C", value];


@@ -1,13 +1,32 @@
// //
// RSAtomParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 1/15/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "FeedParser.h" #import "RSFeedParser.h"
@interface RSAtomParser : NSObject <FeedParser> // <feed> <entry>
// https://validator.w3.org/feed/docs/rfc4287.html
@interface RSAtomParser : RSFeedParser
@end @end


@@ -1,601 +1,253 @@
// //
// RSAtomParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 1/15/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <libxml/xmlstring.h>
#import "RSAtomParser.h" #import "RSAtomParser.h"
#import "RSSAXParser.h"
#import "FeedParser.h"
#import "RSParsedFeed.h" #import "RSParsedFeed.h"
#import "RSParsedArticle.h" #import "RSParsedArticle.h"
#import "RSXMLData.h"
#import "NSString+RSXML.h"
#import "RSDateParser.h"
static NSString *kAlternateValue = @"alternate";
static NSString *kRelatedValue = @"related";
@interface RSAtomParser () <RSSAXParserDelegate> @interface RSAtomParser () <RSSAXParserDelegate>
@property (nonatomic, assign) BOOL endFeedFound;
@property (nonatomic) NSData *feedData; @property (nonatomic, assign) BOOL parsingXHTML;
@property (nonatomic) NSString *urlString; @property (nonatomic, assign) BOOL parsingSource;
@property (nonatomic) BOOL endFeedFound; @property (nonatomic, assign) BOOL parsingArticle;
@property (nonatomic) BOOL parsingXHTML; @property (nonatomic, assign) BOOL parsingAuthor;
@property (nonatomic) BOOL parsingSource;
@property (nonatomic) BOOL parsingArticle;
@property (nonatomic) BOOL parsingAuthor;
@property (nonatomic) NSMutableArray *attributesStack;
@property (nonatomic, readonly) NSDictionary *currentAttributes;
@property (nonatomic) NSMutableString *xhtmlString; @property (nonatomic) NSMutableString *xhtmlString;
@property (nonatomic) NSString *feedLink;
@property (nonatomic) NSString *feedTitle;
@property (nonatomic) NSString *feedSubtitle;
@property (nonatomic) NSMutableArray *articles;
@property (nonatomic) NSDate *dateParsed;
@property (nonatomic) RSSAXParser *parser;
@property (nonatomic, readonly) RSParsedArticle *currentArticle;
@property (nonatomic, readonly) NSDate *currentDate;
@end @end
@implementation RSAtomParser @implementation RSAtomParser
#pragma mark - Class Methods #pragma mark - RSXMLParserDelegate
+ (BOOL)canParseFeed:(RSXMLData *)xmlData { + (NSArray<const NSString *> *)parserRequireOrderedTags {
return @[@"<feed", @"<entry"];
// Checking for '<feed' and '<entry' within first n characters should do it.
@autoreleasepool {
NSData *feedData = xmlData.data;
NSString *s = [[NSString alloc] initWithBytesNoCopy:(void *)feedData.bytes length:feedData.length encoding:NSUTF8StringEncoding freeWhenDone:NO];
if (!s) {
s = [[NSString alloc] initWithData:feedData encoding:NSUTF8StringEncoding];
}
if (!s) {
s = [[NSString alloc] initWithData:feedData encoding:NSUnicodeStringEncoding];
}
if (!s) {
return NO;
} }
static const NSInteger numberOfCharactersToSearch = 4096; #pragma mark - Helper
NSRange rangeToSearch = NSMakeRange(0, numberOfCharactersToSearch);
if (s.length < numberOfCharactersToSearch) {
rangeToSearch.length = s.length;
}
NSRange feedRange = [s rangeOfString:@"<feed" options:NSLiteralSearch range:rangeToSearch]; - (void)setFeedOrArticleLink:(NSDictionary*)attribs {
NSRange entryRange = [s rangeOfString:@"<entry" options:NSLiteralSearch range:rangeToSearch];
if (feedRange.length < 1 || entryRange.length < 1) {
return NO;
}
if (feedRange.location > entryRange.location) { NSString *urlString = attribs[@"href"];
return NO; // Wrong order. if (urlString.length == 0) {
}
}
return YES;
}
#pragma mark - Init
- (instancetype)initWithXMLData:(RSXMLData *)xmlData {
self = [super init];
if (!self) {
return nil;
}
_feedData = xmlData.data;
_urlString = xmlData.urlString;
_parser = [[RSSAXParser alloc] initWithDelegate:self];
_attributesStack = [NSMutableArray new];
_articles = [NSMutableArray new];
return self;
}
#pragma mark - API
- (RSParsedFeed *)parseFeed {
[self parse];
RSParsedFeed *parsedFeed = [[RSParsedFeed alloc] initWithURLString:self.urlString title:self.feedTitle link:self.feedLink articles:self.articles];
parsedFeed.subtitle = self.feedSubtitle;
return parsedFeed;
}
#pragma mark - Constants
static NSString *kTypeKey = @"type";
static NSString *kXHTMLType = @"xhtml";
static NSString *kRelKey = @"rel";
static NSString *kAlternateValue = @"alternate";
static NSString *kHrefKey = @"href";
static NSString *kXMLKey = @"xml";
static NSString *kBaseKey = @"base";
static NSString *kLangKey = @"lang";
static NSString *kXMLBaseKey = @"xml:base";
static NSString *kXMLLangKey = @"xml:lang";
static NSString *kTextHTMLValue = @"text/html";
static NSString *kRelatedValue = @"related";
static NSString *kShortURLValue = @"shorturl";
static NSString *kHTMLValue = @"html";
static NSString *kEnValue = @"en";
static NSString *kTextValue = @"text";
static NSString *kSelfValue = @"self";
static const char *kID = "id";
static const NSInteger kIDLength = 3;
static const char *kTitle = "title";
static const NSInteger kTitleLength = 6;
static const char *kSubtitle = "subtitle";
static const NSInteger kSubtitleLength = 9;
static const char *kContent = "content";
static const NSInteger kContentLength = 8;
static const char *kSummary = "summary";
static const NSInteger kSummaryLength = 8;
static const char *kLink = "link";
static const NSInteger kLinkLength = 5;
static const char *kPublished = "published";
static const NSInteger kPublishedLength = 10;
static const char *kUpdated = "updated";
static const NSInteger kUpdatedLength = 8;
static const char *kAuthor = "author";
static const NSInteger kAuthorLength = 7;
static const char *kEntry = "entry";
static const NSInteger kEntryLength = 6;
static const char *kSource = "source";
static const NSInteger kSourceLength = 7;
static const char *kFeed = "feed";
static const NSInteger kFeedLength = 5;
static const char *kType = "type";
static const NSInteger kTypeLength = 5;
static const char *kRel = "rel";
static const NSInteger kRelLength = 4;
static const char *kAlternate = "alternate";
static const NSInteger kAlternateLength = 10;
static const char *kHref = "href";
static const NSInteger kHrefLength = 5;
static const char *kXML = "xml";
static const NSInteger kXMLLength = 4;
static const char *kBase = "base";
static const NSInteger kBaseLength = 5;
static const char *kLang = "lang";
static const NSInteger kLangLength = 5;
static const char *kTextHTML = "text/html";
static const NSInteger kTextHTMLLength = 10;
static const char *kRelated = "related";
static const NSInteger kRelatedLength = 8;
static const char *kShortURL = "shorturl";
static const NSInteger kShortURLLength = 9;
static const char *kHTML = "html";
static const NSInteger kHTMLLength = 5;
static const char *kEn = "en";
static const NSInteger kEnLength = 3;
static const char *kText = "text";
static const NSInteger kTextLength = 5;
static const char *kSelf = "self";
static const NSInteger kSelfLength = 5;
#pragma mark - Parsing
- (void)parse {
self.dateParsed = [NSDate date];
@autoreleasepool {
[self.parser parseData:self.feedData];
[self.parser finishParsing];
}
// Optimization: make articles do calculations on this background thread.
[self.articles makeObjectsPerformSelector:@selector(calculateArticleID)];
}
- (void)addArticle {
RSParsedArticle *article = [[RSParsedArticle alloc] initWithFeedURL:self.urlString];
article.dateParsed = self.dateParsed;
[self.articles addObject:article];
}
- (RSParsedArticle *)currentArticle {
return self.articles.lastObject;
}
- (NSDictionary *)currentAttributes {
return self.attributesStack.lastObject;
}
- (NSDate *)currentDate {
return RSDateWithBytes(self.parser.currentCharacters.bytes, self.parser.currentCharacters.length);
}
- (void)addFeedLink {
if (self.feedLink && self.feedLink.length > 0) {
return; return;
} }
NSString *related = self.currentAttributes[kRelKey]; NSString *rel = attribs[@"rel"];
if (related == kAlternateValue) { if (rel.length == 0) {
self.feedLink = self.currentAttributes[kHrefKey];
}
}
- (void)addFeedTitle {
if (self.feedTitle.length < 1) {
self.feedTitle = self.parser.currentStringWithTrimmedWhitespace;
}
}
- (void)addFeedSubtitle {
if (self.feedSubtitle.length < 1) {
self.feedSubtitle = self.parser.currentStringWithTrimmedWhitespace;
}
}
- (void)addLink {
NSString *urlString = self.currentAttributes[kHrefKey];
if (urlString.length < 1) {
return;
}
NSString *rel = self.currentAttributes[kRelKey];
if (rel.length < 1) {
rel = kAlternateValue; rel = kAlternateValue;
} }
if (rel == kAlternateValue) { if (!self.parsingArticle) { // Feed
if (!self.currentArticle.link) { if (!self.parsedFeed.link && rel == kAlternateValue) {
self.parsedFeed.link = urlString;
}
}
else if (!self.parsingSource) { // Article
if (!self.currentArticle.link && rel == kAlternateValue) {
self.currentArticle.link = urlString; self.currentArticle.link = urlString;
} }
} else if (!self.currentArticle.permalink && rel == kRelatedValue) {
else if (rel == kRelatedValue) {
if (!self.currentArticle.permalink) {
self.currentArticle.permalink = urlString; self.currentArticle.permalink = urlString;
} }
} }
} }
- (void)addContent { #pragma mark - Parse XHTML
self.currentArticle.body = [self currentStringWithHTMLEntitiesDecoded];
}
- (void)addSummary { - (void)addXHTMLTag:(const xmlChar *)localName attributes:(NSDictionary*)attribs {
self.currentArticle.abstract = [self currentStringWithHTMLEntitiesDecoded];
}
- (NSString *)currentStringWithHTMLEntitiesDecoded {
return [self.parser.currentStringWithTrimmedWhitespace rs_stringByDecodingHTMLEntities];
}
- (void)addArticleElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix {
if (prefix) {
return;
}
if (RSSAXEqualTags(localName, kID, kIDLength)) {
self.currentArticle.guid = self.parser.currentStringWithTrimmedWhitespace;
}
else if (RSSAXEqualTags(localName, kTitle, kTitleLength)) {
self.currentArticle.title = [self currentStringWithHTMLEntitiesDecoded];
}
else if (RSSAXEqualTags(localName, kContent, kContentLength)) {
[self addContent];
}
else if (RSSAXEqualTags(localName, kSummary, kSummaryLength)) {
[self addSummary];
}
else if (RSSAXEqualTags(localName, kLink, kLinkLength)) {
[self addLink];
}
else if (RSSAXEqualTags(localName, kPublished, kPublishedLength)) {
self.currentArticle.datePublished = self.currentDate;
}
else if (RSSAXEqualTags(localName, kUpdated, kUpdatedLength)) {
self.currentArticle.dateModified = self.currentDate;
}
}
- (void)addXHTMLTag:(const xmlChar *)localName {
if (!localName) { if (!localName) {
return; return;
} }
[self.xhtmlString appendString:@"<"]; [self.xhtmlString appendFormat:@"<%s", localName];
[self.xhtmlString appendString:[NSString stringWithUTF8String:(const char *)localName]];
if (self.currentAttributes.count < 1) { for (NSString *key in attribs) {
[self.xhtmlString appendString:@">"]; NSString *val = [attribs[key] stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
return; [self.xhtmlString appendFormat:@" %@=\"%@\"", key, val];
}
for (NSString *oneKey in self.currentAttributes) {
[self.xhtmlString appendString:@" "];
NSString *oneValue = self.currentAttributes[oneKey];
[self.xhtmlString appendString:oneKey];
[self.xhtmlString appendString:@"=\""];
oneValue = [oneValue stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
[self.xhtmlString appendString:oneValue];
[self.xhtmlString appendString:@"\""];
} }
[self.xhtmlString appendString:@">"]; [self.xhtmlString appendString:@">"];
} }
- (void)parseXHTMLEndElement:(const xmlChar *)localName length:(int)len {
if (len == 7) {
if (EqualBytes(localName, "content", 7)) {
if (self.parsingArticle) {
self.currentArticle.body = [self.xhtmlString copy];
}
self.parsingXHTML = NO;
}
else if (EqualBytes(localName, "summary", 7)) {
if (self.parsingArticle) {
self.currentArticle.abstract = [self.xhtmlString copy];
}
self.parsingXHTML = NO;
}
}
[self.xhtmlString appendFormat:@"</%s>", localName];
}
#pragma mark - RSSAXParserDelegate #pragma mark - RSSAXParserDelegate
- (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes { - (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes {
if (self.endFeedFound) { if (self.endFeedFound) {
return; return;
} }
NSDictionary *xmlAttributes = [self.parser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
if (!xmlAttributes) {
xmlAttributes = [NSDictionary dictionary];
}
[self.attributesStack addObject:xmlAttributes];
if (self.parsingXHTML) { if (self.parsingXHTML) {
[self addXHTMLTag:localName]; NSDictionary *attribs = [SAXParser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
[self addXHTMLTag:localName attributes:attribs];
return; return;
} }
if (RSSAXEqualTags(localName, kEntry, kEntryLength)) { int len = xmlStrlen(localName);
switch (len) {
case 4:
if (EqualBytes(localName, "link", 4)) {
NSDictionary *attribs = [SAXParser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
[self setFeedOrArticleLink:attribs];
return;
}
break;
case 5:
if (EqualBytes(localName, "entry", 5)) {
self.parsingArticle = YES; self.parsingArticle = YES;
[self addArticle]; self.currentArticle = [self.parsedFeed appendNewArticle];
return; return;
} }
break;
if (RSSAXEqualTags(localName, kAuthor, kAuthorLength)) { case 6:
if (EqualBytes(localName, "author", 6)) {
self.parsingAuthor = YES; self.parsingAuthor = YES;
return; return;
} } else if (EqualBytes(localName, "source", 6)) {
if (RSSAXEqualTags(localName, kSource, kSourceLength)) {
self.parsingSource = YES; self.parsingSource = YES;
return; return;
} }
break;
BOOL isContentTag = RSSAXEqualTags(localName, kContent, kContentLength); case 7: // uses attrib
BOOL isSummaryTag = RSSAXEqualTags(localName, kSummary, kSummaryLength); if (self.parsingArticle) {
if (self.parsingArticle && (isContentTag || isSummaryTag)) { break;
}
NSString *contentType = xmlAttributes[kTypeKey]; if (!EqualBytes(localName, "content", 7) && !EqualBytes(localName, "summary", 7)) {
if ([contentType isEqualToString:kXHTMLType]) { break;
}
NSDictionary *attribs = [SAXParser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
if ([attribs[@"type"] isEqualToString:@"xhtml"]) {
self.parsingXHTML = YES; self.parsingXHTML = YES;
self.xhtmlString = [NSMutableString stringWithString:@""]; self.xhtmlString = [NSMutableString stringWithString:@""];
return; return;
} }
break;
} }
if (!self.parsingArticle && RSSAXEqualTags(localName, kLink, kLinkLength)) { [SAXParser beginStoringCharacters];
[self addFeedLink];
return;
}
[self.parser beginStoringCharacters];
} }
- (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri { - (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri {
if (RSSAXEqualTags(localName, kFeed, kFeedLength)) {
self.endFeedFound = YES;
return;
}
if (self.endFeedFound) { if (self.endFeedFound) {
return; return;
} }
int len = xmlStrlen(localName);
if (len == 4 && EqualBytes(localName, "feed", 4)) {
self.endFeedFound = YES;
return;
}
if (self.parsingXHTML) { if (self.parsingXHTML) {
[self parseXHTMLEndElement:localName length:len];
BOOL isContentTag = RSSAXEqualTags(localName, kContent, kContentLength); return;
BOOL isSummaryTag = RSSAXEqualTags(localName, kSummary, kSummaryLength);
if (self.parsingArticle) {
if (isContentTag) {
self.currentArticle.body = [self.xhtmlString copy];
}
else if (isSummaryTag) {
self.currentArticle.abstract = [self.xhtmlString copy];
}
} }
if (isContentTag || isSummaryTag) { BOOL isArticle = (self.parsingArticle && !self.parsingSource && !prefix);
self.parsingXHTML = NO;
}
[self.xhtmlString appendString:@"</"]; switch (len) {
[self.xhtmlString appendString:[NSString stringWithUTF8String:(const char *)localName]]; case 2:
[self.xhtmlString appendString:@">"]; if (isArticle && EqualBytes(localName, "id", 2)) {
self.currentArticle.guid = SAXParser.currentStringWithTrimmedWhitespace;
} }
return;
else if (RSSAXEqualTags(localName, kAuthor, kAuthorLength)) { case 5:
self.parsingAuthor = NO; if (EqualBytes(localName, "entry", 5)) {
}
else if (RSSAXEqualTags(localName, kEntry, kEntryLength)) {
self.parsingArticle = NO; self.parsingArticle = NO;
} }
else if (isArticle && EqualBytes(localName, "title", 5)) {
else if (self.parsingArticle && !self.parsingSource) { self.currentArticle.title = [self decodeHTMLEntities:SAXParser.currentStringWithTrimmedWhitespace];
[self addArticleElement:localName prefix:prefix];
} }
else if (!self.parsingArticle && !self.parsingSource && self.parsedFeed.title.length == 0) {
else if (RSSAXEqualTags(localName, kSource, kSourceLength)) { if (EqualBytes(localName, "title", 5)) {
self.parsedFeed.title = SAXParser.currentStringWithTrimmedWhitespace;
}
}
return;
case 6:
if (EqualBytes(localName, "author", 6)) {
self.parsingAuthor = NO;
}
else if (EqualBytes(localName, "source", 6)) {
self.parsingSource = NO; self.parsingSource = NO;
} }
return;
else if (!self.parsingArticle && !self.parsingSource) { case 8:
if (RSSAXEqualTags(localName, kTitle, kTitleLength)) { if (!self.parsingArticle && !self.parsingSource && self.parsedFeed.subtitle.length == 0) {
[self addFeedTitle]; if (EqualBytes(localName, "subtitle", 8)) {
} self.parsedFeed.subtitle = SAXParser.currentStringWithTrimmedWhitespace;
else if (RSSAXEqualTags(localName, kSubtitle, kSubtitleLength)) {
[self addFeedSubtitle];
} }
} }
[self.attributesStack removeLastObject]; return;
case 7:
if (isArticle) {
if (EqualBytes(localName, "content", 7)) {
self.currentArticle.body = [self decodeHTMLEntities:SAXParser.currentStringWithTrimmedWhitespace];
} }
else if (EqualBytes(localName, "summary", 7)) {
self.currentArticle.abstract = [self decodeHTMLEntities:SAXParser.currentStringWithTrimmedWhitespace];
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForName:(const xmlChar *)name prefix:(const xmlChar *)prefix {
if (prefix && RSSAXEqualTags(prefix, kXML, kXMLLength)) {
if (RSSAXEqualTags(name, kBase, kBaseLength)) {
return kXMLBaseKey;
} }
if (RSSAXEqualTags(name, kLang, kLangLength)) { else if (EqualBytes(localName, "updated", 7)) {
return kXMLLangKey; self.currentArticle.dateModified = [self dateFromCharacters:SAXParser.currentCharacters];
} }
} }
return;
if (prefix) { case 9:
return nil; if (isArticle && EqualBytes(localName, "published", 9)) {
self.currentArticle.datePublished = [self dateFromCharacters:SAXParser.currentCharacters];
} }
return;
if (RSSAXEqualTags(name, kRel, kRelLength)) {
return kRelKey;
} }
if (RSSAXEqualTags(name, kType, kTypeLength)) {
return kTypeKey;
}
if (RSSAXEqualTags(name, kHref, kHrefLength)) {
return kHrefKey;
}
if (RSSAXEqualTags(name, kAlternate, kAlternateLength)) {
return kAlternateValue;
}
return nil;
}
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForValue:(const void *)bytes length:(NSUInteger)length {
static const NSUInteger alternateLength = kAlternateLength - 1;
static const NSUInteger textHTMLLength = kTextHTMLLength - 1;
static const NSUInteger relatedLength = kRelatedLength - 1;
static const NSUInteger shortURLLength = kShortURLLength - 1;
static const NSUInteger htmlLength = kHTMLLength - 1;
static const NSUInteger enLength = kEnLength - 1;
static const NSUInteger textLength = kTextLength - 1;
static const NSUInteger selfLength = kSelfLength - 1;
if (length == alternateLength && RSSAXEqualBytes(bytes, kAlternate, alternateLength)) {
return kAlternateValue;
}
if (length == textHTMLLength && RSSAXEqualBytes(bytes, kTextHTML, textHTMLLength)) {
return kTextHTMLValue;
}
if (length == relatedLength && RSSAXEqualBytes(bytes, kRelated, relatedLength)) {
return kRelatedValue;
}
if (length == shortURLLength && RSSAXEqualBytes(bytes, kShortURL, shortURLLength)) {
return kShortURLValue;
}
if (length == htmlLength && RSSAXEqualBytes(bytes, kHTML, htmlLength)) {
return kHTMLValue;
}
if (length == enLength && RSSAXEqualBytes(bytes, kEn, enLength)) {
return kEnValue;
}
if (length == textLength && RSSAXEqualBytes(bytes, kText, textLength)) {
return kTextValue;
}
if (length == selfLength && RSSAXEqualBytes(bytes, kSelf, selfLength)) {
return kSelfValue;
}
return nil;
} }
@@ -606,4 +258,60 @@ static const NSInteger kSelfLength = 5;
} }
} }
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForName:(const xmlChar *)name prefix:(const xmlChar *)prefix {
int len = xmlStrlen(name);
if (prefix) {
if (len == 4 && EqualBytes(prefix, "xml", 3)) { // len == 4 covers both name checks below ("base" and "lang")
if (EqualBytes(name, "base", 4)) { return @"xml:base"; }
if (EqualBytes(name, "lang", 4)) { return @"xml:lang"; }
}
return nil;
}
switch (len) {
case 3:
if (EqualBytes(name, "rel", 3)) { return @"rel"; }
break;
case 4:
if (EqualBytes(name, "type", 4)) { return @"type"; }
if (EqualBytes(name, "href", 4)) { return @"href"; }
break;
case 9:
if (EqualBytes(name, "alternate", 9)) { return kAlternateValue; }
break;
}
return nil;
}
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForValue:(const void *)bytes length:(NSUInteger)length {
switch (length) {
case 2:
if (EqualBytes(bytes, "en", 2)) { return @"en"; }
break;
case 4:
if (EqualBytes(bytes, "html", 4)) { return @"html"; }
if (EqualBytes(bytes, "text", 4)) { return @"text"; }
if (EqualBytes(bytes, "self", 4)) { return @"self"; }
break;
case 7:
if (EqualBytes(bytes, "related", 7)) { return kRelatedValue; }
break;
case 8:
if (EqualBytes(bytes, "shorturl", 8)) { return @"shorturl"; }
break;
case 9:
if (EqualBytes(bytes, "alternate", 9)) { return kAlternateValue; }
if (EqualBytes(bytes, "text/html", 9)) { return @"text/html"; }
break;
}
return nil;
}
@end @end
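
The rewritten delegate methods above share one pattern: take the tag length once, `switch` on it, and only then compare bytes. The following is a minimal sketch of that pattern in plain C (which also compiles as Objective-C). The names `equal_bytes`, `atom_tag`, and `classify_tag` are hypothetical and exist only for illustration. `EqualBytes` is not defined in this part of the diff and is merely assumed to behave like a `memcmp` wrapper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the EqualBytes helper used in the diff.
   Assumption: it is a thin wrapper around memcmp. */
static int equal_bytes(const void *a, const void *b, size_t n) {
    return memcmp(a, b, n) == 0;
}

/* Illustrative tag identifiers, not part of RSXML. */
typedef enum {
    TAG_UNKNOWN,
    TAG_ID,
    TAG_LINK,
    TAG_FEED,
    TAG_ENTRY,
    TAG_TITLE,
    TAG_AUTHOR,
    TAG_SOURCE,
    TAG_CONTENT,
    TAG_SUMMARY,
    TAG_UPDATED,
    TAG_SUBTITLE,
    TAG_PUBLISHED
} atom_tag;

/* Classify a NUL-terminated tag name: switch on the length first, then
   compare bytes only against the candidates of exactly that length. */
static atom_tag classify_tag(const char *name) {
    assert(name != NULL);
    size_t len = strlen(name);
    switch (len) {
    case 2:
        if (equal_bytes(name, "id", 2)) return TAG_ID;
        break;
    case 4:
        if (equal_bytes(name, "link", 4)) return TAG_LINK;
        if (equal_bytes(name, "feed", 4)) return TAG_FEED;
        break;
    case 5:
        if (equal_bytes(name, "entry", 5)) return TAG_ENTRY;
        if (equal_bytes(name, "title", 5)) return TAG_TITLE;
        break;
    case 6:
        if (equal_bytes(name, "author", 6)) return TAG_AUTHOR;
        if (equal_bytes(name, "source", 6)) return TAG_SOURCE;
        break;
    case 7:
        if (equal_bytes(name, "content", 7)) return TAG_CONTENT;
        if (equal_bytes(name, "summary", 7)) return TAG_SUMMARY;
        if (equal_bytes(name, "updated", 7)) return TAG_UPDATED;
        break;
    case 8:
        if (equal_bytes(name, "subtitle", 8)) return TAG_SUBTITLE;
        break;
    case 9:
        if (equal_bytes(name, "published", 9)) return TAG_PUBLISHED;
        break;
    default:
        break;
    }
    return TAG_UNKNOWN;
}
```

Switching on the length first means each tag is compared against at most a handful of same-length candidates, and no Objective-C object is allocated along the way.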

@@ -1,10 +1,25 @@
// //
// RSDateParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/25/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software, LLC. All rights reserved.
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;

@@ -1,10 +1,25 @@
// //
// RSDateParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/25/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software, LLC. All rights reserved.
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <time.h> #import <time.h>
#import "RSDateParser.h" #import "RSDateParser.h"

@@ -1,28 +1,35 @@
// //
// RSFeedParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 1/4/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "FeedParser.h" #import "RSXMLParser.h"
// If you have a feed and don't know or care what it is (RSS or Atom), @class RSParsedFeed, RSParsedArticle;
// then call RSParseFeed or RSParseFeedSync.
@class RSXMLData; @interface RSFeedParser : RSXMLParser<RSParsedFeed*>
@class RSParsedFeed; @property (nonatomic, readonly) RSParsedFeed *parsedFeed;
@property (nonatomic, weak) RSParsedArticle *currentArticle;
NS_ASSUME_NONNULL_BEGIN - (NSDate *)dateFromCharacters:(NSData *)data;
- (NSString *)decodeHTMLEntities:(NSString *)str;
BOOL RSCanParseFeed(RSXMLData *xmlData); @end
typedef void (^RSParsedFeedBlock)(RSParsedFeed * _Nullable parsedFeed, NSError * _Nullable error);
// callback is called on main queue.
void RSParseFeed(RSXMLData *xmlData, RSParsedFeedBlock callback);
RSParsedFeed * _Nullable RSParseFeedSync(RSXMLData *xmlData, NSError * _Nullable * _Nullable error);
NS_ASSUME_NONNULL_END

@@ -1,229 +1,58 @@
// //
// FeedParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 1/4/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSXMLError.h"
#import "RSFeedParser.h" #import "RSFeedParser.h"
#import "FeedParser.h" #import "RSParsedFeed.h"
#import "RSXMLData.h" #import "RSParsedArticle.h"
#import "RSRSSParser.h" #import "RSDateParser.h"
#import "RSAtomParser.h" #import "NSString+RSXML.h"
static NSArray *parserClasses(void) { @implementation RSFeedParser
static NSArray *gParserClasses = nil; #pragma mark - RSXMLParserDelegate
static dispatch_once_t onceToken; + (BOOL)isFeedParser { return YES; }
dispatch_once(&onceToken, ^{
gParserClasses = @[[RSRSSParser class], [RSAtomParser class]];
});
return gParserClasses;
}
static BOOL feedMayBeParseable(RSXMLData *xmlData) {
/*Sanity checks.*/
if (!xmlData.data) {
return NO;
}
/*TODO: check size, type, etc.*/
- (BOOL)xmlParserWillStartParsing {
_parsedFeed = [[RSParsedFeed alloc] initWithURLString:self.documentURI];
return YES; return YES;
} }
static BOOL optimisticCanParseRSSData(const char *bytes, NSUInteger numberOfBytes); - (id)xmlParserWillReturnDocument {
static BOOL optimisticCanParseAtomData(const char *bytes, NSUInteger numberOfBytes); // Optimization: make articles do calculations on this background thread.
static BOOL optimisticCanParseRDF(const char *bytes, NSUInteger numberOfBytes); [_parsedFeed.articles makeObjectsPerformSelector:@selector(calculateArticleID)];
static BOOL dataIsProbablyHTML(const char *bytes, NSUInteger numberOfBytes); return _parsedFeed;
static BOOL dataIsSomeWeirdException(const char *bytes, NSUInteger numberOfBytes);
static BOOL dataHasLeftCaret(const char *bytes, NSUInteger numberOfBytes);
static const NSUInteger maxNumberOfBytesToSearch = 4096;
static const NSUInteger minNumberOfBytesToSearch = 20;
static Class parserClassForXMLData(RSXMLData *xmlData, NSError **error) {
if (!feedMayBeParseable(xmlData)) {
RSXMLSetError(error, RSXMLErrorNoData, nil);
return nil;
} }
// TODO: check for things like images and movies and return nil. /// @return @c NSDate by parsing RFC 822 and ISO 8601 date strings.
- (NSDate *)dateFromCharacters:(NSData *)data {
const char *bytes = xmlData.data.bytes; return RSDateWithBytes(data.bytes, data.length);
NSUInteger numberOfBytes = xmlData.data.length;
if (numberOfBytes > minNumberOfBytesToSearch) {
if (numberOfBytes > maxNumberOfBytesToSearch) {
numberOfBytes = maxNumberOfBytesToSearch;
} }
if (!dataHasLeftCaret(bytes, numberOfBytes)) { /// @return @c str with HTML-encoded entities decoded.
RSXMLSetError(error, RSXMLErrorMissingLeftCaret, nil); - (NSString *)decodeHTMLEntities:(NSString *)str {
return nil; return [str rs_stringByDecodingHTMLEntities];
}
if (optimisticCanParseRSSData(bytes, numberOfBytes)) {
return [RSRSSParser class];
}
if (optimisticCanParseAtomData(bytes, numberOfBytes)) {
return [RSAtomParser class];
}
if (optimisticCanParseRDF(bytes, numberOfBytes)) {
return [RSRSSParser class]; //TODO: parse RDF feeds, using RSS parser so far ...
}
if (dataIsProbablyHTML(bytes, numberOfBytes)) {
RSXMLSetError(error, RSXMLErrorProbablyHTML, nil);
return nil;
}
if (dataIsSomeWeirdException(bytes, numberOfBytes)) {
RSXMLSetError(error, RSXMLErrorContainsXMLErrorsTag, nil);
return nil;
}
}
for (Class parserClass in parserClasses()) {
if ([parserClass canParseFeed:xmlData]) {
return parserClass;
//return [[parserClass alloc] initWithXMLData:xmlData]; // does not make sense to return instance
}
}
// Try RSS anyway? libxml would return a parsing error
RSXMLSetError(error, RSXMLErrorNoSuitableParser, nil);
return nil;
}
static id<FeedParser> parserForXMLData(RSXMLData *xmlData, NSError **error) {
Class parserClass = parserClassForXMLData(xmlData, error);
if (!parserClass) {
return nil;
}
return [[parserClass alloc] initWithXMLData:xmlData];
}
static BOOL canParseXMLData(RSXMLData *xmlData) {
return parserClassForXMLData(xmlData, nil) != nil;
}
static BOOL didFindString(const char *string, const char *bytes, NSUInteger numberOfBytes) {
char *foundString = strnstr(bytes, string, numberOfBytes);
return foundString != NULL;
}
static BOOL dataHasLeftCaret(const char *bytes, NSUInteger numberOfBytes) {
return didFindString("<", bytes, numberOfBytes);
}
static BOOL dataIsProbablyHTML(const char *bytes, NSUInteger numberOfBytes) {
// Won't catch every single case, which is fine.
if (didFindString("<html", bytes, numberOfBytes)) {
return YES;
}
if (didFindString("<body", bytes, numberOfBytes)) {
return YES;
}
if (didFindString("doctype html", bytes, numberOfBytes)) {
return YES;
}
if (didFindString("DOCTYPE html", bytes, numberOfBytes)) {
return YES;
}
if (didFindString("DOCTYPE HTML", bytes, numberOfBytes)) {
return YES;
}
if (didFindString("<meta", bytes, numberOfBytes)) {
return YES;
}
if (didFindString("<HTML", bytes, numberOfBytes)) {
return YES;
}
return NO;
}
static BOOL dataIsSomeWeirdException(const char *bytes, NSUInteger numberOfBytes) {
if (didFindString("<errors xmlns='http://schemas.google", bytes, numberOfBytes)) {
return YES;
}
return NO;
}
static BOOL optimisticCanParseRDF(const char *bytes, NSUInteger numberOfBytes) {
return didFindString("<rdf:RDF", bytes, numberOfBytes);
}
static BOOL optimisticCanParseRSSData(const char *bytes, NSUInteger numberOfBytes) {
if (!didFindString("<rss", bytes, numberOfBytes)) {
return NO;
}
return didFindString("<channel", bytes, numberOfBytes);
}
static BOOL optimisticCanParseAtomData(const char *bytes, NSUInteger numberOfBytes) {
return didFindString("<feed", bytes, numberOfBytes);
}
static void callCallback(RSParsedFeedBlock callback, RSParsedFeed *parsedFeed, NSError *error) {
dispatch_async(dispatch_get_main_queue(), ^{
@autoreleasepool {
if (callback) {
callback(parsedFeed, error);
}
}
});
}
#pragma mark - API
BOOL RSCanParseFeed(RSXMLData *xmlData) {
return canParseXMLData(xmlData);
}
void RSParseFeed(RSXMLData *xmlData, RSParsedFeedBlock callback) {
dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{
NSError *error = nil;
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error);
callCallback(callback, parsedFeed, error);
});
}
RSParsedFeed *RSParseFeedSync(RSXMLData *xmlData, NSError **error) {
xmlResetLastError();
id<FeedParser> parser = parserForXMLData(xmlData, error);
if (error && *error) {
return nil;
}
RSParsedFeed *parsedResult = [parser parseFeed];
if (error) {
*error = RSXMLMakeErrorFromLIBXMLError(xmlGetLastError());
xmlResetLastError();
}
return parsedResult;
} }
@end
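
The sniffing code removed from this file searched the first 4096 bytes for marker strings, and the Atom parser now only declares its markers through `parserRequireOrderedTags` (`<feed`, then `<entry`). How the refactored base class consumes that list is not shown in this diff, so the following is only a sketch of one way such an ordered check could work, written in plain C (which also compiles as Objective-C). `find_bytes` and `has_ordered_tags` are hypothetical names, and the 4096-byte limit is carried over from the removed code as an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bounded substring search over a buffer that need not be NUL-terminated.
   Written by hand because strnstr is a BSD extension, not standard C. */
static const char *find_bytes(const char *hay, size_t hay_len, const char *needle) {
    size_t n = strlen(needle);
    if (n == 0 || n > hay_len) {
        return NULL;
    }
    for (size_t i = 0; i + n <= hay_len; i++) {
        if (memcmp(hay + i, needle, n) == 0) {
            return hay + i;
        }
    }
    return NULL;
}

/* Returns 1 if every tag occurs within the first 4096 bytes, each one
   after the end of the previous match; returns 0 otherwise. */
static int has_ordered_tags(const char *bytes, size_t len,
                            const char *const *tags, size_t tag_count) {
    assert(bytes != NULL && tags != NULL);
    const size_t max_search = 4096; /* assumption: same window as the removed code */
    if (len > max_search) {
        len = max_search;
    }
    const char *cursor = bytes;
    size_t remaining = len;
    for (size_t t = 0; t < tag_count; t++) {
        const char *hit = find_bytes(cursor, remaining, tags[t]);
        if (hit == NULL) {
            return 0; /* tag missing, or present only before the previous one */
        }
        size_t tag_len = strlen(tags[t]);
        remaining -= (size_t)(hit - cursor) + tag_len;
        cursor = hit + tag_len;
    }
    return 1;
}
```

Unlike the removed `canParseFeed:` implementation, this sketch works on raw bytes and never builds a string object, which is in line with the low-allocation goal stated in the README.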

@@ -1,31 +1,31 @@
// //
// RSHTMLLinkParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 8/7/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
#import "RSXMLParser.h"
/*Returns all <a href="some_url">some_text</a> as RSHTMLLink object array.*/ @class RSHTMLMetadataAnchor;
@class RSXMLData;
@class RSHTMLLink;
@interface RSHTMLLinkParser : NSObject
+ (NSArray <RSHTMLLink *> *)htmlLinksWithData:(RSXMLData *)xmlData;
@end
@interface RSHTMLLink : NSObject
// Any of these, even urlString, may be nil, because HTML can be bad.
@property (nonatomic, readonly) NSString *urlString; //absolute
@property (nonatomic, readonly) NSString *text;
@property (nonatomic, readonly) NSString *title; //title attribute inside anchor tag
@interface RSHTMLLinkParser : RSXMLParser<NSArray<RSHTMLMetadataAnchor*>*>
@end @end


@@ -1,151 +1,91 @@
// //
// RSHTMLLinkParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 8/7/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <libxml/xmlstring.h>
#import "RSHTMLLinkParser.h" #import "RSHTMLLinkParser.h"
#import "RSSAXHTMLParser.h" #import "RSHTMLMetadata.h"
#import "RSSAXParser.h" #import "NSDictionary+RSXML.h"
#import "RSXMLData.h"
#import "RSXMLInternal.h"
@interface RSHTMLLinkParser()
@interface RSHTMLLinkParser() <RSSAXHTMLParserDelegate>
@property (nonatomic, readonly) NSMutableArray *links;
@property (nonatomic, readonly) RSXMLData *xmlData;
@property (nonatomic, readonly) NSMutableArray *dictionaries;
@property (nonatomic, readonly) NSURL *baseURL; @property (nonatomic, readonly) NSURL *baseURL;
@property (nonatomic) NSMutableArray<RSHTMLMetadataAnchor*> *mutableLinksList;
@property (nonatomic) NSMutableString *currentText;
@end @end
@interface RSHTMLLink()
@property (nonatomic, readwrite) NSString *urlString; //absolute
@property (nonatomic, readwrite) NSString *text;
@property (nonatomic, readwrite) NSString *title; //title attribute inside anchor tag
@end
@implementation RSHTMLLinkParser @implementation RSHTMLLinkParser
#pragma mark - RSXMLParserDelegate
#pragma mark - Class Methods + (BOOL)isHTMLParser { return YES; }
+ (NSArray *)htmlLinksWithData:(RSXMLData *)xmlData { - (BOOL)xmlParserWillStartParsing {
_baseURL = [NSURL URLWithString:self.documentURI];
_mutableLinksList = [NSMutableArray new];
return YES;
}
RSHTMLLinkParser *parser = [[self alloc] initWithXMLData:xmlData]; - (id)xmlParserWillReturnDocument {
return parser.links; return [_mutableLinksList copy];
} }
#pragma mark - Init #pragma mark - RSSAXParserDelegate
- (instancetype)initWithXMLData:(RSXMLData *)xmlData {
NSParameterAssert(xmlData.data);
NSParameterAssert(xmlData.urlString);
self = [super init];
if (!self) {
return nil;
}
_links = [NSMutableArray new];
_xmlData = xmlData;
_dictionaries = [NSMutableArray new];
_baseURL = [NSURL URLWithString:xmlData.urlString];
[self parse];
return self;
}
#pragma mark - Parse - (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName attributes:(const xmlChar **)attributes {
- (void)parse { if (EqualBytes(localName, "a", 2)) { // 2 because length is not checked
NSDictionary *attribs = [SAXParser attributesDictionaryHTML:attributes];
RSSAXHTMLParser *parser = [[RSSAXHTMLParser alloc] initWithDelegate:self]; if (!attribs || attribs.count == 0) {
[parser parseData:self.xmlData.data];
[parser finishParsing];
}
- (RSHTMLLink *)currentLink {
return self.links.lastObject;
}
static NSString *kHrefKey = @"href";
- (NSString *)urlStringFromDictionary:(NSDictionary *)d {
NSString *href = [d rsxml_objectForCaseInsensitiveKey:kHrefKey];
if (!href) {
return nil;
}
NSURL *absoluteURL = [NSURL URLWithString:href relativeToURL:self.baseURL];
return absoluteURL.absoluteString;
}
static NSString *kTitleKey = @"title";
- (NSString *)titleFromDictionary:(NSDictionary *)d {
return [d rsxml_objectForCaseInsensitiveKey:kTitleKey];
}
- (void)handleLinkAttributes:(NSDictionary *)d {
RSHTMLLink *link = self.currentLink;
link.urlString = [self urlStringFromDictionary:d];
link.title = [self titleFromDictionary:d];
}
static const char *kAnchor = "a";
static const NSInteger kAnchorLength = 2;
- (void)saxParser:(RSSAXHTMLParser *)SAXParser XMLStartElement:(const xmlChar *)localName attributes:(const xmlChar **)attributes {
if (!RSSAXEqualTags(localName, kAnchor, kAnchorLength)) {
return; return;
} }
NSString *href = [attribs rsxml_objectForCaseInsensitiveKey:@"href"];
RSHTMLLink *link = [RSHTMLLink new]; if (!href) {
[self.links addObject:link]; return;
NSDictionary *d = [SAXParser attributesDictionary:attributes];
if (!RSXMLIsEmpty(d)) {
[self handleLinkAttributes:d];
} }
RSHTMLMetadataAnchor *obj = [RSHTMLMetadataAnchor new];
[self.mutableLinksList addObject:obj];
// set link properties
obj.tooltip = [attribs rsxml_objectForCaseInsensitiveKey:@"title"];
obj.link = [[NSURL URLWithString:href relativeToURL:self.baseURL] absoluteString];
// begin storing data for link description
[SAXParser beginStoringCharacters]; [SAXParser beginStoringCharacters];
self.currentText = [NSMutableString new];
}
} }
- (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName { - (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName {
if (!RSSAXEqualTags(localName, kAnchor, kAnchorLength)) { if (self.currentText != nil) {
return; NSString *str = SAXParser.currentStringWithTrimmedWhitespace;
if (str) {
[self.currentText appendString:str];
}
if (EqualBytes(localName, "a", 2)) { // 2 because length is not checked
self.mutableLinksList.lastObject.title = self.currentText;
self.currentText = nil;
}
} }
self.currentLink.text = SAXParser.currentStringWithTrimmedWhitespace;
} }
@end @end
@implementation RSHTMLLink
@end
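The anchor check above passes a length of 2 for the one-character tag name `a`. A minimal sketch of why that works, assuming `EqualBytes` behaves like a bounded comparison such as `strncmp` (its definition is not part of this diff; `tag_equals` and `is_anchor_tag` are hypothetical names used only for illustration):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the EqualBytes helper seen in the diff:
   compares at most `len` bytes of a NUL-terminated SAX tag name
   against a string literal. */
static bool tag_equals(const unsigned char *name, const char *literal, size_t len) {
    if (name == NULL) {
        return false;
    }
    return strncmp((const char *)name, literal, len) == 0;
}

/* Passing strlen("a") + 1 pulls the NUL terminator into the comparison,
   so longer names that merely start with 'a' ("abbr", "area") are rejected
   without a separate length check. */
static bool is_anchor_tag(const unsigned char *name) {
    return tag_equals(name, "a", 2);
}
```

With a length of 1 the same helper would act as a prefix test and accept `abbr`. That is presumably why the metadata parser, which compares `body` and `link` with a length of 4, checks `xmlStrlen(localName) != 4` first.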


@@ -1,45 +1,64 @@
// //
// RSHTMLMetadata.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/6/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
@class RSHTMLMetadataFeedLink; typedef enum {
@class RSHTMLMetadataAppleTouchIcon; RSFeedTypeNone,
RSFeedTypeRSS,
RSFeedTypeAtom
} RSFeedType;
RSFeedType RSFeedTypeFromLinkTypeAttribute(NSString * typeStr);
@class RSHTMLMetadataIconLink, RSHTMLMetadataFeedLink;
@interface RSHTMLMetadata : NSObject @interface RSHTMLMetadata : NSObject
@property (nonatomic, copy, nullable) NSString *faviconLink;
- (instancetype)initWithURLString:(NSString *)urlString dictionaries:(NSArray <NSDictionary *> *)dictionaries; @property (nonatomic, nonnull) NSArray <RSHTMLMetadataIconLink *> *iconLinks;
@property (nonatomic, nonnull) NSArray <RSHTMLMetadataFeedLink *> *feedLinks;
@property (nonatomic, readonly) NSString *baseURLString;
@property (nonatomic, readonly) NSArray <NSDictionary *> *dictionaries;
@property (nonatomic, readonly) NSString *faviconLink;
@property (nonatomic, readonly) NSArray <RSHTMLMetadataAppleTouchIcon *> *appleTouchIcons;
@property (nonatomic, readonly) NSArray <RSHTMLMetadataFeedLink *> *feedLinks;
@end @end
@interface RSHTMLMetadataAppleTouchIcon : NSObject @interface RSHTMLMetadataLink : NSObject
@property (nonatomic, copy, nonnull) NSString *link; // absolute
@property (nonatomic, readonly) NSString *rel; @property (nonatomic, copy, nullable) NSString *title;
@property (nonatomic, readonly) NSString *sizes;
@property (nonatomic, readonly) NSString *urlString; // Absolute.
@end @end
@interface RSHTMLMetadataFeedLink : NSObject @interface RSHTMLMetadataIconLink : RSHTMLMetadataLink
@property (nonatomic, copy, nullable) NSString *sizes;
@property (nonatomic, readonly) NSString *title; - (CGSize)getSize;
@property (nonatomic, readonly) NSString *type;
@property (nonatomic, readonly) NSString *urlString; // Absolute.
@end @end
@interface RSHTMLMetadataFeedLink : RSHTMLMetadataLink // title: 'title' attribute of the <link> tag
@property (nonatomic, assign) RSFeedType type;
@end
@interface RSHTMLMetadataAnchor : RSHTMLMetadataLink // title: anchor text-value
@property (nonatomic, copy, nullable) NSString *tooltip;
@end
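The header declares `RSFeedTypeFromLinkTypeAttribute`, which maps the `type` attribute of a `<link>` tag to a feed type by its suffix. A rough C sketch of the same mapping, for illustration only (`feed_type`, `has_suffix_ci` and `feed_type_from_link_type` are hypothetical names, not part of the library):

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the RSFeedType enum. */
typedef enum { FEED_TYPE_NONE, FEED_TYPE_RSS, FEED_TYPE_ATOM } feed_type;

/* Case-insensitive suffix test; avoids allocating a lowercased copy. */
static int has_suffix_ci(const char *text, const char *suffix) {
    size_t text_len = strlen(text);
    size_t suffix_len = strlen(suffix);
    if (suffix_len > text_len) {
        return 0;
    }
    const char *tail = text + (text_len - suffix_len);
    for (size_t i = 0; i < suffix_len; i++) {
        if (tolower((unsigned char)tail[i]) != tolower((unsigned char)suffix[i])) {
            return 0;
        }
    }
    return 1;
}

/* Missing or empty attributes map to "none", as do unrelated MIME types. */
static feed_type feed_type_from_link_type(const char *type_attr) {
    if (type_attr == NULL || type_attr[0] == '\0') {
        return FEED_TYPE_NONE;
    }
    if (has_suffix_ci(type_attr, "/rss+xml")) {
        return FEED_TYPE_RSS;
    }
    if (has_suffix_ci(type_attr, "/atom+xml")) {
        return FEED_TYPE_ATOM;
    }
    return FEED_TYPE_NONE;
}
```

Matching on the suffix rather than the full MIME type means any media type ending in `/rss+xml` or `/atom+xml` is accepted, not just `application/...`.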


@@ -1,245 +1,98 @@
// //
// RSHTMLMetadata.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/6/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSHTMLMetadata.h" #import "RSHTMLMetadata.h"
#import "RSXMLInternal.h"
static NSString *urlStringFromDictionary(NSDictionary *d); RSFeedType RSFeedTypeFromLinkTypeAttribute(NSString * typeStr) {
static NSString *absoluteURLStringWithRelativeURLString(NSString *relativeURLString, NSString *baseURLString); if (typeStr && typeStr.length > 0) {
static NSString *absoluteURLStringWithDictionary(NSDictionary *d, NSString *baseURLString); typeStr = [typeStr lowercaseString];
static NSArray *objectsOfClassWithDictionaries(Class class, NSArray *dictionaries, NSString *baseURLString); if ([typeStr hasSuffix:@"/rss+xml"]) {
static NSString *relValue(NSDictionary *d); return RSFeedTypeRSS;
static BOOL typeIsFeedType(NSString *type); } else if ([typeStr hasSuffix:@"/atom+xml"]) {
return RSFeedTypeAtom;
}
}
return RSFeedTypeNone;
}
static NSString *kShortcutIconRelValue = @"shortcut icon";
static NSString *kHrefKey = @"href";
static NSString *kSrcKey = @"src";
static NSString *kAppleTouchIconValue = @"apple-touch-icon";
static NSString *kAppleTouchIconPrecomposedValue = @"apple-touch-icon-precomposed";
static NSString *kSizesKey = @"sizes";
static NSString *kTitleKey = @"title";
static NSString *kRelKey = @"rel";
static NSString *kAlternateKey = @"alternate";
static NSString *kRSSSuffix = @"/rss+xml";
static NSString *kAtomSuffix = @"/atom+xml";
static NSString *kTypeKey = @"type";
@interface RSHTMLMetadataAppleTouchIcon ()
- (instancetype)initWithDictionary:(NSDictionary *)d baseURLString:(NSString *)baseURLString;
@implementation RSHTMLMetadataLink
- (NSString*)description { return self.link; }
@end @end
@interface RSHTMLMetadataFeedLink () @implementation RSHTMLMetadataIconLink
- (instancetype)initWithDictionary:(NSDictionary *)d baseURLString:(NSString *)baseURLString; - (CGSize)getSize {
if (self.sizes && self.sizes.length > 0) {
@end NSArray<NSString*> *parts = [self.sizes componentsSeparatedByString:@"x"];
if (parts.count == 2) {
return CGSizeMake([parts.firstObject intValue], [parts.lastObject intValue]);
@implementation RSHTMLMetadata
#pragma mark - Init
- (instancetype)initWithURLString:(NSString *)urlString dictionaries:(NSArray <NSDictionary *> *)dictionaries {
self = [super init];
if (!self) {
return nil;
}
_baseURLString = urlString;
_dictionaries = dictionaries;
_faviconLink = [self resolvedLinkFromFirstDictionaryWithMatchingRel:kShortcutIconRelValue];
NSArray *appleTouchIconDictionaries = [self appleTouchIconDictionaries];
_appleTouchIcons = objectsOfClassWithDictionaries([RSHTMLMetadataAppleTouchIcon class], appleTouchIconDictionaries, urlString);
NSArray *feedLinkDictionaries = [self feedLinkDictionaries];
_feedLinks = objectsOfClassWithDictionaries([RSHTMLMetadataFeedLink class], feedLinkDictionaries, urlString);
return self;
}
#pragma mark - Private
- (NSDictionary *)firstDictionaryWithMatchingRel:(NSString *)valueToMatch {
// Case-insensitive.
for (NSDictionary *oneDictionary in self.dictionaries) {
NSString *oneRelValue = relValue(oneDictionary);
if (oneRelValue && [oneRelValue compare:valueToMatch options:NSCaseInsensitiveSearch] == NSOrderedSame) {
return oneDictionary;
} }
} }
return CGSizeZero;
return nil;
} }
- (NSString*)description {
- (NSArray *)appleTouchIconDictionaries { return [NSString stringWithFormat:@"%@ [%@] (%@)", self.title, self.sizes, self.link];
NSMutableArray *dictionaries = [NSMutableArray new];
for (NSDictionary *oneDictionary in self.dictionaries) {
NSString *oneRelValue = relValue(oneDictionary).lowercaseString;
if ([oneRelValue isEqualToString:kAppleTouchIconValue] || [oneRelValue isEqualToString:kAppleTouchIconPrecomposedValue]) {
[dictionaries addObject:oneDictionary];
} }
}
return dictionaries;
}
- (NSArray *)feedLinkDictionaries {
NSMutableArray *dictionaries = [NSMutableArray new];
for (NSDictionary *oneDictionary in self.dictionaries) {
NSString *oneRelValue = relValue(oneDictionary).lowercaseString;
if (![oneRelValue isEqualToString:kAlternateKey]) {
continue;
}
NSString *oneType = [oneDictionary rsxml_objectForCaseInsensitiveKey:kTypeKey];
if (!typeIsFeedType(oneType)) {
continue;
}
if (RSXMLStringIsEmpty(urlStringFromDictionary(oneDictionary))) {
continue;
}
[dictionaries addObject:oneDictionary];
}
return dictionaries;
}
- (NSString *)resolvedLinkFromFirstDictionaryWithMatchingRel:(NSString *)relValue {
NSDictionary *d = [self firstDictionaryWithMatchingRel:relValue];
return absoluteURLStringWithDictionary(d, self.baseURLString);
}
@end
static NSString *relValue(NSDictionary *d) {
return [d rsxml_objectForCaseInsensitiveKey:kRelKey];
}
static NSString *urlStringFromDictionary(NSDictionary *d) {
NSString *urlString = [d rsxml_objectForCaseInsensitiveKey:kHrefKey];
if (urlString) {
return urlString;
}
return [d rsxml_objectForCaseInsensitiveKey:kSrcKey];
}
static NSString *absoluteURLStringWithRelativeURLString(NSString *relativeURLString, NSString *baseURLString) {
NSURL *url = [NSURL URLWithString:baseURLString];
if (!url) {
return nil;
}
NSURL *absoluteURL = [NSURL URLWithString:relativeURLString relativeToURL:url];
return absoluteURL.absoluteString;
}
static NSString *absoluteURLStringWithDictionary(NSDictionary *d, NSString *baseURLString) {
NSString *urlString = urlStringFromDictionary(d);
if (RSXMLStringIsEmpty(urlString)) {
return nil;
}
return absoluteURLStringWithRelativeURLString(urlString, baseURLString);
}
static NSArray *objectsOfClassWithDictionaries(Class class, NSArray *dictionaries, NSString *baseURLString) {
NSMutableArray *objects = [NSMutableArray new];
for (NSDictionary *oneDictionary in dictionaries) {
id oneObject = [[class alloc] initWithDictionary:oneDictionary baseURLString:baseURLString];
if (oneObject) {
[objects addObject:oneObject];
}
}
return [objects copy];
}
static BOOL typeIsFeedType(NSString *type) {
type = type.lowercaseString;
return [type hasSuffix:kRSSSuffix] || [type hasSuffix:kAtomSuffix];
}
@implementation RSHTMLMetadataAppleTouchIcon
- (instancetype)initWithDictionary:(NSDictionary *)d baseURLString:(NSString *)baseURLString {
self = [super init];
if (!self) {
return nil;
}
_urlString = absoluteURLStringWithDictionary(d, baseURLString);
_sizes = [d rsxml_objectForCaseInsensitiveKey:kSizesKey];
_rel = [d rsxml_objectForCaseInsensitiveKey:kRelKey];
return self;
}
@end @end
@implementation RSHTMLMetadataFeedLink @implementation RSHTMLMetadataFeedLink
- (NSString*)description {
- (instancetype)initWithDictionary:(NSDictionary *)d baseURLString:(NSString *)baseURLString { NSString *prefix;
switch (_type) {
self = [super init]; case RSFeedTypeNone: prefix = @"None"; break;
if (!self) { case RSFeedTypeRSS: prefix = @"RSS"; break;
return nil; case RSFeedTypeAtom: prefix = @"Atom"; break;
} }
return [NSString stringWithFormat:@"[%@] %@ (%@)", prefix, self.title, self.link];
_urlString = absoluteURLStringWithDictionary(d, baseURLString);
_title = [d rsxml_objectForCaseInsensitiveKey:kTitleKey];
_type = [d rsxml_objectForCaseInsensitiveKey:kTypeKey];
return self;
} }
@end @end
@implementation RSHTMLMetadataAnchor
- (NSString*)description {
if (!_tooltip) {
return [NSString stringWithFormat:@"%@ (%@)", self.title, self.link];
}
return [NSString stringWithFormat:@"%@ [%@] (%@)", self.title, self.tooltip, self.link];
}
@end
@implementation RSHTMLMetadata
- (NSString*)description {
return [NSString stringWithFormat:@"favicon: %@\nFeed links: %@\nIcons: %@\n",
self.faviconLink, self.feedLinks, self.iconLinks];
}
@end
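The `getSize` method above splits the `sizes` attribute on a lowercase `x` and only accepts exactly two parts. A minimal C sketch of that rule, assuming the same behaviour is wanted (`icon_size` and `parse_icon_size` are hypothetical names; very large numbers are not handled specially):

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical plain-C counterpart of CGSize with integer fields. */
typedef struct {
    int width;
    int height;
} icon_size;

/* Parses "WxH". Anything without exactly one lowercase 'x'
   ("any", "16X16", "16x16 32x32", NULL) yields a zero size. */
static icon_size parse_icon_size(const char *sizes) {
    icon_size result = {0, 0};
    if (sizes == NULL) {
        return result;
    }
    const char *separator = strchr(sizes, 'x');
    if (separator == NULL || strchr(separator + 1, 'x') != NULL) {
        return result;
    }
    /* strtol stops at the first non-digit, so the width parse ends at 'x'. */
    result.width = (int)strtol(sizes, NULL, 10);
    result.height = (int)strtol(separator + 1, NULL, 10);
    return result;
}
```

HTML also permits `sizes="any"`, an uppercase `X`, and several space-separated sizes in one attribute. Both this sketch and the method in the diff report a zero size for those values, so callers that need them would have to handle them separately.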


@@ -1,28 +1,32 @@
// //
// RSHTMLMetadataParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/6/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
#import "RSXMLParser.h"
@class RSHTMLMetadata; @class RSHTMLMetadata;
@class RSXMLData;
NS_ASSUME_NONNULL_BEGIN
@interface RSHTMLMetadataParser : NSObject
+ (RSHTMLMetadata *)HTMLMetadataWithXMLData:(RSXMLData *)xmlData;
- (instancetype)initWithXMLData:(RSXMLData *)xmlData;
@property (nonatomic, readonly) RSHTMLMetadata *metadata;
@interface RSHTMLMetadataParser : RSXMLParser<RSHTMLMetadata*>
@end @end
NS_ASSUME_NONNULL_END


@@ -1,128 +1,111 @@
// //
// RSHTMLMetadataParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/6/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <libxml/xmlstring.h>
#import "RSHTMLMetadataParser.h" #import "RSHTMLMetadataParser.h"
#import "RSXMLData.h"
#import "RSHTMLMetadata.h" #import "RSHTMLMetadata.h"
#import "RSSAXHTMLParser.h" #import "NSString+RSXML.h"
#import "RSSAXParser.h" #import "NSDictionary+RSXML.h"
#import "RSXMLInternal.h"
@interface RSHTMLMetadataParser () <RSSAXHTMLParserDelegate>
@property (nonatomic, readonly) RSXMLData *xmlData;
@property (nonatomic, readwrite) RSHTMLMetadata *metadata;
@property (nonatomic) NSMutableArray *dictionaries;
@property (nonatomic) BOOL didFinishParsing;
@interface RSHTMLMetadataParser()
@property (nonatomic, readonly) NSURL *baseURL;
@property (nonatomic) NSString *faviconLink;
@property (nonatomic) NSMutableArray<RSHTMLMetadataIconLink*> *iconLinks;
@property (nonatomic) NSMutableArray<RSHTMLMetadataFeedLink*> *feedLinks;
@end @end
@implementation RSHTMLMetadataParser @implementation RSHTMLMetadataParser
#pragma mark - RSXMLParserDelegate
#pragma mark - Class Methods + (BOOL)isHTMLParser { return YES; }
+ (RSHTMLMetadata *)HTMLMetadataWithXMLData:(RSXMLData *)xmlData { - (BOOL)xmlParserWillStartParsing {
_baseURL = [NSURL URLWithString:self.documentURI];
_iconLinks = [NSMutableArray new];
_feedLinks = [NSMutableArray new];
return YES;
}
RSHTMLMetadataParser *parser = [[self alloc] initWithXMLData:xmlData]; - (id)xmlParserWillReturnDocument {
return parser.metadata; RSHTMLMetadata *metadata = [[RSHTMLMetadata alloc] init];
metadata.faviconLink = self.faviconLink;
metadata.feedLinks = [self.feedLinks copy];
metadata.iconLinks = [self.iconLinks copy];
return metadata;
} }
#pragma mark - Init #pragma mark - RSSAXParserDelegate
- (instancetype)initWithXMLData:(RSXMLData *)xmlData {
NSParameterAssert(xmlData.data);
NSParameterAssert(xmlData.urlString);
self = [super init];
if (!self) {
return nil;
}
_xmlData = xmlData;
_dictionaries = [NSMutableArray new];
[self parse];
return self;
}
#pragma mark - Parse - (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName attributes:(const xmlChar **)attributes {
- (void)parse { if (xmlStrlen(localName) != 4) {
RSSAXHTMLParser *parser = [[RSSAXHTMLParser alloc] initWithDelegate:self];
[parser parseData:self.xmlData.data];
[parser finishParsing];
self.metadata = [[RSHTMLMetadata alloc] initWithURLString:self.xmlData.urlString dictionaries:[self.dictionaries copy]];
}
static NSString *kHrefKey = @"href";
static NSString *kSrcKey = @"src";
static NSString *kRelKey = @"rel";
- (NSString *)linkForDictionary:(NSDictionary *)d {
NSString *link = [d rsxml_objectForCaseInsensitiveKey:kHrefKey];
if (link) {
return link;
}
return [d rsxml_objectForCaseInsensitiveKey:kSrcKey];
}
- (void)handleLinkAttributes:(NSDictionary *)d {
if (RSXMLStringIsEmpty([d rsxml_objectForCaseInsensitiveKey:kRelKey])) {
return; return;
} }
if (RSXMLStringIsEmpty([self linkForDictionary:d])) { else if (EqualBytes(localName, "body", 4)) {
[SAXParser cancel]; // we're only interested in head
}
else if (EqualBytes(localName, "link", 4)) {
[self parseLinkItemWithAttributes:[SAXParser attributesDictionaryHTML:attributes]];
}
}
- (void)parseLinkItemWithAttributes:(NSDictionary*)attribs {
if (!attribs || attribs.count == 0)
return;
NSString *rel = [attribs rsxml_objectForCaseInsensitiveKey:@"rel"];
if (!rel || rel.length == 0)
return;
NSString *link = [attribs rsxml_objectForCaseInsensitiveKey:@"href"];
if (!link) {
link = [attribs rsxml_objectForCaseInsensitiveKey:@"src"];
if (!link)
return; return;
} }
[self.dictionaries addObject:d]; rel = [rel lowercaseString];
if ([rel isEqualToString:@"shortcut icon"]) {
self.faviconLink = [link absoluteURLWithBase:self.baseURL];
} }
else if ([rel isEqualToString:@"icon"] || [rel hasPrefix:@"apple-touch-icon"]) { // also matching "apple-touch-icon-precomposed"
RSHTMLMetadataIconLink *icon = [RSHTMLMetadataIconLink new];
#pragma mark - RSSAXHTMLParserDelegate icon.link = [link absoluteURLWithBase:self.baseURL];
icon.title = rel;
static const char *kBody = "body"; icon.sizes = [attribs rsxml_objectForCaseInsensitiveKey:@"sizes"];
static const NSInteger kBodyLength = 5; [self.iconLinks addObject:icon];
static const char *kLink = "link";
static const NSInteger kLinkLength = 5;
- (void)saxParser:(RSSAXHTMLParser *)SAXParser XMLStartElement:(const xmlChar *)localName attributes:(const xmlChar **)attributes {
if (self.didFinishParsing) {
return;
} }
else if ([rel isEqualToString:@"alternate"]) {
if (RSSAXEqualTags(localName, kBody, kBodyLength)) { RSFeedType type = RSFeedTypeFromLinkTypeAttribute([attribs rsxml_objectForCaseInsensitiveKey:@"type"]);
self.didFinishParsing = YES; if (type != RSFeedTypeNone) {
return; RSHTMLMetadataFeedLink *feedLink = [RSHTMLMetadataFeedLink new];
feedLink.link = [link absoluteURLWithBase:self.baseURL];
feedLink.title = [attribs rsxml_objectForCaseInsensitiveKey:@"title"];
feedLink.type = type;
[self.feedLinks addObject:feedLink];
} }
if (!RSSAXEqualTags(localName, kLink, kLinkLength)) {
return;
}
NSDictionary *d = [SAXParser attributesDictionary:attributes];
if (!RSXMLIsEmpty(d)) {
[self handleLinkAttributes:d];
} }
} }


@@ -1,3 +1,25 @@
//
// MIT License (MIT)
//
// Copyright (c) 2018 Oleg Geier
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;


@@ -1,6 +1,28 @@
//
// MIT License (MIT)
//
// Copyright (c) 2018 Oleg Geier
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSOPMLItem.h" #import "RSOPMLItem.h"
#import "RSXMLInternal.h" #import "NSDictionary+RSXML.h"
NSString *OPMLTextKey = @"text"; NSString *OPMLTextKey = @"text";
@@ -63,8 +85,8 @@ NSString *OPMLXMLURLKey = @"xmlUrl";
} }
- (id)attributeForKey:(NSString *)key { - (id)attributeForKey:(NSString *)key {
if (self.attributes.count > 0 && !RSXMLStringIsEmpty(key)) { if (self.mutableAttributes.count > 0 && key && key.length > 0) {
return [self.attributes rsxml_objectForCaseInsensitiveKey:key]; return [self.mutableAttributes rsxml_objectForCaseInsensitiveKey:key];
} }
return nil; return nil;
} }


@@ -1,29 +1,35 @@
// //
// RSOPMLParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 7/12/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; #import "RSXMLParser.h"
// <opml> <outline>
// http://dev.opml.org/spec2.html#subscriptionLists
@class RSXMLData;
@class RSOPMLItem; @class RSOPMLItem;
@interface RSOPMLParser: RSXMLParser<RSOPMLItem*>
typedef void (^RSParsedOPMLBlock)(RSOPMLItem *opmlDocument, NSError *error);
void RSParseOPML(RSXMLData *xmlData, RSParsedOPMLBlock callback); //async; calls back on main thread.
@interface RSOPMLParser: NSObject
- (instancetype)initWithXMLData:(RSXMLData *)xmlData;
@property (nonatomic, readonly) RSOPMLItem *opmlDocument;
@property (nonatomic, readonly) NSError *error;
@end @end
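
Review note: the `// <opml> <outline>` comment and `parserRequireOrderedTags` in the implementation replace the removed `canParseData:` sniffing, which looked for `<opml` followed by `<outline` within the first 4096 characters. How the `RSXMLParser` base class consumes the tag list is not shown in this diff, so the C sketch below is only one plausible reading. `SNIFF_LIMIT`, `find_bytes` and `has_ordered_tags` are hypothetical names. Unlike the removed code, the comparison here is case-sensitive for every tag, and each tag is searched after the end of the previous match.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SNIFF_LIMIT 4096  /* same window size the removed canParseData: used */

/* Find needle in the first len bytes of buf; returns its offset or -1. */
static long find_bytes(const char *buf, size_t len, const char *needle) {
    size_t n = strlen(needle);
    if (n == 0 || n > len) return -1;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, needle, n) == 0) return (long)i;
    }
    return -1;
}

/* True if every tag occurs, in the given order, inside the sniff window. */
static bool has_ordered_tags(const char *buf, size_t len,
                             const char *const *tags, size_t ntags) {
    if (len > SNIFF_LIMIT) len = SNIFF_LIMIT;
    size_t start = 0;
    for (size_t t = 0; t < ntags; t++) {
        long pos = find_bytes(buf + start, len - start, tags[t]);
        if (pos < 0) return false;
        /* Continue searching after the tag that was just matched. */
        start += (size_t)pos + strlen(tags[t]);
    }
    return true;
}
```

Called with `{"<opml", "<outline"}`, this accepts a typical OPML subscription list and rejects data where `<outline` appears before `<opml`, which is the same wrong-order case the old code guarded against.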


@@ -1,172 +1,76 @@
// //
// RSOPMLParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 7/12/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSOPMLParser.h" #import "RSOPMLParser.h"
#import <libxml/xmlstring.h>
#import "RSXMLData.h"
#import "RSSAXParser.h"
#import "RSOPMLItem.h" #import "RSOPMLItem.h"
#import "RSXMLError.h"
@interface RSOPMLParser()
void RSParseOPML(RSXMLData *xmlData, RSParsedOPMLBlock callback) { @property (nonatomic, assign) BOOL parsingHead;
@property (nonatomic) RSOPMLItem *opmlDocument;
NSCParameterAssert(xmlData);
NSCParameterAssert(callback);
dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
@autoreleasepool {
RSOPMLParser *parser = [[RSOPMLParser alloc] initWithXMLData:xmlData];
RSOPMLItem *document = parser.opmlDocument;
NSError *error = parser.error;
dispatch_async(dispatch_get_main_queue(), ^{
callback(document, error);
});
}
});
}
@interface RSOPMLParser () <RSSAXParserDelegate>
@property (nonatomic, readwrite) RSOPMLItem *opmlDocument;
@property (nonatomic, readwrite) NSError *error;
@property (nonatomic) NSMutableArray<RSOPMLItem*> *itemStack; @property (nonatomic) NSMutableArray<RSOPMLItem*> *itemStack;
@end @end
@implementation RSOPMLParser @implementation RSOPMLParser
#pragma mark - RSXMLParserDelegate
#pragma mark - Init + (BOOL)isOPMLParser { return YES; }
- (instancetype)initWithXMLData:(RSXMLData *)XMLData { + (NSArray<const NSString*>*)parserRequireOrderedTags {
return @[@"<opml", @"<outline"];
self = [super init];
if (!self) {
return nil;
} }
[self parse:XMLData]; - (BOOL)xmlParserWillStartParsing {
return self;
}
#pragma mark - Private
- (void)parse:(RSXMLData *)XMLData {
@autoreleasepool {
if ([self canParseData:XMLData.data]) {
RSSAXParser *parser = [[RSSAXParser alloc] initWithDelegate:self];
self.itemStack = [NSMutableArray new];
self.opmlDocument = [RSOPMLItem new]; self.opmlDocument = [RSOPMLItem new];
[self.itemStack addObject:self.opmlDocument]; self.itemStack = [NSMutableArray arrayWithObject:self.opmlDocument];
[parser parseData:XMLData.data];
[parser finishParsing];
} else {
NSString *filename = nil;
NSURL *url = [NSURL URLWithString:XMLData.urlString];
if (url && url.isFileURL) {
filename = url.path.lastPathComponent;
}
if (!filename) {
filename = XMLData.urlString;
}
self.error = RSXMLMakeError(RSXMLErrorFileNotOPML, filename);
}
}
}
- (BOOL)canParseData:(NSData *)d {
// Check for <opml and <outline near the top.
@autoreleasepool {
NSString *s = [[NSString alloc] initWithBytesNoCopy:(void *)d.bytes length:d.length encoding:NSUTF8StringEncoding freeWhenDone:NO];
if (!s) {
NSDictionary *options = @{NSStringEncodingDetectionSuggestedEncodingsKey : @[@(NSUTF8StringEncoding)]};
(void)[NSString stringEncodingForData:d encodingOptions:options convertedString:&s usedLossyConversion:nil];
}
if (!s) {
return NO;
}
static const NSInteger numberOfCharactersToSearch = 4096;
NSRange rangeToSearch = NSMakeRange(0, numberOfCharactersToSearch);
if (s.length < numberOfCharactersToSearch) {
rangeToSearch.length = s.length;
}
NSRange opmlRange = [s rangeOfString:@"<opml" options:NSCaseInsensitiveSearch range:rangeToSearch];
if (opmlRange.location == NSNotFound) {
return NO;
}
NSRange outlineRange = [s rangeOfString:@"<outline" options:NSLiteralSearch range:rangeToSearch];
if (outlineRange.location == NSNotFound) {
return NO;
}
if (outlineRange.location < opmlRange.location) {
return NO;
}
}
return YES; return YES;
} }
- (id)xmlParserWillReturnDocument {
- (void)popItem { return self.opmlDocument;
NSAssert(self.itemStack.count > 0, nil);
/*If itemStack is empty, bad things are happening.
But we still shouldn't crash in production.*/
if (self.itemStack.count > 0) {
[self.itemStack removeLastObject];
}
} }
#pragma mark - RSSAXParserDelegate #pragma mark - RSSAXParserDelegate
static const char *kOutline = "outline";
static const char kOutlineLength = 8;
static const char *kHead = "head";
static const char kHeadLength = 5;
static BOOL isHead = NO;
- (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes { - (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes {
if (RSSAXEqualTags(localName, kOutline, kOutlineLength)) { int len = xmlStrlen(localName);
if (len == 7 && EqualBytes(localName, "outline", 7)) {
RSOPMLItem *item = [RSOPMLItem new]; RSOPMLItem *item = [RSOPMLItem new];
item.attributes = [SAXParser attributesDictionary:attributes numberOfAttributes:numberOfAttributes]; item.attributes = [SAXParser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
[self.itemStack.lastObject addChild:item]; [self.itemStack.lastObject addChild:item];
[self.itemStack addObject:item]; [self.itemStack addObject:item];
} else if (RSSAXEqualTags(localName, kHead, kHeadLength)) { }
isHead = YES; else if (len == 4 && EqualBytes(localName, "head", 4)) {
} else if (isHead) { self.parsingHead = YES;
}
else if (self.parsingHead) {
[SAXParser beginStoringCharacters]; [SAXParser beginStoringCharacters];
} }
} }
@@ -174,13 +78,17 @@ static BOOL isHead = NO;
- (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri { - (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri {
if (RSSAXEqualTags(localName, kOutline, kOutlineLength)) { int len = xmlStrlen(localName);
[self popItem];
} else if (RSSAXEqualTags(localName, kHead, kHeadLength)) { if (len == 7 && EqualBytes(localName, "outline", 7)) {
isHead = NO; [self.itemStack removeLastObject]; // safe to be called on empty array
} else if (isHead) { }
else if (len == 4 && EqualBytes(localName, "head", 4)) {
self.parsingHead = NO;
}
else if (self.parsingHead) { // handle xml tags in head as if they were attributes
NSString *key = [NSString stringWithFormat:@"%s", localName]; NSString *key = [NSString stringWithFormat:@"%s", localName];
[self.itemStack.lastObject setAttribute:[SAXParser currentString] forKey:key]; [self.itemStack.lastObject setAttribute:SAXParser.currentStringWithTrimmedWhitespace forKey:key];
} }
} }
@@ -191,24 +99,24 @@ static BOOL isHead = NO;
return nil; return nil;
} }
size_t nameLength = strlen((const char *)name); int len = xmlStrlen(name);
switch (nameLength) { switch (len) {
case 4: case 4:
if (RSSAXEqualTags(name, "text", 5)) return OPMLTextKey; if (EqualBytes(name, "text", 4)) return OPMLTextKey;
if (RSSAXEqualTags(name, "type", 5)) return OPMLTypeKey; if (EqualBytes(name, "type", 4)) return OPMLTypeKey;
break; break;
case 5: case 5:
if (RSSAXEqualTags(name, "title", 6)) return OPMLTitleKey; if (EqualBytes(name, "title", 5)) return OPMLTitleKey;
break; break;
case 6: case 6:
if (RSSAXEqualTags(name, "xmlUrl", 7)) return OPMLXMLURLKey; if (EqualBytes(name, "xmlUrl", 6)) return OPMLXMLURLKey;
break; break;
case 7: case 7:
if (RSSAXEqualTags(name, "version", 8)) return OPMLVersionKey; if (EqualBytes(name, "version", 7)) return OPMLVersionKey;
if (RSSAXEqualTags(name, "htmlUrl", 8)) return OPMLHMTLURLKey; if (EqualBytes(name, "htmlUrl", 7)) return OPMLHMTLURLKey;
break; break;
case 11: case 11:
if (RSSAXEqualTags(name, "description", 12)) return OPMLDescriptionKey; if (EqualBytes(name, "description", 11)) return OPMLDescriptionKey;
break; break;
} }
return nil; return nil;
@@ -220,11 +128,10 @@ static BOOL isHead = NO;
if (length < 1) { if (length < 1) {
return @""; return @"";
} else if (length == 3) { } else if (length == 3) {
if (RSSAXEqualBytes(bytes, "RSS", 3)) return @"RSS"; if (EqualBytes(bytes, "RSS", 3)) return @"RSS";
if (RSSAXEqualBytes(bytes, "rss", 3)) return @"rss"; if (EqualBytes(bytes, "rss", 3)) return @"rss";
} }
return nil; return nil;
} }
@end @end
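
Review note: the change from `RSSAXEqualTags(name, "text", 5)` to `EqualBytes(name, "text", 4)` pairs an explicit length check (`xmlStrlen`) with a comparison of exactly that many bytes, instead of passing a length that counts the terminating NUL. The definition of `EqualBytes` is not part of this diff; the C sketch below assumes it is a thin `memcmp` wrapper and uses `strlen` in place of `xmlStrlen`. `equal_bytes` and `opml_key_for_name` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed equivalent of EqualBytes: compare exactly n bytes. */
static bool equal_bytes(const void *a, const void *b, size_t n) {
    return memcmp(a, b, n) == 0;
}

/* Map an attribute name to a canonical key, or NULL if it is unknown.
   Switching on the length first limits each lookup to at most two
   memcmp calls and never reads past the end of the name. */
static const char *opml_key_for_name(const char *name) {
    if (name == NULL) return NULL;
    switch (strlen(name)) {
        case 4:
            if (equal_bytes(name, "text", 4)) return "text";
            if (equal_bytes(name, "type", 4)) return "type";
            break;
        case 5:
            if (equal_bytes(name, "title", 5)) return "title";
            break;
        case 6:
            if (equal_bytes(name, "xmlUrl", 6)) return "xmlUrl";
            break;
        case 7:
            if (equal_bytes(name, "version", 7)) return "version";
            if (equal_bytes(name, "htmlUrl", 7)) return "htmlUrl";
            break;
        case 11:
            if (equal_bytes(name, "description", 11)) return "description";
            break;
    }
    return NULL;
}
```

Because the length is checked before the bytes are compared, a name such as `texts` can never match `text`, and a short name is never compared against a longer literal.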


@@ -1,20 +1,37 @@
// //
// RSParsedArticle.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 12/6/14. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2014 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
@interface RSParsedArticle : NSObject @interface RSParsedArticle : NSObject
- (nonnull instancetype)initWithFeedURL:(NSString * _Nonnull)feedURL; - (nonnull instancetype)initWithFeedURL:(NSString * _Nonnull)feedURL dateParsed:(NSDate*)parsed;
@property (nonatomic, readonly, nonnull) NSString *feedURL; @property (nonatomic, readonly, nonnull) NSString *feedURL;
@property (nonatomic, nonnull) NSString *articleID; //Calculated. Don't get until other properties have been set. @property (nonatomic, readonly, nonnull) NSDate *dateParsed;
@property (nonatomic, readonly, nonnull) NSString *articleID; //Calculated. Don't get until other properties have been set.
@property (nonatomic, nullable) NSString *guid; @property (nonatomic, nullable) NSString *guid;
@property (nonatomic, nullable) NSString *title; @property (nonatomic, nullable) NSString *title;
@@ -25,7 +42,6 @@
@property (nonatomic, nullable) NSString *author; @property (nonatomic, nullable) NSString *author;
@property (nonatomic, nullable) NSDate *datePublished; @property (nonatomic, nullable) NSDate *datePublished;
@property (nonatomic, nullable) NSDate *dateModified; @property (nonatomic, nullable) NSDate *dateModified;
@property (nonatomic, nonnull) NSDate *dateParsed;
- (void)calculateArticleID; // Optimization. Call after all properties have been set. Call on a background thread. - (void)calculateArticleID; // Optimization. Call after all properties have been set. Call on a background thread.


@@ -1,101 +1,104 @@
// //
// RSParsedArticle.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 12/6/14. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2014 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSParsedArticle.h" #import "RSParsedArticle.h"
#import "RSXMLInternal.h" #import "NSString+RSXML.h"
@interface RSParsedArticle()
@property (nonatomic, copy) NSString *internalArticleID;
@end
@implementation RSParsedArticle @implementation RSParsedArticle
- (instancetype)initWithFeedURL:(NSString *)feedURL dateParsed:(NSDate*)parsed {
#pragma mark - Init
- (instancetype)initWithFeedURL:(NSString *)feedURL {
NSParameterAssert(feedURL != nil); NSParameterAssert(feedURL != nil);
self = [super init]; self = [super init];
if (!self) { if (self) {
return nil;
}
_feedURL = feedURL; _feedURL = feedURL;
_dateParsed = [NSDate date]; _dateParsed = parsed;
}
return self; return self;
} }
#pragma mark - Unique Article ID
#pragma mark - Accessors /**
Article ID will be generated on the first access.
*/
- (NSString *)articleID { - (NSString *)articleID {
if (!_internalArticleID) {
if (!_articleID) { _internalArticleID = self.calculatedUniqueID;
_articleID = self.calculatedUniqueID; }
return _internalArticleID;
} }
return _articleID; /**
Initiate calculation of article id.
*/
- (void)calculateArticleID {
(void)self.articleID;
} }
/**
@return MD5 hash of @c feedURL @c + @c guid. Or a combination of properties when guid is not set.
@note
In general, feeds should have guids. When they don't, re-runs are very likely,
because there's no other 100% reliable way to determine identity.
*/
- (NSString *)calculatedUniqueID { - (NSString *)calculatedUniqueID {
/*guid+feedID, or a combination of properties when no guid. Then hash the result. NSAssert(self.feedURL != nil, @"Feed URL should always be set!");
In general, feeds should have guids. When they don't, re-runs are very likely, NSMutableString *s = [NSMutableString stringWithString:self.feedURL];
because there's no other 100% reliable way to determine identity.*/
NSMutableString *s = [NSMutableString stringWithString:@""]; if (self.guid.length > 0) {
NSString *datePublishedTimeStampString = nil;
if (self.datePublished) {
datePublishedTimeStampString = [NSString stringWithFormat:@"%.0f", self.datePublished.timeIntervalSince1970];
}
if (!RSXMLStringIsEmpty(self.guid)) {
[s appendString:self.guid]; [s appendString:self.guid];
} }
else if (!RSXMLStringIsEmpty(self.link) && self.datePublished != nil) {
[s appendString:self.link];
[s appendString:datePublishedTimeStampString];
}
else if (!RSXMLStringIsEmpty(self.title) && self.datePublished != nil) {
[s appendString:self.title];
[s appendString:datePublishedTimeStampString];
}
else if (self.datePublished != nil) { else if (self.datePublished != nil) {
[s appendString:datePublishedTimeStampString];
}
else if (!RSXMLStringIsEmpty(self.link)) { if (self.link.length > 0) {
[s appendString:self.link]; [s appendString:self.link];
} } else if (self.title.length > 0) {
else if (!RSXMLStringIsEmpty(self.title)) {
[s appendString:self.title]; [s appendString:self.title];
} }
[s appendString:[NSString stringWithFormat:@"%.0f", self.datePublished.timeIntervalSince1970]];
else if (!RSXMLStringIsEmpty(self.body)) { }
else if (self.link.length > 0) {
[s appendString:self.link];
}
else if (self.title.length > 0) {
[s appendString:self.title];
}
else if (self.body.length > 0) {
[s appendString:self.body]; [s appendString:self.body];
} }
NSAssert(!RSXMLStringIsEmpty(self.feedURL), nil);
[s appendString:self.feedURL];
return [s rsxml_md5HashString]; return [s rsxml_md5HashString];
} }
- (void)calculateArticleID { #pragma mark - Printing
(void)self.articleID;
}
- (NSString*)description { - (NSString*)description {
return [NSString stringWithFormat:@"{%@ '%@', guid: %@}", [self class], self.title, self.guid]; return [NSString stringWithFormat:@"{%@ '%@', guid: %@}", [self class], self.title, self.guid];
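
Review note: as far as the new column of `calculatedUniqueID` can be read, the pre-hash string now starts with the feed URL (the old code appended it last), so article IDs produced by v2.0 will likely differ from those of earlier versions. The C sketch below reproduces only the string composition. The MD5 step is left out because `rsxml_md5HashString` lives in a category that is not shown here. `article_fields`, `non_empty` and `article_id_key` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical plain-C stand-in for the article properties used by the ID. */
typedef struct {
    const char *feed_url;      /* required, never NULL */
    const char *guid;          /* optional fields may be NULL or "" */
    const char *link;
    const char *title;
    const char *body;
    bool has_date_published;
    double date_published;     /* seconds since 1970 */
} article_fields;

static bool non_empty(const char *s) { return s != NULL && s[0] != '\0'; }

/* Writes the string that would be hashed into out; returns snprintf's result. */
static int article_id_key(const article_fields *a, char *out, size_t cap) {
    if (non_empty(a->guid)) {
        return snprintf(out, cap, "%s%s", a->feed_url, a->guid);
    }
    if (a->has_date_published) {
        /* link or title (first non-empty one), followed by the timestamp */
        const char *mid = non_empty(a->link) ? a->link
                        : non_empty(a->title) ? a->title : "";
        return snprintf(out, cap, "%s%s%.0f", a->feed_url, mid, a->date_published);
    }
    /* No guid and no date: fall back to link, then title, then body. */
    const char *rest = non_empty(a->link) ? a->link
                     : non_empty(a->title) ? a->title
                     : non_empty(a->body) ? a->body : "";
    return snprintf(out, cap, "%s%s", a->feed_url, rest);
}
```

The fallback order matters: an article without a guid gets a different ID if its publication date appears or disappears between fetches, which is the re-run risk the doc comment on `calculatedUniqueID` warns about.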


@@ -1,23 +1,41 @@
// //
// RSParsedFeed.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 7/12/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
@class RSParsedArticle; @class RSParsedArticle;
@interface RSParsedFeed : NSObject @interface RSParsedFeed : NSObject
- (nonnull instancetype)initWithURLString:(NSString * _Nonnull)urlString title:(NSString * _Nullable)title link:(NSString * _Nullable)link articles:(NSArray <RSParsedArticle *>* _Nonnull)articles;
@property (nonatomic, readonly, nonnull) NSString *urlString; @property (nonatomic, readonly, nonnull) NSString *urlString;
@property (nonatomic, readonly, nullable) NSString *title; @property (nonatomic, readonly, nonnull) NSDate *dateParsed;
@property (nonatomic, readonly, nullable) NSString *link;
@property (nonatomic, nullable) NSString *subtitle;
@property (nonatomic, readonly, nonnull) NSArray <RSParsedArticle *> *articles; @property (nonatomic, readonly, nonnull) NSArray <RSParsedArticle *> *articles;
@property (nonatomic, nullable) NSString *title;
@property (nonatomic, nullable) NSString *link;
@property (nonatomic, nullable) NSString *subtitle;
- (nonnull instancetype)initWithURLString:(NSString * _Nonnull)urlString;
- (RSParsedArticle *)appendNewArticle;
@end @end


@@ -1,33 +1,65 @@
// //
// RSParsedFeed.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 7/12/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSParsedFeed.h" #import "RSParsedFeed.h"
#import "RSParsedArticle.h"
@interface RSParsedFeed()
@property (nonatomic) NSMutableArray <RSParsedArticle *> *mutableArticles;
@end
@implementation RSParsedFeed @implementation RSParsedFeed
- (instancetype)initWithURLString:(NSString *)urlString title:(NSString *)title link:(NSString *)link articles:(NSArray *)articles { - (instancetype)initWithURLString:(NSString *)urlString {
self = [super init]; self = [super init];
if (!self) { if (self) {
return nil;
}
_urlString = urlString; _urlString = urlString;
_title = title; _mutableArticles = [NSMutableArray new];
_link = link; _dateParsed = [NSDate date];
_articles = articles; }
return self; return self;
} }
- (NSArray<RSParsedArticle *> *)articles {
return _mutableArticles;
}
/**
Append new @c RSParsedArticle object to @c .articles and return newly inserted instance.
*/
- (RSParsedArticle *)appendNewArticle {
RSParsedArticle *article = [[RSParsedArticle alloc] initWithFeedURL:self.urlString dateParsed:_dateParsed];
[_mutableArticles addObject:article];
return article;
}
#pragma mark - Printing
- (NSString*)description { - (NSString*)description {
return [NSString stringWithFormat:@"{%@ (%@), title: '%@', subtitle: '%@', entries: %@}", return [NSString stringWithFormat:@"{%@ (%@), title: '%@', subtitle: '%@', entries: %@}",
[self class], _link, _title, _subtitle, _articles]; [self class], _link, _title, _subtitle, _mutableArticles];
} }
@end @end


@@ -1,13 +1,32 @@
// //
// RSRSSParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 1/6/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "FeedParser.h" #import "RSFeedParser.h"
@interface RSRSSParser : NSObject <FeedParser> // <channel> <item>
// https://cyber.harvard.edu/rss/rss.html
@interface RSRSSParser : RSFeedParser
@end @end


@@ -1,351 +1,52 @@
// //
// RSRSSParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 1/6/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <libxml/xmlstring.h>
#import "RSRSSParser.h" #import "RSRSSParser.h"
#import "RSSAXParser.h"
#import "RSParsedFeed.h" #import "RSParsedFeed.h"
#import "RSParsedArticle.h" #import "RSParsedArticle.h"
#import "RSXMLData.h"
#import "RSXMLInternal.h"
#import "NSString+RSXML.h" #import "NSString+RSXML.h"
#import "RSDateParser.h" #import "NSDictionary+RSXML.h"
@interface RSRSSParser () <RSSAXParserDelegate>
@property (nonatomic) NSData *feedData;
@property (nonatomic) NSString *urlString;
@property (nonatomic) NSDictionary *currentAttributes;
@property (nonatomic) RSSAXParser *parser;
@property (nonatomic) NSMutableArray *articles;
@property (nonatomic) BOOL parsingArticle;
@property (nonatomic, readonly) RSParsedArticle *currentArticle;
@property (nonatomic) BOOL parsingChannelImage;
@property (nonatomic, readonly) NSDate *currentDate;
@property (nonatomic) BOOL endRSSFound;
@property (nonatomic) NSString *feedLink;
@property (nonatomic) NSString *feedTitle;
@property (nonatomic) NSString *feedSubtitle;
@property (nonatomic) NSDate *dateParsed;
@end
@implementation RSRSSParser
#pragma mark - Class Methods
+ (BOOL)canParseFeed:(RSXMLData *)xmlData {
// Checking for '<rss' and '<channel>' within first n characters should do it.
// TODO: handle RSS 1.0
@autoreleasepool {
NSData *feedData = xmlData.data;
NSString *s = [[NSString alloc] initWithBytesNoCopy:(void *)feedData.bytes length:feedData.length encoding:NSUTF8StringEncoding freeWhenDone:NO];
if (!s) {
s = [[NSString alloc] initWithData:feedData encoding:NSUTF8StringEncoding];
}
if (!s) {
s = [[NSString alloc] initWithData:feedData encoding:NSUnicodeStringEncoding];
}
if (!s) {
return NO;
}
static const NSInteger numberOfCharactersToSearch = 4096;
NSRange rangeToSearch = NSMakeRange(0, numberOfCharactersToSearch);
if (s.length < numberOfCharactersToSearch) {
rangeToSearch.length = s.length;
}
NSRange rssRange = [s rangeOfString:@"<rss" options:NSLiteralSearch range:rangeToSearch];
NSRange channelRange = [s rangeOfString:@"<channel>" options:NSLiteralSearch range:rangeToSearch];
if (rssRange.length < 1 || channelRange.length < 1) {
return NO;
}
if (rssRange.location > channelRange.location) {
return NO; // Wrong order.
}
}
return YES;
}
#pragma mark - Init
- (instancetype)initWithXMLData:(RSXMLData *)xmlData {
self = [super init];
if (!self) {
return nil;
}
_feedData = xmlData.data;
_urlString = xmlData.urlString;
_parser = [[RSSAXParser alloc] initWithDelegate:self];
_articles = [NSMutableArray new];
return self;
}
#pragma mark - API
- (RSParsedFeed *)parseFeed {
[self parse];
RSParsedFeed *parsedFeed = [[RSParsedFeed alloc] initWithURLString:self.urlString title:self.feedTitle link:self.feedLink articles:self.articles];
parsedFeed.subtitle = self.feedSubtitle;
return parsedFeed;
}
#pragma mark - Constants
static NSString *kIsPermaLinkKey = @"isPermaLink";
static NSString *kURLKey = @"url";
static NSString *kLengthKey = @"length";
static NSString *kTypeKey = @"type";
static NSString *kFalseValue = @"false";
static NSString *kTrueValue = @"true";
static NSString *kContentEncodedKey = @"content:encoded";
static NSString *kDCDateKey = @"dc:date";
static NSString *kDCCreatorKey = @"dc:creator";
static NSString *kRDFAboutKey = @"rdf:about"; static NSString *kRDFAboutKey = @"rdf:about";
static const char *kItem = "item"; @interface RSRSSParser () <RSSAXParserDelegate>
static const NSInteger kItemLength = 5; @property (nonatomic) BOOL parsingArticle;
@property (nonatomic) BOOL parsingChannelImage;
@property (nonatomic) BOOL guidIsPermalink;
@property (nonatomic) BOOL endRSSFound;
@property (nonatomic) NSURL *baseURL;
@end
static const char *kImage = "image"; // TODO: handle RSS 1.0
static const NSInteger kImageLength = 6; @implementation RSRSSParser
static const char *kLink = "link"; #pragma mark - RSXMLParserDelegate
static const NSInteger kLinkLength = 5;
static const char *kTitle = "title"; + (NSArray<const NSString *> *)parserRequireOrderedTags {
static const NSInteger kTitleLength = 6; return @[@"<rss", @"<channel>"];
static const char *kDC = "dc";
static const NSInteger kDCLength = 3;
static const char *kCreator = "creator";
static const NSInteger kCreatorLength = 8;
static const char *kDate = "date";
static const NSInteger kDateLength = 5;
static const char *kContent = "content";
static const NSInteger kContentLength = 8;
static const char *kEncoded = "encoded";
static const NSInteger kEncodedLength = 8;
static const char *kGuid = "guid";
static const NSInteger kGuidLength = 5;
static const char *kPubDate = "pubDate";
static const NSInteger kPubDateLength = 8;
static const char *kAuthor = "author";
static const NSInteger kAuthorLength = 7;
static const char *kDescription = "description";
static const NSInteger kDescriptionLength = 12;
static const char *kRSS = "rss";
static const NSInteger kRSSLength = 4;
static const char *kURL = "url";
static const NSInteger kURLLength = 4;
static const char *kLength = "length";
static const NSInteger kLengthLength = 7;
static const char *kType = "type";
static const NSInteger kTypeLength = 5;
static const char *kIsPermaLink = "isPermaLink";
static const NSInteger kIsPermaLinkLength = 12;
static const char *kRDF = "rdf";
static const NSInteger kRDFlength = 4;
static const char *kAbout = "about";
static const NSInteger kAboutLength = 6;
static const char *kFalse = "false";
static const NSInteger kFalseLength = 6;
static const char *kTrue = "true";
static const NSInteger kTrueLength = 5;
#pragma mark - Parsing
- (void)parse {
self.dateParsed = [NSDate date];
@autoreleasepool {
[self.parser parseData:self.feedData];
[self.parser finishParsing];
} }
// Optimization: make articles do calculations on this background thread.
[self.articles makeObjectsPerformSelector:@selector(calculateArticleID)];
}
- (void)addArticle {
RSParsedArticle *article = [[RSParsedArticle alloc] initWithFeedURL:self.urlString];
article.dateParsed = self.dateParsed;
[self.articles addObject:article];
}
- (RSParsedArticle *)currentArticle {
return self.articles.lastObject;
}
- (void)addFeedElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix {
if (prefix != NULL) {
return;
}
if (RSSAXEqualTags(localName, kLink, kLinkLength)) {
if (!self.feedLink) {
self.feedLink = self.parser.currentStringWithTrimmedWhitespace;
}
}
else if (RSSAXEqualTags(localName, kTitle, kTitleLength)) {
self.feedTitle = self.parser.currentStringWithTrimmedWhitespace;
}
else if (RSSAXEqualTags(localName, kDescription, kDescriptionLength)) {
self.feedSubtitle = self.parser.currentStringWithTrimmedWhitespace;
}
}
- (void)addDCElement:(const xmlChar *)localName {
if (RSSAXEqualTags(localName, kCreator, kCreatorLength)) {
self.currentArticle.author = self.parser.currentStringWithTrimmedWhitespace;
}
else if (RSSAXEqualTags(localName, kDate, kDateLength)) {
self.currentArticle.datePublished = self.currentDate;
}
}
- (void)addGuid {
self.currentArticle.guid = self.parser.currentStringWithTrimmedWhitespace;
NSString *isPermaLinkValue = [self.currentAttributes rsxml_objectForCaseInsensitiveKey:@"ispermalink"];
if (!isPermaLinkValue || ![isPermaLinkValue isEqualToString:@"false"]) {
self.currentArticle.permalink = [self urlString:self.currentArticle.guid];
}
}
- (NSString *)urlString:(NSString *)s {
/*Resolve against home page URL (if available) or feed URL.*/
if ([[s lowercaseString] hasPrefix:@"http"]) {
return s;
}
if (!self.feedLink) {
//TODO: get feed URL and use that to resolve URL.*/
return s;
}
NSURL *baseURL = [NSURL URLWithString:self.feedLink];
if (!baseURL) {
return s;
}
NSURL *resolvedURL = [NSURL URLWithString:s relativeToURL:baseURL];
if (resolvedURL.absoluteString) {
return resolvedURL.absoluteString;
}
return s;
}
- (NSString *)currentStringWithHTMLEntitiesDecoded {
return [self.parser.currentStringWithTrimmedWhitespace rs_stringByDecodingHTMLEntities];
}
- (void)addArticleElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix {
if (RSSAXEqualTags(prefix, kDC, kDCLength)) {
[self addDCElement:localName];
return;
}
if (RSSAXEqualTags(prefix, kContent, kContentLength) && RSSAXEqualTags(localName, kEncoded, kEncodedLength)) {
self.currentArticle.body = [self currentStringWithHTMLEntitiesDecoded];
return;
}
if (prefix != NULL) {
return;
}
if (RSSAXEqualTags(localName, kGuid, kGuidLength)) {
[self addGuid];
}
else if (RSSAXEqualTags(localName, kPubDate, kPubDateLength)) {
self.currentArticle.datePublished = self.currentDate;
}
else if (RSSAXEqualTags(localName, kAuthor, kAuthorLength)) {
self.currentArticle.author = self.parser.currentStringWithTrimmedWhitespace;
}
else if (RSSAXEqualTags(localName, kLink, kLinkLength)) {
self.currentArticle.link = [self urlString:self.parser.currentStringWithTrimmedWhitespace];
}
else if (RSSAXEqualTags(localName, kDescription, kDescriptionLength)) {
self.currentArticle.abstract = [self currentStringWithHTMLEntitiesDecoded];
}
else if (RSSAXEqualTags(localName, kTitle, kTitleLength)) {
self.currentArticle.title = [self currentStringWithHTMLEntitiesDecoded];
}
}
- (NSDate *)currentDate {
return RSDateWithBytes(self.parser.currentCharacters.bytes, self.parser.currentCharacters.length);
}
#pragma mark - RSSAXParserDelegate #pragma mark - RSSAXParserDelegate
- (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes { - (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes {
@@ -354,31 +55,61 @@ static const NSInteger kTrueLength = 5;
return; return;
} }
NSDictionary *xmlAttributes = nil; int len = xmlStrlen(localName);
if (RSSAXEqualTags(localName, kItem, kItemLength) || RSSAXEqualTags(localName, kGuid, kGuidLength)) {
xmlAttributes = [self.parser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
}
if (self.currentAttributes != xmlAttributes) {
self.currentAttributes = xmlAttributes;
}
if (!prefix && RSSAXEqualTags(localName, kItem, kItemLength)) { if (prefix != NULL) {
if (!self.parsingArticle || self.parsingChannelImage) {
[self addArticle]; return;
}
if (len != 4 && len != 7) {
return;
}
int prefLen = xmlStrlen(prefix);
if (prefLen == 2 && EqualBytes(prefix, "dc", 2)) {
if (EqualBytes(localName, "date", 4) || EqualBytes(localName, "creator", 7)) {
[SAXParser beginStoringCharacters];
}
}
else if (len == 7 && prefLen == 7 && EqualBytes(prefix, "content", 7) && EqualBytes(localName, "encoded", 7)) {
[SAXParser beginStoringCharacters];
}
return;
}
// else: localname without prefix
switch (len) {
case 4:
if (EqualBytes(localName, "item", 4)) {
self.parsingArticle = YES; self.parsingArticle = YES;
self.currentArticle = [self.parsedFeed appendNewArticle];
if (xmlAttributes && xmlAttributes[kRDFAboutKey]) { /*RSS 1.0 guid*/ NSDictionary *attribs = [SAXParser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
self.currentArticle.guid = xmlAttributes[kRDFAboutKey]; if (attribs) {
self.currentArticle.permalink = self.currentArticle.guid; NSString *about = attribs[kRDFAboutKey]; // RSS 1.0 guid
if (about) {
self.currentArticle.guid = about;
self.currentArticle.permalink = about;
} }
} }
}
else if (!prefix && RSSAXEqualTags(localName, kImage, kImageLength)) { else if (EqualBytes(localName, "guid", 4)) {
NSDictionary *attribs = [SAXParser attributesDictionary:attributes numberOfAttributes:numberOfAttributes];
NSString *isPermaLinkValue = [attribs rsxml_objectForCaseInsensitiveKey:@"isPermaLink"];
if (!isPermaLinkValue || ![isPermaLinkValue isEqualToString:@"false"]) {
self.guidIsPermalink = YES;
} else {
self.guidIsPermalink = NO;
}
}
break;
case 5:
if (EqualBytes(localName, "image", 5)) {
self.parsingChannelImage = YES; self.parsingChannelImage = YES;
} }
break;
}
if (!self.parsingChannelImage) { if (self.parsingArticle || !self.parsingChannelImage) {
[self.parser beginStoringCharacters]; [SAXParser beginStoringCharacters];
} }
} }
@@ -389,76 +120,131 @@ static const NSInteger kTrueLength = 5;
return; return;
} }
if (RSSAXEqualTags(localName, kRSS, kRSSLength)) { int len = xmlStrlen(localName);
self.endRSSFound = YES;
}
else if (RSSAXEqualTags(localName, kImage, kImageLength)) { // Meta parsing
self.parsingChannelImage = NO; if (len == 3 && EqualBytes(localName, "rss", 3)) { self.endRSSFound = YES; }
else if (len == 4 && EqualBytes(localName, "item", 4)) { self.parsingArticle = NO; }
else if (len == 5 && EqualBytes(localName, "image", 5)) { self.parsingChannelImage = NO; }
// Always exit if prefix is set
else if (prefix != NULL)
{
if (!self.parsingArticle) {
// Feed parsing
return;
} }
int prefLen = xmlStrlen(prefix);
else if (RSSAXEqualTags(localName, kItem, kItemLength)) { // Article parsing
self.parsingArticle = NO; switch (len) {
case 4:
if (prefLen == 2 && EqualBytes(prefix, "dc", 2) && EqualBytes(localName, "date", 4))
self.currentArticle.datePublished = [self dateFromCharacters:SAXParser.currentCharacters];
return;
case 7:
if (prefLen == 2 && EqualBytes(prefix, "dc", 2) && EqualBytes(localName, "creator", 7)) {
self.currentArticle.author = SAXParser.currentStringWithTrimmedWhitespace;
} }
else if (prefLen == 7 && EqualBytes(prefix, "content", 7) && EqualBytes(localName, "encoded", 7)) {
else if (self.parsingArticle) { self.currentArticle.body = [self decodeHTMLEntities:SAXParser.currentStringWithTrimmedWhitespace];
[self addArticleElement:localName prefix:prefix]; }
return;
}
}
// Article parsing
else if (self.parsingArticle)
{
switch (len) {
case 4:
if (EqualBytes(localName, "link", 4)) {
self.currentArticle.link = [SAXParser.currentStringWithTrimmedWhitespace absoluteURLWithBase:self.baseURL];
}
else if (EqualBytes(localName, "guid", 4)) {
self.currentArticle.guid = SAXParser.currentStringWithTrimmedWhitespace;
if (self.guidIsPermalink) {
self.currentArticle.permalink = [self.currentArticle.guid absoluteURLWithBase:self.baseURL];
}
}
return;
case 5:
if (EqualBytes(localName, "title", 5))
self.currentArticle.title = [self decodeHTMLEntities:SAXParser.currentStringWithTrimmedWhitespace];
return;
case 6:
if (EqualBytes(localName, "author", 6))
self.currentArticle.author = SAXParser.currentStringWithTrimmedWhitespace;
return;
case 7:
if (EqualBytes(localName, "pubDate", 7))
self.currentArticle.datePublished = [self dateFromCharacters:SAXParser.currentCharacters];
return;
case 11:
if (EqualBytes(localName, "description", 11))
self.currentArticle.abstract = [self decodeHTMLEntities:SAXParser.currentStringWithTrimmedWhitespace];
return;
}
}
// Feed parsing
else if (!self.parsingChannelImage)
{
switch (len) {
case 4:
if (EqualBytes(localName, "link", 4)) {
self.parsedFeed.link = SAXParser.currentStringWithTrimmedWhitespace;
self.baseURL = [NSURL URLWithString:self.parsedFeed.link];
}
return;
case 5:
if (EqualBytes(localName, "title", 5))
self.parsedFeed.title = SAXParser.currentStringWithTrimmedWhitespace;
return;
case 11:
if (EqualBytes(localName, "description", 11))
self.parsedFeed.subtitle = SAXParser.currentStringWithTrimmedWhitespace;
return;
} }
else if (!self.parsingChannelImage) {
[self addFeedElement:localName prefix:prefix];
} }
} }
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForName:(const xmlChar *)name prefix:(const xmlChar *)prefix { - (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForName:(const xmlChar *)name prefix:(const xmlChar *)prefix {
if (RSSAXEqualTags(prefix, kRDF, kRDFlength)) { int len = xmlStrlen(name);
if (RSSAXEqualTags(name, kAbout, kAboutLength)) {
return kRDFAboutKey;
}
return nil;
}
if (prefix) { if (prefix) {
		if (len == 5 && EqualBytes(prefix, "rdf", 4) && EqualBytes(name, "about", 5)) { // compare 4 bytes so the NUL terminator is included; the prefix length is not checked separately
return kRDFAboutKey;
}
return nil; return nil;
} }
if (RSSAXEqualTags(name, kIsPermaLink, kIsPermaLinkLength)) { switch (len) {
return kIsPermaLinkKey; case 3:
if (EqualBytes(name, "url", 3)) { return @"url"; }
break;
case 4:
if (EqualBytes(name, "type", 4)) { return @"type"; }
break;
case 6:
if (EqualBytes(name, "length", 6)) { return @"length"; }
break;
case 11:
if (EqualBytes(name, "isPermaLink", 11)) { return @"isPermaLink"; }
break;
} }
if (RSSAXEqualTags(name, kURL, kURLLength)) {
return kURLKey;
}
if (RSSAXEqualTags(name, kLength, kLengthLength)) {
return kLengthKey;
}
if (RSSAXEqualTags(name, kType, kTypeLength)) {
return kTypeKey;
}
return nil; return nil;
} }
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForValue:(const void *)bytes length:(NSUInteger)length { - (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForValue:(const void *)bytes length:(NSUInteger)length {
static const NSUInteger falseLength = kFalseLength - 1; switch (length) {
static const NSUInteger trueLength = kTrueLength - 1; case 4:
if (EqualBytes(bytes, "true", 4)) { return @"true"; }
if (length == falseLength && RSSAXEqualBytes(bytes, kFalse, falseLength)) { break;
return kFalseValue; case 5:
if (EqualBytes(bytes, "false", 5)) { return @"false"; }
break;
} }
if (length == trueLength && RSSAXEqualBytes(bytes, kTrue, trueLength)) {
return kTrueValue;
}
return nil; return nil;
} }
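The rewritten delegate methods above replace the `RSSAXEqualTags` calls with a switch on the tag length followed by a byte comparison. A minimal sketch of that pattern in plain C follows. It assumes `EqualBytes` is a thin wrapper around `memcmp` (its definition is not part of this diff); `equal_bytes`, `classify_tag`, and the `Tag` enum are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical tag identifiers, for illustration only. */
typedef enum {
    TagUnknown, TagItem, TagLink, TagGuid, TagTitle, TagImage, TagDescription
} Tag;

/* Assumed stand-in for EqualBytes: a plain byte-wise comparison. */
static bool equal_bytes(const void *a, const void *b, size_t n) {
    return memcmp(a, b, n) == 0;
}

/* Classify an unprefixed element name. The length is computed once and
   used to pick the few candidates that can match at all, so most names
   are rejected without any byte comparison. */
static Tag classify_tag(const char *name) {
    size_t len = strlen(name);
    switch (len) {
    case 4:
        if (equal_bytes(name, "item", 4)) return TagItem;
        if (equal_bytes(name, "link", 4)) return TagLink;
        if (equal_bytes(name, "guid", 4)) return TagGuid;
        break;
    case 5:
        if (equal_bytes(name, "title", 5)) return TagTitle;
        if (equal_bytes(name, "image", 5)) return TagImage;
        break;
    case 11:
        if (equal_bytes(name, "description", 11)) return TagDescription;
        break;
    }
    return TagUnknown;
}
```

Because the length is checked before any bytes are compared, each comparison reads exactly `len` bytes of a string known to be that long, and no Objective-C objects are allocated along the way.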


@@ -1,49 +0,0 @@
//
// RSSAXHTMLParser.h
// RSXML
//
// Created by Brent Simmons on 3/6/16.
// Copyright © 2016 Ranchero Software, LLC. All rights reserved.
//
@import Foundation;
@class RSSAXHTMLParser;
@protocol RSSAXHTMLParserDelegate <NSObject>
@optional
- (void)saxParser:(RSSAXHTMLParser *)SAXParser XMLStartElement:(const unsigned char *)localName attributes:(const unsigned char **)attributes;
- (void)saxParser:(RSSAXHTMLParser *)SAXParser XMLEndElement:(const unsigned char *)localName;
- (void)saxParser:(RSSAXHTMLParser *)SAXParser XMLCharactersFound:(const unsigned char *)characters length:(NSUInteger)length;
- (void)saxParserDidReachEndOfDocument:(RSSAXHTMLParser *)SAXParser; // If canceled, may not get called (but might).
@end
@interface RSSAXHTMLParser : NSObject
- (instancetype)initWithDelegate:(id<RSSAXHTMLParserDelegate>)delegate;
- (void)parseData:(NSData *)data;
- (void)parseBytes:(const void *)bytes numberOfBytes:(NSUInteger)numberOfBytes;
- (void)finishParsing;
- (void)cancel;
@property (nonatomic, strong, readonly) NSData *currentCharacters; // nil if not storing characters. UTF-8 encoded.
@property (nonatomic, strong, readonly) NSString *currentString; // Convenience to get string version of currentCharacters.
@property (nonatomic, strong, readonly) NSString *currentStringWithTrimmedWhitespace;
- (void)beginStoringCharacters; // Delegate can call from XMLStartElement. Characters will be available in XMLEndElement as currentCharacters property. Storing characters is stopped after each XMLEndElement.
// Delegate can call from within XMLStartElement.
- (NSDictionary *)attributesDictionary:(const unsigned char **)attributes;
@end


@@ -1,315 +0,0 @@
//
// RSSAXHTMLParser.m
// RSXML
//
// Created by Brent Simmons on 3/6/16.
// Copyright © 2016 Ranchero Software, LLC. All rights reserved.
//
#import "RSSAXHTMLParser.h"
#import "RSSAXParser.h"
#import <libxml/tree.h>
#import <libxml/xmlstring.h>
#import <libxml/HTMLparser.h>
#import "RSXMLInternal.h"
@interface RSSAXHTMLParser ()
@property (nonatomic) id<RSSAXHTMLParserDelegate> delegate;
@property (nonatomic, assign) htmlParserCtxtPtr context;
@property (nonatomic, assign) BOOL storingCharacters;
@property (nonatomic) NSMutableData *characters;
@property (nonatomic) BOOL delegateRespondsToStartElementMethod;
@property (nonatomic) BOOL delegateRespondsToEndElementMethod;
@property (nonatomic) BOOL delegateRespondsToCharactersFoundMethod;
@property (nonatomic) BOOL delegateRespondsToEndOfDocumentMethod;
@end
@implementation RSSAXHTMLParser
+ (void)initialize {
RSSAXInitLibXMLParser();
}
#pragma mark - Init
- (instancetype)initWithDelegate:(id<RSSAXHTMLParserDelegate>)delegate {
self = [super init];
if (self == nil)
return nil;
_delegate = delegate;
if ([_delegate respondsToSelector:@selector(saxParser:XMLStartElement:attributes:)]) {
_delegateRespondsToStartElementMethod = YES;
}
if ([_delegate respondsToSelector:@selector(saxParser:XMLEndElement:)]) {
_delegateRespondsToEndElementMethod = YES;
}
if ([_delegate respondsToSelector:@selector(saxParser:XMLCharactersFound:length:)]) {
_delegateRespondsToCharactersFoundMethod = YES;
}
if ([_delegate respondsToSelector:@selector(saxParserDidReachEndOfDocument:)]) {
_delegateRespondsToEndOfDocumentMethod = YES;
}
return self;
}
#pragma mark - Dealloc
- (void)dealloc {
if (_context != nil) {
htmlFreeParserCtxt(_context);
_context = nil;
}
_delegate = nil;
}
#pragma mark - API
static xmlSAXHandler saxHandlerStruct;
- (void)parseData:(NSData *)data {
[self parseBytes:data.bytes numberOfBytes:data.length];
}
- (void)parseBytes:(const void *)bytes numberOfBytes:(NSUInteger)numberOfBytes {
if (self.context == nil) {
xmlCharEncoding characterEncoding = xmlDetectCharEncoding(bytes, (int)numberOfBytes);
self.context = htmlCreatePushParserCtxt(&saxHandlerStruct, (__bridge void *)self, nil, 0, nil, characterEncoding);
htmlCtxtUseOptions(self.context, XML_PARSE_RECOVER | XML_PARSE_NONET | HTML_PARSE_COMPACT);
}
@autoreleasepool {
htmlParseChunk(self.context, (const char *)bytes, (int)numberOfBytes, 0);
}
}
- (void)finishParsing {
NSAssert(self.context != nil, nil);
if (self.context == nil)
return;
@autoreleasepool {
htmlParseChunk(self.context, nil, 0, 1);
htmlFreeParserCtxt(self.context);
self.context = nil;
self.characters = nil;
}
}
- (void)cancel {
@autoreleasepool {
xmlStopParser(self.context);
}
}
- (void)beginStoringCharacters {
self.storingCharacters = YES;
self.characters = [NSMutableData new];
}
- (void)endStoringCharacters {
self.storingCharacters = NO;
self.characters = nil;
}
- (NSData *)currentCharacters {
if (!self.storingCharacters) {
return nil;
}
return self.characters;
}
- (NSString *)currentString {
NSData *d = self.currentCharacters;
if (RSXMLIsEmpty(d)) {
return nil;
}
return [[NSString alloc] initWithData:d encoding:NSUTF8StringEncoding];
}
- (NSString *)currentStringWithTrimmedWhitespace {
return [self.currentString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
#pragma mark - Attributes Dictionary
- (NSDictionary *)attributesDictionary:(const xmlChar **)attributes {
if (!attributes) {
return nil;
}
NSMutableDictionary *d = [NSMutableDictionary new];
NSInteger ix = 0;
NSString *currentKey = nil;
while (true) {
const xmlChar *oneAttribute = attributes[ix];
ix++;
if (!currentKey && !oneAttribute) {
break;
}
if (!currentKey) {
currentKey = [NSString stringWithUTF8String:(const char *)oneAttribute];
}
else {
NSString *value = nil;
if (oneAttribute) {
value = [NSString stringWithUTF8String:(const char *)oneAttribute];
}
d[currentKey] = value ? value : @"";
currentKey = nil;
}
}
return [d copy];
}
#pragma mark - Callbacks
- (void)xmlEndDocument {
@autoreleasepool {
if (self.delegateRespondsToEndOfDocumentMethod) {
[self.delegate saxParserDidReachEndOfDocument:self];
}
[self endStoringCharacters];
}
}
- (void)xmlCharactersFound:(const xmlChar *)ch length:(NSUInteger)length {
@autoreleasepool {
if (self.storingCharacters) {
[self.characters appendBytes:(const void *)ch length:length];
}
if (self.delegateRespondsToCharactersFoundMethod) {
[self.delegate saxParser:self XMLCharactersFound:ch length:length];
}
}
}
- (void)xmlStartElement:(const xmlChar *)localName attributes:(const xmlChar **)attributes {
@autoreleasepool {
if (self.delegateRespondsToStartElementMethod) {
[self.delegate saxParser:self XMLStartElement:localName attributes:attributes];
}
}
}
- (void)xmlEndElement:(const xmlChar *)localName {
@autoreleasepool {
if (self.delegateRespondsToEndElementMethod) {
[self.delegate saxParser:self XMLEndElement:localName];
}
[self endStoringCharacters];
}
}
@end
static void startElementSAX(void *context, const xmlChar *localname, const xmlChar **attributes) {
[(__bridge RSSAXHTMLParser *)context xmlStartElement:localname attributes:attributes];
}
static void endElementSAX(void *context, const xmlChar *localname) {
[(__bridge RSSAXHTMLParser *)context xmlEndElement:localname];
}
static void charactersFoundSAX(void *context, const xmlChar *ch, int len) {
[(__bridge RSSAXHTMLParser *)context xmlCharactersFound:ch length:(NSUInteger)len];
}
static void endDocumentSAX(void *context) {
[(__bridge RSSAXHTMLParser *)context xmlEndDocument];
}
static htmlSAXHandler saxHandlerStruct = {
nil, /* internalSubset */
nil, /* isStandalone */
nil, /* hasInternalSubset */
nil, /* hasExternalSubset */
nil, /* resolveEntity */
nil, /* getEntity */
nil, /* entityDecl */
nil, /* notationDecl */
nil, /* attributeDecl */
nil, /* elementDecl */
nil, /* unparsedEntityDecl */
nil, /* setDocumentLocator */
nil, /* startDocument */
endDocumentSAX, /* endDocument */
startElementSAX, /* startElement*/
endElementSAX, /* endElement */
nil, /* reference */
charactersFoundSAX, /* characters */
nil, /* ignorableWhitespace */
nil, /* processingInstruction */
nil, /* comment */
nil, /* warning */
nil, /* error */
nil, /* fatalError //: unused error() get all the errors */
nil, /* getParameterEntity */
nil, /* cdataBlock */
nil, /* externalSubset */
XML_SAX2_MAGIC,
nil,
nil, /* startElementNs */
nil, /* endElementNs */
nil /* serror */
};
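The removed `attributesDictionary:` method above walks the attribute array that the HTML (SAX1-style) `startElement` callback receives: alternating name and value pointers, ended by a NULL name, where a valueless attribute has a NULL value. A minimal sketch of the same walk in plain C, without building a dictionary, could look like this. `find_attribute` is a hypothetical helper, and `const char *` stands in for `const xmlChar *` to keep the example free of libxml2 headers.

```c
#include <stddef.h>
#include <string.h>

/* Look up an attribute in a SAX1-style array laid out as
   name, value, name, value, ..., NULL.
   Returns the value, "" for an attribute without a value,
   or NULL if the attribute is absent or the array itself is NULL. */
static const char *find_attribute(const char **attributes, const char *name) {
    if (attributes == NULL) {
        return NULL;
    }
    for (size_t i = 0; attributes[i] != NULL; i += 2) {
        if (strcmp(attributes[i], name) == 0) {
            const char *value = attributes[i + 1];
            return value != NULL ? value : "";
        }
    }
    return NULL;
}
```

Mapping a missing value to an empty string mirrors the `value ? value : @""` fallback in the method above.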


@@ -1,12 +1,29 @@
// //
// RSSAXParser.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/25/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
#import <libxml/xmlstring.h>
/*Thread-safe, not re-entrant. /*Thread-safe, not re-entrant.
@@ -22,48 +39,39 @@
@protocol RSSAXParserDelegate <NSObject> @protocol RSSAXParserDelegate <NSObject>
+ (BOOL)isHTMLParser; // reusing class method of RSXMLParser delegate
@optional @optional
- (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const unsigned char *)localName prefix:(const unsigned char *)prefix uri:(const unsigned char *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const unsigned char **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const unsigned char **)attributes; // Called when parsing HTML
- (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName attributes:(const xmlChar **)attributes;
- (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName;
- (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const unsigned char *)localName prefix:(const unsigned char *)prefix uri:(const unsigned char *)uri; // Called when parsing XML (Atom, RSS, OPML)
- (void)saxParser:(RSSAXParser *)SAXParser XMLStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(NSInteger)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(NSInteger)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes;
- (void)saxParser:(RSSAXParser *)SAXParser XMLCharactersFound:(const unsigned char *)characters length:(NSUInteger)length; - (void)saxParser:(RSSAXParser *)SAXParser XMLEndElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri;
- (void)saxParserDidReachEndOfDocument:(RSSAXParser *)SAXParser; /*If canceled, may not get called (but might).*/
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForName:(const unsigned char *)name prefix:(const unsigned char *)prefix; /*Okay to return nil. Prefix may be nil.*/
// Called regardless of parser type
- (void)saxParser:(RSSAXParser *)SAXParser XMLCharactersFound:(const xmlChar *)characters length:(NSUInteger)length;
- (void)saxParserDidReachEndOfDocument:(RSSAXParser *)SAXParser; // If canceled, may not get called (but might).
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForName:(const xmlChar *)name prefix:(const xmlChar *)prefix; // Okay to return nil. Prefix may be nil.
- (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForValue:(const void *)bytes length:(NSUInteger)length; - (NSString *)saxParser:(RSSAXParser *)SAXParser internedStringForValue:(const void *)bytes length:(NSUInteger)length;
@end @end
void RSSAXInitLibXMLParser(void); // Needed by RSSAXHTMLParser.
/*For use by delegate.*/
BOOL RSSAXEqualTags(const unsigned char *localName, const char *tag, NSInteger tagLength);
BOOL RSSAXEqualBytes(const void *bytes1, const void *bytes2, NSUInteger length);
@interface RSSAXParser : NSObject @interface RSSAXParser : NSObject
@property (nonatomic, strong, readonly) NSData *currentCharacters;
@property (nonatomic, strong, readonly) NSString *currentString;
@property (nonatomic, strong, readonly) NSString *currentStringWithTrimmedWhitespace;
- (instancetype)initWithDelegate:(id<RSSAXParserDelegate>)delegate; - (instancetype)initWithDelegate:(id<RSSAXParserDelegate>)delegate;
- (void)parseData:(NSData *)data;
- (void)parseBytes:(const void *)bytes numberOfBytes:(NSUInteger)numberOfBytes; - (void)parseBytes:(const void *)bytes numberOfBytes:(NSUInteger)numberOfBytes;
- (void)finishParsing;
- (void)cancel; - (void)cancel;
- (void)beginStoringCharacters;
@property (nonatomic, strong, readonly) NSData *currentCharacters; /*nil if not storing characters. UTF-8 encoded.*/
@property (nonatomic, strong, readonly) NSString *currentString; /*Convenience to get string version of currentCharacters.*/
@property (nonatomic, strong, readonly) NSString *currentStringWithTrimmedWhitespace;
- (void)beginStoringCharacters; /*Delegate can call from XMLStartElement. Characters will be available in XMLEndElement as currentCharacters property. Storing characters is stopped after each XMLEndElement.*/
/*Delegate can call from within XMLStartElement. Returns nil if numberOfAttributes < 1.*/
- (NSDictionary *)attributesDictionary:(const unsigned char **)attributes numberOfAttributes:(NSInteger)numberOfAttributes; - (NSDictionary *)attributesDictionary:(const unsigned char **)attributes numberOfAttributes:(NSInteger)numberOfAttributes;
- (NSDictionary *)attributesDictionaryHTML:(const xmlChar **)attributes;
@end @end
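The XML variant of `XMLStartElement` in the protocol above passes `numberOfAttributes` alongside `attributes` because libxml2's SAX2 callback uses a different layout from the HTML one: five pointers per attribute (local name, prefix, namespace URI, value start, value end), with the value not NUL-terminated. A minimal sketch of reading one value from that layout, assuming nothing about how `attributesDictionary:numberOfAttributes:` is implemented; `find_attribute_ns` and `xml_char` are hypothetical names, the latter standing in for `xmlChar`.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned char xml_char; /* stand-in for libxml2's xmlChar */

/* Find an attribute by local name in a SAX2-style array of
   count * 5 pointers: localname, prefix, URI, value, end.
   On success returns a pointer to the first value byte and stores the
   value length in *out_length. The value is NOT NUL-terminated.
   Returns NULL if no attribute has that local name. */
static const xml_char *find_attribute_ns(const xml_char **attributes, int count,
                                         const char *localname, size_t *out_length) {
    for (int i = 0; i < count; i++) {
        const xml_char **attr = attributes + (size_t)i * 5;
        if (strcmp((const char *)attr[0], localname) == 0) {
            *out_length = (size_t)(attr[4] - attr[3]);
            return attr[3];
        }
    }
    return NULL;
}
```

Since the value is delimited by two pointers, a caller can compare it with `memcmp` and a known length (for example against `"false"` for `isPermaLink`) without copying it into a string first.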


@@ -1,42 +1,57 @@
// //
// RSSAXParser.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/25/15. // Copyright (c) 2016 Brent Simmons
// Copyright (c) 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <libxml/tree.h> #import <libxml/tree.h>
#import <libxml/xmlstring.h> #import <libxml/xmlstring.h>
#import <libxml/parser.h> #import <libxml/parser.h>
#import "RSSAXParser.h" #import "RSSAXParser.h"
#import "RSXMLInternal.h"
@interface RSSAXParser () @interface RSSAXParser ()
@property (nonatomic, weak) id<RSSAXParserDelegate> delegate; @property (nonatomic, weak) id<RSSAXParserDelegate> delegate;
@property (nonatomic, assign) xmlParserCtxtPtr context; @property (nonatomic, assign) xmlParserCtxtPtr context;
@property (nonatomic, assign) BOOL storingCharacters; @property (nonatomic, assign) BOOL storingCharacters;
@property (nonatomic) NSMutableData *characters; @property (nonatomic) NSMutableData *characters;
@property (nonatomic) BOOL delegateRespondsToInternedStringMethod; @property (nonatomic, assign) BOOL isHTMLParser;
@property (nonatomic) BOOL delegateRespondsToInternedStringForValueMethod; @property (nonatomic, assign) BOOL delegateRespondsToInternedStringMethod;
@property (nonatomic) BOOL delegateRespondsToStartElementMethod; @property (nonatomic, assign) BOOL delegateRespondsToInternedStringForValueMethod;
@property (nonatomic) BOOL delegateRespondsToEndElementMethod; @property (nonatomic, assign) BOOL delegateRespondsToStartElementMethod;
@property (nonatomic) BOOL delegateRespondsToCharactersFoundMethod; @property (nonatomic, assign) BOOL delegateRespondsToEndElementMethod;
@property (nonatomic) BOOL delegateRespondsToEndOfDocumentMethod; @property (nonatomic, assign) BOOL delegateRespondsToCharactersFoundMethod;
@property (nonatomic, assign) BOOL delegateRespondsToEndOfDocumentMethod;
@end @end
@implementation RSSAXParser @implementation RSSAXParser
+ (void)initialize { + (void)initialize {
static dispatch_once_t onceToken;
RSSAXInitLibXMLParser(); dispatch_once(&onceToken, ^{
xmlInitParser();
});
} }
#pragma mark - Init #pragma mark - Init
- (instancetype)initWithDelegate:(id<RSSAXParserDelegate>)delegate { - (instancetype)initWithDelegate:(id<RSSAXParserDelegate>)delegate {
@@ -46,32 +61,23 @@
return nil; return nil;
_delegate = delegate; _delegate = delegate;
_delegateRespondsToCharactersFoundMethod = [_delegate respondsToSelector:@selector(saxParser:XMLCharactersFound:length:)];
_delegateRespondsToEndOfDocumentMethod = [_delegate respondsToSelector:@selector(saxParserDidReachEndOfDocument:)];
_delegateRespondsToInternedStringMethod = [_delegate respondsToSelector:@selector(saxParser:internedStringForName:prefix:)];
_delegateRespondsToInternedStringForValueMethod = [_delegate respondsToSelector:@selector(saxParser:internedStringForValue:length:)];
if ([_delegate respondsToSelector:@selector(saxParser:internedStringForName:prefix:)]) { if ([[_delegate class] respondsToSelector:@selector(isHTMLParser)] && [[_delegate class] isHTMLParser]) {
_delegateRespondsToInternedStringMethod = YES; _isHTMLParser = YES;
} _delegateRespondsToStartElementMethod = [_delegate respondsToSelector:@selector(saxParser:XMLStartElement:attributes:)];
if ([_delegate respondsToSelector:@selector(saxParser:internedStringForValue:length:)]) { _delegateRespondsToEndElementMethod = [_delegate respondsToSelector:@selector(saxParser:XMLEndElement:)];
_delegateRespondsToInternedStringForValueMethod = YES; } else {
} _delegateRespondsToStartElementMethod = [_delegate respondsToSelector:@selector(saxParser:XMLStartElement:prefix:uri:numberOfNamespaces:namespaces:numberOfAttributes:numberDefaulted:attributes:)];
if ([_delegate respondsToSelector:@selector(saxParser:XMLStartElement:prefix:uri:numberOfNamespaces:namespaces:numberOfAttributes:numberDefaulted:attributes:)]) { _delegateRespondsToEndElementMethod = [_delegate respondsToSelector:@selector(saxParser:XMLEndElement:prefix:uri:)];
_delegateRespondsToStartElementMethod = YES;
}
if ([_delegate respondsToSelector:@selector(saxParser:XMLEndElement:prefix:uri:)]) {
_delegateRespondsToEndElementMethod = YES;
}
if ([_delegate respondsToSelector:@selector(saxParser:XMLCharactersFound:length:)]) {
_delegateRespondsToCharactersFoundMethod = YES;
}
if ([_delegate respondsToSelector:@selector(saxParserDidReachEndOfDocument:)]) {
_delegateRespondsToEndOfDocumentMethod = YES;
} }
return self; return self;
} }
#pragma mark - Dealloc
- (void)dealloc { - (void)dealloc {
if (_context != nil) { if (_context != nil) {
xmlFreeParserCtxt(_context); xmlFreeParserCtxt(_context);
@@ -83,28 +89,39 @@
#pragma mark - API #pragma mark - API
static xmlSAXHandler saxHandlerStruct; static xmlSAXHandler saxHandlerStruct;
- (void)parseData:(NSData *)data { /**
 Initialize a new XML or HTML parser context and start processing the data.
[self parseBytes:data.bytes numberOfBytes:data.length]; */
}
- (void)parseBytes:(const void *)bytes numberOfBytes:(NSUInteger)numberOfBytes { - (void)parseBytes:(const void *)bytes numberOfBytes:(NSUInteger)numberOfBytes {
if (self.context == nil) { if (self.context == nil) {
if (self.isHTMLParser) {
xmlCharEncoding characterEncoding = xmlDetectCharEncoding(bytes, (int)numberOfBytes);
self.context = htmlCreatePushParserCtxt(&saxHandlerStruct, (__bridge void *)self, nil, 0, nil, characterEncoding);
htmlCtxtUseOptions(self.context, XML_PARSE_RECOVER | XML_PARSE_NONET | HTML_PARSE_COMPACT);
} else {
self.context = xmlCreatePushParserCtxt(&saxHandlerStruct, (__bridge void *)self, nil, 0, nil); self.context = xmlCreatePushParserCtxt(&saxHandlerStruct, (__bridge void *)self, nil, 0, nil);
xmlCtxtUseOptions(self.context, XML_PARSE_RECOVER | XML_PARSE_NOENT); xmlCtxtUseOptions(self.context, XML_PARSE_RECOVER | XML_PARSE_NOENT);
} }
}
@autoreleasepool { @autoreleasepool {
if (self.isHTMLParser) {
htmlParseChunk(self.context, (const char *)bytes, (int)numberOfBytes, 0);
} else {
xmlParseChunk(self.context, (const char *)bytes, (int)numberOfBytes, 0); xmlParseChunk(self.context, (const char *)bytes, (int)numberOfBytes, 0);
} }
} }
[self finishParsing];
}
/**
Call after @c parseData: or @c parseBytes:numberOfBytes:
*/
- (void)finishParsing { - (void)finishParsing {
NSAssert(self.context != nil, nil); NSAssert(self.context != nil, nil);
@@ -112,63 +129,70 @@ static xmlSAXHandler saxHandlerStruct;
return; return;
@autoreleasepool { @autoreleasepool {
if (self.isHTMLParser) {
htmlParseChunk(self.context, nil, 0, 1);
htmlFreeParserCtxt(self.context);
} else {
xmlParseChunk(self.context, nil, 0, 1); xmlParseChunk(self.context, nil, 0, 1);
xmlFreeParserCtxt(self.context); xmlFreeParserCtxt(self.context);
}
self.context = nil; self.context = nil;
self.characters = nil; self.characters = nil;
} }
} }
/// Will stop the sax parser from processing any further. @c saxParserDidReachEndOfDocument: will not be called.
- (void)cancel { - (void)cancel {
@autoreleasepool { @autoreleasepool {
xmlStopParser(self.context); xmlStopParser(self.context);
} }
} }
/**
Delegate can call from @c XMLStartElement.
Characters will be available in @c XMLEndElement as @c currentCharacters property.
Storing characters is stopped after each @c XMLEndElement.
*/
- (void)beginStoringCharacters { - (void)beginStoringCharacters {
self.storingCharacters = YES; self.storingCharacters = YES;
self.characters = [NSMutableData new]; self.characters = [NSMutableData new];
} }
/// Will be called after each closing tag and the document end.
- (void)endStoringCharacters { - (void)endStoringCharacters {
self.storingCharacters = NO; self.storingCharacters = NO;
self.characters = nil; self.characters = nil;
} }
/// @return @c nil if not storing characters. UTF-8 encoded.
- (NSData *)currentCharacters { - (NSData *)currentCharacters {
if (!self.storingCharacters) { if (!self.storingCharacters) {
return nil; return nil;
} }
return self.characters; return self.characters;
} }
/// Convenience method to get string version of @c currentCharacters.
- (NSString *)currentString { - (NSString *)currentString {
NSData *d = self.currentCharacters; NSData *d = self.currentCharacters;
if (RSXMLIsEmpty(d)) { if (!d || d.length == 0) {
return nil; return nil;
} }
return [[NSString alloc] initWithData:d encoding:NSUTF8StringEncoding]; return [[NSString alloc] initWithData:d encoding:NSUTF8StringEncoding];
} }
/// Trim whitespace and newline characters from @c currentString.
- (NSString *)currentStringWithTrimmedWhitespace { - (NSString *)currentStringWithTrimmedWhitespace {
return [self.currentString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]; return [self.currentString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
} }
#pragma mark - Attributes Dictionary #pragma mark - Attributes Dictionary
/**
Delegate can call from within @c XMLStartElement. Returns @c nil if @c numberOfAttributes @c < @c 1.
*/
- (NSDictionary *)attributesDictionary:(const xmlChar **)attributes numberOfAttributes:(NSInteger)numberOfAttributes { - (NSDictionary *)attributesDictionary:(const xmlChar **)attributes numberOfAttributes:(NSInteger)numberOfAttributes {
if (numberOfAttributes < 1 || !attributes) { if (numberOfAttributes < 1 || !attributes) {
@@ -178,8 +202,7 @@ static xmlSAXHandler saxHandlerStruct;
NSMutableDictionary *d = [NSMutableDictionary new]; NSMutableDictionary *d = [NSMutableDictionary new];
@autoreleasepool { @autoreleasepool {
NSInteger i = 0, j = 0; for (NSInteger i = 0, j = 0; i < numberOfAttributes; i++, j+=5) {
for (i = 0, j = 0; i < numberOfAttributes; i++, j+=5) {
NSUInteger lenValue = (NSUInteger)(attributes[j + 4] - attributes[j + 3]); NSUInteger lenValue = (NSUInteger)(attributes[j + 4] - attributes[j + 3]);
NSString *value = nil; NSString *value = nil;
@@ -210,29 +233,48 @@ static xmlSAXHandler saxHandlerStruct;
} }
} }
} }
return d;
}
/**
 Delegate can call from within @c XMLStartElement. Returns @c nil if @c attributes is @c NULL.
*/
- (NSDictionary *)attributesDictionaryHTML:(const xmlChar **)attributes {
if (!attributes) {
return nil;
}
NSMutableDictionary *d = [NSMutableDictionary new];
NSInteger ix = 0;
NSString *currentKey = nil;
while (true) {
const xmlChar *oneAttribute = attributes[ix];
ix++;
if (!currentKey && !oneAttribute) {
break;
}
if (!currentKey) {
currentKey = [NSString stringWithUTF8String:(const char *)oneAttribute];
}
else {
NSString *value = nil;
if (oneAttribute) {
value = [NSString stringWithUTF8String:(const char *)oneAttribute];
}
d[currentKey] = (value ? value : @"");
currentKey = nil;
}
}
return d; return d;
} }
#pragma mark - Equal Tags
BOOL RSSAXEqualTags(const xmlChar *localName, const char *tag, NSInteger tagLength) {
if (!localName) {
return NO;
}
return !strncmp((const char *)localName, tag, (size_t)tagLength);
}
BOOL RSSAXEqualBytes(const void *bytes1, const void *bytes2, NSUInteger length) {
return memcmp(bytes1, bytes2, length) == 0;
}
#pragma mark - Callbacks #pragma mark - Callbacks
- (void)xmlEndDocument { - (void)xmlEndDocument {
@autoreleasepool { @autoreleasepool {
@@ -261,50 +303,72 @@ BOOL RSSAXEqualBytes(const void *bytes1, const void *bytes2, NSUInteger length)
- (void)xmlStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(int)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(int)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes { - (void)xmlStartElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri numberOfNamespaces:(int)numberOfNamespaces namespaces:(const xmlChar **)namespaces numberOfAttributes:(int)numberOfAttributes numberDefaulted:(int)numberDefaulted attributes:(const xmlChar **)attributes {
@autoreleasepool {
if (self.delegateRespondsToStartElementMethod) { if (self.delegateRespondsToStartElementMethod) {
@autoreleasepool {
[self.delegate saxParser:self XMLStartElement:localName prefix:prefix uri:uri numberOfNamespaces:numberOfNamespaces namespaces:namespaces numberOfAttributes:numberOfAttributes numberDefaulted:numberDefaulted attributes:attributes]; [self.delegate saxParser:self XMLStartElement:localName prefix:prefix uri:uri numberOfNamespaces:numberOfNamespaces namespaces:namespaces numberOfAttributes:numberOfAttributes numberDefaulted:numberDefaulted attributes:attributes];
} }
} }
} }
- (void)xmlStartHTMLElement:(const xmlChar *)localName attributes:(const xmlChar **)attributes {
if (self.delegateRespondsToStartElementMethod) {
@autoreleasepool {
[self.delegate saxParser:self XMLStartElement:localName attributes:attributes];
}
}
}
- (void)xmlEndElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri { - (void)xmlEndElement:(const xmlChar *)localName prefix:(const xmlChar *)prefix uri:(const xmlChar *)uri {
@autoreleasepool { @autoreleasepool {
if (self.delegateRespondsToEndElementMethod) { if (self.delegateRespondsToEndElementMethod) {
[self.delegate saxParser:self XMLEndElement:localName prefix:prefix uri:uri]; [self.delegate saxParser:self XMLEndElement:localName prefix:prefix uri:uri];
} }
[self endStoringCharacters]; [self endStoringCharacters];
} }
} }
- (void)xmlEndHTMLElement:(const xmlChar *)localName {
@autoreleasepool {
if (self.delegateRespondsToEndElementMethod) {
[self.delegate saxParser:self XMLEndElement:localName];
}
[self endStoringCharacters];
}
}
@end @end
static void startElementSAX(void *context, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI, int nb_namespaces, const xmlChar **namespaces, int nb_attributes, int nb_defaulted, const xmlChar **attributes) { static void startElementSAX(void *context, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI, int nb_namespaces, const xmlChar **namespaces, int nb_attributes, int nb_defaulted, const xmlChar **attributes) {
[(__bridge RSSAXParser *)context xmlStartElement:localname prefix:prefix uri:URI numberOfNamespaces:nb_namespaces namespaces:namespaces numberOfAttributes:nb_attributes numberDefaulted:nb_defaulted attributes:attributes]; [(__bridge RSSAXParser *)context xmlStartElement:localname prefix:prefix uri:URI numberOfNamespaces:nb_namespaces namespaces:namespaces numberOfAttributes:nb_attributes numberDefaulted:nb_defaulted attributes:attributes];
} }
static void endElementSAX(void *context, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI) { static void endElementSAX(void *context, const xmlChar *localname, const xmlChar *prefix, const xmlChar *URI) {
[(__bridge RSSAXParser *)context xmlEndElement:localname prefix:prefix uri:URI]; [(__bridge RSSAXParser *)context xmlEndElement:localname prefix:prefix uri:URI];
} }
static void charactersFoundSAX(void *context, const xmlChar *ch, int len) { static void charactersFoundSAX(void *context, const xmlChar *ch, int len) {
[(__bridge RSSAXParser *)context xmlCharactersFound:ch length:(NSUInteger)len]; [(__bridge RSSAXParser *)context xmlCharactersFound:ch length:(NSUInteger)len];
} }
static void endDocumentSAX(void *context) { static void endDocumentSAX(void *context) {
[(__bridge RSSAXParser *)context xmlEndDocument]; [(__bridge RSSAXParser *)context xmlEndDocument];
} }
static void startElementSAX_HTML(void *context, const xmlChar *localname, const xmlChar **attributes) {
[(__bridge RSSAXParser *)context xmlStartHTMLElement:localname attributes:attributes];
}
static void endElementSAX_HTML(void *context, const xmlChar *localname) {
[(__bridge RSSAXParser *)context xmlEndHTMLElement:localname];
}
static xmlSAXHandler saxHandlerStruct = { static xmlSAXHandler saxHandlerStruct = {
nil, /* internalSubset */ nil, /* internalSubset */
@@ -321,8 +385,8 @@ static xmlSAXHandler saxHandlerStruct = {
nil, /* setDocumentLocator */ nil, /* setDocumentLocator */
nil, /* startDocument */ nil, /* startDocument */
endDocumentSAX, /* endDocument */ endDocumentSAX, /* endDocument */
nil, /* startElement*/ startElementSAX_HTML, /* startElement*/
nil, /* endElement */ endElementSAX_HTML, /* endElement */
nil, /* reference */ nil, /* reference */
charactersFoundSAX, /* characters */ charactersFoundSAX, /* characters */
nil, /* ignorableWhitespace */ nil, /* ignorableWhitespace */
@@ -340,13 +404,3 @@ static xmlSAXHandler saxHandlerStruct = {
endElementSAX, /* endElementNs */ endElementSAX, /* endElementNs */
nil /* serror */ nil /* serror */
}; };
void RSSAXInitLibXMLParser(void) {
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
xmlInitParser();
});
}
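The two attribute layouts handled above can be sketched in plain C. This is a minimal illustration, not the library's code: `xmlChar` is redefined locally so the snippet builds without libxml2 headers, and the helper names (`sax2_value_length`, `html_attribute_count`, `html_attribute_value`) are hypothetical. It assumes what the loops in `attributesDictionary:numberOfAttributes:` and `attributesDictionaryHTML:` imply: the namespace-aware callback passes five pointers per attribute with the value delimited by the fourth and fifth, while the HTML callback passes alternating name and value pointers terminated by `NULL`, where a value may itself be `NULL`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for libxml2's xmlChar (assumption: no libxml2 headers here). */
typedef unsigned char xmlChar;

/* Namespace-aware layout: five pointers per attribute.
   Slots 3 and 4 point at the start and end of the value bytes. */
static size_t sax2_value_length(const xmlChar **attributes, int index) {
    assert(attributes != NULL && index >= 0);
    const xmlChar **a = attributes + (size_t)index * 5;
    return (size_t)(a[4] - a[3]);
}

/* HTML layout: name, value, name, value, ..., NULL.
   Only the name slots (even indices) decide where the list ends. */
static int html_attribute_count(const xmlChar **attributes) {
    int count = 0;
    if (attributes == NULL) {
        return 0;
    }
    for (size_t i = 0; attributes[i] != NULL; i += 2) {
        count++;
    }
    return count;
}

/* Look up one HTML attribute. A present attribute without a value
   yields "", mirroring the empty-string fallback in the dictionary code.
   Returns NULL if the attribute is absent. */
static const char *html_attribute_value(const xmlChar **attributes, const char *name) {
    if (attributes == NULL) {
        return NULL;
    }
    for (size_t i = 0; attributes[i] != NULL; i += 2) {
        if (strcmp((const char *)attributes[i], name) == 0) {
            const xmlChar *value = attributes[i + 1];
            return value ? (const char *)value : "";
        }
    }
    return NULL;
}
```

Stepping by two over the name slots keeps a `NULL` value from being mistaken for the end of the list, which is the same distinction the `currentKey` bookkeeping makes in `attributesDictionaryHTML:`.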


@@ -1,36 +1,48 @@
// //
// RSXML.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 7/12/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
// General
#import <RSXML/RSSAXParser.h> #import <RSXML/RSXMLError.h>
#import <RSXML/NSString+RSXML.h>
#import <RSXML/RSDateParser.h>
#import <RSXML/RSXMLData.h> #import <RSXML/RSXMLData.h>
#import <RSXML/RSXMLParser.h>
// RSS & Atom Feeds
#import <RSXML/RSFeedParser.h> #import <RSXML/RSFeedParser.h>
#import <RSXML/FeedParser.h>
#import <RSXML/RSAtomParser.h> #import <RSXML/RSAtomParser.h>
#import <RSXML/RSRSSParser.h> #import <RSXML/RSRSSParser.h>
#import <RSXML/RSParsedFeed.h> #import <RSXML/RSParsedFeed.h>
#import <RSXML/RSParsedArticle.h> #import <RSXML/RSParsedArticle.h>
// OPML
#import <RSXML/RSOPMLParser.h> #import <RSXML/RSOPMLParser.h>
#import <RSXML/RSOPMLItem.h> #import <RSXML/RSOPMLItem.h>
#import <RSXML/RSXMLError.h>
#import <RSXML/NSString+RSXML.h>
#import <RSXML/RSDateParser.h>
// HTML // HTML
#import <RSXML/RSSAXHTMLParser.h>
#import <RSXML/RSHTMLMetadataParser.h> #import <RSXML/RSHTMLMetadataParser.h>
#import <RSXML/RSHTMLMetadata.h>
#import <RSXML/RSHTMLLinkParser.h> #import <RSXML/RSHTMLLinkParser.h>
#import <RSXML/RSHTMLMetadata.h>


@@ -1,22 +1,41 @@
// //
// RSXMLData.h // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 8/24/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
#import "RSXMLParser.h"
NS_ASSUME_NONNULL_BEGIN @class RSXMLParser;
@interface RSXMLData : NSObject @interface RSXMLData <__covariant T : RSXMLParser *> : NSObject
@property (nonatomic, readonly, nonnull) NSString *urlString;
@property (nonatomic, readonly, nullable) NSData *data;
@property (nonatomic, readonly, nullable) Class parserClass;
@property (nonatomic, readonly, nullable) NSError *parserError;
- (instancetype)initWithData:(NSData *)data urlString:(NSString *)urlString; - (instancetype)initWithData:(NSData * _Nonnull)data urlString:(NSString * _Nonnull)urlString;
@property (nonatomic, readonly) NSData *data; - (T _Nullable)getParser;
@property (nonatomic, readonly) NSString *urlString; - (BOOL)canParseData;
@end @end
NS_ASSUME_NONNULL_END


@@ -1,28 +1,212 @@
// //
// RSXMLData.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 8/24/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSXMLData.h" #import "RSXMLData.h"
#import "RSXMLError.h"
// Parser classes
#import "RSRSSParser.h"
#import "RSAtomParser.h"
#import "RSOPMLParser.h"
#import "RSHTMLMetadataParser.h"
@implementation RSXMLData @implementation RSXMLData
static const NSUInteger minNumberOfBytesToSearch = 20;
static const NSInteger numberOfCharactersToSearch = 4096;
- (instancetype)initWithData:(NSData *)data urlString:(NSString *)urlString { - (instancetype)initWithData:(NSData *)data urlString:(NSString *)urlString {
self = [super init]; self = [super init];
if (!self) { if (self) {
return nil;
}
_data = data; _data = data;
_urlString = urlString; _urlString = urlString;
_parserError = nil;
_parserClass = [self determineParserClass]; // will set error
if (!_parserClass && _parserError)
_data = nil;
}
return self; return self;
} }
/**
Get location of @c str in data. May be inaccurate since UTF8 uses multi-byte characters.
*/
- (NSInteger)findCString:(const char*)str {
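	// Clamp the search length so strnstr() never reads past the end of the (non NUL-terminated) data.
	char *foundStr = strnstr(_data.bytes, str, MIN(_data.length, (NSUInteger)numberOfCharactersToSearch));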
if (foundStr == NULL) {
return NSNotFound;
}
return foundStr - (char*)_data.bytes;
}
/**
@return @c YES if any of the provided tags is found within the first 4096 bytes.
*/
- (BOOL)matchAny:(const char*[])tags count:(int)len {
for (int i = 0; i < len; i++) {
if ([self findCString:tags[i]] != NSNotFound) {
return YES;
}
}
return NO;
}
/**
@return @c YES if all of the provided tags are found within the first 4096 bytes.
*/
- (BOOL)matchAll:(const char*[])tags count:(int)len {
for (int i = 0; i < len; i++) {
if ([self findCString:tags[i]] == NSNotFound) {
return NO;
}
}
return YES;
}
/**
Do a fast @c strnstr() search on the @c char* data.
All strings must match exactly and in the same order provided.
*/
- (BOOL)matchAllInCorrectOrder:(const char*[])tags count:(int)len {
NSInteger oldPos = 0;
for (int i = 0; i < len; i++) {
NSInteger newPos = [self findCString:tags[i]];
if (newPos == NSNotFound || newPos < oldPos) {
return NO;
}
oldPos = newPos;
}
return YES;
}
#pragma mark - Determine XML Parser
/**
 Try to find the correct parser for the underlying data. Returns @c nil and sets @c parserError if no parser could be determined.
@return Parser class: @c RSRSSParser, @c RSAtomParser, @c RSOPMLParser or @c RSHTMLMetadataParser.
*/
- (nullable Class)determineParserClass {
// TODO: check for things like images and movies and return nil.
if (!_data || _data.length < minNumberOfBytesToSearch) {
// TODO: check size, type, etc.
_parserError = RSXMLMakeError(RSXMLErrorNoData);
return nil;
}
if (NSNotFound == [self findCString:"<"]) {
_parserError = RSXMLMakeError(RSXMLErrorMissingLeftCaret);
return nil;
}
if ([self matchAll:(const char*[]){"<rss", "<channel"} count:2]) { // RSS
return [RSRSSParser class];
}
if ([self matchAll:(const char*[]){"<feed", "<entry"} count:2]) { // Atom
return [RSAtomParser class];
}
if (NSNotFound != [self findCString:"<rdf:RDF"]) {
return [RSRSSParser class]; //TODO: parse RDF feeds ... for now, use RSS parser.
}
if ([self matchAll:(const char*[]){"<opml", "<outline"} count:2]) {
return [RSOPMLParser class];
}
if ([self matchAny:(const char*[]){"<html", "<HTML", "<body", "<meta", "doctype html", "DOCTYPE html", "DOCTYPE HTML"} count:7]) {
		// Won't catch every single case, which is fine.
return [RSHTMLMetadataParser class];
}
if ([self findCString:"<errors xmlns='http://schemas.google"] != NSNotFound) {
_parserError = RSXMLMakeError(RSXMLErrorContainsXMLErrorsTag);
return nil;
}
// else: try slower NSString conversion and search case insensitive.
return [self determineParserClassSafeAndSlow];
}
/**
 Create @c NSString object from @c .data, decoding it as UTF-8 or, failing that, as UTF-16.
 Then check, for each parser class, whether its required tags occur (case insensitive) in the order provided.
*/
- (nullable Class)determineParserClassSafeAndSlow {
@autoreleasepool {
NSString *s = [[NSString alloc] initWithBytesNoCopy:(void *)_data.bytes length:_data.length encoding:NSUTF8StringEncoding freeWhenDone:NO];
if (!s) {
s = [[NSString alloc] initWithBytesNoCopy:(void *)_data.bytes length:_data.length encoding:NSUnicodeStringEncoding freeWhenDone:NO];
}
if (!s) {
_parserError = RSXMLMakeError(RSXMLErrorNoSuitableParser);
return nil;
}
NSRange rangeToSearch = NSMakeRange(0, numberOfCharactersToSearch);
if (s.length < numberOfCharactersToSearch) {
rangeToSearch.length = s.length;
}
for (Class parserClass in [self listOfParserClasses]) {
NSArray<const NSString *> *tags = [parserClass parserRequireOrderedTags];
NSUInteger oldPos = 0;
for (NSString *tag in tags) {
NSUInteger newPos = [s rangeOfString:tag options:NSCaseInsensitiveSearch range:rangeToSearch].location;
if (newPos == NSNotFound || newPos < oldPos) {
oldPos = NSNotFound;
break;
}
oldPos = newPos;
}
if (oldPos != NSNotFound) {
return parserClass;
}
}
}
// Try RSS anyway? libxml would return a parsing error
_parserError = RSXMLMakeError(RSXMLErrorNoSuitableParser);
return nil;
}
/// @return List of parsers. @c RSRSSParser, @c RSAtomParser, @c RSOPMLParser.
- (NSArray *)listOfParserClasses {
static NSArray *gParserClasses = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
gParserClasses = @[[RSRSSParser class], [RSAtomParser class], [RSOPMLParser class]];
});
return gParserClasses;
}
#pragma mark - Check Methods to Determine Parser Type
/// @return Kind of @c RSXMLParser or @c nil if no suitable parser found.
- (id)getParser {
return [[_parserClass alloc] initWithXMLData:self];
}
/// @return @c YES if any parser, regardless of type, is suitable.
- (BOOL)canParseData {
return (_parserClass != nil && _parserError == nil);
}
@end @end
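The ordered tag check used by `matchAllInCorrectOrder:count:` can be sketched in portable C. This is an illustration under stated assumptions rather than the code from this commit: `find_bytes` and `match_all_in_order` are hypothetical names, and the search uses `memcmp` over an explicit length instead of `strnstr()`, which is a BSD extension and stops at the first NUL byte. The search window is the smaller of the data length and the caller's limit (4096 in `RSXMLData.m`).

```c
#include <stddef.h>
#include <string.h>

/* Byte offset of needle within the first `limit` bytes of haystack,
   or -1 if it does not occur there. Never reads past `limit`. */
static long find_bytes(const char *haystack, size_t limit, const char *needle) {
    size_t n = strlen(needle);
    if (n == 0) {
        return 0;
    }
    for (size_t i = 0; i + n <= limit; i++) {
        if (memcmp(haystack + i, needle, n) == 0) {
            return (long)i;
        }
    }
    return -1;
}

/* Returns 1 if every tag occurs in the window and the first occurrence
   of each tag is not before the first occurrence of the previous one. */
static int match_all_in_order(const char *data, size_t length, size_t max_search,
                              const char *const tags[], int count) {
    size_t limit = length < max_search ? length : max_search;
    long old_pos = 0;
    for (int i = 0; i < count; i++) {
        long new_pos = find_bytes(data, limit, tags[i]);
        if (new_pos < 0 || new_pos < old_pos) {
            return 0;
        }
        old_pos = new_pos;
    }
    return 1;
}
```

Like the Objective-C version, this compares only first occurrences, so a document in which `<channel` appears both before and after `<rss` would be rejected. That is acceptable for a cheap sniffing pass that falls back to a slower search.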


@@ -1,21 +1,47 @@
//
// MIT License (MIT)
//
// Copyright (c) 2018 Oleg Geier
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation; @import Foundation;
#import <libxml/xmlerror.h> #import <libxml/xmlerror.h>
extern NSErrorDomain kLIBXMLParserErrorDomain; extern NSErrorDomain const kLIBXMLParserErrorDomain;
extern NSErrorDomain kRSXMLParserErrorDomain; extern NSErrorDomain const kRSXMLParserErrorDomain;
/// Error codes for RSXML error domain @c (kRSXMLParserErrorDomain) /// Error codes for RSXML error domain @c (kRSXMLParserErrorDomain)
typedef NS_ENUM(NSInteger, RSXMLError) { typedef NS_ERROR_ENUM(kRSXMLParserErrorDomain, RSXMLError) {
/// Error codes /// Error codes
RSXMLErrorNoData = 100, // 1xx: general xml parsing error
RSXMLErrorMissingLeftCaret = 110, RSXMLErrorNoData = 110, // input length is less than 20 characters
RSXMLErrorProbablyHTML = 120, RSXMLErrorInputEncoding = 111, // input is not decodable with UTF8 or UTF16 encoding
RSXMLErrorContainsXMLErrorsTag = 130, RSXMLErrorMissingLeftCaret = 120, // input does not contain any '<' character
RSXMLErrorNoSuitableParser = 140, RSXMLErrorContainsXMLErrorsTag = 130, // input contains: "<errors xmlns='http://schemas.google"
RSXMLErrorFileNotOPML = 1024 // original value RSXMLErrorNoSuitableParser = 140, // none of the provided parsers can read the data
// 2xx: xml content <-> parser, mismatch
RSXMLErrorExpectingFeed = 210,
RSXMLErrorExpectingHTML = 220,
RSXMLErrorExpectingOPML = 230
}; };
void RSXMLSetError(NSError **error, RSXMLError code, NSString *filename); NSError * RSXMLMakeError(RSXMLError code);
NSError * RSXMLMakeError(RSXMLError code, NSString *filename); NSError * RSXMLMakeErrorWrongParser(RSXMLError code, RSXMLError expected);
NSError * RSXMLMakeErrorFromLIBXMLError(xmlErrorPtr err); NSError * RSXMLMakeErrorFromLIBXMLError(xmlErrorPtr err);
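The renumbered error codes group into a 1xx range (general parsing errors) and a 2xx range (content and parser mismatch). A small C sketch of how a caller might use that grouping follows. The enum values are copied from the new `RSXMLError` declaration, but the `Sketch` prefixed names, the integer-division category test and the message helper are assumptions made for illustration and are not part of RSXML's API.

```c
#include <stdio.h>
#include <string.h>

/* Values mirror the new RSXMLError enum; the names are illustrative. */
typedef enum {
    SketchErrorNoData               = 110,
    SketchErrorInputEncoding        = 111,
    SketchErrorMissingLeftCaret     = 120,
    SketchErrorContainsXMLErrorsTag = 130,
    SketchErrorNoSuitableParser     = 140,
    SketchErrorExpectingFeed        = 210,
    SketchErrorExpectingHTML        = 220,
    SketchErrorExpectingOPML        = 230
} SketchError;

/* 2xx codes mean: the data is valid, but the wrong parser was chosen. */
static int sketch_is_parser_mismatch(SketchError code) {
    return code / 100 == 2;
}

/* Human-readable name of the format a 2xx code refers to. */
static const char *sketch_format_name(SketchError code) {
    switch (code) {
        case SketchErrorExpectingHTML: return "HTML data";
        case SketchErrorExpectingOPML: return "OPML data";
        case SketchErrorExpectingFeed: return "RSS or Atom feed";
        default:                       return "Unknown format";
    }
}

/* Writes a mismatch message into buf and returns snprintf's result. */
static int sketch_mismatch_message(char *buf, size_t size,
                                   SketchError expected, SketchError found) {
    return snprintf(buf, size, "Can't parse XML. %s expected, but %s found.",
                    sketch_format_name(expected), sketch_format_name(found));
}
```

Keeping the category in the hundreds digit lets callers branch on the kind of failure without listing every code, at the cost of relying on a numbering convention that the enum itself does not enforce.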


@@ -1,43 +1,73 @@
//
// MIT License (MIT)
//
// Copyright (c) 2018 Oleg Geier
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSXMLError.h" #import "RSXMLError.h"
NSErrorDomain kLIBXMLParserErrorDomain = @"LIBXMLParserErrorDomain"; const NSErrorDomain kLIBXMLParserErrorDomain = @"LIBXMLParserErrorDomain";
NSErrorDomain kRSXMLParserErrorDomain = @"RSXMLParserErrorDomain"; const NSErrorDomain kRSXMLParserErrorDomain = @"RSXMLParserErrorDomain";
NSString * getErrorMessageForRSXMLError(RSXMLError code, id paramA); const char * parserDescriptionForError(RSXMLError code);
NSString * getErrorMessageForRSXMLError(RSXMLError code, id paramA) { const char * parserDescriptionForError(RSXMLError code) {
switch (code) {
case RSXMLErrorExpectingHTML: return "HTML data";
case RSXMLErrorExpectingOPML: return "OPML data";
case RSXMLErrorExpectingFeed: return "RSS or Atom feed";
default: return "Unknown format";
}
}
NSString * getErrorMessageForRSXMLError(RSXMLError code, RSXMLError expected);
NSString * getErrorMessageForRSXMLError(RSXMLError code, RSXMLError expected) {
switch (code) { // switch statement will warn if an enum value is missing switch (code) { // switch statement will warn if an enum value is missing
case RSXMLErrorNoData: case RSXMLErrorNoData:
return @"Couldn't parse feed. No data available."; return @"Can't parse data. Empty data.";
case RSXMLErrorInputEncoding:
return @"Can't parse data. Input encoding cannot be converted to UTF-8 / UTF-16.";
case RSXMLErrorMissingLeftCaret: case RSXMLErrorMissingLeftCaret:
return @"Couldn't parse feed. Missing left caret character ('<')."; return @"Can't parse XML. Missing left caret character ('<').";
case RSXMLErrorProbablyHTML:
return @"Couldn't parse feed. Expecting XML data but found html data.";
case RSXMLErrorContainsXMLErrorsTag: case RSXMLErrorContainsXMLErrorsTag:
return @"Couldn't parse feed. XML contains 'errors' tag."; return @"Can't parse XML. XML contains 'errors' tag.";
case RSXMLErrorNoSuitableParser: case RSXMLErrorNoSuitableParser:
return @"Couldn't parse feed. No suitable parser found. XML document not well-formed."; return @"Can't parse XML. No suitable parser found. Document not well-formed?";
case RSXMLErrorFileNotOPML: case RSXMLErrorExpectingHTML:
if (paramA) { case RSXMLErrorExpectingOPML:
return [NSString stringWithFormat:@"The file %@ can't be parsed because it's not an OPML file.", paramA]; case RSXMLErrorExpectingFeed:
} return [NSString stringWithFormat:@"Can't parse XML. %s expected, but %s found.",
return @"The file can't be parsed because it's not an OPML file."; parserDescriptionForError(code), parserDescriptionForError(expected)];
} }
} }
void RSXMLSetError(NSError **error, RSXMLError code, NSString *filename) { NSError * RSXMLMakeError(RSXMLError code) {
if (error) { return RSXMLMakeErrorWrongParser(code, RSXMLErrorNoData);
*error = RSXMLMakeError(code, filename);
}
} }
NSError * RSXMLMakeError(RSXMLError code, NSString *filename) { NSError * RSXMLMakeErrorWrongParser(RSXMLError code, RSXMLError expected) {
return [NSError errorWithDomain:kRSXMLParserErrorDomain code:code return [NSError errorWithDomain:kRSXMLParserErrorDomain code:code
userInfo:@{NSLocalizedDescriptionKey: getErrorMessageForRSXMLError(code, nil)}]; userInfo:@{NSLocalizedDescriptionKey: getErrorMessageForRSXMLError(code, expected)}];
} }
NSError * RSXMLMakeErrorFromLIBXMLError(xmlErrorPtr err) { NSError * RSXMLMakeErrorFromLIBXMLError(xmlErrorPtr err) {
if (err) { if (err && err->level == XML_ERR_FATAL) {
int errCode = err->code; int errCode = err->code;
char * msg = err->message; char * msg = err->message;
//if (err->level == XML_ERR_FATAL) //if (err->level == XML_ERR_FATAL)


@@ -1,31 +0,0 @@
//
// RSXMLInternal.h
// RSXML
//
// Created by Brent Simmons on 12/26/16.
// Copyright © 2016 Ranchero Software, LLC. All rights reserved.
//
@import Foundation;
NS_ASSUME_NONNULL_BEGIN
BOOL RSXMLIsEmpty(id _Nullable obj);
BOOL RSXMLStringIsEmpty(NSString * _Nullable s);
@interface NSString (RSXMLInternal)
- (NSString *)rsxml_md5HashString;
@end
@interface NSDictionary (RSXMLInternal)
- (nullable id)rsxml_objectForCaseInsensitiveKey:(NSString *)key;
@end
NS_ASSUME_NONNULL_END


@@ -1,83 +0,0 @@
//
// RSXMLInternal.m
// RSXML
//
// Created by Brent Simmons on 12/26/16.
// Copyright © 2016 Ranchero Software, LLC. All rights reserved.
//
#import <CommonCrypto/CommonDigest.h>
#import "RSXMLInternal.h"
static BOOL RSXMLIsNil(id obj) {
return obj == nil || obj == [NSNull null];
}
BOOL RSXMLIsEmpty(id obj) {
if (RSXMLIsNil(obj)) {
return YES;
}
if ([obj respondsToSelector:@selector(count)]) {
return [obj count] < 1;
}
if ([obj respondsToSelector:@selector(length)]) {
return [obj length] < 1;
}
return NO; /*Shouldn't get here very often.*/
}
BOOL RSXMLStringIsEmpty(NSString *s) {
return RSXMLIsNil(s) || s.length < 1;
}
@implementation NSString (RSXMLInternal)
- (NSData *)rsxml_md5Hash {
NSData *data = [self dataUsingEncoding:NSUTF8StringEncoding];
unsigned char hash[CC_MD5_DIGEST_LENGTH];
CC_MD5(data.bytes, (CC_LONG)data.length, hash);
return [NSData dataWithBytes:(const void *)hash length:CC_MD5_DIGEST_LENGTH];
}
- (NSString *)rsxml_md5HashString {
NSData *md5Data = [self rsxml_md5Hash];
const Byte *bytes = md5Data.bytes;
return [NSString stringWithFormat:@"%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x", bytes[0], bytes[1], bytes[2], bytes[3], bytes[4], bytes[5], bytes[6], bytes[7], bytes[8], bytes[9], bytes[10], bytes[11], bytes[12], bytes[13], bytes[14], bytes[15]];
}
@end
@implementation NSDictionary (RSXMLInternal)
- (nullable id)rsxml_objectForCaseInsensitiveKey:(NSString *)key {
id obj = self[key];
if (obj) {
return obj;
}
for (NSString *oneKey in self.allKeys) {
if ([oneKey isKindOfClass:[NSString class]] && [key caseInsensitiveCompare:oneKey] == NSOrderedSame) {
return self[oneKey];
}
}
return nil;
}
@end

69
RSXML/RSXMLParser.h Normal file
View File

@@ -0,0 +1,69 @@
//
// MIT License (MIT)
//
// Copyright (c) 2018 Oleg Geier
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import Foundation;
#import "RSSAXParser.h"
#define EqualBytes(bytes1, bytes2, length) (memcmp(bytes1, bytes2, length) == 0)
//#define EqualBytes(bytes1, bytes2, length) (!strncmp(bytes1, bytes2, length))
@class RSXMLData;
@protocol RSXMLParserDelegate <NSObject>
@optional
/**
A subclass may return a list of tags that the data @c (RSXMLData) should include.
The parser is selected only if all strings are found (in the correct order).
@note This method will only be called if the original data has some weird encoding.
@c RSXMLData will first try to convert the data to a @c UTF8 string, then @c UTF16.
If both conversions fail, the parser will be deemed unsuitable for this data.
*/
+ (NSArray<const NSString *> *)parserRequireOrderedTags;
/// @return @c NO to cancel parsing before it even starts, e.g. after checking whether the parser is of the correct type.
- (BOOL)xmlParserWillStartParsing;
@required
/// @return @c YES if parser supports parsing feeds (RSS or Atom).
+ (BOOL)isFeedParser;
/// @return @c YES if parser supports parsing OPML files.
+ (BOOL)isOPMLParser;
/// @return @c YES if parser supports parsing HTML files.
+ (BOOL)isHTMLParser;
/// Keeps an internal pointer to the @c RSXMLData and initializes a new @c RSSAXParser.
- (instancetype)initWithXMLData:(RSXMLData * _Nonnull)xmlData;
/// Will be called after the parsing is finished. @return Reference to parsed object.
- (id)xmlParserWillReturnDocument;
@end
@interface RSXMLParser<__covariant T> : NSObject <RSXMLParserDelegate, RSSAXParserDelegate>
@property (nonatomic, readonly, nonnull, copy) NSString *documentURI;
- (T _Nullable)parseSync:(NSError ** _Nullable)error;
- (void)parseAsync:(void(^)(T _Nullable parsedDocument, NSError * _Nullable error))block;
- (BOOL)canParse;
@end

143
RSXML/RSXMLParser.m Normal file
View File

@@ -0,0 +1,143 @@
//
// MIT License (MIT)
//
// Copyright (c) 2018 Oleg Geier
//
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import "RSXMLParser.h"
#import "RSXMLData.h"
#import "RSXMLError.h"
@interface RSXMLParser()
@property (nonatomic) RSSAXParser *parser;
@property (nonatomic) NSData *xmlData;
@property (nonatomic, copy) NSError *xmlInputError;
@end
@implementation RSXMLParser
+ (BOOL)isFeedParser { return NO; } // override
+ (BOOL)isOPMLParser { return NO; } // override
+ (BOOL)isHTMLParser { return NO; } // override
- (id)xmlParserWillReturnDocument { return nil; } // override
/**
Designated initializer. Runs a check whether it matches the detected parser in @c RSXMLData.
*/
- (instancetype)initWithXMLData:(nonnull RSXMLData *)xmlData {
self = [super init];
if (self) {
_documentURI = [xmlData.urlString copy];
_xmlInputError = [xmlData.parserError copy];
[self checkIfParserMatches:xmlData.parserClass];
_xmlData = xmlData.data;
if (!_xmlData) {
_xmlInputError = RSXMLMakeError(RSXMLErrorNoData);
}
_parser = [[RSSAXParser alloc] initWithDelegate:self];
}
return self;
}
/**
Parse the XML data on whatever thread this method is called on.
@param error Sets @c error if parser gets unrecognized data or libxml runs into a parsing error.
@return The parsed object. The object type depends on the underlying data. @c RSParsedFeed, @c RSOPMLItem or @c RSHTMLMetadata.
*/
- (id _Nullable)parseSync:(NSError **)error {
if (_xmlInputError) {
if (error) *error = _xmlInputError;
return nil;
}
if ([self respondsToSelector:@selector(xmlParserWillStartParsing)] && ![self xmlParserWillStartParsing])
return nil;
@autoreleasepool {
xmlResetLastError();
[_parser parseBytes:_xmlData.bytes numberOfBytes:_xmlData.length];
if (error) {
*error = RSXMLMakeErrorFromLIBXMLError(xmlGetLastError());
xmlResetLastError();
}
}
return [self xmlParserWillReturnDocument];
}
/**
Dispatch to a background queue, parse the data synchronously there, and invoke the callback on the main thread.
*/
- (void)parseAsync:(void(^)(id parsedDocument, NSError *error))block {
dispatch_async(dispatch_get_global_queue(QOS_CLASS_UTILITY, 0), ^{ // QOS_CLASS_DEFAULT
@autoreleasepool {
NSError *error;
id obj = [self parseSync:&error];
dispatch_async(dispatch_get_main_queue(), ^{
block(obj, error);
});
}
});
}
/// @return @c YES if @c .xmlInputError is @c nil, i.e. no input error was recorded during initialization.
- (BOOL)canParse {
return (self.xmlInputError == nil);
}
#pragma mark - Check Parser Type Matches
/**
@return Returns either @c ExpectingFeed, @c ExpectingOPML, @c ExpectingHTML.
@return @c RSXMLErrorNoData for an unexpected class (e.g., if @c RSXMLParser is used directly).
*/
- (RSXMLError)getExpectedErrorForClass:(Class<RSXMLParserDelegate>)cls {
if ([cls isFeedParser])
return RSXMLErrorExpectingFeed;
if ([cls isOPMLParser])
return RSXMLErrorExpectingOPML;
if ([cls isHTMLParser])
return RSXMLErrorExpectingHTML;
return RSXMLErrorNoData; // will result in 'Unknown format'
}
/**
Check whether the parsing class matches the expected parsing class. If not, set @c .xmlInputError along the way.
@return @c YES if @c parserClass matches, @c NO otherwise. If @c NO is returned because of a mismatch, @c .xmlInputError is set as well; a @c nil class returns @c NO without setting an error.
*/
- (BOOL)checkIfParserMatches:(Class<RSXMLParserDelegate>)xmlParserClass {
if (!xmlParserClass)
return NO;
if (xmlParserClass != [self class]) { // && !_xmlInputError
RSXMLError current = [self getExpectedErrorForClass:[self class]];
RSXMLError expected = [self getExpectedErrorForClass:xmlParserClass];
if (current != expected) {
_xmlInputError = RSXMLMakeErrorWrongParser(current, expected);
return NO;
}
}
return YES; // only if no error was set (not now, nor before)
}
@end

View File

@@ -1,10 +1,25 @@
// //
// RSDateParserTests.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 12/26/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved.
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <XCTest/XCTest.h> #import <XCTest/XCTest.h>
@import RSXML; @import RSXML;

View File

@@ -1,10 +1,25 @@
// //
// RSEntityTests.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 12/26/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved.
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <XCTest/XCTest.h> #import <XCTest/XCTest.h>
@import RSXML; @import RSXML;
@@ -15,7 +30,6 @@
@implementation RSEntityTests @implementation RSEntityTests
- (void)testInnerAmpersand { - (void)testInnerAmpersand {
NSString *expectedResult = @"A&P"; NSString *expectedResult = @"A&P";

View File

@@ -1,13 +1,29 @@
// //
// RSHTMLTests.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 3/5/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
@import RSXML;
#import <XCTest/XCTest.h> #import <XCTest/XCTest.h>
@import RSXML;
@interface RSHTMLTests : XCTestCase @interface RSHTMLTests : XCTestCase
@@ -15,169 +31,109 @@
@implementation RSHTMLTests @implementation RSHTMLTests
+ (NSArray<XCTPerformanceMetric> *)defaultPerformanceMetrics {
return @[XCTPerformanceMetric_WallClockTime, @"com.apple.XCTPerformanceMetric_TotalHeapAllocationsKilobytes"];
}
+ (RSXMLData *)xmlData:(NSString *)title urlString:(NSString *)urlString { - (RSXMLData *)xmlData:(NSString *)title urlString:(NSString *)urlString {
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:title ofType:@"html" inDirectory:@"Resources"]; NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:title ofType:@"html" inDirectory:@"Resources"];
NSData *d = [[NSData alloc] initWithContentsOfFile:s]; return [[RSXMLData alloc] initWithData:[[NSData alloc] initWithContentsOfFile:s] urlString:urlString];
return [[RSXMLData alloc] initWithData:d urlString:urlString];
} }
+ (RSXMLData *)daringFireballData {
static RSXMLData *xmlData = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
xmlData = [self xmlData:@"DaringFireball" urlString:@"http://daringfireball.net/"];
});
return xmlData;
}
+ (RSXMLData *)furboData {
static RSXMLData *xmlData = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
xmlData = [self xmlData:@"furbo" urlString:@"http://furbo.org/"];
});
return xmlData;
}
+ (RSXMLData *)inessentialData {
static RSXMLData *xmlData = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
xmlData = [self xmlData:@"inessential" urlString:@"http://inessential.com/"];
});
return xmlData;
}
+ (RSXMLData *)sixcolorsData {
static RSXMLData *xmlData = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
xmlData = [self xmlData:@"sixcolors" urlString:@"https://sixcolors.com/"];
});
return xmlData;
}
- (void)testDaringFireball { - (void)testDaringFireball {
RSXMLData *xmlData = [[self class] daringFireballData]; RSXMLData *xmlData = [self xmlData:@"DaringFireball" urlString:@"http://daringfireball.net/"];
RSHTMLMetadata *metadata = [RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; XCTAssertTrue([xmlData.parserClass isHTMLParser]);
RSHTMLMetadataParser *parser = [[RSHTMLMetadataParser alloc] initWithXMLData:xmlData];
NSError *error;
RSHTMLMetadata *metadata = [parser parseSync:&error];
XCTAssertNil(error);
XCTAssertEqualObjects(metadata.faviconLink, @"http://daringfireball.net/graphics/favicon.ico?v=005"); XCTAssertEqualObjects(metadata.faviconLink, @"http://daringfireball.net/graphics/favicon.ico?v=005");
XCTAssertTrue(metadata.feedLinks.count == 1); XCTAssertTrue(metadata.feedLinks.count == 1);
RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0]; RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0];
XCTAssertNil(feedLink.title); XCTAssertNil(feedLink.title);
XCTAssertEqualObjects(feedLink.type, @"application/atom+xml"); XCTAssertEqual(feedLink.type, RSFeedTypeAtom);
XCTAssertEqualObjects(feedLink.urlString, @"http://daringfireball.net/feeds/main"); XCTAssertEqualObjects(feedLink.link, @"http://daringfireball.net/feeds/main");
}
- (void)testDaringFireballPerformance {
RSXMLData *xmlData = [[self class] daringFireballData];
[self measureBlock:^{ [self measureBlock:^{
(void)[RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; for (int i = 0; i < 10; i++)
[parser parseSync:nil];
}]; }];
} }
- (void)testFurbo { - (void)testFurbo {
RSXMLData *xmlData = [[self class] furboData]; RSXMLData *xmlData = [self xmlData:@"furbo" urlString:@"http://furbo.org/"];
RSHTMLMetadata *metadata = [RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; XCTAssertTrue([xmlData.parserClass isHTMLParser]);
RSHTMLMetadataParser *parser = [[RSHTMLMetadataParser alloc] initWithXMLData:xmlData];
NSError *error;
RSHTMLMetadata *metadata = [parser parseSync:&error];
XCTAssertNil(error);
XCTAssertEqualObjects(metadata.faviconLink, @"http://furbo.org/favicon.ico"); XCTAssertEqualObjects(metadata.faviconLink, @"http://furbo.org/favicon.ico");
XCTAssertTrue(metadata.feedLinks.count == 1); XCTAssertTrue(metadata.feedLinks.count == 1);
RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0]; RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0];
XCTAssertEqualObjects(feedLink.title, @"Iconfactory News Feed"); XCTAssertEqualObjects(feedLink.title, @"Iconfactory News Feed");
XCTAssertEqualObjects(feedLink.type, @"application/rss+xml"); XCTAssertEqual(feedLink.type, RSFeedTypeRSS);
}
- (void)testFurboPerformance {
RSXMLData *xmlData = [[self class] furboData];
[self measureBlock:^{ [self measureBlock:^{
(void)[RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; for (int i = 0; i < 10; i++)
[parser parseSync:nil];
}]; }];
} }
- (void)testInessential { - (void)testInessential {
RSXMLData *xmlData = [[self class] inessentialData]; RSXMLData *xmlData = [self xmlData:@"inessential" urlString:@"http://inessential.com/"];
RSHTMLMetadata *metadata = [RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; XCTAssertTrue([xmlData.parserClass isHTMLParser]);
RSHTMLMetadataParser *parser = [[RSHTMLMetadataParser alloc] initWithXMLData:xmlData];
NSError *error;
RSHTMLMetadata *metadata = [parser parseSync:&error];
XCTAssertNil(error);
XCTAssertNil(metadata.faviconLink); XCTAssertNil(metadata.faviconLink);
XCTAssertTrue(metadata.feedLinks.count == 1); XCTAssertTrue(metadata.feedLinks.count == 1);
RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0]; RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0];
XCTAssertEqualObjects(feedLink.title, @"RSS"); XCTAssertEqualObjects(feedLink.title, @"RSS");
XCTAssertEqualObjects(feedLink.type, @"application/rss+xml"); XCTAssertEqual(feedLink.type, RSFeedTypeRSS);
XCTAssertEqualObjects(feedLink.urlString, @"http://inessential.com/xml/rss.xml"); XCTAssertEqualObjects(feedLink.link, @"http://inessential.com/xml/rss.xml");
XCTAssertEqual(metadata.appleTouchIcons.count, 0u); XCTAssertEqual(metadata.iconLinks.count, 0u);
}
- (void)testInessentialPerformance {
RSXMLData *xmlData = [[self class] inessentialData];
[self measureBlock:^{ [self measureBlock:^{
(void)[RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; for (int i = 0; i < 10; i++)
[parser parseSync:nil];
}]; }];
} }
- (void)testSixcolors { - (void)testSixcolors {
RSXMLData *xmlData = [[self class] sixcolorsData]; RSXMLData *xmlData = [self xmlData:@"sixcolors" urlString:@"https://sixcolors.com/"];
RSHTMLMetadata *metadata = [RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; XCTAssertTrue([xmlData.parserClass isHTMLParser]);
RSHTMLMetadataParser *parser = [[RSHTMLMetadataParser alloc] initWithXMLData:xmlData];
NSError *error;
RSHTMLMetadata *metadata = [parser parseSync:&error];
XCTAssertNil(error);
XCTAssertEqualObjects(metadata.faviconLink, @"https://sixcolors.com/images/favicon.ico"); XCTAssertEqualObjects(metadata.faviconLink, @"https://sixcolors.com/images/favicon.ico");
XCTAssertTrue(metadata.feedLinks.count == 1); XCTAssertTrue(metadata.feedLinks.count == 1);
RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0]; RSHTMLMetadataFeedLink *feedLink = metadata.feedLinks[0];
XCTAssertEqualObjects(feedLink.title, @"RSS"); XCTAssertEqualObjects(feedLink.title, @"RSS");
XCTAssertEqualObjects(feedLink.type, @"application/rss+xml"); XCTAssertEqual(feedLink.type, RSFeedTypeRSS);
XCTAssertEqualObjects(feedLink.urlString, @"http://feedpress.me/sixcolors"); XCTAssertEqualObjects(feedLink.link, @"http://feedpress.me/sixcolors");
XCTAssertEqual(metadata.appleTouchIcons.count, 6u); XCTAssertEqual(metadata.iconLinks.count, 6u);
RSHTMLMetadataAppleTouchIcon *icon = metadata.appleTouchIcons[3]; RSHTMLMetadataIconLink *icon = metadata.iconLinks[3];
XCTAssertEqualObjects(icon.rel, @"apple-touch-icon"); XCTAssertEqualObjects(icon.title, @"apple-touch-icon");
XCTAssertEqualObjects(icon.sizes, @"120x120"); XCTAssertEqualObjects(icon.sizes, @"120x120");
XCTAssertEqualObjects(icon.urlString, @"https://sixcolors.com/apple-touch-icon-120.png"); XCTAssertEqual([icon getSize].width, 120);
} XCTAssertEqualObjects(icon.link, @"https://sixcolors.com/apple-touch-icon-120.png");
- (void)testSixcolorsPerformance {
RSXMLData *xmlData = [[self class] sixcolorsData];
[self measureBlock:^{ [self measureBlock:^{
(void)[RSHTMLMetadataParser HTMLMetadataWithXMLData:xmlData]; for (int i = 0; i < 10; i++)
[parser parseSync:nil];
}]; }];
} }
@@ -185,32 +141,35 @@
- (void)testSixColorsLinks { - (void)testSixColorsLinks {
RSXMLData *xmlData = [[self class] sixcolorsData]; RSXMLData *xmlData = [self xmlData:@"sixcolors" urlString:@"https://sixcolors.com/"];
NSArray *links = [RSHTMLLinkParser htmlLinksWithData:xmlData]; XCTAssertTrue([xmlData.parserClass isHTMLParser]);
RSHTMLLinkParser *parser = [[RSHTMLLinkParser alloc] initWithXMLData:xmlData];
NSString *linkToFind = @"https://www.theincomparable.com/theincomparable/290/index.php"; NSError *error;
NSString *textToFind = @"this weeks episode of The Incomparable"; NSArray<RSHTMLMetadataAnchor*> *links = [parser parseSync:&error];
XCTAssertNil(error);
BOOL found = NO; BOOL found = NO;
for (RSHTMLLink *oneLink in links) { for (RSHTMLMetadataAnchor *oneLink in links) {
if ([oneLink.title isEqualToString:@"this weeks episode of The Incomparable"] &&
if ([oneLink.urlString isEqualToString:linkToFind] && [oneLink.text isEqualToString:textToFind]) { [oneLink.link isEqualToString:@"https://www.theincomparable.com/theincomparable/290/index.php"])
{
found = YES; found = YES;
break; break;
} }
} }
// item No. 11 to ensure .title strips <em></em>
XCTAssertEqualObjects(links[11].title, @"Podcasting");
XCTAssertEqualObjects(links[11].link, @"https://sixcolors.com/topic/podcasting/");
// item No. 18 & 19 to ensure '<a>Topics</a>' is skipped
XCTAssertEqualObjects(links[18].title, @"Podcasts");
XCTAssertEqualObjects(links[18].link, @"https://sixcolors.com/podcasts/");
XCTAssertEqualObjects(links[19].title, @"Gift Guide");
XCTAssertEqualObjects(links[19].link, @"https://sixcolors.com/topic/giftguide/");
XCTAssertTrue(found, @"Expected link should have been found."); XCTAssertTrue(found, @"Expected link should have been found.");
XCTAssertEqual(links.count, 131u, @"Expected 131 links."); XCTAssertEqual(links.count, 130u, @"Expected 130 links.");
}
- (void)testSixColorsLinksPerformance {
RSXMLData *xmlData = [[self class] sixcolorsData];
[self measureBlock:^{ [self measureBlock:^{
(void)[RSHTMLLinkParser htmlLinksWithData:xmlData]; [parser parseSync:nil];
}]; }];
} }

View File

@@ -1,10 +1,26 @@
// //
// RSOPMLTests.m // MIT License (MIT)
// RSXML
// //
// Created by Brent Simmons on 2/28/16. // Copyright (c) 2016 Brent Simmons
// Copyright © 2016 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <XCTest/XCTest.h> #import <XCTest/XCTest.h>
@import RSXML; @import RSXML;
@@ -15,63 +31,62 @@
@implementation RSOPMLTests @implementation RSOPMLTests
+ (RSXMLData *)subsData { + (NSArray<XCTPerformanceMetric> *)defaultPerformanceMetrics {
return @[XCTPerformanceMetric_WallClockTime, @"com.apple.XCTPerformanceMetric_TotalHeapAllocationsKilobytes"];
}
static RSXMLData *xmlData = nil; - (RSXMLData*)xmlFile:(NSString*)name extension:(NSString*)ext {
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:name ofType:ext inDirectory:@"Resources"];
static dispatch_once_t onceToken; if (s == nil) return nil;
dispatch_once(&onceToken, ^{
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"Subs" ofType:@"opml" inDirectory:@"Resources"];
NSData *d = [[NSData alloc] initWithContentsOfFile:s]; NSData *d = [[NSData alloc] initWithContentsOfFile:s];
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"http://example.org/"]; return [[RSXMLData alloc] initWithData:d urlString:[NSString stringWithFormat:@"%@.%@", name, ext]];
});
return xmlData;
} }
- (void)testNotOPML { - (void)testNotOPML {
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"DaringFireball" ofType:@"rss" inDirectory:@"Resources"]; NSError *error;
NSData *d = [[NSData alloc] initWithContentsOfFile:s]; RSXMLData *xmlData = [self xmlFile:@"DaringFireball" extension:@"atom"];
RSXMLData *xmlData = [[RSXMLData alloc] initWithData:d urlString:@"http://example.org/"]; XCTAssertNotEqualObjects(xmlData.parserClass, [RSOPMLParser class]);
XCTAssertNil(xmlData.parserError);
RSOPMLParser *parser = [[RSOPMLParser alloc] initWithXMLData:xmlData]; RSOPMLParser *parser = [[RSOPMLParser alloc] initWithXMLData:xmlData];
XCTAssertNotNil(parser.error); RSOPMLItem *document = [parser parseSync:&error];
XCTAssert(parser.error.code == RSXMLErrorFileNotOPML); XCTAssertNil(document);
XCTAssert([parser.error.domain isEqualTo:kRSXMLParserErrorDomain]); XCTAssertNotNil(error);
XCTAssertEqual(error.code, RSXMLErrorExpectingOPML);
XCTAssertEqualObjects(error.domain, kRSXMLParserErrorDomain);
xmlData = [[RSXMLData alloc] initWithData:[[NSData alloc] initWithContentsOfFile:@"/System/Library/Kernels/kernel"]
urlString:@"/System/Library/Kernels/kernel"];
XCTAssertNotNil(xmlData.parserError);
XCTAssert(xmlData.parserError.code == RSXMLErrorMissingLeftCaret);
RSXMLParser *parser2 = [xmlData getParser];
XCTAssertNil(parser2);
XCTAssertNotNil(xmlData.parserError);
XCTAssert(xmlData.parserError.code == RSXMLErrorMissingLeftCaret); // error should not be overwritten
d = [[NSData alloc] initWithContentsOfFile:@"/System/Library/Kernels/kernel"];
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"/System/Library/Kernels/kernel"];
parser = [[RSOPMLParser alloc] initWithXMLData:xmlData];
XCTAssertNotNil(parser.error);
} }
- (void)testSubsPerformance {
RSXMLData *xmlData = [[self class] subsData];
[self measureBlock:^{
(void)[[RSOPMLParser alloc] initWithXMLData:xmlData];
}];
}
- (void)testSubsStructure { - (void)testSubsStructure {
RSXMLData *xmlData = [[self class] subsData]; RSXMLData<RSOPMLParser*> *xmlData = [self xmlFile:@"Subs" extension:@"opml"];
XCTAssertEqualObjects(xmlData.parserClass, [RSOPMLParser class]);
RSOPMLParser *parser = [[RSOPMLParser alloc] initWithXMLData:xmlData]; NSError *error;
XCTAssertNotNil(parser); RSOPMLParser *parser = [xmlData getParser];
RSOPMLItem *document = [parser parseSync:&error];
RSOPMLItem *document = parser.opmlDocument;
XCTAssertNotNil(document); XCTAssertNotNil(document);
XCTAssert([document.displayName isEqualToString:@"Subs"]); XCTAssertEqualObjects(document.displayName, @"Subs");
XCTAssert([document.children.firstObject.displayName isEqualToString:@"Daring Fireball"]); XCTAssertEqualObjects(document.children.firstObject.displayName, @"Daring Fireball");
XCTAssert([document.children.lastObject.displayName isEqualToString:@"Writers"]); XCTAssertEqualObjects(document.children.lastObject.displayName, @"Writers");
XCTAssert([document.children.lastObject.children.lastObject.displayName isEqualToString:@"Gerrold"]); XCTAssertEqualObjects(document.children.lastObject.children.lastObject.displayName, @"Gerrold");
[self checkStructureForOPMLItem:document isRoot:YES]; [self checkStructureForOPMLItem:document isRoot:YES];
//NSLog(@"\n%@", [document recursiveDescription]); //NSLog(@"\n%@", [document recursiveDescription]);
[self measureBlock:^{
[parser parseSync:nil];
}];
} }
- (void)checkStructureForOPMLItem:(RSOPMLItem *)item isRoot:(BOOL)root { - (void)checkStructureForOPMLItem:(RSOPMLItem *)item isRoot:(BOOL)root {
@@ -98,5 +113,4 @@
} }
} }
@end @end

View File

@@ -1,10 +1,26 @@
// //
// RSXMLTests.m // MIT License (MIT)
// RSXMLTests
// //
// Created by Brent Simmons on 7/12/15. // Copyright (c) 2016 Brent Simmons
// Copyright © 2015 Ranchero Software, LLC. All rights reserved. // Copyright (c) 2018 Oleg Geier
// //
// Permission is hereby granted, free of charge, to any person obtaining a copy of
// this software and associated documentation files (the "Software"), to deal in
// the Software without restriction, including without limitation the rights to
// use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
// of the Software, and to permit persons to whom the Software is furnished to do
// so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#import <XCTest/XCTest.h> #import <XCTest/XCTest.h>
@import RSXML; @import RSXML;
@@ -15,226 +31,253 @@
@implementation RSXMLTests @implementation RSXMLTests
+ (RSXMLData *)oftData { /** @see https://indiestack.com/2018/02/xcodes-secret-performance-tests/
static RSXMLData *xmlData = nil; "com.apple.XCTPerformanceMetric_WallClockTime"
"com.apple.XCTPerformanceMetric_UserTime"
static dispatch_once_t onceToken; "com.apple.XCTPerformanceMetric_RunTime"
dispatch_once(&onceToken, ^{ "com.apple.XCTPerformanceMetric_SystemTime"
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"OneFootTsunami" ofType:@"atom" inDirectory:@"Resources"]; "com.apple.XCTPerformanceMetric_HighWaterMarkForHeapAllocations"
NSData *d = [[NSData alloc] initWithContentsOfFile:s]; "com.apple.XCTPerformanceMetric_HighWaterMarkForVMAllocations"
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"http://onefoottsunami.com/"]; "com.apple.XCTPerformanceMetric_PersistentHeapAllocations"
}); "com.apple.XCTPerformanceMetric_PersistentHeapAllocationsNodes"
"com.apple.XCTPerformanceMetric_PersistentVMAllocations"
return xmlData; "com.apple.XCTPerformanceMetric_TemporaryHeapAllocationsKilobytes"
"com.apple.XCTPerformanceMetric_TotalHeapAllocationsKilobytes"
"com.apple.XCTPerformanceMetric_TransientHeapAllocationsKilobytes"
"com.apple.XCTPerformanceMetric_TransientHeapAllocationsNodes"
"com.apple.XCTPerformanceMetric_TransientVMAllocationsKilobytes"
*/
+ (NSArray<XCTPerformanceMetric> *)defaultPerformanceMetrics {
return @[XCTPerformanceMetric_WallClockTime, @"com.apple.XCTPerformanceMetric_TotalHeapAllocationsKilobytes"];
} }
// http://onefoottsunami.com/
// http://scripting.com/
// http://manton.org/
// http://daringfireball.net/
// http://katiefloyd.com/
// https://medium.com/@emarley
+ (RSXMLData *)scriptingNewsData { - (RSXMLData*)xmlFile:(NSString*)name extension:(NSString*)ext {
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:name ofType:ext inDirectory:@"Resources"];
static RSXMLData *xmlData = nil; if (s == nil) return nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"scriptingNews" ofType:@"rss" inDirectory:@"Resources"];
NSData *d = [[NSData alloc] initWithContentsOfFile:s]; NSData *d = [[NSData alloc] initWithContentsOfFile:s];
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"http://scripting.com/"]; return [[RSXMLData alloc] initWithData:d urlString:[NSString stringWithFormat:@"%@.%@", name, ext]];
});
return xmlData;
} }
- (RSFeedParser*)parserForFile:(NSString*)name extension:(NSString*)ext expect:(Class)cls {
+ (RSXMLData *)mantonData { RSXMLData<RSFeedParser*> *xmlData = [self xmlFile:name extension:ext];
XCTAssertEqual(xmlData.parserClass, cls);
static RSXMLData *xmlData = nil; return [xmlData getParser];
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"manton" ofType:@"rss" inDirectory:@"Resources"];
NSData *d = [[NSData alloc] initWithContentsOfFile:s];
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"http://manton.org/"];
});
return xmlData;
} }
#pragma mark - Completeness Tests
+ (RSXMLData *)daringFireballData { - (void)testAsync {
RSXMLData *xmlData = [self xmlFile:@"OneFootTsunami" extension:@"atom"];
[[xmlData getParser] parseAsync:^(RSParsedFeed *parsedDocument, NSError *error) {
XCTAssertEqualObjects(parsedDocument.title, @"One Foot Tsunami");
XCTAssertEqualObjects(parsedDocument.subtitle, @"Slightly less disappointing than it sounds");
XCTAssertEqualObjects(parsedDocument.link, @"http://onefoottsunami.com");
XCTAssertEqual(parsedDocument.articles.count, 25u);
static RSXMLData *xmlData = nil; RSParsedArticle *a = parsedDocument.articles.firstObject;
XCTAssertEqualObjects(a.title, @"Link: Pillow Fight Leaves 24 Concussed");
static dispatch_once_t onceToken; XCTAssertEqualObjects(a.link, @"http://www.nytimes.com/2015/09/05/us/at-west-point-annual-pillow-fight-becomes-weaponized.html?mwrsm=Email&_r=1&pagewanted=all");
dispatch_once(&onceToken, ^{ XCTAssertEqualObjects(a.guid, @"http://onefoottsunami.com/?p=14863");
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"DaringFireball" ofType:@"rss" inDirectory:@"Resources"]; XCTAssertEqualObjects(a.datePublished, [NSDate dateWithTimeIntervalSince1970:1441722101]); // 2015-09-08T14:21:41Z
NSData *d = [[NSData alloc] initWithContentsOfFile:s]; }];
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"http://daringfireball.net/"];
});
return xmlData;
} }
+ (RSXMLData *)katieFloydData {
static RSXMLData *xmlData = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"KatieFloyd" ofType:@"rss" inDirectory:@"Resources"];
NSData *d = [[NSData alloc] initWithContentsOfFile:s];
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"http://katiefloyd.com/"];
});
return xmlData;
}
+ (RSXMLData *)eMarleyData {
static RSXMLData *xmlData = nil;
static dispatch_once_t onceToken;
dispatch_once(&onceToken, ^{
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:@"EMarley" ofType:@"rss" inDirectory:@"Resources"];
NSData *d = [[NSData alloc] initWithContentsOfFile:s];
xmlData = [[RSXMLData alloc] initWithData:d urlString:@"https://medium.com/@emarley"];
});
return xmlData;
}
- (void)testOneFootTsunami { - (void)testOneFootTsunami {
RSXMLData *xmlData = [self xmlFile:@"OneFootTsunami" extension:@"atom"];
XCTAssertEqual(xmlData.parserClass, [RSAtomParser class]);
NSError *error = nil; NSError *error = nil;
RSXMLData *xmlData = [[self class] oftData]; RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error); XCTAssertEqualObjects(parsedFeed.title, @"One Foot Tsunami");
NSLog(@"parsedFeed: %@", parsedFeed); XCTAssertEqualObjects(parsedFeed.subtitle, @"Slightly less disappointing than it sounds");
} XCTAssertEqualObjects(parsedFeed.link, @"http://onefoottsunami.com");
XCTAssertEqual(parsedFeed.articles.count, 25u);
RSParsedArticle *a = parsedFeed.articles.firstObject;
- (void)testOFTPerformance { XCTAssertEqualObjects(a.title, @"Link: Pillow Fight Leaves 24 Concussed");
XCTAssertEqualObjects(a.link, @"http://www.nytimes.com/2015/09/05/us/at-west-point-annual-pillow-fight-becomes-weaponized.html?mwrsm=Email&_r=1&pagewanted=all");
RSXMLData *xmlData = [[self class] oftData]; XCTAssertEqualObjects(a.guid, @"http://onefoottsunami.com/?p=14863");
XCTAssertEqualObjects(a.datePublished, [NSDate dateWithTimeIntervalSince1970:1441722101]); // 2015-09-08T14:21:41Z
[self measureBlock:^{ [self measureBlock:^{
NSError *error = nil; [[xmlData getParser] parseSync:nil];
RSParseFeedSync(xmlData, &error);
}]; }];
} }
- (void)testScriptingNews { - (void)testScriptingNews {
RSXMLData *xmlData = [self xmlFile:@"scriptingNews" extension:@"rss"];
XCTAssertEqual(xmlData.parserClass, [RSRSSParser class]);
NSError *error = nil; NSError *error = nil;
RSXMLData *xmlData = [[self class] scriptingNewsData]; RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error); XCTAssertEqualObjects(parsedFeed.title, @"Scripting News");
NSLog(@"parsedFeed: %@", parsedFeed); XCTAssertEqualObjects(parsedFeed.subtitle, @"Scripting News, the weblog started in 1997 that bootstrapped the blogging revolution...");
XCTAssertEqualObjects(parsedFeed.link, @"http://scripting.com/");
XCTAssertEqual(parsedFeed.articles.count, 25u);
RSParsedArticle *a = parsedFeed.articles.firstObject;
XCTAssertEqualObjects(a.title, @"People don't click links, that's why the 140-char limit will cripple Twitter");
XCTAssertEqualObjects(a.link, @"http://scripting.com/2015/09/08/peopleDontClickLinks.html");
XCTAssertEqualObjects(a.guid, @"http://scripting.com/2015/09/08/peopleDontClickLinks.html");
XCTAssertEqualObjects(a.datePublished, [NSDate dateWithTimeIntervalSince1970:1441723501]); // Tue Sep 8 16:45:01 2015
[self measureBlock:^{
[[xmlData getParser] parseSync:nil];
}];
} }
- (void)testManton { - (void)testManton {
RSXMLData *xmlData = [self xmlFile:@"manton" extension:@"rss"];
XCTAssertEqual(xmlData.parserClass, [RSRSSParser class]);
NSError *error = nil; NSError *error = nil;
RSXMLData *xmlData = [[self class] mantonData]; RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error); XCTAssertEqualObjects(parsedFeed.title, @"Manton Reece");
NSLog(@"parsedFeed: %@", parsedFeed); XCTAssertNil(parsedFeed.subtitle);
XCTAssertEqualObjects(parsedFeed.link, @"http://www.manton.org");
XCTAssertEqual(parsedFeed.articles.count, 10u);
RSParsedArticle *a = parsedFeed.articles.firstObject;
XCTAssertNil(a.title);
XCTAssertEqualObjects(a.link, @"http://www.manton.org/2015/09/3071.html");
XCTAssertEqualObjects(a.guid, @"http://www.manton.org/?p=3071");
XCTAssertEqualObjects(a.datePublished, [NSDate dateWithTimeIntervalSince1970:1443191200]); // Fri, 25 Sep 2015 14:26:40 +0000
[self measureBlock:^{
[[xmlData getParser] parseSync:nil];
}];
} }
- (void)testKatieFloyd { - (void)testKatieFloyd {
RSXMLData *xmlData = [self xmlFile:@"KatieFloyd" extension:@"rss"];
XCTAssertEqual(xmlData.parserClass, [RSRSSParser class]);
NSError *error = nil; NSError *error = nil;
RSXMLData *xmlData = [[self class] katieFloydData]; RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error);
XCTAssertEqualObjects(parsedFeed.title, @"Katie Floyd"); XCTAssertEqualObjects(parsedFeed.title, @"Katie Floyd");
XCTAssertNil(parsedFeed.subtitle);
XCTAssertEqualObjects(parsedFeed.link, @"http://www.katiefloyd.com");
XCTAssertEqual(parsedFeed.articles.count, 20u);
RSParsedArticle *a = parsedFeed.articles.firstObject;
XCTAssertEqualObjects(a.title, @"Special Mac Power Users for Relay FM Members");
XCTAssertEqualObjects(a.link, @"http://tracking.feedpress.it/link/980/4243452");
XCTAssertEqualObjects(a.guid, @"50c628b3e4b07b56461546c5:50c658a6e4b0cc9aa9ce4405:57bcbe83e4fcb567fdffc020");
XCTAssertEqualObjects(a.datePublished, [NSDate dateWithTimeIntervalSince1970:1472163600]); // Thu, 25 Aug 2016 22:20:00 +0000
[self measureBlock:^{
[[xmlData getParser] parseSync:nil];
}];
} }
- (void)testEMarley { - (void)testEMarley {
RSXMLData *xmlData = [self xmlFile:@"EMarley" extension:@"rss"];
XCTAssertEqual(xmlData.parserClass, [RSRSSParser class]);
NSError *error = nil; NSError *error = nil;
RSXMLData *xmlData = [[self class] eMarleyData]; RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error);
XCTAssertEqualObjects(parsedFeed.title, @"Stories by Liz Marley on Medium"); XCTAssertEqualObjects(parsedFeed.title, @"Stories by Liz Marley on Medium");
XCTAssertEqualObjects(parsedFeed.subtitle, @"Stories by Liz Marley on Medium");
XCTAssertEqualObjects(parsedFeed.link, @"https://medium.com/@emarley?source=rss-b4981c59ffa5------2");
XCTAssertEqual(parsedFeed.articles.count, 10u); XCTAssertEqual(parsedFeed.articles.count, 10u);
}
RSParsedArticle *a = parsedFeed.articles.firstObject;
- (void)testScriptingNewsPerformance { XCTAssertEqualObjects(a.title, @"UI Automation & screenshots");
XCTAssertEqualObjects(a.link, @"https://medium.com/@emarley/ui-automation-screenshots-c44a41af38d1?source=rss-b4981c59ffa5------2");
RSXMLData *xmlData = [[self class] scriptingNewsData]; XCTAssertEqualObjects(a.guid, @"https://medium.com/p/c44a41af38d1");
XCTAssertEqualObjects(a.datePublished, [NSDate dateWithTimeIntervalSince1970:1462665210]); // Sat, 07 May 2016 23:53:30 GMT
[self measureBlock:^{ [self measureBlock:^{
[[xmlData getParser] parseSync:nil];
}];
}
- (void)testDaringFireball {
RSXMLData *xmlData = [self xmlFile:@"DaringFireball" extension:@"atom"];
XCTAssertEqual(xmlData.parserClass, [RSAtomParser class]);
NSError *error = nil; NSError *error = nil;
RSParseFeedSync(xmlData, &error); RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
}]; XCTAssertEqualObjects(parsedFeed.title, @"Daring Fireball");
XCTAssertEqualObjects(parsedFeed.subtitle, @"By John Gruber");
XCTAssertEqualObjects(parsedFeed.link, @"http://daringfireball.net/");
XCTAssertEqual(parsedFeed.articles.count, 47u);
} RSParsedArticle *a = parsedFeed.articles.firstObject;
XCTAssertEqualObjects(a.title, @"Apple Product Event: Monday March 21");
XCTAssertEqualObjects(a.link, @"http://recode.net/2016/02/27/remark-your-calendars-apples-product-event-will-week-of-march-21/");
- (void)testMantonPerformance { XCTAssertEqualObjects(a.guid, @"tag:daringfireball.net,2016:/linked//6.32173");
XCTAssertEqual(a.datePublished, [NSDate dateWithTimeIntervalSince1970:1456610387]); // 2016-02-27T21:59:47Z
RSXMLData *xmlData = [[self class] mantonData];
[self measureBlock:^{ [self measureBlock:^{
NSError *error = nil; [[xmlData getParser] parseSync:nil];
RSParseFeedSync(xmlData, &error);
}];
}
- (void)testDaringFireballPerformance {
RSXMLData *xmlData = [[self class] daringFireballData];
[self measureBlock:^{
NSError *error = nil;
RSParseFeedSync(xmlData, &error);
}]; }];
} }
- (void)testCanParseFeedPerformance { #pragma mark - Variety Test & Other
RSXMLData *xmlData = [[self class] daringFireballData];
// 0.379
[self measureBlock:^{
for (NSInteger i = 0; i < 100; i++) {
RSCanParseFeed(xmlData);
}
}];
}
- (void)testDownloadedFeeds { - (void)testDownloadedFeeds {
NSError *error = nil; NSError *error = nil;
int i = 0; int i = 0;
while (true) { while (true) {
++i; ++i;
NSString *pth = [NSString stringWithFormat:@"feed_%d", i]; RSXMLData *xmlData = [self xmlFile:[NSString stringWithFormat:@"feed_%d", i] extension:@"rss"];
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:pth ofType:@"rss" inDirectory:@"Resources"]; if (!xmlData) break;
if (s == nil) { RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
break;
}
NSData *d = [[NSData alloc] initWithContentsOfFile:s];
RSXMLData *xmlData = [[RSXMLData alloc] initWithData:d urlString:pth];
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error);
printf("\n\nparsing: %s\n%s\n", pth.UTF8String, parsedFeed.description.UTF8String);
XCTAssertNil(error); XCTAssertNil(error);
XCTAssert(parsedFeed);
XCTAssert(parsedFeed.title);
XCTAssert(parsedFeed.link);
XCTAssert(parsedFeed.articles.count > 0);
//printf("\n\nparsing: %s\n%s\n", xmlData.urlString.UTF8String, parsedFeed.description.UTF8String);
} }
} }
- (void)testDownloadedFeedsPerformance {
[self measureBlock:^{
[self testDownloadedFeeds];
}];
}
- (void)testSingle { - (void)testSingle {
NSError *error = nil; NSError *error = nil;
NSString *filename = @"feed_1"; RSXMLData *xmlData = [self xmlFile:@"feed_1" extension:@"rss"];
NSString *s = [[NSBundle bundleForClass:[self class]] pathForResource:filename ofType:@"rss" inDirectory:@"Resources"]; RSParsedFeed *parsedFeed = [[xmlData getParser] parseSync:&error];
NSData *d = [[NSData alloc] initWithContentsOfFile:s]; printf("\n\nparsing: %s\n%s\n", xmlData.urlString.UTF8String, parsedFeed.description.UTF8String);
RSXMLData *xmlData = [[RSXMLData alloc] initWithData:d urlString:@"single-feed"];
RSParsedFeed *parsedFeed = RSParseFeedSync(xmlData, &error);
printf("\n\nparsing: %s\n%s\n", filename.UTF8String, parsedFeed.description.UTF8String);
XCTAssertNil(error); XCTAssertNil(error);
} }
- (void)testDetermineParserClassPerformance {
RSXMLData *xmlData = [self xmlFile:@"DaringFireball" extension:@"atom"];
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wundeclared-selector"
[self measureBlock:^{
for (NSInteger i = 0; i < 100; i++) {
[xmlData performSelector:@selector(determineParserClass)];
}
}];
#pragma clang diagnostic pop
}
@end @end


@@ -1,31 +0,0 @@
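import Cocoa

class SomeViewController: NSViewController {
    @IBOutlet weak var textField: NSTextField!
    // Timer used to delay fetching until the user pauses typing.
    private var fetchDataTimer: Timer?
    private var currentText: String? {
        didSet {
            invalidateTimer()
            // Only schedule a fetch once more than three characters are entered.
            if let text = currentText, text.count > 3 {
                restartTimer() // note: restartTimer() is not defined in this file
            }
        }
    }
    func textDidChange(notification: NSNotification) {
        currentText = textField.stringValue
    }
    // Stops a pending timer, if any, and releases it.
    func invalidateTimer() {
        if let timer = fetchDataTimer {
            if timer.isValid {
                timer.invalidate()
            }
            fetchDataTimer = nil
        }
    }
}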