AntennaPod is quite slow with huge feeds. The reason is that we have a bunch of workarounds for misbehaving feeds that also make it slower to work with feeds that do not misbehave.
Changes:
- Only start guessing duplicate episodes when no "proper" match is found
- Only parse non-html as HTML for attributes that really need it
- Do not log failed Long parsing when size is not specified
- Try to parse dates with RFC822 first before falling back to workarounds for other formats
I ran a benchmark with "Stuff you should know" (for which the workarounds are not needed) containing 2k episodes. Includes download of 8MB of feed XML (~5 seconds), debug build.
Before: 44 seconds, after: 13 seconds ==> 3.4 times faster feed refresh
* Move duplicate detection to one single place
* Canonicalize some common characters that are often confused
* Assume same episode even when date is off by 1 week
* Display duplicate detection as warning, not error