There are two ideological standpoints on parsing of broken XML:
- The XML spec says parsing broken XML is forbidden. Tolerant parsing removes any pressure on feed generators to fix their output, so they will stay broken forever.
- Some feeds will always be broken. Perfecting every generator isn't possible, and only the user experience counts. Therefore tolerant XML parsing is mandatory for a good aggregator.
With the rise of Atom 1.0 and the continuous improvement of the major feed generators, it is becoming more and more realistic to follow the first standpoint: refuse broken feeds and force the feed generators to fix the problem. The main reason to stop using libxml2's recovery mode is, of course, the prospect of its future removal. This leads to the following plan:
- Stop using recovery mode for Atom 1.0 and OPML immediately (released with v1.1.1).
- Continue to use recovery mode for RSS for now.
- Later, split the RSS parser into a 0.9x parser and a 1.0/2.0 parser, where only the 0.9x parser uses recovery mode.
- When libxml2 removes recovery mode, either drop RSS 0.9x support or write a new parser.
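
The first three steps of this plan amount to a small per-format policy, which could be sketched roughly as follows. This is only an illustration, not the aggregator's actual code: the names `FeedFormat`, `STAGE_RSS_UNSPLIT`, `STAGE_RSS_SPLIT` and `use_recovery_mode` are hypothetical, and the sketch assumes that a `True` result would translate into enabling libxml2's recovery mode (the `XML_PARSE_RECOVER` parser option) for that parse.

```python
from enum import Enum


class FeedFormat(Enum):
    """Formats named in the plan (hypothetical identifiers)."""
    ATOM_10 = "atom-1.0"
    OPML = "opml"
    RSS_09X = "rss-0.9x"
    RSS_10 = "rss-1.0"
    RSS_20 = "rss-2.0"


# Stage 1: strict parsing for Atom 1.0 and OPML, recovery mode for all RSS.
STAGE_RSS_UNSPLIT = 1
# Stage 2: after the RSS parser split, only RSS 0.9x keeps recovery mode.
STAGE_RSS_SPLIT = 2


def use_recovery_mode(fmt: FeedFormat, stage: int = STAGE_RSS_UNSPLIT) -> bool:
    """Return True if the parser for `fmt` should tolerate broken XML.

    Assumption: the caller maps True to libxml2's XML_PARSE_RECOVER option
    and False to a strict parse that rejects feeds that are not well-formed.
    """
    # Atom 1.0 and OPML are parsed strictly in every stage.
    if fmt in (FeedFormat.ATOM_10, FeedFormat.OPML):
        return False
    # Before the split, every RSS flavour still uses recovery mode.
    if stage == STAGE_RSS_UNSPLIT:
        return True
    # After the split, only the 0.9x parser remains tolerant.
    return fmt is FeedFormat.RSS_09X
```

The final step (libxml2 dropping recovery mode) is deliberately not modelled here, since at that point the choice is between removing RSS 0.9x support and writing a new parser rather than flipping a flag.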