The XML spec says that you must reject a file that isn't well-formed XML. That's good for XML because it encourages people to produce well-formed documents. But our goal is the data, not the spec. We're also likely to be collecting RDF embedded in web pages, among other things, where the overall file may not be XML at all. So there's a need here for an "Ultra-Liberal RDF parser" like the one Mark Pilgrim has produced for RSS (see http://diveintomark.org/projects/rss_parser/). Most current RDF parsers are built on top of XML parsers, which reject bad XML, so you never get to see the data.
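
As a rough illustration of the liberal approach, here is a minimal Python sketch that pulls RDF "islands" out of a page that may not be XML at all, and only then hands each island to a strict parser. It is not Mark Pilgrim's code and not a real RDF parser. The function name `extract_rdf_blocks` is made up for this example, and it assumes the block uses the conventional `rdf:` prefix and declares its namespaces on the `rdf:RDF` element itself.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the embedded block uses the conventional "rdf" prefix.
# A more liberal scanner would have to cope with other prefixes too.
RDF_BLOCK = re.compile(r"<rdf:RDF\b.*?</rdf:RDF\s*>", re.DOTALL)


def extract_rdf_blocks(text):
    """Find <rdf:RDF>...</rdf:RDF> islands in arbitrary text.

    Returns (parsed, rejected): parsed is a list of ElementTree elements,
    rejected is a list of (raw_block, error_message) pairs for islands
    that were found but are not well-formed on their own.
    """
    parsed, rejected = [], []
    for match in RDF_BLOCK.finditer(text):
        block = match.group(0)
        try:
            parsed.append(ET.fromstring(block))
        except ET.ParseError as err:
            # Keep the raw text: a more forgiving parser could still try
            # to recover data from it, and the author can be told.
            rejected.append((block, str(err)))
    return parsed, rejected


if __name__ == "__main__":
    page = (
        "<html><body><p>Not XML: unclosed tags<br>"
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        'xmlns:foaf="http://xmlns.com/foaf/0.1/">'
        "<foaf:Person><foaf:name>Alice</foaf:name></foaf:Person>"
        "</rdf:RDF></body>"
    )
    good, bad = extract_rdf_blocks(page)
    print(len(good), "parsed,", len(bad), "rejected")
```

The surrounding page never goes near the XML parser, so broken HTML around the RDF no longer hides the data. Islands that are themselves broken still end up in `rejected`; recovering data from those is the harder part that a truly ultra-liberal parser would need to take on.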

If you find bad XML or bad RDF, tell the author. Don't just ignore it or dump it. They will probably be very thankful. And if you're generating RDF, check it with the W3C validator: http://www.w3.org/RDF/Validator/
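
To make "tell the author" practical, a spider can record where the XML broke so the report is specific. The following Python sketch only checks well-formedness with the standard library, so it is no substitute for the W3C validator. The function name `describe_xml_problem` and the `source` label are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


def describe_xml_problem(text, source="(unknown source)"):
    """Return a one-line report if text is not well-formed XML, else None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # position reported by the parser
        reason = expat.ErrorString(err.code)
        return f"{source}: {reason} at line {line}, column {column}"
    return None


if __name__ == "__main__":
    # Hypothetical file name, used only to label the report.
    report = describe_xml_problem("<a><b></a>", source="example.rdf")
    if report:
        print(report)
```

A scutter could collect these reports per URL and mail them to the address the document gives for its author, if it gives one, rather than silently dropping the file.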


[ 03-Aug-03 1:04pm ]