[Standards] Handling for characters that have entities, but XML does not require them to be escaped

Robin Redeker elmex at x-paste.de
Sun Jul 22 19:19:05 UTC 2007


On Sun, Jul 22, 2007 at 04:30:13PM +0200, Matthias Wimmer wrote:
> Hi!
> 
> There are several characters, that have predefined entities in XML, but
> that do not need to be escaped in XML.
> Examples for such characters are > ' and " in text nodes.
> 
[...]
> So I have two questions regarding this:
> 
> Why at all do these characters have to be escaped?

I guess it is because many people implemented their own broken XML
parsers in the past, and many of those couldn't handle real XML, so
escaping these characters was made mandatory for backward
compatibility. (Just a guess.)

> Is it really necessary that RFC 3920bis mandates a server to reject such
> XMPP streams? I very much dislike this requirement, as it would require
> me to implement my own XML parser, as I don't know any parser I could
> use that could be configured to notify me that these characters have
> been received unescaped.

If you use expat, you could get the original string of a text node
and look for a '>' in that string. But this is an ugly hack that I also
consider unnecessary.
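
For illustration only, here is a rough sketch of that hack. It uses
Python's xml.parsers.expat bindings instead of the C API, and it assumes
the whole input is available as UTF-8 bytes and is passed to the parser
in one call. The function name find_unescaped_gt and the decision to
skip CDATA sections are my own assumptions, not anything the RFC or
expat prescribes. The idea: when the character data handler fires,
CurrentByteIndex points at the start of the raw input that produced the
event, so a '>' that came from "&gt;" or "&#62;" starts with '&' in the
raw bytes, while a literal one does not.

```python
import xml.parsers.expat


def find_unescaped_gt(raw):
    """Return byte offsets of literal '>' characters inside text nodes.

    Assumes `raw` is a complete UTF-8 encoded document (bytes).
    """
    parser = xml.parsers.expat.ParserCreate()
    offsets = []
    state = {"in_cdata": False}

    def on_cdata_start():
        state["in_cdata"] = True

    def on_cdata_end():
        state["in_cdata"] = False

    def on_chars(data):
        # Assumption: a '>' inside a CDATA section is not of interest here.
        if state["in_cdata"] or ">" not in data:
            return
        start = parser.CurrentByteIndex
        # Entity and character references begin with '&' in the raw input,
        # so this '>' was escaped by the sender.
        if raw[start:start + 1] == b"&":
            return
        # Otherwise the event is a run of plain text: scan its raw bytes.
        chunk = raw[start:start + len(data.encode("utf-8"))]
        for i, byte in enumerate(chunk):
            if byte == 0x3E:  # ASCII code of '>'
                offsets.append(start + i)

    parser.StartCdataSectionHandler = on_cdata_start
    parser.EndCdataSectionHandler = on_cdata_end
    parser.CharacterDataHandler = on_chars
    parser.Parse(raw, True)
    return offsets
```

A server could call this on a stanza and reject the stream whenever the
returned list is not empty, which shows how much extra work the
requirement costs for no real benefit.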

The RFC should be fixed, and software that can't parse an unescaped > in
text nodes should be fixed too (no one is forced in today's world to
write his own XML parser; libxml2 (afaik) and expat (for sure) can be
convinced to handle partially transferred XML documents these days).



Robin


