[Standards] <[CDATA[ in XMPP
rcb at ceruleanstudios.com
Tue Jul 31 02:33:41 UTC 2007
> CDATA is purely XML level and doesn't carry any semantic meaning.
> And yes, the normal compliant XML parser doesn't even bother to tell
> you how the data was encoded in the byte stream.
> You are seriously confusing layers here.
Fair enough that I shouldn't have used spaces as the example; you're
right that it's invalid, and I simply grabbed it as the sample due to
the JID escaping thing.
JID escaping has, however, been put forward as a method of escaping
characters to get them across the wire. I think I am failing to get
my point across clearly, so I will try one last time. What I've been
trying to address is, for instance:
> So, we are talking about way to escape characters in an XML stream. My
> view is that all way to escape characters are good, especially when
> are defined in XML, cited in XMPP RFC and are simple to implement (and
> implemented by all parsers I know).
Read that carefully. "All way to escape characters are good."
If we are viewing CDATA as 'one more way to escape characters,' then
we need to think about the implications. Because I will /guarantee/
you that if we recommend CDATA as an escaping method, then someone
will do a <![CDATA[john&mary@family.org]]> in an <item/> value, or
My point is that we need to /define/ things like this, rather than
leaving them vague. Or else someone WILL go, 'Oh, well, when I send
down john&mary@family.org it disconnects me with a stream error
saying there's an unescaped character there. I'll just make sure
anything with unescaped characters goes into a CDATA block.' And if
they do that, it will be valid XML across the wire, too! It should
not pop them off with a stream error, right?
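To make that scenario concrete, here is a minimal sketch in Python using
the standard library's xml.etree.ElementTree. The <body/> element and the
sample address are only illustrative choices, not anything taken from the
XMPP specs. It suggests why a receiver would typically be unable to tell
the two spellings apart once the XML has been parsed:

```python
import xml.etree.ElementTree as ET

# Same character data, written two ways on the wire.
# <body/> is just a stand-in element for this illustration.
escaped = ET.fromstring("<body>john&amp;mary@family.org</body>")
cdata = ET.fromstring("<body><![CDATA[john&mary@family.org]]></body>")

# After parsing, the application sees identical strings; the parser
# does not report which escaping mechanism the sender used.
assert escaped.text == cdata.text
print(escaped.text)
```

If a parser behaves like this, any rule about which characters a JID may
contain has to be enforced on the parsed string, because the sender's
choice of escaping is already gone by then.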
If we proclaim that all JIDs must adhere to the current rules and the
characters we've discussed as visually useful but invalid to send
across the wire as part of a node (namely, & and ' and so on) must be
escaped using JID escaping, that's *fine*. Deciding to explicitly
say that JIDs cannot contain those characters except as represented
in JID escaping is a *valid and viable solution* to my concern.
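For illustration, the JID escaping route could look roughly like the
sketch below. The mapping table is my reading of the JID Escaping XEP
(XEP-0106) and should be checked against the spec; escape_node is a
hypothetical helper name, and the backslash handling is simplified
(lowercase hex codes only).

```python
# Assumed mapping: each disallowed character becomes a backslash
# followed by its two-digit hex code point.
ESCAPES = {
    " ": "\\20", '"': "\\22", "&": "\\26", "'": "\\27", "/": "\\2f",
    ":": "\\3a", "<": "\\3c", ">": "\\3e", "@": "\\40",
}
# Two-character codes that would make a literal backslash ambiguous.
CODES = {"20", "22", "26", "27", "2f", "3a", "3c", "3e", "40", "5c"}


def escape_node(node: str) -> str:
    """Escape the node part of a JID (hypothetical helper, simplified)."""
    out = []
    for i, ch in enumerate(node):
        if ch == "\\" and node[i + 1:i + 3] in CODES:
            # A backslash that already looks like an escape sequence
            # is itself escaped so the result stays unambiguous.
            out.append("\\5c")
        else:
            out.append(ESCAPES.get(ch, ch))
    return "".join(out)


print(escape_node("john&mary"))
```

With a rule like this, the node john&mary would go across the wire as
john\26mary regardless of how the XML layer chose to encode it.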
If, however, we want to just leave it vague and make CDATA 'one more
way to escape characters,' then people will most likely make
assumptions about how things interact. Based on past experience, I
suspect at least some of those assumptions will be wrong. My point
is that if we want to include CDATA, then we need to make it clear
where CDATA is /not/ an appropriate solution for escaping.
I hope that makes my concern clearer, but I will leave it alone at
this point. I have realized I am arguing this point utterly alone,
which may mean either that I am failing to communicate my concern
clearly or that I am seeing a problem where one does not exist. I
will hope that it is the latter, and that my concerns are motivated
only by my generally squidgy feelings about hazily-defined edges to
standards rather than by an actual problem. :)
Rachel Blackman <rcb at ceruleanstudios.com>
Trillian Messenger - http://www.trillianastra.com/