[Standards] <![CDATA[ in XMPP

Rachel Blackman rcb at ceruleanstudios.com
Tue Jul 31 02:33:41 UTC 2007

> CDATA is purely XML level and doesn't carry any semantic meaning.
> And yes, the normal compliant XML parser doesn't even bother to tell
> you how the data was encoded in the byte stream.
> You are seriously confusing layers here.

Fair enough that I shouldn't have used spaces as the example; you're  
right that it's invalid, and I simply grabbed it as the sample due to  
the JID escaping thing.

JID escaping has, however, been put forward as a method of escaping  
characters to get them across the wire.  I think I am failing to get  
my point across clearly, so I will try one last time.  What I've been  
trying to address is, for instance:

> So, we are talking about way to escape characters in an XML stream. My
> view is that all way to escape characters are good, especially when  
> they
> are defined in XML, cited in XMPP RFC and are simple to implement (and
> implemented by all parsers I know).

Read that carefully. "All way to escape characters are good."

If we are viewing CDATA as 'one more way to escape characters,' then  
we need to think about the implications.  Because I will /guarantee/  
you that if we recommend CDATA as an escaping method, then someone  
will do a <![CDATA[john&mary@family.org]]> in an <item/> value, or  

My point is that we need to /define/ things like this, rather than  
leaving them vague.  Or else someone WILL go, 'Oh, well, when I send  
down john&mary@family.org it disconnects me with a stream error  
saying there's an unescaped character there.  I'll just make sure  
anything with unescaped characters goes into a CDATA block.'  And if  
they do that, it will be valid XML across the wire, too!  It should  
not pop them off with a stream error, right?
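As a rough illustration of that scenario, here is a minimal Python
sketch using the standard library's xml.etree.ElementTree. The <jid>
wrapper element and the parsed_text helper are assumptions made up for
this example, not part of any XMPP schema discussed in this thread. It
only shows that the entity form and the CDATA form are both well-formed
XML and reach the application as the same string, raw '&' included.

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_fragment: str) -> str:
    """Return the character data the parser hands to the application."""
    return ET.fromstring(xml_fragment).text


# Hypothetical <jid> element; the two spellings differ only on the wire.
with_entity = "<jid>john&amp;mary@family.org</jid>"
with_cdata = "<jid><![CDATA[john&mary@family.org]]></jid>"

# Both are well-formed, and the parser does not report which form was used.
assert parsed_text(with_entity) == parsed_text(with_cdata)
assert parsed_text(with_cdata) == "john&mary@family.org"
```

So the XML layer raises no error in either case; whether the resulting
node is acceptable has to be decided by rules at the JID level.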

If we proclaim that all JIDs must adhere to the current rules, and  
that the characters we've discussed as visually useful but invalid to  
send across the wire as part of a node (namely, & and ' and so on)  
must be escaped using JID escaping, that's *fine*.  Deciding to  
explicitly say that JIDs cannot contain those characters except as  
represented in JID escaping is a *valid and viable solution* to my  
concern.
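For comparison, a minimal sketch of what JID escaping does to a node,
assuming the backslash-plus-hex mapping from XEP-0106. The escape_node
name is hypothetical, and the sketch is deliberately simplified: it
skips the special handling of the backslash itself and does no
unescaping.

```python
# Simplified mapping of disallowed node characters to their escaped
# forms (assumed from XEP-0106; the backslash rule is omitted here).
ESCAPES = {
    " ": "\\20",
    '"': "\\22",
    "&": "\\26",
    "'": "\\27",
    "/": "\\2f",
    ":": "\\3a",
    "<": "\\3c",
    ">": "\\3e",
    "@": "\\40",
}


def escape_node(node: str) -> str:
    """Escape the node part of a JID (the part before the '@')."""
    return "".join(ESCAPES.get(ch, ch) for ch in node)


# The node from the example above, escaped before the JID is assembled.
jid = escape_node("john&mary") + "@family.org"
assert jid == "john\\26mary@family.org"
```

The escaped JID contains no character that needs XML-level escaping at
all, so the CDATA question never arises for it.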

If, however, we want to just leave it vague and make CDATA 'one more  
way to escape characters,' then people will most likely make  
assumptions about how things interact.  Based on past experience, I  
suspect at least some of those assumptions will be wrong.  My point  
is that if we want to include CDATA, then we need to make it clear  
where CDATA is /not/ an appropriate solution for escaping.

I hope that makes my concern clearer, but I will leave it alone at  
this point.  I have realized I am arguing this point utterly alone,  
which may mean that I am failing to communicate my concern clearly,  
or that I am seeing a problem where one does not exist.  I will hope  
that it is the latter, and that my concerns are just motivated by my  
generally squidgy feelings about hazily-defined edges to standards,  
rather than by an actual problem. :)

Rachel Blackman <rcb at ceruleanstudios.com>
Trillian Messenger - http://www.trillianastra.com/
