[Standards] <[CDATA[ in XMPP

Rachel Blackman rcb at ceruleanstudios.com
Tue Jul 31 00:12:16 UTC 2007

On Jul 30, 2007, at 4:43 PM, Tobias S. Josefowitz wrote:

> On 7/31/07, Rachel Blackman <rcb at ceruleanstudios.com> wrote:
>> Not that I disagree that XMPP should be defined as a rational subset
>> of XML, rather than including the whole spec, but... this seems to be
>> needlessly splitting hairs, to me.
>> Correct me if I'm wrong, but the definition of XMPP is that you
>> /restart the stream/ when you get an opening <stream> element (such as
>> after starttls or whatever).  Given that the stream starts over with
>> the new <stream>, the complete XML stream is indeed still a complete
>> and valid document.
> If seen that way, XMPP should probably define that universal peace
> ensues upon <starttls/>.


It seems fairly simple to me to say that when you do a stream feature  
negotiation, the state is altered, the previous stream/document is  
now discarded, and you begin a new stream with the newly altered  
state.  You have, in effect, negotiated features required for a new  
stream, and then you begin that stream.

If we accept -- as I believe the XMPP spec claims -- that the stream
begins with the <stream:stream> element and ends with
</stream:stream>, then I fail to see why everything between those two
would not validate as proper XML.
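
The restart described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the StreamSession class and its method names are hypothetical, not taken from any XMPP library, and the TLS handshake itself is skipped entirely. The point is just that throwing away the parser on restart means each parser only ever sees one document with one root element.

```python
import xml.parsers.expat


class StreamSession:
    """Hypothetical session that parses one XML document per stream."""

    def __init__(self):
        self.roots = []  # root element name of every document seen so far
        self.restart()

    def restart(self):
        # Discard the previous document and all parser state, then
        # begin a fresh document for the newly negotiated stream.
        self._depth = 0
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._on_start
        self._parser.EndElementHandler = self._on_end

    def _on_start(self, name, attrs):
        if self._depth == 0:
            self.roots.append(name)
        self._depth += 1

    def _on_end(self, name):
        self._depth -= 1

    def feed(self, text):
        # Incremental parse; the document is not marked as finished.
        self._parser.Parse(text, False)


session = StreamSession()
session.feed("<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>")
session.feed("<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>")
session.restart()  # features negotiated: the old document is abandoned
session.feed("<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>")
session.feed("<message><body>hello</body></message>")
session.feed("</stream:stream>")
```

After this runs, session.roots holds two entries, one per document, and the second document has been opened and closed cleanly without the first ever needing a closing tag.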

If you're arguing that it's the encryption which makes it invalid...  
most people would not claim that XML retrieved over HTTPS is invalid  
because the raw encrypted SSL data cannot be parsed; why should  
starttls be considered invalid?

Regardless, the point of this is whether or not XMPP should include
support for <![CDATA[ escaping, and we've digressed into whether or
not XMPP is XML.

My take is that XML and XMPP are related but not identical; XMPP is a  
structured text language which parses and validates as XML, but  
there's absolutely no reason that XMPP should allow
whatever-the-hell-you-want that can fit into XML.  One is a subset of
the other, but
this does not imply an equivalence; just because all Apple  
Macintoshes are computers does not mean all computers are Apple  
Macintoshes.  Similarly, just because all XMPP is valid XML (where  
XMPP in this case is defined as the document comprising everything  
between the /final/ stream opening and the stream closing) does not  
mean all XML is valid XMPP.
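
To make the subset relationship concrete, here is a minimal sketch of a check layered on top of ordinary XML parsing. The function names and the FORBIDDEN set are assumptions made up for this example, not a statement of what the XMPP spec requires; whether "cdata" belongs in that set is exactly the question under discussion.

```python
import xml.parsers.expat


def constructs_used(xml_text):
    """List the XML constructs in xml_text that a stricter profile might forbid."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CommentHandler = lambda data: found.append("comment")
    parser.ProcessingInstructionHandler = (
        lambda target, data: found.append("processing-instruction")
    )
    parser.StartCdataSectionHandler = lambda: found.append("cdata")
    parser.Parse(xml_text, True)  # raises ExpatError if not well-formed
    return found


# Assumption for illustration only: the constructs this profile rejects.
FORBIDDEN = {"comment", "processing-instruction"}


def acceptable(xml_text):
    """True if the text is well-formed XML and uses no forbidden construct."""
    return not FORBIDDEN.intersection(constructs_used(xml_text))


plain = "<message><body>hi</body></message>"
commented = "<message><!-- note --><body>hi</body></message>"
with_cdata = "<message><body><![CDATA[hi]]></body></message>"

plain_ok = acceptable(plain)
commented_ok = acceptable(commented)
cdata_constructs = constructs_used(with_cdata)
```

All three inputs parse as well-formed XML, but only some pass the stricter check: being valid XML is necessary here, not sufficient.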

I cannot -- and should not -- go around defining new base stanzas  
willy-nilly just because they happen to be valid XML.  Similarly, I  
happen to think adding <![CDATA[ complicates matters somewhat; while
it is undeniably useful in message bodies and a few other places, I  
think it can lead to complications when you try to figure out how to  
deal with it in other areas.

Can I use <![CDATA[ in, say, roster additions or removals?  If I'm
using it there, how do I need to process the text on the server side
for the JIDs?  If I send ' stpeter@jabber.org' as a CDATA element --
allowing the space in there -- how do I handle escaping it on the
server side?  Do I just store it as ' stpeter@jabber.org' in the
roster?  Do I need to re-escape it before sending it back?  Do I need
to determine that the JID requires escaping, and so send that roster
item as a <![CDATA[ block?  Does it show up as the same JID as
\20stpeter@jabber.org, or as a different one?  Etc.
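
A small sketch of why these questions land on the application rather than the parser. It uses Python's standard ElementTree; the <jid> and <body> elements are hypothetical stand-ins chosen for the example, not real roster syntax. After parsing, a CDATA section and ordinary escaped text are indistinguishable, the CDATA wrapper is not preserved on output, and the \20 form is simply a different string unless something above the parser decides otherwise.

```python
import xml.etree.ElementTree as ET

# Hypothetical <jid> element, used only to carry the text for comparison.
plain = ET.fromstring("<jid> stpeter@jabber.org</jid>")
cdata = ET.fromstring("<jid><![CDATA[ stpeter@jabber.org]]></jid>")
escaped = ET.fromstring(r"<jid>\20stpeter@jabber.org</jid>")

# The parser hands back identical character data for the first two.
cdata_matches_plain = cdata.text == plain.text

# The \20 form is a different string at the XML level; treating it as
# the same JID would be an application-layer decision.
escaped_matches_plain = escaped.text == plain.text

# Round trip: the CDATA section is gone and the serializer re-escapes.
body = ET.fromstring("<body><![CDATA[a < b & c]]></body>")
round_trip = ET.tostring(body, encoding="unicode")
```

So a server that parses with a stock XML library never learns that CDATA was used at all; any rule about storing, comparing, or re-sending such a JID has to be defined above the XML layer.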

My issues with CDATA in XMPP are not because I think it makes the  
actual XML parsing more difficult, but because it really messes with  
state and context in some areas.  (Such as when used around JIDs.)  If we
want to include CDATA as a valid method of escaping in XMPP, then we
/need/ to nail down how it interacts with some pretty core parts of XMPP.

Rachel Blackman <rcb at ceruleanstudios.com>
Trillian Messenger - http://www.trillianastra.com/
