[Standards] <![CDATA[ in XMPP
rcb at ceruleanstudios.com
Tue Jul 31 00:12:16 UTC 2007
On Jul 30, 2007, at 4:43 PM, Tobias S. Josefowitz wrote:
> On 7/31/07, Rachel Blackman <rcb at ceruleanstudios.com> wrote:
>> Not that I disagree that XMPP should be defined as a rational subset
>> of XML, rather than including the whole spec, but... this seems to be
>> needlessly splitting hairs, to me.
>> Correct me if I'm wrong, but the definition of XMPP is that you
>> /restart the stream/ when you get an opening <stream> element (such as
>> after starttls or whatever). Given that the stream starts over with
>> the new <stream>, the complete XML stream is indeed still a complete
>> and valid document.
> If seen that way, XMPP should probably define that universal peace
> ensues upon <starttls/>.
It seems fairly simple to me to say that when you do a stream feature
negotiation, the state is altered, the previous stream/document is
now discarded, and you begin a new stream with the newly altered
state. You have, in effect, negotiated features required for a new
stream, and then you begin that stream.
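
As a rough illustration of the "discard the old document, begin a new one" view, here is a minimal Python sketch. The StreamSession class and its restart() method are hypothetical names made up for this example, not anything taken from the XMPP spec or from a real library; the only real API used is the standard library's xml.etree.ElementTree.XMLPullParser.

```python
import xml.etree.ElementTree as ET

STREAM_NS = "http://etherx.jabber.org/streams"
OPEN = f"<stream:stream xmlns:stream='{STREAM_NS}' xmlns='jabber:client'>"


class StreamSession:
    """Hypothetical helper: one XML parser per stream, replaced on restart."""

    def __init__(self):
        self.restart()

    def restart(self):
        # Throw away the previous document's parser state entirely;
        # the next <stream:stream> becomes the root of a fresh document.
        self.parser = ET.XMLPullParser(events=("start",))

    def feed(self, data):
        # Return the tags of the elements opened by this chunk of data.
        self.parser.feed(data)
        return [elem.tag for _, elem in self.parser.read_events()]


session = StreamSession()
first = session.feed(OPEN + "<stream:features/>")

# Assumption: feature negotiation (e.g. starttls) has just completed here.
session.restart()
second = session.feed(OPEN)
```

Under this model each parser only ever sees a single root element, so whatever arrives between one stream opening and its matching close is handled as an ordinary XML document.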
If we accept -- as I believe the XMPP spec claims -- that the stream
begins with the <stream:stream> element and ends with
</stream:stream>, then I fail to see why everything between those two
would not validate as proper XML.
If you're arguing that it's the encryption which makes it invalid...
most people would not claim that XML retrieved over HTTPS is invalid
because the raw encrypted SSL data cannot be parsed; why should
starttls be considered invalid?
Regardless, the point of this is whether or not XMPP should include
support for <![CDATA[ escaping, and we've digressed into whether or
not XMPP is XML.
My take is that XML and XMPP are related but not identical; XMPP is a
structured text language which parses and validates as XML, but
there's absolutely no reason that XMPP should allow
whatever-the-hell-you-want that can fit into XML. One is a subset of
the other, but
this does not imply an equivalence; just because all Apple
Macintoshes are computers does not mean all computers are Apple
Macintoshes. Similarly, just because all XMPP is valid XML (where
XMPP in this case is defined as the document comprising everything
between the /final/ stream opening and the stream closing) does not
mean all XML is valid XMPP.
I cannot -- and should not -- go around defining new base stanzas
willy-nilly just because they happen to be valid XML. Similarly, I
happen to think adding <![CDATA[ complicates matters somewhat; while
it is undeniably useful in message bodies and a few other places, I
think it can lead to complications when you try to figure out how to
deal with it in other areas.
Can I use <![CDATA[ in, say, roster additions or removals? If I'm
using it there, how do I need to process the text on the server side
for the JIDs? If I send ' stpeter at jabber.org' as a CDATA element --
allowing the space in there -- how do I handle escaping it on the
server side? Do I just store it as ' stpeter at jabber.org' in the
roster? Do I need to re-escape it before sending it back? Do I need
to determine that the JID requires escaping, and so send that roster
item as a <![CDATA[ block? Does it show up as the same JID as
\20stpeter at jabber.org, or as a different one? Etc.
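
To make the ambiguity concrete, here is a small Python sketch using the standard library's xml.etree.ElementTree. The <body> and <jid> elements and the address user@example.org are placeholders chosen for illustration (in a real roster item the JID is carried in an attribute, where CDATA sections cannot appear at all); the point is only what an XML parser hands to the application.

```python
import xml.etree.ElementTree as ET

# The same character data, written with an entity and with a CDATA section.
entity_form = ET.fromstring("<body>a &lt; b</body>")
cdata_form = ET.fromstring("<body><![CDATA[a < b]]></body>")

# After parsing, the application cannot tell which form was on the wire.
same_text = entity_form.text == cdata_form.text

# Re-serializing uses whatever escaping the serializer prefers,
# so the CDATA wrapper is not preserved.
reserialized = ET.tostring(cdata_form, encoding="unicode")

# A leading space inside CDATA reaches the application untouched.
padded = ET.fromstring("<jid><![CDATA[ user@example.org]]></jid>").text
```

Since the parsed text carries no record of how it was escaped, any rule about storing, comparing, or re-sending such a value has to be defined at the XMPP layer rather than left to the XML parser.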
My issues with CDATA in XMPP are not because I think it makes the
actual XML parsing more difficult, but because it really messes with
state and context in some areas, such as when it is used around JIDs.
If we want to include CDATA as a valid method of escaping in XMPP,
then we /need/ to nail down how it interacts with some pretty core
parts of XMPP.
Rachel Blackman <rcb at ceruleanstudios.com>
Trillian Messenger - http://www.trillianastra.com/