[Standards] <[CDATA[ in XMPP

textshell-E19442 at neutronstar.dyndns.org
Tue Jul 31 01:53:48 UTC 2007

On Mon, Jul 30, 2007 at 05:37:29PM -0700, Rachel Blackman wrote:
> On Jul 30, 2007, at 5:28 PM, Robin Redeker wrote:
> >On Mon, Jul 30, 2007 at 05:12:16PM -0700, Rachel Blackman wrote:
> >>Can I use <[CDATA[ in, say, roster additions or removals?  If I'm
> >>using it there, how do I need to process the text on the server-side
> >>for the JIDs?  If I send ' stpeter at jabber.org' as a CDATA element --
> >>allowing the space in there -- how do I handle escaping it on the
> >>server side?  Do I just store it as ' stpeter at jabber.org' in the
> >>roster?  Do I need to re-escape it before sending it back?  Do I need
> >>to determine that the JID requires escaping, and so send that roster
> >>item as a <[CDATA[ block?  Does it show up as the same JID or
> >>different than \20stpeter at jabber.org?  Etc.
> >
> >' stpeter at jabber.org' is not (yet) a valid JID.
> >And you can already send such a JID to the server:
> >
> >   <message to=" stpeter at jabber.org" ...
> >
> >I would expect the server to give me an error.
> >
> >On top of that is JID escaping in a completely different layer
> >than XML escaping.
> But we're discussing MAKING it a valid JID.  I'd argue that if we're  
> discussing these things, we should consider the implications of such  
> changes during the course of the discussions.  We are discussing  
> multiple methods of escaping data; if we really want more than one  
> way to do something, then we have an obligation as those laying down  
> the standards to determine how those different methods interact.

That's implicit in the layering. JID escaping is presentation only;
it doesn't change what is a valid JID character sequence.

"""The escaped JID is unescaped only for presentation to a human user
(typically by an XMPP client) or for gatewaying to a non-XMPP system (such
as an LDAP database or a messaging system that does not use XMPP)."""
[XEP-0106]
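
To make the layering concrete, here is a rough sketch in Python. It is not a complete XEP-0106 implementation: the backslash rules are left out, the helper names are made up for illustration, the local part "st peter" is a hypothetical example, and the JID is carried as element text only because that is how it has been written in this thread. The point is just that the escaped form is what travels through XML, and unescaping happens only at the presentation end.

```python
import xml.etree.ElementTree as ET

# Characters XEP-0106 maps to \XX sequences (XX = hex code point).
# Simplification: the rules for escaping the backslash itself are omitted.
ESCAPES = {
    " ": "\\20", '"': "\\22", "&": "\\26", "'": "\\27", "/": "\\2f",
    ":": "\\3a", "<": "\\3c", ">": "\\3e", "@": "\\40",
}

def escape_localpart(raw):
    """Hypothetical helper: escape the part of a JID before the '@'."""
    return "".join(ESCAPES.get(ch, ch) for ch in raw)

def unescape_localpart(escaped):
    """Hypothetical helper: undo escape_localpart for display to a human."""
    for ch, seq in ESCAPES.items():
        escaped = escaped.replace(seq, ch)
    return escaped

# JID layer: the client escapes before anything touches XML.
wire_jid = escape_localpart("st peter") + "@jabber.org"

# XML layer: serialise and parse; the JID string passes through unchanged.
item = ET.Element("item")
item.text = wire_jid
wire = ET.tostring(item, encoding="unicode")
received = ET.fromstring(wire).text

# Presentation layer: only here is the escaped form turned back.
local, domain = received.split("@", 1)
shown = unescape_localpart(local) + "@" + domain
```

The server only ever handles the escaped string in `received`; the raw space exists only in `shown`.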

> If I send '<item> stpeter at jabber.org</item>' to the server in a  
> roster add/remove request, it will almost certainly eat that  
> whitespace at the beginning.  Now we're talking about making that,  
> for instance, '<item>\20stpeter at jabber.org</item>' with JID escaping,  
> so that you could actually have that space there.  Okie, that's fine.
> But now let's say I do '<item><![CDATA[ stpeter at jabber.org]]></item>'  
> -- is that processed as ' stpeter at jabber.org' (with the raw space),  
> thus requiring a CDATA block any time you want to refer to that JID?   
> Or is the burden on the server to convert it to \20stpeter at jabber.org  
> for the sake of compatibility, or what?

Would you be so kind as to read the XML spec?
"""Within a CDATA section, only the CDEnd string is recognized as markup,
so that left angle brackets and ampersands may occur in their literal form;
they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA
sections cannot nest.""" 

CDATA is purely an XML-level construct and doesn't carry any semantic
meaning. And yes, a normal compliant XML parser doesn't even bother to
tell you how the data was encoded in the byte stream.
You are seriously confusing layers here.

<item><![CDATA[ stpeter at jabber.org]]></item>
has exactly the same semantics as
<item> stpeter at jabber.org</item>
because the CDATA doesn't contain any chars that need escaping at the
xml level.
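
If you want to check this yourself, a quick sketch with Python's standard xml.etree.ElementTree parser (any compliant parser should behave the same way; `text_of` is just a helper name I made up) shows that the application sees identical character data in both cases:

```python
import xml.etree.ElementTree as ET

def text_of(xml):
    """Parse one element and return its character data."""
    return ET.fromstring(xml).text

# The two forms from this thread: plain text vs. a CDATA section.
plain = text_of("<item> stpeter at jabber.org</item>")
cdata = text_of("<item><![CDATA[ stpeter at jabber.org]]></item>")
assert plain == cdata == " stpeter at jabber.org"

# CDATA only matters for how '<' and '&' are written on the wire;
# after parsing, the result is again the same string.
escaped = text_of("<item>a &lt; b &amp; c</item>")
literal = text_of("<item><![CDATA[a < b & c]]></item>")
assert escaped == literal == "a < b & c"
```

Note that the parser keeps the leading space in both cases. Whether the server then trims it is an application decision, not something CDATA changes.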

[ ...]
> This is what I mean by added complexity from CDATA. :)

Yes, reading the spec and understanding XML is a heavy burden...

 - Martin H.
