[Standards] <![CDATA[ in XMPP

Mickaël Rémond mickael.remond at process-one.net
Mon Jul 30 23:36:43 UTC 2007


Hello,

----- Original Message -----
> Tobias S. Josefowitz wrote:
>> On 7/31/07, Peter Saint-Andre <stpeter at jabber.org> wrote:
>>
>>> Similarly, a complete XMPP session (with start and end stream tags) is
>>> a conforming XML document. Just because the session does not include
>>> comments, processing instructions, DTD subsets, entity references other
>>> than those predefined in the XML spec, or other restricted features
>>> does not mean that the session is not an XML document.
>>
>> Unless for example starttls comes into play... exceptions everywhere.
>> Is that the existence of a rule I am sensing here?
>
> How does the inclusion of STARTTLS negotiation cause the complete XML
> stream to not be an XML document?

If we go further, why do servers and clients accept the prologue
<?xml version="1.0"?> at all? If we are not talking about XML (or a close
relative, since calling it XML seems to make people nervous ;), it should
be forbidden (yes, I am kidding :)
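To make the point about the prologue concrete, here is a small sketch using only Python's standard library (not an XMPP implementation), assuming a simple session with no STARTTLS or SASL stream restart: a complete stream, prologue included, is accepted as one XML document by a stock parser. The session contents and the `parse_session` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical, minimal client-to-server session: the prologue, the
# opening <stream:stream> tag, one message stanza, and the closing tag.
SESSION = (
    '<?xml version="1.0"?>'
    '<stream:stream xmlns="jabber:client"'
    ' xmlns:stream="http://etherx.jabber.org/streams"'
    ' to="example.com" version="1.0">'
    '<message to="juliet@example.com"><body>Hi</body></message>'
    '</stream:stream>'
)

CLIENT_NS = "jabber:client"


def parse_session(text: str) -> ET.Element:
    """Parse a complete session as a single XML document.

    Raises ET.ParseError if the text is not a well-formed document."""
    return ET.fromstring(text)


root = parse_session(SESSION)
print(root.tag)  # {http://etherx.jabber.org/streams}stream

# <message> inherits the default namespace declared on the stream element.
body = root.find(f"{{{CLIENT_NS}}}message/{{{CLIENT_NS}}}body")
print(body.text)  # Hi
```

An unfinished session (no closing `</stream:stream>` yet) makes `ET.fromstring` raise `ET.ParseError`, which is why a live connection is normally fed to an incremental parser rather than parsed in one go.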

If we go back to the heart of the discussion, what is bad about CDATA? We
are talking about character escaping, and I do not see what is complicated
about it. It is defined in XML (and for a good reason: to simplify
parsing). Is CDATA a controversial part of the XML spec? I do not think so.
All the clients I have tried support CDATA, as their parser is built on a
lower-level XML parser.
Yes, you can optimise parsing for XMPP, but even then you do not want to
mess with tokenisation of the XML stream: that is the low-level layer where
you will most probably want to rely on an existing XML parser.
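A quick way to check that CDATA comes for free with a general-purpose parser: the sketch below uses Python's `xml.etree.ElementTree.XMLPullParser` as a stand-in for the lower-level parser a client would sit on, and feeds it two hypothetical stanzas carrying the same body, once entity-escaped and once in a CDATA section. The CDATA stanza is deliberately split in the middle of the `<![CDATA[` marker, the way a network read might split it, to show that tokenisation stays the parser's job. `body_text` is an illustrative helper, not part of any XMPP library.

```python
import xml.etree.ElementTree as ET

# Two hypothetical stanzas with the same body text: one escaped with the
# predefined entities, one wrapped in a CDATA section.
ESCAPED = "<message><body>if (a &lt; b &amp;&amp; c) { x(); }</body></message>"
WRAPPED = "<message><body><![CDATA[if (a < b && c) { x(); }]]></body></message>"


def body_text(chunks):
    """Feed a stanza to an incremental parser chunk by chunk, as a stream
    reader would, and return the character data of its <body> element."""
    parser = ET.XMLPullParser(events=("end",))
    found = []

    def drain():
        # Consume the "end" events completed so far and keep <body> text.
        for _event, elem in parser.read_events():
            if elem.tag == "body":
                found.append(elem.text)

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()
    drain()  # the parser may hold back the last events until close()
    return found[0] if found else None


# Split the CDATA stanza inside the "<![CDATA[" marker, as a TCP read might.
cut = WRAPPED.index("<![CDATA[") + 4
print(body_text([ESCAPED]))                       # if (a < b && c) { x(); }
print(body_text([WRAPPED[:cut], WRAPPED[cut:]]))  # if (a < b && c) { x(); }
```

The application code never sees the difference: both forms arrive as the same character data.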

For those interested in the gory details, why does this let you optimise
the way you handle a stream? Because other means of escaping mean replacing
& with &amp;, for example. This is not a one-character-to-one-character
mapping, so you have to copy your string into a new, longer one. With
CDATA, 99.999% of the time (that is, whenever the data does not itself
contain "]]>") you can escape characters by concatenating several strings,
without any copy operation: you are manipulating references to your data
parts, not changing the data.
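The difference can be sketched in a few lines of Python, with `memoryview` slices standing in for "references to your data parts". This is an illustration under that assumption, not how any particular server implements it, and `entity_escape` and `cdata_parts` are hypothetical names. The common case produces one slice of the original buffer between two constant markers; the rare payload containing "]]>" is handled by closing the section after "]]" and reopening it before ">".

```python
# Escaping with entities: every "&", "<" or ">" grows the text, so
# escaping has to build a new, longer buffer (a copy).
def entity_escape(data: bytes) -> bytes:
    return (data.replace(b"&", b"&amp;")
                .replace(b"<", b"&lt;")
                .replace(b">", b"&gt;"))


CDATA_OPEN = b"<![CDATA["
CDATA_CLOSE = b"]]>"
# "]]>" cannot appear inside a CDATA section; the usual workaround is to
# close the section after "]]" and open a new one before ">".
CDATA_REOPEN = CDATA_CLOSE + CDATA_OPEN


def cdata_parts(data: bytes) -> list:
    """Return pieces whose concatenation is `data` wrapped in CDATA.

    The pieces taken from `data` are memoryview slices, which reference
    the original buffer instead of copying it; only the constant markers
    are added around them."""
    view = memoryview(data)
    parts = [CDATA_OPEN]
    start = 0
    i = data.find(b"]]>")
    while i != -1:                       # rare case: data contains "]]>"
        parts.append(view[start:i + 2])  # up to and including "]]"
        parts.append(CDATA_REOPEN)
        start = i + 2                    # next section starts with ">"
        i = data.find(b"]]>", start)
    parts.append(view[start:])           # common case: the whole payload
    parts.append(CDATA_CLOSE)
    return parts


payload = b"if (a < b && c) { x(); }"
print(entity_escape(payload))          # b'if (a &lt; b &amp;&amp; c) { x(); }'
print(b"".join(cdata_parts(payload)))  # b'<![CDATA[if (a < b && c) { x(); }]]>'
```

The `b"".join` is only there to display the result. On Unix, `socket.socket.sendmsg` accepts a list of bytes-like objects, so the parts could be handed to the kernel as one scatter/gather write without ever being joined.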
So my view is that you need a good reason to prevent this kind of
optimisation in XMPP (it is the reason CDATA exists in XML). If it is not
prohibited, the specification should probably state somewhere that it is
allowed.

-- 
Mickaël Rémond
  http://www.process-one.net/

