[Standards] About stream namespaces

Robin Redeker elmex at x-paste.de
Sun Mar 18 13:52:06 UTC 2007

On Sun, Mar 18, 2007 at 11:57:09AM +1100, Daniel Noll wrote:
> On Sunday 18 March 2007 03:03, Robin Redeker wrote:
> > Furthermore, the W3C XML recommendation only speaks about 'XML documents',
> > which are (by definition) well-formed.
> >
> > Fragmented XML is by definition not well-formed and is not an XML document.
> >
> > => XMPP is basically not-well-formed XML.
> >
> > Maybe this is perfectly valid because XMPP calls it 'valid'. But it's
> > not valid if you ask the XML recommendation, because it doesn't say
> > anything about the validity of fragmented XML.
> I'm not even sure what people mean by "fragmented" here, since it isn't
> a standard term in relation to XML...
> But the XMPP stream itself is basically well-formed.  That is, if you take the
> entire stream, it should have a prolog, and one top-level element with
> contained elements where all the start and end tags match up.

The point is that you can't take the entire stream and process it as an XML
document after you have read it.

> The fact that
> part of the document isn't available to parse yet is completely irrelevant, 
> as when you parse a file on disk, occasionally a part of the document won't 
> be available to parse yet either (the difference is that when reading from 
> disk, you won't have to wait very long for the next chunk of bytes.)

The difference that matters is that when you read from a hard disk you
are not forced to process the document before you have read the full
file. You are not _forced_ to parse a partial XML document when you read
a file. If you parse it anyway (without the special chunked parsing modes
that some sophisticated parsers offer, which are by no means required by
the XML recommendation), the XML parser is allowed to bail out and call
it a 'not-well-formed XML document'.
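To make the difference concrete, here is a small Python sketch using only the standard library; the stream content is a hypothetical, simplified example. A plain whole-document parse (`xml.etree.ElementTree.fromstring`) rejects a stream whose root element is still open, while the incremental `XMLPullParser` (one of those chunked modes) reports each stanza as soon as it is closed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XMPP-style stream as it looks mid-session: the root
# <stream:stream> element has been opened but (of course) not yet closed.
PARTIAL_STREAM = (
    "<?xml version='1.0'?>"
    "<stream:stream xmlns='jabber:client'"
    " xmlns:stream='http://etherx.jabber.org/streams'>"
    "<message to='a@example.com'><body>hi</body></message>"
)


def parse_whole_document(text: str) -> bool:
    """Non-incremental parse: True only if `text` is a complete, well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The root element is never closed, so the parser rejects the input.
        return False


def parse_incrementally(chunks) -> list:
    """Incremental parse: tags of the elements that have been closed so far."""
    parser = ET.XMLPullParser(events=("end",))
    closed = []
    for chunk in chunks:
        parser.feed(chunk)
        if hasattr(parser, "flush"):
            # Newer Python/Expat versions may defer parsing of small chunks;
            # flush() (where available) forces pending data to be parsed now.
            parser.flush()
        for _event, elem in parser.read_events():
            closed.append(elem.tag)
    # parser.close() is deliberately not called: the stream is still open, and
    # closing the parser here would raise the same "not well-formed" error.
    return closed
```

Note that the pull parser never sees a complete document either; it merely lets the application act on each closed stanza before the root element ends.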

When reading from a socket or another device and receiving an error, you
don't have to pass the 'partial' XML document into the parser at all.

> I say "basically", because as soon as you negotiate StartTLS, you've failed 
> because binary data is not valid in XML.

Even without TLS, XMPP is "basically" not well-formed. Take a look at this
client-to-server stream example from RFC 3920, section 6.5 'Client-to-Server Example':

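   *** tcp connection established ***
   <?xml version='1.0'?>
   <stream:stream
       xmlns='jabber:client'
       xmlns:stream='http://etherx.jabber.org/streams'
       to='example.com'
       version='1.0'>

   <auth xmlns='urn:ietf:params:xml:ns:xmpp-sasl'
         mechanism='DIGEST-MD5'/>

   <response xmlns='urn:ietf:params:xml:ns:xmpp-sasl'>
   ...
   </response>

   <response xmlns='urn:ietf:params:xml:ns:xmpp-sasl'/>

   <stream:stream
       xmlns='jabber:client'
       xmlns:stream='http://etherx.jabber.org/streams'
       to='example.com'
       version='1.0'>

   <stanzas-here xmlns='this:is-a-placeholder-for-xml-stanzas'/>

   *** tcp connection lost ***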

Whoooops, where is the missing closing </stream:stream> tag?

I know that after SASL authentication the client is supposed to 'flush'
everything and start a new stream without closing the old one. But what
travels over the wire is still not a well-formed XML document.
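A quick way to check this claim, sketched in Python: the transcript below is a hypothetical, abbreviated version of the RFC 3920 section 6.5 exchange quoted above (the base64 SASL payload is left out). A whole-document parse rejects it, and appending a single closing tag does not help, because there are two stream headers on the wire.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated client-to-server transcript: one stream header
# before SASL, a second header after SASL succeeds, and no closing
# </stream:stream> tag at all.
STREAM_HEADER = (
    "<stream:stream xmlns='jabber:client'"
    " xmlns:stream='http://etherx.jabber.org/streams'"
    " to='example.com' version='1.0'>"
)
WIRE_TRANSCRIPT = (
    "<?xml version='1.0'?>"
    + STREAM_HEADER
    + "<auth xmlns='urn:ietf:params:xml:ns:xmpp-sasl' mechanism='DIGEST-MD5'/>"
    + "<response xmlns='urn:ietf:params:xml:ns:xmpp-sasl'/>"
    + STREAM_HEADER  # stream restart after successful SASL
    + "<stanzas-here xmlns='this:is-a-placeholder-for-xml-stanzas'/>"
)


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a single complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Only appending two closing tags makes the parser accept it, and then the post-SASL stream ends up nested inside the pre-SASL one, which is not what the protocol means.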

Of course you can now go on and say: "But if you take the <stream> element,
look for its closing tag and take what you have up to that point, then you
have an XML document!" But looking for that closing tag is actually _parsing_ XML.

And this kind of preprocessing (looking for balanced tags etc.) is "basically"
the point where the pain begins and where you leave the well-defined ground of
the XML recommendation.
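As an illustration of that pain, here is a deliberately naive balanced-tag scanner in Python (a hypothetical helper, not a recommended approach). It handles the simple cases but gets two perfectly legal inputs wrong: a CDATA section containing the text `</message>`, and an attribute value containing a literal `>`.

```python
import re

# Deliberately naive "balanced tag" finder (hypothetical helper, not from any
# library): anything that looks like <name ...> counts as a start tag, </name>
# as an end tag and <name .../> as self-closing. It returns the offset just past
# the first top-level element, or -1 if no element has been closed yet.
TAG = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^>]*?(/?)>")


def naive_element_end(text: str) -> int:
    depth = 0
    for m in TAG.finditer(text):
        closing, _name, self_closing = m.groups()
        if closing:
            depth -= 1
        elif not self_closing:
            depth += 1
        if depth == 0:
            return m.end()
    return -1


# Both inputs are legal XML, yet the scanner above gets them wrong:
# a CDATA section may contain "</message>" as plain text ...
CDATA_STANZA = "<message><body><![CDATA[</message>]]></body></message>"
# ... and an attribute value may contain a literal ">".
ATTR_STANZA = "<message to='a>b'/>"
```

On `CDATA_STANZA` it stops right after `</body>` instead of after the real closing `</message>`, and on `ATTR_STANZA` it never finds the end of the element at all. Fixing such cases means handling CDATA, comments, processing instructions and quoting rules, i.e. writing an XML tokenizer.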

> But I guess we're stuck with this 
> because the only alternative which would retain XML well-formedness would be 
> to Base64 encode all the TLS data and contain that inside elements.  And 
> clearly that alternative is worse. :-D

You mean like this:
:-) ?

And speaking of 'alternatives' (which are clearly no real alternative to the current
situation, as no one is going to change the RFC and force everyone to adopt it,
and I don't even propose doing that):

Another alternative would be a small packet layer on top of the TCP stream
that sends packets which look roughly like this:

   | packet length header | data |

The data part would contain a complete, well-formed XML document, and the other
side would just need to parse that document after it has read the whole packet.
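A minimal Python sketch of such a packet layer, under assumptions the description leaves open: the 4-byte big-endian length header, the UTF-8 payloads and the names `write_frame`/`read_frames` are all hypothetical choices. Each payload is a complete document and is handed to an ordinary, non-incremental parser only after all of its bytes have arrived.

```python
import io
import struct
import xml.etree.ElementTree as ET

# Hypothetical framing: a 4-byte big-endian unsigned length header, followed by
# exactly that many bytes of one complete, UTF-8 encoded XML document.
HEADER = struct.Struct(">I")


def write_frame(out, element: ET.Element) -> None:
    """Serialize one element as a standalone document and write it as one frame."""
    payload = ET.tostring(element, encoding="utf-8")
    out.write(HEADER.pack(len(payload)))
    out.write(payload)


def _read_exactly(inp, n: int) -> bytes:
    data = inp.read(n)
    if len(data) != n:
        raise EOFError("connection closed in the middle of a frame")
    return data


def read_frames(inp):
    """Yield one parsed element per frame, parsing each payload only once it is complete."""
    while True:
        header = inp.read(HEADER.size)
        if not header:
            return  # clean end of stream, exactly between two frames
        if len(header) != HEADER.size:
            raise EOFError("connection closed in the middle of a header")
        (length,) = HEADER.unpack(header)
        yield ET.fromstring(_read_exactly(inp, length))


if __name__ == "__main__":
    buf = io.BytesIO()
    write_frame(buf, ET.Element("message", {"to": "a@example.com"}))
    write_frame(buf, ET.Element("presence"))
    buf.seek(0)
    for elem in read_frames(buf):
        print(elem.tag, elem.attrib)
```

On a real socket, `sock.makefile("rb")` gives a buffered file object whose `read(n)` blocks until `n` bytes arrive or the connection closes, which is what `read_frames` relies on. A connection lost mid-packet then surfaces as an explicit `EOFError` rather than as a 'not-well-formed' document.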

(This idea is similar to http://www.xmpp.org/extensions/xep-0017.html, except
that I would propose not having any special cases for the <stream> tags.)

I don't want to propose changing everything, or anything, right now.
I just want to point out a fact about the current specification.

