[Standards] About stream namespaces

Daniel Noll daniel at noll.id.au
Sun Mar 18 20:23:46 UTC 2007

On Monday 19 March 2007 00:52, Robin Redeker wrote:
> The difference, which matters, is that when you read from harddisk you
> are not forced to process the document before you have read the full
> file. You are not _forced_ to parse a partial XML document when you read
> a file.  If you parse it anyway (without special chunked parsing modes
> which some sophisticated parsers have, which are by no means required by
> the XML recommendation), the XML parser is allowed to bail out and call
> it a 'not-well-formed XML document'.

Is it?  I don't know of any parsers which do, and I certainly don't know of 
anything in the XML specification which demands that the entire document be 
available up-front.  Most parsers will just sit there and do nothing until 
they get all the data (which in this situation would never happen, if that 
were the only parser reading from the stream.  If you had two parsers and one 
of them was making responses, it might be a different story.)
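A parser does not need the whole document up front. Here is a minimal sketch using Python's standard-library `xml.etree.ElementTree.XMLPullParser`. It assumes the stream arrives as a sequence of text chunks. The stream header, the stanza contents and the `stanzas_from_chunks` helper are illustrative and are not taken from any particular server. Each stanza is handed back as soon as its end tag is parsed, even though the root `<stream:stream>` is never closed.

```python
import xml.etree.ElementTree as ET

STREAM_NS = "http://etherx.jabber.org/streams"
CLIENT_NS = "jabber:client"


def stanzas_from_chunks(chunks):
    """Yield each direct child of <stream:stream> once its end tag is parsed.

    The root element is never closed in the input, so the document is never
    complete, yet every finished stanza is still reported.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    root = None
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                if root is None:
                    root = elem  # the <stream:stream> element
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a stanza directly under the root just ended
                    yield elem
                    root.remove(elem)  # detach it so memory use stays bounded


chunks = [
    "<?xml version='1.0'?>"
    f"<stream:stream xmlns='{CLIENT_NS}' xmlns:stream='{STREAM_NS}'"
    " to='example.com' version='1.0'>",
    "<message to='juliet@example.com'><body>",
    "Art thou not Romeo?</body></message>",
    "<presence/>",
    # No </stream:stream>: the connection simply stays open.
]

for stanza in stanzas_from_chunks(chunks):
    print(stanza.tag)
```

The parser only waits when the end tag of the stanza it is currently inside has not yet arrived. It never waits for the root to close.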

> > I say "basically", because as soon as you negotiate StartTLS, you've
> > failed because binary data is not valid in XML.
> XMPP is also without TLS "basically" not well-formed, take a look at this
> (client to server stream example from RFC3920 '6.5. Client-to-Server
> Example'):

Yep, SASL is an offender too.  I forgot about that one.  Back in the days when 
we used the old style of authentication (jabber:iq:auth), this issue didn't 
occur because it didn't start a new stream.  I wonder why we didn't decide to 
close the outer <stream:stream/>, actually.  It wouldn't have been a stretch to 
do so.
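You can see the SASL problem by feeding everything the client sends into a single parser. This is a minimal sketch. The transcript is a simplified, hypothetical client-side flow loosely following RFC 3920: a stream header, a SASL `<auth/>`, the restarted stream header, one stanza and one closing tag. It is not a copy of the RFC's example, and the base64 payload is a placeholder. Because the first stream is never closed, the restarted header nests inside it, and the final `</stream:stream>` leaves the outer element open.

```python
import xml.etree.ElementTree as ET

HEADER = (
    "<stream:stream xmlns='jabber:client'"
    " xmlns:stream='http://etherx.jabber.org/streams'"
    " to='example.com' version='1.0'>"
)

# Everything the client writes (ignoring TLS), concatenated as one byte stream.
CLIENT_OUTPUT = (
    HEADER
    + "<auth xmlns='urn:ietf:params:xml:ns:xmpp-sasl' mechanism='PLAIN'>"
    + "AGp1bGlldAByMG0zMA=="  # placeholder base64 credentials
    + "</auth>"
    + HEADER  # stream restart after SASL success; the old stream is not closed
    + "<presence/>"
    + "</stream:stream>"  # closes only the innermost (restarted) stream
)


def open_depth(data):
    """Return how many elements are still open after parsing data."""
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(data)
    depth = 0
    for event, _elem in parser.read_events():
        depth += 1 if event == "start" else -1
    return depth


def is_well_formed(data):
    """True only if data is a complete, well-formed XML document."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


print(open_depth(CLIENT_OUTPUT))      # the outer <stream:stream> is still open
print(is_well_formed(CLIENT_OUTPUT))
```

In practice, implementations typically throw the parser away and start a fresh one at each restart. That special case is what keeps the wire format from being one document.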

> > But I guess we're stuck with this
> > because the only alternative which would retain XML well-formedness would
> > be to Base64 encode all the TLS data and contain that inside elements. 
> > And clearly that alternative is worse. :-D
> You mean like this:
>     http://mail.jabber.org/pipermail/security/2007-March/000002.html
> :-) ?

Basically, only for the entire stream. :-D
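Here is a minimal sketch that shows why wrapping the whole stream in Base64 is unattractive. The `<data/>` wrapper element and the helper names are hypothetical; they are not from any XEP or from the linked post. The only point is that Base64 turns every 3 bytes into 4 characters, so the stream grows by roughly a third before any element overhead is added.

```python
import base64
import xml.etree.ElementTree as ET

WRAPPER = "data"  # hypothetical element name, chosen for illustration


def wrap(raw: bytes) -> str:
    """Carry arbitrary bytes (e.g. a TLS record) as text in a well-formed element."""
    return f"<{WRAPPER}>{base64.b64encode(raw).decode('ascii')}</{WRAPPER}>"


def unwrap(xml_text: str) -> bytes:
    """Recover the original bytes from a wrapped element."""
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text or "")


record = bytes(range(256)) * 4  # 1024 bytes of sample binary data
wrapped = wrap(record)
assert unwrap(wrapped) == record
print(len(record), len(wrapped))  # 1024 bytes become 1381 characters
```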

> Another alternative would be to make a small packet-layer on the tcp stream
> which sends packets which look roughly like this:
>    -------------------------------
>    | packet length header | data |
>    -------------------------------
> data would contain a fully well-formed XML document and the other side
> just needs to parse that well-formed XML document contained in the data
> part after it has read the whole packet.
> (This idea is similar to http://www.xmpp.org/extensions/xep-0017.html, only
> that I would propose not to have any special cases with the <stream> tags.)
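Here is a minimal sketch of such a packet layer. It assumes a 4-byte unsigned big-endian length header (the proposal doesn't fix a header size) and one complete XML document per packet. The helper names are hypothetical. Because each payload is already complete, the receiver can hand it to an ordinary, non-incremental parser.

```python
import io
import struct
import xml.etree.ElementTree as ET

LENGTH = struct.Struct("!I")  # assumed header: 4-byte unsigned big-endian length


def encode_packet(document: bytes) -> bytes:
    """Prefix one complete XML document with its length."""
    return LENGTH.pack(len(document)) + document


def _read_exactly(stream, n: int) -> bytes:
    # Assumes a buffered binary stream whose read(n) only returns short at EOF,
    # such as io.BytesIO or socket.makefile("rb").
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"expected {n} bytes, got {len(data)}")
    return data


def read_packets(stream):
    """Yield each packet's payload until a clean EOF between packets."""
    while True:
        header = stream.read(LENGTH.size)
        if not header:
            return  # EOF on a packet boundary
        if len(header) != LENGTH.size:
            raise EOFError("truncated length header")
        (length,) = LENGTH.unpack(header)
        yield _read_exactly(stream, length)


wire = io.BytesIO(
    encode_packet(b"<message xmlns='jabber:client' to='juliet@example.com'>"
                  b"<body>hi</body></message>")
    + encode_packet(b"<presence xmlns='jabber:client'/>")
)
for payload in read_packets(wire):
    print(ET.fromstring(payload).tag)
```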

There's nothing entirely wrong with that approach either.  One thing it does 
do is make it easier to skip over the entire stanza, or parse it far enough 
to extract the recipient without parsing the rest.  One downside is that the 
namespaces need to be redeclared on each stanza (or, since we'd be redesigning 
anyway, maybe we should just say that the stanza namespace is the empty one).
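With one stanza per packet, routing only requires looking at the first start tag. Here is a minimal sketch using the standard library's `xml.parsers.expat`. It assumes one packet's payload has already been extracted. The start-element handler raises as soon as the outermost element opens, so the parse is abandoned before any of the stanza's children are reported. `peek_root` and `_FoundRoot` are hypothetical names. The second call shows the namespace cost: without its own `xmlns` declaration, a stanza's name falls into the empty namespace.

```python
import xml.parsers.expat


class _FoundRoot(Exception):
    """Raised from the handler to abandon the parse at the first start tag."""

    def __init__(self, name, attrs):
        super().__init__(name)
        self.name = name
        self.attrs = attrs


def peek_root(payload: bytes):
    """Return (qualified name, attributes) of a packet's outermost element.

    Names use expat's "namespace-URI local-name" form; an element with no
    namespace comes back as just its local name. Malformed input before the
    first start tag raises xml.parsers.expat.ExpatError.
    """
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def on_start(name, attrs):
        raise _FoundRoot(name, attrs)

    parser.StartElementHandler = on_start
    try:
        parser.Parse(payload, True)
    except _FoundRoot as found:
        return found.name, found.attrs
    raise AssertionError("expat accepted a document with no root element")


print(peek_root(b"<message xmlns='jabber:client' to='juliet@example.com'>"
                b"<body>a long body we never look at</body></message>"))
print(peek_root(b"<message to='romeo@example.net'/>"))
```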

While we're discussing things which will likely not happen for ages, a similar 
option might be to move to a binary equivalent of XML.  EBML, for instance, 
encodes the length of every element so that at any point in the parsing you 
can skip an entire element and all its children.  Plus it's still extensible, 
but it uses a huge integer space of element IDs instead of namespaces (so 
non-official extensions would need to be organised by some kind of registry, 
which is a little less convenient but not the end of the world).

Of course, if you then went on to use the Matroska container format for streaming 
audio and video communications, you'd be using EBML both for the video 
streams and the IM stream, which is somewhat entertaining to think about.
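Here is a minimal sketch of how an EBML reader can skip whole elements. It assumes EBML's variable-length integer (VINT) encoding, in which the number of leading zero bits in the first byte gives the field's total width. Element IDs keep their marker bit; data sizes have it masked off. The reserved "unknown size" value (all value bits set) is not handled. The function names are hypothetical, and the sample bytes are illustrative (0x1A45DFA3 is the EBML header ID).

```python
from typing import Iterator, Optional, Tuple


def read_vint(buf: bytes, pos: int, keep_marker: bool) -> Tuple[int, int]:
    """Decode one EBML variable-length integer at buf[pos].

    Returns (value, position just past the integer). The width in bytes is
    one more than the number of leading zero bits in the first byte.
    """
    if pos >= len(buf):
        raise ValueError("truncated VINT")
    first = buf[pos]
    if first == 0:
        raise ValueError("VINT wider than 8 bytes")
    width = 9 - first.bit_length()
    if pos + width > len(buf):
        raise ValueError("truncated VINT")
    # IDs keep the marker bit; sizes drop it by masking the first byte.
    # (The reserved "unknown size" encoding is not special-cased here.)
    value = first if keep_marker else first & (0xFF >> width)
    for byte in buf[pos + 1:pos + width]:
        value = (value << 8) | byte
    return value, pos + width


def iter_elements(buf: bytes, start: int = 0,
                  end: Optional[int] = None) -> Iterator[Tuple[int, int, int]]:
    """Yield (element_id, data_start, data_end) for each element in buf[start:end].

    Each element's data is skipped using its encoded size, so children are
    never decoded unless the caller recurses into [data_start, data_end).
    """
    end = len(buf) if end is None else end
    pos = start
    while pos < end:
        elem_id, pos = read_vint(buf, pos, keep_marker=True)
        size, pos = read_vint(buf, pos, keep_marker=False)
        if pos + size > end:
            raise ValueError("element overruns its parent")
        yield elem_id, pos, pos + size
        pos += size


children = bytes([0x42, 0x86, 0x81, 0x01]) + bytes([0x42, 0x82, 0x88]) + b"matroska"
buf = bytes([0x1A, 0x45, 0xDF, 0xA3, 0x80 | len(children)]) + children
buf += bytes([0xEC, 0x82, 0x00, 0x00])  # a second top-level element, 2 data bytes

for elem_id, data_start, data_end in iter_elements(buf):
    print(hex(elem_id), data_end - data_start)  # children of the first are skipped
```

Recursing into a child list is just another call to `iter_elements(buf, data_start, data_end)`.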
