[Standards] About stream namespaces
daniel at noll.id.au
Sun Mar 18 20:23:46 UTC 2007
On Monday 19 March 2007 00:52, Robin Redeker wrote:
> The difference, which matters, is that when you read from harddisk you
> are not forced to process the document before you have read the full
> file. You are not _forced_ to parse a partial XML document when you read
> a file. If you parse it anyway (without special chunked parsing modes
> which some sophisticated parsers have, which are by no means required by
> the XML recommendation), the XML parser is allowed to bail out and call
> it a 'not-well-formed XML document'.
Is it? I don't know of any parsers which do, and I certainly don't know of
anything in the XML specification which demands that the entire document be
available up-front. Most parsers will just sit there and do nothing until
they get all the data (which in this situation would never happen if that
were the only parser reading from the stream; if you had two parsers and one
of them was making responses, it might be a different story).
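To make the distinction concrete, here is a rough Python sketch of the two behaviours. It assumes the standard library's xml.etree.ElementTree; the stream fragment and the function names are made up for illustration. Handing an unfinished document to a whole-document parse is rejected as not well-formed, while an incremental parser reports whatever elements have completed so far and simply waits for more.

```python
import xml.etree.ElementTree as ET

# Hypothetical stream fragment: the outer element is opened but never closed.
PARTIAL = "<stream><message to='romeo@example.net'><body>hi</body></message>"


def parse_all_at_once(text):
    """Treat the text as a complete document; report whether that works."""
    try:
        ET.fromstring(text)
        return "ok"
    except ET.ParseError:
        return "not well-formed"


def parse_incrementally(chunks):
    """Feed chunks as they arrive; return the tags of elements that have closed."""
    parser = ET.XMLPullParser(events=("end",))
    finished = []
    for chunk in chunks:
        parser.feed(chunk)  # no error is raised just because more data is pending
        finished.extend(elem.tag for _event, elem in parser.read_events())
    return finished
```

With the fragment above, the one-shot parse gives up, while the incremental one reports body and message and leaves the outer stream element open.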
> > I say "basically", because as soon as you negotiate StartTLS, you've
> > failed because binary data is not valid in XML.
> XMPP is also without TLS "basically" not well-formed, take a look at this
> (client to server stream example from RFC3920 '6.5. Client-to-Server
Yep, SASL is an offender too. I forgot about that one. Back in the days when
we used the old style of authentication, this issue didn't occur because it
didn't start a new stream. I wonder why we didn't decide to close the outer
<stream:stream/>, actually. It wouldn't have been a stretch to do so.
> > But I guess we're stuck with this
> > because the only alternative which would retain XML well-formedness would
> > be to Base64 encode all the TLS data and contain that inside elements.
> > And clearly that alternative is worse. :-D
> You mean like this:
> :-) ?
Basically, only for the entire stream. :-D
> Another alternative would be to make a small packet-layer on the tcp stream
> which sends packets which look roughly like this:
> | packet length header | data |
> data would contain a fully well-formed XML document and the other side
> just needs to parse that well-formed XML document contained in the data
> part after it has read the whole packet.
> (This idea is similar to http://www.xmpp.org/extensions/xep-0017.html, only
> that I would propose not to have any special cases with the <stream> tags.)
There's nothing entirely wrong with that approach either. One thing it does
do is make it easier to skip over the entire stanza, or parse it far enough
to extract the recipient without parsing the rest. One downside is that the
namespaces need to be redefined on each stanza (or since we'd be redesigning,
maybe we should just say that the stanza namespace is the empty one.)
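A minimal sketch of what such a packet layer might look like, in Python. The 4-byte big-endian length header is an assumption (the proposal above does not fix a header format), and all names are illustrative. Each payload is a complete XML document, so it has to carry its own namespace declaration, which is the downside mentioned above.

```python
import struct
import xml.etree.ElementTree as ET

HEADER = struct.Struct("!I")  # assumed header: 4-byte big-endian payload length


def frame(stanza: bytes) -> bytes:
    """Prefix one well-formed XML document with its length."""
    return HEADER.pack(len(stanza)) + stanza


def split_frames(buffer: bytes):
    """Return (complete payloads, leftover bytes) without parsing any XML."""
    payloads = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if len(buffer) - start < length:
            break  # the rest of this packet has not arrived yet
        payloads.append(buffer[start:start + length])
        offset = start + length
    return payloads, buffer[offset:]


def recipient(payload: bytes):
    """Read the 'to' attribute of the root element (this sketch parses it all)."""
    return ET.fromstring(payload).get("to")
```

A router could call split_frames on whatever it has read so far, forward or drop payloads by length alone, and only call recipient on the ones it needs to inspect. A pull parser could stop after the root start tag instead of parsing the whole payload.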
While we're discussing things which will likely not happen for ages, a similar
option might be to move to a binary XML equivalent format. EBML, for
instance, encodes the binary length of all elements so that at any point in
the parsing you can skip an entire element and all its children. Plus it's
still extensible, but it uses a huge integer space instead of using
namespaces (so non-official extensions would need to be organised by some
kind of registry, which is a little less convenient but not the end of the
world.)
Of course if you then went to use the Matroska container format for streaming
audio and video communications, you'd be using EBML both for the video
streams and the IM stream, which is somewhat entertaining to think about.
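For anyone who hasn't looked at EBML, here is a rough Python sketch of why skipping is cheap. Both the element ID and the data size are variable-length integers whose width is given by the leading zero bits of the first byte, so you can step over an element and all its children without looking inside. The function names are mine, and the sketch ignores the reserved "unknown size" value and widths above 8 bytes.

```python
def vint_length(first_byte: int) -> int:
    """Width in bytes of a variable-length integer, judged from its first byte."""
    if first_byte == 0:
        raise ValueError("widths above 8 bytes are not handled in this sketch")
    # One leading 1 bit means 1 byte, 01 means 2 bytes, 001 means 3, and so on.
    return 9 - first_byte.bit_length()


def read_size(data: bytes, pos: int):
    """Decode the data size stored at pos; return (size, position after it)."""
    n = vint_length(data[pos])
    value = data[pos] & (0xFF >> n)  # drop the width marker bit
    for byte in data[pos + 1:pos + n]:
        value = (value << 8) | byte
    return value, pos + n


def skip_element(data: bytes, pos: int) -> int:
    """Return the offset just past the element at pos, children included."""
    pos += vint_length(data[pos])  # element ID: only its width matters here
    size, pos = read_size(data, pos)
    return pos + size
```

Calling skip_element repeatedly walks the top-level elements of a buffer without ever decoding their contents.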