[Standards] About stream namespaces

Daniel Noll daniel at noll.id.au
Mon Mar 19 05:24:56 UTC 2007


> The whole paragraph '5 Conformance' in
>     http://www.w3.org/TR/2006/REC-xml-20060816/#sec-conformance
> says that any parser has to check for well-formedness.
>
> To check an XML document for well-formedness it first needs a _complete_
> XML document (the recommendation does not define 'partial' XML
> documents).
>
> And the quoted paragraph above says that any information retrieved from
> unchecked or even erroneous documents is not defined.

Actually you can't even say "no, it isn't well-formed because it hasn't
ended yet."  The well-formedness itself is undefined until the document
ends -- it may be well-formed or it may not be well-formed.  (Quantum
physicists would have us believe that it's both at the same time. ;-))

What the spec actually says is:

 | Validating and non-validating processors alike MUST report violations
 | of this specification's well-formedness constraints in the content of
 | the document entity and any other parsed entities that they read.

The legacy protocol was friendlier in this respect, since SSL was set up
before the start of the stream.  A lot of us (including me) still remember
it, and forget that things like SASL and StartTLS break it to various
degrees.

If you treat the stream restart after StartTLS or SASL as an EOF though
(and yes, that means the XML up to that point isn't well-formed because
the end is missing, but that happens even with normal files and HTTP
queries) and take the rest to be the actual document, then it becomes
well-formed again.

 | Non-validating processors are REQUIRED to check only the document
 | entity, including the entire internal DTD subset, for well-formedness.

This doesn't state that the entire document needs to be present either.
But it does state that if a DOCTYPE is present, the entire internal DTD
subset needs to be checked for well-formedness.

That being said, surely the lack of a DOCTYPE is a problem in itself. 
Maybe that's really the only thing that would make the original protocol
non-XML.

> I know that there are implementations of XML parsers that are quite
> liberal in what they accept. The point is: XMPP does exploit this beyond
> the XML recommendation. It makes it _necessary_ to have such parsers.

The fact that it makes it a necessity isn't really a major problem.  For
one thing, even the old SAX API was already capable of handling an XMPP
stream, and various parsers are based on that one already.
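
As a rough illustration of the SAX point, here is a minimal Java sketch.
The class name OpenStreamDemo and the sample stanza are made up for the
example, and a StringReader stands in for the socket, so the parser hits
EOF where a real connection would simply block waiting for more data.  It
shows a stock JAXP SAX parser delivering events for complete stanzas
before it has any way of knowing whether the document as a whole will
turn out to be well-formed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any XMPP library.
final class OpenStreamDemo {

    /** Returns the SAX events seen, plus "error" if the parser gave up. */
    static List<String> parse(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                events.add("start:" + local);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end:" + local);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // DefaultHandler.fatalError rethrows, so nothing goes to stderr.
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Only reached once the input ends without closing the root.
            events.add("error");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }
}
```

Feeding it an opened <stream:stream> followed by one complete <message>
yields the start and end events for the stanza first, and the
well-formedness error only once the input runs out.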

As far as parsers go, the only hack most of them ever needed was this
stream namespace one: stripping the namespaces off on the way in and
putting them back on on the way out.

I disagree with having the different namespaces myself, not because it's
difficult to implement (I'd do it in Java using an XMLFilter probably, in
maybe five lines of code for each side) but because it carries
questionable semantics.  Which is to say, the message I send from myself
to the server and the message the server sends to another server; those
two elements have the same content and the same meaning, and thus should
have the same name.
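
For what it's worth, a filter along those lines might look roughly like
this.  It is only a sketch under assumptions: the class name
StanzaNamespaceFilter and the elementNames helper are invented for
illustration, it builds on XMLFilterImpl (the stock helper that implements
XMLFilter), and it only rewrites element namespaces, not attribute
namespaces.  The helper is there just to make it runnable; the filter
itself is the three overridden methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: maps one stanza namespace onto another.
final class StanzaNamespaceFilter extends XMLFilterImpl {
    static final String CLIENT = "jabber:client";
    static final String SERVER = "jabber:server";

    private final String from;
    private final String to;

    StanzaNamespaceFilter(String from, String to) {
        this.from = from;
        this.to = to;
    }

    private String map(String uri) {
        return from.equals(uri) ? to : uri;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri)
            throws SAXException {
        super.startPrefixMapping(prefix, map(uri));
    }

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) throws SAXException {
        super.startElement(map(uri), local, qName, atts);
    }

    @Override
    public void endElement(String uri, String local, String qName)
            throws SAXException {
        super.endElement(map(uri), local, qName);
    }

    /** Demo helper: element names in {namespace}local form after filtering. */
    static List<String> elementNames(String xml, String from, String to) {
        List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            StanzaNamespaceFilter filter = new StanzaNamespaceFilter(from, to);
            filter.setParent(factory.newSAXParser().getXMLReader());
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    names.add("{" + uri + "}" + local);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

The same class covers both directions by swapping the two constructor
arguments, which is about as much work as the two namespaces ever cause.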

It's a minor quibble, though, and far worse semantic errors are present in
XHTML (or at least in XHTML 1.x, since XHTML 2 seems to have fixed a large
number of issues).

Daniel


-- 
 Jabber: daniel at noll.id.au
  Email: daniel at noll.id.au
    Web: http://noll.id.au/daniel/




