[Jabber-IETF] Agenda items

Iain Shigeoka iain.shigeoka at messaginglogic.com
Tue Oct 1 11:37:07 CDT 2002


On 9/30/02 12:44 PM, "Pete Chown" <1 at 234.cx> wrote:

> Iain Shigeoka wrote:
> 
>> Due to Jabber's XML streaming nature, many people end up writing their own
>> parsers or heavily hacking COTS XML parsers to report sub DOM trees of the
>> streaming document and change the normal caching behavior of most COTS
>> parsers.
> 
> I'm using an unmodified expat. :-) Xerces didn't work too well though,
> which is a pity because I wanted to use the schema fragments from the
> drafts to catch erroneous input.  Oh well...

Yup. The majority of generic XML parsers don't work well against Jabber out
of the box.

>> This results in the widespread hardcoding of character encodings,
>> namespace prefixes, etc.  Rather than supporting full XML syntax, these home
>> grown ones assume UTF-8 and scan for the char sequence '<stream:stream' and
>> reject any other root document, make sure the attribute xmlns:stream is set
>> to the correct value in the root, etc.
>> 
>> Since this helps make implementation simpler, and is fully XML compliant, it
>> may be wise to restrict and confine the XMPP usage of XML as tightly as
>> possible unless there are compelling reasons to support a more flexible
>> subset (or the entire set) of XML features.
> 
> Actually I disagree that it is fully XML compliant.  Being XML compliant
> doesn't just mean that you end up with a stream that is well-formed.  It
> means that the XML reader can accept the whole class of documents that
> mean the same thing.  I included an example of this in my earlier post:
> 
> http://www.jabber.org/pipermail/jabber-ietf/2002-September/000131.html

For parsers, yes; for data, no. When we define XML data a la the XMPP protocol,
we're really just selecting a subset of all possible XML data and calling it
XMPP. In this case, the subset of XML is that which follows the XMPP DTDs
and is used in a particular sequence. It should be perfectly valid to select
an even smaller subset and say that the document must be UTF-8, only the
prefix "stream" can be used in the initial root element, and that it must
exist within the etherx namespace. The XML data is fully compliant. A parser
that can only read XMPP's subset of XML is not a fully compliant XML parser.

However, in the case of XMPP implementations, most don't care about full XML
parser compliance. They only want to be able to read XMPP. If we can
simplify that requirement and not otherwise cripple XMPP's extensibility, I
think it is worthwhile and has significant benefits to building a community
of developers/software as well as making standardization of implementations
simpler. 

> Also, it can actually make implementation harder.  Expat gives you the
> long name for a namespace ("http://etherx.jabber.org/streams") rather
> than the short name ("stream").  If namespaces are being used in the
> usual way, this is the helpful thing to do.  However, with Jabber, you
> have to take care to reconstruct the short names in a way that is
> acceptable to the software at the other end.
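
To make the behaviour described above concrete, here is a minimal sketch
using Python's xml.parsers.expat binding with namespace processing turned
on. The helper name start_element_names and the alternative prefix "s" are
assumptions made for illustration; only the namespace URI comes from the
quoted text.

```python
import xml.parsers.expat

def start_element_names(data, sep=" "):
    """Return element names as a namespace-aware expat parser reports them."""
    names = []
    # Passing namespace_separator enables expat's namespace processing.
    parser = xml.parsers.expat.ParserCreate(namespace_separator=sep)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    # is_final=False: the root element of a Jabber stream stays open.
    parser.Parse(data, False)
    return names

# Two stream headers that differ only in the prefix the sender picked.
a = "<stream:stream xmlns:stream='http://etherx.jabber.org/streams'>"
b = "<s:stream xmlns:s='http://etherx.jabber.org/streams'>"

print(start_element_names(a))
print(start_element_names(b))
# Both print ['http://etherx.jabber.org/streams stream']
```

The name handed to the application is the namespace URI plus the local
name, so the prefix the peer used is not part of it. Software that has to
echo "stream:" back must rebuild the prefix itself.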

Any added complexity is entirely an issue of your parser design. If we say
that the opening stream must be of the form <stream:stream
xmlns:stream='http://...' ...> then the simplest case parser just checks for
the sequence <stream:stream then looks to make sure there is an attribute
named xmlns:stream that has the correct value. In the general case, a prefix
mapping must be maintained and checked, a default namespace stack must be
maintained and checked, etc. These tasks may all be done by a compliant XML
parser but aren't necessary if we define our XMPP subset correctly.
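
The simplest-case check described above could look roughly like the
following Python sketch. It is only an illustration under stated
assumptions: the function name is_restricted_stream_header is made up, the
header is assumed to arrive in a single buffer with no XML declaration in
front of it, and attribute values are assumed not to contain '<' or '>'.

```python
import re

STREAM_NS = "http://etherx.jabber.org/streams"

# Opening tag must literally start with "<stream:stream".
_HEADER = re.compile(r"<stream:stream(\s[^<>]*)?>")
# name='value' or name="value" pairs inside the tag.
_ATTR = re.compile(r"""([^\s=]+)\s*=\s*(?:'([^']*)'|"([^"]*)")""")

def is_restricted_stream_header(data: bytes) -> bool:
    """Accept only the fixed-prefix, UTF-8 form of the stream header."""
    try:
        text = data.decode("utf-8")  # the subset assumes UTF-8 only
    except UnicodeDecodeError:
        return False
    match = _HEADER.match(text)
    if match is None:
        return False  # any other root element or prefix is rejected
    attrs = {}
    for name, single, double in _ATTR.findall(match.group(1) or ""):
        attrs[name] = single or double
    # The prefix must be bound to the expected namespace in the root.
    return attrs.get("xmlns:stream") == STREAM_NS

# Hypothetical headers for illustration.
ok = b"<stream:stream xmlns:stream='http://etherx.jabber.org/streams' to='example.com'>"
other_prefix = b"<s:stream xmlns:s='http://etherx.jabber.org/streams'>"
print(is_restricted_stream_header(ok))            # True
print(is_restricted_stream_header(other_prefix))  # False
```

No prefix map or namespace stack is kept here. The second header is
perfectly good XML with the same meaning, and this check still rejects it,
which is exactly the trade-off under discussion.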


