[standards-jig] No Subject....

Iain Shigeoka iainshigeoka at yahoo.com
Fri Feb 1 21:59:25 UTC 2002


On 2/1/02 1:10 PM, "David Waite" <mass at akuma.org> wrote:

> Iain Shigeoka wrote:
> 
>> On 1/30/02 3:21 PM, "mlin at mlin.net" <mlin at mlin.net> wrote:
>> 
>>> ML: The purpose of byte length framing is not to entirely remove all XML
>>> parsing. Well-formedness checking is easier than building a DOM.
>>> Furthermore, byte length framing allows well-formedness checking to be
>>> done separately from I/O, in another thread or on another CPU.  Thus,
>>> I'll agree that well-formedness checking is necessary on all nodes, which
>>> is not stated in the JEP; however, there are still performance benefits
>>> to be gained, and there are further code simplicity advantages...
>>> 
>> 
>> I disagree.  If we can isolate errors to frames, the behavior of error
>> response (esp to malformed XML) can be changed dramatically.  Errors within
>> a frame don't need to invalidate the session as they do now.  I also don't
>> like the implication that the server will still need to parse the XML in a
>> frame.  I don't think this is necessarily true if the framing is designed
>> properly.
>> 
> However, if you change the behavior of malformed data you then are no
> longer compliant with XML; the spec states that parsing should not
> continue on any fatal error (except for guessing further errors within a
> document). I suppose it would be possible for hops to indicate whether
> they have a custom XML parser, and have the sending entity parse and
> verify all the XML before routing on to these nodes. If a packet is
> invalid, it does not forward that chunk on.

Doh.  Yeah.  I should have been clearer.  Within a frame, we must assume a
complete XML document.  So any malformed data within the framed document
will cause the whole document to be invalid.  However, in this case only
the frame is invalidated, not the entire session.  The current Jabber scheme
embeds the entire session within a <stream>, so any error anywhere should
cause a bailout.  In fact, doesn't that mean that, from a strictness
standpoint, if we bail out in the middle of a Jabber session we should
"undo" the effects of the session... :)
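
A minimal sketch of that per-frame isolation, in Python.  The wire format
here (a 4-byte big-endian length followed by one complete XML document) is
purely an assumption for illustration; it is not taken from the JEP, and the
names frame, read_frames and process_session are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET


def frame(payload: bytes) -> bytes:
    """Prefix payload with a 4-byte big-endian length (hypothetical format)."""
    return struct.pack(">I", len(payload)) + payload


def read_frames(buf: bytes):
    """Yield each framed payload from buf without looking inside it."""
    pos = 0
    while pos + 4 <= len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if pos + length > len(buf):
            break  # incomplete frame; a real reader would wait for more bytes
        yield buf[pos:pos + length]
        pos += length


def process_session(buf: bytes):
    """Parse every frame as a standalone XML document.

    A malformed frame is rejected on its own; later frames still parse.
    """
    results = []
    for payload in read_frames(buf):
        try:
            results.append(("ok", ET.fromstring(payload).tag))
        except ET.ParseError:
            results.append(("error", None))
    return results


session = (
    frame(b"<message to='a@example.org'><body>hi</body></message>")
    + frame(b"<message><body>oops</message>")  # mismatched tags
    + frame(b"<presence/>")
)
print(process_session(session))
```

The second frame fails its own well-formedness check, but the third is
still processed, because each frame is handed to the parser as a separate
document rather than as part of one long <stream>.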

> I suppose what I'm really troubled by is that this resembles just extra
> CDATA in the XML, but support indicates that you can no longer use a
> compliant XML parser - you have to use a parser that understands the
> framing and will ignore a section if it does not pass additional tests.
> I suppose if it was more explicit that this was framing to be considered
> external to the data stream, it would make more sense to me. You just
> can't really have an 'optional' framing mechanism, because older clients
> and servers would not know the framing rules and how to recognize and
> handle invalid data.

Agreed.  I think any framing outside of natural XML will probably break
backward compatibility without a lot of really ugly hacks.  That being the
case, I think to talk about framing is to talk about some non-compatible
changes...

>>> DW: So, at some point later in the route, this non-well-formed data may
>>> hit a node which does not support framing, and instead uses a normal XML
>>> parser. This parser will stop parsing on the error. So, I could frame
>>> invalid packets and send them over the wire to disrupt any clients or
>>> components using a compliant XML parser. Because of this, client and
>>> server connections cannot be trusted to provide well-formed XML, and the
>>> XML must be fully parsed whether or not framing information is present.
>>> 
>>> ML: ...the critical difference is that it is much easier to fully parse
>>> the packet with framing data available.
>> 
>> Agreed.  It especially makes XML data binding practical which can really
>> accelerate things.
>> 
> I don't quite understand :-) Could you elaborate?

XML binding is sort of de-generalizing XML.  :)  If you can assume certain
XML structures and DTDs, you can really optimize parsing tasks and the like.
In addition, a lot of tools can make dealing with XML a lot easier.  For
example, in Java, there is a standard being developed for Java XML binding.
You essentially run the binding compiler on any DTD or schema, and it
auto-generates a Java class with a built-in, optimized parser (input), XML
generator (output), and convenient access methods for getting at attributes,
child elements, etc.

The source for "programming" is really the DTDs/schemas.  Change the DTD,
and automatically generate updated Java classes.  You can then apply naming
conventions for very simple access to the data within the class.  And since
you know what's coming in beforehand, the parser can hold static strings
for all element names, attributes and known attribute values, which makes
things much more efficient.  All in all, it greatly simplifies working
with XML and makes it more efficient at the same time.  However, it's very
DOM-like, so you have to have the whole document before binding can occur.
Without framing, you're forced to pre-parse the Jabber XML to break out the
sub-elements, negating a lot of the speed and ease-of-use benefits of
binding...  The feeling is, if you're already going to throw a modified SAX
parser in the mix, why bother with binding...
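
To make the idea concrete, here is a rough sketch in Python of the kind of
class a binding compiler might produce.  It is hand-written for
illustration, not generated by any real tool, and the Message class with
its "to" attribute and "body" child is an assumed, simplified shape rather
than the actual Jabber schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    """Hand-written stand-in for a class a binding compiler might generate."""

    to: Optional[str]
    body: Optional[str]

    @classmethod
    def from_xml(cls, document: bytes) -> "Message":
        # Binding needs the whole document up front, which is exactly
        # what a frame would hand over.
        root = ET.fromstring(document)
        if root.tag != "message":
            raise ValueError("expected <message>, got <%s>" % root.tag)
        return cls(to=root.get("to"), body=root.findtext("body"))

    def to_xml(self) -> bytes:
        # Rebuild the element from the bound fields.
        root = ET.Element("message")
        if self.to is not None:
            root.set("to", self.to)
        if self.body is not None:
            ET.SubElement(root, "body").text = self.body
        return ET.tostring(root)


msg = Message.from_xml(b"<message to='a@example.org'><body>hi</body></message>")
print(msg.to, msg.body)
```

Note that from_xml takes a complete document as bytes.  With framing, each
frame can be passed to it directly; without framing, something else has to
cut the <message> out of the stream first.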

Does that make sense?  I know there are workarounds, but we have been
discussing arguments for avoiding workarounds, so I threw in the data binding
argument as an appeal for lazy programmers.

-iain

