[standards-jig] No Subject....

Iain Shigeoka iainshigeoka at yahoo.com
Fri Feb 1 19:28:54 UTC 2002


On 1/30/02 3:21 PM, "mlin at mlin.net" <mlin at mlin.net> wrote:

> ML: The purpose of byte length framing is not to entirely remove all XML
> parsing. Well-formedness checking is easier than building a DOM. Furthermore,
> byte length framing allows well-formedness checking to be done separately from
> I/O, in another thread or on another CPU.  Thus, I'll agree that
> well-formedness checking is necessary on all nodes, which is not stated in the
> JEP; however, there are still performance benefits to be gained, and there are
> further code simplicity advantages...

I disagree.  If we can isolate errors to frames, the behavior of error
response (esp to malformed XML) can be changed dramatically.  Errors within
a frame don't need to invalidate the session as they do now.  I also don't
like the implication that the server will still need to parse the XML in a
frame.  I don't think this is necessarily true if the framing is designed
properly.
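
To make the error-isolation point concrete, here is a rough sketch of what I mean. It is not the JEP-0017 wire format: the 4-byte big-endian length prefix and the names frame, split_frames, is_well_formed and process_session are all made up for illustration. A malformed frame is rejected on its own and the frames after it are still processed.

```python
import struct
import xml.parsers.expat


def frame(payload: bytes) -> bytes:
    """Prefix a payload with a 4-byte big-endian length (assumed format)."""
    return struct.pack(">I", len(payload)) + payload


def split_frames(data: bytes):
    """Yield each complete payload from a buffer of length-prefixed frames."""
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if offset + length > len(data):
            break  # incomplete trailing frame; wait for more data
        yield data[offset:offset + length]
        offset += length


def is_well_formed(payload: bytes) -> bool:
    """Check one frame with a fresh parser so errors cannot leak across frames."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(payload, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


def process_session(data: bytes):
    """Sort frames into accepted and rejected without ending the session."""
    accepted, rejected = [], []
    for payload in split_frames(data):
        (accepted if is_well_formed(payload) else rejected).append(payload)
    return accepted, rejected
```

The sketch still checks well-formedness on every frame; a node that trusts its peer could skip that step and route on the frame boundaries alone.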

> DW: So, at some point later in the route, this non-well-formed data may hit a
> node which does not support framing, and instead uses a normal XML parser.
> This parser will stop parsing on the error. So, I could frame invalid packets
> and send them over the wire to disrupt any clients or components using a
> compliant XML parser. Because of this, client and server connections cannot
> be trusted to provide well-formed XML, and the XML must be fully parsed
> whether or not framing information is present.
> 
> ML: ...the critical difference is that it is much easier to fully parse the
> packet with framing data available.

Agreed.  In particular, it makes XML data binding practical, which can
really accelerate things.

> Most XML parsing suites are not good at handling partially-received
> information. Any SAX implementation I have used, where they support partially
> received data at all, is based on a "pull" model, where the parser reads data
> from a stream. This means that you have to block if data hasn't come in over
> the network yet, which in actual Jabber implementations has resulted in all
> sorts of nasty hacks to properly process the stream. Most of these really just
> come down to checking for well-formedness before sending data to the parser.

;)  Yup.

> With framing information available you can buffer the element in a
> statically-sized buffer and then parse (or check well-formedness) in one go.
> In actual implementation, this "push" model is much easier and more efficient;
> it saves memory reallocation and copying, and is not dependent on how
> thoughtful the authors of your XML parsing suite were. You furthermore get at
> least the possibility of knowing the size of packets ahead of time and
> rejecting them if they are too large.

Precisely. 
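
A minimal sketch of that "push" model, with some assumptions: the frame format is a 4-byte big-endian length prefix (not taken from the JEP), the 64 KB limit is arbitrary, and read_exact, read_frame, check_well_formed and FrameTooLarge are hypothetical names. The buffer is allocated once, oversized frames are rejected from the header alone, and the parser only ever sees a complete element.

```python
import io
import struct
import xml.parsers.expat

MAX_FRAME = 64 * 1024  # assumed upper bound on a single frame


class FrameTooLarge(Exception):
    """Raised when a frame header announces more bytes than the buffer holds."""


def read_exact(stream, view) -> bool:
    """Fill `view` completely from `stream`; return False if the stream ends first."""
    got = 0
    while got < len(view):
        n = stream.readinto(view[got:])
        if not n:
            return False
        got += n
    return True


def read_frame(stream, buf: bytearray):
    """Read one frame into the preallocated `buf` and return a view of its payload."""
    header = memoryview(bytearray(4))
    if not read_exact(stream, header):
        return None  # end of stream
    (length,) = struct.unpack(">I", header)
    if length > len(buf):
        raise FrameTooLarge(length)  # rejected before reading any payload bytes
    view = memoryview(buf)[:length]
    if not read_exact(stream, view):
        return None  # stream ended mid-frame
    return view


def check_well_formed(payload) -> None:
    """Parse a complete frame in one call; raises ExpatError if it is malformed."""
    parser = xml.parsers.expat.ParserCreate()
    parser.Parse(bytes(payload), True)
```

Here `stream` is anything with a readinto method, for example the object returned by socket.makefile("rb") or an io.BytesIO in a test.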

> DW: The solution I see for this is to make one document input to the system
> actually represent a document received by another endpoint in the system.
> The two ways I can think of doing this (as I said previously) are:
> - to send self-contained documents between points, rather than first-level
> child elements of the root. You would then frame each document.
> - to send a document between endpoints, framed by something like BEEP.
> 
> ML: JEP-0017 is definitely a compromise solution. The key motivations besides
> efficiency are that it is backwards compatible and easy to implement. This is
> not the case with sending self-contained documents, or with switching
> everything to BEEP. Either of these I agree would be a cleaner and probably
> better solution, but they either make a lot of people rewrite a lot of code,
> or require servers to be able to speak more protocols. I don't think either is
> desirable in the near term when there is a simpler (if hackish) solution
> available.

I agree.  I guess the question is, "do we accept the pain of migrating out
or stick with the same old stuff despite the warts?"  Typically practical
matters vote for the latter.  If a rewrite is compelling enough, we might be
best off starting a new project and competing and see who "wins"...  History
usually favors the ugly but existing technology.  I'm extremely curious to
see Jabber.com people's opinion as they should be the most invested in
pursuing something compatible with current Jabber...

-iain

