[Standards-JIG] XMPP bandwidth compression
thoutbeckers at splendo.com
Thu Jul 1 15:45:46 UTC 2004
On Thu, 1 Jul 2004 10:28:27 +0200, Ralph Meijer <jabber.org at ralphm.ik.nu> wrote:
> On Thu, Jul 01, 2004 at 09:46:24AM +0200, Fabrice Desré - France Telecom
>> Did you really read it ? I don't think... Being not schema aware
>> doesn't mean that you don't take advantage of the fact that you are
>> dealing only with XML docs. And the processing requirements of gzip are
>> sometimes too high for some devices.
> I did glance on it, and since gzip also uses table based lookups, but
> based on character strings, I am wondering if using Fast Infoset
> actually gives better compression. About the processing requirements,
> for servers this is no problem at all. I don't think that devices
> with very limited processing
> power *that can parse XML*, wouldn't be able to handle gzip for the
> of traffic that can be expected in such devices.
The current feeling on the list seems to be: "gzip compresses pretty well,
how do you know your XML-specific solution does better?" I'd like to
remind you that gzip, compared to other character-based compression
methods (such as bzip2 or LZH), does not do well on XML at all. Even a
simple reordering of the XML document (or stanza, in our case) greatly
benefits compression
(it compresses Hamlet even... after reading all these JEPs I'm convinced
this is a very realistic test for Jabber ;)
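
One way to check the reordering claim on your own data is to measure it. The
sketch below is only an illustration: `reorder`, `restore` and `sizes` are
hypothetical helpers, and the reordering shown (all markup first, all
character data after it) is a crude stand-in, not the tool referred to
above. Note that `zlib` uses the same DEFLATE algorithm as gzip but a
different container, so byte counts differ slightly from real gzip output.
Results depend heavily on the input, and very small stanzas may not benefit
at all.

```python
import bz2
import re
import zlib

# Matches one markup token such as <body> or </message>.
TAG = re.compile(r"(<[^>]+>)")


def reorder(xml: str) -> str:
    """Move all markup to the front and all character data to the back."""
    parts = TAG.split(xml)          # alternates: text, tag, text, tag, ..., text
    texts, tags = parts[0::2], parts[1::2]
    # NUL and U+0001 cannot occur in XML 1.0, so they are safe separators.
    return "\0".join(tags) + "\x01" + "\0".join(texts)


def restore(blob: str) -> str:
    """Inverse of reorder(): interleave the tags and text runs again."""
    tag_part, text_part = blob.split("\x01", 1)
    tags = tag_part.split("\0") if tag_part else []
    texts = text_part.split("\0")
    out = [texts[0]]
    for tag, text in zip(tags, texts[1:]):
        out.append(tag)
        out.append(text)
    return "".join(out)


def sizes(xml: str) -> dict:
    """Byte counts for the raw input and for each compression variant."""
    raw = xml.encode("utf-8")
    moved = reorder(xml).encode("utf-8")
    return {
        "raw": len(raw),
        "deflate": len(zlib.compress(raw, 9)),
        "bzip2": len(bz2.compress(raw, 9)),
        "reordered+deflate": len(zlib.compress(moved, 9)),
    }
```

Calling `sizes()` on a captured stream of stanzas (rather than on a single
stanza) gives a fairer picture, since all of these methods need some input
before they start to pay off.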
There are a number of relatively inexpensive techniques (in terms of CPU
time) that greatly enhance XML compression (Fast Infoset, for example,
seems to *increase* the throughput of SAX parsing of XML documents), and
yes, they beat gzip on size. gzip will also always be slower, I'd think,
since it has to uncompress first and then do regular XML parsing.
Fast Infoset seems to bundle a number of those techniques, though the
spec isn't quite final yet.
Of course (provided you have a gzip library for your platform), a
gzip-based implementation is very simple. So if some framework for
compression is made, it should be generic enough to allow for different
methods. Perhaps something based on MIME types. E.g.
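
The example that followed is not preserved in this archive, and the sketch
below is not a reconstruction of it. It only illustrates, under stated
assumptions, what "generic, keyed by MIME type" could mean in practice: a
registry that maps a type label to a compress/decompress pair, plus a
function that picks the first offered method both sides support. The
`application/x-example-*` labels, `CODECS` and `negotiate` are all
hypothetical names made up for this illustration.

```python
import bz2
import gzip
from typing import Callable, Dict, Iterable, Optional, Tuple

# (compress, decompress) pair for one method.
Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical labels, invented for this sketch; not registered MIME types
# and not taken from any JEP.
CODECS: Dict[str, Codec] = {
    "application/x-example-gzip": (gzip.compress, gzip.decompress),
    "application/x-example-bzip2": (bz2.compress, bz2.decompress),
}


def negotiate(offered: Iterable[str],
              supported: Dict[str, Codec] = CODECS) -> Optional[str]:
    """Return the first offered method we also support, or None."""
    for label in offered:
        if label in supported:
            return label
    return None
```

A new method, XML-aware or not, would then only need a new entry in the
registry; the negotiation logic itself stays untouched.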