[Standards-JIG] XMPP bandwidth compression

Tijl Houtbeckers thoutbeckers at splendo.com
Thu Jul 1 15:45:46 UTC 2004

On Thu, 1 Jul 2004 10:28:27 +0200, Ralph Meijer <jabber.org at ralphm.ik.nu>  

> On Thu, Jul 01, 2004 at 09:46:24AM +0200, Fabrice Desré - France Telecom
>>  Did you really read it ? I don't think... Being not schema aware
>> doesn't mean that you don't take advantage of the fact that you are
>> dealing only with XML docs. And the processing requirements of gzip are
>> sometimes too high for some devices.
> I did glance on it, and since gzip also uses table based lookups, but then
> based on character strings, I am wondering if using Fast Infoset actually
> gives better compression. About the processing requirements, for servers
> this is no problem at all. I don't think that devices with very limited
> processing power *that can parse XML*, wouldn't be able to handle gzip for
> the amount of traffic that can be expected in such devices.

The current feeling on the list seems to be: "gzip compresses pretty well,
how do you know your XML-specific solution does better?" I'd like to
remind you that gzip, compared to other character-based compression
methods (such as bzip2, LZH), does not do well on XML at all. Even a simple
reordering of the XML document (or stanza in our case) greatly benefits

for example:
(it compresses Hamlet even.. after reading all these JEPs I'm convinced  
this is a very realistic test for Jabber ;)
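
To make the reordering point concrete, here is a rough sketch that you can
run yourself. It is not taken from any of the tools mentioned in this
thread. The sample stanzas, the reorder() and restore() helpers and the
two-run layout (all markup first, then all character data) are assumptions
made up for illustration, and zlib stands in for gzip since both use
deflate. It only prints sizes; which method wins depends on the input.

```python
import bz2
import re
import zlib

# Hypothetical sample: a run of similar message stanzas (made up for this sketch).
stanzas = "".join(
    f"<message from='user{i}@example.com' to='room@example.com' type='chat'>"
    f"<body>hello number {i}</body></message>"
    for i in range(200)
).encode("utf-8")

TAG = re.compile(rb"<[^>]*>")


def reorder(xml: bytes) -> bytes:
    """Group all markup into one run and all character data into another.

    Assumes the input contains no 0x00 or 0x01 bytes (true for XML 1.0).
    """
    tags = TAG.findall(xml)
    text = TAG.split(xml)  # always one more piece than there are tags
    return b"\x00".join(tags) + b"\x01" + b"\x00".join(text)


def restore(blob: bytes) -> bytes:
    """Inverse of reorder(): interleave the text pieces and tags again."""
    tag_part, text_part = blob.split(b"\x01", 1)
    tags = tag_part.split(b"\x00")
    text = text_part.split(b"\x00")
    out = [text[0]]
    for tag, piece in zip(tags, text[1:]):
        out += [tag, piece]
    return b"".join(out)


sizes = {
    "raw": len(stanzas),
    "zlib": len(zlib.compress(stanzas, 9)),
    "bz2": len(bz2.compress(stanzas, 9)),
    "reordered+zlib": len(zlib.compress(reorder(stanzas), 9)),
}
for name, size in sizes.items():
    print(f"{name:>15}: {size} bytes")
```

The reordering is lossless here, so the receiving side can call restore()
after decompressing and hand the result to a normal XML parser.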

There are a number of relatively inexpensive techniques (in terms of CPU
time) that greatly enhance XML compression (Fast Infoset, for example,
seems to *increase* the throughput of SAX parsing of XML documents), and
yes, they beat gzip on size. Gzip will always be slower too, I'd think,
since it has to decompress first and then do regular XML parsing.

Fast Infoset seems to bundle a number of those techniques, though the
spec isn't quite final yet.

Of course (provided you have a gzip library for your platform) a gzip-based
implementation is very simple. So if some framework for compression is
made, it should be generic enough to allow for different methods. Perhaps
something based on MIME types. Eg.  
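
As a purely hypothetical sketch of what "generic" could mean (no JEP
defines any of this), each side could keep a table from a MIME-type-like
name to a compress/decompress pair and pick the first offered method that
it also supports. The CODECS table, the negotiate() helper and the type
strings used as keys are all assumptions for illustration.

```python
import bz2
import gzip
from typing import Callable, Dict, List, Optional, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical registry: the keys are illustrative labels, not an agreed
# list of XMPP compression methods.
CODECS: Dict[str, Codec] = {
    "application/gzip": (gzip.compress, gzip.decompress),
    "application/x-bzip2": (bz2.compress, bz2.decompress),
}


def negotiate(offered: List[str]) -> Optional[str]:
    """Return the first offered method that this side also supports."""
    for method in offered:
        if method in CODECS:
            return method
    return None


# The peer prefers a method we do not implement, so we fall back to the next.
chosen = negotiate(["application/x-unsupported-example", "application/x-bzip2"])
stanza = b"<presence from='a@example.com'/>"
if chosen is not None:
    compress, decompress = CODECS[chosen]
    wire = compress(stanza)
    print(chosen, len(wire), decompress(wire) == stanza)
```

Adding an XML-specific method later would then only mean adding another
entry to the table, without touching the negotiation itself.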
