[JDEV] GZipping Jabber Messages
Michael F. March
march at indirect.com
Sat Jan 5 23:29:12 CST 2002
Doing compression with SSH, I am getting about 70% compression
outbound and 80% compression inbound.
I have not investigated how OpenSSH implements compression on
the TCP stream, though, so I am not sure how good a gauge this is.
> Update. I am finding that you can get better compression ratios, up to
> around 57%, by maintaining the LZ dictionary between packets. Also this
> reduces the processor hit asymptotically (but still quite nonzero) with
> more packets sent along.
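
A minimal sketch of the idea above, assuming Python's zlib module rather than whatever the original test program used. The packet shape and the make_packet helper are hypothetical; the point is only to compare one deflate stream kept open across packets (flushed after each one) with compressing every packet from scratch.

```python
import zlib

def make_packet(i):
    # Hypothetical Jabber-style message, for illustration only.
    return (f"<message to='user{i % 5}@example.org' type='chat'>"
            f"<body>hello number {i}, how are you today?</body></message>").encode()

packets = [make_packet(i) for i in range(200)]
raw = sum(len(p) for p in packets)

# Per-packet compression: every packet starts with an empty LZ window.
independent = sum(len(zlib.compress(p, 9)) for p in packets)

# Shared state: one deflater/inflater pair lives for the whole connection.
# Z_SYNC_FLUSH pushes out everything buffered so far, so the receiver can
# decode each packet as soon as its chunk arrives.
deflater = zlib.compressobj(9)
inflater = zlib.decompressobj()
shared = 0
for p in packets:
    chunk = deflater.compress(p) + deflater.flush(zlib.Z_SYNC_FLUSH)
    shared += len(chunk)
    # Both ends must see every chunk, in order, for this to keep working.
    assert inflater.decompress(chunk) == p

print(f"raw={raw} independent={independent} shared={shared}")
```

The numbers this prints depend entirely on the made-up packets, so they say nothing about the 57% figure above; they only show the direction of the effect.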
> This technique raises still other problems, though, most notably
> reliability. For this to work the gzip deflater on one end and the
> inflater on the other end must remain exactly in sync for the duration of the
> connection (hours, days, ...). An error in the compressed stream would be
> magnified many times over in the inflated stream. So for reliability you
> had better hash or at least checksum all the data going across. That means
> you have to have an envelope format.
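
One way such an envelope might look, as a sketch only: a fixed header carrying the payload length and a CRC-32 of the payload, checked before the bytes are handed to the inflater. The header layout and the wrap/unwrap names are assumptions made for illustration, not anything proposed in the thread.

```python
import struct
import zlib

# Hypothetical header: payload length and CRC-32, both unsigned 32-bit,
# network byte order.
HEADER = struct.Struct("!II")

def wrap(payload: bytes) -> bytes:
    """Prefix a (compressed) chunk with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def unwrap(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if the frame is damaged."""
    length, expected = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != expected:
        raise ValueError("corrupt or truncated frame")
    return payload

frame = wrap(zlib.compress(b"<message><body>hi</body></message>"))
print(zlib.decompress(unwrap(frame)))
```

A CRC only catches accidental damage. Guarding against deliberate tampering would need a keyed hash, which is part of why this starts to look like SSL.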
> So for bandwidth and processor usage, this does a lot better than I
> expected compared to my original run, but now we are just a few steps away
> (credential verification, key exchange, and stream encryption) from
> re-doing SSL.
> ----- Forwarded by Michael F Lin/Cambridge/IBM on 01/05/2002 11:38
> From: Michael F Lin/Cambridge/IBM at IBMUS
> To: jdev at jabber.org
> Date: 01/04/2002 09:26 PM
> Subject: Re: [JDEV] GZipping Jabber Messages
> Hi Adam, I looked over some of the DotGNU mailing list archives at the
> discussion you are referring to.
> One person from DotGNU says
> At the end of the day, it is easier to just gzip it and forget about
> the problem. No data loss, and roughly the same level of
> compaction. Highly redundant data like XML compresses
> very well. For example, the 6 Mb All.xml file for the C#
> library specification compresses to ~630k using gzip: about
> 10% of the original size.
> I believe this is misleading in the context of realtime XML streams (e.g.
> Jabber; SOAP; presumably, whatever DotGNU will use) because you are not
> compressing 6Mb of data at once. Rather you are compressing small packets,
> a few hundred bytes in length in the case of Jabber, and then transmitting
> them individually. I ran some tests to see how gzip performs under these
> conditions.
> I wrote a program which generates random Jabber <message/> packets. The
> body of each message is formed by randomly selecting between 1 and 25
> words from a 10,000-word English language dictionary file. For each test vector,
> the program runs zlib compress, level 9, on it (equivalent [I think] to
> gzip with maximum compression), then records the compressed size and the
> original size. It repeats this until at least 1 million bytes of
> uncompressed data has been processed.
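
The test program itself was not posted, so the following is only a rough reconstruction of the procedure described above, written in Python. The tiny WORDS list stands in for the 10,000-word dictionary file, and the packet template is a guess; with so few distinct words the ratio it prints will not match the figures in this thread.

```python
import random
import zlib

# Stand-in for the 10,000-word dictionary file (assumption).
WORDS = ["hello", "jabber", "server", "message", "stream", "packet",
         "client", "roster", "presence", "network", "tomorrow", "coffee"]

def random_message(rng):
    # Body of 1 to 25 randomly chosen words, wrapped in a hypothetical
    # <message/> template.
    body = " ".join(rng.choice(WORDS) for _ in range(rng.randint(1, 25)))
    return (f"<message to='someone@example.org' type='chat'>"
            f"<body>{body}</body></message>").encode()

def run(target_bytes=1_000_000, seed=0):
    """Compress packets one at a time until target_bytes of raw data is seen."""
    rng = random.Random(seed)
    raw = packed = 0
    while raw < target_bytes:
        packet = random_message(rng)
        raw += len(packet)
        packed += len(zlib.compress(packet, 9))  # level 9, one packet at a time
    return raw, packed

raw, packed = run(100_000)
print(f"{raw} raw bytes, {packed} compressed bytes, {1 - packed / raw:.0%} saved")
```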
> The results from about a dozen runs of this program are very consistent: a
> compression ratio of about 17% (the output is roughly 17% smaller than the
> input) in 7 seconds of runtime. A typical result is
> 1,000,011 total bytes of raw data; 830,654 bytes of compressed data.
> If I comment out the code to compress the test vectors, and leave the code to
> generate the test vectors, the program runs in less than 1 second.
> [This was run on]
> athena% uname -a
> SunOS department-of-alchemy.mit.edu 5.8 Generic_108528-08 sun4u sparc
> Obviously these are preliminary and nonscientific results only, and there
> are other factors to consider with Jabber, such as the likelihood
> previously mentioned that the XML processing is going to be the limiting
> factor in processor time. I find the topic quite interesting, however, so
> am going to fiddle around with it over the next few days and see if I can
> get it to do better with custom deflate dictionaries and such. Hopefully I
> will even find time to write something on the topic and post it with my
> source code. However, based on these initial results I am very wary of
> gzipping instant messaging XML because of the apparent high processing
> cost and mediocre compression ratio. I will continue to test but my hypothesis
> is that gzip or any generic compression algorithm is going to be very
> mediocre for Jabber as instant messaging.
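
For the custom deflate dictionary idea mentioned above, a sketch using the zdict parameter of Python's zlib (which maps to zlib's preset dictionary feature). The dictionary contents here are an assumption: a few byte strings that a Jabber message is likely to contain. Both ends would have to agree on the exact same dictionary in advance.

```python
import zlib

# Hypothetical preset dictionary of boilerplate expected in most packets.
ZDICT = b"</body></message><message to='' from='' type='chat'><body>"

def deflate(packet, zdict=None):
    c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return c.compress(packet) + c.flush()

def inflate(data, zdict=None):
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(data) + d.flush()

packet = (b"<message to='a@example.org' from='b@example.org' type='chat'>"
          b"<body>hi</body></message>")

plain = deflate(packet)
primed = deflate(packet, ZDICT)
assert inflate(primed, ZDICT) == packet
print(f"raw={len(packet)} plain={len(plain)} with_dictionary={len(primed)}")
```

Unlike keeping the deflate state alive across packets, each packet here still decompresses on its own, so a damaged packet does not poison the ones that follow.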
> From: Adam Theo <adamtheo at theoretic.com>
> Sent by: jdev-admin at jabber
> To: jdev <jdev at jabber.org>
> Date: 01/04/2002 03:32
> Subject: [JDEV] GZipping
> Please respond to
> Hi, all. There's a good discussion going on over at the DotGNU Developer
> list about gzip'ing the XML that is transmitted around on the DotGNU
> Was wondering if it would be possible to incorporate the same thing for
> future versions of the Jabber server? Is it feasible, anyway? They are
> saying the trade-offs for extra resource consumption would not be bad at
> all if designed into the server properly, and would reduce bandwidth
> very dramatically (like by 80%, i think). This would be useful for
> high-volume servers with enough processing power, i think...
> /\ -- Adam Theo, Age 22, Tallahassee FL USA --
> //\\ Theoretic Solutions (http://www.theoretic.com)
> /____\ "Software, Internet Services and Advocacy"
> /--||--\ Personal Website (http://www.theoretic.com/adamtheo)
> || Jabber Open IM (http://www.jabber.org)
> || Email & Jabber: adamtheo at theoretic.com
> || AIM: AdamTheo2000 ICQ: 3617306 Y!: AdamTheo2
> "A free-market socialist computer geek patriotic American buddhist."
> jdev mailing list
> jdev at jabber.org