[Standards] Fun and games with Huffman

Dave Cridland dave at cridland.net
Tue Aug 4 15:13:59 UTC 2009


On Tue Aug  4 10:39:53 2009, Christoph Terwelp wrote:
> Hi,
> 
> Am 04.08.2009 um 11:00 schrieb Dave Cridland:
> 
>> So, I've been playing with compression algorithms, by way of fun  
>> and  games.
> 
> Did you have a look at EXI ( http://www.w3.org/XML/EXI/ ), It's the  
>  W3Cs approach to compress XML. It may be of interest to you.

I have looked at EXI on multiple occasions, yes. One of the things I  
didn't like was that it's schema-informed, which doesn't work for  
XMPP, where there are multiple schemas, often unknown at the session  
commencement. Absent a schema, it essentially degenerates into  
DEFLATE, and so it performs - from the figures I've seen - only  
marginally better than DEFLATE does by itself.

DEFLATE is pretty memory-heavy - it's using around 250-300k per  
session, and that's really a limiting factor for XMPP servers as far  
as I can see.

So what I'm doing here is trying to find potential algorithms that  
are sufficiently lightweight in terms of memory, but give at least  
ballpark performance with the existing XEP-0138 and TLS based DEFLATE.

As it happens, my current plans aren't working out well - on a real  
XMPP session, I'm seeing figures of:

Raw: 435379 100%
Huffman String: 238509 54.8%
Huffman XML: 214689 49.3%
Huffman XML SymDef: 178203 40.9%
Zlib String: 88366 20.3%

This is based on reading in my session's telemetry log, which  
includes sufficient information that I can replicate flushes, etc.

I'll try increasing the order, and play some fun and games with  
preset trees. And possibly switch to range compression - arithmetic  
without the patents. (Increasing the order will, in principle,  
increase the memory, but if I only create trees on demand, I may  
avoid most of them).

Dave.
-- 
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade



More information about the Standards mailing list