[Standards] Fun and games with Huffman
dave at cridland.net
Tue Aug 4 15:13:59 UTC 2009
On Tue Aug 4 10:39:53 2009, Christoph Terwelp wrote:
> Am 04.08.2009 um 11:00 schrieb Dave Cridland:
>> So, I've been playing with compression algorithms, by way of fun
>> and games.
> Did you have a look at EXI ( http://www.w3.org/XML/EXI/ ), It's the
> W3Cs approach to compress XML. It may be of interest to you.
I have looked at EXI on multiple occasions, yes. One of the things I
didn't like was that it's schema-informed, which doesn't work for
XMPP, where there are multiple schemas, often unknown at the session
commencement. Absent a schema, it essentially degenerates into
DEFLATE, and so it performs - from the figures I've seen - only
marginally better than DEFLATE does by itself.
DEFLATE is pretty memory-heavy - it's using around 250-300k per
session, and that's really a limiting factor for XMPP servers as far
as I can see.
So what I'm doing here is trying to find potential algorithms that
are sufficiently lightweight in terms of memory, but give at least
ballpark performance with the existing XEP-0138 and TLS based DEFLATE.
As it happens, my current plans aren't working out well - on a real
XMPP session, I'm seeing figures of:
Raw: 435379 100%
Huffman String: 238509 54.8%
Huffman XML: 214689 49.3%
Huffman XML SymDef: 178203 40.9%
Zlib String: 88366 20.3%
This is based on reading in my session's telemetry log, which
includes sufficient information that I can replicate flushes, etc.
I'll try increasing the order, and play some fun and games with
preset trees. And possibly switch to range compression - arithmetic
without the patents. (Increasing the order will, in principle,
increase the memory, but if I only create trees on demand, I may
avoid most of them).
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
More information about the Standards