[Standards] Deprecating XEP-0138: Stream Compression

Thijs Alkemade thijs at xnyhps.nl
Tue Oct 14 12:48:42 UTC 2014


On 9 Oct 2014, at 17:06, Peter Saint-Andre - &yet <peter at andyet.net> wrote:

> On 10/9/14, 7:59 AM, Thijs Alkemade wrote:
>> Hello all,
>> 
>> Stream compression is insecure: that was shown with CRIME and BREACH, and the
>> situation for XMPP isn't much different [1]. I think we should look at the
>> easiest way to deprecate XEP-0138 and move to something better.
>> 
>> Using a "full flush" (in zlib terms) after every stanza would solve the
>> problem, as I can't find any realistic examples where an attacker could insert
>> their own payload into the same stanza as something secret they want to know.
>> However, clients and servers have no way to negotiate a mode like that, so
>> it's not possible to reject connections that won't do a per-stanza full flush.
>> Reading draft-ietf-hybi-permessage-compression-18, I was happy to see that this
>> could be negotiated in WebSocket extension [2].
>> 
>> From my own (very small scale) tests with raw XMPP XML, it appears that full
>> flushing after every stanza yields about the same compression ratio as
>> compressing each stanza separately. Doing that would have a number of
>> advantages:
>> 
>> 1. Not having to rely on nothing leaking through the "full flush", which may
>> be a concept that compression algorithms other than zlib don't have or don't
>> implement securely enough.
>> 
>> 2. Practically no memory overhead in the server or client between messages.
>> There's no context to keep around, each new message can be decompressed with a
>> fresh new context. Memory overhead for compression is a real concern for
>> servers: one of the reasons Prosody was pushing for XEP-0138 to replace TLS
>> compression was that it's impossible to configure the memory use of TLS
>> compression to sane levels in OpenSSL.
>> 
>> However, it also has downsides. It requires either:
>> 
>> 1. That the concatenation of two compressed stanzas can be separated
>> unambiguously.
> 
> Could you explain that a bit more? For example, are you talking about compressing two stanzas and sending them in the same TCP packet?

Instead of sending:

zlib(“<message/><iq/><message/><iq>...”)

(Where you’d occasionally send the compressed data you have so far.)

You'd send:

zlib(“<message/>”) + zlib(“<iq/>”) + zlib(“<message/>”) + zlib(“<iq>”)

(Where + is concatenation.)

This is easy in zlib because it’s possible to tell when a zlib stream ends [1][2].
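
A minimal sketch of that separation in Python, relying on the unused_data
attribute from [2]. The helper names compress_each and split_streams are
hypothetical, and this only illustrates the idea; it is not a proposed wire
format:

```python
import zlib


def compress_each(stanzas):
    """Compress every stanza as its own complete zlib stream, then concatenate."""
    return b"".join(zlib.compress(s) for s in stanzas)


def split_streams(data):
    """Split concatenated zlib streams back into the individual stanzas."""
    stanzas = []
    while data:
        d = zlib.decompressobj()  # fresh context, nothing kept between stanzas
        stanzas.append(d.decompress(data))
        if not d.eof:
            # The last stream is truncated; a real receiver would wait for more bytes.
            raise ValueError("incomplete zlib stream")
        # Bytes following the end of this stream belong to the next one.
        data = d.unused_data
    return stanzas


stanzas = [b"<message/>", b"<iq/>", b"<message/>", b"<iq>"]
wire = compress_each(stanzas)
assert split_streams(wire) == stanzas
```

Each stanza is decompressed with a new decompressobj, so no compression
context survives from one stanza to the next.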

> 
>> 2. Or that we apply framing outside of compression (which I expect to be
>> another can of worms).
> 
> Yes, I'd expect so. I recall debates about framing (or the lack thereof) for XMPP on this very list from over 10 years ago. ;-)
> 
>> zlib has a header bit that indicates whether a block is the last block in a
>> stream, but again, that might be zlib-specific.
> 
> Would it be worthwhile to investigate what the various compression algorithms support here?

I've been trying to look into LZW, as described in XEP-0229, but while I
can find plenty of descriptions of the algorithm itself, I can't find much about
the output encoding. Most of the LZW APIs I've seen also have no flush method
or anything similar.

Regards,
Thijs

[1] = http://zlib.net/manual.html:

"If the flush parameter is Z_FINISH, the remaining data is written and the
gzip stream is completed in the output. If gzwrite() is called again, a new
gzip stream will be started in the output. gzread() is able to read such
concatented gzip streams."

[2] = https://docs.python.org/2/library/zlib.html#zlib.Decompress.unused_data
