[Standards] XEP-0115 redux
Dave Cridland
dave at cridland.net
Thu Jan 10 04:31:19 CST 2008
FWIW, some input, and my opinion.
On Wed Jan 9 23:14:42 2008, Peter Saint-Andre wrote:
> ISSUE #1: Do we need a new namespace?
>
> Description: We have changed things around so radically since
> version 1.3 [2] of the spec that maybe we need a new namespace (as
> we did for the Entity Time protocol).
>
> Discussion: Yes we could do this, but then we'd have two separate
> entity capabilities notations in every presence notification that
> every user sent over the network, thus violating one of the
> requirements of XEP-0115 ("minimize network impact"). Therefore we
> have bent over backwards to not define a new namespace. The result
> is not the prettiest protocol in the world, but it doesn't break
> anything.
>
> My conclusion: I am opposed to defining a new namespace.
>
>
Equally, with the current design - and I agree it's ugly, and may
offend purists - we have the neat trick that it degrades gracefully
to disco in three key cases:
1) If the sender doesn't understand hashes, and therefore doesn't use
them.
2) If the receiver doesn't understand hashes, and therefore ignores
them.
3) If the sender uses a hash that the receiver doesn't understand,
even though the receiver *does* understand hashes in general.
The latter is key to our "hash agility" story, incidentally, as it
allows graceful fallback in the case where we're forced into using
hash agility.
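To make the three fallback cases concrete, here is a rough sketch of the decision a receiver might make. The function name, the KNOWN_HASHES table and the "disco"/"verify-hash" return values are all mine, purely for illustration; none of them come from the spec.

```python
# Hash names this hypothetical receiver can compute (an assumption,
# not a list taken from XEP-0115).
KNOWN_HASHES = {"sha-1", "md5"}

def caps_strategy(caps_attrs, receiver_understands_hashes=True):
    """Decide how to learn a contact's features from the caps attributes.

    Returns "verify-hash" when the advertised hash can be checked locally,
    otherwise "disco", meaning fall back to an ordinary disco#info query.
    """
    if not receiver_understands_hashes:
        # Case 2: an old receiver ignores the 'hash' attribute entirely.
        return "disco"
    hash_name = caps_attrs.get("hash")
    if hash_name is None:
        # Case 1: the sender doesn't understand hashes, so sends none.
        return "disco"
    if hash_name.lower() not in KNOWN_HASHES:
        # Case 3: the sender uses a hash this receiver doesn't implement.
        return "disco"
    return "verify-hash"

print(caps_strategy({"node": "http://psi-im.org/", "hash": "sha-256"}))
```

The point is only that every unhappy path lands on the same plain disco query, so nothing breaks when the two ends disagree about hashes.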
> ISSUE #2: Should the 'v' attribute be REQUIRED?
>
> Description: The 'ver' attribute was REQUIRED in version 1.3 [2] of
> the spec. In a late change made to version 1.4 [3] of the spec
> during the Council meeting at which version 1.4 was approved, we
> suggested that the value of the 'node' should be
> "ProductURL#ProductVersion" (e.g., "http://psi-im.org/#0.11") but
> we agreed that this would *not* be REQUIRED or even officially
> RECOMMENDED. In the proposed version 1.5 [4] of the spec, we added
> a new attribute 'v' to encapsulate the software version, but it is
> only RECOMMENDED, *not* REQUIRED.
>
> Discussion: Some people on the list objected strenuously to the
> late change made to version 1.4 [3] which suggested that the 'node'
> attribute should encapsulate the ProductVersion. Therefore the list
> consensus was that the 'node' attribute should be the ProductURL
> not including the ProductVersion, and that we would define a new
> attribute 'v' to communicate the ProductVersion; however, the list
> consensus was that this attribute would *not* be REQUIRED but
> instead only RECOMMENDED (some people argued for making it OPTIONAL
> or removing it altogether, but we settled on RECOMMENDED).
>
> My conclusion: Leave version 1.5 [4] as it is now, with 'v'
> RECOMMENDED but *not* REQUIRED. (In fact I would not object to
> making it OPTIONAL, but RECOMMENDED seems closest to the prior list
> consensus.)
>
>
Conflicting arguments here. As a not-really-client developer (I do
have a client, but even I don't use it), I hold no strong opinion.
1) The old spec did have a version, held in ver, so the new version
is to this extent a regression.
2) Exposing your client software version is a potential security
issue.
If I had to state an opinion, I'd say that if you wanted to hide your
software version in "Classic" XEP-0115, it was pretty easy to
obfuscate the ver attribute, whereas making v optional (whether
OPTIONAL or RECOMMENDED) does at least make this choice explicit.
> ISSUE #3: Which hashing algorithms?
>
> Description: The Council discussion seemed to assume that version
> 1.5 [4] says SHA-1 is mandatory-to-implement ("MTI"). In fact,
> version 1.5 does not mandate implementation of any specific
> algorithm. Be that as it may, some Council members suggested that
> we recommend MD5 instead of SHA-1 (the only concrete reason I heard
> in the meeting is that MD5 output is smaller).
>
>
(Kind of. One issue is that MD5 might actually be more secure.)
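On the size point itself, a quick sketch of what "smaller" amounts to. This assumes the digest is carried base64-encoded in the attribute, and the helper name and sample input are made up for the example.

```python
import base64
import hashlib

def encoded_length(algorithm: str) -> int:
    """Characters needed for a base64-encoded digest of the given algorithm."""
    digest = hashlib.new(algorithm, b"example feature list").digest()
    return len(base64.b64encode(digest))

# Raw digest size in bytes, then base64 length in characters.
for algorithm in ("md5", "sha1", "sha256"):
    print(algorithm, hashlib.new(algorithm).digest_size, encoded_length(algorithm))
```

Under that assumption MD5 comes out at 24 characters against 28 for SHA-1 and 44 for SHA-256, so the saving over SHA-1 is four characters per presence stanza.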
> Discussion: As far as I can see, we had consensus not to mandate
> any particular hashing algorithm, but instead to allow any
> algorithm that is registered with the IANA [5]. Currently the
> registered algorithms are md2, md5, sha-1, sha-224, sha-256,
> sha-384, and sha-512. However, we seemed to have list consensus
> that most people would use SHA-1 at the beginning (SHA-1 is the
> default value of the 'hash' algorithm in the currently-approved
> version 1.4 [3] of the spec), and perhaps switch to SHA-256 in the
> future if it is shown that pre-image attacks (see RFC 4270) are
> likely against SHA-1. That said, people *could* implement MD5 if
> they want to because it is registered with the IANA.
>
>
Note that RFC4270 is a fairly extensive survey by two experienced
security chaps: Paul Hoffman, who runs the VPN Consortium, and Bruce
Schneier, whose name ought to be familiar to people interested in
crypto and security.
Note also that whilst it describes some progress made in preimage
weaknesses in SHA-1, none are mentioned for either SHA-2 (that's
SHA-256, SHA-512, etc.) or MD5. MD5 has had a lot of cryptanalysis -
you'll note that more researchers are producing papers on it than any
other hash algorithm, and this isn't entirely down to relative
strength compared to SHA-* - it's more down to the fact that MD5 has
considerably larger deployment, and so is a more attractive hash to
analyse.
The fact that after this length of time, nobody appears to have found
a preimage attack on it is pretty gratifying. MD5 *is* demonstrably
weak in two areas:
1) Challenge-Response password hashing, for example in CRAM-MD5. Not
because of a mathematical weakness, but because you can brute force
things too fast, across the entire, fairly limited, space of a
password. This doesn't affect us for the twin reasons that:
a) Our space is much bigger.
b) The space we have is quite rigid in format.
2) Collisions, and from there signature algorithms. This is where you
come up with two inputs that produce an identical output. This is
useful if:
a) You get to choose both inputs. (Our poisoner cannot).
b) There is scope for adding random junk somewhere. (Likewise).
In theory, you can do a collision without random junk, but it would
take considerably longer. Also important to note is that this has no
impact on whether we're more likely to find inadvertent collisions
with MD5. In theory, the shorter hash length will have an impact,
simply by the birthday "paradox", but it's still pretty rare.
But it's not weak in preimage attacks - those where the attacker
knows the hash, and/or the input, and wishes to construct an
alternate input of their choosing which matches.
In order to perform caps poisoning with MD5, therefore, the attacker
must:
i) Subvert the development process of the client.
ii) Optionally, to cover his tracks, subvert the XSF, thus allowing
the attacker to have some control over what counts as legitimate
input, thus reducing, to a degree, the random junk problem.
You'll note that Kevin Smith is in a position to do both, but no
other person or entity is, throughout the entire world.
And anyone in either position is capable of inflicting a
significantly higher damage by choosing to do some easier attack - if
the developer of your client turns out to be an Evil Genius, you're
henceforth Doomed. Similarly, if Council members wish to subtly
undermine your security, we are in a position to do that.
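For anyone who wants to see the collision versus preimage distinction in practice, here is a toy sketch. It truncates MD5 to 16 bits so that brute force finishes quickly; the truncation, the function names and the sample input are all my own illustration and say nothing about full-length MD5 beyond the shape of the problem.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes (16 bits) of MD5: deliberately weak, for demonstration only."""
    return hashlib.md5(data).digest()[:2]

def find_collision():
    """Attacker chooses both inputs: stop at the first pair sharing a tiny_hash."""
    seen = {}
    for tries in count(1):
        data = str(tries).encode()
        h = tiny_hash(data)
        if h in seen:
            return seen[h], data, tries
        seen[h] = data

def find_second_preimage(original: bytes):
    """Attacker must match one fixed hash, which is the caps poisoning situation."""
    target = tiny_hash(original)
    for tries in count(1):
        data = str(tries).encode()
        if data != original and tiny_hash(data) == target:
            return data, tries

a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"legitimate feature list")
print("collision after", collision_tries, "tries")
print("second preimage after", preimage_tries, "tries")
```

With a 16-bit hash you'd expect a collision after a few hundred tries and a second preimage only after something on the order of 2^16, and that gap grows enormously at full hash length.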
> 3a. Do we specify an MTI algorithm or let the market decide?
>
>
I think we need an MTI; I have to admit I'd read the current text as
essentially stating that SHA-1 was the MTI.
> 3b. If we specify an MTI algorithm, do we specify MD5 or SHA-1 or
> something else?
What concerns me is not that SHA-1 is a particularly poor choice, but
that we may have reached that choice by applying faulty logic. SHA-1
does appear to have *some* weakness in preimage. I don't know if,
given the similarity between SHA-1 and SHA-2, this also applies
there, but I cannot find any mention of preimage weakness in MD5.
I'll drop my objection if people want - in fact, I'll drop it if the
other two issues are resolved, but I would like people to take the
opportunity to satisfy themselves that they've made the right choice
in the face of the evidence.
So, some reading:
1) RFC4270 is an excellent backgrounder on the different attacks on
hashes, and how these affect real-world protocols.
2) Wikipædia is helpful, too:
http://en.wikipedia.org/wiki/Birthday_attack demonstrates that we
need around 2.2 x 10^19 possible inputs for MD5 before an inadvertent
collision is more likely than 50%, assuming that these are randomly
spread. (They aren't, so this is in effect a worst case).
http://en.wikipedia.org/wiki/Preimage_attack,
http://en.wikipedia.org/wiki/Cryptographic_hash_function, both give
detailed background.
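The 2.2 x 10^19 figure can be checked with the usual birthday approximation, n = sqrt(2 * N * ln(1 / (1 - p))), where N is the number of possible hash outputs. A small sketch follows; the function name is mine, and it assumes uniformly random inputs, as above.

```python
import math

def birthday_threshold(bits: int, p: float = 0.5) -> float:
    """Approximate number of random inputs needed before the chance of any
    collision among 2**bits possible outputs reaches p."""
    n_outputs = 2 ** bits
    return math.sqrt(2 * n_outputs * math.log(1 / (1 - p)))

for name, bits in [("md5", 128), ("sha-1", 160), ("sha-256", 256)]:
    print(f"{name}: {birthday_threshold(bits):.1e}")
```

For 128 bits this gives roughly 2.2e+19, matching the number quoted from the Wikipedia page.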
Finally, of course, feel free to bug me by XMPP or email. :-)
Dave.
--
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade