[xmppwg] Review of draft-meyer-xmpp-e2e-encryption-01
ekr at rtfm.com
Fri Mar 20 11:30:51 CDT 2009
$Id: draft-meyer-xmpp-e2e-encryption-01-rev.txt,v 1.1 2009/03/20
15:35:56 ekr Exp $
The context of this draft is that currently messages in XMPP from
alice at atlanta.com to bob at biloxi.net go through Alice and Bob's
respective servers (atlanta.com and biloxi.net) in transit. This
implies that Alice and Bob need to trust their servers both to enforce
appropriate security policies (i.e., to make sure there is TLS along
the whole path if appropriate) and not to actively subvert security, for
example by message sniffing or injection. The purpose of this document is
to allow Alice and Bob to establish an end-to-end secure cryptographic
channel that does not rely on the server for security.
Before talking about the draft details, it's important to get clear on
the threat model. In particular, we need to be clear on how much the
server is trusted. There are at least three plausible models:
- The server is trusted completely (the current system).
- The server is trusted to authenticate Alice and Bob,
but should not see the traffic.
- The server is not trusted at all.
Clearly, we're trying to do better than the first of these, so it's
between the second two. For contrast, in SIP (cf. RFC 4474) the basic
assumption is that the proxy (the server) owns the namespace
associated with it. So, for instance, if atlanta.com decides it wants
to take the name "alice at atlanta.com" away from Alice and give it to
her sister "Alice", it can. So, the proxy is trusted to authenticate
Alice, but shouldn't see the traffic, i.e., the second model.
The security requirements for these two are different. In particular,
in the third model, where the server is not trusted at all, you need
some independent mechanism for Alice and Bob to authenticate each other.
I think it's important to be clear on which of these environments
you think is the dominant one. I'm sure there are *some* cases
where people don't trust the servers at all, but I suspect in most cases
they just want (1) not to have to trust the server to enforce security
policy and (2) to deter casual sniffing by server operators. In these
cases, a model where the server authenticates the users for an
E2E connection (a la DTLS-SRTP) is appropriate. If that's a common
model, then forcing all users to use a secure independent channel
just because some want to is going to be a very serious inconvenience.
My instinct is that that's a mistake.
The design of a system in which the servers vouch for the users' identities
is fairly straightforward, with DTLS-SRTP as a model: the servers simply
authenticate the users and then pass on digests of the users' certificates
(as provided by the users) along with an authenticated indication of each
user's identity (a la RFC 4474 or even the current TLS model), and the
certificates presented on the end-to-end connection are compared against
these fingerprints.
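To make the comparison step concrete, here is a minimal sketch in Python. It assumes the fingerprint is a SHA-256 digest of the DER-encoded certificate written as colon-separated hex (the convention DTLS-SRTP signaling uses); the function names are hypothetical and nothing here is taken from the draft.

```python
import hashlib
import hmac


def cert_fingerprint(cert_der: bytes) -> str:
    """SHA-256 digest of a DER-encoded certificate as colon-separated hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def peer_matches_signaled_fingerprint(peer_cert_der: bytes, signaled: str) -> bool:
    """Compare the certificate seen on the end-to-end connection with the
    fingerprint that was relayed (and vouched for) by the servers."""
    actual = cert_fingerprint(peer_cert_der)
    # Normalize case and whitespace, then compare in constant time.
    return hmac.compare_digest(actual, signaled.strip().upper())
```

With the standard ssl module, the DER bytes could come from SSLSocket.getpeercert(binary_form=True). The check is only as strong as the path that carried the signaled fingerprint, which is exactly why this design fits the trusted-server model.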
As noted above, the design of a system in which the servers aren't trusted
is significantly more complicated. Roughly speaking, there are three
major techniques available here: key/certificate fingerprints, a
shared password, and a short authentication string. See
some background here.
I think it's generally agreed that fingerprints are too much of a hassle
for regular use, though if your model was that most users would be
happy without it, then you might think that they would be OK for
the exceptionally paranoid.
This leaves us with SAS and shared passwords. The important interface
differences here are as follows:
- The SAS must be verified *after* the connection is set up. The password
must be set up beforehand.
- You can use the same password with multiple people semi-safely.
The SAS is new for every message.
- SAS probably requires modifying TLS. There are existing mechanisms
for passwords in TLS (PSK and SRP).
- The SAS is "optional" in the sense that you can generate it and not
check it. The password is "mandatory" in the sense that if it's
specified, it must be supplied or the connection will not be set up.
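For readers who haven't seen a SAS before, the sketch below shows roughly what gets compared. It assumes both endpoints can extract the same session-specific secret once the connection is up (how that value is obtained is left open), and the label and function name are hypothetical. A real design also needs a commitment step so that an attacker cannot search for key material that yields a matching short string; that is omitted here.

```python
import hashlib


def short_auth_string(session_secret: bytes, digits: int = 6) -> str:
    """Derive a short decimal string from a per-connection secret.

    Each user reads the string to the other over an independent channel
    (e.g. by voice); a mismatch indicates a man in the middle.
    """
    # Hypothetical domain-separation label; not defined by any spec.
    digest = hashlib.sha256(b"sas-sketch" + session_secret).digest()
    value = int.from_bytes(digest[:4], "big") % (10 ** digits)
    return f"{value:0{digits}d}"
```

Because the string is derived from the established connection, it can only be checked afterwards, and nothing in the protocol forces the users to check it at all, which is the sense in which it is "optional".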
Passwords can be further broken down into two variants: ZKPP/PAKE
schemes and ordinary PSK schemes. The relevant differences between
these two are that PSK schemes are susceptible to offline dictionary
attack but that ZKPP/PAKE schemes have a much more problematic IPR
situation.
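To illustrate the offline dictionary attack on password-derived PSKs, here is a toy sketch. It assumes a made-up handshake in which the key is derived from the password with PBKDF2 and a MAC over a public transcript is visible to an eavesdropper; the function names, the iteration count, and the handshake itself are hypothetical and do not describe TLS-PSK on the wire.

```python
import hashlib
import hmac


def derive_key(password: str, salt: bytes) -> bytes:
    """Password-derived pre-shared key (toy parameters)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 1000)


def handshake_mac(key: bytes, transcript: bytes) -> bytes:
    """MAC over the public handshake transcript, as an eavesdropper sees it."""
    return hmac.new(key, transcript, hashlib.sha256).digest()


def offline_guess(salt, transcript, observed_mac, wordlist):
    """Test password guesses against one captured handshake, with no
    further interaction with either party."""
    for guess in wordlist:
        candidate = handshake_mac(derive_key(guess, salt), transcript)
        if hmac.compare_digest(candidate, observed_mac):
            return guess
    return None
```

The point is that a single observed handshake gives the attacker a verifier for unlimited guesses. A ZKPP/PAKE exchange is designed so that each guess requires a fresh online interaction instead.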
Finally, there is the question of where the authentication is done.
As I noted above, TLS has existing PSK and SRP mechanisms. However,
one could also add at least password and PAKE mechanisms to
SASL if one wanted and use a channel binding to connect the two.
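A rough sketch of what "use a channel binding to connect the two" means, assuming each endpoint can obtain a value that is unique to its TLS connection (in Python, ssl.SSLSocket.get_channel_binding("tls-unique") returns one). The function names and message layout are hypothetical, and a plain shared key is used only to keep the example short; a PAKE would replace that part.

```python
import hashlib
import hmac


def bound_proof(shared_key: bytes, nonce: bytes, channel_binding: bytes) -> bytes:
    """MAC the nonce together with the TLS channel binding value."""
    # Length-prefix the nonce so the two fields cannot be confused.
    msg = len(nonce).to_bytes(2, "big") + nonce + channel_binding
    return hmac.new(shared_key, msg, hashlib.sha256).digest()


def verify_bound_proof(shared_key: bytes, nonce: bytes,
                       local_channel_binding: bytes, proof: bytes) -> bool:
    """Accept only if the peer's proof covers the TLS channel we see locally."""
    expected = bound_proof(shared_key, nonce, local_channel_binding)
    return hmac.compare_digest(expected, proof)
```

If an attacker terminates TLS in the middle, the two endpoints see different channel binding values, so a proof computed on one side fails to verify on the other even though the password is correct.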
In this specification we primarily address communications security
("commsec") between two parties, especially confidentiality, data
It's probably a bad idea to use the term "session" here because it has
a technical meaning in TLS.
o A more sophisticated active attack would involve a cryptanalytic
attack on the keying material or other credentials used to
establish trust between the parties, such as an ephemeral password
exchanged during an initial certificate exchange if Secure Remote
Password [TLS-SRP] is used.
The whole point of TLS-SRP is that this kind of attack is not practical.
o Perfect forward secrecy. The content of an encrypted
communication should not be revealed even if long-lived keys are
compromised in the future (e.g., if one of the parties loses their
device). For long-lived sessions it should be possible to
periodically change the decryption keys.
So, this is actually two different features. And it's not clear to
me that either of them is a real requirement. There are monitoring
applications where you don't want PFS, for instance.
4. When an entity wishes to encrypt its communications with a second
entity, it sends a Jingle session-initiate request that specifies
the desired application type, a possible transport, the sender's
X.509 fingerprint, and optionally hints about the sender's
supported TLS methods.
So, I think the fingerprint here represents a bit of confusion vis-a-vis
my previous notes about the threat model. It principally makes sense
if you trust the servers... So, it's important to be clear on what
we're trying to achieve.
I'm pretty skeptical of this stuff where we signal what cipher suites
we're willing to use in XMPP. The TLS negotiation design is
reasonably complex and trying to replicate it (or worse, summarize it)
here seems problematic.
It's important to remember that TLS has a maximum record size of
2^14 bytes or so (16384 bytes of plaintext per record). If an XMPP stanza
might be larger than this, we'll need to let stanzas span records.
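As a small illustration of what spanning records implies, the sketch below splits one serialized stanza into fragments no larger than the TLS plaintext limit and joins them again. The function names are hypothetical. In practice a TLS library does this fragmentation itself on write, so the real consequence is for the receiver, which cannot assume that one record carries exactly one stanza.

```python
from typing import Iterable, List

MAX_TLS_PLAINTEXT = 2 ** 14  # 16384 bytes of plaintext per TLS record


def split_into_records(stanza: bytes, limit: int = MAX_TLS_PLAINTEXT) -> List[bytes]:
    """Split a serialized stanza into fragments that each fit in one record."""
    return [stanza[i:i + limit] for i in range(0, len(stanza), limit)]


def reassemble(fragments: Iterable[bytes]) -> bytes:
    """Concatenate record payloads back into the original byte stream."""
    return b"".join(fragments)
```

The receiver would feed the reassembled byte stream to its XML parser and let the XML framing, not the record boundaries, decide where a stanza ends.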
It seems like an open question whether one should send a TLS Finished
to close things down.
TLS-SRP is Experimental, so that may produce an RFC 2026 dependency problem.
SCRAM is a pure PSK method, so very inferior to SRP cryptographically.
As I said above, you could do a PAKE method in SASL, but SCRAM is not one.