[Standards] Jingle / e2e security (1)

Peter Saint-Andre stpeter at stpeter.im
Tue Jan 13 23:16:23 UTC 2009

First, I am not a member of the security mafia (IANAMOTSM?), so question
everything I say here.

Second, please send follow-ups to the security at xmpp.org list.

I've been researching Jingle (and, more generally, end-to-end) security.
The landscape is a bit confused, so I'm attempting to clarify things (at
least in my own mind). As far as I can see, we are interested in several
goals:

1. Most pressingly, proper negotiation of a secure data transport for
voice and video (or, more generally, any RTP traffic per XEP-0167).

2. A bit less pressingly, proper negotiation of a secure data transport
for file transfer, where the transport method could be In-Band
Bytestreams ("IBB"; XEP-0047), SOCKS5 Bytestreams (XEP-0065), etc.

3. As a generalization of #1 and #2, proper negotiation of transport
method security no matter which streaming or datagram transport is used.

4. Use of Jingle to negotiate end-to-end encryption of XMPP traffic
(a.k.a. "XTLS"), where the transport might be IBB or some other
streaming transport (this *might* simply be a special case of #2).

This email focuses mainly on Goal #1 because that's what I've researched
so far. By research I mean a reading of the following specs:

http://tools.ietf.org/html/rfc3711 (SRTP)
http://tools.ietf.org/html/rfc4347 (DTLS)

The following slide deck is also helpful (pretty pictures!):


For Goal #1, the IETF has settled on SRTP (RFC 3711) because it is
optimized for media traffic. (Another alternative would have been RTP
over DTLS, but it is not optimized in that way.) However, SRTP does not
solve the problem of communicating the keying material that will be used
in the transport channel. There are several major proposals for doing that:

- SDP Security Descriptions <http://tools.ietf.org/html/rfc4568> (this
defines the a=crypto SDP line, which is currently re-used in XEP-0167)

- ZRTP <http://tools.ietf.org/html/draft-zimmermann-avt-zrtp>

- DTLS-SRTP <http://tools.ietf.org/html/draft-ietf-avt-dtls-srtp> and
<http://tools.ietf.org/html/draft-ietf-sip-dtls-srtp-framework> (these
define the a=fingerprint SDP line and a method for using it by setting
up a DTLS association over the host/port quartet and then pulling the
SRTP keying material out of that DTLS association)
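
To make "pulling the SRTP keying material out of that DTLS association"
a bit more concrete, here is a rough sketch in Python. It assumes the
layout given in the DTLS-SRTP spec as later published (RFC 5764, Section
4.2): the exported block is client master key, server master key, client
master salt, server master salt, in that order. The function and class
names are made up for illustration, and the default lengths assume the
AES_CM_128 profile (16-byte key, 14-byte salt). Obtaining the exported
bytes from a real DTLS stack is out of scope here.

```python
from typing import NamedTuple


class SrtpMasterKeys(NamedTuple):
    """Hypothetical container for the four values DTLS-SRTP derives."""
    client_key: bytes
    server_key: bytes
    client_salt: bytes
    server_salt: bytes


def split_keying_material(material: bytes,
                          key_len: int = 16,
                          salt_len: int = 14) -> SrtpMasterKeys:
    """Split exported DTLS keying material into SRTP master keys/salts.

    Assumed order (RFC 5764, Section 4.2):
    client key, server key, client salt, server salt.
    """
    expected = 2 * (key_len + salt_len)
    if len(material) != expected:
        raise ValueError(
            "expected %d bytes, got %d" % (expected, len(material)))
    pos = 0
    parts = []
    for length in (key_len, key_len, salt_len, salt_len):
        parts.append(material[pos:pos + length])
        pos += length
    return SrtpMasterKeys(*parts)


# Dummy, non-secret bytes standing in for real exporter output.
demo_keys = split_keying_material(bytes(range(60)))
```

Each side then encrypts with its own "write" key and salt and decrypts
with the peer's, so nothing secret ever crosses the signalling channel.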

The "Requirements and Analysis of Media Security Management Protocols"
document provides an overview of these and other approaches.

According to my reading of RFC 4568, SDP Security Descriptions MUST NOT
be used unless the signalling channel (that's XMPP for us) can "provide
strong message authentication and packet-payload encryption, as well as
effective replay protection". Because we don't provide those services in
XMPP out of the box, I don't think we can securely use a=crypto (or our
XMLish flavor of a=crypto as currently described in XEP-0167). But we
might be able to use it if we negotiate XTLS (or some other e2e method)
first.
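
To illustrate why a=crypto needs a protected signalling channel, here is
a minimal Python sketch that parses one a=crypto line. It assumes the
RFC 4568 shape "a=crypto:<tag> <suite> inline:<base64 key||salt>[|...]"
with a single inline key and the AES_CM_128_HMAC_SHA1_80 lengths (16-byte
master key, 14-byte master salt). The function name and the demo key are
invented for the example. Anyone who can read the line can do exactly
what this function does.

```python
import base64

# Lengths assumed for AES_CM_128_HMAC_SHA1_80: 128-bit key, 112-bit salt.
KEY_LEN, SALT_LEN = 16, 14


def parse_crypto_line(line: str) -> dict:
    """Extract the SRTP master key and salt from one a=crypto line.

    Simplified: handles a single "inline:" key and ignores any
    lifetime/MKI fields and session parameters.
    """
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto line")
    fields = line[len(prefix):].split()
    if len(fields) < 3:
        raise ValueError("expected tag, crypto-suite and key-params")
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled here")
    # The base64 blob is the master key concatenated with the master salt.
    key_salt = base64.b64decode(key_info.split("|")[0])
    if len(key_salt) != KEY_LEN + SALT_LEN:
        raise ValueError("unexpected key||salt length")
    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": key_salt[:KEY_LEN],
        "master_salt": key_salt[KEY_LEN:],
    }


# Dummy, non-secret key material used only to build a demo line.
demo_material = bytes(range(KEY_LEN + SALT_LEN))
demo_line = ("a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:"
             + base64.b64encode(demo_material).decode("ascii") + "|2^20")
parsed = parse_crypto_line(demo_line)
```

The master key comes straight out of the signalling payload, which is why
the signalling path has to be encrypted and authenticated end to end
before this approach is safe.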

That leaves ZRTP or DTLS-SRTP. ZRTP is completely independent of the
signalling channel (or can be, see Section 8 of the ZRTP spec), so we
don't need to define anything in Jingle to support it. However, we could
provide some hints in the Jingle signalling.

For DTLS, we'd need to define an XMPP-friendly mapping of the SDP
a=fingerprint line and the various SDP parameters discussed in
http://tools.ietf.org/html/draft-ietf-sip-dtls-srtp-framework and
http://tools.ietf.org/html/draft-ietf-avt-dtls-srtp -- but this seems
fairly straightforward.
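
As a rough sketch of what such a mapping might involve, the Python below
computes a certificate fingerprint in the a=fingerprint style (hash of
the DER-encoded certificate, upper-case hex, colon-separated) and wraps
it in an XML element. The element name, the "hash" attribute, and the
namespace are purely hypothetical placeholders, not anything defined in
a XEP. The certificate bytes are a dummy value.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical namespace, invented for this illustration only.
EXAMPLE_NS = "urn:example:jingle:dtls-fingerprint"


def sdp_fingerprint(cert_der: bytes, hash_name: str = "sha256") -> str:
    """Return the hash of a DER certificate as colon-separated hex."""
    digest = hashlib.new(hash_name, cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def parse_fingerprint_line(line: str) -> tuple:
    """Split 'a=fingerprint:<hash-func> <value>' into its two parts."""
    prefix = "a=fingerprint:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fingerprint line")
    hash_func, value = line[len(prefix):].split(None, 1)
    return hash_func.lower(), value.strip().upper()


def fingerprint_to_xml(hash_func: str, value: str) -> ET.Element:
    """Build a hypothetical XML rendering of the fingerprint."""
    element = ET.Element("{%s}fingerprint" % EXAMPLE_NS,
                         {"hash": hash_func})
    element.text = value
    return element


# Dummy bytes standing in for a real DER-encoded certificate.
demo_value = sdp_fingerprint(b"not a real certificate")
demo_line = "a=fingerprint:sha-256 " + demo_value
demo_element = fingerprint_to_xml(*parse_fingerprint_line(demo_line))
```

The fingerprint only authenticates the DTLS handshake if the peer can
trust the channel that delivered it, so the integrity of the signalling
path still matters here.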

I have not yet sketched out any of the Jingle (or more general XMPP)
protocol bits to make this happen, but I figured I would share the
fruits of my research so far.

Please do correct me where I'm wrong.

More soon.

