[Security] Jingle / e2e security (2)
stpeter at stpeter.im
Wed Jan 14 17:23:26 CST 2009
The first installment in this series was about VoIP security. Now I turn
my attention to e2e XMPP security. The usual caveats apply (IANAMOTSM).
The approach here is that you start with an insecure channel and then
upgrade it to a secure one.
E.g., we do that today for both client-to-server ("c2s") and
server-to-server ("s2s") connections, where the insecure channel is an
XML stream over TCP and the stream is secured using STARTTLS -- see
For direct client-to-client ("c2c") communication where two entities
communicate over a local or wide-area network with no server
infrastructure in place (Serverless Messaging =
<http://xmpp.org/extensions/xep-0174.html>), the insecure channel is an
XML stream over TCP, and the stream can be secured using STARTTLS just
as for c2s and s2s.
For end-to-end communication where two entities communicate over XMPP
through one or two intermediate servers, the insecure channel is XMPP
itself (typically in the form of In-Band Bytestreams =
<http://xmpp.org/extensions/xep-0047.html>) or potentially some
out-of-band streaming transport (such as SOCKS5 Bytestreams =
<http://xmpp.org/extensions/xep-0065.html> or someday ICE-TCP), and here
again the stream can be secured using STARTTLS.
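To make "XMPP itself as the transport" a bit more concrete, here is a
minimal Python sketch of how raw stream bytes could be carried in
In-Band Bytestreams <data/> elements per XEP-0047: each chunk is
base64-encoded, tagged with the session ID, and given a 16-bit sequence
number that wraps to 0 after 65535. The helper names, session ID, and
block size are illustrative assumptions (block size is assumed to count
raw bytes before encoding). A real client would also send the IBB
<open/> request and wrap each <data/> in an IQ (or message) stanza.

```python
import base64
import xml.etree.ElementTree as ET

IBB_NS = "http://jabber.org/protocol/ibb"


def ibb_data_elements(payload: bytes, sid: str, block_size: int = 4096) -> list:
    """Split raw stream bytes into XEP-0047 <data/> elements (hypothetical helper).

    block_size is the number of raw bytes per chunk, assumed to match the
    block size agreed in the IBB <open/> request.
    """
    elements = []
    for index, start in enumerate(range(0, len(payload), block_size)):
        chunk = payload[start:start + block_size]
        data = ET.Element("data", {
            "xmlns": IBB_NS,            # plain attribute keeps serialization simple
            "seq": str(index % 65536),  # 16-bit counter, wraps to 0 after 65535
            "sid": sid,
        })
        data.text = base64.b64encode(chunk).decode("ascii")
        elements.append(data)
    return elements


def reassemble(elements: list) -> bytes:
    """Receiver side: decode chunks back into stream bytes (assumes in-order arrival)."""
    return b"".join(base64.b64decode(e.text) for e in elements)


if __name__ == "__main__":
    header = (b"<stream:stream xmlns='jabber:client' "
              b"xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>")
    packets = ibb_data_elements(header, sid="i781hf64", block_size=16)
    for p in packets:
        print(ET.tostring(p, encoding="unicode"))
    assert reassemble(packets) == header
```

The XML stream that later gets upgraded with STARTTLS is simply the
concatenation of these decoded chunks, so the same stream logic can run
over IBB or over a SOCKS5 bytestream.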
So we have 4 cases: c2s, s2s, c2c, and e2e. In all of them, we start
with an insecure channel and upgrade it to secure using STARTTLS.
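The common step in all four cases is the STARTTLS exchange from RFC
3920: the receiving entity advertises <starttls/> in its stream
features, the initiating entity sends <starttls/>, and the receiving
entity answers with <proceed/> (go ahead with the TLS handshake, then
restart the stream) or <failure/> (the stream and underlying connection
are closed). Below is a minimal Python sketch of just the XML side of
that exchange, assuming each element arrives as a complete string (a
real implementation parses the stream incrementally). The function
names and return values are hypothetical; the TLS handshake itself is
out of scope.

```python
import xml.etree.ElementTree as ET

TLS_NS = "urn:ietf:params:xml:ns:xmpp-tls"
STREAM_NS = "http://etherx.jabber.org/streams"


def starttls_offered(features_xml: str) -> bool:
    """True if a <stream:features/> element advertises STARTTLS."""
    features = ET.fromstring(features_xml)
    return features.find(f"{{{TLS_NS}}}starttls") is not None


def starttls_request() -> str:
    """Element the initiating entity sends to begin the upgrade."""
    return f"<starttls xmlns='{TLS_NS}'/>"


def handle_starttls_response(response_xml: str) -> str:
    """Map the receiving entity's answer to the next step (hypothetical labels)."""
    response = ET.fromstring(response_xml)
    if response.tag == f"{{{TLS_NS}}}proceed":
        # Next: TLS handshake over the same channel, then a fresh XML stream
        return "negotiate-tls"
    if response.tag == f"{{{TLS_NS}}}failure":
        # Per RFC 3920 the receiving entity then closes the stream and connection
        return "abort"
    raise ValueError(f"unexpected element: {response.tag}")


if __name__ == "__main__":
    features = (f"<stream:features xmlns:stream='{STREAM_NS}'>"
                f"<starttls xmlns='{TLS_NS}'><required/></starttls>"
                "</stream:features>")
    if starttls_offered(features):
        print(starttls_request())
    print(handle_starttls_response(f"<proceed xmlns='{TLS_NS}'/>"))
```

Nothing here depends on whether the channel is TCP (c2s, s2s, c2c) or
a bytestream tunneled through servers (e2e), which is the point of
reusing STARTTLS everywhere.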
For c2s, s2s, and c2c we don't need Jingle because we use the TCP
binding defined in RFC 3920 -- you open a direct TCP connection to an
IP+port and start the stream over that TCP connection.
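The four cases and where Jingle fits can be summarized as data. This
Python 3.9+ sketch just restates the reasoning above: only e2e lacks a
direct TCP connection, so only e2e needs Jingle to set up the channel
before the same stream-then-STARTTLS sequence. The class, dictionary,
and step labels are illustrative, not part of any spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    insecure_channel: str
    needs_jingle: bool  # True only when there is no direct TCP connection to use


# The four cases as described in this post (labels are illustrative)
CASES = {
    "c2s": Case("XML stream over TCP (RFC 3920 binding)", needs_jingle=False),
    "s2s": Case("XML stream over TCP (RFC 3920 binding)", needs_jingle=False),
    "c2c": Case("XML stream over TCP (XEP-0174)", needs_jingle=False),
    "e2e": Case("XML stream over IBB, SOCKS5, or someday ICE-TCP", needs_jingle=True),
}


def setup_steps(case_name: str) -> list[str]:
    """Ordered setup steps for one case; every case ends with STARTTLS."""
    case = CASES[case_name]
    if case.needs_jingle:
        steps = ["negotiate transport via Jingle (XEP-0247)"]
    else:
        steps = ["open TCP connection to IP+port"]
    return steps + ["start XML stream", "upgrade with STARTTLS"]


if __name__ == "__main__":
    for name in CASES:
        print(name, "->", setup_steps(name))
```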
For e2e, we need a way to start the stream over XMPP itself. The method
we are proposing is to use Jingle to negotiate the transport and other
parameters as described in <http://xmpp.org/extensions/xep-0247.html>.
For both c2c and e2e, the initiator and responder need a way to provide
some hints about TLS methods and fingerprints before proceeding with
STARTTLS. For this we have <http://xmpp.org/extensions/xep-0250.html>
(which I think needs to be cleaned up a bit, but that's mostly just
syntax), where the hints are provided either in the stream features
(c2c) or in the Jingle session-initiate and session-accept (e2e).
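The hint syntax itself is still in flux, so the following is only a
sketch of how a party might use a fingerprint hint once the TLS
handshake completes: hash the peer's DER-encoded certificate and
compare it with the hinted value. SHA-256 as the default, the
colon-separated uppercase hex format, and the function names are
assumptions for illustration, not taken from XEP-0250.

```python
import hashlib
import hmac


def cert_fingerprint(der_cert: bytes, algorithm: str = "sha256") -> str:
    """Colon-separated uppercase hex digest of a DER-encoded certificate."""
    digest = hashlib.new(algorithm, der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_hint(der_cert: bytes, hinted: str, algorithm: str = "sha256") -> bool:
    """Check the certificate presented during STARTTLS against the hint."""
    actual = cert_fingerprint(der_cert, algorithm).encode("utf-8")
    expected = hinted.strip().upper().encode("utf-8")
    # Constant-time comparison of the normalized fingerprints
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    fake_der = b"not a real certificate"
    hint = cert_fingerprint(fake_der)
    print(hint)
    print(matches_hint(fake_der, hint.lower()))  # case-insensitive match
```

In Python the DER bytes would come from getpeercert(binary_form=True)
on an ssl.SSLSocket, or on an ssl.SSLObject when TLS runs over a
memory BIO (as it would for a stream tunneled through IBB).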
In the e2e case, the flow would be as follows (see also
1. Initiator sends a Jingle session-initiate with an offer, including
hints about TLS methods and fingerprints
2. Initiator and responder agree on a transport and negotiate an IBB or
SOCKS5 (or future ICE-TCP) connection
3. The parties start an XML stream over the negotiated transport (e.g.,
encapsulated in IBB packets)
4. The parties upgrade the stream using STARTTLS
5. If STARTTLS succeeds, the e2e stream is now secured
6. Responder sends a Jingle session-accept to the initiator
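The six steps can be read as an ordered state machine. This Python
sketch only enforces the ordering described above, in particular that
session-accept comes after STARTTLS has succeeded. Steps 4 and 5 are
merged into one state because step 5 is just the outcome of step 4; the
class and state names are hypothetical, and what to do when STARTTLS
fails is left open here.

```python
from enum import Enum, auto


class Step(Enum):
    SESSION_INITIATE_SENT = auto()  # 1. offer, incl. TLS method/fingerprint hints
    TRANSPORT_NEGOTIATED = auto()   # 2. IBB, SOCKS5, or (future) ICE-TCP agreed
    STREAM_STARTED = auto()         # 3. XML stream running over that transport
    STARTTLS_DONE = auto()          # 4+5. stream upgraded; e2e stream secured
    SESSION_ACCEPTED = auto()       # 6. responder sends session-accept


ORDER = list(Step)  # members in definition order


class E2ESession:
    """Tracks progress through the flow and rejects out-of-order steps."""

    def __init__(self) -> None:
        self.completed: list = []

    def advance(self, step: Step) -> None:
        n = len(self.completed)
        expected = ORDER[n] if n < len(ORDER) else None
        if step is not expected:
            raise RuntimeError(f"expected {expected}, got {step}")
        self.completed.append(step)

    @property
    def secured(self) -> bool:
        # Only recorded after a successful STARTTLS upgrade
        return Step.STARTTLS_DONE in self.completed


if __name__ == "__main__":
    session = E2ESession()
    for step in Step:
        session.advance(step)
    print(session.secured, session.completed[-1])
```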
At least that is the general idea. I'll post more about particulars in
the next installment.