[Jingle] Future Jingle ideas

Paul Witty paul.witty at silverflare.com
Wed Jul 24 10:28:04 UTC 2013


Fitting in with the discussion on next-generation Jingle, I'm currently 
working for a company (www.acano.com) implementing a system which I 
think could be one of the potential future directions for Jingle, or at 
least could be built on top of the future Jingle. Rather than the 
entirely peer-to-peer, client-only nature of Jingle, it uses a central 
media server (implemented as an XEP-0114 component) and makes heavy use 
of PEP and pubsub to enable all kinds of cool features not available 
within the current Jingle specs.

Much of our signalling is based on the Jingle XEPs, particularly when 
it comes to exchanging session descriptions; however, we don't use the 
Jingle call model. Instead of a peer-to-peer protocol there's a 
client-server one. Within this we use offer/answer semantics, and, with 
strong relevance to the current discussions regarding the interaction 
between Jingle and WebRTC, we support a WebRTC-based client 
implementation, so the protocol is compatible with the work being done 
on WebRTC. We also support SIP clients within the system (there is 
currently no support for direct media connections between SIP and XMPP 
clients, although this may be possible in the future), so all the work 
we do is strongly focused on creating a solution which is applicable 
across the different types of client: XMPP, WebRTC, SIP.
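The offer/answer semantics mentioned above can be illustrated with a small state tracker. This is a minimal sketch, not Acano's protocol: the class and state names are hypothetical and only loosely modelled on WebRTC's signalling states, and it deliberately ignores glare (both sides offering at once) and rollback.

```python
from enum import Enum, auto
from typing import Optional


class SignalingState(Enum):
    STABLE = auto()             # no offer outstanding
    HAVE_LOCAL_OFFER = auto()   # we sent an offer and are waiting for an answer
    HAVE_REMOTE_OFFER = auto()  # we received an offer and owe an answer


class OfferAnswerSession:
    """Hypothetical tracker for one side of an offer/answer exchange."""

    def __init__(self) -> None:
        self.state = SignalingState.STABLE
        self.local: Optional[str] = None   # last local session description
        self.remote: Optional[str] = None  # last remote session description

    def _require(self, expected: SignalingState, action: str) -> None:
        if self.state is not expected:
            raise RuntimeError(f"cannot {action} in state {self.state.name}")

    def send_offer(self, description: str) -> None:
        self._require(SignalingState.STABLE, "send an offer")
        self.local = description
        self.state = SignalingState.HAVE_LOCAL_OFFER

    def receive_offer(self, description: str) -> None:
        self._require(SignalingState.STABLE, "receive an offer")
        self.remote = description
        self.state = SignalingState.HAVE_REMOTE_OFFER

    def send_answer(self, description: str) -> None:
        self._require(SignalingState.HAVE_REMOTE_OFFER, "send an answer")
        self.local = description
        self.state = SignalingState.STABLE

    def receive_answer(self, description: str) -> None:
        self._require(SignalingState.HAVE_LOCAL_OFFER, "receive an answer")
        self.remote = description
        self.state = SignalingState.STABLE
```

In a client-server deployment the client would typically call `send_offer`, and the media server `receive_offer` followed by `send_answer`; either side can start a new exchange once both are back in `STABLE`.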

Because of this, I have opinions on where I'd like the future of 
Jingle to go: partly out of corporate self-interest, but also because I 
think it's important that we can produce a specification which is 
flexible enough both to support simple client-client calls in a useful 
fashion and to be applicable to next-generation enterprise systems.

The main point here is the importance of separating call control from 
media descriptions. In SIP, a SIP message may contain SDP data: the SIP 
is the call control layer, with messages used to start a call, accept a 
call, and send call-level data. Within the SIP message, there may be an 
SDP payload, which describes the media channels available in the call 
which is being set up at the SIP level. There is a very clear 
separation between these (SIP in RFC 3261, SDP in RFC 4566). WebRTC, as 
far as I'm aware (though please correct me if I'm wrong), only covers 
the media description part - it provides a way to negotiate a media 
session between two entities, but doesn't give any framework in which 
this media session exists. To build a useful system, some layer of call 
control needs to be built on top of WebRTC.
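To make the SIP/SDP layering concrete, the following sketch splits a simplified SIP INVITE into its call-control part (start line and headers) and an SDP body that the call-control code never inspects. The message is a trimmed, hypothetical example modelled on RFC 3261; it omits headers a real INVITE would carry, such as Max-Forwards, Contact and Content-Length, and the function names are illustrative only.

```python
from __future__ import annotations

# Hypothetical, trimmed SIP INVITE: call-control headers, a blank line, then an SDP body.
INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@pc33.example.com\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
    "o=- 668367723 3 IN IP4 127.0.0.1\r\n"
    "s=-\r\n"
    "t=0 0\r\n"
    "m=audio 56276 RTP/SAVPF 111 0\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
)


def split_call_control(message: str) -> tuple[str, dict[str, str], str]:
    """Call-control layer: parse the start line and headers, return the body unread."""
    head, _, body = message.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return start_line, headers, body


def media_sections(sdp: str) -> list[str]:
    """Media-description layer: list the m= lines of an SDP body."""
    return [line for line in sdp.split("\r\n") if line.startswith("m=")]


start_line, headers, body = split_call_control(INVITE)
if headers.get("content-type") == "application/sdp":
    print(start_line)            # INVITE sip:bob@example.com SIP/2.0
    print(media_sections(body))  # ['m=audio 56276 RTP/SAVPF 111 0']
```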

The current Jingle spec includes both call control and media description 
in one XEP. I'd strongly favour splitting these two concepts into 
separate specifications, with the call-control part of Jingle relying 
on the media description part. This means that companies such as mine 
can build a system using the media description part, with whatever call 
control is useful on top. We've built a media description exchange 
strongly based on Jingle, but extended where necessary to include the 
features used in WebRTC (in particular, the implementation in Chrome). 
Included below are Chrome's SDP and our XML representation of it.

Chrome's SDP:
v=0
o=- 668367723 3 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video data
a=msid-semantic: WMS
m=audio 56276 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126
c=IN IP4 10.1.1.161
a=rtcp:56276 IN IP4 10.1.1.161
a=candidate:3483358994 1 udp 2113937151 10.1.1.161 56276 typ host generation 0
a=candidate:3483358994 2 udp 2113937151 10.1.1.161 56276 typ host generation 0
a=ice-ufrag:WL6pJZ5YYr26WlQe
a=ice-pwd:dsaVa7MYI2wOTMP1gnyAl59a
a=ice-options:google-ice
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendrecv
a=mid:audio
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:rIU/XnMmPBLJQVqt3LK0l7wS//6d7oSQlbdFEv5v
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:107 CN/48000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=maxptime:60
m=video 56276 RTP/SAVPF 100 116 117
c=IN IP4 10.1.1.161
a=rtcp:56276 IN IP4 10.1.1.161
a=candidate:3483358994 1 udp 2113937151 10.1.1.161 56276 typ host generation 0
a=candidate:3483358994 2 udp 2113937151 10.1.1.161 56276 typ host generation 0
a=ice-ufrag:WL6pJZ5YYr26WlQe
a=ice-pwd:dsaVa7MYI2wOTMP1gnyAl59a
a=ice-options:google-ice
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=sendrecv
a=mid:video
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:rIU/XnMmPBLJQVqt3LK0l7wS//6d7oSQlbdFEv5v
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
m=application 56276 RTP/SAVPF 101
c=IN IP4 10.1.1.161
a=rtcp:56276 IN IP4 10.1.1.161
a=candidate:3483358994 1 udp 2113937151 10.1.1.161 56276 typ host generation 0
a=candidate:3483358994 2 udp 2113937151 10.1.1.161 56276 typ host generation 0
a=ice-ufrag:WL6pJZ5YYr26WlQe
a=ice-pwd:dsaVa7MYI2wOTMP1gnyAl59a
a=ice-options:google-ice
a=sendrecv
a=mid:data
b=AS:30
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:rIU/XnMmPBLJQVqt3LK0l7wS//6d7oSQlbdFEv5v
a=rtpmap:101 google-data/90000
a=ssrc:3595327575 cname:wDqErLljYprTBcWu
a=ssrc:3595327575 msid:SfChannel SfChannel
a=ssrc:3595327575 mslabel:SfChannel
a=ssrc:3595327575 label:SfChannel

Our XMLified version of this (it may not map exactly, because of 
transformations we perform along the way):
<sessionDescription>
   <bandwidth type="TIAS">1000000</bandwidth>
   <content>
     <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'
         feedback='true'>
       <payload-type id='111' name='opus' clockrate='48000'/>
       <payload-type id='103' name='isac' clockrate='16000'/>
       <payload-type id='104' name='isac' clockrate='32000'/>
       <payload-type id='0' name='PCMU' clockrate='8000'/>
       <payload-type id='8' name='PCMA' clockrate='8000'/>
       <payload-type id='107' name='CN' clockrate='48000'/>
       <payload-type id='106' name='CN' clockrate='32000'/>
       <payload-type id='105' name='CN' clockrate='16000'/>
       <payload-type id='13' name='CN' clockrate='8000'/>
       <payload-type id='126' name='telephone-event' clockrate='8000'>
         <parameter value='0-15'/>
       </payload-type>
       <headerExtensions>
         <extension value="1" type="ssrcAudioLevel" />
       </headerExtensions>
       <encryption required="1">
         <crypto tag="1" crypto-suite="AES_CM_128_HMAC_SHA1_80"
             key-params="inline:rIU/XnMmPBLJQVqt3LK0l7wS//6d7oSQlbdFEv5v" />
       </encryption>
     </description>
     <transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'
         pwd='dsaVa7MYI2wOTMP1gnyAl59a' ufrag='WL6pJZ5YYr26WlQe'>
       <candidate component='1' generation='0' foundation='3483358994'
           id='dummy' ip='10.1.1.161' port='56276' priority='2113937151'
           protocol='udp' type='host' />
       <candidate component='1' generation='0' foundation='750670334'
           id='dummy' ip='82.71.160.189' port='7840' priority='1845501695'
           protocol='udp' type='srflx' rel-addr='10.1.1.161' rel-port='56276' />
       <candidate component='1' generation='0' foundation='2798023649'
           id='dummy' ip='82.71.160.183' port='50187' priority='7935'
           protocol='udp' type='relay' rel-addr='82.71.160.189' rel-port='30359' />
       <candidate component='2' generation='0' foundation='3483358994'
           id='dummy' ip='10.1.1.161' port='56276' priority='2113937151'
           protocol='udp' type='host' />
       <candidate component='2' generation='0' foundation='750670334'
           id='dummy' ip='82.71.160.189' port='7840' priority='1845501695'
           protocol='udp' type='srflx' rel-addr='10.1.1.161' rel-port='56276' />
       <candidate component='2' generation='0' foundation='2798023649'
           id='dummy' ip='82.71.160.183' port='50187' priority='7935'
           protocol='udp' type='relay' rel-addr='82.71.160.189' rel-port='30359' />
     </transport>
   </content>
   <content>
     <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='video'
         feedback='true'>
       <payload-type id='100' name='VP8' clockrate='90000'>
         <rtcp-fb xmlns="urn:xmpp:jingle:apps:rtp:rtcp-fb:0" type="nack"/>
         <rtcp-fb xmlns="urn:xmpp:jingle:apps:rtp:rtcp-fb:0" type="ccm"
             subtype="fir"/>
       </payload-type>
       <encryption required="1">
         <crypto tag="1" crypto-suite="AES_CM_128_HMAC_SHA1_80"
             key-params="inline:rIU/XnMmPBLJQVqt3LK0l7wS//6d7oSQlbdFEv5v" />
       </encryption>
     </description>
     <transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'
         pwd='dsaVa7MYI2wOTMP1gnyAl59a' ufrag='WL6pJZ5YYr26WlQe'>
       <candidate component='1' generation='0' foundation='3483358994'
           id='dummy' ip='10.1.1.161' port='56276' priority='2113937151'
           protocol='udp' type='host' />
       <candidate component='1' generation='0' foundation='750670334'
           id='dummy' ip='82.71.160.189' port='7840' priority='1845501695'
           protocol='udp' type='srflx' rel-addr='10.1.1.161' rel-port='56276' />
       <candidate component='1' generation='0' foundation='2798023649'
           id='dummy' ip='82.71.160.183' port='50187' priority='7935'
           protocol='udp' type='relay' rel-addr='82.71.160.189' rel-port='30359' />
       <candidate component='2' generation='0' foundation='3483358994'
           id='dummy' ip='10.1.1.161' port='56276' priority='2113937151'
           protocol='udp' type='host' />
       <candidate component='2' generation='0' foundation='750670334'
           id='dummy' ip='82.71.160.189' port='7840' priority='1845501695'
           protocol='udp' type='srflx' rel-addr='10.1.1.161' rel-port='56276' />
       <candidate component='2' generation='0' foundation='2798023649'
           id='dummy' ip='82.71.160.183' port='50187' priority='7935'
           protocol='udp' type='relay' rel-addr='82.71.160.189' rel-port='30359' />
     </transport>
   </content>
   <content>
     <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='application'
         feedback='true'>
       <payload-type id='101' name='google-data' clockrate='90000'/>
       <bandwidth type="TIAS">30720</bandwidth>
       <ssrc id="3595327575">
         <cname>wDqErLljYprTBcWu</cname>
         <msid>SfChannel</msid>
       </ssrc>
       <encryption required="1">
         <crypto tag="1" crypto-suite="AES_CM_128_HMAC_SHA1_80"
             key-params="inline:rIU/XnMmPBLJQVqt3LK0l7wS//6d7oSQlbdFEv5v" />
       </encryption>
     </description>
     <transport xmlns='urn:xmpp:jingle:transports:ice-udp:1'
         pwd='dsaVa7MYI2wOTMP1gnyAl59a' ufrag='WL6pJZ5YYr26WlQe'>
       <candidate component='1' generation='0' foundation='3483358994'
           id='dummy' ip='10.1.1.161' port='56276' priority='2113937151'
           protocol='udp' type='host' />
       <candidate component='1' generation='0' foundation='750670334'
           id='dummy' ip='82.71.160.189' port='7840' priority='1845501695'
           protocol='udp' type='srflx' rel-addr='10.1.1.161' rel-port='56276' />
       <candidate component='1' generation='0' foundation='2798023649'
           id='dummy' ip='82.71.160.183' port='50187' priority='7935'
           protocol='udp' type='relay' rel-addr='82.71.160.189' rel-port='30359' />
       <candidate component='2' generation='0' foundation='3483358994'
           id='dummy' ip='10.1.1.161' port='56276' priority='2113937151'
           protocol='udp' type='host' />
       <candidate component='2' generation='0' foundation='750670334'
           id='dummy' ip='82.71.160.189' port='7840' priority='1845501695'
           protocol='udp' type='srflx' rel-addr='10.1.1.161' rel-port='56276' />
       <candidate component='2' generation='0' foundation='2798023649'
           id='dummy' ip='82.71.160.183' port='50187' priority='7935'
           protocol='udp' type='relay' rel-addr='82.71.160.189' rel-port='30359' />
     </transport>
   </content>
</sessionDescription>

This is included largely to illustrate how easily a WebRTC SDP can be 
transformed into a Jingle-like form, rather than as a claim that the way 
we've implemented things not covered by the Jingle spec, or any of our 
deviations from the Jingle spec, are the Right Way to do things.
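As a rough illustration of how mechanical the SDP-to-Jingle mapping is, here is a minimal sketch that converts the `rtpmap`, `rtcp-fb`, ICE credential and `candidate` lines of an SDP into elements shaped like the XML example, using the XEP-0167, XEP-0293 and XEP-0176 namespaces. It is not Acano's converter and the function names are hypothetical: it ignores crypto, bandwidth, SSRC and header-extension lines, assumes each `rtcp-fb` line follows its `rtpmap` line (as in Chrome's output), and omits the candidate `id` and `network` attributes that XEP-0176 requires.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

RTP_NS = "urn:xmpp:jingle:apps:rtp:1"
ICE_NS = "urn:xmpp:jingle:transports:ice-udp:1"
RTCP_FB_NS = "urn:xmpp:jingle:apps:rtp:rtcp-fb:0"


def parse_candidate(value: str) -> dict[str, str]:
    """Map the value of an a=candidate: line onto XEP-0176 attribute names."""
    fields = value.split()
    attrs = {
        "foundation": fields[0],  # only the token itself, no "a=candidate:" prefix
        "component": fields[1],
        "protocol": fields[2].lower(),
        "priority": fields[3],
        "ip": fields[4],
        "port": fields[5],
    }
    # The remainder is key/value pairs, e.g. "typ srflx raddr 10.1.1.161 rport 56276".
    extra = dict(zip(fields[6::2], fields[7::2]))
    attrs["type"] = extra["typ"]
    if "raddr" in extra:
        attrs["rel-addr"] = extra["raddr"]
        attrs["rel-port"] = extra["rport"]
    attrs["generation"] = extra.get("generation", "0")
    return attrs


def sdp_to_jingle(sdp: str) -> ET.Element:
    """Convert each m= section of an SDP into a Jingle-style <content> element."""
    root = ET.Element("sessionDescription")
    desc = transport = None
    payloads: dict[str, ET.Element] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # xmlns is set as a plain attribute so ElementTree does not emit ns0: prefixes.
            content = ET.SubElement(root, "content")
            desc = ET.SubElement(content, "description",
                                 {"xmlns": RTP_NS, "media": line[2:].split()[0]})
            transport = ET.SubElement(content, "transport", {"xmlns": ICE_NS})
            payloads = {}
            continue
        if desc is None or not line.startswith("a="):
            continue  # session-level lines, c= and b= lines are ignored in this sketch
        key, _, value = line[2:].partition(":")
        if key == "rtpmap":
            pt, _, encoding = value.partition(" ")
            name, clockrate, *channels = encoding.split("/")
            attrs = {"id": pt, "name": name, "clockrate": clockrate}
            if channels:
                attrs["channels"] = channels[0]
            payloads[pt] = ET.SubElement(desc, "payload-type", attrs)
        elif key == "rtcp-fb":
            pt, fb_type, *rest = value.split()
            if pt in payloads:  # assumes rtpmap precedes rtcp-fb, as in Chrome's SDP
                fb = {"xmlns": RTCP_FB_NS, "type": fb_type}
                if rest:
                    fb["subtype"] = rest[0]
                ET.SubElement(payloads[pt], "rtcp-fb", fb)
        elif key == "ice-ufrag":
            transport.set("ufrag", value)
        elif key == "ice-pwd":
            transport.set("pwd", value)
        elif key == "candidate":
            ET.SubElement(transport, "candidate", parse_candidate(value))
    return root


if __name__ == "__main__":
    sample = "\n".join([
        "m=video 56276 RTP/SAVPF 100",
        "a=candidate:3483358994 1 udp 2113937151 10.1.1.161 56276 typ host generation 0",
        "a=rtpmap:100 VP8/90000",
        "a=rtcp-fb:100 ccm fir",
    ])
    print(ET.tostring(sdp_to_jingle(sample), encoding="unicode"))
```

The reverse direction (XML back to SDP lines) is just as mechanical, which is part of why a gateway can treat the description layer as a pass-through.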

As a first step for next-generation Jingle, I'd like to push for 
splitting this session description representation out from the 
call-control part of Jingle. I don't have a particular argument for 
whether this is done as shown above (providing an XML representation of 
the session description) or in a SoX format, though I am strongly 
opposed to including SIP-like headers within the messaging, because that 
breaks the call-control/session-description split I'm looking for. The 
SDP-over-XMPP format does feel bad and wrong to me. I think the split is 
important because it mirrors WebRTC, which only handles the media 
description, and so opens up the use of XMPP as a transport for media 
sessions which end up in the browser. It also simplifies the job of 
gateways between Jingle and, e.g., SIP: they can terminate the 
call-control layer and pass the media description layer through.
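The gateway argument can be sketched with plain data types: the call-control messages on each side differ, but both carry the same session-description value, so the gateway rebuilds call control and forwards the description untouched. All names here (`SessionDescription`, `JingleCallRequest`, `SipInvite`, the JID-to-SIP-URI mapping) are hypothetical, and a real gateway may still need a mechanical encoding change (Jingle-style XML to SDP text) even though no call-level logic touches the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionDescription:
    """Media-description layer: opaque to call control (SDP text or Jingle-style XML)."""
    body: str


@dataclass(frozen=True)
class JingleCallRequest:
    """Hypothetical XMPP-side call-control message."""
    caller_jid: str
    callee_jid: str
    description: SessionDescription


@dataclass(frozen=True)
class SipInvite:
    """Hypothetical SIP-side call-control message, reduced to what this sketch needs."""
    from_uri: str
    to_uri: str
    description: SessionDescription


def jid_to_sip_uri(jid: str) -> str:
    """Hypothetical address mapping: drop any resource, prefix the bare JID with sip:."""
    return "sip:" + jid.split("/", 1)[0]


def gateway_to_sip(request: JingleCallRequest) -> SipInvite:
    """Terminate XMPP call control, rebuild it as SIP, pass the description through."""
    return SipInvite(
        from_uri=jid_to_sip_uri(request.caller_jid),
        to_uri=jid_to_sip_uri(request.callee_jid),
        description=request.description,  # forwarded as-is; the gateway never parses it
    )
```

Because `gateway_to_sip` never looks inside `SessionDescription`, new codecs or WebRTC extensions in the description layer would need no gateway changes; only the call-control mapping is gateway-specific.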

I think there are then two sensible layers to be built on top of the 
session description specification - a purely peer-to-peer specification, 
like the current Jingle implementation, allowing clients to call each 
other without relying on any other services, and a client-server 
specification, such as I'm implementing within Acano, for building more 
complicated systems supporting such standard telephony features as 
multipoint conferencing, call forking, PSTN gateways etc. We are looking 
to standardise our client-server protocol, as well as implement a 
peer-to-peer protocol within our clients. I appreciate that any 
client-server protocol I propose is likely to be contentious (as will 
any changes to the messaging within Jingle which might make it more 
useful for gatewaying to SIP or WebRTC clients). But if we can agree on 
this first step, opening the door to building systems like mine on top 
of the session description, I feel we'll have made good progress in the 
right direction.

Please ask for any more details if interested (there's a lot of them), 
or tell me why the whole thing is a bad idea.

-- 

Paul

