[jdev] voicechat again

Ido Rosen ido at cs.uchicago.edu
Wed Mar 3 04:44:35 CST 2004


I'm new to this list and have no knowledge of the previous argument.  Let me try to post some advantages and disadvantages of peer-to-peer versus client-server approaches to voice communication.

Although I refer to "voice chat" below, I wish to note that I do this out of convenience to you.  We should really be calling it "audible communication", as some would stream music rather than voice, or use Jabber as a radio station, if the implementation is scalable.  Furthermore, I may in the future call video chat "visual communication".

 (*) Scalability
 (*) Functionality
 (*) Extensibility
 (*) Compatibility

First, let us define what the options are.  Two types of channels need to reach all parties involved in voice chat:  control and data.  Data in this case consists of the actual encoded voice stream (let's assume this is codec-independent for now, though my arguments should work regardless of codec dependency decisions).  Control consists, initially, of the parameters required to initiate the connection (quality [kHz?], maximum throughput desired [kbits/sec?], codec used).  Then, during the stream, control should probably be used for performance monitoring (packets lost, kbits/sec, etc.).
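To make the two phases of the control channel concrete, here is a minimal sketch in Python.  The class and field names are my own invention for illustration, not part of any Jabber protocol; they only mirror the parameters listed above.

```python
from dataclasses import dataclass


@dataclass
class SessionRequest:
    """Control message sent once, to set up a voice session (hypothetical fields)."""
    codec: str               # codec identifier; the naming scheme is an assumption
    sample_rate_hz: int      # the "quality [kHz?]" parameter, expressed in Hz
    max_kbits_per_sec: int   # maximum throughput desired


@dataclass
class StreamReport:
    """Control message sent periodically while the stream is running."""
    packets_sent: int
    packets_lost: int
    kbits_per_sec: float

    def loss_ratio(self) -> float:
        # Fraction of packets lost; 0.0 when nothing has been sent yet.
        if self.packets_sent == 0:
            return 0.0
        return self.packets_lost / self.packets_sent
```

Both messages are tiny compared to the audio itself, which is why control traffic is cheap to route through the server whatever happens to the data.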

Let's acknowledge two things: (1) Control throughput will generally be much lower than data throughput.  (2) Control traffic must initially (at least at some level) go through the server, such as when connecting the two peers by giving each a direct link to the other (each the other's IP address, for example).  I know that some protocols will already handle this for us, but let's assume that at some point the server has to route this data between the clients.

So, given those assumptions, there are several possibilities:

(1) Server forwards all control + data messages between clients, as it does with IMs.
(2) Server forwards all control messages (throughput, voice establishment, codec selection, etc.), but data is handled in a peer to peer fashion. (Directly from one user to another, optionally to many others in emulation of a multicast voice reflector, if we're talking about voice conferencing.)
(3) All control + data messages are handled in a peer to peer fashion.  All the server does is give each peer the other's IP address, and no more than that.

(1) presents none (or few?) of the classic peer to peer problems, such as NAT traversal, service discovery, etc. since there is already established communication at the layer that the data will be transferred on well before the voice chat session starts.  Both clients initiate a connection to the jabber server at the TCP layer, so no port forwarding issues should arise.

(1) is not scalable -- a server without multicast could easily waste its entire pipe with three or four high-quality voice connections.  There is significant overhead to using the standard Jabber XML-based transport mechanisms for each data packet, if data packets must be sent within milliseconds.  Assuming we want lagless and lossless audio, there must be slack between the available throughput and the maximum desired throughput, so minimizing overhead is a priority.  Internal networks, or networks with high-speed VPNs, may find this advantageous though, as the available throughput is well beyond any overhead, and the server is very likely to support multicast in such tightly knit networks.
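A back-of-the-envelope sketch of that overhead, in Python.  Every number here is an assumption for illustration only:  audio frames are base64-encoded inside an XML stanza (raw binary cannot go into XML as-is), one stanza is sent per frame, and the wrapper size is a guess rather than a measured value.

```python
import math


def base64_len(n_bytes: int) -> int:
    # Base64 emits 4 characters for every 3 input bytes, rounded up with padding.
    return 4 * math.ceil(n_bytes / 3)


def relayed_kbits_per_sec(codec_kbits: float, frame_ms: int, wrapper_bytes: int) -> float:
    """Throughput of one audio stream, in one direction, once wrapped in XML stanzas."""
    frames_per_sec = 1000 / frame_ms
    payload_bytes = math.ceil(codec_kbits * 1000 / 8 / frames_per_sec)
    stanza_bytes = wrapper_bytes + base64_len(payload_bytes)
    return stanza_bytes * 8 * frames_per_sec / 1000
```

With a 24 kbit/s codec, 20 ms frames and an assumed 150-byte wrapper, `relayed_kbits_per_sec(24, 20, 150)` comes to 92 kbit/s per stream.  The server both receives and sends each stream, and a two-person chat has one stream per direction, so that single chat costs the server about 4 * 92 = 368 kbit/s.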

(2) still presents the NAT traversal and port forwarding problems, though since a link is already established through the server, alternatives can be sought:  a port sweep can be attempted to find an open/forwarded port, or several permutations of client1-->client2, client2-->client1 connections can be tried until one works.  This is just for the data channel.  The control channel can coordinate such sweeps/tests.  Some protocols may already do this.
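The sweep described above could be coordinated roughly as follows.  This is only a sketch in Python:  `attempt_order`, `first_working` and the `try_connect` callback are hypothetical names, and the actual socket work (and the control messages telling the other side to listen) is left to the caller.

```python
from itertools import product
from typing import Callable, Iterable, Iterator, Optional, Tuple

Attempt = Tuple[str, str, int]  # (initiator, listener, port)


def attempt_order(peers: Tuple[str, str], ports: Iterable[int]) -> Iterator[Attempt]:
    """Yield every (initiator, listener, port) combination, both directions per port."""
    a, b = peers
    for port, (src, dst) in product(ports, [(a, b), (b, a)]):
        yield (src, dst, port)


def first_working(attempts: Iterable[Attempt],
                  try_connect: Callable[[Attempt], bool]) -> Optional[Attempt]:
    """Return the first attempt for which try_connect reports success, else None."""
    for attempt in attempts:
        if try_connect(attempt):
            return attempt
    return None
```

If `first_working` returns None, every direct attempt has failed and the clients know they need some other route for the data channel.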

(2) is scalable, and is an accepted method used by services such as MSN, iChat, etc.  (Also Yahoo Super Webcam mode, I think.)  (2) can fall back to (1) for internal addresses, or by some algorithm that determines whether the server's link can handle the throughput.  (2) allows the server to collect data on connection statistics, and possibly even a conversation log for clients with speech-to-text engines (or at least I think this'd be a cool feature to implement).

(3) means that Jabber clients just execute an external voice chat program, passing it the remote client's IP address or some other connection data as a parameter, and that the server gets no knowledge of how the chat went, etc.  (3) seems to be overlooked by, or unacceptable to, some users of this mailing list.

I believe that we do not necessarily have to decide between the two types of data transfer modes, client-server and peer-peer.  I think we can define client-server as a fallback in case peer-peer fails, under certain conditions, such as:

 (*) Client-server voice chats do not take up too much bandwidth -- possibly limit the audio bitrate?
 (*) All attempts at initiating peer-peer voice chat fail.
 (*) Server can give lower priority (QoS?) to transmitting voice data versus IM and other data.
 (*) Server can still function with the additional bandwidth and resource strain of multiple (hundreds?  thousands?) voice chats crossing its bus.

You probably already knew most of this, but I just thought I'd suggest the paragraph above:  you do not necessarily need to choose one over the other, and we can use their advantages and disadvantages to produce some algorithm by which we choose between them.
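One possible shape for such an algorithm, sketched in Python.  This is not a proposal for the actual protocol:  the names, the per-chat bitrate cap and the idea of a fixed relay bandwidth budget are all assumptions made up for illustration, and the QoS priority condition is left out because it belongs in the server's scheduler rather than in this decision.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Transport(Enum):
    PEER_TO_PEER = "peer-to-peer"
    SERVER_RELAY = "server-relay"
    REFUSED = "refused"


@dataclass
class ServerState:
    relay_budget_kbits: float       # bandwidth reserved for relayed voice (assumed knob)
    relay_in_use_kbits: float       # bandwidth already used by relayed chats
    max_relay_bitrate_kbits: float  # per-chat cap applied to relayed audio


def choose_transport(p2p_succeeded: bool, requested_kbits: float,
                     server: ServerState) -> Tuple[Transport, float]:
    """Pick a data transport and the bitrate to run it at."""
    if p2p_succeeded:
        # Peer-peer works, so the server never carries the audio.
        return Transport.PEER_TO_PEER, requested_kbits
    # All peer-peer attempts failed: fall back, but limit the audio bitrate.
    bitrate = min(requested_kbits, server.max_relay_bitrate_kbits)
    if server.relay_in_use_kbits + bitrate > server.relay_budget_kbits:
        # Relaying one more chat would strain the server beyond its budget.
        return Transport.REFUSED, 0.0
    return Transport.SERVER_RELAY, bitrate
```

For example, with a 1000 kbit/s budget, 900 already in use and a 32 kbit/s cap, a client asking for 64 kbit/s after peer-peer has failed would be relayed at 32 kbit/s;  with 990 already in use it would be refused.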


On Wed, 3 Mar 2004 10:00:41 -0000
"Richard Dobson" <richard at dobson-i.net> wrote:

> > then here is the simple way for this §!&%§$% p2p case:
> If as I suspect the symbols above represent a swearword I suggest you calm
> down and rethink your posts or risk severly denting your credibility, such
> comments are not very professional and IMO are not appropriate here. If you
> do not want to make useful posts then I suggest you dont post, these rants
> are just a waste of bandwidth and peoples time.
> > 1) use JEP 95 to negotiate the voice session parameters
> >     * includes voice codec ID
> >     * includes frame size (i.e. 2600byte)
> >     * includes crunched size if applicable
> >     * includes channels (mono/stereo)
> >     * includes sample size (8bit/16bit/24bit/32bit)
> >     * includes sample rate (i.e. 8000hz/16000hz)
> > 2) do JEP65 stream negotation.
> > go, implement it.
> Seems a reasonable starting point.
> >i really give favor to server based approaches.
> As you have indicated previously, but just because you dont think it useful
> or a good idea doesnt mean it isnt and other people dont want it. Im not
> against server based approaches (infact its a very good idea) but there is
> plenty of room and requirement for both options, you really shouldnt be
> required to go via a server for a quick couple of minute two person chat.
> Richard

