[Standards] Re: Proposed XMPP Extension: Jingle Remote Control

22 May 2024

Hi Goffi,

On Tue, 2024-05-21 at 18:04 +0200, Goffi wrote:
...
> Could you please tell me more about this use case and why it isn't
> covered by Jingle transmission? Specifically, what is the advantage of
> sending <message> through the server instead of handling it directly?
> I'd like to understand your perspective better.

The use case I'm thinking of has low throughput and only short usage
time. I might be sending 10 or 20 key events within a short time and
then nothing for several hours.

Technically, this can all be done with Jingle, but for just a few keys,
the overhead of starting a Jingle session just for those keys probably
adds way more latency than sending those keys via <message> through an
XMPP server. And using Jingle would require way more complex software
on both sides.

Without a proper specification for sending keys, I would do this via
non-standardized body messages. That works, but isn't particularly nice.
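
To illustrate what a <message>-based approach could look like, here is a
minimal sketch that builds a single stanza carrying a handful of key
events. The namespace "urn:example:key-events:0", the <key-events/> and
<key/> elements and their attributes are all made up for this example;
nothing like this is specified anywhere yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, for illustration only.
NS = "urn:example:key-events:0"


def build_key_message(to_jid, keys):
    """Return one <message/> stanza with a press/release pair per key."""
    message = ET.Element("message", {"to": to_jid})
    events = ET.SubElement(message, "key-events", {"xmlns": NS})
    for key in keys:
        for action in ("press", "release"):
            ET.SubElement(events, "key", {"name": key, "action": action})
    return ET.tostring(message, encoding="unicode")


stanza = build_key_message("tv@example.org/living-room", ["ArrowDown", "Enter"])
print(stanza)
```

The whole exchange is one stanza over the already established stream,
with no session negotiation before the first key arrives.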

I also noticed that in cases where XEP-0174 Serverless Messaging is
used, an additional Jingle connection probably doesn't add a lot of
benefit either.

...
> That's the job of the controlling client to assure consistency. That
> may be specified in business rules though.
>
> The wire format comes from the web API, but we are not developing
> browsers, we are developing XMPP clients.

Well, XMPP clients that also speak a ton of other protocols, including
the one you are just creating.

My point is that not only this, but also a ton of other things would
need to be specified in the business rules. There are probably a lot of
corner cases that you don't cover and that I can't reasonably expect
Council to think of.

...
> And we come back to the point where I don't see the need for another
> way of sending input, when there is already a low latency one. I'm not
> saying that you are wrong, I'm saying that I don't see why we should
> have another mechanism, when there is already a low latency way to
> send input data to any device.

There is no single best way to do remote control (or, more precisely,
input event sending). Low latency is a nice property to have and
certainly doesn't hurt if it comes for free. But it doesn't come for
free: it adds complexity, requires a multi-message setup phase and often
the management of additional network sockets. And depending on the use
case, low complexity might matter more than low latency. The
lowest-complexity way to provide a streaming transport with Jingle is
IBB, and it still requires connection setup and is strictly worse than
plain <message> stanzas for most purposes (except transporting large
binary blobs that exceed single-stanza size limits).

Both Jingle and especially WebRTC come with huge complexity. Your
WebRTC library and your existing code for working with it might take
away most of this from you, but that doesn't mean it's not there. By
using Jingle and WebRTC you're effectively excluding clients, devices
and platforms that can't easily run libwebrtc or any other popular
WebRTC implementation.

...

> It's not arbitrary at all, it's discarding data which don't make
> sense in this context, and it has been done while doing an
> implementation with Freedesktop remote control portal.

I was already guessing that it's not arbitrary, but rather what made
sense in your setup and for your use case. However, without knowing any
of that, it *seems* arbitrary.

The RemoteDesktop portal was clearly designed for remote desktop use
cases, not other remote control cases. However, as you already mention
that you designed the data sent around what is needed for the
RemoteDesktop portal, why not send the information directly in a format
that matches the design of RemoteDesktop portal, instead of a mix of
Web API interfaces and RemoteDesktop portal?

Also, I noticed that the RemoteDesktop portal does not have a notion of
an independent wheel: the mouse wheel is tied to the pointing device.
Why did you choose not to do it the same way? I also checked the Wayland
protocol, and wheel axis events are tied to a wl_pointer there as well.
As far as I know, apps do expect the scroll wheel to be tied to a
pointer so they know where to scroll.

...
> Issue with [0,1] coordinate is that you go into precision loss or
> rounding error troubles. I think that ideally the screen size should
> be sent separately and updated.

The precision of a double (64-bit floating point) remains the same in
relative terms, no matter if you scale to [0,1] or [0,<screen-width>].
The precision is about 15 decimal digits, which should be more than
enough (you barely see screen coordinates with more than 4 decimal
digits), even if you do calculations on them (which may result in a few
bits of precision loss).
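
As a quick sanity check of the precision argument, the following sketch
converts every pixel column of a few screen widths to [0,1] and back and
checks that the original pixel is recovered. The function names and the
chosen widths are mine, picked only for illustration.

```python
import sys


def to_normalized(pixel, size):
    """Map an integer pixel coordinate to [0, 1)."""
    return pixel / size


def to_pixel(normalized, size):
    """Map a normalized coordinate back to the nearest integer pixel."""
    return round(normalized * size)


def round_trip_is_exact(size):
    """True if every pixel survives pixel -> [0,1] -> pixel unchanged."""
    return all(to_pixel(to_normalized(p, size), size) == p for p in range(size))


for width in (1920, 3840, 7680):
    # Spacing of doubles just below 1.0, expressed in pixels of this width.
    resolution = sys.float_info.epsilon * width
    print(width, round_trip_is_exact(width), resolution)
```

The spacing between representable values stays many orders of magnitude
below one pixel, so rounding to the nearest pixel absorbs the error.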

...
> It would not work in the case of a FPS when you have already reached
> the right corner of your screen and you need to go right again.

Assuming you are referring to FPS games, those "lock" the cursor
position to the screen center, so they never have that issue. To
correctly reproduce this behavior, you need a back channel to the
controlling app so it can know the cursor position and/or lock state
when either is changed on the controlled device.

(Above might not be correct on all platforms.)

Also I did not intend to say that you shouldn't support movement
vectors (like touchpads), I was just saying that absolute pointing
could be relative to screen size, so that you don't need to know the
absolute screen size.
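
A minimal sketch of how both kinds of pointer input could coexist on the
controlled device, assuming absolute positions arrive as fractions of the
screen and movement vectors arrive in pixels. The Pointer class and its
method names are hypothetical, and mapping 1.0 to the last pixel
(size - 1) is just one possible convention.

```python
from dataclasses import dataclass


def clamp(value, low, high):
    return min(max(value, low), high)


@dataclass
class Pointer:
    """Pointer state on the controlled device, which knows its own screen."""
    width: int
    height: int
    x: int = 0
    y: int = 0

    def move_absolute(self, nx, ny):
        # nx, ny are fractions of the screen in [0, 1]; the sender never
        # needs to know the real resolution.
        self.x = round(clamp(nx, 0.0, 1.0) * (self.width - 1))
        self.y = round(clamp(ny, 0.0, 1.0) * (self.height - 1))

    def move_relative(self, dx, dy):
        # dx, dy are movement vectors in pixels (touchpad-style input).
        self.x = clamp(self.x + dx, 0, self.width - 1)
        self.y = clamp(self.y + dy, 0, self.height - 1)


hd = Pointer(1920, 1080)
uhd = Pointer(3840, 2160)
for screen in (hd, uhd):
    screen.move_absolute(1.0, 0.0)  # same event, any screen size
print((hd.x, hd.y), (uhd.x, uhd.y))
```

The same absolute event lands in the top-right corner of either screen,
while relative movement stays available for touchpad-like devices.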

...
> While, if I had time and resources, yes I definitely think that CBOR
> or similar would be a good serialization protocol. But let's not go
> down this rabbit hole ;)

The advantages of going down this rabbit hole are:
a) We improve XMPP for other use cases
b) You can specify this protocol using XML and use Jingle XML streams.
As the CBOR<>XML translation will take care of creating the CBOR for
you, you still get the CBOR for this protocol, but without the need to
make it explicit. And in cases where people prefer to not use CBOR,
they can still use this protocol, just with XML. It's a win-win for
everyone (except that you as the specification author have more work).

If going forward, you still want to specify your own
payload/application protocol (that is, the CBOR thing that is
transferred with the Jingle streaming transport), I'd like to ask you:
- To evaluate if a XEP is the right place to specify such a protocol,
or if it is more of a generic thing that could well be used outside XMPP
and maybe should also be specified elsewhere.
- If you consider a XEP to be the right place and want to stick with
your CBOR protocol, I'd like to ask you to split it into two parts: 1.
the payload protocol (sections 8 and 9 of the proposal) and 2. the
Jingle signaling protocol (sections 5 to 7 of your proposal). This way
the payload protocol can be used and referenced easily outside of a
Jingle context.

If you feel it's possible to transition to a <message>-based approach,
this can of course be a single XEP (one that will barely have anything
to do with Jingle, except for a passing mention that it can be used with
Jingle XML streams or serverless messaging for lower latency).

Best,
Marvin
