Hi Goffi,
On Tue, 2024-05-21 at 12:47 +0200, Goffi wrote:
I know that, I've just ruled out using <message> through the server as
it has been proposed in another feedback.
Why do you rule that out? Because you don't see a purpose, when my
whole point is that I do see a purpose? Of course I can send whatever
CBOR/JSON you come up with as a base64 blob inside a <message> for my
use case, but then I wonder why not handle it in the first place.
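
To make that workaround concrete, here is a rough Python sketch of wrapping an event as a base64 blob inside a <message>. The namespace, the payload element name and the JSON event fields are all made up for this example; nothing like them is defined in the XEP.

```python
import base64
import json
import xml.etree.ElementTree as ET

# Hypothetical namespace, chosen only for this illustration.
NS = "urn:example:remote-control:0"


def wrap_event(to_jid, event):
    """Serialize an event dict to JSON and embed it base64-encoded in a <message>."""
    raw = json.dumps(event, separators=(",", ":")).encode("utf-8")
    message = ET.Element("message", {"to": to_jid})
    payload = ET.SubElement(message, f"{{{NS}}}payload")
    payload.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(message, encoding="unicode")


def unwrap_event(stanza_xml):
    """Extract and decode the event dict from a stanza built by wrap_event."""
    message = ET.fromstring(stanza_xml)
    payload = message.find(f"{{{NS}}}payload")
    return json.loads(base64.b64decode(payload.text))
```

It works, but the server and any generic XMPP tooling only ever see an opaque blob.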
From a quick glance at the Wikipedia page, I see "In terms of
transferring clipboard data, "there is currently no way to transfer
text outside the Latin-1 character set".[5] A common pseudo-encoding
extension solves the problem by using UTF-8 in an extended format.[2]:
§ 7.7.27 ", which makes me suspicious though.
RFB definitely is old, so this kind of thing is expected. And, while
I see that you added clipboard as a potential future extension, it
seems odd to complain that RFB has a suboptimal implementation of a
feature your proposed XEP currently doesn't have at all.
One of the design goals of my proposal is to have something really
simple and straightforward to implement.
RFB isn't really hard to implement either. And there are a ton of
implementations out there already.
There is no modifier flag used in the specification. There is the key
value, and the location number. From my tests, it's consistent and
corresponds to the documentation for the browsers that I've tried
(Firefox and Chromium).
I know that your specification doesn't transfer the modifier flags,
probably assuming they are superfluous. However, if your browser client
were to naively send the key events it receives as-is, without further
checking for plausibility, things will go wrong: I tested pressing the
keys that would logically result in the events meta down, control down,
control up, meta up and here are the results on different browsers:
https://imgur.com/a/zVxDAVa
From what I understand, the state of keyup and keydown events in the
web API doesn't need to be consistent (e.g. there can be keydown
without keyup and vice-versa). Do we want the same behavior for this
protocol or something else?
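
To illustrate one possible answer, here is a rough Python sketch of a plausibility check on the receiving side. The class and method names are made up for this example and are not part of the XEP; it only assumes that events carry a type, a key value and a location number. Whether dropping inconsistent events is the right reaction is exactly the open question.

```python
class KeyStateFilter:
    """Drops key events that contradict the set of keys currently held down."""

    def __init__(self):
        # (key, location) pairs that are currently considered pressed.
        self.pressed = set()

    def accept(self, event_type, key, location=0):
        """Return True if the event is plausible and should be forwarded."""
        ident = (key, location)
        if event_type == "keydown":
            # A second keydown without a keyup in between is dropped.
            if ident in self.pressed:
                return False
            self.pressed.add(ident)
            return True
        if event_type == "keyup":
            # A keyup for a key that was never pressed is dropped.
            if ident not in self.pressed:
                return False
            self.pressed.remove(ident)
            return True
        raise ValueError(f"unknown event type: {event_type}")

    def release_all(self):
        """Return synthetic keyup events for all held keys, e.g. at session end."""
        released = sorted(self.pressed)
        self.pressed.clear()
        return [("keyup", key, location) for key, location in released]
```

With something like this, a missing keyup for meta at least can't leave the key stuck on the controlled device after the session ends.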
I'm not saying there aren't any cases where low-latency is important,
where I disagree is that this is the case in all occasions. If you
don't have low latency feedback from the remote device, low latency for
input is very likely not crucial.
I have the feeling that you only see this specification with the remote
desktop use case point of view. There are other use cases, and another
major one is to use a device as input for another one in the same
physical location: use of a smartphone as ad-hoc touch pad or gamepad
for instance. And if low latency is easily achieved, I still don't see
the point to have other mechanism because in some niche case low
latency is not that annoying (but still is, it's always annoying).
I think you misunderstood my point. Using a smartphone as a touch pad
or gamepad while playing a game on a screen next to you is a case of
low latency feedback (you can see the screen with low latency). An
example of where you don't need low latency would be typing blindly
into a remote shell, because you won't get feedback there (except after
confirming a command, which is probably not low latency).
Anyway, I remain not convinced that XSF is the place to specify a
remote control protocol from scratch (which is what sections 8 and 9 of
the XEP are about). Mostly because I feel the XSF does not have the
competence for doing so (aka. we will probably do things terribly
wrong, due to lack of experience in the field).
Again, it is not from scratch. It's re-using existing protocols, in a
simple, working, easy-to-implement, and efficient way.
I was talking about the remote control protocol, which is what runs on
the topmost layer (inside the WebRTC data channel or whatever other
Jingle transport is used). This protocol is mostly from scratch (it's
loosely based on web API events, but then only taking an arbitrarily
picked subset of events and event properties).
The goal here is to be sure that it will work with web clients, as data
channels are currently the only way to have direct connection with
browsers. I can reformulate to only suggest it and get rid of the
SHOULD.
Which isn't an issue if web clients are not relevant for my use case.
And honestly, any kind of pointing to "you should support web clients"
sounds weird to me. It certainly is interesting that we can support web
clients, but that really shouldn't seep into unrelated specifications
(and this one totally is unrelated to the web).
WebRTC has sessions pretty much like Jingle; its ID is what you have in
the o= line of your SDP.
My point is: Either it's a Jingle session or it's not part of XMPP.
Jingle doesn't use WebRTC. It just happens that WebRTC APIs are
somewhat compatible with Jingle (because they are based on Jingle), but
from an XMPP perspective, you never have WebRTC sessions. I don't know
exactly what it means to be in the same WebRTC session, but whatever
you want here, make it more explicit, because people who don't use
WebRTC APIs should not be required to first read the WebRTC specs (or
probably the implementations' source code) to figure out what you mean
by that.
The issue is that video feed is used in this case to get the screen
dimension. Without it, we can't get touch events, which use absolute
position (while for mouse, there is a relative position mode for
exactly this use case).
That's a problematic design. As I said, clients might scale the video
to reduce bandwidth use. Dino also has logic to adjust the video
resolution of cameras depending on available bandwidth.
And as I understood for mouse, it's not relative to the screen, but
relative to the previous position, aka a movement vector, like reported
from touchpads.
A screen-relative position, where 0,0 is the upper left corner, 0.5,0.5
is the center of the screen and 1,1 is the lower right corner, would
work independently of the target screen resolution.
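
As a rough Python sketch of what I mean (function names are made up for this example, nothing here is from the XEP): the sender normalizes against whatever size it displays the video at, and the receiver maps back onto its own current screen size, so neither side needs to know the other's resolution.

```python
def to_normalized(x_px, y_px, width_px, height_px):
    """Sender side: convert a pixel position into a 0..1 screen-relative position."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("dimensions must be positive")
    return x_px / width_px, y_px / height_px


def to_target_pixels(x_rel, y_rel, width_px, height_px):
    """Receiver side: map a 0..1 position onto the current target screen."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("dimensions must be positive")
    # Clamp out-of-range input instead of trusting the remote side.
    x_rel = min(max(x_rel, 0.0), 1.0)
    y_rel = min(max(y_rel, 0.0), 1.0)
    # 1.0 would land one past the last pixel, so cap at width - 1 / height - 1.
    x = min(int(x_rel * width_px), width_px - 1)
    y = min(int(y_rel * height_px), height_px - 1)
    return x, y
```

Because the receiver uses its screen size at the time the event arrives, a video scaled down to save bandwidth doesn't affect the result.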
An alternative would be to specify screen dimension when establishing
the remote control session.
Might work, but then you also need to cover the case where the screen
resolution changes during remote control.
No, its value is in pixels, the same as for the Web API. It's double
because pixels can be subdivided (High-DPI displays, transformations).
I realize that, besides the link to MDN, this is not explicitly stated;
I'll add a notice in future revisions.
The Web API uses double because they did weird things for HiDPI. On the
hardware layer, there are only pixels and if you click on a point on
the screen, it will always be on a pixel (at least in all OSes that I am
aware of). The HiDPI transformation in browsers abstracts away from
actual pixels and 1px might be more or less than a physical pixel. But
why would you want to carry this abstraction through the network to a
system that shouldn't care about what browsers can do and what they
think a pixel is?
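
A rough Python sketch of doing that conversion on the sending side instead, assuming the sender knows its scale factor (browsers expose it as window.devicePixelRatio). The function name is made up for this example.

```python
import math


def css_to_device_pixel(x_css, y_css, device_pixel_ratio):
    """Convert fractional CSS pixel coordinates into whole physical pixels."""
    if device_pixel_ratio <= 0:
        raise ValueError("device_pixel_ratio must be positive")
    # A point always lies inside exactly one physical pixel, so round down.
    return (
        math.floor(x_css * device_pixel_ratio),
        math.floor(y_css * device_pixel_ratio),
    )
```

That way only integers go over the wire and the browser's notion of a pixel stays in the browser.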
It was just to handle the case where no device is accepted, there were
2 options:
- reject it totally
- say it's a simple screen share session.
I've chosen the latter one. But indeed, data channel is then useless.
Can change it for the other option.
We also don't allow Jingle file transfers of no file or RTP contents
without any codecs. As this protocol is for remote control, it should
remain entirely unused for screen share only.
- I'm not hard set on technologies, and I'm OK to get rid of CBOR if
there is consensus on it. I personally still think that it's a superior
solution.
To me the use of CBOR here feels not well motivated, except for obscure
"better performance" reasons before having done any measurement to back
that claim. From an XMPP perspective, something in a Jingle XML stream
would be more canonical (because it reuses the stack we already have in
every XMPP client anyway) and anything deviating from that IMO should
be well reasoned.
If you're reasoning that CBOR provides a significant performance gain
over XML, then why is it not a priority to figure out how we use CBOR
instead of XML everywhere in XMPP (e.g. by creating some XML<>CBOR
translation and using that as an optional stream feature)?
- regarding using RFB for input events only, I'll have a deeper look at
the spec and evaluate it. It may be an option if it is comparable in
ease of implementation, efficiency and flexibility to the current
proposal.
I want to repeat that I haven't verified that RFB is a particularly good
fit for the purpose, I just know it's very popular.
Best,
Marvin