On Tuesday, 21 May 2024, 16:39:28 UTC+2, Marvin W wrote:
Hi Goffi,
On Tue, 2024-05-21 at 12:47 +0200, Goffi wrote:
I know that; I've just ruled out using <message> through the server, as it
was proposed in other feedback.
Why do you rule that out? Because you don't see a purpose, when my
whole point is that I do see a purpose? Of course I can send whatever
CBOR/JSON you come up with as a base64 blob inside a <message> for my
use case, but then I wonder why not handle it in the first place.
I missed that you had a particular use case in mind for using <message> via
the server.
Could you please tell me more about this use case and why it isn't covered by
Jingle transmission? Specifically, what is the advantage of sending <message>
through the server instead of handling it directly? I'd like to understand
your perspective better.
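For reference, the approach Marvin mentions (an opaque CBOR/JSON payload, base64-encoded and carried in a <message> routed by the server) could look roughly like the Python sketch below. The namespace and the <remote-control/> element are hypothetical placeholders that no XEP defines, and JSON stands in for whatever serialization is eventually chosen.

```python
import base64
import json
import xml.etree.ElementTree as ET

# Hypothetical namespace and element name, for illustration only.
NS = "urn:example:remote-control:0"


def wrap_in_message(to_jid: str, payload: dict) -> bytes:
    """Serialize payload as JSON, base64-encode it and embed it in a <message>."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    message = ET.Element("message", {"to": to_jid})
    container = ET.SubElement(message, f"{{{NS}}}remote-control")
    container.text = base64.b64encode(raw).decode("ascii")
    return ET.tostring(message)


def unwrap_message(stanza: bytes) -> dict:
    """Extract and decode the payload from a <message> built as above."""
    message = ET.fromstring(stanza)
    container = message.find(f"{{{NS}}}remote-control")
    if container is None or not container.text:
        raise ValueError("no remote-control payload in this stanza")
    return json.loads(base64.b64decode(container.text))


# Round trip of a hypothetical key event.
event = {"type": "keydown", "code": "KeyA"}
stanza = wrap_in_message("juliet@example.org/phone", event)
assert unwrap_message(stanza) == event
```

Whether this server-routed path is worth specifying is the open question in this thread; the sketch only shows the mechanics.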
[SNIP]
RFB definitely is old, so these kinds of things are expected. And, while
I see that you added clipboard as a potential future extension, it
seems odd to complain that RFB has a suboptimal implementation of a
feature your proposed XEP currently doesn't have at all.
That's just something that jumps out after a quick check: if UTF-8 is not
natively supported, I can see problems coming. But I'll check in more detail
anyway; as I've said before, I'm not against a protocol change if it makes
sense.
As for it seeming odd to you because clipboard sharing is not yet specified:
that's simply anticipation.
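To make the UTF-8 concern concrete: RFC 6143 (section 7.5.6) defines the client-to-server ClientCutText message as a message type of 6, three bytes of padding, a U32 length and the text, and states that the text uses ISO 8859-1 (Latin-1). A minimal Python sketch of building that message shows where non-Latin-1 text breaks; full Unicode clipboard support needs something beyond base RFB.

```python
import struct


def client_cut_text(text: str) -> bytes:
    """Build an RFB ClientCutText message (RFC 6143, section 7.5.6).

    Layout: U8 message-type (6), 3 bytes of padding, U32 length, then the
    text encoded as ISO 8859-1 (Latin-1), as the base protocol requires.
    """
    data = text.encode("latin-1")  # raises UnicodeEncodeError outside Latin-1
    return struct.pack(">B3xI", 6, len(data)) + data


print(client_cut_text("café"))  # fits in Latin-1
try:
    client_cut_text("日本語")  # CJK text has no Latin-1 representation
except UnicodeEncodeError as exc:
    print("not representable in base RFB:", exc.reason)
```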
[SNIP]
I know that your specification doesn't transfer the modifier flags,
probably assuming they are superfluous. However, if your browser client
were to naively send the key events it receives as is without further
checking for plausibility, things will go wrong: I tested pressing the
keys that would logically result in the events meta down, control down,
control up, meta up and here are the results on different browsers:
https://imgur.com/a/zVxDAVa
It's the job of the controlling client to ensure consistency. That may be
specified in business rules, though.
From what I understand, the state of keyup and keydown events in the
web API doesn't need to be consistent (e.g. there can be keydown
without keyup and vice-versa). Do we want the same behavior for this
protocol or something else?
The wire format comes from the web API, but we are not developing browsers, we
are developing XMPP clients.
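Whatever the spec ends up saying, one way a controlling client could ensure consistency is to track which keys it has reported as held, drop a keyup for a key that was never reported down, and release everything still held when it loses focus. A minimal Python sketch, assuming keys are identified by web-style code strings; the class and its behaviour are illustrative and not taken from the XEP.

```python
from dataclasses import dataclass, field


@dataclass
class KeyStateTracker:
    """Hypothetical controller-side filter that keeps key events consistent.

    Keys are identified by a code string such as the web API's
    KeyboardEvent.code (e.g. "ControlLeft"); that choice is an assumption.
    """

    pressed: set = field(default_factory=set)

    def filter(self, event_type: str, code: str) -> list:
        """Return the events that should actually be sent to the remote."""
        if event_type == "keydown":
            # A repeated keydown (auto-repeat) is forwarded; the key stays held.
            self.pressed.add(code)
            return [("keydown", code)]
        if event_type == "keyup":
            if code not in self.pressed:
                # keyup without a matching keydown: drop it.
                return []
            self.pressed.discard(code)
            return [("keyup", code)]
        raise ValueError(f"unknown event type: {event_type!r}")

    def release_all(self) -> list:
        """Synthesize keyup events for every held key, e.g. on focus loss."""
        released = [("keyup", code) for code in sorted(self.pressed)]
        self.pressed.clear()
        return released
```

With this filter, the remote never sees a key released that it was not told was pressed, and never keeps a key held after the controlling window loses focus.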
I think you misunderstood my point. Using a smartphone as a touchpad
or gamepad while playing a game on a screen next to you involves
low-latency feedback (you can see the screen with low latency). An
example where you don't need low latency would be blindly typing into
a remote shell, because you won't get feedback there (except after
confirming a command, which is probably not low latency).
And we come back to the point where I don't see the need for another way of
sending input when there is already a low-latency one. I'm not saying that
you are wrong; I'm saying that I don't see why we should have another
mechanism when there is already a low-latency way to send input data to any
device. So please provide one or more use cases where the current
specification is not valid and would not work, or would work sub-optimally.
At risk of repeating myself: I'm not closed-minded about changing protocols or
designs, and I appreciate feedback from people with other experiences, but
please provide clear examples of use cases where the current design is
incorrect.
[SNIP]
Again, it is not from scratch. It's re-using existing protocols, in a
simple, working, easy-to-implement, and efficient way.
I was talking about the remote control protocol, which is what runs on
the topmost layer (inside the webrtc datachannel or whatever other
Jingle transport is used). This protocol is mostly from scratch (it's
loosely based on web API events, but then only taking an arbitrarily
picked subset of events and event properties).
It's not arbitrary at all: it discards data that doesn't make sense in
this context, and the selection was made while writing an implementation
with the Freedesktop remote control portal.
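The disagreement here is about which events and properties are kept, not about the mechanism, which is simply a projection of browser-style events onto a fixed whitelist. A short Python sketch; the whitelist below is hypothetical and is not the list from the XEP.

```python
from typing import Optional

# Hypothetical whitelist for illustration; not the XEP's actual field list.
KEPT_FIELDS = {
    "keydown": ("key", "code"),
    "keyup": ("key", "code"),
    "mousemove": ("x", "y"),
    "mousedown": ("x", "y", "button"),
    "mouseup": ("x", "y", "button"),
}


def project_event(web_event: dict) -> Optional[dict]:
    """Keep only the whitelisted properties of a web-API-like event.

    Returns None for event types outside the whitelist; properties that make
    no sense on the remote side (timestamps, DOM targets, ...) are dropped.
    """
    event_type = web_event.get("type")
    fields = KEPT_FIELDS.get(event_type)
    if fields is None:
        return None
    projected = {"type": event_type}
    projected.update({name: web_event[name] for name in fields if name in web_event})
    return projected
```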
Which isn't an issue if web clients are not relevant for my use case.
And honestly, any kind of pointing to "you should support web clients"
sounds weird to me. It certainly is interesting that we can support web
clients, but that really shouldn't spill over into unrelated specifications (and
this one totally is unrelated to web).
I've already said that I'll reformulate it to make it only a suggestion,
without the "SHOULD".
[SNIP]
My point is: Either it's a Jingle session or it's not part of XMPP.
Jingle doesn't use WebRTC. It just happens that WebRTC APIs are
somewhat compatible with Jingle (because they are based on Jingle), but
from XMPP perspective, you never have WebRTC sessions. I don't know
exactly what it means to be in the same WebRTC session, but whatever
you want here, make it more explicit, because people that don't use
WebRTC APIs should not be required to first read the WebRTC specs (or
probably implementations source code) to figure out what you mean by
that.
Right, I'll review this section.
The issue is that the video feed is used in this case to get the screen
dimensions. Without it, we can't handle touch events, which use absolute
positions (while for mouse, there is a relative position mode for exactly
this use case).
That's a problematic design. As I said, clients might scale the video
to reduce bandwidth use. Dino also has logic to adjust the video
resolution of cameras depending on available bandwidth.
I was thinking about sending the screen size at the beginning, but the issue
is when the size changes (e.g. remote application control when the
application is resized). The issue with [0,1] coordinates is that you run into
precision loss or rounding error troubles. I think that ideally the screen
size should be sent separately and updated.
And as I understood for mouse, it's not relative to the screen, but
relative to the previous position, aka a movement vector, as reported
by touchpads.
A screen-relative position, where 0,0 is the upper left corner, 0.5,0.5
is the center of the screen and 1,1 is the lower right corner, would work
independently of the target screen resolution.
It would not work in the case of an FPS when you have already reached the right
edge of your screen and need to keep moving right.
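Both sides of this exchange can be illustrated in a few lines of Python: a screen-relative position maps to the same spot at any resolution, but once it is clamped at the edge, further "move right" input produces no motion on the remote, which is why relative movement vectors matter for an FPS. The function names are hypothetical, and the clamping of out-of-range values is an assumption about how an implementation would behave.

```python
def normalized_to_pixels(nx: float, ny: float, width: int, height: int) -> tuple:
    """Map a screen-relative position to integer pixels.

    (0, 0) is the upper left corner and (1, 1) the lower right one;
    values outside [0, 1] are clamped to the screen edge.
    """
    nx = min(max(nx, 0.0), 1.0)
    ny = min(max(ny, 0.0), 1.0)
    return round(nx * (width - 1)), round(ny * (height - 1))


def deltas_seen_by_remote(positions: list, width: int, height: int) -> list:
    """Pointer movement the remote observes when fed absolute positions."""
    pixels = [normalized_to_pixels(x, y, width, height) for x, y in positions]
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(pixels, pixels[1:])]


# Same relative point at two resolutions: resolution independence works.
print(normalized_to_pixels(0.5, 0.5, 1920, 1080))
print(normalized_to_pixels(0.5, 0.5, 1280, 720))

# Keep pushing right past the edge: after the clamp, the remote sees no motion,
# whereas a relative (dx, dy) event would still be delivered unchanged.
print(deltas_seen_by_remote([(0.9, 0.5), (1.0, 0.5), (1.1, 0.5), (1.2, 0.5)], 1920, 1080))
```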
An alternative would be to specify screen dimensions when establishing
the remote control session.
Might work, but then you also need to cover the case where the screen
resolution changes during remote control.
Yes, that, with updates on screen change, is probably the best option.
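The option agreed on here (announce the screen size when the session is established, then send an update whenever it changes) could be handled on the controlling side roughly as below. This Python sketch is hypothetical: the class and method names are invented, and it assumes the local video widget shows the whole remote screen without letterboxing. Because it scales by the announced size rather than by the decoded video resolution, it keeps working when the video is downscaled to save bandwidth.

```python
from dataclasses import dataclass


@dataclass
class RemoteScreen:
    """Controller-side record of the remote screen size (hypothetical)."""

    width: int
    height: int

    def on_size_update(self, width: int, height: int) -> None:
        """Apply a screen size announcement from the controlled device."""
        if width <= 0 or height <= 0:
            raise ValueError("screen size must be positive")
        self.width, self.height = width, height

    def from_widget(self, wx: float, wy: float, widget_w: int, widget_h: int) -> tuple:
        """Map a point in the local video widget to remote screen pixels.

        Scales by the announced screen size, not by the video resolution,
        then clamps to the last valid pixel.
        """
        x = int(wx * self.width / widget_w)
        y = int(wy * self.height / widget_h)
        return min(max(x, 0), self.width - 1), min(max(y, 0), self.height - 1)


screen = RemoteScreen(1920, 1080)
print(screen.from_widget(320, 180, 640, 360))  # center of a 640x360 widget
screen.on_size_update(1280, 720)               # e.g. the remote window was resized
print(screen.from_widget(320, 180, 640, 360))
```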
[SNIP]
The Web API uses double because they did weird things for HiDPI. On the
hardware layer, there are only pixels and if you click on a point on
the screen, it will always be on a pixel (at least in all OS that I am
aware of). The transformation of HiDPI in browsers abstracts away from
actual pixels and 1px might be more or less than a physical pixel. But
why would you want to carry this abstraction through the network to a
system that shouldn't care about what browsers can do and what they
think a pixel is?
I have no strong argument against this, to be honest. I'm fine with int too.
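If the wire format uses integers, a browser-based controlling client can resolve its double-precision CSS coordinates to device pixels before sending, so the HiDPI abstraction never crosses the network. A minimal Python sketch, assuming the client knows its device pixel ratio (window.devicePixelRatio in browsers) and that the coordinates are relative to an element displaying the remote screen at its native size; flooring picks the pixel that contains the point.

```python
import math


def css_to_device_pixels(css_x: float, css_y: float, device_pixel_ratio: float) -> tuple:
    """Convert CSS-pixel coordinates (doubles) to integer device pixels.

    math.floor selects the device pixel that contains the point, so a
    click anywhere inside a pixel maps to that pixel's index.
    """
    if device_pixel_ratio <= 0:
        raise ValueError("device_pixel_ratio must be positive")
    return math.floor(css_x * device_pixel_ratio), math.floor(css_y * device_pixel_ratio)
```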
It was just to handle the case where no device is accepted; there were
two options:
- reject it totally
- say it's a simple screen share session.
I've chosen the latter. But indeed, the data channel is then useless. I can
change it to the other option.
We also don't allow Jingle file transfers with no file or RTP contents
without any codecs. As this protocol is for remote control, it should
remain entirely unused for screen share only.
Sure, I'll change that.
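The behaviour agreed on here amounts to a simple rule when answering an offer: accept only if at least one offered input device is accepted, otherwise reject rather than fall back to screen sharing. A Python sketch with a hypothetical helper; a real implementation would then end the Jingle negotiation, and which XEP-0166 reason element to send (for instance <decline/>) is for the specification to decide.

```python
from typing import Iterable


def answer_remote_control(offered: Iterable[str], accepted: Iterable[str]) -> tuple:
    """Decide how to answer a remote control offer (hypothetical helper).

    Returns ("accept", devices) when at least one offered device is accepted,
    otherwise ("reject", empty set): without any device the data channel
    would be useless, so the session is not downgraded to screen sharing.
    """
    usable = frozenset(offered) & frozenset(accepted)
    if not usable:
        return "reject", frozenset()
    return "accept", usable
```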
- I'm not hard set on technologies, and I'm OK to get rid of CBOR if
there is consensus on it. I personally still think that it's a superior
solution.
To me the use of CBOR here feels not well motivated, except for obscure
"better performance" reasons before having done any measurement to back
that claim. From XMPP perspective, something in a Jingle XML stream
would be more canonical (because it reuses the stack we already have in
every XMPP client anyway) and anything diverting from that IMO should
be well reasoned.
If you're reasoning that CBOR provides a significant performance gain
over XML, then why is it not a priority to figure out how we use CBOR
instead of XML everywhere in XMPP (e.g. by creating some XML<>CBOR
translation and using that as an optional stream feature).
If I had the time and resources, yes, I definitely think that CBOR or
something similar would be a good serialization format. But let's not go
down this rabbit hole ;)
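Marvin's request for measurements can be met cheaply. The following Python sketch compares the encoded size of one input event; since the standard library has no CBOR module, it includes a minimal encoder covering only unsigned and negative integers, UTF-8 text strings and maps (RFC 8949 major types 0, 1, 3 and 5). The event shape is hypothetical, and size is only one dimension: parsing cost on target devices would also need to be measured.

```python
import xml.etree.ElementTree as ET


def _cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus its argument (RFC 8949, section 3)."""
    if value < 24:
        return bytes([major << 5 | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([major << 5 | info]) + value.to_bytes(size, "big")
    raise ValueError("integer too large for this minimal encoder")


def cbor_encode(obj) -> bytes:
    """Minimal CBOR encoder: ints, text strings and maps of those only."""
    if isinstance(obj, bool):
        raise TypeError("booleans are not handled by this sketch")
    if isinstance(obj, int):
        return _cbor_head(0, obj) if obj >= 0 else _cbor_head(1, -1 - obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        return _cbor_head(3, len(data)) + data
    if isinstance(obj, dict):
        out = _cbor_head(5, len(obj))
        for key, value in obj.items():
            out += cbor_encode(key) + cbor_encode(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def xml_encode(event: dict) -> bytes:
    """Encode the same event as a single XML element with attributes."""
    attrs = {k: str(v) for k, v in event.items() if k != "type"}
    return ET.tostring(ET.Element(event["type"], attrs))


event = {"type": "mousemove", "x": 640, "y": 360}
print(len(cbor_encode(event)), len(xml_encode(event)))
```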
- regarding using RFB for input events only, I'll have a deeper look at
the spec and evaluate it. It may be an option if it is comparable in ease
of implementation, efficiency and flexibility to the current proposal.
I want to repeat that I haven't verified that RFB is a particularly good
fit for the purpose, I just know it's very popular.
The idea is to check it. I want something flexible, easy to implement, and
efficient. If RFB or anything else checks the boxes, why not.
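For the RFB evaluation, the two client-to-server input messages in RFC 6143 are small fixed-size binary records: KeyEvent (section 7.5.4) and PointerEvent (section 7.5.5). A Python sketch packing them with the struct module shows the layout. Keys are X Window System keysyms, and pointer positions are absolute unsigned 16-bit framebuffer coordinates with no relative-motion mode in the base protocol, which bears on the touch and FPS discussion above.

```python
import struct


def rfb_key_event(keysym: int, down: bool) -> bytes:
    """RFB KeyEvent: U8 type (4), U8 down-flag, 2 bytes padding, U32 keysym."""
    return struct.pack(">BB2xI", 4, 1 if down else 0, keysym)


def rfb_pointer_event(button_mask: int, x: int, y: int) -> bytes:
    """RFB PointerEvent: U8 type (5), U8 button-mask, U16 x, U16 y.

    Bit 0 of button_mask is the left button; x and y are integer pixels.
    """
    return struct.pack(">BBHH", 5, button_mask, x, y)


print(rfb_key_event(0x61, True).hex())         # keysym 0x61 is "a"
print(rfb_pointer_event(1, 640, 360).hex())    # left button down at (640, 360)
```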
Best,
Marvin
_______________________________________________
Standards mailing list -- standards(a)xmpp.org
To unsubscribe send an email to standards-leave(a)xmpp.org