Proposed XMPP Extension: Jingle Remote Control

List overview All Threads
Download

newer

older

Proposed XMPP Extension: Chat...

XEP proposal: filtering chat...

Daniel Gultsch

14 May 2024 14 May '24

3:34 p.m.

Show replies by date

Stephen Paul Weber

15 May 15 May

2:08 a.m.

...

URL: https://xmpp.org/extensions/inbox/remote-control.html

I'm unclear on the benefits of this CBOR-over-Jingle approach vs a XML-based <message>-based approach? Unlike audio/video this data is tiny and does not benefit nearly as much from direct transmission or binary packing. If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs. Jingle can of course be used to send any data of any protocol and so any existing or future remote-control protocol can be used over Jingle. Do we really need something XMPP-specific for this purpose?

Marvin W

3:48 p.m.

Hi, I also am not sure why to do this custom-CBOR-via-Jingle approach. - I don't see a good reason to use CBOR instead of JSON or XML. We're just adding another object encoding scheme to the protocol suite without any obvious reason. - If we define a protocol for remote control, I would prefer this to be a <message>-based protocol that can be used either using a traditional XMPP connection or via XEP-0247 Jingle XML Streams. - Rather than defining our own protocol for remote control and using traditional video conferencing tech for screen content transfer, I'd prefer to reuse something like the RFB protocol (RFC 6143), aka. only create a spec to define how to use RFB via Jingle. Marvin On Tue, 2024-05-14 at 21:08 -0500, Stephen Paul Weber wrote:

...

URL: https://xmpp.org/extensions/inbox/remote-control.html

I'm unclear on the benefits of this CBOR-over-Jingle approach vs a XML-based <message>-based approach? Unlike audio/video this data is tiny and does not benefit nearly as much from direct transmission or binary packing. If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs. Jingle can of course be used to send any data of any protocol and so any existing or future remote- control protocol can be used over Jingle. Do we really need something XMPP- specific for this purpose?

MSavoritias

4:11 p.m.

On 5/15/24 18:48, Marvin W wrote:

...

Good points. But from what I read RFB is not the same way as what is proposed here. The XEP as is defines a protocol that can send input to any target without needing graphical interface or video calls. It even says it allows an xmpp client to work as a keyboard for something else for example.

...

to use remotely or simulate input devices for another device that may

lack them (e.g., single-board computers or IoT devices), and it establishes a framework that can be extended to share data (e.g., clipboard content). read the RFC of RFB that can't be done with that. see also in the design goals:

...

Work between devices of different screen sizes, or even if a device

has no screen at all. MSavoritias

...

On Tue, 2024-05-14 at 21:08 -0500, Stephen Paul Weber wrote:

URL: https://xmpp.org/extensions/inbox/remote-control.html

I'm unclear on the benefits of this CBOR-over-Jingle approach vs a XML-based <message>-based approach? Unlike audio/video this data is tiny and does not benefit nearly as much from direct transmission or binary packing. If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs. Jingle can of course be used to send any data of any protocol and so any existing or future remote- control protocol can be used over Jingle. Do we really need something XMPP- specific for this purpose?

_______________________________________________ Standards mailing list -- standards(a)xmpp.org To unsubscribe send an email to standards-leave(a)xmpp.org

Marvin W

5:35 p.m.

Sure, RFB does specify screen content transfer, but for the usecases where there is no screen content RFB can also be used exclusively for input, by setting screen size to 0x0. I'm also not set on RFB, e.g. SPICE might also be a good option as well. I'm just saying that prior art in this area might be worth considering, because, realistically, the proposed protocol is mostly for use as a remote desktop where screen content is transferred. For such a usecase, using a protocol like RFB would provide significant performance improvements (starting at simple things like local cursor rendering that allow for lag free cursor movement) over regular video encoders. When the intent is to not use this for remote desktop usecases, but any other remote control usecases, the strict requirement to Jingle seems even less useful, as single or few key press usecases seem very reasonable to me when remote controlling headless IoT devices - and for just a few key presses, the overhead of setting up a Jingle session would be unreasonable. In fact low latency, the main improvement of Jingle vs inband, is only really required if you do have more or less immediate feedback from the receiving device - e.g. when the screen content is streamed. And I bet as an IoT device developer you also want to avoid the overhead of a Jingle+WebRTC stack if you can. Marvin On Wed, 2024-05-15 at 19:11 +0300, MSavoritias wrote:

...

On 5/15/24 18:48, Marvin W wrote:

to use remotely or simulate input devices for another device that

may lack them (e.g., single-board computers or IoT devices), and it establishes a framework that can be extended to share data (e.g., clipboard content). read the RFC of RFB that can't be done with that. see also in the design goals:

Work between devices of different screen sizes, or even if a

device has no screen at all. MSavoritias

On Tue, 2024-05-14 at 21:08 -0500, Stephen Paul Weber wrote:

URL: https://xmpp.org/extensions/inbox/remote-control.html

I'm unclear on the benefits of this CBOR-over-Jingle approach vs a XML-based <message>-based approach? Unlike audio/video this data is tiny and does not benefit nearly as much from direct transmission or binary packing. If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs. Jingle can of course be used to send any data of any protocol and so any existing or future remote- control protocol can be used over Jingle. Do we really need something XMPP- specific for this purpose?

_______________________________________________ Standards mailing list -- standards(a)xmpp.org To unsubscribe send an email to standards-leave(a)xmpp.org

MSavoritias

16 May 16 May

8:01 a.m.

On 5/15/24 20:35, Marvin W wrote:

...

Yeah probably as is the scope it too broad. MSavoritias

...

On Wed, 2024-05-15 at 19:11 +0300, MSavoritias wrote:

On 5/15/24 18:48, Marvin W wrote:

to use remotely or simulate input devices for another device that

Work between devices of different screen sizes, or even if a

device has no screen at all. MSavoritias

On Tue, 2024-05-14 at 21:08 -0500, Stephen Paul Weber wrote:

> URL: https://xmpp.org/extensions/inbox/remote-control.html I'm unclear on the benefits of this CBOR-over-Jingle approach vs a XML-based <message>-based approach? Unlike audio/video this data is tiny and does not benefit nearly as much from direct transmission or binary packing. If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs. Jingle can of course be used to send any data of any protocol and so any existing or future remote- control protocol can be used over Jingle. Do we really need something XMPP- specific for this purpose?

_______________________________________________ Standards mailing list -- standards(a)xmpp.org To unsubscribe send an email to standards-leave(a)xmpp.org

Goffi

20 May 20 May

2:51 p.m.

Le mardi 14 mai 2024, 17:34:45 UTC+2 Daniel Gultsch a écrit :

...

The XMPP Extensions Editor has received a proposal for a new XEP. Title: Jingle Remote Control Abstract: This specification defines a way to remotely control a device using local peripheral inputs. URL: https://xmpp.org/extensions/inbox/remote-control.html The Council will decide in the next two weeks whether to accept this proposal as an official XEP. _______________________________________________ Standards mailing list -- standards(a)xmpp.org To unsubscribe send an email to standards-leave(a)xmpp.org

Hello, Sorry for the late reply; I was on vacation. Thank you for your feedback. I'll address your concerns below:

...

I'm unclear on the benefits of this CBOR-over-Jingle approach vs an XML-based <message>-based approach? Unlike audio/video this data is tiny and does not benefit nearly as much from direct transmission or binary packing.

There are many benefits to using CBOR: - It is smaller. While individual pieces of data may be tiny, the cumulative amount is significant, and efficiency is crucial. - Segmentation is inherent in CBOR, so you always know if you have all the data. This is beneficial for optimization and security. - Encoding and decoding CBOR are much more efficient, essential for quick and efficient data processing, especially by low-resource devices (like Arduinos).

...

If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs.

The whole point of Jingle applications is to specify the protocol. I'm not sure where the problem lies here.

...

- If we define a protocol for remote control, I would prefer this to be a <message>-based protocol that can be used either using a traditional XMPP connection or via XEP-0247 Jingle XML Streams.

Using server-based <message> would be highly inefficient. Why send gamepad data to the server, incurring delays and extra processing, when you can send it directly from your local network? Add e2ee concerns (e.g. OMEMO override in size and processing), and you're wasting a lot of resources. Regarding direct XML streams, CBOR is still more efficient. Additionally, the protocol is based on web APIs, and CBOR provides a direct mapping. Using XML would require reinventing the wheel.

...

- Rather than defining our own protocol for remote control and using traditional video conferencing tech for screen content transfer, I'd prefer to reuse something like the RFB protocol (RFC 6143), aka. only create a spec to define how to use RFB via Jingle.

The protocol described here is for input sending and potentially other features like clipboard sharing, gamepad, and haptic feedback. In combination with existing specifications, one use case can be remote desktop. The goal is to reuse existing XMPP building blocks to simplify implementation. That’s what XMPP is for: coordinating specifications. We already have an A/V transmission protocol. With WebRTC, it's extremely efficient regarding latency and bandwidth. It’s suitable for remote desktop streaming, including robust network traversal mechanisms. Regarding mouse cursor hiding, I've considered it and will include it in a future version (my implementation does hide the cursor). And, as mentioned, the protocol comes from Web APIs because they are simple, well-documented, and provides a well-thought-out abstraction of the hardware.

...

In fact low latency, the main improvement of Jingle vs inband, is only really required if you do have more or less immediate feedback from the receiving device - e.g. when the screen content is streamed.

Low latency is crucial for inputs, especially for devices like gamepads, touchpads, and mice. Even with keyboards, low latency can be important, for instance, when playing a game. I've used this protocol to utilize my phone as a graphics tablet for MyPaint on my desktop: lot of events, and low latency very much needed.

...

And I bet as an IoT device developer you also want to avoid the overhead of a Jingle+WebRTC stack if you can.

For very low-resource IoT devices, a "parent" device could handle the XMPP work, with the stream established directly with the IoT device. I see XMPP in IoT as a “router” device managing XMPP and communicating with other devices on the local network using other protocols. But XMPP itself could certainly be used directly in many modern devices, potentially with some optimizations. I hope that I have answered all your feedback. Regards, Goffi

Stephen Paul Weber

3:35 p.m.

...

- It is smaller. While individual pieces of data may be tiny, the cumulative amount is significant, and efficiency is crucial.

Why is efficiency crucial when events are being produced at the rate a human can produce them?

...

If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs.

The whole point of Jingle applications is to specify the protocol. I'm not sure where the problem lies here.

Do we have any examples of a XEP specifying a protocol yet? So far usually we reuse a protocol, such as RTP.

...

Regarding direct XML streams, CBOR is still more efficient.

No one is saying CBOR is bad in general, but I think if a new input-events-over-cbor-stream protocol is needed the XSF is not the place to specify it.

...

Additionally, the protocol is based on web APIs, and CBOR provides a direct mapping.

Perhaps this could be called out more? If this is indeed a "direct mapping" to an existing protocol maybe the concerns about it being bespoke are moot.

Goffi

8:10 p.m.

Le lundi 20 mai 2024, 17:35:13 UTC+2 Stephen Paul Weber a écrit :

...

- It is smaller. While individual pieces of data may be tiny, the cumulative amount is significant, and efficiency is crucial.

Why is efficiency crucial when events are being produced at the rate a human can produce them?

What do you mean? It's not like humans are entering the data themselves. Input devices are used (most of the time) by humans, but that doesn't mean the data volume is small. With touch devices, you can quickly have several touch points with diameters, pressure, etc., moving around. You quickly accumulate dozens of events. Sure, compared to a video stream, that's minor, but it still needs to be processed quickly and efficiently. Lack of efficiency means latency, which can be annoying at best and problematic at worst (e.g., in games or precision tasks). And in general, when the cost of using an efficient solution is low, why not use it?

...

If something over Jingle *is* desired, I'm a bit uncomfortable with specifying bespoke binary protocols in XEPs.

The whole point of Jingle applications is to specify the protocol. I'm not sure where the problem lies here.

Do we have any examples of a XEP specifying a protocol yet? So far usually we reuse a protocol, such as RTP.

Well, most XEPs specify protocols. Regarding Jingle wire protocols, reusing another protocol is essentially specifying it. And we don't have many Jingle applications so far. XEP-0166 actually requires specifying how the data is to be sent and received (https://xmpp.org/extensions/xep-0166.html#conformance-apps).

...

Regarding direct XML streams, CBOR is still more efficient.

No one is saying CBOR is bad in general, but I think if a new input-events-over-cbor-stream protocol is needed the XSF is not the place to specify it.

It's not a generic input-events over CBOR protocol; it's a specification to control devices via XMPP. This specification is useless without the XMPP and Jingle stack, and remote desktop use case reuses the existing stack. The wire format specified here is simple, efficient, and does the job. If using other standards for wire format is a bad thing, why do we have Socks5, WebRTC, HTTP, and so on used in XMPP?

...

Additionally, the protocol is based on web APIs, and CBOR provides a direct mapping.

Perhaps this could be called out more? If this is indeed a "direct mapping" to an existing protocol maybe the concerns about it being bespoke are moot.

It's not a 1:1 mapping of Web API because many data are useless or irrelevant in this specification. For instance, a Web `MouseEvent` has data such as `clientX`, `pageY`, and `relatedTarget`, which are not relevant in this case. Additionally, Web `MouseEvent` indicates key presses during the event, which are obtained through the `keyboard` device in the specification. Only the necessary data are kept. You are right that this should be explained in the specification. If the protoXEP is accepted, I'll add a section to explain that. I hope that I have helped to clarify things. Regards, Goffi

Marvin W

8:48 p.m.

Hi Goffi, See inline comments. Sorry for the wall of text and if it overlaps with one of the mails you wrote since I started writing this. On Mon, 2024-05-20 at 16:51 +0200, Goffi wrote:

...

There are many benefits to using CBOR: - It is smaller. While individual pieces of data may be tiny, the cumulative amount is significant, and efficiency is crucial.

The cumulative amount is about 10-20%% [1]. This isn't really a huge improvement and almost all events will fit into a single network layer frame anyway, further reducing the impact of encoding size.

...

- Segmentation is inherent in CBOR, so you always know if you have all the data. This is beneficial for optimization and security.

Segmentation is also inherent to SCTP, the protocol webrtc data channels use to transfer content frames. There is no win in segmenting the same segments twice.

...

- Encoding and decoding CBOR are much more efficient, essential for quick and efficient data processing, especially by low-resource devices (like Arduinos).

Not untrue, but probably negligible given the resource use of IP, UDP, DTLS, SCTP - all part of the protocol stack you're building on and thus involved in every event to be processed. Especially DTLS encryption is going to be much more resource hungry than the difference between CBOR parser and JSON parser. And notable, CBOR encoding is not a native function in web browsers, so if web is a goal of this thing (and seemingly it is, given all the references to web tech in the XEP), CBOR is probably not much better than JSON.

...

- If we define a protocol for remote control, I would prefer this to be a <message>-based protocol that can be used either using a traditional XMPP connection or via XEP-0247 Jingle XML Streams.

Using server-based <message> would be highly inefficient. Why send gamepad data to the server, incurring delays and extra processing, when you can send it directly from your local network?

XEP-0247 Jingle XML streams doesn't need to go via the server, it uses Jingle just like your proposed protocol. While the XEP isn't maintained for some time and makes weird references to other XEPs, nothing in it forbids using it with webrtc data channels. In fact this has been discussed as a useful tool for all kinds of things recently (like initial device crypto setup or device-to-device MAM). And of course latency when sending via a server might be sub-perfect, but it's a very similar latency you would see if the network environment requires to use a TURN server, which is one of the ways to use Jingle. And as mentioned, there are valid use cases for having input in cases where low latency is not that crucial. Think of keyboard input to a remote shell - essentially what SSH does - which is not uncommon to be routed through proxies/tunnels that add latency. Of course for game input, drawing and 3d modeling, that's probably not an option. It depends a lot on the usecase and that's why flexibility is very much a good idea. Building something that is exclusively/primarily designed around having a web browser XMPP client connected via Jingle webrtc datachannels doesn't sound like flexibility was part of the design.

...

Regarding direct XML streams, CBOR is still more efficient. Additionally, the protocol is based on web APIs, and CBOR provides a direct mapping. Using XML would require reinventing the wheel.

Just as you can "directly" map data from JSON objects from a web browser to CBOR, you can directly map them to XML. It's not really a good idea to do such a direct mapping in both cases though (e.g. if you used enumerated keys in CBOR instead of a string map, you can drastically reduce the payload size and improve parsing speed).

...

As I mentioned in another email: If you really feel like using RTP for screen content transfer, you can always decide to only use the RFB protocol (or something else) for the input part. I took it as an example for an existing protocol that (among other features) has logic for remote control input. Using RFB for screen transfer may be an adjacent topic, but not a requirement.

...

We already have an A/V transmission protocol. With WebRTC, it's extremely efficient regarding latency and bandwidth. It’s suitable for remote desktop streaming, including robust network traversal mechanisms.

Network traversal is on a completely different layer than the protocol to transfer screen content (RTP vs RFB). Nothing speaks against running the RFB protocol over webrtc datachannels. Running RFB over websockets in web browsers is also not well specified anywhere, but is still widely deployed [2].

...

And, as mentioned, the protocol comes from Web APIs because they are simple, well-documented, and provides a well-thought-out abstraction of the hardware.

Web APIs are designed around what browsers can reasonably do on the machines they run on. That doesn't mean they are well thought out for the generic purpose. I just played with the https://w3c.github.io/uievents/tools/key-event-viewer.html and it's still unclear to me when pressing modifier keys, which events are emitted when and what is the supposed state of the modifier flag for those events. I figured that the behavior is inconsistent between browsers (and probably operating systems) and also between different keys in the same browser. I bet this is not intended, but as the specification and MDN don't really tell me what the correct behavior would be, I can't really blame the browsers either. What I learned is that, as a web developer, you must be prepared to see modifier flags set without a keydown event being emitted for the pressing of the corresponding modifier key and also keyup events being emitted without a corresponding keydown event indicating the key was in fact pressed. So I definitely don't agree to the saying that it must be well- documented and well-thought-out just because it's coming from the Web...

...

Low latency is crucial for inputs, especially for devices like gamepads, touchpads, and mice. Even with keyboards, low latency can be important, for instance, when playing a game.

I'm not saying there aren't any cases where low-latency is important, where I disagree is that this is the case in all occasions. If you don't have low latency feedback from the remote device, low latency for input is very likely not crucial. Anyway, I remain not convinced that XSF is the place to specify a remote control protocol from scratch (which is what sections 8 and 9 of the XEP are about). Mostly because I feel the XSF does not have the competence for doing so (aka. we will probably do things terribly wrong, due to lack of experience in the field). That doesn't mean we don't need /something/ in XMPP to do the signaling for whatever is used to send remote control events. And using Jingle for this (be it using webrtc datachannels or any other Jingle transport) totally makes sense for low latency. -- There is a bunch of things I would suggest that are not related to this at all. Instead of `<device type="keyboard"/>` I would go with `<keyboard />`, allowing for attributes to be added for more information where there is fit (e.g. for a mouse have an optional buttons attribute with the number of buttons that are on the mouse, or for a gamepad, you might want to provide the layout, etc). This also means that to extend new devices outside this specification, one can just have a `<gamepad xmlns="urn:xmpp:remote-control:gamepad:0" />` or similar. As a general guideline, I feel attributes should only be used if the set of possible is finite. I would strongly opt to not make the use of datachannels a SHOULD in this protocol. It really doesn't matter for the purpose of this protocol and you don't want to need to upgrade this protocol if a new transport protocol becomes available that would be a better fit. Jingle does the abstraction to streaming vs datagram, so that application protocols don't need to deal with it. There is a lot of specification for interaction with the Jingle RTP and WebRTC protocols. This seems mostly unnecessary. - You already write in the requirements that everything should work even without Jingle RTP - You put that one MUST use the same "WebRTC session" (what is that even) for both Jingle RTP and Remote Control. I wouldn't know why this is. Of course using existing sessions in Jingle often makes sense (that's why it's a feature), but it definitely doesn't need a MUST here. - You write explicitly that Remote Control can be added with content- add to existing Jingle RTP sessions. This is already given by the Jingle specification, which doesn't limit what content can be added to a session (e.g. you can also add a file transfer to an existing call). - You say that touch devices should not be used when no video RTP session is active. I don't see why this shouldn't be possible. I do own a drawing tablet that doesn't have a screen but still is an absolute pointing device (aka "touch"). If that device was connected via XMPP, it wouldn't need a RTP session to transfer its input. - You say that absolute mouse events should not be used when no video RTP session is active. I also don't see why this restriction is in place - same as above. For both touch and mouse you use x and y coordinates "relative to the video stream". What does that mean? x and y are doubles, so are they supposed to be relative to the screen, so only values between 0 and 1 (inclusive) are valid? If x and y are absolute values in pixels, why are they doubles? If they are a pixel value, is it the pixels of the screen or the pixels of the video (as the video might use a lower resolution than the actual screen resolution)? I would suggest to go with relative values 0-1. If you want to use an absolute value in pixels, I suggest to make it screen pixels and somehow signal the screen pixels outside and independent of the RTP video resolution. Wheel events don't have a screen coordinate. I'm pretty sure they should have those, as the cursor position for the movement does matter a lot. If I understood correctly, you specify that a session is a screen share session by adding a remote control content without any device. This remote control content would thus effectively not be used, but still require setup of a data channel. This doesn't seem like a good protocol. The fact that a video is a screen share should be communicated outside this specification and this specification should not be involved at all in such a case (as it's not a remote control). A remote control without devices should be invalid. Marvin [1] https://gist.github.com/mar-v-in/003bedfcafb9e49a6ba6083ae374088b [2] https://github.com/novnc/noVNC/wiki/Projects-and-companies-using-noVNC

Goffi

21 May 21 May

10:47 a.m.

Hi Marvin, Le lundi 20 mai 2024, 22:48:42 UTC+2 Marvin W a écrit :

...

Hi Goffi, See inline comments. Sorry for the wall of text and if it overlaps with one of the mails you wrote since I started writing this. On Mon, 2024-05-20 at 16:51 +0200, Goffi wrote:

There are many benefits to using CBOR:

[SNIP] The cumulative amount is about 10-20%% [1]. This isn't really a huge improvement and almost all events will fit into a single network layer frame anyway, further reducing the impact of encoding size. [SNIP] Segmentation is also inherent to SCTP, the protocol webrtc data channels use to transfer content frames. There is no win in segmenting the same segments twice.

Note that while recommended, WebRTC Data Channel is not mandatory, and any streaming transport may be used. Your arguments are only valid for WebRTC Data Channels.

...

- Encoding and decoding CBOR are much more efficient, essential for quick and efficient data processing, especially by low-resource devices (like Arduinos).

Working with Web is a goal, but it should work of course outside web too (I currently have a web implementation for controlling device, and CLI ones for basic controlling device, and for controlled device). CBOR is not native, but there are many implementations available. Anyway, I'm not hard set on CBOR. If the consensus is to get rid of it, we can get rid of it. Regarding the choice of web, it's only because sending event, specially with keyboard, is hard to do well. There are many different way to encode depending on platforms, and various kind of keyboards with special characters. Web API is simple, documented, and abstract this complexity. The web has been around for 35 years, they have already gone through the rough patches. But again, I'm not against switching if there is something as simple and complete.

...

[SNIP] XEP-0247 Jingle XML streams doesn't need to go via the server, it uses Jingle just like your proposed protocol.

I know that, I've just ruled out using <message> through the server as it has been proposed in another feedback.

...

While the XEP isn't maintained for some time and makes weird references to other XEPs, nothing in it forbids using it with webrtc data channels. In fact this has been discussed as a useful tool for all kinds of things recently (like initial device crypto setup or device-to-device MAM).

In general I love the idea of XEP-0247 for many use cases. I just feel that XML is not adapted in this particular use case.

...

And of course latency when sending via a server might be sub-perfect, but it's a very similar latency you would see if the network environment requires to use a TURN server, which is one of the ways to use Jingle.

TURN relay is a worst case scenario. And even then, it's more efficient because you don't have to wait for server queue handling, and <message> processing.

...

And as mentioned, there are valid use cases for having input in cases where low latency is not that crucial. Think of keyboard input to a remote shell - essentially what SSH does - which is not uncommon to be routed through proxies/tunnels that add latency. Of course for game input, drawing and 3d modeling, that's probably not an option. It depends a lot on the usecase and that's why flexibility is very much a good idea. Building something that is exclusively/primarily designed around having a web browser XMPP client connected via Jingle webrtc datachannels doesn't sound like flexibility was part of the design.

It is not designed around having a web browser at all! It's not because it's inspired by web API that it's the case, otherwise every HTTP upload is designed for web browser too. Fact is there has been and still is a enormous amount of engineering into web techs, and many good things have emerged from there, like WebRTC, WebSockets, WebAssembly, etc. And again, I have a non web implementation already (and a web one). Sure with ssh latency is less a problem (while still annoying), but the current mechanism works in all cases, is simple, and efficient. While adding complexity with another mechanism because "there are valid use cases for having input in cases where low latency is not that crucial".

...

To have a successful specification, there is a balance to find between efficiency, ease of implementation and flexibility. I believe that it's the case with string map, and selectively mapping data.

...

[SNIP] As I mentioned in another email: If you really feel like using RTP for screen content transfer, you can always decide to only use the RFB protocol (or something else) for the input part. I took it as an example for an existing protocol that (among other features) has logic for remote control input.

Again I'm not hard set on chosen technologies. I'm not familiar with the internals of RFB, and will look at it. If it's a good fit, I'm not against replacing the current events wire format with it. From a quick glance at the Wikipedia page, I see "In terms of transferring clipboard data, "there is currently no way to transfer text outside the Latin-1 character set".[5] A common pseudo-encoding extension solves the problem by using UTF-8 in an extended format.[2]: § 7.7.27 ", which makes me suspicious though. One of the design goal of my proposal is to have something really simple and straightforward to implement.

...

Using RFB for screen transfer may be an adjacent topic, but not a requirement.

The discussed specification focuses on remote controlling a device, rather than screen/audio transfer. It explains how to use it in conjunction with the current specification for A/V calls for remote desktop, but designing the desktop transfer protocol is out of scope. Another XEP may be specified if XEP-0167 proves not to be sufficient for desktop transfer, and this proposal will be usable with it without issue. Such a XEP could utilize RFB, SPICE, or whatever.

...

[SNIP] I just played with the https://w3c.github.io/uievents/tools/key-event-viewer.html and it's still unclear to me when pressing modifier keys, which events are emitted when and what is the supposed state of the modifier flag for those events. I figured that the behavior is inconsistent between browsers (and probably operating systems) and also between different keys in the same browser. I bet this is not intended, but as the specification and MDN don't really tell me what the correct behavior would be, I can't really blame the browsers either.

There is no modifier flag used in the specification. There is the key value, and the location number. From my tests, it's consistent and corresponds to the documentation for the browsers that I've tried (Firefox and Chromium).

...

I have the feeling that you only see this specification with the remote desktop use case point of view. There are other use cases, and one another major one is to use a device as input for another one in the same physical location: use of a smartphone as ad-hoc touch pad or gamepad for instance. And if low latency is easily achieved, I still don't see the point to have other mechanism because in some niche case low latency is not that annoying (but still is, it's always annoying).

...

Anyway, I remain not convinced that XSF is the place to specify a remote control protocol from scratch (which is what sections 8 and 9 of the XEP are about). Mostly because I feel the XSF does not have the competence for doing so (aka. we will probably do things terribly wrong, due to lack of experience in the field).

Again, it is not from scratch. It's re-using existing protocols, in a simple, working, easy-to-implement, and efficient way. Thank you for your feedback, and for the rest of your message, I'll take it into account for next revision if if the protoXEP is accepted.

...

Instead of `<device type="keyboard"/>` I would go with `<keyboard />`, allowing for attributes to be added for more information where there is fit (e.g. for a mouse have an optional buttons attribute with the number of buttons that are on the mouse, or for a gamepad, you might want to provide the layout, etc). This also means that to extend new devices outside this specification, one can just have a `<gamepad xmlns="urn:xmpp:remote-control:gamepad:0" />` or similar. As a general guideline, I feel attributes should only be used if the set of possible is finite.

The specification says that other child elements can be used in <device> for parameters. But you proposition may be cleaner, I'll consider it for a next revision if the protoXEP is accepted. Thanks!

...

I would strongly opt to not make the use of datachannels a SHOULD in this protocol. It really doesn't matter for the purpose of this protocol and you don't want to need to upgrade this protocol if a new transport protocol becomes available that would be a better fit. Jingle does the abstraction to streaming vs datagram, so that application protocols don't need to deal with it.

The goal here is to be sure that it will work with web clients, as data channels are currently the only way to have direct connection with browsers. I can reformulate to only suggest it and get rid of the SHOULD.

...

There is a lot of specification for interaction with the Jingle RTP and WebRTC protocols. This seems mostly unnecessary. - You already write in the requirements that everything should work even without Jingle RTP - You put that one MUST use the same "WebRTC session" (what is that even) for both Jingle RTP and Remote Control. I wouldn't know why this is. Of course using existing sessions in Jingle often makes sense (that's why it's a feature), but it definitely doesn't need a MUST here.

WebRTC has sessions pretty much like Jingle; its ID is what you have in the o= line of your SDP. The goal here is to reuse the connection, and to know which streams are used for what. However, this is not ideal, I agree. I have a plan to get rid of this section and work on a separate specification to add metadata to distinguish which streams are used for what.

...

- You write explicitly that Remote Control can be added with content- add to existing Jingle RTP sessions. This is already given by the Jingle specification, which doesn't limit what content can be added to a session (e.g. you can also add a file transfer to an existing call).

...

- You say that touch devices should not be used when no video RTP session is active. I don't see why this shouldn't be possible. I do own a drawing tablet that doesn't have a screen but still is an absolute pointing device (aka "touch"). If that device was connected via XMPP, it wouldn't need a RTP session to transfer its input.

The issue is that video feed is used in this case to get the screen dimension. Without it, we can't get touch event which use absolute position (while for mouse, there is a relative position mode for exactly this use case). An alternative would be to specify screen dimension when establishing the remote control session.

...

- You say that absolute mouse events should not be used when no video RTP session is active. I also don't see why this restriction is in place - same as above. For both touch and mouse you use x and y coordinates "relative to the video stream". What does that mean? x and y are doubles, so are they supposed to be relative to the screen, so only values between 0 and 1 (inclusive) are valid?

No, its value is in pixels, the same as for the Web API. Its double because pixels can be subdivided (High-DPI displays, transformations). I realize that, besides the link to MDN, this is not explicitly stated; I'll add a notice in future revisions. The Web API was initially using int, and then moved to double. That's the kind of reason why I'm using a mapping for the Web API: they went that way, and the types are carefully chosen.

...

[SNIP] Wheel events don't have a screen coordinate. I'm pretty sure they should have those, as the cursor position for the movement does matter a lot.

Cursor position is handled by other devices (mouse or touch). Wheel by itself doesn't has any position (it can be an independent device not linked to a mouse).

...

If I understood correctly, you specify that a session is a screen share session by adding a remote control content without any device. This remote control content would thus effectively not be used, but still require setup of a data channel. This doesn't seem like a good protocol. The fact that a video is a screen share should be communicated outside this specification and this specification should not be involved at all in such a case (as it's not a remote control). A remote control without devices should be invalid.

It was just to handle the case where no device is accepted, there was 2 options: - reject it totally - say it's a simple screen share session. I've chosen the later one. But indeed, data channel is then useless. Can change it for the other option. Thanks for the time you took to review the spec and write this feedback. As a summary: - I'm not hard set on technologies, and I'm OK to get rid of CBOR is there is consensus on it. I personally still think that it's a superior solution. - regarding using RFB for input events only, I'll have a deeper look at the spec and evaluate it. It may be an option it is comparable in ease of implementation, efficiency and flexibility to the current proposal. - I will take other feedback into account for a future revision. Thanks! Best, Goffi

Marvin W

2:39 p.m.

Hi Goffi, On Tue, 2024-05-21 at 12:47 +0200, Goffi wrote:

...

I know that, I've just ruled out using <message> through the server as it has been proposed in another feedback.

Why do you rule that out? Because you don't see a purpose, when my whole point is that I do see a purpose? Of course I can send whatever CBOR/JSON you come up with as a base64 blob inside a <message> for my usecase, but then I wonder why not to handle it in first place.

...

From a quick glance at the Wikipedia page, I see "In terms of transferring clipboard data, "there is currently no way to transfer text outside the Latin-1 character set".[5] A common pseudo-encoding extension solves the problem by using UTF-8 in an extended format.[2]: § 7.7.27 ", which makes me suspicious though.

RFB definitely is old, so these kind of things are expected. And, while I see that you added clipboard as a potential future extension, it seems odd to complain that RFB has a suboptimal implementation of a feature your proposed XEP currently doesn't have at all.

...

One of the design goal of my proposal is to have something really simple and straightforward to implement.

RFB isn't really hard to implement either. And ther are a ton of implementations out there already.

...

I know that your specification doesn't transfer the modifier flags, probably assuming they are superfluous. However, if your browser client was to naively send the key events it receives as is without further checking for plausibility, things will go wrong: I tested pressing the keys that would logically result in the events meta down, control down, control up, meta up and here are the results on different browsers: https://imgur.com/a/zVxDAVa From what I understand, the state of keyup and keydown events in the web API doesn't need to be consistent (e.g. there can be keydown without keyup and vice-versa). Do we want the same behavior for this protocol or something else?

...

I think you misunderstood my point. Using a smartphone as a touch pad or gamepad while playing a game on a screen next to you, is low latency feedback (you can see the screen with low latency). Example for where you don't need low latency would be when blindly typing into a remote shell, because you won't get feedback there (except after confirming a command which is probably not low latency).

...

Again, it is not from scratch. It's re-using existing protocols, in a simple, working, easy-to-implement, and efficient way.

I was talking about the remote control protocol, which is what runs on the topmost layer (inside the webrtc datachannel or whatever other Jingle transport is used). This protocol is mostly from scratch (it's loosely based on web API events, but then only taking an arbitrarily picked subset of events and event properties)

...

Which isn't an issue if web clients are not relevant for my usecase. And honestly, any kind of pointing to "you should support web clients" sounds weird to me. It certainly is interesting that we can support web clients, but really shouldn't siphon into unrelated specifications (and this one totally is unrelated to web).

...

WebRTC has sessions pretty much like Jingle; its ID is what you have in the o= line of your SDP.

My point is: Either it's a Jingle session or it's not part of XMPP. Jingle doesn't use WebRTC. It just happens that WebRTC APIs are somewhat compatible to Jingle (because they are based on Jingle), but from XMPP perspective, you never have WebRTC sessions. I don't know exactly what it means to be in the same WebRTC session, but whatever you want here, make it more explicit, because people that don't use WebRTC APIs should not be required to first read the WebRTC specs (or probably implementations source code) to figure out what you mean by that.

...

That's a problematic design. As I said, clients might scale the video to reduce bandwidth use. Dino also has logic to adjust the video resolution of cameras depending on available bandwidth. And as I understood for mouse, it's not relative to the screen, but relative to the previous position, aka a movement vector, like reported from touchpads. An screen relative position that is 0,0 is upper left corner, 0.5,0.5 is center of the screen and 1,1 is lower right corner, would work independent of the target screen resolution.

...

An alternative would be to specify screen dimension when establishing the remote control session.

Might work, but then you also need to cover the case where the screen resolution changes during remote control.

...

The Web API uses double because they did weird things for HiDPI. On the hardware layer, there are only pixels and if you click on a point on the screen, it will always be on a pixel (at least in all OS that I am aware of). The transformation of HiDPI in browsers abstract away from actual pixels and 1px might be more or less than a physical pixel. But why would you want to carry this abstraction through the network to a system that shouldn't care about what browsers can do and what they think a pixel is?

...

We also don't allow Jingle file transfers of no file or RTP contents without any codecs. As this protocol is for remote control, it should remain entirely unused for screen share only.

...

- I'm not hard set on technologies, and I'm OK to get rid of CBOR is there is consensus on it. I personally still think that it's a superior solution.

To me the use of CBOR here feels not well motivated, except for obscure "better performance" reasons before having done any measurement to back that claim. From XMPP perspective, something in a Jingle XML stream would be more canonical (because it reuses the stack we already have in every XMPP client anyway) and anything diverting from that IMO should be well reasoned. If you're reasoning that CBOR provides significant performance gain over XML, then why is it not a priority to figure out how we use CBOR instead of XML everywhere in XMPP (e.g. by creating some XML<>CBOR translation and using that as an optional stream feature).

...

- regarding using RFB for input events only, I'll have a deeper look at the spec and evaluate it. It may be an option it is comparable in ease of implementation, efficiency and flexibility to the current proposal.

I want to repeat that I haven't verified that RFB is particularly good fit for the purpose, I just know it's very popular. Best, Marvin

Goffi

4:04 p.m.

Le mardi 21 mai 2024, 16:39:28 UTC+2 Marvin W a écrit :

...

Hi Goffi, On Tue, 2024-05-21 at 12:47 +0200, Goffi wrote:

I know that, I've just ruled out using <message> through the server as it has been proposed in another feedback.

I missed that you had a particular use case in mind for using <message> via the server. Could you please tell me more about this use case and why it isn't covered by Jingle transmission? Specifically, what is the advantage of sending <message> through the server instead of handling it directly? I'd like to understand your perspective better.

...

[SNIP] RFB definitely is old, so these kind of things are expected. And, while I see that you added clipboard as a potential future extension, it seems odd to complain that RFB has a suboptimal implementation of a feature your proposed XEP currently doesn't have at all.

That's just something that jump on eye after a quick check, if UTF-8 is not natively supported, I can see problems coming. But I'll check in more details anyway, as I've said before, I'm not against a protocol change if it makes sense. Regarding that it seems odd to you because clipboard sharing is not yet specified, that's simply anticipation.

...

[SNIP] I know that your specification doesn't transfer the modifier flags, probably assuming they are superfluous. However, if your browser client was to naively send the key events it receives as is without further checking for plausibility, things will go wrong: I tested pressing the keys that would logically result in the events meta down, control down, control up, meta up and here are the results on different browsers: https://imgur.com/a/zVxDAVa

That's the job of the controlling client to assure consistency. That may be specified in business rules though.

...

From what I understand, the state of keyup and keydown events in the web API doesn't need to be consistent (e.g. there can be keydown without keyup and vice-versa). Do we want the same behavior for this protocol or something else?

The wire format comes from the web API, but we are not developing browsers, we are developing XMPP clients.

...

And we come back to the point where I don't see the need to another way of sending input, when there is already a low latency one. I'm not saying that you are wrong, I'm saying that I don't see why we should have another mechanism, when there is already low latency way to send input data to any device. So please, provide me one or more use cases where the current specification is not valid and would not work, or work sub-optimally. At risk of repeating myself: I'm not closed-minded about changing protocols or designs, and I appreciate feedback from people with other experiences, but please provide clear examples of use cases where the current design is incorrect.

...

[SNIP]

Again, it is not from scratch. It's re-using existing protocols, in a simple, working, easy-to-implement, and efficient way.

It's not arbitrarily at all, it's discarding data which don't make sense in this context, and it has been done while doing an implementation with Freedesktop remote control portal.

...

I've already said that I'll reformulate to only make is a suggestion, without the "SHOULD".

...

[SNIP] My point is: Either it's a Jingle session or it's not part of XMPP. Jingle doesn't use WebRTC. It just happens that WebRTC APIs are somewhat compatible to Jingle (because they are based on Jingle), but from XMPP perspective, you never have WebRTC sessions. I don't know exactly what it means to be in the same WebRTC session, but whatever you want here, make it more explicit, because people that don't use WebRTC APIs should not be required to first read the WebRTC specs (or probably implementations source code) to figure out what you mean by that.

Right, I'll review this section.

...

That's a problematic design. As I said, clients might scale the video to reduce bandwidth use. Dino also has logic to adjust the video resolution of cameras depending on available bandwidth.

I was thinking about sending the screen size at the beginning, but the issue is when size change (e.g. remote application control when application is resized). Issue with [0,1] coordinate is that you go into prevision loss or rounding error troubles. I think that ideally the screen size should be send separately and updated.

...

And as I understood for mouse, it's not relative to the screen, but relative to the previous position, aka a movement vector, like reported from touchpads. An screen relative position that is 0,0 is upper left corner, 0.5,0.5 is center of the screen and 1,1 is lower right corner, would work independent of the target screen resolution.

It would not work in the case of a FPS when you have already reached the right corner of your screen and you need to go right again.

...

An alternative would be to specify screen dimension when establishing the remote control session.

Might work, but then you also need to cover the case where the screen resolution changes during remote control.

Yes, that with update on screen change is probably the best option.

...

[SNIP] The Web API uses double because they did weird things for HiDPI. On the hardware layer, there are only pixels and if you click on a point on the screen, it will always be on a pixel (at least in all OS that I am aware of). The transformation of HiDPI in browsers abstract away from actual pixels and 1px might be more or less than a physical pixel. But why would you want to carry this abstraction through the network to a system that shouldn't care about what browsers can do and what they think a pixel is?

I have no strong argument against this to be honest. I'm fine with int too.

...

We also don't allow Jingle file transfers of no file or RTP contents without any codecs. As this protocol is for remote control, it should remain entirely unused for screen share only.

Sure, I'll change that.

...

- I'm not hard set on technologies, and I'm OK to get rid of CBOR is there is consensus on it. I personally still think that it's a superior solution.

While, if I had time and resources, yes I definitely think that CBOR or similar would be a good serialization protocol. Bet let's not go down this rabbit hole ;)

...

I want to repeat that I haven't verified that RFB is particularly good fit for the purpose, I just know it's very popular.

The idea is to check it. I want something flexible, easy to implement, and efficient. If RFB or whatever else checkes the boxes, why not.

...

Best, Marvin _______________________________________________ Standards mailing list -- standards(a)xmpp.org To unsubscribe send an email to standards-leave(a)xmpp.org

Marvin W

22 May 22 May

2:46 p.m.

Hi Goffi, On Tue, 2024-05-21 at 18:04 +0200, Goffi wrote:

...

Could you please tell me more about this use case and why it isn't covered by Jingle transmission? Specifically, what is the advantage of sending <message> through the server instead of handling it directly? I'd like to understand your perspective better.

The use case I'm thinking of has low throughput and only short usage time. I might be sending 10 or 20 key events within a short time and then nothing for several hours. Technically, this can all be done with Jingle, but for just a few keys, the overhead of starting a Jingle session just for those keys probably adds way more latency than sending those keys via <message> through an XMPP server. And using Jingle would require way more complex software on both sides. Without a proper specification to send keys, I would do this via non standardized body messages. Works, but isn't particularly nice. I also noticed that in cases where XEP-0174 Serverless Messaging is used, an additional Jingle connection probably doesn't add a lot of benefit either.

...

That's the job of the controlling client to assure consistency. That may be specified in business rules though. The wire format comes from the web API, but we are not developing browsers, we are developing XMPP clients.

Well, XMPP clients that also speak a ton of other protocols, including the one you are just creating. My point is that not only does this need to be specified in the business rules, but also a ton of other things. There are probably a lot of side cases that you don't cover and where I can't reasonably expect Council to think about them.

...

There is no best way to do remote control (or precisely, input event sending). Low latency is a nice property to have and certainly doesn't hurt if it comes for free, but it doesn't come for free, it adds complexity, requires a multi-message setup phase and often the management of additional network sockets. And depending on the use case, it might be less important to have low latency than low complexity. The lowest complexity way to provide a streaming transport with Jingle is IBB and it still requires connection setup and is strictly worse than <message>'s for most purposes (except transporting of large binary blobs that exceed single stanza size limits). Both Jingle and especially WebRTC come with huge complexity. Your WebRTC library and your existing code for working with it might take away most of this from you, but that doesn't mean it's not there. By using Jingle and WebRTC you're effectively excluding clients, devices and platforms that can't easily run libwebrtc or any other popular WebRTC implementation.

...

It's not arbitrarily at all, it's discarding data which don't make sense in this context, and it has been done while doing an implementation with Freedesktop remote control portal.

I was already guessing it's not arbitrarily, but probably what made sense in your setup and for your usecase. However, not knowing any of that it *seems* arbitrary. The RemoteDesktop portal was clearly designed for remote desktop use cases, not other remote control cases. However, as you already mention that you designed the data sent around what is needed for the RemoteDesktop portal, why not send the information directly in a format that matches the design of RemoteDesktop portal, instead of a mix of Web API interfaces and RemoteDesktop portal? Also I noticed that the RemoteDesktop portal does not have a notion of an independent wheel, the mouse wheel is tied to the pointing device, why did you choose to not do it the same way? I also checked Wayland protocol and wheel axis events are tied to a wl_pointer as well - and as far as I know, apps do expect scroll wheel to be tied to a pointer so you know where to scroll.

...

Issue with [0,1] coordinate is that you go into prevision loss or rounding error troubles. I think that ideally the screen size should be send separately and updated.

The precision on a double (64 bit floating point) remain the same, no matter if you scale [0,1] or [0,<screen-width>]. The precision is about 15 decimal digits which should be more than enough (you barely see screen coordinates with more than 4 decimal digits), even if you do calculations on them (which may result in a few bits of precision loss).

...

It would not work in the case of a FPS when you have already reached the right corner of your screen and you need to go right again.

Assuming you refer the FPS games, those "lock" the cursor position to the screen center, so they never have that issue. To correctly reproduce this behavior you need a back channel to the controlling app so it can know the cursor position and/or lock if it is changed on the controlled device. (Above might not be correct on all platforms.) Also I did not intend to say that you shouldn't support movement vectors (like touchpads), I was just saying that absolute pointing could be relative to screen size, so that you don't need to know the absolute screen size.

...

While, if I had time and resources, yes I definitely think that CBOR or similar would be a good serialization protocol. Bet let's not go down this rabbit hole ;)

The advantage of going down this rabbit hole is: a) We improve XMPP for other usecases b) You can specify this protocol using XML and use Jingle XML streams. As the CBOR<>XML translation will take care of creating the CBOR for you, you still get the CBOR for this protocol, but without the need to make it explicit. And in cases where people prefer to not use CBOR, they can still use this protocol, just with XML. It's a win-win for everyone (except that you as the specification author have more work). If going forward, you still want to specify your own payload/application protocol (that is, the CBOR thing that is transferred with the Jingle streaming transport), I'd like to ask you: - To evaluate if a XEP is the right place to specify such a protocol, of if it is more a generic thing that could well be used outside XMPP and maybe should also be specified elsewhere. - If you consider a XEP to be the right place and want to stick with your CBOR protocol, I'd like to ask you to split it into two parts: 1. the payload protocol (sections 8 and 9 of the proposal) and 2. The Jingle signaling protocol (sections 5 to 7 of your proposal). This way the protocol can be used and referenced easily for use outside of Jingle context. If you feel it's possible to transition to a <message> based approach, this can of course be a single XEP (that will barely have anything to do with Jingle except for anecdotal mentioning that it can be used with Jingle XML stream or serverless messaging for lower latency). Best, Marvin

Goffi

27 May 27 May

8:54 a.m.

Le mercredi 22 mai 2024, 16:46:47 UTC+2 Marvin W a écrit :

...

Hi Goffi,

Hi Marvin, Seeing the proposition rejected is definitely disappointing, and I would like to have a clear statement of the reason why the Council thinks this work is "unacceptable". For now, the biggest criticism I've seen is that this protocol specification is… specifying a protocol (which again is required by XEP-0166 for Jingle applications). This seems quite arbitrary to me, and I would like to have a clear statement on why this specification is "unacceptable" to the Council. Specially when I've clearly stated several times that I'm open to changing the payload format and using an existing protocol if they prove easy to implement, flexible, and efficient. The experimental state is made for that. Thanks in advance.

...

Alright, I understand better now. That's true; because I have advanced features like Jingle and WebRTC implemented in my software, I'm willing to build on them, but they are not available everywhere. For ease of implementation, it could be indeed interesting to have an <message> based way, usable either via server or via Jingle. On the other hand, this introduces complexity by itself (now the payload can go through two different ways), and I'm not sure that, especially for a niche feature that most clients won't implement, that we should use an inferior solution because hypothetically, in a niche use case within the niche feature, a client may not have Jingle implemented. Jingle is a major stack of XMPP, and it should be implemented in any advanced client according to IM compliance suites. WebRTC is already optional; you can use any streaming transport.

...

Without a proper specification to send keys, I would do this via non standardized body messages. Works, but isn't particularly nice. I also noticed that in cases where XEP-0174 Serverless Messaging is used, an additional Jingle connection probably doesn't add a lot of benefit either.

That's another niche in the niche. That's really highly hypothetical, and even if this could be done directly, it doesn't hurt to add an additional Jingle connection. But that said, I'm not firmly opposed to moving to <message> based payload, if that can unblock the situation.

...

Of course, a proto-XEP is not meant to be perfect at first edition; that's exactly what the experimental status is for. And it's not the job of the Council to think about side cases - that's what standards@ and feedbacks from the whole community are for. Maybe I got it wrong, but for me, the job of the Council is to keep technical stuff on track by ensuring that advancements in XEP statuses are done in order (i.e., X independent implementations, Y feedbacks, etc. as stated in relevant XEPs), and vetoing things that are really unacceptable (e.g., copyright issues, something totally irrelevant, offensive content, etc.). And it's the role of the larger community on standard@ to work on technical stuff, side cases, ease of implementation, and optimization. I realize that there isn't a real definition of what should be an "acceptable" proto-XEP; maybe this should be specified? Because I've seen proto-XEPs refused by some Councils then accepted by others, and this seems quite arbitrary to me.

...

[SNIP] Both Jingle and especially WebRTC come with huge complexity. Your WebRTC library and your existing code for working with it might take away most of this from you, but that doesn't mean it's not there. By using Jingle and WebRTC you're effectively excluding clients, devices and platforms that can't easily run libwebrtc or any other popular WebRTC implementation.

Again WebRTC is not mandatory in my specification. Any streaming transport can be used, as designed by XEP-0166, including in-band via XEP-0261. So we're just talking about Jingle, and this can be implemented on any platform, which is required for advanced IM client according to current compliance suit.

...

I was already guessing it's not arbitrarily, but probably what made sense in your setup and for your usecase. However, not knowing any of that it *seems* arbitrary.

Use cases are already explained in the specification. For my current implementation, I have implemented a controlling device in a browser and a basic one in a CLI (currently sending only keyboard events for now). I have also implemented a controlled device in a CLI, which works with Wayland and desktop portal. The implementation should not be a problem on other platforms that I target in the long run (Windows, Mac, Android, iOS, BSD, etc.). Actually, it should not be a problem on any platform.

...

The RemoteDesktop portal was clearly designed for remote desktop use cases, not other remote control cases.

That's incorrect. Despite its name, you can actually only use the Remote Desktop portal to send input; the Screen Sharing part is entirely optional (and must be explicitly requested).

...

However, as you already mention that you designed the data sent around what is needed for the RemoteDesktop portal, why not send the information directly in a format that matches the design of RemoteDesktop portal, instead of a mix of Web API interfaces and RemoteDesktop portal?

The data matches, except for keyboard events that are represented using evdev codes on Linux, whereas I was looking for a more platform-independent solution. The Web API turned out to be the easiest option I've found, but I'm open to considering an alternative if needed.

...

Also I noticed that the RemoteDesktop portal does not have a notion of an independent wheel, the mouse wheel is tied to the pointing device, why did you choose to not do it the same way?

No, despite its method names (`NotifyPointerAxis` and `NotifyPointerAxisDiscrete`), the wheel device is independent of the pointer, actually no pointer coordinates are sent when sending wheel events. And that makes sense: it's not the pointer coordinate that's important, but rather where the focus is. You can change focus with a keyboard, for instance. As I've said in my previous message, the wheel device, while often associated with mice, can also be independent.

...

[SNIP] The precision on a double (64 bit floating point) remain the same, no matter if you scale [0,1] or [0,<screen-width>]. The precision is about 15 decimal digits which should be more than enough (you barely see screen coordinates with more than 4 decimal digits), even if you do calculations on them (which may result in a few bits of precision loss).

The issue is not about the number of digits, but the fact that some numbers cannot be represented by doubles. The first case I'm thinking of is 1/3, which can lead to a rounding error and having the wrong pixel selected at the end. Whether or not this is a problem depends on the use cases we want to handle, but using pixels directly avoids this issue. Anyway, using [0,1] is not a bad idea, as it avoids the need to transmit screen size and screen size updates. It can be a better solution indeed.

...

Indeed, it may be a better option. I can change that. I'll check how other protocols deal with this issue and may use one of them directly.

...

We have already EXI (XEP-0322) for that (I don't know how it compares to CBOR though). Again, I'm not against getting rid of CBOR if it is a show stopper for people.

...

If going forward, you still want to specify your own payload/application protocol (that is, the CBOR thing that is transferred with the Jingle streaming transport), I'd like to ask you: - To evaluate if a XEP is the right place to specify such a protocol, of if it is more a generic thing that could well be used outside XMPP and maybe should also be specified elsewhere.

I'll evaluate other specifications. But yes, a XEP is, in my opinion, definitely the right place to specify a protocol. The fact that part of it is a Jingle application doesn't change the fact that it's globally an XMPP Extension Protocol. XEP-0166 states that the application payload protocol must be specified. And even if we use XML extensively, XMPP is not about XML. We already use many non-XML data formats.

...

- If you consider a XEP to be the right place and want to stick with your CBOR protocol, I'd like to ask you to split it into two parts: 1. the payload protocol (sections 8 and 9 of the proposal) and 2. The Jingle signaling protocol (sections 5 to 7 of your proposal). This way the protocol can be used and referenced easily for use outside of Jingle context.

I'm willing to strike a balance between efficiency, ease of implementation, and flexibility. I don't care if it's CBOR or anything else. I've heard your argumentation, and will consider using <message>, or another existing protocol. It will take time, though; I'm busy with other things at the moment, and my current implementation is working well. If anybody is interested in implementing this specification anytime soon, please contact me - I can try to re-order my priorities.

...

If you feel it's possible to transition to a <message> based approach, this can of course be a single XEP (that will barely have anything to do with Jingle except for anecdotal mentioning that it can be used with Jingle XML stream or serverless messaging for lower latency).

Got it. I'll evaluate the various options we've discussed. Thank you for your time and detailed feedback - it's much appreciated.

...

Best, Marvin

Best, Goffi

Goffi

9:02 a.m.

Le lundi 27 mai 2024, 10:54:25 UTC+2 Goffi a écrit :

...

And that makes sense: it's not the pointer coordinate that's important, but rather where the focus is. You can change focus with a keyboard, for

instance. Correction here, after testing (I should have done it before posting :) ), it seems that I'm actually wrong, on my KDE instance it's not the focus but the pointer position which matters for the wheel. That doesn't invalidate my other points though.

Marvin W

10:29 a.m.

Hi Goffi, Before going on your specific points, I'd like to make clear that I'm all in favor of this happening and this vote was very uneasy to me, because I also don't want to discourage you. I still do think that this XEP is bad for many reasons, some of which I apparently fail to get across, so I'll try to make it more explicit and to the point in this email. On Mon, 2024-05-27 at 10:54 +0200, Goffi wrote:

...

For now, the biggest criticism I've seen is that this protocol specification is… specifying a protocol (which again is required by XEP-0166 for Jingle applications).

One major criticism is that it specifies TWO protocols. The first protocol (sections §5-7 of your proposal), which clearly belongs into the XSF, is how to use Jingle to signal a remote desktop session. As you rightly point out a Jingle application format (as per XEP-0166 §12.1) must DEFINE how the "media data" is sent. However, it doesn't need to SPECIFY the media data format. E.g. XEP-0167 defines you should use RTP packets as specified in RFC3550. The second protocol (sections §8-9) is the one that is the media data to send via the Jingle session. However that protocol is largely independent of Jingle or anything else and could be and IMO better would be specified entirely independent of that. When I said that you should consider to evaluate if a XEP is the right place to specify such a protocol, I was only referring to this part: As it stands right now, this second protocol could easily be used independent of XMPP and Jingle. I would thus see this protocol more as an RFC.

...

On the other hand, this introduces complexity by itself (now the payload can go through two different ways)

Under the assumption that all remote control signals are sent using <message> and it is up to the receiving client to decide to accept or error them, there is no additional complexity introduced, by adding the possibility of a different transport path. In fact, you wouldn't even need to specify which transport path to use, you specify the stanzas to be sent to do remote control (to maintain remote control session and for input events). The fact that one decides to send those via Jingle XML streams and others use serverless messaging doesn't need to be specified, it's exactly the same stanzas being sent in both cases.

...

Jingle is a major stack of XMPP, and it should be implemented in any advanced client according to IM compliance suites.

Except that many clients are not meant to be advanced IM clients. I wouldn't expect a remote desktop application to strictly also be an advanced IM client. Looking at the feature set we require from advanced IM clients, I'm pretty sure most existing remote desktop apps do not qualify as advanced IM client (probably not even core im client, because group chats are definitely not common for remote desktop apps). Again, You seem to be coming from the position that a client implementing this is already a very feature rich and advanced client like yours, but this assumption comes with a huge amount of restrictions.

...

Maybe I got it wrong, but for me, the job of the Council is to keep technical stuff on track by ensuring that advancements in XEP statuses are done in order (i.e., X independent implementations, Y feedbacks, etc. as stated in relevant XEPs), and vetoing things that are really unacceptable (e.g., copyright issues, something totally irrelevant, offensive content, etc.). And it's the role of the larger community on standard@ to work on technical stuff, side cases, ease of implementation, and optimization.

This does not match my understanding of Council work. I see Council as clearly a technical position, not a mostly organization position. In fact, some of the tasks you listed are clearly on the Editor side and shouldn't even reach Council (e.g. copyright issues and offensive content).

...

I realize that there isn't a real definition of what should be an "acceptable" proto-XEP; maybe this should be specified? Because I've seen proto- XEPs refused by some Councils then accepted by others, and this seems quite arbitrary to me.

We do have a Council of multiple members, because Council work is not solely fact-based, but also involves individual opinions of its members. If approving a XEP for publication was exactly following a checklist of well-defined points, we wouldn't need the Council for it.

...

So we're just talking about Jingle, and this can be implemented on any platform

I do agree that Jingle can likely be implemented on any platform, but it might be that you can only do so using Jingle IBB (e.g. because your network controller can only maintain a single TCP connection), in which case using Jingle is really not an improvement.

...

That's incorrect. Despite its name, you can actually only use the Remote Desktop portal to send input; the Screen Sharing part is entirely optional (and must be explicitly requested).

I'm pretty sure you can't send absolute pointing events (via NotifyPointerMotionAbsolute like for a drawing tablet) or touch motion events (via NotifyTouchMotion) using the RemoteDesktop portal without also opening a screen cast, because the PipeWire stream node of the screen cast is a parameter for those APIs. So while some events can be sent without the Screen Sharing, it's not entirely optional.

...

As I've said in my previous message, the wheel device, while often associated with mice, can also be independent.

The RemoteDesktop portal clearly ties it to the pointer device, it's not only that the name is NotifyPointerAxis but you also must request a POINTER device in the session (i.e. only requesting a KEYBOARD and TOUCHSCREEN device does not allow you to use the API for scrolling).

...

We have already EXI (XEP-0322) for that (I don't know how it compares to CBOR though).

I know we have EXI and I know basically nobody implements it. It's very complex, has a lot of features and for best results requires to agree on a common set of XML schemas. Having something else than EXI that is easier to implement really might be a good idea, because I doubt EXI is going to ever be successful. -- So as a summary, for me the deal breaker really is the two protocols in one XEP, one of which is not really an XMPP protocol. If you do like the two protocols that you built - after all, you have implementation experience that I entirely lack, so it may be that all my concerns are invalid and I'm happy to have that in Experimental - I just feel that the second one should best not be at the XSF and if there's really no better place, make it at least a separate XEP. Marvin PS: I also firmly believe that splitting those two protocols will make the protocol design better, because you can't (or at least are less likely to) have weird interaction between unrelated protocols anymore.

Goffi

29 May 29 May

10:06 a.m.

Hi Marvin, Le lundi 27 mai 2024, 12:29:36 UTC+2 Marvin W a écrit :

...

[SNIP] The first protocol (sections §5-7 of your proposal), which clearly belongs into the XSF, is how to use Jingle to signal a remote desktop session. As you rightly point out a Jingle application format (as per XEP-0166 §12.1) must DEFINE how the "media data" is sent. However, it doesn't need to SPECIFY the media data format. E.g. XEP-0167 defines you should use RTP packets as specified in RFC3550.

Specifying an external protocol is basically specifying the payload format, and specifying it inside the XEP is the same thing. In some case it's better to use external ones, in other cases it's relevant or even better to specify directly the payload. That said, I'm not saying that my proposal is the best option. I'll evaluate other protocols to see if there is a good match.

...

The second protocol (sections §8-9) is the one that is the media data to send via the Jingle session. However that protocol is largely independent of Jingle or anything else and could be and IMO better would be specified entirely independent of that. When I said that you should consider to evaluate if a XEP is the right place to specify such a protocol, I was only referring to this part: As it stands right now, this second protocol could easily be used independent of XMPP and Jingle. I would thus see this protocol more as an RFC.

It's not the same hurdle to create another specification: it requires additional time and energy (and writing this one already took a lot). For a simple and small protocol like the one I've proposed, I really think it's not worth it (given my limited available time, as I'm working extensively on many things). The necessary work is definitely higher for a RFC, and I'm not even sure where to start. For a bigger and more complex protocol, it would definitely make sense yes, but I don't think that it's the case here.

...

We lose all the interest of having a jingle application here. Jingle XML Streams is a Jingle Application, so if I use that, Remote Control can't be one anymore. We then don't have the possibility anymore to add remote control during a call session, or to indicate the intent. We first need to establish the XML stream, then find another protocol to advertise the remote control session.

...

Jingle is a major stack of XMPP, and it should be implemented in any advanced client according to IM compliance suites.

If we talk about remote desktop, we need Jingle for the video stream anyway. I've taken the IM example because it's the more common, but it works with A/V Calling compliance too.

...

Again, You seem to be coming from the position that a client implementing this is already a very feature rich and advanced client like yours, but this assumption comes with a huge amount of restrictions.

I think that Jingle is a major feature that is reasonable to ask as a prerequisite for remote control, it's the only "advanced" feature necessary. Many XEP do that already, or are assuming, e.g., that XEP-0045 is implemented, should we get rid of those because an "advanced" feature is required?

...

This topic worth discussion, and has been spanned on another thread, so I won't continue here.

...

So we're just talking about Jingle, and this can be implemented on any platform

If you are working on restricted devices, you can have a "host" device and only establish the stream connection with the controlled device. But anyway, one hand people say that my proposal is too flexible, and on the other hand we say that we should handle any niche case under the sun. In the vast majority of case, a streaming Jingle connection should be relatively easy to establish.

...

That's incorrect. Despite its name, you can actually only use the Remote Desktop portal to send input; the Screen Sharing part is entirely optional (and must be explicitly requested).

I'm pretty sure you can't send absolute pointing events (via NotifyPointerMotionAbsolute like for a drawing tablet) or touch motion events (via NotifyTouchMotion) using the RemoteDesktop portal without also opening a screen cast, because the PipeWire stream node of the screen cast is a parameter for those APIs. So while some events can be sent wihout the Screen Sharing, it's not entirely optional.

That's true for absolute pointing, that's why there are relative methods, and my specification says to use relative ones when there is no attached video stream. That doesn't change the fact that Remote Desktop portal is designed to work even if there is no ScreenCast.

...

As I've said in my previous message, the wheel device, while often associated with mice, can also be independent.

That's true for freedesktop API, but that's an implementation detail, nothing prevent to have dedicated permission. Both web API and desktop portal use separate events for the wheel, I've just followed that.

...

Sure that could be nice, but I definitely don't have time to work on this in the foreseeable future.

...

So as a summary, for me the deal breaker really is the two protocols in one XEP, one of which is not really an XMPP protocol. If you do like the two protocols that you built - after all, you have implementation experience that I entirely lack, so it may be that all my concerns are invalid and I'm happy to have that in Experimental - I just feel that the second one should best not be at the XSF and if there's really no better place, make it at least a separate XEP.

I have done implementation that's true, but I'm not an expert in remote desktop implementation either. I'm listening to feedback and comment, and will take them into account. I disagree with you and Singpolyma about the wire format description for the stream, and don't see the problem to have such a simple and small protocol described in the same specification. However, I'll explore alternative and see if there are better options. So to summarize: - I'll explore alternative, notably RFB (or any other suggestion if somebody has a good proposition). But only for remote control, I want by design to keep the desktop screen sharing separated, and under XEP-0167 (or any future specialised XEP if proven better). - If I can't find a good alternative, I'll evaluate the use of <message> instead of current CBOR based data, maybe in a separated XEP. - I'll evaluate the use of a protocol usable via server or Jingle XML Stream. But I'm currently really not convinced by that due to the reason exposed above and previously. However, I won't have time to work on that before months, I'm currently very busy with other things. If anybody is interested in implementing remote control meanwhile, please contact me.

...

Marvin

Thanks again for your time and extended explanation of your point of view. Even if I disagree on some parts, I'm listening and many points are sensible. Best, Goffi

Dave Cridland

27 May 27 May

1:07 p.m.

New subject: Council (and what it does, and what it should do)

Hey, I happened to notice this discussion buried in this thread, and thought it was important to surface it. On Mon, 27 May 2024 at 09:55, Goffi <goffi(a)goffi.org> wrote:

...

I see Council as doing a finger in the air, human intervention into the bits of the process we have left - sometimes deliberately - vague. If you like, Council fills the gaps that are simply too hard to codify. Marvin notes the Editor can, and should, catch many of the things Goffi lists - and that's true, but it is ultimately Council's responsibility to ensure these are caught. The Editor can't actually veto; Council therefore must. But we don't have a set of reasons for veto of proto-XEPs, in particular, and that's been a highly contentious issue, though thankfully also a very rare one. I've rejected protoXEPs as arbitrarily as anyone else when in Council, but loosely a few things crop up repeatedly: * Unwarranted duplication of effort: The problem being solved is already at least partially addressed by an existing solution, and it seems better to fix that than wholesale replace it. * Wrong venue: The problem is useful to solve, but needs to be solved elsewhere (usually the IETF), which has this kind of thing in scope, and has expertise to help. * Awful: The problem is being solved in a truly awful manner and I don't see how it can be fixed without nuking it from orbit and starting over - or - the problem is just entirely the wrong problem to solve and should be nuked from orbit and not started over. As I spent more time in Council, I tended not to veto protoXEPs for the latter. I'm not sure this was a good idea - it seemed fairer at the time, but then you end up with a bunch of people who think the awful idea was in fact good, and then you have to argue it out at Last Call which is much more fractious. The problem is that all of these reasons - and there are certainly others - are very much personal decisions, and people will entirely understandably disagree with individual decisions, and even the process (or lack of it) as a whole. There was a bit of a surge in the opinion that protoXEPs should be always accepted, a while back, and I've not been following how this has worked out in practice (and if it has). Equally, I've seen other proposals suggesting much higher bars for accepting a protoXEP, with in effect a pre-Experimental stage tacked on beforehand. I think this would be bad, too, and risks just accreting stages for no real benefit - but it's also essentially inevitable if the bar for accepting a protoXEP is raised too high. Anyway... I think it'd benefit the community to have a bit of a discussion on what they expect from a random Experimental XEP, and also the kinds of things that can achieve consensus as reasons for Council to veto. This might result in some process updates - or might just highlight that we don't want to codify, leaving it to personal opinion. But I think a discussion would certainly be beneficial. Although also noisy, sorry. Dave.

MSavoritias

3:10 p.m.

New subject: Council (and what it does, and what it should do)

On Mon, 27 May 2024 14:07:49 +0100 Dave Cridland <dave(a)cridland.net> wrote:

...

Hey, I happened to notice this discussion buried in this thread, and thought it was important to surface it. On Mon, 27 May 2024 at 09:55, Goffi <goffi(a)goffi.org> wrote:

This seems to also hint at a wider discussion on the XEP process that has been brought up here and there. One example recently with many xeps being in experimental while they are reccomended in the compliance suite and moving them all to stable which as far as I have seen basically means that changes are rare to happen and its encouraged to fork the xep instead. I also have heard that Experimental is supposed to meet some kind of bare minimmum but as we can see here or with XEPs like xhtml this is certainly not the case. I have heard of more cases but these are the ones coming to me now. My thoughts as I have said before and I think it was proposed somewhere would be: - Experimental XEPs shouldnt get a number. Like ietf does. This means no more "pollution" of the xep number space with random xeps that went nowhere. - Council should move to voting only to move to stable. Where xeps get a number and there is a certain bar to be passed for the XEP to be accepted. - We should be much more open to change Stable xeps in the sense of upgarding them to later versions instead of adding even more xep numbers to look for. case in point the horrible muc avatar situation right now. All this would mean that we can remove at least 4 states for XEPs to be in with a quick look. Which the states seem to be 11 now for some reason. MSavoritias

Florian Schmaus

2 Jun 2 Jun

6:30 p.m.

New subject: Council (and what it does, and what it should do)

Thanks Dave, I really appreciate you kicking off this discussion. On 27/05/2024 15.07, Dave Cridland wrote:

...

I've rejected protoXEPs as arbitrarily as anyone else when in Council, but loosely a few things crop up repeatedly: * Unwarranted duplication of effort: The problem being solved is already at least partially addressed by an existing solution, and it seems better to fix that than wholesale replace it.

I am very skeptical to have any kind of "duplication" criteria. First, in some cases there is no one-size-fits-all. Very possible that we we will not end up with the one-and-only XMPP IoT protocol extensions. But maybe multiple with different goals and tradeoffs. Secondly, the XSF should not encumber competition. At least not in early stages. And quite frankly, the XSF is also not in any position to do so. Just because a XEP is rejected, does not automatically mean that it will not get implemented. The developers and users of XMPP software also weight in and have an impact on the resulting ecosystem. Therefore, I suggest that the XSF embraces competition in the early stages and, in case of duplicated efforts, limits itself to advocating certain extensions.

...

Equally, I've seen other proposals suggesting much higher bars for accepting a protoXEP, with in effect a pre-Experimental stage tacked on beforehand. I think this would be bad, too, and risks just accreting stages for no real benefit - but it's also essentially inevitable if the bar for accepting a protoXEP is raised too high.

Such a pre-experimental stage already exists, whether we like it or not. People work on XMPP extensions, and if the bar is too high, they will just work on those extensions outside of the XSF [1]. And that is really a pity and something we should fix. What I'd like to see is that the XSF creates a place to cater for those ProtoXEPs (as how I will refer to pre-experimental XEPs in the following). Could be as simple as creating a directory protoxeps/ in xsf.git and ensuring that the contents of this directory rendered and available under xmpp.org/extensions/protoxeps. I hope that this will get us a long way towards fighting the fragmentation that we have [2]. We should make it crystal clear to readers of those ProtoXEPs that they did not undergo any expert review yet. As consequence, this means that the authors of those ProtoXEPs must be aware that it is not impossible that their specification may need a major overhaul before it can enter the 'experimental' stage [3]. In exchange, ProtoXEP may break their wire protocol without a namespace bump (which must also clearly documented). Consequently, this means that Council should focus on the technical side when presented with a ProtoXEP for adoption. With a particular focus on how idiomatic the XEP, when it comes to XMPP. For example, is there an attribute when it should be an element? And once a XEP has been honored with an number. Strict namespace versioning rules should apply. Interoperability is a valuable asset in XMPP land, that we must protect. Last but not least and not directly related to strict namespace versioning rules, but related to namespacing: I started to wonder if it were not better if we requiring a new XEP number in case of a namespace bump. This would help when referencing XEPs. For example, if I would task some random stranger to implement xep384, I would probably not end up with the implementation that I wanted. - Flow 1: This happened plenty of times, for example the various MUC alternatives (MUCLight, MucSub, etc.) 2. XMPP extensions residing on personal homepages and/or personal gits 3: A recent example for this was Guus' "PubSub Server Information" (xep485): At first reluctant, since there was already an implementation, Guus could be convinced that the wire protocol and XML design should be improved for the greater benefit.

Marvin W

7:33 p.m.

New subject: Council (and what it does, and what it should do)

Hi, On Sun, 2024-06-02 at 20:30 +0200, Florian Schmaus wrote:

...

Therefore, I suggest that the XSF embraces competition in the early stages and, in case of duplicated efforts, limits itself to advocating certain extensions.

I generally agree. However there has been cases, where a ProtoXEP is submitted without prior interaction was the community and then within the first feedback it is found that there is overlap in functionality with another XEP and that there is a good possibility to make them cooperate instead of compete. Often even the author agrees and is willing to do the adjustment, but changes to ProtoXEPs during the Council review phase are not encouraged (because they reset the voting). Competition might be purely accidental and we should filter those cases *before* they go Experimental (because Experimental is what people implement). I think we might want to have a "scribble" space, where people just dump ideas that certainly don't qualify as Experimental (e.g. because it's just examples). This should be super low barrier (and going through the git certainly is not) and allow for easy sharing of such ideas for very early feedback. In a perfect world, that would be an online tool where you can create and edit your own scribbles (using probably markdown) and make "edit suggestions" to those of others (that they can review and decide to merge or reject). Not sure if something like this exists, I am just daydreaming :) I think Experimental became a high bar not because of the Council, but because the processes don't match what people like to use during their early development and prototyping phase. However this is the phase where feedback is easiest incorporated. Marvin

Goffi

3 Jun 3 Jun

8:34 a.m.

New subject: Council (and what it does, and what it should do)

Le dimanche 2 juin 2024, 21:33:55 UTC+2 Marvin W a écrit :

...

I think Experimental became a high bar not because of the Council, but because the processes don't match what people like to use during their early development and prototyping phase. However this is the phase where feedback is easiest incorporated.

I'm not arguing in one direction or the other, just providing feedback based on my experience: for a couple of years, I've had to implement specifications that were entirely off the XSF circuit, but were really useful or de facto standards (e.g.: early OMEMO). This is terrible because you have to know about them or search for them, the website hosting them can be down at any time, and there is no standard workflow. This is currently the case for at least the following: - A way to encrypt XEP-0320 fingerprint for old OMEMO (https://gist.github.com/ iNPUTmice/aa4fc0aeea6ce5fb0e0fe04baca842cd), which is important for achieving real end-to-end encryption with clients that only support legacy OMEMO. - A new version of DataChannel signaling (XEP-0343), which I'm only aware of because I've discussed it on xsf@ (https://gist.github.com/iNPUTmice/ 6c56f3e948cca517c5fb129016d99e74). This is not even a specification, it's an XML dump. - XEP-0447: Stateless file sharing, where great improvements have been discussed for multi-files support and backward compatibility, but they are just discussions on xsf@ history so far. - Some interesting Auth mechanisms, notably OAuth on Ejabberd (https:// docs.ejabberd.im/developer/ejabberd-api/oauth/#ejabberd-listeners), and I think Prosody has also done work in this area. And probably others. Note that I'm not blaming anyone for this situation. I know how much work it is to write a XEP, even a small one, and everyone is busy. Most of this work is done on a volunteer basis. However, the result is a mess. It would be easier to have everything in a single place, even if it's not perfect or final (again, that's what the "Experimental" workflow is for). I'm also worried about implementing stuff based on snippets scattered around the internet, or missing super interesting features because they are not properly specified or I'm not aware of them. I'm not sure what the best solution for this is, but I wanted to provide my feedback. Best, Goffi

Peter Saint-Andre

2 Jun 2 Jun

8:43 p.m.

New subject: Council (and what it does, and what it should do)

On 6/2/24 12:30 PM, Florian Schmaus wrote:

...

On 27/05/2024 15.07, Dave Cridland wrote:

Once again I would like to suggest that we make it easier to publish experimental XEPs (basically first come, first served à la Internet-Drafts at the IETF). This was our policy in the early days of the JSF/XSF, until the Council decided that it needed to exercise more control or, if you prefer, provide more wise oversight. XEP numbers are cheap and I don't see why we can't rapidly iterate and innovate within the XEP space (consider that XEP-0045 went through 30 versions over ~60 days in 2002 before being advanced to Draft). If that ship has sailed because we now have convinced ourselves that XEP numbers have deep significance, then by all means let's provide an XSF place for ProtoXEPs. But we should recognize that the same urge to control things will rear its head eventually, and then we'll have a discussion about an XSF place for ProtoProtoXEPs. It's "proto" all the way down! Peter

Dave Cridland

3 Jun 3 Jun

8:46 a.m.

New subject: Council (and what it does, and what it should do)

On Sun, 2 Jun 2024 at 21:44, Peter Saint-Andre <stpeter(a)stpeter.im> wrote:

...

On 6/2/24 12:30 PM, Florian Schmaus wrote:

On 27/05/2024 15.07, Dave Cridland wrote:

get

us a long way towards fighting the fragmentation that we have [2].

Agreed, but also: The IETF used to have a pretty low bar for Proposed Standard RFCs, and drafts were really just that. However, if you have a gating point - the IETF Last Call for publication as an RFC, (and usually a WG Last Call before that), then people tend to use it, and it causes a "left shift", to borrow someone else's phrase (possibly Scott Bradner's, actually). This has happened in the IETF, so they in effect made I-Ds the new Proposed, as per the meaning in RFC 2026, Proposed became Draft (but not in name), and Internet Standard remained. While I-Ds are a free-for-all in the IETF, WG-adopted I-Ds are not, and they also involve a gating point, so the IETF frequently has high-numbered revisions of individual I-Ds eventually getting adopted (or discarded). In many WGs, it's the I-D stage where things are implemented, now, with the result that there's an ongoing discussion about whether a WG can stipulate that drafts have to have implementations before adoption... So gating functions get used, and generally more than they should be especially the further left they are, pushing the entire process along a notch. Dave.

Goffi

9:02 a.m.

New subject: Council (and what it does, and what it should do)

Le dimanche 2 juin 2024, 22:43:27 UTC+2 Peter Saint-Andre a écrit :

...

I completely agree with this. I don't think that creating another status or location for proto-proto-XEPs would be beneficial, as it would only add more confusion. We already have /inbox for this purpose. I want to add that rejecting a proto-XEP can be highly discouraging for contributors, especially first-time contributors who may feel that their work was for nothing (just to be clear: this is not my experience with the proto- XEP submission that sparked this discussion; but I have been in the XMPP community for over 15 years and have submitted several specifications before, so my perspective may be different from that of newcomers). I suggest that we clearly state somewhere (such as in a "write a proto-XEP" document) that talking to the community before starting any work is highly recommended. However, this should not be mandatory, as people may be experimenting with ideas and specifying them at the same time. The "Experimental" state is there for the feedback, improvement, and update cycle, or even retraction. Council input is valuable, but except in cases where a proto-XEP is really flawed, it should not be blocking a move to "Experimental" in my opinion. However, requests for council input should be summarized so that XEP authors can update their work accordingly. Also, statements such as "I don't like" or "this specification is bad" are not helpful and may be disrespectful of the work done by authors. All feedback from experienced community members is valuable, and I see no reason why council feedback should matter more. We should not be afraid of namespace bumps in the "Experimental" stage. I have seen people argue against it several times, but I believe this is a mistake. The "Experimental" stage is for experimentation, and breaking things is a part of the deal. If people implement experimental specifications, they should be prepared to update regularly. Ultimately, the final decision on the relevance of a specification will be made by implementers and users. Best, Goffi

Marvin W

10:37 a.m.

New subject: Council (and what it does, and what it should do)

Hi, On Mon, 2024-06-03 at 11:02 +0200, Goffi wrote:

...

The purpose of inbox is to ask for Council to review to move to Experimental, it's not a location for developing a XEP. The location for developing a XEP from its early stage is Experimental, however Experimental is also where we have production-grade XEPs that are widely implemented. In a perfect world, people would never release and enable functionality in production software to end-users, that is based on Experimental XEPs. Evaluation of the move to Stable is based on technical review and experimental implementations, production implementation is not a requirement for Stable (we only require production implementation for Final). However, practice has shown that in the past our processes have been to slow. The time from drafting a rough idea to implementing it in production software is often shorter than the time from rough idea to get a XEP in Experimental, let alone Stable. I tried to circumvent this by writing XEP-0447 (and the bunch of other XEPs I submitted at the same time) way ahead of when I want to invest the time to implement it. I personally haven't done a proper implementation of most of its features yet and was rather gathering feedback from the list. So there was no need to implement this for interoperability from anyone (as there was for some other early stage XEPs), yet there already have been implementations from the community in released software. This underlines my point that Experimental is considered "ready for use in production software" by some.

...

I want to add that rejecting a proto-XEP can be highly discouraging for contributors, especially first-time contributors who may feel that their work was for nothing (just to be clear: this is not my experience with the proto- XEP submission that sparked this discussion; but I have been in the XMPP community for over 15 years and have submitted several specifications before, so my perspective may be different from that of newcomers).

As explained, we have de-facto reached the state where something that's considered useful to developers is in Experimental for more than a few months, we will see it in production and therefor will have to consider interoperability and compatibility with this specific Experimental version for at least some time. And that is where the feeling for the need of a higher bar to Experimental comes from. So to reduce the bar for Experimental, we have to move to Stable faster or the ecosystem will move faster than we manage the XEPs.

...

I suggest that we clearly state somewhere (such as in a "write a proto-XEP" document) that talking to the community before starting any work is highly recommended. However, this should not be mandatory, as people may be experimenting with ideas and specifying them at the same time.

I agree it shouldn't be mandatory, but also we should encourage not only talking with the community before writing the XEP, but also to talk with the community before experimenting with the ideas. Because others might have done that before and/or have ideas that are worth including in the experimentation. The XEP creation process starts with the idea, not when writing it down, and IMO we need to find ways and provide the tools to support XEP creation before the formal writing process.

...

The "Experimental" state is there for the feedback, improvement, and update cycle, or even retraction.

As explained above, Experimental is implemented in production in practice, so retracting or rejecting it won't mean that people don't have to deal with it. We have the Deprecated state to refer to XEPs that have production implementations and need to be considered for interoperability and compatibility, but or not encouraged. However there is no path from Experimental to Deprecated - because Experimental shouldn't have had implementations.

...

Ultimately, the final decision on the relevance of a specification will be made by implementers and users.

So my personal takeaways on this are: === 1. We should move to stable faster. Developers that want to release functionality to their users that requires Experimental XEPs need to step up and push to accelerate the process, so that they don't have to release based on Experimental XEPs. 2. We should encourage the community to interact on XEPs early. Essentially make it such that if you plan to work on a topic, you throw a mail to the mailing list saying you're going to work on it so that everyone knows and can share any insights they have in the topic. We can also provide tooling to make this easier or refer to tooling provided by third parties. 3. We should more actively discourage release of functionality based on ProtoXEP and Experimental XEPs in production (except hidden behind feature flags or options clearly marked as experimental). 4. We should not have any bar for Experimental except for basics, because Experimental means nobody should implement it in production, so there is no harm in publishing it. === I'd love to get feedback on these 4 points so we can turn them into actionable items. Marvin

Goffi

12:10 p.m.

New subject: Council (and what it does, and what it should do)

Le lundi 3 juin 2024, 12:37:21 UTC+2 Marvin W a écrit :

...

The purpose of inbox is to ask for Council to review to move to Experimental, it's not a location for developing a XEP. The location for developing a XEP from its early stage is Experimental, however Experimental is also where we have production-grade XEPs that are widely implemented. In a perfect world, people would never release and enable functionality in production software to end-users, that is based on Experimental XEPs. Evaluation of the move to Stable is based on technical review and experimental implementations, production implementation is not a requirement for Stable (we only require production implementation for Final). [SNIP] As explained, we have de-facto reached the state where something that's considered useful to developers is in Experimental for more than a few months, we will see it in production and therefor will have to consider interoperability and compatibility with this specific Experimental version for at least some time. And that is where the feeling for the need of a higher bar to Experimental comes from. So to reduce the bar for Experimental, we have to move to Stable faster or the ecosystem will move faster than we manage the XEPs.

Experimental is for the specification. Adding whatever status or workflow change you want will not prevent developers from implementing whatever they feel is relevant. People have been implementing things that were not official XEPs because they felt they were relevant, experimental or not (e.g. OTR, OMEMO). Some teams may choose to implement only stable specifications, others want to move fast. I'm in the latter case and I'm okay to refactor or even rewrite something entirely once experience is gained. And I think that both ways are perfectly fine. The "experimental" state is clear: it may change and break. "Stable" may not be desirable for a while. For instance, I've not requested a state change on purpose for submitted specifications such as "XEP-0356: Privileged Entity" because I knew that it would take time and experimentation, and moving to stable would prevent that. I've made important changes years after the original submission. Also, I was working on the only software using it at the beginning; now at least Slidge, a popular bridge, is using it, and its feedback may be valuable for updates before moving to stable.

...

Once again, experimental is for that. Writing things down helps to clearly explain them and how we intend to do them. It's not a starting point for a feature, but it's definitely a good starting point for wider discussions.

...

As I've mentioned above, experimental or not, whatever new workflow won't prevent developers from implementing an unstable feature if it is relevant to them. Experience has shown this to be the case more often than not.

...

Relevant for some XEPs, but not all. Some time you don't want to move to draft on purpose, because you want to gain experiment, and feedback from others. You also need something written as a basis for discussion. My XEP-0356: Privileged Entity specification is an example.

...

2. We should encourage the community to interact on XEPs early. Essentially make it such that if you plan to work on a topic, you throw a mail to the mailing list saying you're going to work on it so that everyone knows and can share any insights they have in the topic. We can also provide tooling to make this easier or refer to tooling provided by third parties.

As long as it's not mandatory, it's surely a good thing to interact on something as early as possible. I personally like to have something written as a basis for discussions.

...

3. We should more actively discourage release of functionality based on ProtoXEP and Experimental XEPs in production (except hidden behind feature flags or options clearly marked as experimental).

It is already clearly stated. There is a red warning at the very beginning of the specification. If people chose to implement it, they have reason, and they know what they do. Specification is there for interoperability and explaining how things are done. XSF job is notably to build and maintain them, and help improvement through discussion and collaboration with the wider community. It's not the job of the XSF to say to developer how they should present features, and to tell them (beside current indicators) what to implement or not.

...

4. We should not have any bar for Experimental except for basics, because Experimental means nobody should implement it in production, so there is no harm in publishing it.

Beside extreme cases, I'm also advocating for letting most if not all specification go to experimental. However, beside current warning, it's developers choice to use them in production. Best, Goffi

Marvin W

12:58 p.m.

New subject: Council (and what it does, and what it should do)

Hi, On Mon, 2024-06-03 at 14:10 +0200, Goffi wrote:

...

In cases where things got widely adopted without being a XEP in first place, we use Historical XEPs to document that. Of course people can always implement things that are not official, but in this case it's obvious that they may have incompatibilities and take special care of this.

...

Some teams may choose to implement only stable specifications, others want to move fast. I'm in the latter case and I'm okay to refactor or even rewrite something entirely once experience is gained. And I think that both ways are perfectly fine.

It's not as soon as your change causes incompatibility issues with other clients if they don't follow and also "move fast" and implement experimental functionality - and if everyone needs to follow yours to be compatible, it's effectively stable, because changes can't be done without breaking compatibility again. So we should just move forward and call it stable, and be happy to fork into a new XEP number that supersedes the old one if needed.

...

The "experimental" state is clear: it may change and break.

That seems to not be clear to anyone, to be honest, especially not end- users and also not in how we market our products. OMEMO is in Experimental and yet a lot of clients feature it as regular functionality and breaking it would have terrible consequences and render communication between some clients impossible. Realistically, siacs OMEMO (XEP-0384-v0.3) is in fact a stable specification, even if we wanted to and eventually did breaking changes to it afterwards. It's a mess. The cleaner way would have been to advance v0.3 to stable and do a new XEP for a second version OMEMO that is what the current XEP. If I want to go to the extreme, I'd say that if something needs a namespace version bump, because an incompatible change was made to a specification that was already implemented and deployed, something went wrong. One XEP should match exactly to one namespace and a new version should just be a new XEP.

...

"Stable" may not be desirable for a while. For instance, I've not requested a state change on purpose for submitted specifications such as "XEP- 0356: Privileged Entity" because I knew that it would take time and experimentation, and moving to stable would prevent that. I've made important changes years after the original submission. Also, I was working on the only software using it at the beginning; now at least Slidge, a popular bridge, is using it, and its feedback may be valuable for updates before moving to stable.

Experimental certainly is not the place to just dump an example. You also don't create an Experimental XEP "I want to do Remote Control and will upload content here in 2 months after my experiments" - but I would want to encourage both public pre-announcements and a place to publicly scribble ideas. Of course writing things down properly helps to find these things, but it also means that the process takes longer and feedback and collaboration will only be possible at a later stage - which can worst case mean more wasted energy and time, and we all don't really have energy and time to waste.

...

The problem is that people DO implement features based on Experimental and consider this de-facto stable. That's why we do namespace version bumps: because developers don't expect the XEP to break, otherwise it would be no issue to change things in breaking way without namespace version bump. Reality is: The big red warning doesn't change how people see Experimental XEPs and claiming it bigger while not changing how we actually treat it really isn't going to change it.

...

Gaining experience and feedback from others and discuss doesn't require to publicly release and distribute software that uses it to a large userbase. Experimental is totally for these things: experimenting, discussing, incorporating feedback. It's not for production, because then you typically can't break anymore without terrible consequences. XEP-0356 might be a special case here, because it is not very important that it has good interoperability. Most usecases require a single server to work with a single component and those to be compatible. We don't have a large landscape of clients and servers involved that need to all agree on a single standard. So breaking changes there might be easier even if already deployed. Additionally, all software using it is largely considered experimental as well.

...

As long as it's not mandatory, it's surely a good thing to interact on something as early as possible. I personally like to have something written as a basis for discussions.

As I said above, writing a short mail to the mailing list "I am going to work on XXX the next weeks, who has done something in this area already?" is not a basis for a discussion and it's not meant to be. It's to get everyone informed and ensure we don't have duplicated efforts. This doesn't and shouldn't need to have something larger written. I also wouldn't want to do something like this mandatory, but it honestly wouldn't hurt if it was mandatory to publicly announce that you're working on topic X before proposing a XEP on the topic.

...

It is already clearly stated. There is a red warning at the very beginning of the specification. If people chose to implement it, they have reason, and they know what they do.

As explained above, reality shows that this is not the case. OMEMO being a perfect example. We'll be stuck with a shitty version of it probably for another decade because people implemented it into production while it was pre-Experimental or still very much Experimental. So I do disagree, people don't consider the consequences of Experimental XEP because in practice, we consider Experimental XEPs stable and will namespace bump instead of doing breaking changes, so there are no compatibility issues. And clients tend to implement multiple versions of Experimental XEPs for good interoperability.

...

It's not the job of the XSF to say to developer how they should present features, and to tell them (beside current indicators) what to implement or not.

We typically don't consider WhatsApp an XMPP client even if we know it's XMPP with just a bunch of not publicly documented "extensions". XSF can go as far and say "Client X is not compatible with XMPP as it implements a protocol that is still Experimental". We do keep a list of clients that "implement XMPP" so it is within the scope of the XSF to decide what it means to be an XMPP client and requiring that only stable specifications must be enabled in the default configuration of a client to be considered an XMPP client is certainly valid. Marvin

Goffi

2:06 p.m.

New subject: Council (and what it does, and what it should do)

Le lundi 3 juin 2024, 14:58:23 UTC+2 Marvin W a écrit :

...

Changes don't cause incompatibilities, we have namespaces to handle that. You can perfectly adapt to the change according to namespace. The issue is that people are afraid that other won't implement new things.

...

The "experimental" state is clear: it may change and break.

My client implements both old and new OMEMO and adapts to the right version thanks to namespaces. Doing a new XEP would not have changed anything. People have implemented OMEMO because there was a need (and pressure from users and other IM solutions). Waiting for a new version years later would have only put XMPP in oblivion. Now, projects have to handle priorities, and the current version is working well enough that they handle the implementation when it makes sense to them. Also the "proper" new version is more difficult to implement, so it makes sense that it's done progressively.

...

If I want to go to the extreme, I'd say that if something needs a namespace version bump, because an incompatible change was made to a specification that was already implemented and deployed, something went wrong. One XEP should match exactly to one namespace and a new version should just be a new XEP.

I totally disagree on that. Namespace bumps are not that expensive and should be used freely in experimental XEPs. But my position is known there, so there's no need to continue in circles forever. I would like to see other inputs.

...

> > I agree it shouldn't be mandatory, but also we should encourage not > > only talking with the community before writing the XEP, but also to > > talk with the community before experimenting with the ideas. > > Because > > others might have done that before and/or have ideas that are worth > > including in the experimentation. The XEP creation process starts > > with > > the idea, not when writing it down, and IMO we need to find ways > > and > > provide the tools to support XEP creation before the formal writing > > process.

If they have done it before there should be an experimental XEP. My point is actually that experimental XEP should be easier to produce. And we may not all work the same way, but implementing, and writing specification can go in parallel to clarify and draft an idea. Then experimental is there for community feedback/update/experiment cycle.

...

Experimental certainly is not the place to just dump an example.

I've never said that. I've just said that the current situation, and I've actually said that it's not good. It's actually quite the opposite: my point is that I would like to make it easy to create a minimal specification, with enough information to be implementable. A Council vote would ensure that the minimal requirements for an implementation are there, and then the feedback, update, and experiment cycle can start. This would be far better than having to rely on specifications, code, and dump snippets scattered all over the web.

...

You also don't create an Experimental XEP "I want to do Remote Control and will upload content here in 2 months after my experiments" -

Why is that? People can try to implement this version and give feedback on what is good, bad, or ugly, or even without implementing it. Why would it not be okay to update it based on experiments and feedback after a period of two months?

...

but I would want to encourage both public pre-announcements and a place to publicly scribble ideas. Of course writing things down properly helps to find these things, but it also means that the process takes longer and feedback and collaboration will only be possible at a later stage - which can worst case mean more wasted energy and time, and we all don't really have energy and time to waste.

Of course, the process would take longer: pre-announcement, explanation, waiting for feedback (how much? 24 hours? 2 weeks?), understanding feedback, explaining that the idea was not understood correctly, etc. Here, everybody knows that work is done on a specific topic when protoXEP is submitted, work has started, everybody can read the spec to see exactly what is it about and the vision of the author, and feedback. I don't says that pre-announcement is a bad thing, but it's clearly delaying everything and may actually make things more complicated.

...

I don't see a problem that people do implement experimental feature, but if they consider it as stable, that means that they can't read big red warning (or just the warning and state, people may be colorblind). Not sure about others, but I do implement experimental feature, I do expect them to break, and I'm not worrying because there are namespaces. Actually, why do you even think that developers don't expect the XEP to change? "break" is not even the right word here because they don't break precisely because there are namespaces.

...

Certainly not "publicly release and distribute software that uses it to a large userbase", who has said that? A first draft of a specification describing what is the idea, what is it used for, and how to implement it is certainly very much useful. Regarding the choice of implementing or not something, it's the job of developments teams, not of XSF.

...

[SNIP] As explained above, reality shows that this is not the case. OMEMO being a perfect example. We'll be stuck with a shitty version of it probably for another decade because people implemented it into production while it was pre-Experimental or still very much Experimental. So I do disagree, people don't consider the consequences of Experimental XEP because in practice, we consider Experimental XEPs stable and will namespace bump instead of doing breaking changes, so there are no compatibility issues. And clients tend to implement multiple versions of Experimental XEPs for good interoperability.

No, OMEMO was implemented due to need and pressure, and it's a good thing that it was done because it took years to have a good version. In the meantime, support for OMEMO has been advertised as a strength of XMPP in literature for years. It is absolutely possible to have old and new versions running in parallel (my client does it), and clients will eventually do so when the need arises. There are already several clients implementing OMEMO:2.

...

XSF can go as far and say "Client X is not compatible with XMPP as it implements a protocol that is still Experimental". We do keep a list of clients that "implement XMPP" so it is within the scope of the XSF to decide what it means to be an XMPP client and requiring that only stable specifications must be enabled in the default configuration of a client to be considered an XMPP client is certainly valid.

So, clients that implemented OMEMO when it was just a web page outside of the XSF would no longer be considered "XMPP clients"? If you want to kill XMPP once and for all, that's probably the best way to do it. Best, Goffi

MSavoritias

3:29 p.m.

New subject: Council (and what it does, and what it should do)

On Mon, 03 Jun 2024 16:06:37 +0200 Goffi <goffi(a)goffi.org> wrote:

...

Le lundi 3 juin 2024, 14:58:23 UTC+2 Marvin W a écrit :

The "experimental" state is clear: it may change and break.

Agreed. We don't need yet more XEP numbers to shift through. We already have way too many of them. Instead we should start using the namespaces more. And also we shouldnt be afraid to update Stable XEPs if there are substaincial changes. I mean we have feature discovery and namespaces for that reason :) MSavoritias

Marvin W

3:55 p.m.

New subject: Council (and what it does, and what it should do)

On Mon, 2024-06-03 at 18:29 +0300, MSavoritias wrote:

...

Totally viable option. I would even go further with this (knowing it doesn't work exactly like this for some specifications): - An Experimental XEP is identified only by its namespace. Namespaces are cheap so we can give them out to everyone without any evaluation whatsoever. - When a XEP is turned into Stable, it gets a number. The primary identifier remains to be the namespace. - When we bump a namespace of a XEP it's effectively a new XEP (because the identifier is a new one) and thus goes to Experimental. It also looses it's number (or the number continues to refer to the previous namespace) but we remember what number it was derived from. - When a XEP goes to Stable that was derived from a XEP that had a number, we reassign the number to the newer XEP, so the XEP number is always referring to the latest stable version of that XEP. Thereby we allow for breaking changes through namespace bumps, won't shift through numbers and a XEP when identified by its namespace would never see incompatible changes after being moved to Stable, but when identified by its number it would always refer to the latest version. Problem is: People already have an understanding of the XEP numbering system, so we can't easily change that. If two implementations implement a Stable version of a XEP number, we currently expect those two to be able to talk with each other. A namespace bump would prevent that, so it would be good to at least keep the specification for the old namespace around as Deprecated. I really dislike how we currently in OMEMO have to refer to a specific Experimental version when talking about what the production software implements. Marvin

Maxime Buquet

4 Jun 4 Jun

9:10 a.m.

New subject: Council (and what it does, and what it should do)

I'm catching up on this thread and I'm replying to two of your mails at the same time. On Mon, 03 Jun 2024 12:37:21 +0200, Marvin W wrote:

...

I tried to circumvent this by writing XEP-0447 [..] yet there already have been implementations from the community in released software. This underlines my point that Experimental is considered "ready for use in production software" by some.

No, this only says that this is a much needed feature, whether or not the protocol is considered "ready for use in production software" (whatever that means[0]) is irrelevant IMO.

...

Experimental isn't the issue. The thing is that people want features / improvements and will implement them, and that's great. There's no preventing it. By doing so they should also ready to also do the work to update their software whenever the spec gets updated.

...

because Experimental shouldn't have had implementations.

How does one even experiment without implementation?

...

3. We should more actively discourage release of functionality based on ProtoXEP and Experimental XEPs in production (except hidden behind feature flags or options clearly marked as experimental).

And that's how you end up with Pidgin not having MAM or the like for years. Because they indeed refuse to implement Experimental specs. Life happens, and people working on specs also get hit by buses / have other things to take care of. The burden can't be put on just one person alone to do the work. It happens but it's rare that council does something about it. In other words. You can discourage all you want, that won't stop anybody from implementing what they need. And I'd rather have people take a half-baked spec in Experimental and try to improve on it and report back. They may release it however they want. On Mon, 03 Jun 2024 14:58:23 +0200, Marvin W wrote:

...

On Mon, 2024-06-03 at 14:10 +0200, Goffi wrote: > The "experimental" state is clear: it may change and break.

...

That seems to not be clear to anyone, to be honest, especially not end- users and also not in how we market our products.

Yes it is clear. It's you that don't want to accept that people may choose to implement a protocol that will likely break. Your following example with 0384 is only good in that it would have made it much more easy to find if there was another XEP number assigned for > 0.3.0, as flow and goffi have said (or will say?) in this thread. [0]: https://bouah.net/2022/09/versioning/

Marvin W

10:18 a.m.

New subject: Council (and what it does, and what it should do)

Hi, On Tue, 2024-06-04 at 11:10 +0200, Maxime Buquet wrote:

...

No, this only says that this is a much needed feature, whether or not the protocol is considered "ready for use in production software" (whatever that means[0]) is irrelevant IMO.

Of course there are three kinds: (a) Those that consider the protocol ready for use in production software and thus use in production software (b) Those that consider the protocol not ready for use in production software, but don't care because they want the feature and don't want to fix the protocol before using it in production software (c) Those that consider the protocol not ready for use in production software, but need to implement it for compatibility with other production software, because of those in (a) or (b) I'd say that: (a) should just step up and make sure the protocol is turned stable, if it can't be turned to stable, they might even learn why the protocol is in fact not ready for use in production, so it's good for them if they try to move it further. (b) is just irresponsible behavior. Irresponsible towards your users (by shipping things to them you consider broken yourself) and towards the wider community (by requiring everyone to now deal with what is not ready for production in their production software). (c) is the worst that we have it, but impossible not to have as long as there is (a) and (b). I'd hope we can get rid of (a) and (c) through changes in the process and (b) by education. When I talk about production software, I'm referring to the server and clients that are used by thousands of end-users and that are therefor impossible to ensure are properly updated. And I'm talking exclusively about released production software (whatever that means for the project, aka what they suggest their end-users that want somewhat stable software to use).

...

I never said Experimental is an issue. The problem is also not that people want features that are currently only available in Experimental. The problem is that we don't manage to move specifications from Experimental (or even before that) to Stable before they are so widespread that changes are hardly possible. If you can't experiment with it anymore (because it would break production software), then it shouldn't be in Experimental.

...

How does one even experiment without implementation?

I was referring to production software implementations, not implementations in test software, feature branches or other experimentation places.

...

And that's how you end up with Pidgin not having MAM or the like for years. Because they indeed refuse to implement Experimental specs.

MAM should have been Stable for years before it was marked so that everyone has a basis they can develop on. Pidgin is right in saying they don't want to implement something that says "don't ship this to endusers" in it's header. It's just that many others decide to ignore that warning and therefor there is inconsistency in what's supposed to happen with Experimental XEPs.

...

Life happens, and people working on specs also get hit by buses / have other things to take care of. The burden can't be put on just one person alone to do the work. It happens but it's rare that council does something about it.

Not sure what you mean. I never said a single person should do a ton of work. I said that if a XEP becomes de-facto stable (because it has widespread production implementations and therefor can't be changed without causing compatibility issues) we should move it to stable. If there is still (backwards-compatible) work to be done for the XEP to be moved to stable and the author can't do it themselves, someone needs to act as a Document Sheppard as outlined in XEP-0001. The process is all defined, we just don't make use of it.

...

In other words. You can discourage all you want, that won't stop anybody from implementing what they need. And I'd rather have people take a half- baked spec in Experimental and try to improve on it and report back. They may release it however they want.

I am wondering why you think it's good they use Experimental in production. Wouldn't it be better if the spec they need is in Stable? The list of items I provided obviously only goes if we do all of them: Move things to Stable faster so that developers don't need to use things from Experimental in production.

...

Yes it is clear. It's you that don't want to accept that people may choose to implement a protocol that will likely break.

End-users don't decide what is implemented. If end-users see that the message editing they have in their client stops working from one day to the other (because other clients updated to a new, incompatible revision), they will be unhappy. We previously have worked around this by implementing multiple revisions of the same XEP at the same time. And this is where I question the use of the word Experimental (and the warning to not implement it): We do actually consider those revisions to be somewhat Stable and take care of compatibility with them. It's what client and server developers already do today. We look at old revisions to figure out how something was done in previous revisions if it's needed for compatibility. So apparently that old revision is in fact stable enough for it to be used even after it is replaced by a new revision.

...

Your following example with 0384 is only good in that it would have made it much more easy to find if there was another XEP number assigned for > 0.3.0, as flow and goffi have said (or will say?) in this thread.

There's two things that would have improved: a) We can reference and find the version being used due to the different XEP number b) We can tell people in the header that this is in fact ready for use in production as has been shown by a ton of people using it in production. Today XEP-0384 says it's unsuited for production in its header. It probably is not, but even if it is, that's only true for the latest revision. The older revision certain is fit for production. This is not about changing what people implement. This is about our process correctly reflecting what we do in practice. If you just want to rephrase of what Experimental means, we can also do that (making me wonder though what we will need Stable status for). Of course we can just adjust the wording in the header of Experimental XEPs to something like this:

...

This Standards-Track document is Experimental. Publication as an XMPP

Extension Protocol does not imply approval of this proposal by the XMPP Standards Foundation. Experimental status does not imply any encouragement or discouragement to implementations or deployments in production systems. Older revisions of this proposal may be implemented widely, the current revision might be not implemented at all and may be replaced anytime with a newer revisions. Some implementations will consider compatibility with older revisions of this proposal when implementing it, others may not. See [here] for information about implementations in exiting software. With [here] being a link to the data we gathered from DoaP. In this case I would suggest to update the DoaP specification to not point to a single revision when talking about implementation status, but allow to point to multiple revisions (but maybe that's already possible, not sure). Marvin

Goffi

12:52 p.m.

New subject: Council (and what it does, and what it should do)

Le mardi 4 juin 2024, 12:18:34 UTC+2 Marvin W a écrit :

...

Though I usually appreciate your feedback, I find this particular comment especially pedantic and patronizing. You are aware that you say people who implemented OMEMO, for instance, were irresponsible and should be "educated", right?

Marvin W

3:29 p.m.

New subject: Council (and what it does, and what it should do)

Hi Goffi, Thanks for your message. I know I'm not particularly good with words and my language sometimes tends to be perceived as aggressive or exclusive. I did not intend to attack or insult anyone and I apologize if I did. On Tue, 2024-06-04 at 14:52 +0200, Goffi wrote:

...

The people that *first* implemented and deployed OMEMO to a large number of end-users of the public XMPP network, before making a reasonable effort to stabilize the specification and to actually get the implementation itself to a stable state were in my opinion acting too careless. It's not always black and white, and to some degree the fault was and is often the XSF here, which is what this discussion was meant to be about: To adjust our XSF procedures to better reflect the need of the community. OMEMO was a mess, I think we all remember the days when half the messages on half of the devices would show up as "Message is OMEMO encrypted", even if their client was supposedly supporting some kind of OMEMO. Developers of clients were put on a public blame list for not implementing OMEMO fast enough. The reference for how things needed to work was not a specification, but a single implementation. And OMEMO still is a mess, next to nobody is implementing the latest revision, even though we know there are ways to upgrade that do not break anything. And those few that only implement the latest revision are totally screwed because their client is incompatible with what all others do, so they can't even do a lot of testing and are considered incompatible to OMEMO, even if technically it's everyone else that's incompatible. I sure hope we learn from this, "educate" ourselves and try to make sure it won't happen like that again. Marvin

Goffi

8:38 p.m.

New subject: Council (and what it does, and what it should do)

Hi Marvin, Le mardi 4 juin 2024, 17:29:00 UTC+2 Marvin W a écrit :

...

Thanks for your balanced reply. I'm not always diplomatic either :). Thing is, most of us are putting a lot of effort and passion into our work, often for many years. We may disagree on strategy, the way to do things, etc. This is perfectly fine and that's why we are discussing. But I trust the community to know well what it is doing, and even if we may disagree on the right way, I believe that dev teams know what they are doing.

...

AFAIR there was also a specification on Conversations website from early days. I don't think that is was a mistake, I've already exposed my opinion, but there was pressure from the IM alternatives and the users too, and at the time only OTR was available, and only implemented by few (with problems as we know). If we had waited, for the "right" version which was done at 2021-10-07 (version 0.8.1), I'm pretty sure that the state of XMPP ecosystem would be worst that it is nowadays. First specification is from 2015-10-25, that 6 years! OMEMO has been quoted many times in literature. I believe that XMPP is still considered as a viable protocol for IM and more by many partly thanks to OMEMO. The "Message is OMEMO encrypted" thing was notably due to one client using it by default, which is the only case I remember of a breaking protocol thing, and has nothing to do with specifications. This thing put a lot of pressure on developers. I don't blame the dev which did that, it was done for a reason and it's totally understandable, and maybe a good move. Again, we have disco and namespace to handle various versions, if a client implement a new version, it's a choice to keep previous implementation for compatibility, and I believe most are doing that in critical cases In the case of OMEMO, I only know one client that implement OMEMO:2 and not the legacy one, and this client is young and took some radical decision; I'm not sure if it's still the case, but at one point they didn't wanted to implement XEP-0045, despite its "stable" status, and wanted to focus on MIX, so the specification status didn't change a thing here. I really think that we should separate "specification" work from "implementation" work (put aside the reference implementation thing): you can have a buggy implementation of a final specification, or incomplete, and an "experimental" specification can have a rock solid implementation (which can be updated if new version arise). Also, it's not necessarily a good thing to rush to put something to stable: feedback also come from end-user, and they may suggest an interesting change which could be done with, say, a new attribute or element with a namespace bump, which is easily done with an experimental XEP, but hard to impossible in stable or final. And there are many use cases where important things may not be anticipated at all by developers because they don't know specific fields. I'm thinking about accessibility for instance. Feedback from community from implementing clients is previous there. Trying to freeze as many specification as possible as soon as possible can be counterproductive.

...

It's a choice to implement only latest version when everybody know that only few clients are implementing it at the time. First iteration took years too, because resources are scarce, and we all have our priorities. Without iterations, we would not have had OMEMO at all before end of 2021, and I'm pretty confident that several clients would then have disappeared. Best, Goffi

MSavoritias

5 Jun 5 Jun

6:50 a.m.

New subject: Council (and what it does, and what it should do)

On Tue, 04 Jun 2024 17:29:00 +0200 Marvin W <xmpp(a)larma.de> wrote:

...

As a person who is using OMEMO that is still the case. Nothing has changed. I get OMEMO encryption problems regularly to the point that i am using it only if: - The person I am talking to has only one device - Owns/will own the device for a long time and the key will remain static - Is 1:1 conversation Everything else is doomed to fail and get the XMPP is bad reaction. MSavoritias PS. I am NOT saying this to get responses of "File bug reports" or "Work together to improve things", for multiple reasons. One of them being that in general the community seems to have given up on making OMEMO work.

...

And OMEMO still is a mess, next to nobody is implementing the latest revision, even though we know there are ways to upgrade that do not break anything. And those few that only implement the latest revision are totally screwed because their client is incompatible with what all others do, so they can't even do a lot of testing and are considered incompatible to OMEMO, even if technically it's everyone else that's incompatible. I sure hope we learn from this, "educate" ourselves and try to make sure it won't happen like that again. Marvin _______________________________________________ Standards mailing list -- standards(a)xmpp.org To unsubscribe send an email to standards-leave(a)xmpp.org

Goffi

7:25 a.m.

New subject: Council (and what it does, and what it should do)

Le mercredi 5 juin 2024, 08:50:09 UTC+2 MSavoritias a écrit :

...

For the record, we'll have a meeting in Berlin next month (thanks to Debacle), to work exactly on that: interoperability issue with OMEMO (and A/V). For the wider community: please gather non working scenario, so we can work on them. Best, Goffi

Martin

10:16 a.m.

New subject: Council (and what it does, and what it should do)

Quoting Goffi <goffi(a)goffi.org>rg>:

...

For the record, we'll have a meeting in Berlin next month (thanks to Debacle), to work exactly on that: interoperability issue with OMEMO (and A/V).

I like to add: It is an open sprint, so everybody is invited to work on whatever they like :-) Please, everyone, register yourself, if chances are ≥ 1 % that you come to Berlin: https://wiki.xmpp.org/web/Sprints/2024-07_Berlin#Attendees Cheers

Stephen Paul Weber

4 Jun 4 Jun

1:43 p.m.

New subject: Council (and what it does, and what it should do)

...

3. We should more actively discourage release of functionality based on ProtoXEP and Experimental XEPs in production (except hidden behind feature flags or options clearly marked as experimental).

And that's how you end up with Pidgin not having MAM or the like for years. Because they indeed refuse to implement Experimental specs. In other words. You can discourage all you want, that won't stop anybody from implementing what they need. And I'd rather have people take a half-baked spec in Experimental and try to improve on it and report back. They may release it however they want.

The key this is not to refuse to implement experimental XEPs (indeed I would hope for at least one implemenation before it reaches experimental!) but rather to not refuse to make breaking changes to experimental XEPs based on the fact that people are using it. Implementors should be aware this is experimental and may change and be flexible to support multiple versions of it over time, etc.

Marvin W

3:45 p.m.

New subject: Council (and what it does, and what it should do)

Hi, On Tue, 2024-06-04 at 08:43 -0500, Stephen Paul Weber wrote:

...

Implementors should be aware this is experimental and may change and be flexible to support multiple versions of it over time, etc.

I think my point is that I don't have the feeling that this the case. With my developer hat, I for once do not generally expect that I will be forced anytime soon to do adjustments to the implementations of Experimental XEPs that are implemented in Dino. I obviously can't speak for others, but with OMEMO we certainly see that implementers are not generally aware that they should or are not willing to change to newer versions of it over time. Also, as some XEPs stay in Experimental for a long period of time, the people that originally implemented it somewhere might long have stopped working on it if an update happens years later - so there might just be nobody that can update the code for realizing a breaking change in some codebase. Marvin

Peter Saint-Andre

5 Jun 5 Jun

1:06 a.m.

New subject: Council (and what it does, and what it should do)

On 6/3/24 3:02 AM, Goffi wrote:

...

All feedback from experienced community members is valuable, and I see no reason why council feedback should matter more.

This is a very important point. Council members are not special people with special insight - they are simply community members who are temporarily serving in a role that is necessary within our standards process. When I was elected to the IESG, I wrote [1] the following quote on my whiteboard: "Power tends to corrupt." --Lord Acton Peter [1] https://stpeter.im/journal/1399.html

Dave Cridland

9:29 a.m.

New subject: Council (and what it does, and what it should do)

On Wed, 5 Jun 2024 at 02:07, Peter Saint-Andre <stpeter(a)stpeter.im> wrote:

...

On 6/3/24 3:02 AM, Goffi wrote:

All feedback from experienced community members is valuable, and I see no reason why council feedback should matter more.

Well, indeed. The question is, should their decisions be based solely on their own knowledge and ability, or should they based on a reading of community consensus, or some mixture? And if Council is acting at least to some degree as the enforcers of community consensus, then it ought to have the capability to enforce it outside of two cases (one of which it seems apparent we don't want).

...

When I was elected to the IESG, I wrote [1] the following quote on my whiteboard: "Power tends to corrupt." --Lord Acton

Lord Acton was only partly right - it doesn't inevitably corrupt, but it always produces a lot of waste heat. Dave

390

days inactive

412

days old

standards@xmpp.org

Manage subscription

43 comments

10 participants

tags (0)

participants (10)

Daniel Gultsch
Dave Cridland
Florian Schmaus
Goffi
Martin
Marvin W
Maxime Buquet
MSavoritias
Peter Saint-Andre
Stephen Paul Weber