[Standards] Rayo feedback.

Ben Langfeld ben at langfeld.me
Fri Aug 14 19:11:44 UTC 2015

On 14 August 2015 at 06:26, Kevin Smith <kevin.smith at isode.com> wrote:

> Again, Sorry Ben that I didn’t receive this mail at the time.

Hey Kevin!

I’ve elided lots of points that seem addressed (thanks).
> On 21 Jun 2015, at 20:53, Ben Langfeld <ben at langfeld.me> wrote:
> > 1) Does leading with the examples help or hinder here? I found the
> examples at the start of one particular use case left me more confused
> than I think I would have been jumping straight in to what it’s trying to
> achieve. (No impact on going to Draft)
> >
> > Would it be better, do you think, to move this example to be an intro to
> section 6 (Session Flow)?
> I think it might well be, yes. I’d like a second opinion, though, to check
> it’s not just me being stupid. Fippo, perhaps?
> > 2) 5.1 (Actors) places requirements that these JIDs for
> components/mixers can only be under subdomains - why is this?
> AFAIK, this is the only part of XMPP that implies any relationship between
> a domain and a subdomain, and it doesn’t immediately seem like a useful
> restriction.
> >
> > Not true. The word I used was "perhaps". This is simply to point out
> that full JIDs must be used to address these entities and no relationship
> between domains may be assumed.
> I think that at least the table in 5.2 is quite explicit in requiring
> things to be a subdomain - I take it this wasn’t intended.

Actually quite the opposite:

> where elements in square brackets are optional

<call ID>@[<call sub-domain>.]<service domain>/<component ID>

Quite explicitly optional, I'd say.
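
To make the optional element concrete, here is a minimal Python sketch of that address template. The function name and the sample values are hypothetical and only illustrate that the sub-domain may be omitted; they are not taken from the spec.

```python
from typing import Optional


def component_jid(call_id: str, service_domain: str, component_id: str,
                  call_subdomain: Optional[str] = None) -> str:
    """Build <call ID>@[<call sub-domain>.]<service domain>/<component ID>.

    The sub-domain is the bracketed (optional) element of the template.
    """
    if call_subdomain:
        domain = f"{call_subdomain}.{service_domain}"
    else:
        domain = service_domain
    return f"{call_id}@{domain}/{component_id}"


# Sample identifiers are made up for illustration.
print(component_jid("9f00061", "rayo.example.com", "fgh4590", "call"))
print(component_jid("9f00061", "rayo.example.com", "fgh4590"))
```

Both forms fit the template; nothing about the second one relies on a relationship between a domain and a sub-domain.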

> > 3) 5.1.6 Is calling things Components the most useful terminology here,
> when Components have a well-established meaning in XMPP (and a RAYO server
> is likely to be such a component).
> >
> > These are asynchronous, independent resources attached to a call. The
> term "component" came up in the very first days of this specification and
> has stuck. I would be open to suggestions for an alternative term if it
> appropriately conveys the meaning, but one does not immediately come to
> mind.
> Would resource work? These things seem to be addressed by their resource
> part in the JID. Again, I think another opinion would be helpful.

I'd be happy with resources, and will propose this change if someone else
involved (Ben Klang, Chris Rienzo, Jose de Castro or Fippo) agrees.

> > 4) 6.1’s reliance on a <show>chat</show> seems odd at best - wouldn’t a
> normal available presence be better here? I’m also not sure that the
> requirement for it to be directed presence is warranted - why wouldn’t
> broadcast presence work here?
> >
> > This is because the client's online status is disconnected from its
> availability to receive new offers in the same way as a human might be
> online but unavailable to engage in conversation.
> If their ability to receive offers is unrelated to their presence, does
> that not imply that presence is the wrong mechanism to be using here?

No, because they are available inasmuch as the Rayo server needs to know
that the client has not disconnected, but is simply not disposed to begin
new interactions. This is identical to a human in an IM situation.

> Regardless of that, if ‘chat’ is being used because things might be
> available but not receiving offers, I think there’s some explanation needed
> here. In what circumstances would a RAYO client want to be available but
> not taking calls?

When one is preparing a Rayo client for a smooth shutdown, particularly.
For example, in Adhearsion, this is the behaviour:

   - On SIGINT or SIGTERM Adhearsion does several things:
      - On first signal, Adhearsion marks its internal state as "shutting
      down" but continues to take and process calls normally.
      - On second signal, Adhearsion will continue to process calls already
      in the system, but will reject any new calls.
      - On the third signal, Adhearsion will send a HANGUP to all existing
      - In all of the above cases Adhearsion will shut down as soon as the
      call count reaches 0.
      - On the fourth signal, Adhearsion will stop immediately (forced

I don't want to specify in this much detail, but I will add a note to
explain the motivation for this feature.
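
As a rough illustration of the signal handling listed above, here is a small Python model. It is a sketch under assumptions, not Adhearsion's actual code: the class and phase names are hypothetical, and because two of the list items are cut off in this mail, the third and fourth phases are modelled only as far as the visible text goes.

```python
from enum import Enum


class Phase(Enum):
    RUNNING = 0
    SHUTTING_DOWN = 1  # first signal: flagged, still takes calls normally
    REJECTING = 2      # second signal: existing calls continue, new ones rejected
    HANGING_UP = 3     # third signal: a HANGUP is sent (list item is truncated)
    STOPPED = 4        # fourth signal, or the call count reached 0


class ShutdownModel:
    def __init__(self, active_calls: int = 0) -> None:
        self.phase = Phase.RUNNING
        self.active_calls = active_calls

    def signal(self) -> None:
        """Handle one SIGINT/SIGTERM by advancing to the next phase."""
        if self.phase is not Phase.STOPPED:
            self.phase = Phase(self.phase.value + 1)
        self._stop_if_idle()

    def accepts_new_calls(self) -> bool:
        return self.phase in (Phase.RUNNING, Phase.SHUTTING_DOWN)

    def call_ended(self) -> None:
        self.active_calls = max(0, self.active_calls - 1)
        self._stop_if_idle()

    def _stop_if_idle(self) -> None:
        # Once shutdown has begun, stop as soon as no calls remain.
        if self.phase is not Phase.RUNNING and self.active_calls == 0:
            self.phase = Phase.STOPPED


model = ShutdownModel(active_calls=2)
model.signal()
model.signal()
print(model.phase, model.accepts_new_calls())
```

The REJECTING phase is the case under discussion: the client is still online and still controlling calls, but is not disposed to receive new offers.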

> The ‘why does it have to be directed?’ point still stands. To the service
> it’d be indistinguishable.

I will remove the requirement that it must be directed.

> > 5) 6.1 - if you want to rely on presence here, isn’t an unavailable
> presence the best way to signal unavailability? I don’t think it’s covered
> what receiving unavailable would mean here at the moment.
> >
> > See above.
> I think at least the second part of the question stands - what does
> receiving unavailable mean?

Means that the client has gone offline and will not interact with the calls
under its control any more. The Rayo server may choose to hang up those
calls, wait for the client to come back, or any other
implementation-specific behaviour.

> > 8) 6.2.1 How does the client discover the available URI schemes for
> to/from?
> >
> > No such discovery is specified, and it is assumed that a Rayo service
> would document this.
> It’s not clear to me what this means for interoperability. Does it mean
> that one can’t implement a Rayo client using this XEP and expect it to
> interoperate with an arbitrary Rayo service, because it won’t know what the
> available URI schemes are?

Even if this were available via Disco, it would make no difference. You
couldn't build your app to compensate. I think per-implementation/service
documentation is sufficient here.

> > 10) Use of presence for sending of notifications like this seems
> off. I realise this boat may have sailed, but it doesn’t seem right to me.
> >
> > We had this discussion during the Last Call, and the only alternative
> that was presented was a dependency on PubSub, against which I believe I
> presented a solid argument previously.
> I’m not exactly ignoring this comment, but I don’t have a sensible reply
> either.
> > 16) 6.3 The identifier for calls here is always a JID, isn’t it? If
> that’s the case, it’d make more sense to be using JIDs here, instead of
> adding the layer of indirection of a URI with a fixed scheme.
> >
> > A call URI will not necessarily always be a JID. It has been the
> intention since the start of this spec to leave open the option of other
> transports for Rayo, such as HTTP.
> In such a case, how will an entity know about the available schemes, and
> connect to them? If the implication is that there will need to be changes
> later to express how to interoperate with future systems, it suggests it
> wouldn’t be appropriate to push to Draft now with those changes pending.

Any such behaviour is very much a future concern; no-one has given it any
solid thought yet. Simply remaining generic in using URIs rather than
protocol-specific addresses seems harmless to me, though.

> > 17) 6.3 I think here we’re getting into the territory where presence
> stanzas are really not appropriate for this
> >
> > Do you have an alternative suggestion, or a concrete argument against?
> I’d have thought that (for this case) just sending the message (probably
> as headline?) would be more appropriate? This seems to be trying to send
> what is logically a ‘joined’ message to the client, rather than an update
> of presence. Presence is generally the current state of an entity. If you
> use presence for ‘joined’ and you first joined A and then joined B, and so
> the most recent presence you received had ‘joined B’ in it, it implies
> under the usual XMPP semantics that your new presence has replaced the old
> one, and thus you’re no longer joined to A.

That's the first practical argument against the use of presence here that
I've heard so far; thank you. I'll give it more consideration and either
propose a modification to the spec or produce a counter-argument.
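
The replacement problem described above can be shown with a toy Python receiver. This is only a sketch of the usual "latest presence wins" bookkeeping versus event-style messages; the class, the call JID and the payload strings are hypothetical stand-ins, not Rayo stanzas.

```python
from typing import Dict, List


class Receiver:
    """Toy client that tracks what it has been told about each call."""

    def __init__(self) -> None:
        # Presence is state: only the latest value per sender is kept.
        self.presence: Dict[str, str] = {}
        # Messages are events: every one is kept, in arrival order.
        self.messages: Dict[str, List[str]] = {}

    def on_presence(self, sender: str, payload: str) -> None:
        self.presence[sender] = payload

    def on_message(self, sender: str, payload: str) -> None:
        self.messages.setdefault(sender, []).append(payload)


call = "call1@rayo.example.com"  # hypothetical call JID
client = Receiver()
for event in ("joined A", "joined B"):
    client.on_presence(call, event)
    client.on_message(call, event)

print(client.presence[call])  # only the most recent presence survives
print(client.messages[call])  # both join events are still visible
```

A client that treats presence as current state would conclude the call is joined only to B, which is the ambiguity Kevin points out.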

> > 19) 6.4 "a server SHOULD represent a mixer internally using some
> alternative name scoped to the client's security zone and mapped to the
> friendly name/URI presented to the client for the emission of events and
> processing of commands” - I don’t entirely understand this. If it’s an
> internal representation, why is this important for interop?
> >
> > This is because mixer names may be important to the client (e.g. "sales"
> or "friday.meeting"), and should not be reservable by an individual client.
> Thus, the name of the mixer in memory should include some reference to the
> identity of the client which is interacting with it. This is not important
> for interop, but is important guidance for someone implementing a Rayo
> server.
> If it’s not important for interop or for security considerations (and this
> internal representation seems unlikely to be either), a non-normative
> implementation note seems more appropriate to me.

Absolutely; will propose.
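
For what such an implementation note might look like in practice, here is a minimal Python sketch of scoping mixer names per client. The registry class and the "zone" strings are hypothetical; how a server derives a client's security zone is left open.

```python
from typing import Dict, FrozenSet, Set, Tuple


class MixerRegistry:
    """Keys each mixer by (client zone, friendly name), not by name alone."""

    def __init__(self) -> None:
        self._mixers: Dict[Tuple[str, str], Set[str]] = {}

    def join(self, zone: str, friendly_name: str, call_id: str) -> None:
        # The client only ever sees friendly_name; the zone stays internal.
        self._mixers.setdefault((zone, friendly_name), set()).add(call_id)

    def participants(self, zone: str, friendly_name: str) -> FrozenSet[str]:
        return frozenset(self._mixers.get((zone, friendly_name), set()))


registry = MixerRegistry()
registry.join("client-one.example", "sales", "call-1")
registry.join("client-two.example", "sales", "call-2")
print(sorted(registry.participants("client-one.example", "sales")))
print(sorted(registry.participants("client-two.example", "sales")))
```

Both clients can use the name "sales" without reserving it or seeing each other's participants.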

> > 23) Example 44: This introduces ‘active speaker detection’, but doesn’t
> explain what this is (or reference an explanation), I think.
> >
> > It is what it says on the can, and is a common feature of media servers.
> Alright. I feel a bit uncomfortable introducing terms that I wouldn’t
> expect a typical XEP implementor to understand, but maybe it’s alright in
> this case.

I highly doubt a "typical XEP implementor" would be interested in
implementing a fully compliant Rayo server unless they were also a member
of the set of people who had heard that term before. See later points for

> > 24) "Once the last participant unjoins from the mixer, the mixer SHOULD
> be destroyed.” - in what scenarios would it be appropriate not to? Should
> this be discussed?
> >
> > I have nothing to say here. If someone does, I'd love to hear it :)
> I think how I phrased my question was a bit obscure. You used ‘SHOULD be
> destroyed’ - the use of SHOULD instead of MUST implies that there can be
> scenarios in which it is not appropriate to do it. As these scenarios
> aren’t self-evident it seems likely that this should either be a MUST, or
> some guidance on how a client would deal with it, and why a server would
> choose to do it might be appropriate.

In the absence of any good reasons to allow this flexibility (punting on
the decision not being one), I will make it MUST.

> > 25) 6.5 "A server SHOULD implement all core components” - what are the
> implications for clients if the server doesn’t implement some of these?
> >
> > They would receive a feature-not-implemented error attempting to execute
> these components, and it would limit the variety of applications that could
> be implemented on such a server.
> How would a client discover which were supported before it attempts them?
> Is there a potential interop issue if a server doesn’t implement the
> components that a client expects?

Yes. For example, if the core purpose of your application is to record
calls but the Rayo server does not implement the Record component, then the
best case scenario is that your application is quietly useless.

I guess we need to specify inclusion of the components' namespaces in

> >  30) - I think a quick description of the necessary addressing
> here would be useful.
> >
> > Which addressing are you referring to? The JID of the component? This is
> explained at http://xmpp.org/extensions/xep-0327.html#addressing.
> I’m not *sure* that it is. I don’t think it says what an output component
> is, and I don’t think that anywhere else (although I’ve only just glanced
> with a quick search) says that ‘output component’ is synonymous with either
> call, mixer or server component.

This is a subsection of the following, which very clearly explains what an
output component is:
6.5.3 Output Component

Media output is a core concept in Rayo, and is provided by the output
component. The component allows media to be rendered to a call or a mixer,
using the server's local media server. A server MUST support audio file
playback and MUST support the text/uri-list document format. A server MAY
support speech synthesis and MAY support SSML
<http://www.w3.org/TR/speech-synthesis/> (in which case the document should
be escaped or enclosed in CDATA). The component is created using an <output/>
command <http://xmpp.org/extensions/def-component-output>, containing one
or more documents to render, along with a set of options to determine the
nature of the rendering.

> > 31) Example 69 - I think this doesn’t give the units of time for the
> seek except in the example title and would be worth calling out.
> >
> > The units are specified as being milliseconds in the schema, so this is
> valid.
> The schemas are non-normative, so this should go into the normative text.

It is also in the normative text at

A positive integer, in ms.

> > 33) 6.5.4 - How are the optional/extensible mechanisms
> discovered?
> >
> > It's not. Server documentation only.
> If it’s not discoverable, how would a client written without reference to
> a particular server’s documentation interoperate with it?

It would not, and it could not reasonably hope to. I see no benefit to
discovery here; it wouldn't change the situation any.

> > 35) - When would the nomatch expect to be triggered? Presumably
> it’s not firing off e.g. whenever anyone says anything that isn’t a DTMF
> when a DTMF input is configured? Can it trigger multiple times, or is it
> removed after a match?
> >
> > A nomatch event would trigger in such circumstances that input is
> received which does not match a grammar. Input for a particular modality
> (eg speech or DTMF) is not received by a recognizer unless a grammar is
> specified for that modality. A nomatch is not a standalone Rayo event, but
> delivered as a completion event reason, and as such can only be fired once
> for a given component.
> >
> > These semantics are standard for speech recognizers and do not warrant
> specification in Rayo beyond what is already written.
> I’m not (yet) convinced that that’s true - one should really be able to
> implement a XEP without needing implicit knowledge of how it should be
> implemented. I think I could write a compliant implementation as things
> stand that is very much not what you expect, so tightening this up seems
> sensible to me. Others may disagree.

I disagree that one could expect this XEP to contain a recipe for an
implementation. If it were to attempt to it would run to many volumes. This
specification is not a typical small add-on to an IM scenario.

> > 36) 6.5.5 - I think the rules for what happens to the output when input
> begins aren’t defined. Although it’s implied that the output stops, does it
> continue again after input?
> >
> > No, this is specified as barge in behaviour, which is well understood in
> the field of IVR, and as such does not warrant re-specification in Rayo.
> I think the same holds true here as does for the previous point.

The point about "active speaker detection" holds here. If one is not
familiar with the term "barge in" and what happens in such a scenario as is
widely understood in the field, then one would not be successful in
building a useful implementation of a Rayo server.

At some point the specification of the protocol has to give way to what is
considered prevailing knowledge, much like MAM does not contain details of
how to implement a database.

> > 38) When there are joins involved, can’t there be multiple
> callers? If so, how does that affect e.g. "In send mode, only the audio
> sent by the caller is recorded.”?
> >
> > If CallA is joined to CallB and separately to CallC, and all joins are
> duplex, then a record component on CallA in send mode will record the same
> audio as is sent to CallB and CallC. If the record component is executed
> against CallB, then the audio sent from CallB to CallA, but not to CallC
> (because there is no path between B and C), is recorded.
> Would a line saying this be appropriate?

I guess I can add this as a non-normative note.
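
The CallA/CallB/CallC case can be worked through with a small Python sketch. It models only which joined calls receive the audio captured by a send-mode recording, assuming every join is duplex; the function name and the representation of joins as unordered pairs are assumptions made for illustration.

```python
from typing import FrozenSet, Set


def send_mode_recipients(joins: Set[FrozenSet[str]], recorded_call: str) -> Set[str]:
    """Calls that receive the audio a send-mode recording on recorded_call captures."""
    recipients: Set[str] = set()
    for pair in joins:
        if recorded_call in pair:
            # The other end of each duplex join hears what this call sends.
            recipients |= pair - {recorded_call}
    return recipients


# CallA is joined to CallB and, separately, to CallC.
joins = {frozenset({"CallA", "CallB"}), frozenset({"CallA", "CallC"})}

print(sorted(send_mode_recipients(joins, "CallA")))
print(sorted(send_mode_recipients(joins, "CallB")))
```

Recording on CallA captures what goes to both CallB and CallC; recording on CallB captures only what goes to CallA, since there is no path between B and C.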

> > 40) are x-skill and x-customer-id defined anywhere? I think the
> <header…/> stuff is new here (it doesn’t seem consistent with previous use
> of <header…/>). What are the rules for header here?
> >
> > All use of <header/> elements in signalling related commands (like
> accept, answer, hangup, etc) are consistent. x-skill and x-customer-id are
> examples only, and there is no requirement to specify them.
> If they’re examples, how would a client understand them (presumably it
> does need to know which of these the server supports, and how to set them)
> - where are the possible headers documented?

Headers are specific to the signaling protocol. In the case of SIP, X-
prefixed headers are limitless in naming, much like HTTP. I will make this
point more explicit.

> > 41) 6.6.2 - if the client can’t handle the call, what’re the other
> options than rejecting it? (MAY)
> >
> > It may simply ignore the offer and allow it to be accepted by another
> PCP.
> Does that mean that this is effectively “MUST either reject the call, or
> ignore the offer to allow it to be accepted by another PCP”?

Sure, but it seems odd to me that we would specify that a client MUST NOT
take any action on a received stanza. Is that really necessary/desirable?

> > 42) 6.8.1 - is feature-not-implemented an odd error to use for a
> protocol violation?
> >
> > What would be the appropriate error to use here?
> bad-request is probably closer:
> "The sender has sent a stanza containing XML that does not conform to
>    the appropriate schema or that cannot be processed (e.g., an IQ
>    stanza that includes an unrecognized value of the 'type' attribute,
>    or an element that is qualified by a recognized namespace but that
>    violates the defined syntax for the element); the associated error
>    type SHOULD be "modify”.”
> whereas feature-not-implemented would be:
> " The feature represented in the XML stanza is not implemented by the
>    intended recipient or an intermediate server and therefore the stanza
>    cannot be processed (e.g., the entity understands the namespace but
>    does not recognize the element name); the associated error type
>    SHOULD be "cancel" or "modify”.”

This distinction is exactly why I chose feature-not-implemented. An
"unrecognized value of the type attribute" or other such bad-request would
look like this:

<message type="dog"/>

The protocol violation here would be of 6121, which this example (6.8.1)
does not violate.

Further precedent at http://xmpp.org/extensions/xep-0045.html#reservednick
and likely elsewhere.
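
For comparison, the two error conditions differ only in the condition element and the suggested type. The Python sketch below builds a bare error element for each using the standard library; the helper name is hypothetical, and the surrounding stanza from 6.8.1 is deliberately not reproduced.

```python
import xml.etree.ElementTree as ET

# Namespace for stanza error conditions, as defined by RFC 6120.
STANZAS_NS = "urn:ietf:params:xml:ns:xmpp-stanzas"


def stanza_error(condition: str, error_type: str) -> ET.Element:
    """Build an <error/> element carrying a single defined condition."""
    error = ET.Element("error", {"type": error_type})
    ET.SubElement(error, f"{{{STANZAS_NS}}}{condition}")
    return error


# Types follow the SHOULDs quoted above: "modify" for bad-request,
# "cancel" (or "modify") for feature-not-implemented.
for condition, error_type in (("bad-request", "modify"),
                              ("feature-not-implemented", "cancel")):
    print(ET.tostring(stanza_error(condition, error_type), encoding="unicode"))
```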

> /K