[Standards] Rayo feedback.

Ben Langfeld ben at langfeld.me
Sun Jun 21 19:53:47 UTC 2015

On 16 June 2015 at 09:26, Kevin Smith <kevin.smith at isode.com> wrote:

> Sorry this is terribly late - I’ve been reviewing the Rayo XEP prior to
> voting on Draft, and I had a couple of questions/comments. This only covers
> the first half of the XEP (up to the end of section 6), as it seemed more
> useful for me to get the comments out than sit on them until I’m finished.

Thanks for the feedback Kev, it's much appreciated. I've addressed each
point inline below. All changes referenced below are at

> 0) The initial diagram shows SIP being used, with Jingle being optional on
> the other side. I think this is just an example, but is it worth calling
> this out more explicitly in the diagram perhaps by replacing “SIP” with
> “e.g. SIP” and Jingle similarly?

Fix available at

> 1) Does leading with the examples help or hinder here? I found the
> examples at the start of one particular use case left me more confused
> than I think I would have been jumping straight in to what it’s trying to
> achieve. (No impact on going to Draft)

Would it be better, do you think, to move this example to be an intro to
section 6 (Session Flow)?

> 2) 5.1 (Actors) places requirements that these JIDs for components/mixers
> can only be under subdomains - why is this? AFAIK, this is the only
> part of XMPP that implies any relationship between a domain and a
> subdomain, and it doesn’t immediately seem like a useful restriction.

Not true. The word I used was "perhaps". This is simply to point out that
full JIDs must be used to address these entities and no relationship
between domains may be assumed.

> 3) 5.1.6 Is calling things Components the most useful terminology here,
> when Components have a well-established meaning in XMPP (and a RAYO server
> is likely to be such a component).

These are asynchronous, independent resources attached to a call. The term
"component" came up in the very first days of this specification and has
stuck. I would be open to suggestions for an alternative term if it
appropriately conveys the meaning, but one does not immediately come to
mind.

> 4) 6.1’s reliance on a <show>chat</show> seems odd at best - wouldn’t a
> normal available presence be better here? I’m also not sure that the
> requirement for it to be directed presence is warranted - why wouldn’t
> broadcast presence work here?

This is because the client's online status is disconnected from its
availability to receive new offers in the same way as a human might be
online but unavailable to engage in conversation.
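
As a rough illustration of the point above, the sketch below builds the
kind of directed presence being discussed: addressed to the Rayo service
only, with <show>chat</show> marking the client as ready for offers. The
function name and both JIDs are hypothetical, and this is only a minimal
sketch of the stanza shape, not text from the XEP.

```python
import xml.etree.ElementTree as ET


def availability_presence(client_jid: str, service_jid: str) -> str:
    """Build directed presence telling a Rayo service the client will take offers."""
    # Directed presence: addressed to the service only, not broadcast to a roster.
    presence = ET.Element("presence", {"from": client_jid, "to": service_jid})
    # <show>chat</show> signals readiness for offers, separately from being online.
    ET.SubElement(presence, "show").text = "chat"
    return ET.tostring(presence, encoding="unicode")


# Hypothetical JIDs, for illustration only.
print(availability_presence("client@example.com/app", "rayo.example.net"))
```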

> 5) 6.1 - if you want to rely on presence here, isn’t an unavailable
> presence the best way to signal unavailability? I don’t think it’s covered
> what receiving unavailable would mean here at the moment.

See above.

> 6) 6.2.1 Is how these metadata are handled defined?

Fix available at

> 7) 6.2.1 the uri attribute seems like it might be underspecified here. The
> server SHOULD try to create at the appropriate URI, but what happens if it
> decides not to (It’s not a MUST)? Similarly, what restrictions are there on
> how a client should form such a URI?

Fixed at

> 8) 6.2.1 How does the client discover the available URI schemes for
> to/from?

No such discovery is specified, and it is assumed that a Rayo service would
document this.

> 9) “Third Party” is introduced as a term here for the first time,
> without explanation of which party this is.

Fixed at

> 10) Use of presence for sending of notifications like this seems
> off. I realise this boat may have sailed, but it doesn’t seem right to me.

We had this discussion during the Last Call, and the only alternative that
was presented was a dependency on PubSub, against which I believe I
presented a solid argument previously.

> 11) Is it right that it has to treat this first as if there’s no
> join, and then process the join? So if it’s trying to join something that
> doesn’t exist, or is invalid, it should set up the call first, and only
> then say the join fails?

No, that was not the intention, and this was just bad wording. Fixed at

> 12) 6.2.2 Introduces “system” for the first time. Which of the entities is
> the system?

Fixed at

> 13) 6.6.2 Is requiring the server to immediately reject the call right
> here (I don’t know). I’m wondering if it might just let it ring, for
> example, until it has an available controlling party.

The behaviour specified here is correct. Such a rejection is clearly
specified as valid only before accepting a call. Your notion of a call
"ringing" maps to the <accept/> command.

> 14) 6.6.2 MUST offer simultaneously - is this required? Why might it not
> offer to different entities in some staged order?

Fixed at

> 15) 6.6.2 MUST wait indefinitely - why is this required? If the original
> caller hangs up, for example, wouldn’t the server be able to stop waiting
> for a controller?

Fixed at

> 16) 6.3 The identifier for calls here is always a JID, isn’t it? If that’s
> the case, it’d make more sense to be using JIDs here, instead of adding the
> layer of indirection of a URI with a fixed scheme.

A call URI will not necessarily always be a JID. It has been the intention
since the start of this spec to leave open the option of other transports
for Rayo, such as HTTP.
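
To make the indirection concrete, here is a small sketch of how a client
might pick a transport from a call URI rather than assuming a JID. The
helper name and both example URIs are assumptions; in particular, the HTTP
form is purely hypothetical, since no HTTP transport is defined here.

```python
from urllib.parse import urlsplit


def call_transport(call_uri: str) -> str:
    """Return the URI scheme of a call URI, which names the transport to use."""
    scheme = urlsplit(call_uri).scheme
    if not scheme:
        raise ValueError(f"call URI has no scheme: {call_uri!r}")
    return scheme


# Hypothetical call URIs, for illustration only.
print(call_transport("xmpp:9f00061@call.rayo.example.net"))
print(call_transport("http://rayo.example.net/calls/9f00061"))
```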

> 17) 6.3 I think here we’re getting into the territory where presence
> stanzas are really not appropriate for this

Do you have an alternative suggestion, or a concrete argument against?

> 18) 6.3.4 introduces a direction attribute that I don’t think has been
> defined anywhere at this point.

Fixed at

> 19) 6.4 "a server SHOULD represent a mixer internally using some
> alternative name scoped to the client's security zone and mapped to the
> friendly name/URI presented to the client for the emission of events and
> processing of commands” - I don’t entirely understand this. If it’s an
> internal representation, why is this important for interop?

This is because mixer names may be important to the client (e.g. "sales" or
"friday.meeting"), and should not be reservable by an individual client.
Thus, the name of the mixer in memory should include some reference to the
identity of the client which is interacting with it. This is not important
for interop, but is important guidance for someone implementing a Rayo
server.
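
One possible way to do this is sketched below, assuming (purely for
illustration) that the client's security zone is the domain part of its
JID. The function name and the "domain/name" internal format are
hypothetical; an implementation is free to scope names however it likes.

```python
def internal_mixer_name(client_jid: str, friendly_name: str) -> str:
    """Map a client-visible mixer name to a name scoped to the client's zone."""
    # Assumption: the security zone is the JID's domain (strip resource and node).
    bare_jid = client_jid.split("/", 1)[0]
    zone = bare_jid.split("@", 1)[-1]
    return f"{zone}/{friendly_name}"


# Two unrelated clients can both use "sales" without colliding.
print(internal_mixer_name("alice@example.com/app", "sales"))
print(internal_mixer_name("bob@other.example/app", "sales"))
```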

> 20) "A mixer MUST be implicitly created the first time a call attempts to
> join it”. Is this required, or might there be scenarios where a mixer
> can’t/shouldn’t be created?

Fixed at

> 21) "Mixers MUST respect the normal rules of XMPP presence subscriptions.
> If a client sends directed presence to a mixer, the mixer MUST implicitly
> create a presence subscription for the client.” - but that isn’t the normal
> rule for presence subs, is it?

Fixed at

> 22) Example 43: It’s not immediately obvious to me what an empty output
> element means here, it seems to be different semantics to the use in
> Example 6 of reading a document with text-to-speech.

Fixed at

> 23) Example 44: This introduces ‘active speaker detection’, but doesn’t
> explain what this is (or reference an explanation), I think.

It is what it says on the can, and is a common feature of media servers.

> 24) "Once the last participant unjoins from the mixer, the mixer SHOULD be
> destroyed.” - in what scenarios would it be appropriate not to? Should this
> be discussed?

I have nothing to say here. If someone does, I'd love to hear it :)

> 25) 6.5 "A server SHOULD implement all core components” - what are the
> implications for clients if the server doesn’t implement some of these?

They would receive a feature-not-implemented error attempting to execute
these components, and it would limit the variety of applications that could
be implemented on such a server.

> 26) 6.5.3 - a reference to SSML here would probably be appropriate.


> 27) "The component is created using an <output/> command, containing one
> or more documents to render” - I think this implies that the previous
> examples with <output…/> are invalid.

Fixed at

> 28) If the XML for SSML has to be escaped (which seems to be the case from
> the example), this should probably be called out.

Fixed at

> 29) - I’m not sure why this is a SHOULD instead of a MUST?

Fixed at

> 30) - I think a quick description of the necessary addressing here
> would be useful.

Which addressing are you referring to? The JID of the component? This is
explained at http://xmpp.org/extensions/xep-0327.html#addressing.

> 31) Example 69 - I think this doesn’t give the units of time for the seek
> except in the example title and would be worth calling out.

The units are specified as being milliseconds in the schema, so this is

> 32) 6.5.4 I think some reference to DTMF and SRGS specs would be useful
> here.

Fixed at

> 33) 6.5.4 - How are the optional/extensible mechanisms
> discovered?

It's not. Server documentation only.

> 34) - the SHOULD here seems more like it should be a MUST - is
> there a reason to do otherwise (and are there security implications or
> client implications?)

Fixed at

> 35) - When would the nomatch expect to be triggered? Presumably
> it’s not firing off e.g. whenever anyone says anything that isn’t a DTMF
> when a DTMF input is configured? Can it trigger multiple times, or is it
> removed after a match?

A nomatch is triggered when input is received which does not match a
grammar. Input for a particular modality (e.g. speech or DTMF) is not
received by a recognizer unless a grammar is specified for that modality.
A nomatch is not a standalone Rayo event, but is delivered as a completion
event reason, and as such can only be fired once for a given component.

These semantics are standard for speech recognizers and do not warrant
specification in Rayo beyond what is already written.

> 36) 6.5.5 - I think the rules for what happens to the output when input
> begins aren’t defined. Although it’s implied that the output stops, does it
> continue again after input?

No, this is specified as barge in behaviour, which is well understood in
the field of IVR, and as such does not warrant re-specification in Rayo.

> 37) 6.5.6 says that there are options supplied, but the example shows none
> - should the text say they’re optional?

Fixed at

> 38) When there are joins involved, can’t there be multiple
> callers? If so, how does that affect e.g. "In send mode, only the audio
> sent by the caller is recorded.”?

If CallA is joined to CallB and separately to CallC, and all joins are
duplex, then a record component on CallA in send mode will record the same
audio as is sent to CallB and CallC. If the record component is executed
against CallB, then the audio sent from CallB to CallA, but not to CallC
(because there is no path between B and C), is recorded.
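
The same scenario can be modelled as a small graph, as a sketch only. The
function name is hypothetical and every join is assumed to be duplex, as in
the example above; it lists which calls receive the audio that a send-mode
recording on a given call would capture.

```python
def send_mode_recipients(call: str, joins: set) -> list:
    """List the calls that receive the audio sent by `call` over duplex joins."""
    peers = set()
    for a, b in joins:
        # A duplex join carries audio both ways, so match either end.
        if a == call:
            peers.add(b)
        elif b == call:
            peers.add(a)
    return sorted(peers)


# CallA is joined to CallB and, separately, to CallC; B and C share no path.
joins = {("CallA", "CallB"), ("CallA", "CallC")}
print(send_mode_recipients("CallA", joins))
print(send_mode_recipients("CallB", joins))
```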

> 39) Links like
> http://xmpp.org/extensions/xep-0327.html#def-component-record-initial-timeout
> seem to be dead ends

Fixed at

> 40) are x-skill and x-customer-id defined anywhere? I think the <header…/>
> stuff is new here (it doesn’t seem consistent with previous use of
> <header…/>). What are the rules for header here?

All uses of <header/> elements in signalling related commands (like accept,
answer, hangup, etc) are consistent. x-skill and x-customer-id are examples
only, and there is no requirement to specify them.

> 41) 6.6.2 - if the client can’t handle the call, what’re the other options
> than rejecting it? (MAY)

It may simply ignore the offer and allow it to be accepted by another PCP.

> 42) 6.8.1 - is feature-not-implemented an odd error to use for a protocol
> violation?

What would be the appropriate error to use here?

> /K