[Standards] XEP-0301 0.5 comments

Kevin Smith kevin at kismith.co.uk
Mon Jul 23 14:32:48 UTC 2012

Right, thoughts about 301 (consider them early Last Call feedback, I
guess. I think it would be worth addressing them, or at least
producing an errata list of your expected edits, before asking too
many other people to review this (e.g. LC) as it took me a
considerable time and it'd be a shame to waste people's effort
commenting on things due to be changed):

== Introduction ==
This seems mostly fine. I wonder about the reference to
realjabber.org. Partly because it's a reference to a potentially less
stable URL, and partly because I think the name is inflammatory - did
the XSF or Cisco grant the trademark use?
Do we need two references to how much deaf people like this within ten lines?

== Requirements ==
2.3.4 doesn't seem quite right - what we want is for it to be possible
to produce gateways for interoperability - not that XEP 301
implementations themselves interop with other networks?
2.4 Doesn't seem to be about Accessibility.
2.4.4 Doesn't make much sense to me.

== Glossary ==
"real-time text"'s definition seems wrong - it isn't necessarily
transmitted instantly in 301. It would seem more natural to define
this in terms of real-time, defined on the immediately preceding line.

== Protocol ==
"to allow the recipient to see the sender type the message" - I'd
suggest "to allow the recipient to receive the latest state of the
message as it is being typed" - RTT doesn't allow us to see the sender.

Example 1: I suggest that this could be better demonstrated by not
cutting at the word boundaries "He", "llo, m", "y Juliet!" maybe, or
something like that. Experience and/or cynicism say that implementers
are quite likely to look at the examples, ignore the text, and
misunderstand what's going on if the examples provide convenient
semantics not required by the protocol.

"The bounds of seq is 31-bits, the range of positive values of a
signed integer" - I'd be inclined to make this something like "The seq
attribute has an upper bound of 2147483647 (2^31 - 1). If this upper
bound is reached, the following RTT element will reset the seq
attribute to 0, i.e. at the upper bound the values would be (in
successive stanzas) 2147483646, 2147483647, 0, 1" or words to that
effect.
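A minimal Python sketch of the wraparound rule suggested above; `next_seq` and `SEQ_MAX` are hypothetical names, not taken from the XEP:

```python
# Hypothetical helper illustrating the proposed seq wraparound rule.
SEQ_MAX = 2**31 - 1  # 2147483647, the suggested upper bound


def next_seq(seq: int) -> int:
    """Return the seq value for the next <rtt/> element, wrapping to 0."""
    if not 0 <= seq <= SEQ_MAX:
        raise ValueError(f"seq {seq} outside 0..{SEQ_MAX}")
    return 0 if seq == SEQ_MAX else seq + 1


# Values in successive stanzas around the upper bound.
values = [2147483646]
for _ in range(3):
    values.append(next_seq(values[-1]))
print(values)  # [2147483646, 2147483647, 0, 1]
```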

It's not clear to me why setting seq to a random initial value should
help with MUC or multi-resource cases - in these cases you know the
full JID of the entities involved and a random start point seems to
make it harder to understand what's going on, rather than easier.

"The event attribute MAY be omitted from the <rtt/> element during
regular real-time text transmission" - what is the alternative
you're allowing clients, and what is "regular real-time text
transmission"?

4.2.2 - "Recipient clients MUST initialize a new real-time message for
display" - how things are rendered in clients is generally not in
scope for XEPs, so maybe just remove 'for display'?

4.2.2 - "Senders MAY send subsequent <rtt/> elements that do not
contain an event attribute" if clients want to always send event
attributes, what would they send?

4.2.2 - "Recipients MUST treat 'reset' the same as 'new'." - I'm not
sure that's quite right. If recipients want to render 'new'
differently that seems to be fine. Maybe "Recipients MUST reset the
state of the current real-time message when receiving a 'reset'
(returning the real-time message to the same state as when receiving a
'new')".

4.2.2 - event='init' - I'm reading the XEP linearly so maybe this will
be clear later, but at this point in reading the XEP it's not clear to
me what the inclusion of event='init' buys us.

4.2.2 - The normatives here don't seem to be congruent. event='cancel'
is OPTIONAL, yet we have a SHOULD for behaviour on receiving them. Why
not require recipient support?

4.2.3 - I don't think the intent here is clear. Particularly it's not
OPTIONAL if you're doing RTT correction. So I think we need to tighten
this up. There's a choice on discovery and it'll affect what needs to
be said.
  Choice 1) If you implement 308 and you also implement 301 you MUST
support (at least receiving) RTT correction and ids are not OPTIONAL
and MUST be included on the correction RTT.
  Choice 2) You can implement 308 and 301 yet not support RTT
correction - in which case supporting RTT correction is OPTIONAL, but
if you do you MUST advertise appropriate disco features and MUST
include ids etc.

4.3 - "The delivered message in the <body/> element is displayed
instead of the real-time message" - maybe "The content of the <body/>
element is considered the final text, rather than the state of the RTT
message".

4.3 - "In the ideal case, the message from <body/> is redundant since
this delivered message is identical to the final contents of the
real-time message." - can we s/message/text/ here? Calling child
elements of <message/> stanzas messages seems potentially confusing.

4.3.1 - Is this redundant?

4.4 - The discussion of throttling here feels a bit odd. I don't like
having references to servers dropping messages as part of congestion
handling, as that's not compliant behaviour. The comments about 0.7
seconds being fine for not hitting throttles but smaller values
hitting them seem a bit hit-and-miss - servers are free to implement
whatever throttling they want, and I'm a little worried about
recommending here what we think the state of the network is likely to
be now or in the future.

4.5 - "the recipient can watch the sender" - this isn't quite right
(similar to previous comment).

4.5.1 - I'm not sure that the use of quite cryptic one-character
elements here is terribly useful.

4.5.1 - I think this has been commented on elsewhere, but using
'characters' here seems to be less clear than talking about code
points. I understand the desire to mask implementers from needing
exposure to code points, but I don't think that's going to ultimately
help uptake or interoperability.

4.5.1 - I think if there are going to be SHOULDs in supported features
we should try to explain in what circumstances it's acceptable to
ignore the SHOULDs.

4.5.2 - Talking about message length here probably needs clarification:
is it the number of characters (whatever they mean to different
people), code points, normalised code points, octets on the wire...?

This might become clearer later, but at this stage it's not clear
what 'positions' are.

Apart from adding complexity, I'm not sure what forward delete is
buying us vs. backspace.
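To illustrate why "length" is ambiguous, a small Python sketch using a made-up sample string: the same text has six user-perceived characters, yet three different lengths depending on whether you count code points, UTF-16 units or UTF-8 octets.

```python
# Made-up sample: "Café 😀" with the accent written as a combining mark.
text = "Caf" + "e\u0301" + " " + "\U0001F600"

# Six user-perceived characters (C, a, f, é, space, emoji), but:
code_points = len(text)                           # Python str length counts code points
utf16_units = len(text.encode("utf-16-le")) // 2  # what Java/JavaScript string length reports
utf8_octets = len(text.encode("utf-8"))           # octets on the wire

print(code_points, utf16_units, utf8_octets)  # 7 8 11
```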

4.5.4 - I don't think trusting that nothing in the chain is going to
transform Unicode in any way is going to be sufficient for
interoperability here. I think we need to consider normalising the
text before RTT calculations are performed on it. I'm not entirely
convinced, without going through specs in some detail, that an
implementation that does choose to do normalisation somewhere en route
is non-compliant, which is what's asserted here.

Ah, OK. So you do require normalisation here - you need to say which
type is required.

This then forbids normalisation again.

Question for Unicode experts: are there any code points that would be
illegal to transmit on their own, but are legal in combination with
others? If so, they'd get rejected with stream errors, which would
probably be bad. This section seems to imply that illegal UTF-8
encoding is expected, which is in turn illegal XMPP.

"A single UTF-8 encoded character equals one code point" - this isn't
true, is it?

"different internal encodings (i.e. string formats) that is
different" - s/is/are/
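A quick Python illustration (standard `unicodedata` module; the example word is made up) of why positions depend on which normalisation form is in use: the same visible text gives different code point counts, and a single backspace at the end removes different things.

```python
import unicodedata

decomposed = "Cafe\u0301"                            # 'e' + COMBINING ACUTE ACCENT (NFD form)
composed = unicodedata.normalize("NFC", decomposed)  # 'Caf\u00e9', precomposed é

# Both render as "Café", but they differ in code points...
print(len(decomposed), len(composed))  # 5 4
print(decomposed == composed)          # False

# ...so a single backspace at the end of the message removes different things.
print(decomposed[:-1])  # Cafe  (only the accent is removed)
print(composed[:-1])    # Caf   (the whole é is removed)
```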

4.6 - "XMPP servers may drop <message/> elements (e.g. flooding
protection)." - They can't.

4.6.1 - I think they need to do more than increment and check they
increment - I think they need to increment/check in steps of 1.

4.6.1 - "Recipients MUST keep track of separate real-time messages per
sender, including maintaining independent seq values" - I think what
you mean is that they "MUST track RTT per full-JID, and not collate
across multiple full JIDs", rather than the present text, which
suggests that they must track multiple RTT streams for a single full
JID without providing guidance how. I think this needs tightening up
to be clear of the intent.
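A rough Python sketch of both points: state kept per full JID, and seq checked in steps of exactly 1. `RttTracker`, the event strings and the freeze-until-'new'/'reset' behaviour are assumptions for illustration, and the wraparound at 2^31 - 1 follows the seq suggestion made earlier rather than the XEP's current text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

SEQ_MAX = 2**31 - 1  # assumed upper bound; seq wraps to 0 after this


@dataclass
class RttStream:
    """Hypothetical per-full-JID state."""
    last_seq: int
    in_sync: bool = True


class RttTracker:
    """Tracks RTT separately for each full JID, never merging resources."""

    def __init__(self) -> None:
        self.streams: Dict[str, RttStream] = {}

    def accept(self, full_jid: str, seq: int, event: Optional[str]) -> bool:
        """Return True if this <rtt/> element may be applied."""
        if event in ("new", "reset"):
            # A new or reset message establishes a fresh baseline at any seq.
            self.streams[full_jid] = RttStream(last_seq=seq)
            return True
        stream = self.streams.get(full_jid)
        if stream is None or not stream.in_sync:
            # No baseline yet, or a gap was seen: wait for 'new'/'reset'.
            return False
        expected = 0 if stream.last_seq == SEQ_MAX else stream.last_seq + 1
        if seq != expected:
            stream.in_sync = False  # a lost or out-of-order element was detected
            return False
        stream.last_seq = seq
        return True
```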

4.6.2 - "Recipients MUST freeze the current real-time message" - it's
not clear what freezing a message means.

4.6.3 - "Retransmission SHOULD be done at an average interval of 10
seconds during active typing or composing." - this seems like a lot of
data getting sent across if these messages are large. I'd be much
happier saying something like "Retransmission SHOULD NOT be done more
frequently than once every 10 seconds".
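A sketch of that "no more than once every 10 seconds" rule as a simple limiter in Python; the class name and the injectable clock are hypothetical, the clock being there only so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class RetransmitLimiter:
    """Hypothetical limiter: allow a full-message retransmission at most
    once per min_interval seconds."""

    def __init__(self, min_interval: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.last: Optional[float] = None

    def may_retransmit(self) -> bool:
        """Return True (and record the time) if a retransmission is allowed now."""
        now = self.clock()
        if self.last is not None and now - self.last < self.min_interval:
            return False
        self.last = now
        return True
```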

6.1.4 - "it is acceptable for the transmission interval of <rtt/> to
vary" - yet earlier there was a SHOULD saying it doesn't vary, wasn't
there?

6.2.1 - I suspect this should be more prominent than buried inside
Implementation Notes

6.2.1 - I think that presence decloaking is probably a better approach
to this than sending init.

6.2.1 - That said, if people disagree and want another 85-ish
non-disco mess, I think this can be clarified a bit - at the moment it
sounds like disco and init discovery are alternatives, rather than
init only being a fallback for when disco isn't available. Perhaps
something like:
Activation of real-time text in a chat session (immediate or
user-initiated) can be done by:
* Immediately transmitting real-time text (if the feature is
advertised by the recipient, as described in Determining Support);
* Where Disco knowledge isn't available (e.g. sending to an entity for
which presence information isn't available, and thus the full JID
isn't known and can't be queried) by sending a <message/> stanza
containing only a "<rtt event='init'/>". In this case there MUST be no
further transmission of RTT elements until the recipient indicates
support - either by exposing information necessary to use service
discovery, or by replying with a (non-cancel event) RTT element of its
own.
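The decision in the suggested wording could be sketched in Python as follows. The function, enum and parameter names are hypothetical, and treating a negative disco result as "don't send" is an assumption rather than something the wording above states.

```python
from enum import Enum
from typing import Optional


class Activation(Enum):
    SEND_RTT = "transmit <rtt/> immediately"
    SEND_INIT = "send a <message/> containing only <rtt event='init'/>"
    WAIT = "send no further RTT until the recipient indicates support"
    DONT_SEND = "disco says the recipient doesn't support RTT"


def choose_activation(disco_supports_rtt: Optional[bool],
                      init_sent: bool,
                      peer_sent_rtt: bool) -> Activation:
    """disco_supports_rtt is None when disco isn't available, e.g. no
    presence and therefore no full JID to query. peer_sent_rtt means the
    recipient replied with a non-cancel RTT element."""
    if disco_supports_rtt or peer_sent_rtt:
        return Activation.SEND_RTT
    if disco_supports_rtt is False:
        return Activation.DONT_SEND  # assumption: not spelled out above
    # Disco unavailable: a single init is the fallback, then wait.
    return Activation.WAIT if init_sent else Activation.SEND_INIT
```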

6.3 - "All action elements only have absolute positioning, and
positioning does not depend on previous action elements" - this isn't
true, positioning is dependent upon processing of previous action
elements - a deletion will effect a change of index in all subsequent
code points.
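A small Python sketch of that dependency. `insert` and `erase` are hypothetical helpers loosely modelled on insert-at-position and backspace-at-position actions, with positions counted in code points; they are not the XEP's exact definitions.

```python
def insert(text: str, pos: int, s: str) -> str:
    """Hypothetical insert action: put s at code point index pos."""
    return text[:pos] + s + text[pos:]


def erase(text: str, pos: int, n: int = 1) -> str:
    """Hypothetical erase action: backspace n code points ending at pos."""
    start = max(0, pos - n)
    return text[:start] + text[pos:]


original = "Hello, my Juliet!"

# Applied in order, the second action's position refers to the text
# produced by the first one.
step1 = erase(original, 9, 4)      # removes ", my" -> "Hello Juliet!"
step2 = insert(step1, 6, "dear ")  # -> "Hello dear Juliet!"
print(step2)

# Interpreting the same position against the original text instead
# gives a different (wrong) result.
print(insert(original, 6, "dear "))  # "Hello,dear  my Juliet!"
```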

6.4.1 - It might be useful to reference some method of calculating
this. It's not immediately obvious to me that it's trivial to work out
edits without resorting to something that ends up polynomial in the
worst case (or oversimplifying the edit), so some guidance would be
handy here.
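For reference, the cheap linear-time approach (common prefix plus common suffix, replacing everything in between) might look like the Python sketch below. This is the "oversimplifying" option: it always yields at most one erase and one insert, which is correct but not minimal. `simple_edit` and `apply` are hypothetical names, and actions are plain tuples rather than real <rtt/> elements.

```python
from typing import List, Tuple

Action = Tuple  # ("erase", pos, n) or ("insert", pos, text)


def simple_edit(old: str, new: str) -> List[Action]:
    """Turn old into new with at most one erase and one insert (O(n))."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix overlaps the prefix in the shorter string.
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    actions: List[Action] = []
    erased = len(old) - prefix - suffix
    if erased:
        # Backspace 'erased' code points ending just after the changed region.
        actions.append(("erase", prefix + erased, erased))
    inserted = new[prefix:len(new) - suffix]
    if inserted:
        actions.append(("insert", prefix, inserted))
    return actions


def apply(text: str, actions: List[Action]) -> str:
    """Apply the actions in order, each against the result of the last."""
    for action in actions:
        if action[0] == "erase":
            _, pos, n = action
            text = text[:pos - n] + text[pos:]
        else:
            _, pos, s = action
            text = text[:pos] + s + text[pos:]
    return text


print(simple_edit("Hello, my Juliet!", "Hello dear Juliet!"))
```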

6.4.3 - this says that implementations "may" do this, and I suspect
that it really is discouraged rather than truly optional (indeed, the
language elsewhere says as much).

6.4.4 - this looks like something discouraged, too, but this isn't
mentioned that I can see.

6.5 - "Upon receiving Action Elements in incoming <rtt/> elements,
they are added to a queue in the order they are received. This
provides immunity to variable network conditions, since the queueing
action smooth out the latency fluctuations of incoming transmission."
- it's not clear to me that it's the queuing that does anything to the
latency. Also 'action *will* smooth out'.

6.5 - " In addition, it is best to process <w/> elements using
non-blocking programming techniques." - I don't really know what this
is doing here.

6.6 - "There are other special basic considerations" - isn't that
nearly oxymoronic?

6.6.1 - "For specialized clients that send continuous real-time text
(e.g. news ticker, captioning, transcription, TTY gateway), a Body
Element can be automatically sent when messages reach a certain
length. This allows continuous real-time text without real-time
messages becoming excessively large." - Is this true? Sending a body
means you reset the state to the content of the body and terminate
that RTT message, which doesn't seem consistent with continuing RTT.

This doesn't seem like the wrong approach if RTT is wanted in a MUC
(at least until we have per-MUC disco stuff), but I'm somewhat worried
about the effect this has as an amplification attack. I don't know
what we should say here, but if people can have a think it'd be good.

This seems inconsistent with an earlier section that (I think) was
recommending or mandating support for multiple full JIDs.

6.6.5 - seems somewhat out of place. How many systems are there these
days that can't keep up with a human typist? And telling people that
they need to make their applications flicker-free just seems odd.

6.6.6 seems redundant.

7 - these examples seem to be addressed to a bare JID, so caps can't
already have indicated support, yet they don't show any support
discovery. It'd be good to note this.

7.4.2 - this includes an RTT element containing a wait in the same
stanza as the body - but once the body is received the RTT state is
discarded and the body replaces it, if I remember earlier in the XEP
correctly (and it was quite a while ago now).

8 - Why are we picking out Google Talk as an XMPP exemplar?

8 - Why are we telling SIP clients what specs to use?

8 - All of this section seems somewhat out of place in a XEP.

10.1 - "It is important for implementers of real-time text to educate
users about real-time text. " - this doesn't really seem right.

10.1 - I think a sensible Privacy note would be to make RTT opt-in.

10.2 - "also needs to also "

10.2 - "(e.g. deferred XEP-0200)" - just XEP-0200, I'd have thought.

10.2 - I think blaming encryption for the increased number of stanzas
RTT generates is a little disingenuous.

10.3 - "The nature of real-time text result in"

10.3 - "than may otherwise happen in a non-real-time text
conversation. This may lead to increased" s/may/would/ s/may/will/
respectively will remove normative language.

10.3 - "including stanzas dropped by an overloaded server" - I think
"including stanzas dropped during a network or server error" would be
more appropriate.

10.3 - "Use of this specification in the recommended way will cause a
load that is only marginally higher than a user communicating without
this specification." - do you have numbers for this? It seems quite
counterintuitive, I'd expect it to increase the server load due to
message routing roughly by a factor of the number of RTT transmitted
between each typical <body/>.
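As a back-of-the-envelope illustration, assume a message takes T seconds to type and RTT is sent every Δt seconds (0.7 s being the figure discussed in 4.4). The stanzas routed per message are then roughly the RTT updates plus the final body:

```latex
N_{\text{stanzas}} \approx \left\lceil \frac{T}{\Delta t} \right\rceil + 1,
\qquad T = 14\,\text{s},\ \Delta t = 0.7\,\text{s}
\;\Rightarrow\; N_{\text{stanzas}} \approx 20 + 1 = 21
```

So a hypothetical 14-second message would mean about 21 routed stanzas instead of 1.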

10.3 - "Bandwidth overhead of real-time text is very low compared to
many other activities possible on XMPP networks including in-band file
transfers and audio" - This is a little disingenuous where IBB is a
fallback, and audio never travels over the XMPP network. I'd remove
the line completely.

14 - (I appreciate the acknowledgement, thank you)

14 - It's usual in XEPs that acknowledgements are done personally
rather than by affiliation, so I think it'd be sensible to just leave
the names in and remove affiliations.

14 - I find the comment acknowledging the invention a bit odd. It's
assumed that the XEP is your own work, and "invention" is a term I've
more commonly come across in relation to patents - I assume there
isn't a patent associated with this that you're assigning to the XSF?

Appendix B - it's usual to just have author name, email and JID here.
We don't generally link out to the authors' websites.

