[Standards] UPDATED: XEP-0301 (In-Band Real Time Text)

Mark Rejhon markybox at gmail.com
Mon Jul 9 09:26:28 UTC 2012

On Mon, Jul 9, 2012 at 3:51 AM, Gunnar Hellström <
gunnar.hellstrom at omnitor.se> wrote:

>  This looks good. I have some comments, but very few influence the
> protocol.
> So even if there are minor adjustments to do, the spec looks mature.

Excellent comments, and the vast majority of them are useful --
most of your changes will be implemented.
I will respond only to the ones that need further discussion:

> 5. Section 2.4, Title. Change to "Usable for mainstream and accessibility
> purposes."

The current heading is a single word: "Accessible" -- I prefer to keep it
because it's correct, clear, and short.  It catches the attention of
accessibility people better, including the Access Board, which has already
contacted us about the specification.  The word "mainstream" is also
mentioned at the end of section 1, in section 2.4, and in section 6.2.  I
therefore believe that I've carefully balanced accessibility and
mainstream use, and satisfied both targets, while aiming to achieve the goal
of eventually becoming an important part of future accessibility standards.
(That said, I'll fix the first bullet.)

> 14. Section 4.2.2 event='cancel'.  How does this behave through multi-user
> chat and multiple login situations? Is the event='cancel' sent through to
> all? I see a risk that one user sending event='cancel' would turn off rtt
> for all recipients. If this is true, I see three solutions:
> a) Delete event='cancel'. b) Add a sentence saying "event='cancel' SHALL
> not be used in a MUC or multi-login session."  c) Add a sentence saying
> "event='cancel' SHOULD be ignored in MUC and multi-login sessions."

1. It is appropriate for a multi-login session; there is no issue with
using event='cancel' during multi-login.  (Regardless of whether you cancel
before or after switching logins, and regardless of whether you reactivate
before or after switching, all scenarios result in acceptable behavior.)
2. I already mention that it should not be used for MUC, in the MUC
section: http://xmpp.org/extensions/xep-0301.html#multiuser_chat

> I have a slight preference for solution a), to delete cancel from the
> specification.
> If it is deleted, also the sections in 6.2.1 and 6.2.2 dealing with
> "cancel" shall be deleted.

It is already optional.  But some implementations need it.
For example, one party clicks a button to turn off real-time text.
This specific implementation requires the ability to synchronize the
disabling of real-time text.
How do we notify the other end of the intent to end a real-time text

Example use case:
- A party activates real-time text by pressing a button.
- Both ends synchronize the enabling of real-time text via <rtt
- A party deactivates real-time text.
- Both ends synchronize the disabling of real-time text via <rtt

Various methods of synchronizing activation/deactivation of real-time text
are listed at:
Certainly, not all implementations need to follow the above behaviour
(maybe your implementation doesn't need it).
However, there are other vendors that definitely need to be able to do this
(after displaying a confirmation prompt).
As a result, I cannot remove event='cancel' and deny the other vendors the
ability to synchronize the disabling of real-time text.
That said, unidirectional real-time text is allowed by XEP-0301, so
synchronizing the enabling/disabling of real-time text is not a
requirement.  Some vendors simply require this ability (much like
synchronizing enabling/disabling of audio/video after a confirmation
prompt).  I intend to respect both behaviours.
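
As an illustration only (not text from XEP-0301), the synchronized
deactivation described above can be sketched as a tiny recipient-side state
holder.  The class name, the is_muc flag, and the choice to ignore
event='cancel' in a multi-user chat are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class RttSession:
    """Tracks whether real-time text is active for one conversation."""
    is_muc: bool = False       # hypothetical flag: True for a multi-user chat
    rtt_active: bool = False

    def on_rtt_element(self, event=None):
        """Update state for an incoming <rtt/> element with this event value."""
        if event == "cancel":
            # Assumption: in a MUC, one participant's cancel is ignored so it
            # cannot switch real-time text off for every other recipient.
            if not self.is_muc:
                self.rtt_active = False
        else:
            # Any other real-time text element means the sender is using RTT.
            self.rtt_active = True
        return self.rtt_active
```

A client that does not want synchronized disabling could simply never send
event='cancel'; unidirectional real-time text stays possible either way.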

> 16. Section 4.4, line 3, after "conversation", add "in most network
> conditions".   On GPRS, having 1.5 s network latency, the usability
> requirement will not be met, and that must be accepted. ( F.700 requires 2
> seconds end-to-end for usable real-time text and 1 second for good
> real-time text. )

Technically you're right.  I'll make this wording adjustment, since it is
what F.700 says for technical compliance purposes.
That said, a real-world usability comment: the innovation of encoding key
press intervals (
http://www.realjabber.org/anim/real_time_text_demo.html ) gives an
approximately 1.5x-2x multiplier to the maximum usable latency.  I.e., an
NRTT (Near-Real-Time-Text) "bursty" conversation with a 2 second latency
is more uncomfortable than an NRTT "smooth" conversation with a 3 second
latency with key press intervals being encoded.   This is based on
real-world usability feedback from actual users, as the RealJabber open
source software has a tester's latency interval adjustment for usability
trials that I have tried with several people.  But I agree -- we need to be
consistent with the F.700 definition of "real-time" for real-time text.
> 18. Consider deleting the "Forward Delete" d action element. It cannot be
> used with the default value for p because that would point outside the
> real-time message. Therefore, a p must always be calculated and included.
> Then it is equal in complexity to use it as Backspace. Having both just
> seem to add complexity to implementations. ( It would have been different
> and of value if it worked from a current cursor position.)   But if you
> have good reasons, e.g. easily matching some editing operation result, you
> can keep it.

Merging Backspace and Delete is worth considering.  Eliminating Backspace
is not appealing because it makes backspacing less efficient: <e/> can be
used without the p and n attributes to backspace from the end of the
message, while Forward Delete action elements require a position attribute
even when deleting text at the very end of a message.  Eliminating Delete
would force implementors to use the Backspace element to simulate the
Forward Delete operation, including doing all block deletes "backwards".
That is slightly more complicated for implementors but certainly doable,
since Backspace can do Forward Deletes with a little bit of math.  That
said, I'm not sure it's the best way to proceed, and I'd love to hear
comments from other people about the merits of merging Backspace and
Delete by eliminating the Forward Delete action element.

Testing with RealJabber, the Optional Remote Cursor behaves very well in
all combinations (senders having it or not interoperate fine with
recipients having it or not), and the action elements are compact in all
situations.  But if one action element were removed, it would be only the
Forward Delete, because Backspace is more common and it is important that
it stays simple and short (no attribute required) in the simplest
situation of backspacing at the end of a message.   Comments from
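
The "little bit of math" can be sketched in Python as follows.  This is an
illustrative model only: the function names are hypothetical, and the
message is treated as a sequence of code points with p as a position and n
as a count.

```python
def backspace(text, n=1, p=None):
    """Model of <e/>: remove n code points just before position p.

    p defaults to the end of the message and n defaults to 1.
    """
    chars = list(text)
    if p is None:
        p = len(chars)
    p = max(0, min(p, len(chars)))
    start = max(0, p - n)
    del chars[start:p]
    return "".join(chars)


def forward_delete(text, p, n=1):
    """Model of <d/>: remove n code points starting at position p."""
    chars = list(text)
    p = max(0, min(p, len(chars)))
    del chars[p:p + n]
    return "".join(chars)


def forward_delete_via_backspace(text, p, n=1):
    """Same result as forward_delete, expressed with backspace only."""
    length = len(text)
    p = max(0, min(p, length))
    n = min(n, length - p)       # excess deletes stop at the message end
    return backspace(text, n=n, p=p + n)
```

In other words, a forward delete of n characters at position p is a
backspace of n characters at position p + n.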

However, the danger of omitting the p attribute is overstated:
(1) It's harmless -- to omit the p attribute in a properly implemented
(2) It's moot -- because two places already show the p attribute is required

(1) Detailed info about "harmless".  The use of a default value of p is
harmless if you follow "Summary of Attribute Values": it simply "does
nothing" because you're deleting non-existent text.  It's the same
behaviour as hitting the Delete key when the cursor is already at the
very end of a document in Microsoft Word: the action does nothing.  I
already say:
*Note: Excess deletes MUST be ignored, with text being deleted only to the
end of the message in this case.*
*(Cited from
http://xmpp.org/extensions/xep-0301.html#element_d_forward_delete )*
It is therefore harmless (unless there was a spec violation, a bug, etc).
Also observe that if illegal values are used, it is already covered here:
*Senders MUST NOT use negative values for any attribute, nor use p values
beyond the current message length. However, recipients receiving such
values MUST clip negative values to 0, and clip excessively high p values
to the current length of the real-time message. Modifications only occur
within the boundaries of the current real-time message, and not other
delivered messages. *
*(Cited from
(2) Detailed info about "moot". I already clearly show in two places that
omitting p is not allowed anyway, which makes the argument moot.
I show that it's not even valid to omit this attribute for Delete:
I also show that it's a required attribute in the XML Schema:
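
As an illustration of point (1), the recipient-side clipping rule quoted
above can be sketched in Python.  The function names are hypothetical, and
the default of p (end of message) is modelled here only to show that such a
delete would be a no-op; it is not a statement that omitting p is valid.

```python
def clip(value, upper):
    """Clip an attribute value into the range 0..upper."""
    return max(0, min(value, upper))


def apply_forward_delete(message, p=None, n=1):
    """Recipient-side forward delete over the code points of message."""
    chars = list(message)
    length = len(chars)
    if p is None:
        p = length               # assumed default: end of the message
    p = clip(p, length)          # negative -> 0, too high -> message length
    n = clip(n, length - p)      # excess deletes stop at the message end
    del chars[p:p + n]
    return "".join(chars)
```

With these rules, a delete positioned at or beyond the end of the message
changes nothing, which matches the Delete-key-at-end-of-document analogy.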

> 19. Section 4.5.2, third bullet point. I would like to see the words
> "Unicode Code Points" replace "Unicode Character Counting". Code points is
> the safe base that we count.

I use the word "characters" in other parts of the specification, such as
Action Elements:
Therefore, it is my opinion that keeping the heading helps people figure it
out faster, because
(1) The Table of Contents stays easy to read with "Unicode Character Counting"
*(Someone might think: "huh? why does the section exist? I guess there's
some special gotchas about counting characters!")*;
(2) People who don't really know what "Code Points" are more quickly
associate the specification's definition of "Unicode Characters" as really
meaning "Code Points", because I already mention the word "characters"
several times elsewhere in the specification.  The person then figures out
that code points are simply the metric we use for counting "characters".
(3) It's compatible with "UTF-8 encoded characters", each of which is the
same as a code point.

Therefore, I think more people will figure it out if I keep the heading
"Unicode Character Counting" and go on to explain code points, rather than
use the heading "Unicode Code Points" and suddenly have a more confusing
Table of Contents.   That said, I'd like to hear other people's opinions,
as there may be different schools of thought.  If nobody raises
objections, I will keep the heading the same, for the above three reasons.
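
To show why counting needs its own section, here is a small Python sketch
comparing three ways of measuring the same text.  The function names are
hypothetical; Python strings happen to be sequences of code points, which
is the unit discussed above.

```python
def count_code_points(text):
    """Number of Unicode code points (the unit used for p and n)."""
    return len(text)


def count_utf16_code_units(text):
    """Number of UTF-16 code units, as some platforms' string lengths report."""
    return len(text.encode("utf-16-le")) // 2


def count_utf8_bytes(text):
    """Number of bytes on the wire when the text is UTF-8 encoded."""
    return len(text.encode("utf-8"))


combining = "e\u0301"        # "e" followed by a combining acute accent
emoji = "\U0001F600"         # one code point outside the Basic Multilingual Plane
```

Here combining is 2 code points but displays as one character, while emoji
is 1 code point but 2 UTF-16 code units and 4 UTF-8 bytes, so a sender and
recipient counting in different units would disagree on positions.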

> 20. Section At the end, insert paragraph: "Characters consisting of
> multiple Unicode code points SHOULD be sent together in the same <t/>
> element. Values of *p* and *n* SHOULD NOT result in pointing within such
> combinations of code points."    ( this is to avoid the situations
> described with the long note to section The actions to avoid it
> should be more on the sender side as I propose here.

It is a good recommendation, but it does complicate "treat it as an array
of code points".
My implementation of "Monitoring Message Changes Instead Of Key Presses"
(the most recommended method out of several "Real Time Text Transmission
Methodologies" in section 64) is EncodeRawRTT() in the open source Java
source code:
As you can see, it is easiest to write these kinds of implementations
without adding the further complexity that you suggest.
So I am not 100% convinced I should add the sentences that you suggest,
although I do observe you are using the word "SHOULD" rather than "MUST".
(I'd use "strongly suggested" since I've eliminated RFC2119 for
Implementation Notes now, upon Peter's advice.)
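
As an illustration of the general approach of monitoring message changes
(not RealJabber's EncodeRawRTT(), whose code is not reproduced here), a
minimal Python sketch can compare the previous and current text as arrays
of code points and emit one erase and one insert.  The function name is
hypothetical; the <e/> and <t/> defaults (p = message length, n = 1) are
the ones discussed in this thread.

```python
from xml.sax.saxutils import escape


def encode_message_change(old, new):
    """Return action-element XML that turns old into new."""
    # Length of the common prefix.
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1

    removed = len(old) - prefix - suffix
    inserted = new[prefix:len(new) - suffix]
    actions = []

    if removed:
        # <e/> erases n code points before position p.
        erase_pos = prefix + removed
        attrs = ""
        if erase_pos != len(old):
            attrs += " p='%d'" % erase_pos
        if removed != 1:
            attrs += " n='%d'" % removed
        actions.append("<e%s/>" % attrs)

    if inserted:
        length_after_erase = len(old) - removed
        attrs = "" if prefix == length_after_erase else " p='%d'" % prefix
        actions.append("<t%s>%s</t>" % (attrs, escape(inserted)))

    return "".join(actions)
```

The sketch never inspects what the code points mean, which is the simplicity
argued for above; keeping multi-code-point characters together would need
extra grapheme-boundary logic on top of it.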

> 23. Section The Note is correct, but very long. I would like to
> see it shortened but have no wording proposal at the moment. It aims at
> avoiding situations that I suggest preventing with my proposal 20 on the
> sender side.

Preventing this would significantly complicate implementor's ability to
So, this is a trade-off.

> 28. Chapter 5. last paragraph. I hesitate a lot about this simple way of
> detecting support. We need a proper way to detect RTT capability before we
> start to use it. There may be systems that have to select between different
> protocols for RTT, and they should not need to start sending in one
> protocol to try to discover if RTT is supported. Still, I realize the
> convenience of this simple method, and would let a discussion decide if it
> shall be kept.
> If it is kept, the parenthesis characters on the second line should be
> deleted so that the rapid response on this initiation is made part of the
> protocol.

Several people already agreed with the method.
For accessibility compliance, I don't like blocking the sender's ability to
(I mentioned it metaphorically: there's no padlock on the originating
phone's keypad.)
I should observe that sending a single <rtt/> element uses less bandwidth
than using disco.
Also, it produces a huge advantage in simplifying some implementations of
real-time text, by allowing detection, initiating and suspending real-time
text, completely "in-line" with the <rtt/> element.  It's not xmpp-ish, but:

*Disco is flawed from an Accessibility Perspective.  It might not be
compatible with future accessibility legislation.*
(1) As you saw from previous messages, several people have convinced me
that there are implementations that will turn off RTT by turning off disco.
Unfortunately it looks like I can't enforce disco to be always available,
even when recipients turn off RTT.  Therefore, I'm not going to use it as
the primary method anymore.  I already wrote in an earlier public message
that I am not going to be a willing author of a XEP-0301 that does not give
the "sender a chance".
(2) There is a risk that senders will "turn off disco" as the method of
deactivating real-time text, removing the sender's ability to even signal
the desire to start real-time text.

> 33. Section 6.5.4. The default action for a non-completed message should
> be to regard it completed after some time, not to clear it. So, replace
> "clear (and/or save)" with "save".

I don't think it's appropriate to mandate a specific action.  An example is
empty real-time text.  Sometimes, from a security perspective, it's better
to clear -- especially during a denial-of-service attack.  You don't want
to consume disk space or screen space because somebody started attacking
you with fake real-time text that's easily detected as fake.  I've had at
least one private email from one big vendor in the last 7 days, raising
concerns that convinced me to introduce the "Stale Messages"
clause: security (a resource consumption concern).
So clearing is a perfectly legitimate action in a DoS situation.
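
One way to leave the choice to the implementation is sketched below in
Python.  Everything here is an assumption for illustration: the function
name, the 120 second timeout, and the under_attack flag are hypothetical
and not values from XEP-0301.

```python
def handle_stale_message(text, idle_seconds, timeout_seconds=120.0,
                         under_attack=False):
    """Decide what to do with an unfinished real-time message.

    Returns "keep" while the message is still fresh, "clear" when it is
    empty or resources need protecting, and "save" otherwise.
    """
    if idle_seconds < timeout_seconds:
        return "keep"
    if under_attack or not text.strip():
        # Empty or suspicious stale text is dropped to avoid wasting
        # disk space or screen space.
        return "clear"
    return "save"
```

A client that always prefers saving can pass under_attack=False and still
gets "clear" only for empty stale messages.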

Thanks so much for your comments -- and I welcome discussion on my replies
to your comments!

Mark Rejhon