[Standards] Comments on XEP-0301 (possible impact on -0308 in Section 4.2.3)

Paul E. Jones paulej at packetizer.com
Fri Aug 3 20:54:50 UTC 2012

Mark, et al,

I re-read the draft and here are my comments.  After I wrote all of this, I
thought it sounds like I'm pounding on everything.  Overall, the text was
great; it has come a long way.  That said, here are my comments and

Section 1:
"and is favored by deaf and hard of hearing individuals who prefer text

I suggest you strike the above. What I have been told several times before
is that deaf people do not want special equipment and they want to use
mainstream technology.  While the above text is true, I do not want anyone
to view this extension as "something for accessibility" and dismiss it for
that reason.  It is an extension with wide applicability that I suspect
people would appreciate if it were there. I communicate with some people
today who type partial messages and hit ENTER just to move discussion
faster. That is evidence to me of broad utility.

I would also get rid of most of the examples of various prior
implementations.  It's somewhat more balanced, but still tilted toward
accessibility since 3 of the 6 examples are for that.  It would be useful to
mention "talk" and ICQ, since those are IP-based text messaging systems that
closely parallel XEP-0301.  I do have concern about mentioning ICQ by name.
Since AOL did this, perhaps say 'UNIX "talk" and some proprietary instant
messaging systems'.

Regarding this:

"Real-time text is suitable for smooth and rapid mainstream communication in
text, as an all-inclusive technology to complement instant messaging. It can
also allow immediate conversation in situations where speech cannot be used
(e.g. quiet environments, privacy, deaf and hard of hearing). Real-time text
is also beneficial in emergency situations, due to its immediacy. For a
visual animation of real-time text, see Real-Time Text Taskforce [5]."

I would suggest these slight changes:

"Real-time text is suitable for smooth and rapid communication,
complementing the existing en bloc mode for sending text messages.  It also
allows for immediate conversation in situations where speech cannot be used
(e.g. quiet environments, privacy, deaf and hard of hearing). Real-time text
is also beneficial in emergency situations, due to its immediacy. For a
visual animation of real-time text, see Real-Time Text Taskforce [5]."

Most importantly, we don't need to say it's suitable for mainstream.  It
suggests somehow that it might not be.  And this does complement IM, but
definitely does not replace it or operate separately in some way; I don't
want misinterpretation.  XMPP currently delivers messages en bloc and this
extension adds a means of giving the existing XMPP message delivery a
real-time feel.  "En bloc" may not be preferred, but I don't want folks to
assume this sits alongside (separate from) IM.

Section 2:

"Next Generation 9-1-1 / 1-1-2 emergency services"

This leaves out so many countries; very America/Europe centric.  See
http://en.wikipedia.org/wiki/Emergency_telephone_number.  Should we just get
rid of "9-1-1 / 1-1-2"?  Point is you want this for next generation
emergency services anywhere in the world.  I would not capitalize "Next
Generation", either.

Section 4:

Showing that <body> is not included in the message containing <rtt> in
Example 1 might lead people to believe this is expected. I would suggest
making the first example one that had <body> in the end, since I suspect
that will be the typical case.  Perhaps a word about this somewhere might be
useful (if not already covered).

Section 4.2.1:

Why is "seq" only 31 bits?  Since the same memory is consumed for 31 or 32
bits, why not just make it an unsigned 32-bit integer?  And why worry about
wrap-around?  I would allow it to occur.  Specify the behavior.
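
As a rough illustration of what "specify the behavior" could look like, here
is a minimal Python sketch that treats "seq" as an unsigned 32-bit counter and
lets it wrap.  The names SEQ_MODULUS, next_seq and is_expected are
hypothetical and do not come from XEP-0301; this is one possible behavior
under that assumption, not what the draft currently says.

```python
# Assumption: "seq" is an unsigned 32-bit counter that is allowed to wrap.
SEQ_MODULUS = 2**32


def next_seq(seq: int) -> int:
    """Return the value that follows seq, wrapping from 2**32 - 1 back to 0."""
    return (seq + 1) % SEQ_MODULUS


def is_expected(prev_seq: int, received_seq: int) -> bool:
    """True if received_seq directly follows prev_seq, wrap-around included."""
    return received_seq == next_seq(prev_seq)
```

With this rule a receiver needs no special case at the top of the range: the
value after 4294967295 is simply 0.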

Section 4.2.2:

One benefit of "init" is that it would remove any ambiguity related to the
"seq" value.  The "seq" value could always start at 1 if "init" were
required.  The problem with "init", though, is that if a sender sends three
messages one after the other, the first two might go to client A and the
last one might go to client B.  This would happen if I have two XMPP clients
connected to the server and I disconnect one.  Therefore, "init" and
"cancel" seem pointless.  I'd suggest getting rid of them entirely.  I like
having "new" since that Client B I refer to would know that if it gets an
<rtt> that is not "new" it must be some message somewhere in the middle of
typing and can just ignore those until it gets a <body>, then pick up with
RTT on the next <rtt event="new">.
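
The Client B behavior described above can be sketched roughly as follows.
This is a simplified Python model under stated assumptions: the class
RttReceiver and its methods are hypothetical, only appended text is modeled
(no erase or wait actions), and it is not taken from XEP-0301.

```python
from typing import Optional


class RttReceiver:
    """Hypothetical receiver that may start receiving in mid-message."""

    def __init__(self) -> None:
        self.in_sync = False  # True once an <rtt event="new"> has been seen
        self.text = ""        # real-time text currently shown to the user

    def on_rtt(self, event: Optional[str], appended: str) -> None:
        """Handle one <rtt> element carrying some appended text."""
        if event == "new":
            # Start of a fresh message: safe to begin rendering.
            self.in_sync = True
            self.text = ""
        if self.in_sync:
            self.text += appended
        # Otherwise this fragment belongs to a message whose start was
        # never seen, so it is ignored.

    def on_body(self, body: str) -> str:
        """Handle the final <body>; wait for the next event="new" afterwards."""
        self.in_sync = False
        self.text = ""
        return body
```

A client that connects in the middle of typing shows nothing until the
<body> arrives, then picks up real-time text on the next message.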

Section 4.2.3

XEP-0308 specifies use of "id" in <message> and <replace>.  Could we not
just use "<replace>" along with "<rtt>"?  It would require some text in
XEP-0308 that says that if <replace> is received without <body>, it shall be
ignored.  In -0301, it would not be ignored.  "id" works, but I would not
immediately recognize what that was for if I had not read this part of the

Section 4.4:

"be approximately 0.7 second" -> " be approximately 0.7 seconds"

I would even suggest saying 700ms, as I think that reads better.

Section 4.5.1:

"Wait n thousandths of a second."

I would prefer "wait n milliseconds", especially since the wait time might
be 2300ms or more, for example.

Section 4.5.2:

"default value of n MUST be 1" -> "default value of n is 1"

"For the purpose of this specification, the word "character" represents a
single Unicode code point. See Unicode Character Counting."

Shouldn't the above be moved to Section 3?


"Support the transmission" --> "Supports the transmission"


"Support the behavior of Backspace" --> "Supports the behavior of backspace"


Suggest changing:
"Allow the transmission of intervals, between real-time text actions, to
support the pauses between key presses."

to:

"Allow for the transmission of intervals between real-time text actions to
recreate pauses between key presses."

"Wait n thousandths of a second" --> "Wait n milliseconds"

Question on this:
"Also, if a Body Element arrives, pauses SHOULD be interrupted to prevent a
delay in message delivery."

Do you want to prevent a delay or realize a delay?  I believe you want the
entire <rtt> element to be fully processed, including delays, before acting
on <body>.  I'm not sure how to word that, but the above sentence was not
clear to me.

Section 4.7:

" non-compliant servers that modifies messages" --> " non-compliant servers
that modify messages"

Section 4.7.2:

"line breaks MUST be treated as a single character, if line breaks are used
within real-time text."

-->

"any line breaks MUST be treated as a single character."
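
To make the "single character" rule concrete, here is a small Python sketch.
It assumes that a line break may show up as CR LF, CR or LF and that all three
are folded to a single LF before counting; the function names are
hypothetical, and the normalization choice is an assumption rather than
something the draft states.

```python
def normalize_line_breaks(text: str) -> str:
    """Fold CR LF and lone CR into LF so each line break is one character."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def char_count(text: str) -> int:
    """Count Unicode code points after line-break normalization.

    Python's len() on a str counts code points, which matches the
    definition of "character" quoted from Section 4.5.2.
    """
    return len(normalize_line_breaks(text))
```

Under this assumption "a\r\nb" and "a\nb" both count as three characters, so
position values stay the same whichever line-break form a client uses.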

Section 6.2.1:

I think the activation logic is complex.  Let each user turn it on or off as
he sees fit.  If you send <rtt> tags to my client, whether that gets rendered
or not depends on my local settings.  I don't see a strong need to negotiate
this.  Just always send <rtt> and display it (if received) whenever the user
enables RTT.

Section 6.3:

Whether there is a visible cursor or not, the client has to take steps to
render text properly.  Since a cursor is not something sent via the
protocol, I see no point talking about it.  I'd remove this section.

Section 6.4.4:

I'm not sure what this is telling me.  Why are <t> and <e> "unsuitable for
most general-purpose clients"? And why encourage a device to use reset
rather than provide more complete support?  We know rendering is the bigger
challenge, but receivers must accept what is sent.  I see no reason to
suggest a sender be lazy.

I'd suggest removing this section unless there is something here of high
value that's going over my head.

Section 7.4.2:

It seems that all of the examples show <w> used between every key
press.  However, if sampling the input buffer (as recommended earlier in the
text), one may not know the time between keystrokes.  Perhaps the device
samples the buffer and sees:

This would translate to:
<rtt><t>a</t><w n="100"/><t>pp</t></rtt>


Related to <w>, suppose I type "h" and then "e" with about a 100ms delay.
Further, suppose the IM client's 700ms timer fires and sends "h" on the wire
like this:


Now, the client restarts the 700ms timer, after which time it sends:

<rtt><w n="100"/><t>e</t></rtt>

Is this correct?  So, there was a 700ms "collection" delay, some message
transmission delay (perhaps 100 or 200ms) and then an artificial delay
inserted of 100ms.  So, between "h" and "e", the user might actually wait
700+200+100 = 1000 milliseconds?

Or, does the receiver maintain a running clock and as soon as the message
arrives, it sees that w=100, but its internal "wait timer" is already at
700+200ms, so it displays "e" immediately?  (I assume this is the case and
it should be described.)
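
The two readings above can be compared with a short Python sketch using the
numbers from this example.  Both function names are hypothetical, and neither
reading is confirmed here as the behavior XEP-0301 intends; the sketch only
shows the arithmetic behind the question.

```python
def naive_gap_ms(collect_ms: int, transit_ms: int, w_ms: int) -> int:
    """Gap the reader sees if <w> is applied only after the message arrives."""
    return collect_ms + transit_ms + w_ms


def remaining_wait_ms(w_ms: int, elapsed_since_last_action_ms: int) -> int:
    """Wait still owed if the receiver keeps a running clock.

    The time already elapsed since the last displayed action is subtracted
    from the requested wait, and the result never goes below zero.
    """
    return max(0, w_ms - elapsed_since_last_action_ms)
```

With a 700ms collection delay, 200ms in transit and w=100, the first reading
gives naive_gap_ms(700, 200, 100) = 1000, while the second gives
remaining_wait_ms(100, 900) = 0, so "e" would be displayed immediately.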

Section 8:

As was mentioned in one discussion thread, H.323 also supports RFC 4103, so
it might be useful to mention H.323 here, too.

Section 9:

How does XMPP indicate that a message should be displayed LTR or RTL?  Is
that derived from the language indicated in the <body> tag?  This is legal:

<body xml:lang="en">This would display left-to-right</body>

In any case, we do need to ensure we capture directionality for languages
like Hebrew.
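
If directionality is not signaled explicitly, one fallback a client could use
is a first-strong-character heuristic.  The Python sketch below is only an
illustration under that assumption: base_direction is a hypothetical helper,
and nothing here is claimed to be what XMPP or XEP-0301 specifies.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess "ltr" or "rtl" from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):  # Hebrew is "R"; Arabic letters are "AL"
            return "rtl"
    # No strong character found (e.g. digits only): default to left-to-right.
    return "ltr"
```

A heuristic like this works from the text alone, so it could also be applied
to partial real-time text before the final <body> arrives.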

