[Standards] Review: XEP-xxxx: In-Band Real Time Text

Mark Rejhon markybox at gmail.com
Mon Feb 28 20:52:39 UTC 2011

Hello Kevin Smith,

Thank you very much for your comments on the proposed XMPP Real Time
Text Standard, and also for relaying the information from the meeting.

With the help of others at realtimetext.org, I will take the
opportunity to rewrite portions of the standard to bring it into
better compliance with XMPP.org.  I am currently collaborating behind
the scenes with realtimetext.org on this.  I also have open source
code that I am releasing shortly (goal: end of March), which will help
demonstrate the proposed specification.  This may help you all to
determine which features are necessary, and which are not.

With regard to some of the general concerns:

1) Simplifying the specification

Up front, there are a lot of things that I could simplify in the
standard to reduce the word count dramatically, perhaps by as much as
40%: removing redundant or duplicate information and unnecessary
filler, rewording certain parts into clearer English, dropping some
unnecessary requirements, and removing less important features such as
Group Chat.  I will work with the people of realtimetext.org on this.

2) Complexity introduced by Delay Codes (Natural Typing)

Originally, I was going to make this a private feature of my own
implementation (i.e. a private extension).  However, testers gave it
rave reviews.  It produced a surprisingly dramatic improvement in the
quality of real time text, to the point that we decided to make it a
recommended inclusion.  Some of us initially thought delay codes were
not necessary; however, after using the demo software, I received
positive comments from every single tester, including:
Paul E. Jones (realtimetext.org)
"The 'natural typing' is key, I think.  It was truly quite impressive."
"After seeing it in action, my opinion is that delay coding (in
whatever form) ought to be a mandatory part of the spec.  In general,
anytime we make something optional, it does not get implemented.  I’d
prefer to see it get implemented in one way to ensure a consistent
quality of experience."
Barry Dingle (realtimetext.org)
"During the texting, with Natural Typing working, it was a very
pleasurable experience."
"I could not detect any difference between 500 and 1000 ms with
Natural Typing activated. I could detect a difference when NT was not
[activated]."
Gregg Vanderheiden (realtimetext.org)
"I had the same experience.  The delay codes make it very 'real'"
"Love the delay coding."

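To make the idea concrete, here is a minimal Python sketch of how delay codes might be generated and replayed. It assumes the sender records a timestamp for each keypress during a transmission interval and emits a wait action for each gap between characters. The Keystroke type, the action tuples, and both function names are hypothetical illustrations, not the syntax of the proposed XEP.

```python
from dataclasses import dataclass


@dataclass
class Keystroke:
    time_ms: int  # when the key was pressed (sender's local clock)
    text: str     # the character(s) the keypress inserted


def encode_with_delays(keystrokes):
    """Sender side: turn timestamped keystrokes into a flat action list,
    adding a ('delay', ms) action for each gap between consecutive keys."""
    actions = []
    prev_time = None
    for ks in keystrokes:
        if prev_time is not None:
            gap = ks.time_ms - prev_time
            if gap > 0:
                actions.append(("delay", gap))
        actions.append(("insert", ks.text))
        prev_time = ks.time_ms
    return actions


def replay_schedule(actions, start_ms=0):
    """Receiver side: compute when each inserted piece of text should be
    shown, starting from the moment the transmission arrives."""
    now = start_ms
    shown = []
    for kind, value in actions:
        if kind == "delay":
            now += value
        elif kind == "insert":
            shown.append((now, value))
    return shown


keys = [Keystroke(0, "H"), Keystroke(120, "i"), Keystroke(400, "!")]
actions = encode_with_delays(keys)
print(actions)
print(replay_schedule(actions, start_ms=1000))
```

A receiver that ignores the delay actions shows all three characters at once when the transmission arrives, which is the 'burst' effect that natural typing avoids.
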
3) Complexity introduced by Real Time Message Editing Protocol

- Real time message editing does complicate the standard.  However, it
is a necessary inclusion for the reasons already explained.
- We found it necessary to go well beyond simpler real time text
approaches, such as transmitting only text fragments and backspace
codes, or just retransmitting the whole message, for the reasons
listed below.
- Real time message editing is done via a sequence of steps, and thus
needs a simple protocol.  An example is "insert text, delay, insert
text, delay, delete text, delay, move cursor, delay", etc.  (Note that
the 'delay' codes can be ignored.)
- Technically, we could simply retransmit the whole message, which is
also allowed by the standard (Section 3.9.3).  However, this is
inefficient for long messages, and makes it difficult to serialize the
real time message editing protocol.
- Cursor movements are included because they make it much easier to
watch the remote person edit their real time text.  Without them,
edits to the middle of a message more often went unnoticed by the
recipient (and led to more misunderstandings due to missed edits).
- Delay codes for inter-keypress delays are good because typing looks
natural (and does not 'burst'), regardless of the transmission
interval.  Testing shows that this is a highly desirable extension to
the spec.  See the next section.
- Note that neither of these features is 'REQUIRED'.
- Testing of the open source software clearly showed that the highest
quality of real time text occurred when I included support for delay
codes and cursor positioning.
-- The open source software that is being released soon has an
adjustable interval, and allows turning individual features on and off
(including delay codes and cursor codes), so that you can all judge
how necessary or unnecessary each feature is.
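As a rough illustration of the "insert text, delay, delete text, move cursor" sequence described above, here is a hypothetical receiver-side model in Python. The action names ('insert', 'backspace', 'cursor', 'delay') and their argument shapes are assumptions for this sketch, not the element names used in the draft; deletion is modelled as backspacing from the cursor position.

```python
class RealTimeMessage:
    """Receiver-side buffer for one real time message (hypothetical model)."""

    def __init__(self):
        self.text = ""
        self.cursor = 0  # position where the next insert/backspace applies

    def apply(self, action):
        kind = action[0]
        if kind == "insert":
            s = action[1]
            self.text = self.text[:self.cursor] + s + self.text[self.cursor:]
            self.cursor += len(s)
        elif kind == "backspace":
            n = min(action[1], self.cursor)  # never delete past the start
            self.text = self.text[:self.cursor - n] + self.text[self.cursor:]
            self.cursor -= n
        elif kind == "cursor":
            # Clamp the requested position to the current text length.
            self.cursor = max(0, min(action[1], len(self.text)))
        elif kind == "delay":
            pass  # a receiver may ignore delay codes entirely
        else:
            raise ValueError(f"unknown action: {kind!r}")

    def apply_all(self, actions):
        for a in actions:
            self.apply(a)
        return self.text


msg = RealTimeMessage()
print(msg.apply_all([
    ("insert", "Hello wrld"), ("delay", 200),
    ("cursor", 7), ("insert", "o"),   # fix the typo mid-message
    ("cursor", 11), ("insert", "!"),  # return to the end and keep typing
]))
```

Clamping the cursor and the backspace count is a defensive choice for this sketch: it keeps a malformed action from raising an exception, but it cannot repair text that is already out of sync with the sender.
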

4) Programming complexity

I realize the comment about programming simplicity is relative and
subject to interpretation.  I got the first version of real time text
working in less than 2 days, in an initial round of programming using
the open-source jabber-net library.  When I excluded the optional
cursor movements and delay codes, I found it really simple to include
real time message editing.  Most of the complexity is actually in the
delay codes, as well as in how I prepared the messages for
transmission.  Even with those advanced features thrown in, my module
specifically for real time text was still only 800 lines of code.  If
you ignore all the RECOMMENDED and OPTIONAL items, the standard is
much simpler to implement and could be condensed into a much smaller
document.  Perhaps the standard should clearly separate the easy
features from the advanced features, so that it's easier for
implementors to do a baseline version of this spec.  Part of the
reason the specification looks more complex than necessary is that
the easy and hard parts are interspersed with each other.  Releasing
open source code will help people understand how easy or complex the
specification is.
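One possible baseline way to prepare edits for transmission (an assumption here, not necessarily how the module above works) is to compare the previously transmitted text with the current text and send only the changed region, instead of retransmitting the whole message. The function name and the action tuples in this Python sketch are hypothetical.

```python
def edit_actions(old: str, new: str):
    """Derive a single-region edit that turns `old` into `new`.

    Finds the longest common prefix and the longest common suffix that
    does not overlap it, then emits: move the cursor to the end of the
    changed region in `old`, backspace over the removed characters, and
    insert the replacement characters.
    """
    limit = min(len(old), len(new))
    # Longest common prefix.
    p = 0
    while p < limit and old[p] == new[p]:
        p += 1
    # Longest common suffix that does not overlap the prefix.
    s = 0
    while s < limit - p and old[len(old) - 1 - s] == new[len(new) - 1 - s]:
        s += 1
    removed = len(old) - p - s
    added = new[p:len(new) - s]
    if removed == 0 and not added:
        return []  # nothing changed since the last transmission
    actions = [("cursor", p + removed)]
    if removed:
        actions.append(("backspace", removed))
    if added:
        actions.append(("insert", added))
    return actions


print(edit_actions("Hello wrld", "Hello world"))  # one-character fix mid-message
print(edit_actions("hello", "help"))
```

Because it only reproduces the end state of each interval as one changed region, this baseline cannot express natural-typing delays or intermediate cursor movements; those require recording the keystrokes themselves.
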

5) Rationale of Attributes ('seq', 'msg', and 'type')

I found it necessary to include these attributes because of the
nature of real time message editing.  If a message gets lost, and the
message contained an edit (i.e. an insert/delete in the middle of the
text), then subsequent edits are invalid: the message length is
different, so a subsequent edit won't occur in the correct location,
and the text will become mangled.  Therefore, subsequent real time
message edit operations require perfect integrity, using some sort of
continuity mechanism (sequence ID) or another sync verification
method.  Experimentation with the open-source software showed that
'seq' was necessary, as was a method of signalling the first received
real time text message.
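The continuity check can be sketched in Python as follows, assuming 'seq' increases by exactly 1 per real time text element and that a 'reset' element carries the whole message text. RttPacket and RttReceiver are hypothetical, simplified stand-ins for the draft's elements (edits are reduced to positional inserts), and counter wraparound is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RttPacket:
    """Hypothetical, simplified stand-in for one real time text element."""
    seq: int
    kind: str      # "reset" (full text) or "insert" (edit at a position)
    text: str
    pos: int = 0   # only meaningful for "insert"


class RttReceiver:
    def __init__(self) -> None:
        self.text = ""
        self.in_sync = False                  # nothing trustworthy yet
        self.expected_seq: Optional[int] = None

    def receive(self, pkt: RttPacket) -> bool:
        """Apply pkt only if that is safe; return whether we are in sync."""
        if pkt.kind == "reset":
            # A reset carries the whole message, so it always restores sync.
            self.text = pkt.text
            self.in_sync = True
        elif self.in_sync and pkt.seq == self.expected_seq:
            # Contiguous edit: its position refers to text we really have.
            self.text = self.text[:pkt.pos] + pkt.text + self.text[pkt.pos:]
        else:
            # A gap in seq (lost element) or an edit before any reset:
            # applying it could land at the wrong position and mangle the
            # text, so stop applying edits until the next reset.
            self.in_sync = False
        self.expected_seq = pkt.seq + 1
        return self.in_sync


r = RttReceiver()
r.receive(RttPacket(1, "reset", "Hello wrld"))
r.receive(RttPacket(2, "insert", "o", pos=7))
r.receive(RttPacket(4, "insert", "!", pos=11))  # seq 3 was lost: ignored
print(r.text, r.in_sync)
```

The case where an edit arrives before any reset is why a receiver also needs a way to recognise the first real time text message it receives.
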

Any comments on my responses to these general concerns?

Mark Rejhon

On Mon, Feb 28, 2011 at 7:02 AM, Kevin Smith <kevin at kismith.co.uk> wrote:
> At its recent meeting, the XMPP Council decided not to accept
> http://xmpp.org/extensions/inbox/realtimetext.html as an Experimental
> XEP, not because of the feature it provides, but because they felt
> there were shortcomings in this proposal. At the time, I promised to
> start a discussion here, on the premise that community discussion will
> either lead to updates to the proposal and it could be resubmitted, or
> that another approach would present itself. My comments on the
> proposal follow, in no particular order or clarity.
> 1) This is a huge XEP for a relatively simple feature - I copy/pasted
> the non-boilerplate content into `wc` and ended up with ~=11,600
> words. This is a concern (although not a reason to avoid publication
> on its own, it is daunting).
> 2) This is a fairly client-complex solution to the problem. Some years
> ago, I also needed to solve this problem, and deployed a very simple
> "Send <message ...><fragment>I'm not done typing this
> messa</fragment></message> every so often" protocol that seemed to
> work fine. It may be that this approach is unsuitable for the general
> case, but given the relative complexities of the two approaches, this
> needs to be discussed (and possibly rejected) as I'd far prefer a
> simple solution than a complex one.
> 3) Niggle: "Although real-time text benefits everyone in many
> situations" - I don't see any support for the 'everyone' in this
> statement.
> 4) Goal "Make it easy for developers/implementors to implement Real
> Time Text in steps, with a minimum of code." may not be being met (see
> (1)).
> 5) Goal "Allow multiple modes of chat, including traditional IM user
> interface, split-screen chat, and other modes." seems like a UI
> feature, not a protocol feature.
> 6) Goal "Meet the quality requirements for real-time text. This is
> specified in ITU-T F.703 [4] with an end-to-end delay of less than two
> seconds and transmission loss of less than 0.2%." - this seems to
> primarily be a feature of the networks and servers involved, rather
> than this protocol.
> 7) 2.3 Server Performance - I'm picking this out over the other cases
> for no good reason. Quite a lot of this XEP sounds like it's comments
> to Council justifying acceptance, rather than a part of the
> specification. I'd have thought that moving large amounts of this into
> a trailing Implementation Notes section would make it read better.
> 8) 2.4 Real-Time Message Editing - isn't half of this just repeating
> the introduction and AIM advert from the start of the document?
> 9) 2.6 Multi-User Chat. This needs a great deal of care, as the stanza
> amplification possibilities for large rooms are horrid. I accept that
> this has a big experimental warning on the paragraph, but feel it
> should be noted.
> 10) 2.7 User Interface Considerations. I think how to design a UI is
> largely down to the reader and doesn't belong in Requirements,
> although I'm not opposed to something in Implementation Notes.
> 11) 3.1 Functional Goals of the Specification. Another set of goals?
> 12) 3.2 "Clients that do not support real-time text will only receive
> the full message at once" - they'll receive everything, they'll only
> understand the full message body.
> 13) 3.2 "SHOULD show the text immediately" - this sounds like an odd
> thing to have in a protocol spec. What does 'immediately' even mean
> here? Are we saying that the client needs to skip any other processing
> it needs to do to render these data?
> 14) 3.2 "For senders, the default interval SHOULD be 1 second (1000
> milliseconds)" - I don't think this is required for interoperability,
> so RFC2119 language seems inappropriate here. I think we probably want
> a reference to a suggested period in Implementation Notes or similar.
> 15) 3.3 "There MAY also be multiple <rtt> elements in a single
> <message>" - I'm not sure why you'd want to do this, and it does add
> client complexity.
> 16) 3.3 msg/seq Attributes. This says that the attributes should be
> incremented, but not the magnitude of the delta (I assume 1).
> 17) I'm not convinced by the need for both msg and seq, given later
> comments about XMPP compliance (and I note that the simple protocol in
> comment (2) does away with the need altogether).
> 18) General - all the protocol outlined in the XEP is illegal, as it's
> happening in the jabber:client (and jabber:server) namespaces. The new
> stanza children need to be namespaced away.
> 19) 3.3.3 "Recipients MUST be able to process type='reset' when
> transmitting <rtt> with type='reset'" I don't follow the requirement
> here.
> 20) A thought - how does this interplay with XEP-0258? (I suspect that
> many people write messages first, and then label them). This is just
> an interesting aside.
> 21) 3.4 Use of <html>. I note that the simple protocol from (2) can
> trivially support <html> elements.
> 21) 3.5.2 should probably be clarified that 'empty' means 'with no
> text child' or such, so that a reader doesn't believe that they may
> omit the required attributes.
> 22) 3.5.3 Didn't we earlier read that it was ok to include many <rtt> elements?
> 23) 3.5.6 "In evaluations of relative values of attributes, values of
> counters recently wrapped around shall be considered higher than those
> approaching its maximum value." If we're going to say something like
> this, we need to be clear what it means. When has a client 'recently
> wrapped around'?
> 24) 3.6 Error Recovery. Messages can get lost, certainly, (although
> -0198 mitigates significantly), but I'm not sure where the assertion
> about overloaded servers comes from. Servers that deliver messages out
> of order are incompliant, and we should file bug reports against them
> so the authors can know about it and fix it, rather than writing
> (complex) protocol around this. For the little it's worth, I don't
> know of any current mainstream servers delivering messages out of
> sequence. I note, again, that the simple (2) protocol is not subject
> to the problems that this section addresses.
> 25) 3.6.1 You have to try pretty hard to get duplicate messages
> through, but it is possible.
> 26) This has a SHOULD on processing msg, while earlier it was
> OPTIONAL (I assume you mean OPTIONAL, I don't believe NOT REQUIRED is
> covered by RFC2119).
> 27) 3.6.3 "a client MUST immediately pause updating" - this doesn't
> really need a MUST, there's no requirement for interop here.
> 28) Why is this OPTIONAL? It sounds like an interop requirement.
> 29) Given in-order delivery, I don't think this one's necessary.
> 30) 3.6.4 Here you're using 2119 in a section about UI, in the middle
> of the Protocol section of the document, and I've no idea what the
> Google advert is doing in there.
> 31) 3.7 Is XEP-0020 the appropriate negotiation method here? I'd have
> thought negotiating a simple Jingle session would be more appropriate?
> 32) The right way of querying for support will be disco/caps.
> 33) 3.8.3 Some clarification of what a message position is would be
> beneficial here.
> I'm cutting off the blow-by-blow review here, as I don't have more
> time to spend on this, but hopefully this will start sufficient
> discussion to work out what the next steps are.
> /K
