[Standards] UPDATED: XEP-0301 (In-Band Real Time Text)

Mark Rejhon markybox at gmail.com
Mon Aug 13 15:29:02 UTC 2012

[ Real-Time Text at http://xmpp.org/extensions/xep-0301.html ]

Hello Gunnar,

Thanks very much for the minor corrections to XEP-0301.  I have queued
your edits.  My present judgement is that they can safely stay queued
until LC.  However, I'd like comments from other key XSF members on
ONE bullet that merits further discussion: section 6.2,
Activation/Deactivation.  I'd especially like to know whether
Kevin/Peter/M&M/etc. have major comments about section 6.2.  Kevin says
it doesn't need to be relocated to a "Business Rules" section, and is
therefore okay where it is for LC, I'm told.  But does M&M disagree,
for example?

On Sun, Aug 12, 2012 at 10:13 PM, Gunnar Hellström
<gunnar.hellstrom at omnitor.se> wrote:
> 1.   4.2.3 id
> The text should start with a description of what function this attribute
> supports.  Insert after the title:
> "The id attribute is used to enable real-time correction of the last completed message."

[Minor Clarification Change]
Good suggestion. I'll queue this edit till LC
(unless a 0.8 is warranted prior, e.g. by comments from m&m/peter/kevin et cetera).

> Insert the title of XEP-0308
> XEP-0308 Last message correction
> 2.  4.2.3 id
> On second line. Reception must always be supported if both 0308 and 0301 are
> supported. c/MUST use this attribute if/MUST support reception of this attribute, and
> MUST transmit this attribute if/

[Minor Clarification Change]
Clients that support incoming corrections don't necessarily do
outgoing corrections. Therefore, a different change is better:
c/clients MUST use this attribute if/clients MUST support this
attribute in situations where/
I'll queue this edit, too.

> 3.   6.2.1 Activation guidelines
> Can we really accept the weak indication that discovery is recommended? I think it shall be mandated.
> Proposal:
> c/Before sending real-time text, it is preferable for a sender client
> to/Before sending real-time text, a sender client SHALL/

Peter told me that RFC2119 normatives do not belong in "Implementation Notes".
Therefore, if RFC2119 is used, the whole section 6.2 would
automatically need a relocation upwards closer to "Protocol" instead
of "Implementation Notes".
I'm open to suggestions from others, such as moving it to a "Business
Rules" section (just below "Discovering Support" and above
"Implementation Notes").  Kevin of XSF said that it is fine where it
is.  Still, I agree this item merits some discussion, though I'm not
100% sure it needs to be addressed before LC.
(Comments from others?  Does it?)
Section 6.2: http://xmpp.org/extensions/xep-0301.html#activating_and_deactivating_realtime_text
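
For anyone following along, here is a minimal sketch of what "discovery
before sending" could look like on the sender side.  It assumes the
XEP-0030 disco#info result has already been received as a string, and
that support is advertised with the feature 'urn:xmpp:rtt:0' (the
namespace in section 12).  The function name supports_rtt and the
example JIDs are hypothetical, and this is not text from the XEP.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"  # XEP-0030
RTT_NS = "urn:xmpp:rtt:0"                                # XEP-0301 namespace


def supports_rtt(disco_result_xml: str) -> bool:
    """Return True if a disco#info result advertises the RTT feature.

    Hypothetical helper: a real client would get this from its XMPP
    library (or from XEP-0115 caps) rather than parse raw strings.
    """
    iq = ET.fromstring(disco_result_xml.strip())
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    features = {f.get("var") for f in query.findall(f"{{{DISCO_INFO_NS}}}feature")}
    return RTT_NS in features


# Illustrative stanza only; the stream namespace is omitted for brevity.
example = """
<iq type='result' from='juliet@example.com/balcony' id='disco1'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <feature var='urn:xmpp:rtt:0'/>
    <feature var='http://jabber.org/protocol/chatstates'/>
  </query>
</iq>
"""

if supports_rtt(example):
    print("recipient advertises real-time text; OK to send <rtt/>")
```

Whether this check ends up worded as "preferable" or as a SHALL is
exactly the open question for section 6.2; the code is the same either way.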

> 4. Section 1, intro, last bullet point, to make the description of REACH112
> correct.:
> c/A component within Total Conversation, used by Reach112 [4] in Europe, an
> accessible emergency service with real-time text./The real-time text
> component within Total Conversation, for example used by Reach112 [4] in
> Europe, a project for accessible communication services including emergency
> service.

[Minor Clarification Change]
Thanks for the Reach112 clarification.  I'll queue this edit, too.

> 5. Section 4.1 Example 1 should have a more natural distribution of letters
> in the different <rtt/> elements, appearing from the transmission in regular
> intervals as specified in section 4.1. This comment has been made from
> different sources a number of times.

I already mentioned that the existing example is more readable to a
wider variety of less experienced people.  The smart people (like us)
will figure it out just fine, and the other examples illustrate this
already.   I'm going to cater for the majority here.

> 6. 4.7 Third paragraph. Language correction. This is an enumeration of only
> two elements. Connect them with or instead of comma:
> s/ messages, incorrect/ messages or incorrect/

[Minor Grammar]
Thanks, I'll queue this edit, too.
Note that "e.g." introduces a partial list of examples, and
automatically implies "etc." at the end.  So there are other
situations as well; it's not just two, even though only two are
listed.

> 7. Appendix G:
> 4 s/ Reach112: European emergency service with real-time text./ Reach112:
> European accessible communication and emergency service project with total
> conversation including real-time text./

[Minor Clarification Change]
Thanks for the Reach112 clarification.  I'll queue this edit, too.

> 8.  4.1 xml:lang
> Describe explicit or implicit use of the xml:lang attribute similarly as it
> is described for <body/> in RFC 6121.
> This attribute can introduce alternative language variants of the text in
> messages and other elements.
> The use is described in RFC 6121.
> For XEP-0301 it would be natural to offer the same opportunity to provide
> the alternative languages in the same message.
> This would at least go into section 4.2 RTT attributes and <t/>
> element
> Each language will have its own editing elements and values, so the xml:lang
> attribute should be on the <rtt/> level.
> I propose inserting a new subsection in 4.2
> -----------------------------------------------------------------------------------------------------------------------------------------------
> 4.2.4 Language
> Multiple instances of the <rtt/> element MAY be included in a message stanza
> for the purpose of providing alternate versions of the same real-time text,
> but only if each instance possesses an 'xml:lang' attribute with a distinct
> language value (either explicitly or by inheritance from the 'xml:lang'
> value of an element farther up in the XML hierarchy, which from the sender's
> perspective can include the XML stream header as described in RFC 6120 [
> ]). The support for language variants SHALL follow the principles of support
> for language variants in message bodies specified in RFC 6121[   ].
> This example provides a small part of real-time text in the default language
> English and the alternative language Czech.
> <message from='juliet@example.com/balcony'
>  id='z94nb37h' to='romeo@example.net' type='chat' xml:lang='en'>
>   <rtt xmlns='urn:xmpp:rtt:0' seq='89002'><t>tho</t></rtt>
>   <rtt xmlns='urn:xmpp:rtt:0' seq='32304' xml:lang='cs'> <t>ty</t></rtt>
>  </message>
> --------------------------------------------------------------------------------------------------------------------------------------------------
> The second line from the bottom of 4.1 should be changed from
> "There MUST NOT be more than one <rtt/> element per <message/> stanza."
> to
> "There MUST NOT be more than one <rtt/> element per language variant in each
> <message/> stanza."

My judgement is to leave this out, because it is a non-typical case
of real-time text.  People aren't going to send multiple languages
simultaneously, as people can't type in more than one language
simultaneously.   It is an edge case that would be useful for things
like European Union transcriptions and/or United Nations
transcriptions.   From what I can see, this edge case is easily
handled by having a separate <thread/> for each language (already
recommended in the last sentence of section 4.6).  Each thread can
easily use its own "xml:lang" -- harmlessly -- without xml:lang ever
needing to be specified in XEP-0301 itself.     Heck, it's also
possible to use separate nicknames for each language, or separate MUC
rooms for each language, "TranscriptEN", "TranscriptFR", etc.  There
are many solutions to cover the rare edge case of multiple
simultaneous language transcripts.
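
To make the thread-per-language idea concrete, here is a rough sketch
that builds one <message/> per language, each with its own <thread/>
and its own xml:lang.  The helper name build_rtt_message, the thread
ids, and the per-thread seq values are assumptions for illustration
only, and the jabber:client stream namespace is left out for brevity.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # the xml:lang attribute


def build_rtt_message(to, thread, lang, seq, text):
    """Build one <message/> carrying a single <rtt/> for one language.

    Hypothetical helper; each language variant travels in its own
    stanza and thread, so there is still only one <rtt/> per message.
    """
    msg = ET.Element("message", {"to": to, "type": "chat", XML_LANG: lang})
    ET.SubElement(msg, "thread").text = thread
    rtt = ET.SubElement(msg, f"{{{RTT_NS}}}rtt", {"seq": str(seq)})
    ET.SubElement(rtt, f"{{{RTT_NS}}}t").text = text
    return msg


# Assumption: each thread keeps its own independent seq counter.
english = build_rtt_message("romeo@example.net", "transcript-en", "en", 0, "tho")
czech = build_rtt_message("romeo@example.net", "transcript-cs", "cs", 0, "ty")

for stanza in (english, czech):
    print(ET.tostring(stanza, encoding="unicode"))
```

A receiving client that already separates real-time messages by thread
needs no new logic for this, which is the backwards-compatibility
point.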

I feel that:
(1) This is an ultra-rare edge case.
(2) This edge case does not warrant inflating the spec by 1/2 a page.
(3) The problem can be solved without doing things this way.
(4) It's simpler and more backwards compatible for clients to take
advantage of multiple languages using the above method already
recommended within XEP-0301.  Other techniques are possible too (e.g.
multiple nicks, multiple MUC rooms, etc.) for multiple concurrent
language transcripts.
(5) xml:lang can be used already, in the simpler and more backwards
compatible manner.

I am thinking that many others would agree that XEP-0301 can already
be used with multiple languages more easily in a different manner (and
without complicating the spec for the majority).  On top of that, the
technique you describe is more complicated than the alternative
methods of multiple-language transcripts.  I would suspect that even
Kevin (who originally suggested you were right, and maybe that is why
you bring this up again right now) now agrees with this assessment.
I'd like to hear a comment from Kevin.

Comments from others?

> 10. Appendix A, dependencies need updating.
> [Discussion]
> Dependencies: now only contain XMPP Core and XEP-0020.
> XEP-0020 seems not to be used anymore, but at least XMPP IM, XEP-0030,
> XEP-0085, XEP-0115, XEP-0308
> [Proposal]
> Appendix A, dependencies,
> s/XMPP Core, XEP-0020/XMPP Core, XMPP IM, XEP-0030, XEP-0085, XEP-0115,
> XEP-0308/

[Minor Clarification Change]
Thanks, I'll sync up the dependencies.
However, XEP-0085 isn't a dependency (it doesn't even need to be used
at all), and XEP-0115 isn't needed either (if you use XEP-0030 Service
Discovery instead).  But, yes, the dependencies still need to be
updated.
Note: Precedent shows that people typically leave out 0030
and 0115, so I'll do the same.

> 11. Appendix A. A short name should be assigned.
> [Discussion]
> Some possible short names:
> rtt, xmpp-rtt, real-time-text, xmpp-real-time-text

See section "12. XMPP Registrar Considerations" as "urn:xmpp:rtt:0"
The name will be based on that.

> 12. Chapter 1, 5th bullet point
> [Discussion]
> This point refers to a brand name AOL. I am not used to see brand name
> references in standards. I would guess that that tradition applies to XEPs.
> if so:
> [Proposal]
> s/The Real-Time IM [3] feature found in AOL Instant Messenger./A real-time
> text feature found in some instant messaging systems./

I kind of agree with you, though I'm divided about removing this
entirely, because it's the only mainstream program that currently has
real-time text, and is thus a good example. But I agree about removing
brand names, too.  The existence of mainstream-but-proprietary support
is also a good argument for allowing the existence of open platform
support.   But your change is doable.  I think it's not an LC
showstopper, though.

> 13. Section 4.4, second line
> s/This interval meets/This interval provides an opportunity to meet/
> [motivation] The F.700 requirement is for end-to-end latency ( that provides
> the human experience). We cannot guarantee network delay. Therefore the
> interval is set so that there is a good opportunity to meet the requirements
> in most network conditions.

[Minor Clarification Change]
Queued change is: "This interval makes it possible to meet"

> 14. Section 4.2.3 id
> [discussion]
> It has just been revealed in the XEP-0308 discussions that there may be
> cases when the id is not recognized by the receiver. It may be a just logged
> on receiver, or it may be a MUC server modifying id. Therefore add caution
> for that situation.

Good point, but I will wait until XEP-0308 is updated with handling of
unrecognized 'id'.
I plan to keep in sync with XEP-0308's recommendation.
Also, this does not seem to be an issue warranting action right now,
as reset is always accepted (one way or another) anyway.

> [discussion]
> Extra caution needs to be added so that a bursty output sender does not
> overwhelm a receiver or the network with too frequent packets. Text sources
> such as speech-to-text and cut and paste can produce more than one word per
> 700 ms and may therefore need to transmit more than one word per packet.

All transcription engines I've seen, even with 300-400+ WPM speakers,
tend to output a sudden phrase at a time rather than a word at a time,
so I will instead queue a minor edit: "word or phrase bursts" rather
than "word bursts".

Also consider:
(a) It will average out -- It's not harmful to have several sudden
bursts in short periods.
(b) I already specify a range of 300ms-1000ms.
(c) Even that range is only a recommendation, so you can go less than 300ms.
(d) Vast majority of networks today can handle a higher stanza rate
(e) Implementers can figure out that they can optimize/combine to
lower the stanza rate during transcription output.
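
Point (e) can be sketched roughly as follows: buffer whatever the
transcription engine emits and flush it as one combined chunk no more
often than the transmission interval.  The class name RttBatcher and
the 0.7 second default are assumptions chosen from within the range in
(b); timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field


@dataclass
class RttBatcher:
    """Combine bursty text into at most one chunk per interval (sketch)."""

    interval: float = 0.7                      # seconds; assumed default
    _pending: list = field(default_factory=list)
    _last_sent: float = float("-inf")          # so the first poll can flush

    def add(self, text: str) -> None:
        """Queue a word or phrase produced by the text source."""
        self._pending.append(text)

    def poll(self, now: float):
        """Return the combined text for one <rtt/> stanza, or None.

        None means either nothing is queued or the interval since the
        previous flush has not elapsed yet.
        """
        if not self._pending or now - self._last_sent < self.interval:
            return None
        combined = "".join(self._pending)
        self._pending.clear()
        self._last_sent = now
        return combined


batcher = RttBatcher(interval=0.7)
batcher.add("Hello ")
batcher.add("there, ")
first = batcher.poll(0.0)     # both phrases go out together
batcher.add("how are ")
too_soon = batcher.poll(0.3)  # still inside the interval, nothing sent
batcher.add("you?")
second = batcher.poll(1.0)    # interval elapsed, queued text is combined
```

However fast the source is, the stanza rate stays bounded by the
interval, and a burst of several words simply travels in one stanza.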

> 16. Section 6.2.1 Activation.
> [discussion]
> Close to the end of this section, there is a sentence saying:
> "It is inappropriate to send any further <rtt/> elements, until support is
> confirmed".
> I wonder if there are any one-to-many situations with a very large number of
> recipients when it is not appropriate to expect answers from all recipients.
> If so, this sentence should be deleted.

This is only applicable to one-on-one chat, not for MUC.
Also, there was someone complaining about unsolicited <rtt>, so the
sentence can't be removed.
It can be clarified or upgraded to a Business Rule, though. (with
proper normatives and clarifications)
However, I'll wait for general section 6.2 comments (see above)

> 17. Section 6.2.1 and 6.6.2 Missing reference to XEP-0085
> There are three references to XEP-0085, none of them have the customary
> reference to the reference list and a link to the document.
> One is in section 6.2.1, two are in section 6.6.2

XEP-0085 is "Chat State Notifications" ... Are you sure you are
referring to XEP-0085?  If so, two things:
1. Read again -- click the Refresh button -- the first reference is
there (section 6.2.1).
2. By precedent, only the first reference is linked.  This applies to
everything else too (example: RFC4103 and T.140 are each linked only
the first time).

> 18. Section 6.4.4  "Basic real-time text.
> [discussion]
> This is an odd way of transmitting real-time text by sending the whole
> real-time message in each transmission.

I disagree: It is not odd;
(1) It is one of the most useful sections in XEP-0301 because it
illustrates that XEP-0301 is simpler than it looks;
(2) It is very easy to rapid-prototype.  It allows quick test
programs/demonstrable programs in advance of full implementations.
(3) It can be used with stream compression, to save bandwidth.  (In
fact, if stream compression is used, it uses less bandwidth than all
the alternatives)
(4) Many proprietary real time text implementations, such as Bell
IP-Relay already use retransmits as their version of real time text.
If they don't feel like implementing the full XEP-0301, they should at
least support basic XEP-0301 -- I am leaving the door open to that.
(5) Clients have asked about this already.

Yes, the loss of key press intervals is an important disadvantage.
But I have already said that in that section, too.
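
To show how little code the basic approach needs, here is a sketch of
both sides.  It assumes the retransmission is expressed as an <rtt/>
with event='reset' carrying the whole current text in a single <t/>.
The helper names basic_rtt_element and apply_basic_rtt are
hypothetical, and the surrounding <message/> stanza is omitted.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"


def basic_rtt_element(full_text, seq):
    """Sender side: wrap the ENTIRE current message text in one <rtt/>.

    Assumption: event='reset' tells the receiver to replace, not append.
    """
    rtt = ET.Element(f"{{{RTT_NS}}}rtt", {"event": "reset", "seq": str(seq)})
    ET.SubElement(rtt, f"{{{RTT_NS}}}t").text = full_text
    return rtt


def apply_basic_rtt(rtt):
    """Receiver side: the displayed real-time message is simply the payload."""
    return "".join(t.text or "" for t in rtt.findall(f"{{{RTT_NS}}}t"))


# Simulated typing, including a backspace-and-retype ("Hello" -> "Help").
snapshots = ["Hel", "Hello", "Hell", "Help"]
displayed = None
for seq, text in enumerate(snapshots):
    displayed = apply_basic_rtt(basic_rtt_element(text, seq))
    print(seq, repr(displayed))
```

Note that corrections need no special handling at all here, at the
cost of losing the key press intervals mentioned above.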

However, I have given my rationale for keeping this section
(which is not odd to begin with).  For others, here's the link:
It illustrates an easy use of XEP-0301.  By reading this section, I
think it is pretty clear that it warrants inclusion.

I also prefer to keep the section title, though other good suggestions
are welcome ("Simple Real Time Text", "Easy Real Time Text").  I don't
want to use a more complicated title name for this.   I could add more
text as a caveat, such as "This form of real-time text is also useful
for quick prototypes", etc.  But that sounds too much like an
implementation note.

> [proposal]
> Reinstate the sentence last in section 6.5 adapted to the current guideline
> language of that chapter:
> "If support for the <w/> element is not possible, receiving clients are
> recommended to use an alternate text-smoothing method, such as time-smoothed
> progressive output of received text."

[Implementation-Related Change Made]
Thanks, I've queued this change.
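
For implementers wondering what "time-smoothed progressive output"
might look like, here is one possible sketch: spread the characters of
each received chunk evenly across the transmission interval instead of
displaying the whole chunk at once.  The function name smooth_schedule
and the even spacing are assumptions; other smoothing strategies would
satisfy the sentence equally well.

```python
def smooth_schedule(chunk: str, interval: float = 0.7):
    """Return (delay_seconds, character) pairs for one received chunk.

    Characters are spaced evenly so the last one appears just before the
    next chunk is expected.  The 0.7 s default is an assumed interval.
    """
    if not chunk:
        return []
    step = interval / len(chunk)
    return [(i * step, ch) for i, ch in enumerate(chunk)]


# A client would hand each pair to a UI timer; printing stands in for that.
for delay, ch in smooth_schedule("abcd", interval=1.0):
    print(f"show {ch!r} after {delay:.2f}s")
```

This only approximates the original typing rhythm, which is why it is
a fallback for when <w/> cannot be supported rather than a replacement.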

> 20. Section 4.3.1 Wrong reference
> [discussion]
> Section 4.3.1 says that <body/> is defined in XMPP CORE. It is instead
> defined in XMPP IM [10]

[Minor Clarification Made]
Thanks, I've queued this change.

Thanks for all your suggestions, all of which are essentially minor
changes and mostly implementation-notes related, with the exception
of section 6.2, which warrants further discussion (Kevin, M&M,
Peter?).

Mark Rejhon
