[Standards] review of XEP-0301, sections 1-5
markybox at gmail.com
Sat Aug 18 03:00:41 UTC 2012
This is my Reply #1 (of at least 2)
Covering: Minor changes
On Fri, Aug 17, 2012 at 7:16 PM, Peter Saint-Andre <stpeter at stpeter.im> wrote:
> I've started to perform a complete review of XEP-0301. Here is the
> first half of my feedback, in order of reading. Many of these comments
> are nits but others are more significant.
Thanks so much for your great comments about XEP-0301 Real-Time Text:
Since you've made so many comments (mostly clarifications), perhaps I
should do a Version 0.8 before other people review? (I'd start doing
Gunnar's other recommended changes as well)
When you do Part 2, please pay particular attention to Section 6.2
because that's the section that needs the most careful review, since
in the past (Kevin, Gunnar) suggested RFC2119 normatives, but it's an
Implementation Note, and you've told me not to use RFC2119 there. I
was waiting for LC comments, but since you're re-reviewing again,
please pay close attention to section 6.2. Please also let
me know if parts of it need to be moved higher up (e.g. creating a
small "Business Rules" section under "Determining Support" and above
> 1. Please expand acronyms on first use (e.g., TTY, SIP, UI).
> 2. "It can also allow immediate conversation in situations where
> speech cannot be used (e.g. quiet environments, privacy, deaf and hard
> of hearing)."
> This is a bit unclear. For instance, what is a "privacy situation"?
> This could be phrased more carefully, such as:
> "It can also allow immediate conversation in situations where speech
> cannot be used (e.g., in quiet environments, when using text preserves
> privacy better than speech would, and when one or more interlocutors
> is deaf or hard of hearing)."
> Another relevant scenario might be speech transcription. I think
> that's worth adding here:
> "It can also allow immediate conversation in situations where speech
> cannot be used (e.g., in quiet environments, when using text preserves
> privacy better than speech would, during speech transcription, and
> when one or more interlocutors is deaf or hard of hearing)."
Good one, just one word change to match precedent: c/interlocutors/participants/
The only XMPP spec that uses the word "interlocutors" is XEP-0148.
A Google search on site:xmpp.org/extensions shows that lots
of XMPP specs use the word "participant".
One comment: The phrase "deaf and hard of hearing" is the phrase
standardized by Hollywood for "English SDH" = "Subtitles for the Deaf
and Hard of hearing", so I have decided to adopt the same phrase, as
the more-PC "hearing impaired" phrase is currently being deprecated
nowadays as deafies often prefer to use the word "deaf" anyway.
However c/and/or/ is acceptable.
> 3. "For a visual animation of real-time text, see Real-Time Text Taskforce."
> If the author provides an image file, the XSF can host this page on
> xmpp.org. I would prefer to do so.
This may be treated as a separate task versus 0.8 -- Creating a
generic GIF animation (100% brand free) may take a while, since I now
have to hand-edit each individual frame, or modify the RealJabber UI to look
simpler/more genericized (eliminate OS-specific looks, etc) and run
multiple tools to capture it all. For this reason, I may leave the
spec to refer to realtimetext.org for at least a couple more weeks.
> 4. "Reliable real-time text delivery." Please at least make this a
> sentence: "Provide reliable real-time text delivery." More
> substantively, what do we mean by "reliable delivery" here? What level
> of reliability is expected (e.g., so-called "guaranteed delivery",
> at-least-once delivery, at-most-once delivery, etc.)? Are
> acknowledgements required per-stanza (e.g., XEP-0184) or is
> stream-level reliability (e.g., XEP-0198) enough?
[Comment & Suggested Change]
I leave interpretation of "reliable" to the implementer's choice.
Perhaps I should change this to
"Allow real-time text to be as reliable as message-by-message transmission."
Would this be OK?
XEP-0301 can be used with/without XEP-0184 and XEP-0198. And XEP-0301
can be sufficiently "reliable" without either, with a good XMPP
server. You're immune to interference from things like NAT,
firewalls, port blocking, corporate firewalls -- as long as messages
are deliverable, and the server isn't configured to block extensions.
So, it essentially means, roughly, "whenever standard messages are
reliable, then XEP-0301 is reliable too" (for the large part)
> 5. What do you mean by "network traversal mechanisms"? We hope that
> all stanzas will traverse the network. :) Perhaps you mean NAT traversal.
Yes, but I was told by at least two people to not be specific about it.
I used to mention "NAT" but removed it several revisions ago.
It can also be corporate firewalls that block out-of-band connections
but keep the XMPP port open. It can be carrier NAT.
> 6. "Compatible with multi-user chat (MUC) and simultaneous logins." =>
> "Be compatible with multi-user chat (MUC) and simultaneous logins."
> 7. "Protocol design ensures integrity of real-time text" => "Ensure
> integrity of real-time text". More substantively, what is meant by
> "integrity" here? See for instance RFC 4949:
> $ data integrity
> 1. (I) The property that data has not been changed, destroyed, or
> lost in an unauthorized or accidental manner. (See: data integrity
> service. Compare: correctness integrity, source integrity.)
> 2. (O) "The property that information has not been modified or
> destroyed in an unauthorized manner." [I7498-2]
> Usage: Deals with (a) constancy of and confidence in data values,
> and not with either (b) information that the values represent
> (see: correctness integrity) or (c) the trustworthiness of the
> source of the values (see: source integrity).
> Depending on the definition of "integrity" being assumed here,
> end-to-end encryption might be needed. More clarity, please. :)
Integrity is "correctness integrity", and is already accomplished by
section 4.6 "Keeping Real-Time Text Synchronized"
> 8. Please separate "allow extensions for new features" from "ensure
> integrity of real-time text" -- these are two quite different requirements.
I merged it to a single bullet to keep it 4 bullets per requirement.
I'll figure out how to juggle things around. But I might just delete the phrase
since it hopefully is already obvious (due to the extensible nature of
XMPP -- the "X" in XMPP stands for :-)
> 9. "Allow XMPP to follow the ITU-T Rec. F.703" => "Help enable XMPP
> applications conform to ITU-T Rec. F.703" (as far as I understand it,
> the RTT spec will not by itself enable XMPP applications to conform to
> F.703, and I don't think XMPP itself could be said to so conform).
Agreed, you need other specs, such as audio and video, combined with this.
> 10. "The bounds of seq is" => "The bounds of seq are"
> 11. "Recipient clients MUST ignore <rtt/> containing unsupported event
> values." => "Recipient clients MUST ignore <rtt/> elements containing
> unsupported event values."
> 12. event='cancel'
> "Clients MAY use this value to signal the other end to stop
> transmitting real-time text."
> What if the sending client disables RTT? Does it send a message with a
> body element and then no subsequent messages containing <rtt/>
> elements? Does it need to signal that it has disabled RTT? Could
> 'cancel' be used for that purpose?
(1) Sender disabling RTT while in the middle of typing shouldn't
automatically send a body element. The user should hit Enter or click
Send, before the message is transmitted. Turning off audio/video
abruptly stops audio/video, so turning off RTT should abruptly stop
real-time text. The message is still waiting in the sender's send
textbox, still in the middle of being composed -- and the sender might
want to finish composing the message without RTT. The <body> only
occurs when the message is actually "sent" (send button or hitting
Enter as usual).
(2) The sender software preferably should send <rtt event='cancel'/>
(the only element transmitted in response to a user manually disabling
RTT). When I say "clients", that means either the sender or
recipient, either or both ends can transmit <rtt event='cancel'/>.
It's not necessary for the receiver of <rtt event='cancel'/> to
automatically transmit <rtt event='cancel'/> in response.
Maybe a note about this is needed somewhere?
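A minimal sketch of points (1) and (2), in Python. The class and method names are hypothetical, and the stanzas are simplified strings (the <message/> wrapper and xmlns are omitted). It only illustrates that disabling RTT sends a lone cancel with no <body/>, and that an incoming cancel triggers no outgoing stanza.

```python
from xml.sax.saxutils import escape


class RttSession:
    """Toy model of one side of a chat (all names are illustrative)."""

    def __init__(self):
        self.rtt_enabled = True
        self.draft = ""               # text still sitting in the send textbox
        self.peer_sent_cancel = False

    def user_disables_rtt(self):
        """User switches RTT off mid-message: send only a cancel, keep the draft."""
        self.rtt_enabled = False
        return ["<rtt event='cancel'/>"]   # no <body/> is sent here

    def user_presses_send(self):
        """The <body/> goes out only when the user actually sends the message."""
        body, self.draft = self.draft, ""
        return ["<body>%s</body>" % escape(body)]

    def receive_cancel(self):
        """Incoming cancel: note it locally and reply with nothing."""
        self.peer_sent_cancel = True
        return []
```

What the client does locally after receiving a cancel is left open in this sketch; the point is only that nothing is transmitted in response.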
P.S. Side note. For MUC, behavior is different, as already explained
in MUC section. I designed XEP-0301 so that receiving RTT doesn't
have to cause extra outgoing stanzas. This makes it more immune to
amplification attacks, and increases viability of RTT during MUC. So
exiting participants should only send <rtt event='cancel'/> and
recipients shouldn't send extra stanzas in response to incoming
stanzas. Each user in MUC enables/disables real-time text only for
themselves, and incoming RTT doesn't cause additional outgoing
stanzas. This avoids creating a vector for amplification attacks, and
allows reasonably harmless use of RTT during MUC.
(Though we might someday need a server extension for controlling
transmission intervals, or a server mechanism for limiting RTT only to
lecturers / teachers / conference speakers / transcription engines,
since sometimes it's useful to only have one or two real-time text
users in a MUC. Such as <rtt> being only allowed for room operators,
etc. Possibly future server disco parameters. Right now, it's too
early, need a year or two of trials, before determining the needs of
real-time text during MUC)
> 13. "This id attribute refers to the <message/> stanza containing the
> <body/> that is being edited (See 'Business Rules' in XEP-0308). When
> used, id MUST be included in all <rtt/> elements transmitted during
> message correction of the previous message. The whole message MUST be
> retransmitted via <rtt event='reset'/> (Message Reset) when beginning
> to edit the previous message, or when switching between messages (e.g.
> editing the new partially-composed message versus editing of the
> previously delivered message)."
> Examples would help here. In particular, when you say "the whole
> message MUST be retransmitted" I assume that means something like this:
The actual existing text is: "The whole message MUST be retransmitted
via <rtt event='reset'/> (Message Reset) when beginning to edit the
previous message, or when switching between messages (e.g. editing the
new partially-composed message versus editing of the previously
delivered message)."
I don't say to retransmit the message via <body>, only via <rtt
event='reset'/> when beginning to edit a message.
Can you clarify how I can fix the sentence, to be more clear? As
explained in talks between me and Kevin, the Message Reset is useful
to re-populate the real-time message buffer with the text of the
message currently being edited; and this also allows it to function
properly even if the sender lost chat history, improves compatibility
with concurrent logins and MUC, and this also allows graceful
backwards-compatibility degradation so it doesn't "fail ungracefully"
(e.g. trying to edit a non-existent message, or unsynchronized message
text between sender and recipient). That's the only real purpose of
doing a Message Reset
The <body> only occurs when the message is actually "sent" (send
button or hitting Enter as usual).
XEP-0301 should (in most cases) ideally cause no changes at all to
<body> sending behaviour.
Refer to my XEP-0308/XEP-0301 combined examples in the August 4th
emails to the mailing list.
Suggestions welcome for a clarified sentence!
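As a rough illustration of what "retransmitted via <rtt event='reset'/>" means when starting to correct a previous message, here is a minimal Python sketch. The function name is hypothetical, the 'urn:xmpp:rtt:0' namespace is an assumption on my part, and the surrounding <message/> stanza is omitted. Note that no <body/> is produced.

```python
import xml.etree.ElementTree as ET

RTT_NS = "urn:xmpp:rtt:0"  # assumed namespace for <rtt/>


def reset_for_correction(message_id, full_text, seq):
    """Build the <rtt/> element sent when starting to edit a delivered message.

    message_id is the id of the <message/> being corrected (see XEP-0308);
    full_text is the whole current text of that message.
    """
    rtt = ET.Element(
        "{%s}rtt" % RTT_NS,
        {"event": "reset", "seq": str(seq), "id": message_id},
    )
    t = ET.SubElement(rtt, "{%s}t" % RTT_NS)
    t.text = full_text   # the entire message, not just a change
    return rtt
```

Subsequent <rtt/> elements for the same correction would carry the same id; the <body/> is still sent only when the user actually sends the corrected message.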
> 14. I think it would be good to make it 100% clear that the <body/>
> element is not qualified by the RTT namespace:
> The real-time message is considered complete upon receipt of a <body/>
> element in a <message/> stanza.
> The real-time message is considered complete upon receipt of a
> standard <body/> element (i.e., qualified by the 'jabber:client'
> namespace) in a <message/> stanza.
Got it, though I've moved the convoluted parenthesis to the end of
the sentence, to make it more readable for some of the spec's target
audience:
"The real-time message is considered complete upon receipt of a
standard <body/> element in a <message/> stanza (i.e. the <body/>
qualified by the jabber:client namespace, and commonly transmitted in
a <message/> stanza)."
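A small Python sketch of that check, assuming the stanza arrives as a complete XML string. The function name is hypothetical; the only point is that the <body/> must be in the jabber:client namespace, not the RTT one or any other.

```python
import xml.etree.ElementTree as ET

CLIENT_NS = "jabber:client"


def completes_real_time_message(stanza_xml):
    """Return True if the stanza is a <message/> carrying a standard <body/>."""
    msg = ET.fromstring(stanza_xml)
    if msg.tag != "{%s}message" % CLIENT_NS:
        return False
    # find() matches on the namespace-qualified tag, so a <body/> from
    # another namespace is not mistaken for the standard one.
    return msg.find("{%s}body" % CLIENT_NS) is not None
```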
> 15. I like the previous suggestion to change "0.7 seconds" and "0.3
> seconds" to "700 milliseconds" and "300 milliseconds" respectively.
I had kept 0.3 and 0.7 because the timing isn't critical, and can fluctuate.
Saying "700" or "300" sounds a bit precise, but I'm aware it sounds
scientifically better :-)
> 16. Why is support for the <e/> element only RECOMMENDED for senders?
> Given that most users will hit the backspace key (or equivalent)
> fairly frequently, I'd argue for REQUIRED.
That's not true, because:
1. Transcription. Many transcription engines don't support
backspacing; Sprint Captioned Telephone displays corrections in
brackets right after the error.
2. Bots don't need spell checkers :-) News ticker bots. Real-time
stock quote bots.
3. Basic Real-Time Text.
All message changes are transmitted only using message resets, which
only need <t/> ... all message edits, including backspaces, are supported.
4. Combining Append-Only Real-Time Text
and Basic Real-Time Text (whenever <e/> is otherwise needed). A major
potential implementer has indicated they prefer this method for
simplicity (low CPU overhead compared to section 6.4.1 "Avoid Bursty
That's why <e/> is only RECOMMENDED for senders.
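To make case 4 concrete, here is a minimal Python sketch of a sender that never emits <e/>: it appends with <t/> when the new text merely extends the old text, and falls back to a message reset otherwise. The function name and the dictionary shape are hypothetical stand-ins for real <rtt/> elements.

```python
def encode_without_erase(previous, current):
    """Describe the <rtt/> to send for a text change, using only <t/>.

    Returns None when nothing changed. "event" is None for a plain
    append and "reset" when the whole message has to be retransmitted.
    """
    if current == previous:
        return None
    if current.startswith(previous):
        # Append-only: send just the newly typed tail.
        return {"event": None, "t": current[len(previous):]}
    # Any other edit (backspace, mid-message change): resend everything.
    return {"event": "reset", "t": current}
```

The trade-off is bandwidth: a backspace costs a full retransmission, but the sender needs no erase logic at all.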
It appears the spec fails to explain clearly why <e/>
isn't REQUIRED for senders, so I'm curious: why did you think it should
be REQUIRED? I thought the spec already made clear the many use
cases that don't require <e/>. Suggestions welcome!
> 17. Do the 'n' and 'p' attributes really need to be of type
> nonNegativeInteger? That seems a bit big for a typical message. For
> example, unsignedLong has a maxInclusive of 18446744073709551615 and
> unsignedShort has a maxInclusive of 4294967295. That seems quite
> enough for one RTT message!
[Comment & Change Made]
You're right, I don't need 64 bits. But I'd prefer closer to 32 bits
rather than 16 bits, for better integrity with seq randomization
("Keeping Real-Time Text Synchronized"), and to reduce concern about
wraparound. Since the bound is actually 31 bits, it can be either
"unsignedInt" or "Integer"; I thought "nonNegativeInteger" was a
convenient catch-all for either.
I'll change to "unsignedInt" anyway, but implementers are free to use
signed integers for convenience in languages like Java.
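For reference, the ranges involved can be written out in a few lines of Python. The XML Schema maxima are standard values (4294967295 is the xs:unsignedInt maximum; xs:unsignedShort stops at 65535); the helper name is hypothetical.

```python
# Maximum values of the XML Schema unsigned integer types.
XS_UNSIGNED_SHORT_MAX = 2**16 - 1   # 65535
XS_UNSIGNED_INT_MAX = 2**32 - 1     # 4294967295
XS_UNSIGNED_LONG_MAX = 2**64 - 1    # 18446744073709551615

# A 31-bit bound is also the largest value of a signed 32-bit integer,
# such as Java's int.
SIGNED_INT32_MAX = 2**31 - 1        # 2147483647


def fits_31_bits(value):
    """True if value is non-negative and fits in 31 bits."""
    return 0 <= value <= SIGNED_INT32_MAX
```

Any value that passes fits_31_bits() is valid both as an xs:unsignedInt and as a signed 32-bit integer, which is why either representation works for implementers.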
> 18. I find the bullet points in Section 4.6 slightly confusing, e.g.,
> "resuming after connecting". What exactly is being resumed and who
> exactly is connecting? If I come online and you're in the midst of
> sending me messages, my client doesn't have anything to "resume",
> although it does need to adjust to the fact that there's a real-time
> text message in progress (somehow).
1. Sender user is typing a message.
2. Recipient user (or MUC participant) signs on.
3. Sender client sends a message reset in the background within 10
seconds or less (section 4.6.3 --
4. Sender user, blissfully unaware, is still continuously typing the message.
5. The Recipient user (or MUC participant) sees sender real-time text
suddenly "catch up and resume"
The sender can be continuously typing, and the clients will ensure
that the real-time text keeps synchronized (resumes) wherever
Can you help me re-word the bullet, to make this clearer?
The Message Reset mechanism (section 4.6.3) provides the magic
ingredient to resume real-time messages, independently of when the
recipient logs on and of whether the recipient client existed
at the moment the sender started typing the message. This
enhances usability and user experience, and prevents real-time text
from becoming lost, including in switching clients, and during MUC
(e.g. transcription/professor/conference presenter sender real-time
text, and allowing quick resumption of real-time text on all
recipients, thanks to the Message Reset mechanism)
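The late-join scenario can be simulated in a few lines of Python. This is a simplified sketch, not the spec's algorithm: the class is hypothetical, only appended text is modeled (no positions or erasing), and a recipient that sees an unexpected seq simply waits for the next 'new' or 'reset'.

```python
class Recipient:
    """Toy recipient that tracks one sender's real-time message."""

    def __init__(self):
        self.text = ""
        self.expected_seq = None   # None means "not synchronized yet"

    def receive(self, event, seq, appended):
        if event in ("new", "reset"):
            # A reset carries the whole message, so sync is (re)gained here.
            self.text = appended
            self.expected_seq = seq + 1
        elif self.expected_seq is not None and seq == self.expected_seq:
            self.text += appended
            self.expected_seq += 1
        else:
            # Missed something (e.g. signed on mid-message): ignore edits
            # until the sender's next message reset arrives.
            self.expected_seq = None


# What the sender transmits: (event, seq, text). The 'reset' stands in
# for the periodic background retransmission of the whole message.
stream = [
    ("new", 10, "Hel"),
    (None, 11, "lo "),
    (None, 12, "wor"),
    ("reset", 13, "Hello wor"),
    (None, 14, "ld"),
]

early, late = Recipient(), Recipient()
for item in stream:
    early.receive(*item)
for item in stream[2:]:      # the late recipient missed the first two
    late.receive(*item)
```

Both recipients end up with the same text, even though the late one never saw the start of the message and the sender never stopped typing.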
Can you help suggest a change to the sentence/phrase, that would make
this a little clearer?
> 19. I found "switching systems, switching windows" confusing until I
> realized that you were talking about switching between devices or
> switching between windows/tabs in a given client. Please expand a bit.
I want to keep bullets compact, so I changed "systems" to "clients".
This now reads "switching clients, switching windows" for this first occurrence.
Is this acceptable?
Switching clients and switching windows are two separate things.
Switching windows can be within the same client too, since some
clients allow opening independent threads of conversation to the same
JID (two simultaneous one-on-one conversations with the same person,
in two separate windows). Or different browser windows (e.g. two
GMAIL windows, two Facebook windows, etc) containing a copy of the
same conversation. Switching clients can be within the same system
or a different system. Also, these JIDs might or might not be
distinguished (in which case, the handling of conflicting <rtt/> is
already handled via "Keeping Real-Time Text Synchronized"). I've
designed XEP-0301 to be resistant to all possible strange
combinations, including people typing a message in one browser window,
then switching browser windows, and keep typing a different message in
the different window etc. The logic in "Keeping Real-Time Text
Synchronized" automatically maintains message integrity in such
I didn't think I needed to explain further, because it sounded too
"implementation specific" -- I carefully designed the real-time text
synchronization to be virtually implementation-independent, for
maximum flexibility, with a rudimentary correctness integrity check,
and recipient resumability. (bare JID or not, <thread/> or not, MUC
that filters full jids, conflicts caused by multiple windows,
conflicts caused by simultaneous logins that aren't distinguished,
etc.) ... But do you think I should mention implementation examples
such as the above?
> (Also under 4.6.3: "switched systems" => "the user switched from one
> device to another".)
I now use "(e.g. user switched between clients)"
Is this acceptable?
> 20. "For implementation simplicity, recipient clients MAY track
> incoming <rtt/> elements per bare JID." Please use the &LOCALBARE;
> entity here so that readers who are new to XMPP understand what we
> mean by "bare JID". (Same for &LOCALFULL; later in this paragraph.)
I didn't know the LOCALBARE / LOCALFULL entities were
conveniences for this, thanks!
> 21. "Conflicting <rtt/> elements, from separate Simultaneous Logins,
> is handled via the remainder of this section." => "The handling of
> conflicting <rtt/> elements from separate Simultaneous Logins is
> described in the remainder of this section."
> 22. "Alternatively, recipient clients MAY keep track of separate
> real-time messages per full JID and/or per <thread/>." It would be
> good to specify what it means to keep track of RTT messages per
> thread. Citing XEP-0201 might help, as well.
> 23. "When the recipient sends a presence update (e.g. from offline to
> online);" ... does the same reasoning apply to other presence updates,
> such as from "away" to "xa" or from "dnd" to "chat"? I'd like to make
> sure that this advice is truly general.
Actually, you're right. I'll tweak this wording to be general.
> 24. "When the conversation is unlocked (e.g. section 5.1 of XMPP IM);"
> ... please consider citing XEP-0296 here, too.
The reason I do not cite XEP-0296, is because:
1. To strengthen XEP-0301, I want to cite only Draft or Final
standards, not Experimental standards. (the only exception is XEP-0308
which is now in LAST CALL state)
2. I cite so many different XMPP standards, my references section is
I offer to cite XEP-0296 when it gets upgraded to "Draft"
I use "e.g." because it implies "etc" which also implies XEP-0296
> 25. "Note: There are no restrictions on using multiple Action Elements
> during a message reset (e.g. typing or backspacing occurring at the
> end of a retransmitted message)." This seems potentially confusing.
> IMHO it would be friendlier for the recipient to process the reset as
> the state of the RTT message at a point in time and for the sender to
> then send additional <rtt/> elements for subsequent modifications.
> (Postel's Law and all that.) However, that's unenforceable so I
> suppose it's OK as-is.
--also split to different thread--
Event 'reset' is identical to 'new' according to section 4.2.2
You can even do this:
if ((event == "new") || (event == "reset")) {
    // Clear existing real-time text buffer
}
// Process action elements in RTT
Some implementers will add a presentation behaviour to 'new'
(e.g. green flash to indicate a brand-new message started)
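A runnable version of that logic, as a minimal Python sketch. The function name and the (name, attributes) tuples are hypothetical stand-ins for parsed action elements, and the defaults reflect my reading of the spec: <t/> inserts at position p (default: end of message), and <e/> erases n characters (default 1) before position p (default: end of message).

```python
def apply_rtt(buffer, event, actions):
    """Apply one <rtt/> element to the recipient's real-time text buffer."""
    if event in ("new", "reset"):
        buffer = ""   # 'reset' is handled exactly like 'new'
    for name, attrs in actions:
        # Clamp the position so a bad value cannot index outside the text.
        p = min(max(attrs.get("p", len(buffer)), 0), len(buffer))
        if name == "t":
            buffer = buffer[:p] + attrs.get("text", "") + buffer[p:]
        elif name == "e":
            start = max(0, p - attrs.get("n", 1))
            buffer = buffer[:start] + buffer[p:]
        # Timing-only elements (Wait Intervals) do not change the text.
    return buffer
```

Because the action elements are processed the same way for every event, a message reset can carry the retransmitted text followed by further typing or backspacing in the same <rtt/> element.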
Also, this is important to XEP-0301 when preserving key press
intervals. I don't want to disturb the cadence of key press intervals
(Wait Intervals) when I'm doing the 10-second Message Resets (in
"Keeping Real-Time Text Synchronized"). In RealJabber, I include 700
milliseconds of recorded typing at the end of each 10-second message
reset, so that recipient software doesn't cause jerky key
> 26. "The Unicode characters of the real-time text needs to make it
> transparently from the sender to the recipient, without unexpected
> modifications after sender pre-processing." s/needs/need/ Also,
> "transparent" doesn't seem like the right word here. I would say "The
> Unicode characters of the real-time text need to be transmitted
> unaltered from the sender to the recipient, without unexpected
> modifications after sender pre-processing." And later "Transparent
> transmission" => "Unaltered transmission".
> 27. "Any inconsistencies that occur during real-time message editing
> (e.g. non-compliant servers that modify messages, incorrect Unicode
> Character Counting) will recover..." That is strangely worded. We
> don't talk about editing elsewhere, and the inconsistencies won't be
> the ones doing the recovering (although the recipient can recover from
> the inconsistencies). I suggest rewording this paragraph.
I'll rewrite this paragraph.
> 28. The section on Unicode Character Counting is blessfully clearer
> than I recall. I suggest clarifying the first bullet point even further:
Thank you for the compliment, the mailing list was abuzz under the
Unicode thread & I worked hard at satisfying as many people as
possible in confusing Unicode technology -- it took me many hours on
that section alone -- because I had to study the Unicode standard
carefully and cite proper terms and specifications.
> Multiple Unicode code points (e.g. combining marks, accents) can form
> a combining character sequence.
> Multiple Unicode code points (e.g. combining marks, accents) can form
> a combining character sequence. In addition, some combining character
> sequences (represented by multiple code points) can be transformed
> into a visually equivalent composite character (represented by a
> single code point), or vice-versa (e.g., under Unicode normalization).
[Comment & Change Made]
That's true. But as we already both know, not all combining character
sequences can be sent as a single composite character (i.e. a single
code point). So I had hoped that was automatically implied, but I
guess I have to teach more Unicode here, eh? :-)
I prefer a shorter version:
"Multiple Unicode code points (e.g. combining marks, accents) can form
a combining character sequence. This can also occur in situations
where there isn't a visually equivalent composite character of a
single code point (e.g. when doing Unicode normalization)"
Is this shorter version acceptable?
> 29. "separate and concurrent to" => "separate from and concurrent to"
> 30. "Pre-processing before generating real-time text, include" =>
> "Pre-processing before generating real-time text includes"
> 31. "sender clients SHOULD ensure the message is in Unicode
> Normalization Form C ("NFC"), specified by section 3 of RFC 5198, and
> within". Note: NFC is not defined in RFC 5198! This would be clearer:
> "sender clients SHOULD ensure the message is in Unicode Normalization
> Form C ("NFC"), as recommended within section 3 of RFC 5198 and
> within..." Furthermore, it would be good to cite TR15 here:
You're right, subtle distinction issue I overlooked --
It's only "recommended by" RFC5198, not "defined by" RFC5198
> 32. "If Unicode combining character sequences (e.g. letter with
> multiple accents) are used for Element <t/> – Insert Text, then
> complete combining character sequences SHOULD be sent." This seems
> more consistent with NFD than NFC (which performs recomposition). That
> is: are you recommending that applications perform compatibility
> decomposition so that they break a composite character into a
> combining sequence? If so, then you really want NFD, not NFC. IMHO it
> would be safer to use composite characters wherever possible, rather
> than decomposing composite characters into combining sequences as a
> recommended practice. In any case, NFC will perform recomposition
> anyway, so this advice might be moot (or at least confusing).
Yes, but not all combining character sequences can be represented by a
composite character. NFC is better and more bandwidth-efficient, and
is more commonly used. It is more frequently used for networked
Unicode. So if a network channel rudely does NFC on my Unicode (e.g.
RFC5198 compliant transmission), the Unicode is not corrupted. I'd
rather be immune to subsequent unwanted normalization passes by the
most common normalization standard, so NFC is better. In compliant
architectures, this is moot, but everyone has told me I need to
specify a normalization standard, so I am following RFC5198
recommendation of using NFC.
Decomposition is not needed; the fact remains: Not all combining
character sequences can be represented by a single code point.
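Both claims are easy to check with Python's standard unicodedata module. This is only an illustration; the sample strings are my own choice.

```python
import unicodedata


def nfc(text):
    """Normalize text to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# "e" + COMBINING ACUTE ACCENT has a precomposed form (U+00E9),
# so NFC turns two code points into one.
composed = nfc("e\u0301")

# "q" + COMBINING ACUTE ACCENT has no precomposed form, so it is
# still a two-code-point combining character sequence after NFC.
uncomposable = nfc("q\u0301")

# Text that is already NFC survives another NFC pass unchanged.
stable = nfc(uncomposable) == uncomposable
```

So a sender that normalizes to NFC still has to count and transmit multi-code-point sequences; NFC only guarantees that a later NFC pass on the wire leaves the text alone.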
> "It is possible for Element <t/> – Insert Text to contain any subset
> sequence of Unicode code points from the sender’s message. This can
> result in situations where text transmitted in <t/> elements is an
> incomplete combining character sequence (e.g. Unicode combining
> mark(s) without a base character) which becomes a complete sequence
> when inserted within the recipient's real-time message (e.g.
> additional accent for an existing combining character sequence). These
> are still complete individual code points, even if the sequence is
When you follow Section 6.4.1, you may run into situations where the
difference of text is a single combining accent character. You have
two choices:
(1) Make section 6.4.1 much more complicated. (not acceptable)
(2) Keep this paragraph instead. Allow implementers to use
differential encoders (section 6.4.1 compliant), like RealJabber
Pick your poison. I prefer the latter.
Which poison do you prefer? :-)
Granted, most GUI controls will automatically not display a Unicode
character until the combining sequence is complete. In this case,
this paragraph is "moot".
(A) However, sometimes in some GUI controls on some platforms,
accent-addition is real-time accumulative, so a text change event
might occur whenever every additional accent is added.
(B) Also, even if the GUI buffers until the sequence is complete --
there are also situations where a person might intentionally overwrite
a shorter combined character sequence with a longer combined character
(in situations where composite characters are not available for
This is the raison d'être of the paragraph. Doing a
simple differential real-time text encoder (section 6.4.1 monitoring
message changes instead of key presses) will have the issue of picking
out an incomplete combining character sequence, in both of the above
cases (A) and/or (B). Therefore, it's not a moot issue.
> I'm not sure how the recipient's client will show a combining mark
> without a base character, but the potential for user confusion might
> be high, here.
That situation should not happen.
I am talking about modifying a valid complete combining character
sequence, to a new valid combining character sequence.
The standalone combining mark will never be displayed -- it's only
See differential encoding according to section 6.4.1 (e.g. turning a
valid two-character sequence into a valid three-character sequence, by
transmitting only the combining mark detected by differential encoder
algorithm in section 6.4.1)
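Here is a minimal Python sketch of such a differential encoder, to show how a lone combining mark ends up in <t/>. It is a simple common-prefix/common-suffix diff of my own, not the algorithm text of section 6.4.1, and the (name, attributes) tuples are hypothetical stand-ins for action elements. Positions are counted in Unicode code points, which is how Python indexes strings.

```python
import unicodedata


def diff_to_actions(old, new):
    """Turn a text change into at most one erase plus one insert."""
    # Length of the common prefix.
    i = 0
    while i < len(old) and i < len(new) and old[i] == new[i]:
        i += 1
    # Length of the common suffix (not overlapping the prefix).
    j = 0
    while (j < len(old) - i and j < len(new) - i
           and old[len(old) - 1 - j] == new[len(new) - 1 - j]):
        j += 1
    erased = len(old) - i - j
    inserted = new[i:len(new) - j]
    actions = []
    if erased:
        actions.append(("e", {"p": i + erased, "n": erased}))
    if inserted:
        actions.append(("t", {"p": i, "text": inserted}))
    return actions


# A valid two-code-point sequence becomes a valid three-code-point one.
old = "q\u0301"          # q + combining acute accent
new = "q\u0301\u0308"    # ... plus combining diaeresis
actions = diff_to_actions(old, new)
sent_text = actions[0][1]["text"]
is_lone_mark = unicodedata.combining(sent_text) != 0
```

The only thing transmitted is the combining diaeresis. It is never displayed standalone: the recipient inserts it at position 2 of a buffer that already holds the base character, so the result is again a complete sequence.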
Perhaps I need to add an additional sentence to make this little tidbit clearer?
If so, what do you suggest?