[Standards] Comments on XEP-0301 (possible impact on -0308 in Section 4.2.3)

Gunnar Hellström gunnar.hellstrom at omnitor.se
Sat Aug 4 20:43:40 UTC 2012

On 2012-08-04 12:38, Mark Rejhon wrote:
> On Fri, Aug 3, 2012 at 4:54 PM, Paul E. Jones <paulej at packetizer.com> wrote:
>> Mark, et al,
> Excellent comments.  (Now replying to remainder of unreplied comments)
>> Section 1:
>> "and is favored by deaf and hard of hearing individuals who prefer text
>> conversation"
> ----
>> I suggest you strike the above. What I have been....
> It is a compromise between the accessibility folks (and the need to
> get this spec included in accessibility documents, such as Access
> Board), and the mainstream folks (including you).  It is hard to
> satisfy both sides.   I will investigate toning it down further, since
> Peter also agrees with you, but it's a balancing act.   I agree it has
> broad utility.
<GH> I do not mind deleting it. What we need is protocols that provide 
good usability.
>> I would suggest these slight changes:
>> "Real-time text is suitable for smooth and rapid communication,
>> complementing the existing en bloc mode for sending text messages.
> I would not use the term "en bloc", as it's not a phrase common in XMPP
> terminology.  Perhaps "message-by-message mode for sending text
> messages" is more appropriate.
<GH> Agree.
>> Section 2:
>> "Next Generation 9-1-1 / 1-1-2 emergency services"
>> This leaves out so many countries; very America/Europe centric.  See
>> http://en.wikipedia.org/wiki/Emergency_telephone_number.  Should we just get
>> rid of "9-1-1 / 1-1-2"?  Point is you want this for next generation
>> emergency services anywhere in the world.  I would not capitalize "Next
>> Generation", either.
> Good point.  If Peter agrees with you, I'd genericize it but still
> mention an example, such as:
> - "Text messaging to next-generation emergency services (such as 9-1-1
> and 1-1-2, etc.)"
> Or even just (e.g. 9-1-1).
<GH> "IP-based emergency services" is also a suitable term.
>> Section 4:
>> Showing <body> not being included in the previous message containing <rtt> in
>> Example 1 might lead people to believe this is expected. I would suggest
>> making the first example one that had <body> in the end, since I suspect
>> that will be the typical case.  Perhaps a word about this somewhere might be
>> useful (if not already covered).
> It's valid to do it either way: sending <body/> together with the
> final <rtt/>, or separately from it, are both valid behaviors.
> If you're typing fast and then hitting Enter quickly (before the 700ms
> interval is up), it's quite efficient to send the last <rtt/> in the
> same stanza as the <body>.  You could even send two consecutive
> stanzas rapidly, instead (one each for the <rtt> and for the <body/>).
> If you're typing slowly and hit Enter several seconds after finishing
> the message, your last <rtt/> can easily be sent several seconds
> before the final <body/>.
> Comments?  Peter, Paul?  Any spec clarifications warranted here?
>> Section 4.2.1:
>> Why is "seq" only 31 bits?  Since the same memory is consumed for 31 or 32
>> bits, why not just make it an unsigned 32-bit integer?  And why worry about
>> wrap-around?  I would allow it to occur.  Specify the behavior.
> I used to define it, but the wording was more complex,
> because I had to accommodate languages that don't have easy
> unsigned integers (e.g. Java doesn't have a native unsigned integer
> type).  Syncing up wraparound behaviours is not worthwhile when
> Message Resets occur once every 10 seconds.  When you're transmitting
> <rtt/> every 0.7 seconds, you've incremented only about 15 times
> between resets.  I don't see any situation where incrementing happens
> often enough within a human lifetime to cause a wraparound, unless you
> deliberately set the seq value very close to MAXINT.
> That's why I thought it was simpler to just skip defining a wraparound
> behaviour.
> I welcome alternatives though.  Peter, Paul, comments?
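Paul's "specify the behavior" option could be as small as a single masking rule. The following is a minimal Python sketch, assuming a hypothetical wrap-to-zero rule (XEP-0301, as discussed in this thread, defines none). It also works out the arithmetic behind Mark's argument, assuming one <rtt/> every 0.7 seconds, starting at seq 0, with no reset ever occurring. The function names are illustrative only.

```python
# Hypothetical wraparound rule for a 31-bit seq; XEP-0301 (as discussed
# in this thread) does not define one.
SEQ_MAX = 2**31 - 1  # largest 31-bit value; also fits Java's signed int


def next_seq(seq: int) -> int:
    """Increment seq, wrapping back to 0 after SEQ_MAX."""
    return (seq + 1) & SEQ_MAX


def years_until_wrap(interval_s: float = 0.7) -> float:
    """Years of nonstop transmission (one <rtt/> per interval, starting at
    seq 0, never resetting) before seq would wrap."""
    seconds = (SEQ_MAX + 1) * interval_s
    return seconds / (365.25 * 24 * 3600)


print(next_seq(41))                  # 42
print(next_seq(SEQ_MAX))             # 0
print(f"{years_until_wrap():.1f}")   # 47.6
```

Under these assumptions a wrap needs roughly 47.6 years of uninterrupted transmission at the 700ms interval, which is the basis for treating wraparound as a non-issue unless seq starts near the limit.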
>> Section 4.4:
>> "be approximately 0.7 second" -> " be approximately 0.7 seconds"
>> I would even suggest saying 700ms, as I think that reads better.
>> Section 4.5.1:
>> "Wait n thousandths of a second."
>> I would prefer "wait n milliseconds", especially since the wait time might
>> be 2300ms or more, for example.
>> Section
>> "Support the transmission" --> "Supports the transmission"
>> Section
>> "Support the behavior of Backspace" --> "Supports the behavior of backspace"
>> Section
>> Suggest changing:
>> "Allow the transmission of intervals, between real-time text actions, to
>> support the pauses between key presses."
>>      To:
>> "Allow for the transmission of intervals between real-time text actions to
>> recreate pauses between key presses."
>> "Wait n thousandths of a second" --> "Wait n milliseconds"
>> Section 4.7:
>> " non-compliant servers that modifies messages" --> " non-compliant servers
>> that modify messages"
>> Section 4.7.2:
>> "line breaks MUST be treated as a single character, if line breaks are used
>> within real-time text."
>> -->
>> "any line breaks MUST be treated as a single character."
> Peter seems to agree with you on all of the above minor
> edits, so I'll add them to my to-do list during Last Call (unless I'm
> asked to do a version 0.7 before Last Call).
>> Section 4.5.2:
>> "default value of n MUST be 1" -> "default value of n is 1"
> Peter, shouldn't RFC 2119 normative language be used here?
>> "For the purpose of this specification, the word "character" represents a
>> single Unicode code point. See Unicode Character Counting."
>> Shouldn't the above be moved to Section 3?
> As part of the glossary?   That's an interesting idea.
> Peter, do you have any comments about defining "character" in a
> glossary item?
> Example glossary item:
> character: For the purposes of this specification, "character"
> represents a single Unicode code point.
> Or do you think it's best to keep it in scope at the beginning of the
> "devil-in-the-details" material that sections 4.5.1 and 4.5.2 slowly
> start diving into?
> http://xmpp.org/extensions/xep-0301.html#attribute_values
<GH> It should at least go into the glossary. Can be explained in a text 
section as well.
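Why the definition matters: any position or count value only interoperates if sender and receiver count the same unit. A small Python illustration (the helper names are hypothetical) of how code points differ from the UTF-16 code units that Java's String.length() reports:

```python
def code_points(s: str) -> int:
    """Characters as defined in the thread: Unicode code points.
    Python's len() on a str already counts code points."""
    return len(s)


def utf16_units(s: str) -> int:
    """UTF-16 code units, which is what Java's String.length() reports."""
    return len(s.encode("utf-16-le")) // 2


for sample in ["abc", "a\U0001F600", "e\u0301"]:
    print(repr(sample), code_points(sample), utf16_units(sample))
```

The emoji in the second sample is one code point but two UTF-16 units, so a Java client counting String.length() would disagree with a code-point-counting peer. The third sample ("e" plus a combining acute accent) shows the opposite pitfall: one visible glyph, but two code points.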
>> Question on this:
>> "Also, if a Body Element arrives, pauses SHOULD be interrupted to prevent a
>> delay in message delivery."
>> Do you want to prevent a delay or realize a delay?  I believe you want the
>> entire <rtt> element to be fully processed, including delays, before acting
>> on <body>.  I'm not sure how to word that, but the above sentence was not
>> clear to me.
> Observe that I only use a "SHOULD", not a "MUST".
> Actually, you do WANT to interrupt delays, because otherwise you're
> lagging the body delivery.  Real-time text recipients should not be
> penalized with a delay in final message delivery; otherwise they're at
> a disadvantage compared to recipients whose clients don't run real-time
> text.  Also, the software might already be handling <body/> deliveries
> synchronously (e.g. existing instant messaging software), and you
> don't want to modify that logic when adding the real-time text
> feature.  Otherwise you're adding buffering for <body/>, further
> complicating the retrofitting of an existing instant messaging client
> by modifying its <body/> timing logic where it's not really necessary
> to do so.
> That may occasionally mean the last few keypresses surge at the end
> of a message, if the sender hits Enter quickly after finishing their
> message.  But that's "fair" and not noticeable by most users, as the
> largest surge is only 700 milliseconds' worth of typing, which is
> equivalent to one word at most.  No issue occurs if there's a pause
> before the message is sent, and people often take at least half a
> second before hitting Enter anyway.  And people hit Enter less often
> during real-time text anyway.
> Also, if you've got ping spikes and then get 2 or 3 seconds of
> backlogged <rtt/>, you might be 2 or 3 seconds behind in seeing the
> typing.  You'd rather see the typing surge to catch up, after a
> latency spike.  If your implementation does not have logic to catch
> up on the fly (as mentioned in the last paragraph of section 6.5,
> http://xmpp.org/extensions/xep-0301.html#receiving_realtime_text),
> then the arrival of <body/> will do the catch-up for you.
<GH> You should implement catch up.
> And...if you're using shorter intervals (e.g. 300ms, or even 100ms for
> LAN-based XMPP), then the surge is pretty much unnoticeable for the
> most part, since it often takes at least 100ms for someone to hit
> Enter after finishing typing :-)
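Catch-up logic can be fairly small. The Python sketch below is one hypothetical approach, not necessarily what section 6.5 of XEP-0301 describes: once playback falls more than a threshold behind the sender, the pending <w/> intervals are scaled down so that the lag above the threshold is absorbed. The 1000ms threshold, the way backlog_ms would be measured, and the function name are all assumptions.

```python
def catch_up_waits(waits_ms: list[int], backlog_ms: int,
                   threshold_ms: int = 1000) -> list[int]:
    """Return shortened <w/> intervals when playback lags behind.

    waits_ms     -- pending wait intervals not yet played back
    backlog_ms   -- how far playback currently lags behind the sender
    threshold_ms -- lag tolerated before compressing (assumed value)
    """
    total = sum(waits_ms)
    if backlog_ms <= threshold_ms or total == 0:
        return list(waits_ms)  # small lag: keep the original pacing
    # Remove only the lag above the threshold, never more than all the waiting.
    excess = min(backlog_ms - threshold_ms, total)
    factor = (total - excess) / total
    return [round(w * factor) for w in waits_ms]


print(catch_up_waits([500, 500], backlog_ms=1500))  # [250, 250]
```

Scaling every pending wait by the same factor keeps the relative rhythm of the typing while it speeds up, which is the "surge to catch up" behaviour described above; dropping waits outright would also catch up, but less smoothly.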
> Or if you want your implementation to finish playing <rtt/> before
> displaying <body/>, you can.
> Or if you wish, you can even play back instantly (skipping the remainder of
> <w/> elements) and then compare to <body/> to make sure the final
> real-time message is identical to <body/>.
> There are many ways to interpret this, all still interoperable, but
> interrupting the pauses when <body/> arrives is obviously the fairest
> (everyone sees the full body equally), the easiest, and the most
> retro-fittable way (for existing clients).
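A receiver-side sketch of the SHOULD, in Python, assuming a hypothetical queue of already-parsed actions where ("w", ms) is a wait and ("t", text) appends text (position attributes and <e/> are left out for brevity; the class and method names are illustrative). When <body/> arrives, pending waits are dropped, the remaining text is applied at once, and the result is compared with <body/>, which is displayed regardless.

```python
from collections import deque


class RttPlayback:
    """Hypothetical receiver playback state for one incoming message."""

    def __init__(self) -> None:
        self.text = ""
        self.pending = deque()  # ("w", ms) or ("t", text) not yet played

    def enqueue(self, actions) -> None:
        self.pending.extend(actions)

    def step(self) -> int:
        """Play the next action; return how many ms to pause afterwards."""
        kind, value = self.pending.popleft()
        if kind == "w":
            return value
        self.text += value
        return 0

    def on_body(self, body: str) -> bool:
        """Interrupt pauses: flush remaining text, then show <body/>.

        Returns True if real-time playback matched the delivered body."""
        while self.pending:
            kind, value = self.pending.popleft()
            if kind == "t":
                self.text += value  # waits are skipped, not honoured
        matched = self.text == body
        self.text = body  # the delivered message is authoritative
        return matched


rx = RttPlayback()
rx.enqueue([("t", "hel"), ("w", 300), ("t", "lo")])
rx.step()                   # "hel" is shown; a 300ms pause would follow
print(rx.on_body("hello"))  # True: the pause is skipped and "lo" applied at once
```

Because on_body() never waits, a client that already delivers <body/> synchronously can call it from its existing message handler unchanged, which is the retrofitting argument made above.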
>> Section 6.3:
>> Whether there is a visible cursor or not, the client has to take steps to
>> render text properly.  Since a cursor is not something sent via the
>> protocol, I see no point talking about it.  I'd remove this section.
> It's increasingly realistic to just remove this section now that
> I have eliminated the <c/> elements by requiring senders to send empty
> <t/> elements.  Implementers could technically implement this on their
> own.  As you may remember, RealJabber has a remote cursor, and
> it is quite useful.
> I'd still like to keep mention of a remote cursor somewhere, since
> it's relevant from the perspective of why a client can choose to
> transmit empty <t/> elements (solely for the purpose of a remote
> cursor) -- so it can't be removed entirely.
> Personally, I'd rather keep the section there, but perhaps shorten it
> significantly (perhaps half its size or less).
> Comments?
<GH> I think it can be kept.  It is in an implementation notes chapter.
>> Section 6.4.4:
>> I'm not sure what this is telling me.  Why are <t> and <e> "unsuitable for
>> most general-purpose clients"? And why encourage a device to use reset
>> rather than provide more complete support?  We know rendering is the bigger
>> challenge, but receivers must accept what is sent.  I see no reason to
>> suggest a sender be lazy.
>> I'd suggest removing this section unless there is something here of high
>> value that's going over my head.
<GH> Especially the second paragraph should be deleted. The first is a 
method. It ends by saying that mid-message editing is not feasible with
this method. The second paragraph describes a method that implements 
mid-message editing anyway with some negative effects. Then it is not 
append-only anymore.
> A long 1.5 hour one-on-one talk with a future implementer (that
> controls over 100 million users), revealed they would prefer this
> algorithm.   Section 6.4.4 has a fairly high value as a result.
> Append-only real-time text can still preserve key press intervals,
> when appending is being done.
> It can be removed while still allowing the implementer to use that algorithm.
> However, I should also note that I am a user of Sprint Captioned
> Telephone at www.sprintcaptel.com which uses proprietary HTML-based
> append-only real-time text (voice recognition transcription).
> Corrections are done by addendums rather than backspacing.  This is a
> perfect second use case for append-only real-time text.  Also, a lot
> of relay services (i711.com, relaycall.com, ip-relay.com, etc.) all
> use proprietary HTML-based real-time text that does not use
> editing but only backspacing.  This is a THIRD perfect use case too.
> I'd rather such services (which I use, since relay services are the
> only real way I can make phone calls to hearing people) implement
> standards-based real-time text such as XEP-0301.
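For illustration only, an append-only sender could handle the occasional non-append change by falling back to a message reset, which is the trade-off Paul questions above. In this Python sketch appended text goes out as <t/>, and any other change (backspace or mid-message edit) retransmits the whole text with event='reset'. The seq attribute and other details are omitted, and the function name is hypothetical.

```python
from xml.sax.saxutils import escape


def append_only_update(sent: str, current: str):
    """Return (rtt_xml or None, new_sent) for an append-only sender.

    sent    -- text the recipient has already been given
    current -- text now in the sender's input field
    """
    if current == sent:
        return None, sent  # nothing to transmit
    if current.startswith(sent):
        added = current[len(sent):]
        return f"<rtt><t>{escape(added)}</t></rtt>", current
    # Not a pure append: retransmit everything rather than edit in place.
    return f"<rtt event='reset'><t>{escape(current)}</t></rtt>", current


print(append_only_update("hel", "hello")[0])  # <rtt><t>lo</t></rtt>
print(append_only_update("helo", "hel")[0])   # <rtt event='reset'><t>hel</t></rtt>
```

The sender stays trivially simple, but every correction costs a full retransmission and loses key press intervals for the corrected text; a receiver still has to support <t/> and <e/> fully, since it cannot know which strategy a sender uses.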
<GH> The implementor can figure this out anyway. And the receiving side 
should make no shortcuts in protocol implementation, only the sending 
side. So, you can delete all of 6.4.4.
>> Section 7.4.2:
>> It seems that all of the examples show <w> used between every key
>> press.  However, if sampling the input buffer (as recommended earlier in the
>> text), one may not know the time between keystrokes.  Perhaps the device
>> samples the buffer and sees:
>> "a"
>> "app"
>> This would translate to:
>> <rtt><t>a</t></rtt>
>> <rtt><t>a</t><w n="100"/>pp</rtt>
>> Right?
> No...for "a" then "app"
> 1. When you sample first-pass, you've detected the addition of "a"
> (difference between "" and "a")
> 2. When you sample second-pass, you've detected the addition of "pp"
> (difference between "a" and "app")
> 3. The <w/> interval between steps 1 and 2 is the difference in system
> time between step 1 and step 2
> Which results in:
> <rtt><t>a</t><w n="100"/><t>pp</t></rtt>
> RealJabber samples every text change event rather than sampling at a 100ms
> interval, so there'll be <w/> elements for each character, see:
> http://xmpp.org/extensions/xep-0301.html#monitoring_message_changes_instead_of_key_presses
> Section 6.4.1 already says "In addition, if Preserving Key Press
> Intervals is supported, then Element <w/> – Wait Interval records the
> time elapsed between text change events."
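Mark's three steps can be sketched in Python as follows. The sketch assumes the input buffer is sampled on each text change event with a millisecond timestamp, and that each sample only appends to the previous one (edits would need <e/> and position attributes, which it does not attempt). The function name is hypothetical; for "a" then "app" it reproduces the <rtt/> given in Mark's reply.

```python
from xml.sax.saxutils import escape


def samples_to_rtt(samples):
    """Turn (time_ms, buffer_text) samples into one <rtt/> payload.

    Each newly appended run of text becomes <t/>, preceded by a <w/>
    holding the time since the previous change (append-only assumption).
    """
    parts = []
    prev_time, prev_text = None, ""
    for time_ms, text in samples:
        if not text.startswith(prev_text):
            raise ValueError("edit detected; this sketch handles appends only")
        added = text[len(prev_text):]
        if not added:
            continue  # no change since the last sample
        if prev_time is not None:
            parts.append(f'<w n="{time_ms - prev_time}"/>')
        parts.append(f"<t>{escape(added)}</t>")
        prev_time, prev_text = time_ms, text
    return "<rtt>" + "".join(parts) + "</rtt>"


print(samples_to_rtt([(0, "a"), (100, "app")]))
# <rtt><t>a</t><w n="100"/><t>pp</t></rtt>
```

Sampling on every change event, as RealJabber does, yields one <w/> per character; sampling on a coarser timer simply produces longer <t/> runs with fewer <w/> elements, as in the "pp" case.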
>> Related to <w>, suppose I type "h" and then "e" with about a 100ms delay.
>> Further, suppose the IM client's 700ms timer fires and sends "h" on the wire
>> like this:
>> <rtt><t>h</t></rtt>
>> Now, the client restarts the 700ms timer, after which time it sends:
>> <rtt><w n="100"/><t>e</t></rtt>
>> Is this correct?  So, there was a 700ms "collection" delay, some message
>> transmission delay (perhaps 100 or 200ms) and then an artificial delay
>> inserted of 100ms.  So, between "h" and "e", the user might actually wait
>> 700+200+100 = 1000 milliseconds?
>> Or, does the receiver maintain a running clock and as soon as the message
>> arrives, it sees that w=100, but its internal "wait timer" is already at
>> 700+200ms, so it displays "e" immediately?  (I assume this is the case and
>> it should be described.)
> That would be a poor implementation.
> Two better implementations:
> 1. (PREFERRED, RealJabber method) If you use a resetting timer (timer
> that restarts after an idle period), your timer restarts upon the
> first keypress, so both keypresses 100ms apart will fit in same
> <rtt/>, rather than being split to separate <rtt/>.
> ---OR---
> 2. If you have a proper synchronous timer implementation (independent
> of the first keypress after idle):
> If you type "h" near the end of a 700ms interval, you are going to end up with:
> <rtt><w n="650"/><t>h</t></rtt>
> Then in the next 700ms cycle, you transmit:
> <rtt><w n="50"/><t>e</t></rtt>
> In both situations, you're going to have the same 700ms lag (assuming
> previous action elements in <rtt/> were already buffered).
> Regardless, in both correct scenarios, "h" will be displayed 100ms
> before "e" (assuming stable network ping :-)
> I didn't think I needed to explain these scenarios in the spec....but should I?
> Comments?
<GH> I did not think more explanation was needed. Maybe insert a 
sentence on these two timing variants: "The transmission interval timer
may keep going during idle periods, or be started when a keypress is
detected."  It is a bit too much of an internal-process topic, but maybe
acceptable to have.
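Mark's two timer variants, and Paul's question about where the 100ms actually ends up, can be checked with a small Python simulation. Assumptions: a constant one-way network latency, a receiver that starts playing each <rtt/> when it arrives (or when the previous one finishes), and plain <t/> elements without seq or other attributes. All function names are hypothetical.

```python
INTERVAL_MS = 700  # transmission interval discussed for section 4.4


def build_packets(keystrokes, resetting_timer):
    """Group (time_ms, char) keystrokes into <rtt/> packets.

    Returns a list of (send_time_ms, actions); actions are ("w", ms) or
    ("t", char).  <w/> values are measured from the start of the packet's
    interval, then from each previous keystroke.
    """
    packets, i = [], 0
    while i < len(keystrokes):
        first = keystrokes[i][0]
        if resetting_timer:
            start = first  # variant 1: timer starts on first keypress after idle
        else:
            start = first - first % INTERVAL_MS  # variant 2: free-running timer
        end, cursor, actions = start + INTERVAL_MS, start, []
        while i < len(keystrokes) and keystrokes[i][0] < end:
            t, ch = keystrokes[i]
            if t > cursor:
                actions.append(("w", t - cursor))
            actions.append(("t", ch))
            cursor, i = t, i + 1
        packets.append((end, actions))
    return packets


def playback(packets, latency_ms):
    """Receiver: start each packet on arrival (or when the previous one ends)."""
    clock, shown = 0, []
    for send_time, actions in packets:
        clock = max(clock, send_time + latency_ms)
        for kind, value in actions:
            if kind == "w":
                clock += value
            else:
                shown.append((clock, value))
    return shown


def to_xml(actions):
    # No XML escaping: fine for the plain letters used in this demo.
    return "<rtt>" + "".join(
        f'<w n="{v}"/>' if k == "w" else f"<t>{v}</t>" for k, v in actions
    ) + "</rtt>"


keys = [(650, "h"), (750, "e")]  # "e" typed 100ms after "h"
for resetting in (True, False):
    pkts = build_packets(keys, resetting)
    print([to_xml(a) for _, a in pkts], playback(pkts, latency_ms=200))
```

In both variants "h" is shown at 1550ms and "e" at 1650ms, so every keystroke appears 700ms plus the network latency after it was typed and the 100ms spacing survives; the receiver never accumulates the 700+200+100ms gap. The free-running variant produces exactly the <w n="650"/> and <w n="50"/> packets from Mark's second scenario.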
> Cheers
> Mark Rejhon
