[Standards] XEP-0301 0.5 comments [Sections 1 through 5]

Mark Rejhon markybox at gmail.com
Mon Jul 23 19:17:54 UTC 2012

On Mon, Jul 23, 2012 at 10:32 AM, Kevin Smith <kevin at kismith.co.uk> wrote:

> Right, thoughts about 301 (consider them early Last Call feedback, I
> guess. I think it would be worth addressing them, or at least
> producing an errata list of your expected edits, before asking too
> many other people to review this (e.g. LC) as it took me a
> considerable time and it'd be a shame to waste people's effort
> commenting on things due to be changed):

Excellent comments.
Due to the large number of comments from a key person at XSF (you), I agree
with you.  I have many comments and questions for you first, which I'd
like you to address. I will reply in two emails -- this email covers
sections 1 through 5.  I have prefaced my replies with [Comment], [Question], and
[Change Made] in appropriate places for easy reading.

> == Introduction ==
> This seems mostly fine. I wonder about the reference to
> realjabber.org. Partly because it's a reference to a potentially less
> stable URL, and partly because I think the name is inflammatory - did
> the XSF or Cisco grant the trademark use?

Understood -- animation really helps explain real-time text to people who
haven't seen it before.
Can we use a more well-known site (i.e. realtimetext.org?) since we can put
my animations there too?  Alternatively, can we embed an image, like
XEP-0071 has an embedded image?  (If I make a generic animation, and
convert it to animated GIF format)
*P.S. Will inquire about the name concern (private inquiry to Peter).  I
have talked to many at Cisco over the last two years. Paul E. Jones is a
member of R3TF who works at Cisco.  Nobody has ever brought up a concern
about the RealJabber name, but I will now inquire specifically about this.*

> == Requirements ==
> 2.3.4 doesn't seem quite right - what we want is for it to be possible
> to produce gateways for interoperability - not that XEP 301
> implementations themselves interop with other networks?
> 2.4 Doesn't seem to be about Accessibility.
> 2.4.4 Doesn't make much sense to me.

Peter (or anyone else at XSF), any comments about these?
I'd like a second set of comments on these points, and then I'd like to
confer with the R3TF group about revisions.

> == Glossary ==
> "real-time text"'s definition seems wrong - it isn't necessarily
> transmitted instantly in 301. It would seem more natural to define
> this in terms of real-time, defined on the immediately preceding line.

The Real Time Text Taskforce spent hours debating one of the shortest
possible explanations of real-time text that is also understandable to
laymen.  (The introduction needs to also be understandable by non-technical
people -- the rest of the spec can be technical).  We agreed that the word
was appropriate, because it is "instant" from a human perception timescale.

We discussed and decided it was hugely consistent with today's usage of
the word:
- The technology is called "Instant Messaging" -- which isn't necessarily
instant, either.
- Other "instant" technologies like "Google Instant" have latency, and
also utilize rate-limiting algorithms (like a transmission interval)
- Instant Oatmeal and Instant Rice are less "instant" than the use of the word
"instant" here.
- "instant" is quite self-explanatory to a wide audience, without
explaining metrics.   The scientific-minded people will have some mixed
feelings, but even the scientific-minded in the R3TF have agreed that this
is the lesser evil when attempting to invent an explanation of real-time
text less than 10 words long.

Some considerations:
- Some of us disliked timescales such as "within 1 second", except when
referring to a standardized interval (ITU-T Rec F.700), and this has to be
left out of a one-sentence definition of real-time text.
- Some of us disliked saying "character by character" because it's not
necessarily transmitted character by character in all situations.
- Some of us suggested "in real-time" instead of "instantly" but that's
redundant, and too many people already think instant messaging is
real-time, so I needed to put an 'unexpected' word there such as
"immediately" (rejected word), "within 1 second" (rejected phrase), "live"
(formerly used, but confused too many people), etc ... until we finally
agreed that "instantly" (the chosen word) was the best compromise for a
one-sentence quick explanation of real-time text (management-friendly,
facebook-friendly, tweetable, introduction-friendly, joe-user friendly,

Believe me, we spent hours deciding on how to explain real-time text in
the shortest possible sentence :-)   This was on top of the
many hours we also spent agreeing on the International Real-Time
Text Symbol (the one at www.fasttext.org ...)  It was a huge challenge, and
I'm pleased that we were able to finally come up with something that even
our older mothers (pre-computer era) can understand, and that requires
little further explanation for people who are confused.   We are open to
other suggestions, but keep in mind we are aiming at a standardized
explanation of real-time text that works across a wide variety of audiences
(including all levels of programmers).  We discussed separate technical and
layman explanations of real-time text, but even the technical explanations
need a good one-sentence introduction, too.  It can be generally difficult
to market and explain real-time text.

> == Protocol ==
> "to allow the recipient to see the sender type the message" - I'd
> suggest "to allow the recipient to receive the latest state of the
> message as it is being typed" - RTT doesn't allow us to see the sender :)

[Change Made]
Good idea.  The sentence is already long, so I made a shorter change,
since the sender is already mentioned.
Replacement made: *"...to allow the recipient to see the latest message
It's interpretable by people of both front-end and back-end mindsets -- it
can even mean that, programmatically, the recipient client software "sees"
the text, even if it's not displayed.

> Example 1: I suggest that this could be better demonstrated by not
> cutting at the word boundaries "He", "llo, m", "y Juliet!" maybe, or
> something like that. Experience and/or cynicism say that implementers
> are quite likely to look at the examples, ignore the text, and
> misunderstand what's going on if the examples provide convenient
> semantics not required by the protocol.
> "The bounds of seq is 31-bits, the range of positive values of a
> signed integer" - I'd be inclined to make this something like "The seq
> attribute  has an upper bound of 2147483647 (2^31 - 1). If this upper
> bound is reached the following RTT element will reset the seq
> attribute to 0, i.e. at the upper bound the values would be (in
> successive stanzas) 2147483646, 2147483647, 0, 1)" or words to that
> effect.

[Comment & Question]
I agree, but I don't think it is worthwhile cluttering it by explaining
-- Incrementing only occurs once every second or slightly less (0.7
seconds).  In practical situations, wraparounds will never happen.
-- Even so, incorrectly handled wraparounds are mostly harmless as it will
only result in a brief pause (of less than 10 seconds) because of the
regular "Message Reset" during "Keeping Real-Time Text Synchronized":
What is your opinion?
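
As a rough sanity check on how remote a wraparound is, the arithmetic can be sketched as follows. This is only an illustration: the 0.7 second and 1 second intervals are the figures mentioned above, the function name is made up for this example, and it assumes seq starts at 0 and is incremented once per transmission without ever being reset.

```python
SEQ_VALUES = 2 ** 31  # number of distinct values in a 31-bit seq (0 .. 2^31 - 1)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_wraparound(transmit_interval_seconds):
    """Years of non-stop transmission before seq would wrap back to 0."""
    return SEQ_VALUES * transmit_interval_seconds / SECONDS_PER_YEAR


for interval in (0.7, 1.0):
    years = years_until_wraparound(interval)
    print(f"{interval} s interval: about {years:.1f} years of continuous typing")
```

In practice the regular Message Reset restarts seq long before this, so the figure is an upper bound on a situation that should not arise.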

> Example 1: I suggest that this could be better demonstrated by not
> cutting at the word boundaries "He", "llo, m", "y Juliet!" maybe, or
> something like that. Experience and/or cynicism say that implementers
> are quite likely to look at the examples, ignore the text, and
> misunderstand what's going on if the examples provide convenient
> semantics not required by the protocol.

I don't like this change.  Are you sure?
In some earlier messages, I mentioned that word transmission is *greatly
preferable* to broken-word transmission.
Also, if an implementer misunderstands, this detail is a more harmless
misunderstanding than broken-word transmission.

There are other examples in the spec.
Comments welcome from people other than Kevin and Gunnar -- I need more
comments because I have received comments from others who prefer this
Introduction, so I need to reconcile conflicting advice about the
Introductory example.
XEP-0301 permits you to transmit real-time text any way you want:
character-at-a-time, word-at-a-time, word bursts, original typing
intervals, time-smoothed, etc.   The Introductory Example is unable to
demonstrate all of the possible methods.  IMHO, I chose the 'safest'
introductory example.
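
To make the two chunking styles concrete, here is a minimal sketch that serializes the same sentence both ways. It is an illustration only: the helper name is hypothetical, the namespace is the one I believe the XEP uses (worth checking against the spec), and only the <t/> insert element is used. Either way the recipient ends up with the same text; only the intermediate states differ.

```python
from xml.sax.saxutils import escape

RTT_NS = "urn:xmpp:rtt:0"  # assumed namespace; verify against the XEP


def rtt_elements(chunks, first_seq=0):
    """Build one <rtt/> element per chunk; only the first carries event='new'."""
    elements = []
    for index, chunk in enumerate(chunks):
        event = " event='new'" if index == 0 else ""
        elements.append(
            f"<rtt xmlns='{RTT_NS}' seq='{first_seq + index}'{event}>"
            f"<t>{escape(chunk)}</t></rtt>"
        )
    return elements


word_chunks = ["Hello, ", "my ", "Juliet!"]      # cut at word boundaries
broken_chunks = ["He", "llo, m", "y Juliet!"]    # cut at arbitrary points

for element in rtt_elements(word_chunks) + rtt_elements(broken_chunks):
    print(element)
```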

Again, word transmission is greatly preferable over broken-word
transmission.  (There have been arguments in some accessibility organizations
in some countries: some say they prefer keypress intervals, some prefer
word transmission instead of keypresses, etc.)   I am talking to a guy from
a telco in the UK, and he informed me of a political debate.

Can at least a few more "outsiders" comment on this change, please?  Thanks

> It's not clear to me why setting seq to a random initial value should
> help with MUC or multi-resource cases - in these cases you know the
> full JID of the entities involved and a random start point seems to
> make it harder to understand what's going on, rather than easier.

Imagine a simultaneous login where both logins somehow started typing and the
client doesn't distinguish the two.  If both started at the same seq number
(0), then there will be some situations where the seq looks like it is
incrementing when receiving an <rtt/> from one client and then an <rtt/> from a
different client.   This will cause occasional text scrambling to occur
(until it disappears in the next Message Reset).
-- By randomizing, this doesn't happen in practical situations
-- Also, only a few increments get a chance to occur before it's
randomized again during the next Message Reset.  With a 10 second Message
Reset interval and a 0.7s Transmit Interval, only approximately 15
increments will occur.  Overflow is never going to happen in such an
-- You can distinguish resources using <thread/> and/or the full JID.
 However, this might not always be possible universally in all possible
-- If it ever happens, overflow from a user experience standpoint will be
fairly harmless -- manifesting itself as a small pause due to the regular
"Message Reset" interval.

Implementers can just start at 0 for every event='new' and event='reset',
and accept the risks, or use <thread/> at all times to distinguish the
different RTT threads to avoid conflicts.  Slightly less risky is for
implementers to simply keep incrementing at all times, from the beginning
of the chat session.   Overflow becomes slightly more likely, but would still
require a non-stop chat session lasting longer than a human lifetime!

I'm open to ideas other than randomizing, or even simplifying, but I'd
rather not complicate things by having to explain wraparound situations, due to
the above.  Of course, this is mainly of concern to reduce the odds of conflict
situations whenever there's no full JID support and no <thread/> --
otherwise recipients don't care what seq value you begin to use for a new
real-time message.
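
The simultaneous-login conflict can be sketched with a toy recipient that keeps a single real-time message and does not look at the sender. Everything here is hypothetical (the function names, the tuple layout, the sender labels "A" and "B"); it only illustrates why two clients that both start at 0 can look like one incrementing stream, while unrelated starting values are rejected as out of sequence.

```python
def misapplied_edits(stanzas):
    """Count edits accepted from a sender other than the one whose
    'new'/'reset' started the message currently being tracked."""
    owner, expected, misapplied = None, None, 0
    for sender, event, seq in stanzas:
        if event in ("new", "reset"):
            owner, expected = sender, seq + 1
        elif seq == expected:
            expected += 1
            if sender != owner:
                misapplied += 1      # text scrambling would happen here
        else:
            expected = None          # out of sync until the next new/reset
    return misapplied


def two_clients(start_a, start_b):
    """Both clients start a message, then each sends one further edit."""
    return [
        ("A", "new", start_a),
        ("B", "new", start_b),
        ("A", None, start_a + 1),
        ("B", None, start_b + 1),
    ]


print(misapplied_edits(two_clients(0, 0)))          # both start at 0
print(misapplied_edits(two_clients(1000, 52000)))   # unrelated start values
```

A real sender would pick the starting value with something like random.randrange(2 ** 31); fixed values are used here only to keep the sketch deterministic.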

> "The event attribute MAY be omitted from the <rtt/> element during
> regular real-time text transmission" - what is the alternative
> you're allowing clients, and what is "regular real-time text
> transmission"?

[Change made]
Clarification made: "The event attribute is NOT required for <rtt/> when
transmitting changes to an existing real-time message."

Regular real-time transmission is message changes, like the second and
third stanza of the Introductory example.
However, there are some situations where you always transmit an event
attribute, such as Basic Real Time Text:

> 4.2.2 - "Recipient clients MUST initialize a new real-time message for
> display" - how things are rendered in clients are generally not in
> scope for XEPs, maybe just remove 'for display'?

[Change Made] -- Good suggestion.

> 4.2.2 - "Senders MAY send subsequent <rtt/> elements that do not
> contain an event attribute" if clients want to always send event
> attributes, what would they send?

[Change Made]
Clarification change: *"Sender MAY transmit changes to the real-time
message via subsequent <rtt/> elements that do not contain an event
attribute."   ...*I hope this makes things clearer, in conjunction to above?

> 4.2.2 - "Recipients MUST treat 'reset' the same as 'new'." - I'm not
> sure that's quite right. If recipients want to render 'new'
> differently that seems to be fine. Maybe "Recipients MUST reset the
> state of the current real-time message when receiving a 'reset'
> (returning the real-time message to the same state as when receiving a
> 'new')"?

[Comment & Question]
Yes, rendering 'new' and 'reset' differently is the reason that the two are
still kept separate.
(1) It's possible to receive <rtt event='reset'/> without ever receiving an
<rtt event='new'/>.  (e.g. recipient logs on after sender starts composing,
MUC participant joins, stanza with 'new' is lost, etc).
(2) Basic Real-Time Text at
(3) The 'new' _also_ resets any existing real-time message if any is shown.
 There is always only one real-time message per sending client.

Therefore, the behavior of 'new' and 'reset' needs to be identical, with
the /sole/ exception of presentation (e.g. an implementer automatically
highlighting a new message being started).  The wording that you
suggested doesn't seem consistent with compatibility for (1) and (2).
I propose to keep my existing wording, but add "*(except for
presentation-related behavior)*".  However, isn't adding implementer/user
interface details to Protocol generally a Bad Idea in most cases?   If so,
can you suggest an alternative?

I could even merge 'new' and 'reset' since they are, for all practical
intents, interchangeable in my existing implementation.   However, if I
merged them, I would not be able to tell when a user *began* to compose a
message, versus when the client simply retransmitted an already-in-progress
message, one that might have started long ago.  That's the only big (and
good) reason why 'new' and 'reset' are separate.
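
A minimal sketch of that recipient behaviour, assuming a hypothetical RealTimeMessage class and simplifying ordinary edits to "append text at the end": both events replace the message state identically, and the only difference is a presentation hint.

```python
class RealTimeMessage:
    """Hypothetical recipient-side state for one sender's real-time message."""

    def __init__(self):
        self.text = ""
        self.just_started = False  # presentation hint only, not protocol state

    def on_rtt(self, event, text):
        if event in ("new", "reset"):
            # Both events replace whatever was there before.
            self.text = text
            # Only 'new' tells us the user has just begun composing.
            self.just_started = (event == "new")
        else:
            # No event attribute: a change to the existing message
            # (simplified here to appending text at the end).
            self.text += text
            self.just_started = False


message = RealTimeMessage()
message.on_rtt("reset", "Hello")   # e.g. recipient joined mid-message
message.on_rtt(None, " there")
print(message.text, message.just_started)
```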

> 4.2.2 - event='init' - I'm reading the XEP linearly so maybe this will
> be clear later, but at this point in reading the XEP it's not clear to
> me what the inclusion of event='init' buys us.

[Change Made]
New clarified sentence: *"Clients MAY use this value to signal that
real-time text is being activated, prior to sending real-time text"*.
This allows activation to be done separately from creating a new real-time
message, in order to permit implementer behaviours such as
Activation/Deactivation methods (which I do refer to), including
per-session acceptance/rejection mechanisms that implementers may choose to
use (implementers have requested that such mechanisms be possible).

> 4.2.2 - The normatives here don't seem to be congruent. event='cancel'
> is OPTIONAL, yet we have a SHOULD for behaviour on receiving them. Why
> not require recipient support?

[Change Made]
Good observation.  I fixed the chart to make it RECOMMENDED for
recipient support, while continuing to keep it OPTIONAL for sender support.
I don't want to make it REQUIRED since there are situations where it's
not necessary.   You can also end a chat session to do the same thing as <rtt
event='cancel'/>.   Also, many situations, such as transcription services,
might never need to send an outgoing <rtt event='cancel'/>.

> 4.2.3 - I don't think the intent here is clear. Particularly it's not
> OPTIONAL if you're doing RTT correction. So I think we need to tighten

It's still optional, so that's my intent.  But I can improve the wording.

> this up. There's a choice on discovery and it'll affect what needs to
> be said.
>   Choice 1) If you implement 308 and you also implement 301 you MUST
> support (at least receiving) RTT correction and ids are not OPTIONAL
> and MUST be included on the correction RTT.
>   Choice 2) You can implement 308 and 301 yet not support RTT
> correction - in which case supporting RTT correction is OPTIONAL, but
> if you do you MUST advertise appropriate disco features and MUST
> include ids etc.

[Comment & Question]
If you implement 308 and 301, you can still do RTT correction without the
'id'.  The main difference is that the real-time text will show up where
the new message normally is (i.e. it becomes a copy of the previous
message).  I explained this carefully in a long email a few days ago to the
standards mailing list -- this is because a Message Reset is done when
switching between messages, so the real-time message will still continue to
function even when doing RTT without the 'id' attributes.  When the last
message is delivered, the real-time message disappears anyway (because of
the "Body Element" chapter).  The inclusion of 'id' is merely to provide a
better User Experience by allowing recipient clients to do things such as
"in-place" real-time editing of the real-time message, rather than in the
normal new real-time message.
But the inclusion of 'id' isn't essential for real-time text to continue to
function during retroactive message editing, thanks to my requirement of
a message reset every time you switch messages (which allows
backwards compatible behavior).

Some of my wording might be unclear, so I'd like ideas of clarifying the
still-optional intent.  Comments?

> 4.3 - "The delivered message in the <body/> element is displayed
> instead of the real-time message" - maybe "The content of the <body/>
> element is considered the final text, rather than the state of the RTT
> calculations"?

[Change Made]
Replacement sentence: *"The delivered text in the <body/> element is
considered the final text, and supersedes the real-time message."*
(I think "RTT calculations" is best held off until the "Action Element"
chapter.  Also the word "calculations" might not be appropriate if the
implementer is simply using Basic Real-Time Text by only using message

> 4.3 - "In the ideal case, the message from <body/> is redundant since
> this delivered message is identical to the final contents of the
> real-time message." - can we s/message/text/ here? Calling child
> elements of <message/> stanzas messages seems potentially confusing.

[Change Made] -- good point

> 4.3.1 - Is this redundant?

Enough people have asked about this that the information is necessary for
clarification.  However, I can simplify or change the words -- but some
variant of explaining backwards compatibility is necessary, and some people
such as Paul E. Jones (who's also at Cisco) really like that XEP-0301 is
backwards compatible, a major advantage of XEP-0301 -- it's not a
standalone real-time text mechanism but an enhancement to an existing
messaging network that generally doesn't even require server modifications
for this protocol to work.    The section heading is also a
convenient linking anchor for "Backwards Compatible" links elsewhere in the

> 4.4 - The discussion of throttling here feels a bit odd. I don't like
> having references to servers dropping messages as part of congestion
> handling, as that's not compliant behaviour. The comments about 0.7
> seconds being fine for not hitting throttles but smaller values
> hitting it seems a bit hit-and-miss - servers are free to implement
> whatever throttling they want, and I'm a little worried about
> recommending here what we think the state of the network is likely to
> be now or in the future.

[Change Made]
I shortened the sentence: "*Conversely, a much shorter interval may lead to
[[[Congestion Considerations(link)]]]*."
I've seen throttling at 1 message per second but I can agree to reduce talk
of throttling/dropped messages in this section, and leave that talk within
the Congestion Considerations section. (wording there can also be improved,

> 4.5 - "the recipient can watch the sender" - this isn't quite right
> (similar to previous comment).

[Change Made]
I have replaced the word "watch" with "see", because "see" is compatible
with a software viewpoint.  (interpreted as "The software sees the changes
to the message" even if the message is not displayed) -- To keep things
simple for a wider variety of readers, I prefer terms that are both
human-compatible and machine-compatible, and "see" seems acceptable in both
Merriam-Webster dictionary (meaning 2b, 3c) and Oxford dictionary (meaning

> 4.5.1 - I'm not sure that the use of quite cryptic one-character
> elements here is terribly useful.

[Comment & Change]
Proposed edit: *"The elements are kept compact in order to save
bandwidth, since a single <rtt/> element can contain a huge number of
action elements (e.g. during [[[Preserving Key Press Intervals(link)]]])"*

There can be almost 100 action elements per <rtt/> transmission in certain
situations such as a key press being held down.
Is this edit acceptable?   (I used to have a similar sentence there before)

One-character elements are needed because preserving key press
intervals for a 120 word per minute typist (my typing speed) can result in
almost 20 action elements per <message/> for a high-quality
full implementation with complete playback of real-time text.  By
using a one-character element, I save on bandwidth significantly.

Also, if a key is held down and you are preserving key press intervals,
then this results in 60 action elements per <message/> stanza at a 1 second
transmit interval (30 <t/> key presses per second and 30 <w/> pauses per
second).  In this case a single <message/> stanza containing only real-time
text can become almost 1 kilobyte in size.  If I made the action element
names one letter longer, that adds another 60 bytes per stanza.   If I used an
average of five letters, I'm potentially adding an additional 300 bytes of
extra overhead per message stanza.
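
The effect of element name length can be measured with a small sketch like the one below. It is illustrative only: the longer names "insert" and "wait" are hypothetical alternatives, the 33 ms pause is an assumed value for a 30-per-second key repeat, and the surrounding <message/> and <rtt/> wrappers are left out. Exact byte counts depend on details such as whether an element has a closing tag, so the figures quoted above are best read as approximations.

```python
def action_elements(insert_name, wait_name, presses=30, wait_ms=33):
    """Serialize `presses` one-character inserts, each followed by a pause."""
    pair = f"<{insert_name}>a</{insert_name}><{wait_name} n='{wait_ms}'/>"
    return pair * presses


compact = action_elements("t", "w")
verbose = action_elements("insert", "wait")  # hypothetical longer names

compact_bytes = len(compact.encode("utf-8"))
verbose_bytes = len(verbose.encode("utf-8"))
print(f"one-letter names: {compact_bytes} bytes")
print(f"longer names:     {verbose_bytes} bytes")
print(f"extra overhead:   {verbose_bytes - compact_bytes} bytes per stanza")
```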

Therefore, it is necessary to keep the size of action elements very small.
Even though cryptic, the names were carefully chosen while meeting the
bandwidth reduction requirement.

<t/> = insert *T*ext.  I avoided <i/> to prevent confusion with HTML
<e/> = backspace (*E*rase).  I avoided <b/> to prevent confusion with HTML
<d/> = forward *D*elete
<w/> = *W*ait element
*p* = *P*osition
*n* = common name for count

So the shortness saves a significant amount of bandwidth, while making it
possible to preserve key press intervals as seen in:

> 4.5.1 - I think this has been commented on elsewhere, but using
> 'characters' here seems to be less clear than talking about code
> points. I understand the desire to mask implementers from needing
> exposure to code points, but I don't think that's going to ultimately
> help uptake or interoperability.

Further discussion is welcome.

> 4.5.1 - I think if there are going to be SHOULDs in supported features
> we should try to explain in what circumstances it's acceptable to
> ignore the SHOULDs.

I say "For detailed information, see List of Action Elements." which refers
to section 4.3.2
I already explain it subsequently in section  4.5.3 ( through
I also referenced Basic Real-Time Text.

> 4.5.2 - Talking about message length here probably needs clarification
> - is it the number of characters (whatever they mean to different
> people), code points, normalised code points, octets on the wire..

[Change Made]
*"The n attribute is a length value, in number of characters."*
*"The p attribute is an absolute position value, as a character position
index into the message, where 0 represents the beginning of the message."*

I already define what a character is, from the perspective
*"For text modifications, length and position (n and p) is based on
[[[Unicode Character Counting]]]."*
This defines the method of character counting that is being done.

> - This might become clearer later, but at this stage it's not
> clear what 'positions' are.

[Comment & Question]
I thought it was already explained in section 4.5.2:
The p attribute is an absolute position value, as a 0-based index (0
represents beginning of message).
If p is omitted, the default value of p MUST be the current message length (
p defaults to end of message).
Perhaps that is too confusing -- would you mind suggesting an
alternate wording that would make it clearer to people like you?

> - Apart from adding complexity I'm not sure what forward
> delete is buying us vs. backspace.

About complexity: It only adds 5 lines of complexity to the implementation:
About reasoning:
... Reason 1. There are situations where it made a lot of sense to have the
two separate, including recipient-side time-smoothed display which was
something you also suggested.  For example, <e n="5"/> can
be automatically converted to the equivalent <e/><e/><e/><e/><e/> for
time-smoothed display with the cursor animated backwards.  And <d p='10'
n='5'/> can automatically be converted to the equivalent <d p='10'/><d
p='10'/><d p='10'/><d p='10'/><d p='10'/> for time-smoothed display with
the cursor staying stationary.   If we merged the two, then we can't have
distinctive time-smoothed display of either. (Gunnar is a strong proponent
of time-smoothed display)  But of course, it might not be that important,
even to Gunnar.
... Reason 2. Ability to do accurate journalling of edits, for emergency
purposes.  However, this reason can become moot, especially if we're not
using the 'n' argument, since a single-character backspace transmitted can
be indistinguishable from a single-character delete operation (even for
time-smoothed display).
... Reason 3. It slightly simplifies "Monitoring Key Presses Directly" for
http://xmpp.org/extensions/xep-0301.html#monitoring_key_presses_directly ...
(I know that's not the preferred method)
... Reason 4. It simplifies visualizing of text block deletes (i.e. cut
operations), since you're deleting from normal start position.
... There are other reasons.
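
The conversions described in Reason 1 can be sketched as below. This is an illustration under my reading of the draft (backspace removes n characters before position p, forward delete removes n characters starting at p, and p defaults to the end of the message); the function names are hypothetical. Python string indices count code points, which is convenient here.

```python
def backspace(text, n=1, p=None):
    """<e/>: remove n characters before position p (default: end of text)."""
    if p is None:
        p = len(text)
    start = max(p - n, 0)
    return text[:start] + text[p:]


def forward_delete(text, p, n=1):
    """<d/>: remove n characters starting at position p."""
    return text[:p] + text[p + n:]


def smoothed_backspace(text, n):
    """Expand <e n='N'/> into N single <e/> steps; the cursor moves backwards."""
    frames = []
    for _ in range(n):
        text = backspace(text)
        frames.append(text)
    return frames


def smoothed_forward_delete(text, p, n):
    """Expand <d p='P' n='N'/> into N single <d p='P'/> steps; the cursor stays put."""
    frames = []
    for _ in range(n):
        text = forward_delete(text, p)
        frames.append(text)
    return frames


original = "Hello, my dear Juliet!"
for frame in smoothed_forward_delete(original, 10, 5):
    print(frame)
```

Each intermediate frame is what a time-smoothed display would show between the individual steps; the last frame matches the result of the single bulk element.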

If discussion of merging backspace and delete is warranted, let's split
it into its own independent thread, as that is a major topic (if ventured)
with its own subject (e.g. "XEP-0301 Backspace versus Forward Delete").  It
requires full and complete discussion of all angles and implications that
can occur, making sure all advantages and disadvantages can be accommodated
(e.g. alternative mechanisms of accomplishing the above reasons),
before embarking on a significant change to the XEP-0301 protocol.  It has
merit and is worth discussing, but it has distinct disadvantages that everyone
needs to discuss first.

> 4.5.4 - I don't think trusting that nothing in the chain is going to
> transform unicode in any way is going to be sufficient for
> interoperability here. I think we need to consider normalising the
> text before RTT calculations are performed on it. I'm not entirely
> convinced, without going through specs in some detail, that an
> implementation that does choose to do normalisation somewhere on route
> is non-compliant, which is what's asserted here.

Can you re-evaluate 4.5.4 based on my comments below?

> - Ah, OK. So you do require normalisation here - you need to
> say which type is required.

It doesn't matter.  All normalizations work, as long as the normalization
is done /before/ the encoding.  Therefore, it is not necessary to say which
type is required.   It merely says that all code point modifications
need to occur beforehand.

> - This then forbids normalisation again.

That's correct, it's a different part of the workflow.  See this flowchart:
Normalization can occur outside of the RTT codec path.  It's just that code
point modification (including normalization) shouldn't happen *after* the
sender client's encoding of RTT or *before* the recipient client's decoding of
RTT.

> - Question for Unicode experts. Are there any code points that
> would be illegal to transmit on their own, but are legal in
> combination with others? If so, they'd get rejected with stream
> errors, which would probably be bad. This section seems to imply that
> illegal UTF-8 encoding is expected, which is in turn illegal XMPP.

There is no illegal UTF-8 encoding.
It should not imply illegal UTF-8.
A code point is a completely encoded UTF-8 character (including control
codes and non-displayable characters, etc), and I don't say anywhere that a
single code point can be ever broken down.  Therefore, a single UTF-8 code
point is never broken down.
Can you suggest an alternative wording, that you are able to understand

> - "A single UTF-8 encoded character equals one code point" -
> this isn't true, is it?

It's true.  Unless you're misinterpreting "UTF-8 encoded character"?
Unicode.org and Wikipedia agree with me, and my computer programming tests
show accurate (perfect) real-time text with this assumption, even with
random code points.
"UTF-8 encodes each of the 1,112,064 code points in the Unicode character
set using one to four 8-bit bytes"
Can you suggest an alternative wording, so that my explanations are clearer
to people like you?
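
To show what is being counted, here is a small sketch comparing code points, UTF-8 bytes, and UTF-16 code units for a few strings, including the same word before and after normalization. It is only an illustration (the helper name is hypothetical); the point is that n and p stay consistent between sender and recipient only if nothing changes the code point sequence between RTT encoding and decoding.

```python
import unicodedata


def sizes(text):
    """Count the same text three ways."""
    return {
        "code_points": len(text),                            # Python str indexes code points
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,   # e.g. Java/JavaScript string length
    }


composed = "caf\u00e9"                                   # e-acute as one code point
decomposed = unicodedata.normalize("NFD", composed)      # 'e' plus combining accent
emoji = "\U0001F600"                                     # outside the Basic Multilingual Plane

for label, text in (("composed", composed), ("decomposed", decomposed), ("emoji", emoji)):
    print(label, sizes(text))
```

If the sender computed positions on the composed form and something re-normalized the text to the decomposed form before the recipient applied them, every position after the accent would be off by one.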

> - "different internal encodings (i.e. string formats) that is
> different" - s/is/are/

[Change Made]

> 4.6 - "XMPP servers may drop <message/> elements (e.g. flooding
> protection)." - They can't.

[Comment & Question]
Can you suggest an alternative wording that explains situations I've
already observed?  I've seen this happen sometimes -- otherwise, servers
wouldn't survive a DoS scenario.   Picture the scenario where you hold down
a key and you don't buffer the action elements (as recommended in
transmission intervals).  Theoretically, such an implementation would
result in about 30 XMPP messages a second (at the regular typematic output
of 30 characters per second for a key held down).   Real-time text
transmitted immediately per keypress has resulted in lost messages during
some of my tests in the past.  When done on a public server, the server may
actually drop some of the messages through some kind of mechanism.   On a
good day, many servers can handle 30 XMPP messages a second, but that's a
bit extreme.    Also, if an XMPP server decides to buffer the messages
instead, there will eventually be server resource-consumption issues if a
book falls on a keyboard, or a cat sits on a keyboard.  But the servers (as
a matter of DoS code) tend to have some logic to handle a situation of
flooding.

Once the flooding stops, I've seen "Keeping Real-Time Text
Synchronized" detect the presence of lost message stanzas from
jabber.org servers AND talk.l.google.com servers.   So the situation does
happen with jabber.org and with talk.l.google.com when I push the boundaries
of intervals to, say, 16-30 millisecond transmission intervals by holding
down a key.  Sometimes it works when the server is running fast
(successfully delivering 30 XMPP messages per second in an extreme torture
test), but if it's a busy day, the servers can't keep up.

However, my wording may be wrong about what the servers are actually doing
(Are they kicking in DoS protection?   Are they kicking in flooding
protection?  Are they kicking in buffer-overflow protection?).  Sometimes
when I stop the flood test and wait a few seconds, it all catches up; other
times, I detect discontinuous 'seq' incrementing -- proof of lost message
stanzas -- so some kind of network/flooding/DoS/security/etc protection
mechanism seems to be kicking in, somewhere, somehow, along the chain.  Of
course, a 30ms transmit interval might be appropriate for a purpose-built
low-latency gaming XMPP network that intentionally transmits real-time text
at much lower latencies, but I'm sure it's not appropriate for jabber.org ....

Can you describe an alternate wording to replace "XMPP servers may drop
<message/> elements (e.g. flooding protection)." that still allows for
situations where message stanzas are, in certain circumstances, lost?
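
The difference between sending per key press and buffering per transmission interval can be sketched with a small simulation. This is only an illustration with assumed numbers: a held key repeating every 33 ms for about three seconds, a 700 ms interval, and a hypothetical helper that groups key press timestamps into stanzas.

```python
TRANSMIT_INTERVAL_MS = 700


def buffered_stanzas(key_times_ms, interval_ms=TRANSMIT_INTERVAL_MS):
    """Group key presses so that at most one stanza is sent per interval."""
    stanzas, buffer, last_flush = [], [], 0
    for t in key_times_ms:
        buffer.append(t)
        if t - last_flush >= interval_ms:
            stanzas.append(buffer)       # one <message/> carrying many actions
            buffer, last_flush = [], t
    if buffer:
        stanzas.append(buffer)           # flush whatever is left at the end
    return stanzas


held_key = [i * 33 for i in range(90)]   # about 30 key repeats per second
unbuffered_count = len(held_key)         # one stanza per key press
buffered_count = len(buffered_stanzas(held_key))
print(f"unbuffered: {unbuffered_count} stanzas, buffered: {buffered_count} stanzas")
```

No key presses are lost by buffering; they simply travel together, which keeps the stanza rate far below what is likely to trigger any server-side protection.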

> 4.6.1 - I think they need to do more than increment and check they
> increment - I think they need to increment/check in steps of 1.

[Change Made].  I added "by 1" in appropriate places.
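
To illustrate the "by 1" check, here is a minimal sketch in Python. The function name and the list-of-integers input are hypothetical, not taken from the spec, and the sketch ignores Message Reset (which may legitimately restart 'seq' at any value).

```python
def find_seq_gaps(seqs):
    """Return (expected, received) pairs wherever 'seq' did not go up by exactly 1.

    'seqs' is the list of seq values in the order they were received
    within one real-time message (hypothetical input format).
    """
    gaps = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur != prev + 1:
            # Anything other than prev + 1 means a stanza was lost or reordered.
            gaps.append((prev + 1, cur))
    return gaps
```

A check that only tests "did it increase" would accept 7 followed by 9; checking for an increment of exactly 1 flags it as a lost stanza.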

> 4.6.1 - "Recipients MUST keep track of separate real-time messages per
> sender, including maintaining independent seq values" - I think what
> you mean is that they "MUST track RTT per full-JID, and not collate
> across multiple full JIDs", rather than the present text, which
> suggests that they must track multiple RTT streams for a single full
> JID without providing guidance how. I think this needs tightening up
> to be clear of the intent.

[Comment & Question]
No, it is not necessary to tighten up, because I permit clients to simply
use bare JID for tracking real-time text.   Consider a scenario with
simultaneous logins, where one starts typing, pauses the message, and
resumes typing on the other system.   "Keeping Real-Time Text Synchronized"
will automatically swap the real-time message with the real-time message
from the other system (within 10 seconds) thanks to Message Reset.   This
is an acceptable user experience, and I want to
make full JID tracking optional, since it's simpler for implementations to
keep track of only one real-time message.

For example, an implementer might only support one real-time message per
chat window.  In this case, tracking by bare JID is acceptable, as the
"Keeping Real-Time Text Synchronized" will automatically and seamlessly
handle the situation of switching clients during simultaneous logins.
Therefore, it's NOT necessary to tighten up (and shouldn't be done,
because it complicates simple implementations that wish to do a
one-real-time-message-per-chat-window method).    These clients would
presumably not be implementing MUC either.
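
The difference between the two tracking choices can be sketched as follows. This is only an illustration; the function name and the 'per_full_jid' flag are hypothetical, and the sketch assumes a well-formed JID where the resource is whatever follows the first "/".

```python
def tracking_key(jid, per_full_jid=False):
    """Return the key under which a sender's real-time message is stored.

    per_full_jid=False: one real-time message per bare JID
                        (the simple one-message-per-chat-window case).
    per_full_jid=True:  one real-time message per full JID
                        (separate messages for each simultaneous login).
    """
    if per_full_jid:
        return jid
    # Strip the resource: everything after the first "/" in the JID.
    return jid.split("/", 1)[0]
```

With bare-JID tracking, "juliet@example.com/balcony" and "juliet@example.com/chamber" share one real-time message, and the displayed text is swapped to the active client's text at its next Message Reset.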

Single message-per-chat-window implementations (even with Simultaneous
An explanation for this was given several times in the past -- the UX is
actually quite acceptable even when a sender with simultaneous logins has
separate partially-composed messages in each concurrent login and switches
between them.   It works intuitively for end-users; the recipient just sees
the whole real-time message get replaced by the correct sending client's
real-time message (within 10 seconds, thanks to Message Reset).   No
scrambling becomes visible to the end user, thanks to the "Keeping RTT
Synchronized" mechanism.   In more than 99% of cases, simultaneous logins
have only one typist active at one login at a time, so the "Keeping RTT
Synchronized" mechanism works well in these situations -- the recipient's
view of the real-time message simply swaps to the correct one from the
correct client automatically (during the next Message Reset, which would
occur within 10 seconds).

It is acceptable, and simpler for implementers, if keeping track per full
JID is NOT REQUIRED, because I want to permit single-message-per-chat-window
implementations for simplicity.
But do you think I should explain why full JID is NOT REQUIRED?   If so,
I'd love to hear suggestions.

> 4.6.2 - "Recipients MUST freeze the current real-time message" - it's
> not clear what freezing a message means.

[Change Made]
I have made a change: "Recipients MUST keep the pre-existing real-time
message unchanged".
When I said "freeze", I meant pausing the message (not cleared, but
unaffected by subsequent RTT elements).  In the vast majority of
implementations, this would simply mean the user sees a pause in incoming
real-time text (if the software detects out-of-sync), until it
automatically recovers during the next Message Reset, where the user
perceives the message text surging forward and "catching up".   No
scrambling occurs since subsequent conflicting RTT is ignored.
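
A rough sketch of this recipient behaviour, under simplifying assumptions: each incoming element is reduced to a (seq, event, text) triple, 'new' and 'reset' carry the full message text, and any other event simply appends text. The class and method names are hypothetical, and real action elements (erase, wait, and so on) are left out.

```python
class RttReceiver:
    """Keeps one real-time message and freezes it while out of sync."""

    def __init__(self):
        self.text = ""
        self.expected_seq = None
        self.in_sync = False

    def on_rtt(self, seq, event, new_text):
        """Apply one incoming element and return the text to display."""
        if event in ("new", "reset"):
            # A new message or a Message Reset replaces the whole text
            # and re-establishes sync, whatever state we were in.
            self.text = new_text
            self.expected_seq = seq + 1
            self.in_sync = True
        elif self.in_sync and seq == self.expected_seq:
            # In sync and seq went up by exactly 1: apply the edit.
            self.text += new_text
            self.expected_seq = seq + 1
        else:
            # Out of sync: keep the pre-existing text unchanged and
            # ignore edits until the next reset arrives.
            self.in_sync = False
        return self.text
```

If the element with seq 2 is lost, every later edit is ignored and the displayed text stays as it was; the next 'reset' replaces it with the sender's full text and normal processing resumes.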

> 4.6.3 - "Retransmission SHOULD be done at an average interval of 10
> seconds during active typing or composing." - this seems like a lot of
> data getting sent across if these messages are large. I'd be much
> happier saying something like "Retransmission SHOULD NOT be done more
> frequently than once every 10 seconds

Technically, I agree, but I don't like the "SHOULD NOT" because it would
contradict Basic Real-Time Text, which may use a Message Reset every 0.7
seconds for short messages.
Also, with stream compression, message resets actually use little
bandwidth.  It requires far less CPU on a mobile device than the
"Monitoring Message Changes Instead Of Key Press" method, as an example.
Although it has the disadvantage of not having key press intervals, it
makes an implementation simple, especially when you're writing software that
only handles SMS-sized messages (160 characters or less).  There's often
more than 160 characters in overhead in just the <message/> stanza.
In fact, two years ago -- I seem to remember that one of the people at XSF
asked me why I didn't simply retransmit the message every time the message
text changed.  I explained that it would be quite a lot of data for large
messages.   However, XEP-0301 technically allows it, for really simple
implementations of transmitting really short messages, even though I prefer
implementers to support preserving key press intervals (which would
preclude the use of Basic Real-Time Text).

You'll notice that I did include suggestive wording that this interval can
vary to optimize for long messages (i.e. short reset interval for short
messages, long reset interval for long messages, less frequent reset
intervals for slow message changes, more frequent reset intervals for
fast/large amounts of message changes).  I'm open to wording a
warning/caution about bandwidth.   Comments?
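
For a rough feel of the bandwidth involved, here is a back-of-the-envelope sketch. The function name is hypothetical, and the figures are uncompressed estimates that reuse the numbers above (a 160-character message, about 160 bytes of <message/> overhead), not measurements.

```python
def reset_bandwidth(message_bytes, overhead_bytes, interval_seconds):
    """Approximate bytes per second spent on Message Resets alone,
    before any stream compression."""
    return (message_bytes + overhead_bytes) / interval_seconds

# SMS-sized message, reset every 10 seconds
sms_slow = reset_bandwidth(160, 160, 10)
# Same message, reset every 0.7 seconds (Basic Real-Time Text style)
sms_fast = reset_bandwidth(160, 160, 0.7)
# A long 4000-byte message, reset every 10 seconds
long_slow = reset_bandwidth(4000, 160, 10)
```

Under these assumptions the SMS-sized case costs about 32 bytes/second at a 10 second interval and about 457 bytes/second at 0.7 seconds, while the 4000-byte message costs about 416 bytes/second at 10 seconds, which is why a longer reset interval suits longer messages.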

(Replies about section 6 and beyond are split into a separate email message.
Feel free to go ahead and reply without waiting for me to reply about
Section 6 and beyond, which might take me a while)

Mark Rejhon