[Standards] XEP-0301 0.5 comments [Sections 6 and beyond]

Kevin Smith kevin at kismith.co.uk
Wed Jul 25 10:21:52 UTC 2012

I'll elide as much of this as I can...

On Tue, Jul 24, 2012 at 12:37 AM, Mark Rejhon <markybox at gmail.com> wrote:
> [Part 2 of 2, continued, ultra-long discussion regarding Kevin's comments]
> Note: Due to the large number of comments from Kevin, I'm focussing on
> addressing Kevin's concerns for now.
> I'd love to hear comments from others (Gunnar, Peter, Matt, etc) on the
> discussions between me and Kevin.
> On Mon, Jul 23, 2012 at 10:32 AM, Kevin Smith <kevin at kismith.co.uk> wrote:
>> 6.1.4 - "it is acceptable for the transmission interval of <rtt/> to
>> vary" - yet earlier there was a SHOULD saying it doesn't vary, wasn't
>> there?
> [Comment]
> http://xmpp.org/extensions/xep-0301.html#transmission_interval
> In the "Transmission Interval" section, I said "approximately 0.7 second"
> and it refers to "continuously-changing message"

My reading of 6.1.4 is that it's fine to vary the interval
significantly - when talking about varying for low-bandwidth concerns
I immediately thought we were talking about order-of-magnitude
variation. If this isn't the intention I think 6.1.4 could do with
tightening up.

>> 6.2.1 - I suspect this should be more prominent than buried inside
>> Implementation Notes
> [Comment & Question]
> I'm glad you think this section is important enough to be part of the
> Protocol.  Activation Methods is quite an important inclusion in the
> specification, even though some people may disagree (Gunnar prefers real
> time text to be activated at all times, for example -- and technically I
> agree -- but realistically, implementers want to choose their own activation
> mechanisms).
> Perhaps I could split it into a "5.1. Business Rules" section (ala XEP-0085)
> but I'm not sure that this is appropriate.
> Alternatively, I can add it as a "6. Business Rules" (bumping Implementation
> Notes as a section 7)
> Peter, David, et cetera from XSF, any comments?

I checked some other XEPs and decided it's probably fine where it is.

>> 6.2.1 - I think that presence decloaking is probably a better approach
>> to this than sending init.
> [Change Made & Comment]
> "Signalling first (by transmitting <rtt [[[event='init'/> as the first
> <rtt/> element.)"
> -- The primary purpose of <rtt event='init'/> is not for disco. Therefore,
> decloaking has nothing to do with init (unless init is used for disco)
> -- That includes any reason, such as activating before typing.  Some
> implementers want an activation feature (e.g. button, menu, preferences,
> etc)
> -- Timing of activation can be separate from the timing of the sender beginning to
> compose text.   The existence of <rtt event='init'/> allows decoupling
> activation timing from actual transmission of real-time text.
> -- Activation/Deactivation may occur multiple times during the same chat
> session.  It's useful for signalling re-activation of real-time text after
> <rtt event='cancel'/> because some implementations might otherwise ignore
> real-time text for the remainder of the chat session after  receiving <rtt
> event='cancel'/>.
> -- Clients can send <rtt event='init'/> even if they have some real-time
> text to begin immediately. (i.e. <rtt event='init'/> immediately followed by
> <rtt event='new'/>)  ... Thus, it is acceptable for implementers to always
> send <rtt event='init'/>
> -- Theoretically <rtt event='init'/> could be made REQUIRED, but that's not
> a good idea, especially because recipients can come online after the sender
> has already started composing a message (includes MUC and simultaneous login
> situations).
> Even if we eliminate the implicit discovery requirement and "Determining
> Support" is always followed, the use of 'init' is still a requirement for
> some implementers for activation/deactivation, to decouple the timing of
> beginning of real-time text, from the timing of actual creation of a
> real-time message.   So, sending init is still useful even if you follow
> Determining Support.
> Can you provide any suggestions of any further clarifications for <rtt
> event='init'/>?

Perhaps it would be worth clarifying that init can be used to indicate
that RTT is activated prior to RTT being sent.

>> 6.4.1 - It might be useful to reference some method of calculating
>> this. It's not immediately obvious to me that it's trivial to work out
>> edits without resorting to something that ends up polynomial in the
>> worst case (or oversimplifying the edit), so some guidance would be
>> handy here.
> [Comment]
> -- It's actually simpler than it looks
> -- It is a linear calculation (CPU expense linearly proportional to message
> length), not polynomial.
> -- Text change event occurs every key press, so most of the time, message
> change is only 1 character between text change events!
> -- In almost all cases, text change events will generate only 1 character of
> change.   Except for things like autotext, autocorrect, and pastes -- then
> it's a single block event.
> -- Therefore, I don't bother to compute more than one edit per change event.
> It's not worth optimizing for this edge case (it'll just look like one
> larger text change).

It's worth suggesting this, then - this was what I referred to in my
previous mail as 'oversimplifying the edit'. You're right that there's
a trivial linear implementation if you're prepared to reset inner
blocks that haven't changed due to bounding changes.
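
To make the "trivial linear implementation" concrete, here is a minimal
sketch in Python. It is only an illustration, not text from the XEP: the
function names and the (position, removed count, inserted text) tuple are
my own assumptions, and mapping the result onto actual <rtt/> action
elements is left out. It strips the common prefix and common suffix and
treats everything in between as one replaced block.

```python
from typing import Tuple


def single_edit(old: str, new: str) -> Tuple[int, int, str]:
    """Return (position, removed_count, inserted_text) turning old into new.

    One pass over the common prefix and one over the common suffix, so the
    cost is linear in the message length.
    """
    limit = min(len(old), len(new))

    # Length of the common prefix.
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1

    # Length of the common suffix, never overlapping the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1

    removed = len(old) - prefix - suffix
    inserted = new[prefix:len(new) - suffix]
    return prefix, removed, inserted


def apply_edit(text: str, pos: int, removed: int, inserted: str) -> str:
    """Apply the edit produced by single_edit (hypothetical helper)."""
    return text[:pos] + inserted + text[pos + removed:]
```

For a change from "abcdef" to "aXcdYf" this yields one edit that removes
"bcde" and inserts "XcdY": the unchanged "cd" in the middle is resent,
which is the cost of only tracking the bounding changes.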

>> 6.4.3 - this says that implementations "may" do this, and I suspect
>> that it really is discouraged rather than truly optional (indeed, the
>> language elsewhere says as much).
> [Change Made]
> Beginning now says "It is possible for sender clients to implement
> [[[Message Reset]]] as the only method of transmitting changes to a
> real-time message."
> Although it already explains why it's discouraged, I've now removed the word
> "may" to reduce the permissive-sounding tone.

Thanks. I'd be inclined to add something like "This method of sending
is discouraged for general-use clients" or something. Plenty of wiggle
room for people who feel they have a need to do it, while providing
sensible guidance to people who really just want to know the Right
Thing to do.

>> 6.4.4 - this looks like something discouraged, too, but this isn't
>> mentioned that I can see.
> [Comment]
> 6.4.4 is useful if it's not humans generating real-time text.  For example
> transcription bots, gateways, etc.  So it's quite simple/useful to have
> append-only real-time text (and you can still do key press intervals, if
> needed, unless you're outputting fully-transcribed words one full word at a
> time)

Perhaps a "This sending model is unsuitable for general-purpose
clients, but useful if mid-message editing capabilities..." would

>>  6.5 - " In addition, it is best to process <w/> elements using
>> non-blocking programming techniques." - I don't really know what this
>> is doing here.
> [Change Made]
> "In addition, it is best to process <w/> elements asynchronously, to avoid
> interfering with client operation."
> This is simply a generic comment that indirectly refers to timers and
> multithreading, rather than inserting a "Sleep" statement in the middle of a
> single-threaded program.  Sleeping causes freezing in user interfaces,
> especially with long <w/> elements: for example, <w n='500'/> could cause a
> 1/2 second program freeze while that action element is processed, which is
> bad.   If you're doing MUC, or multiple windows, and you have lots of <w/>
> elements simultaneously, they all need to be processed asynchronously on
> their respective real-time messages.

I think we need to assume that anyone implementing the spec has at
least a basic competence - this seems to be stating the obvious and
more likely to annoy people that we're treating them like twits than
to help anyone.
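
For anyone who does want it spelled out, a minimal sketch of the
non-blocking approach, using Python's asyncio. The (kind, value) action
tuples are a hypothetical stand-in for parsed action elements, not the
XEP's wire format; the only thing taken from the discussion above is that
a wait is expressed in milliseconds.

```python
import asyncio


async def play_actions(actions, buffer):
    """Apply (kind, value) actions to buffer, a list of characters."""
    for kind, value in actions:
        if kind == "wait":
            # Yield to the event loop instead of sleeping the whole client.
            await asyncio.sleep(value / 1000)
        elif kind == "insert":
            buffer.extend(value)


async def main():
    # Two real-time messages (e.g. two chat windows) played back concurrently.
    first, second = [], []
    await asyncio.gather(
        play_actions([("insert", "Hi"), ("wait", 50), ("insert", "!")], first),
        play_actions([("wait", 20), ("insert", "Yo")], second),
    )
    return "".join(first), "".join(second)
```

Calling asyncio.run(main()) plays both messages back without either wait
stalling the other.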

>> - this seems inconsistent with an earlier section that (I
>> think) was recommending or mandating support for multiple full JIDs.
> [Comment]
> I made comments earlier.
> Even when senders send to the full JID, recipients can just process
> real-time messages based on bare JID.
> This makes it simpler for implementers of clients to implement only a single
> real-time message per chat window.
> It is a significant user interface complexity concern to gain the capability
> of multiple simultaneous real-time messages in the same chat window user
> interface.

I don't think you need to expose multiple RTTs to the user, just to
track them independently.
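
As a rough sketch of what I mean (Python, with hypothetical names; the
"show whichever resource sent RTT most recently" policy is just one
assumption about what a client might choose to display):

```python
class RttTracker:
    """Track one real-time message per full JID, show one per bare JID."""

    def __init__(self):
        self._by_full_jid = {}  # full JID -> current real-time text
        self._latest = {}       # bare JID -> full JID that last sent RTT

    def update(self, full_jid, text):
        bare_jid = full_jid.split("/", 1)[0]
        self._by_full_jid[full_jid] = text
        self._latest[bare_jid] = full_jid

    def visible_text(self, bare_jid):
        """Text to show in the single chat window for this contact."""
        full_jid = self._latest.get(bare_jid)
        if full_jid is None:
            return ""
        return self._by_full_jid[full_jid]
```

Each resource keeps its own state, so an edit from one resource is never
applied to another resource's text, while the user still sees only one
real-time message per chat window.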

> Also, it is intuitive behaviour because of "Keeping Real-Time Text
> Synchronized" so a simultaneous login user switching computers, the
> recipient would simply see their copy of the real-time message switch
> instantly from the partially-composed message from the old system to the
> partially-composed message from the active system.

This doesn't seem likely, does it? Shutting down one computer where
you were typing, booting up another computer and the text you were in
the middle of composing being there ready for you? And if someone did
implement this, they could just send a reset?

Anyway, not a hill for me to die on.

>> 6.6.5 - seems somewhat out of place. How many systems are there these
>> days that can't keep up with a human typist? And telling people that
>> they need to make their applications flicker-free just seems odd.
> [Comment]
> When retrofitting real-time text to an existing chat program, some tend to
> make their software cause a repaint every key press, so it's meritworthy to
> make a brief mention, although I agree it is quite borderline from the
> perspective of a "specification".   At 10 key presses per second for a
> 120WPM typist, the real-time message can be repainted 10 times per second,
> and if the repaint is not done efficiently, it can flicker or consume CPU,
> etc.
> Suggestions for better wording are welcome.

It's not that a better wording seems needed - I just don't see that
this is helpful in the middle of a protocol spec - others may

If it does remain, I think it can be usefully contracted to something like:
"Implementors should be aware that processing incoming RTT can cause
many updates to a message each second".

This should probably also get a mention in Security Considerations as
a fun DoS vector.

>> 6.6.6 seems redundant.
> [Comment]
> It might be, but "Total Conversation" is quite significant among
> accessibility circles in Europe, it's not used as much in North America, but
> I must satisfy this audience, too.

I don't think this is true - there's no need to name-drop other groups
or protocols into the middle of a spec unless it adds value to

> Improved wording is welcome though, but
> I don't think anything in 6.6.6 affects protocol

Right, it's because it doesn't have any effect on the document that I
don't think it needs to be there.

>> 7 - these examples seem to be to a bare JID, and therefore can't have
>> had caps already indicate support, but lack support discovery. It'd be
>> good to note this.
> [Change Made]
> "For simplicity, these examples use a bare JID, even in situations where a
> full JID might be more appropriate."

Thanks. (Yes, I elided most points where you made changes I was happy
with, only left this one in because of the following comment)

> Also, when the resource is not locked yet (i.e. recipient hasn't replied
> yet) it is fine to send real-time text only to the bare JID.

It isn't - the bare JID doesn't have caps, so you can't be sending
because you know the target supports it, and the bare JID can't have
sent you a reply to your init, so you'll still be in the state where
you're waiting for it to send you an RTT element before you continue
sending, according to the rules in 6.2.1. (Note that we should have
something like MUST NOT (or maybe SHOULD NOT, at a push) in 6.2.1
rather than 'is inappropriate').

>  This makes the
> real-time text show up on all concurrently-logged in resources
> simultaneously.

It needn't, bare-JID handling will usually not send to all resources
unless the clients have activated carbons.

>> 7.4.2 - this includes an RTT including a wait in the element with the
>> body - but once the body is received the RTT state is discarded and
>> the body replaces it, if I remember earlier in the XEP correctly (and
>> it was quite a while ago now).
> [Comment]
> That's right. <snip reasons/>


>> 8 - All of this section seems somewhat out of place in a XEP.
> [Comment]
> I've managed to reduce the size of Interoperability Considerations
> significantly (to the best of my ability), but there are several people
> including actual implementers (outside the XMPP umbrella) that are demanding
> this text be bigger than it is now.  Gunnar made a gateway for SIP-to-XMPP
> interoperable real-time text, and it is a big raison d'etre of keeping
> Section 8, he is also sharing his experiences as well.  I'm also debating
> with people against making this size bigger by not adding too much
> information to this section, since I also agree that it's mostly out of
> scope of this specification.   Gunnar also repeatedly told me I should not
> make this section even smaller too.  Edward Tie wants me to add more TTY
> info to it. (I slipstreamed a small sentence "This can include TTY and
> textphones" after the gateway servers sentences.)  There are other
> implementers outside of XMPP raising a big fuss about how to interoperate
> with XMPP, so I think some *semblance* of section 8 is extremely critical to
> satisfying a particularly vocal and important audience of accessibility
> advocates.
> The current Section 8 is a compromise between what XMPP wants and what
> accessibility implementers want, as XEP-0301 is of interest to accessibility
> vendors, moreso than other specifications, and there are special reasons to
> make XEP-0301 interoperate with other standards used in accessible
> communications.

OK - There is probably value in some brief mention of what you could
expect to come up against as analogues in other networks. I think
lines like:
" SIP is a popular real-time session control protocol, and there are
many implementations of real-time text controlled by SIP. This
includes some emergency service organizations (e.g. Reach 112)."
Seem to be in there to placate people rather than add value to the
document. Getting rid of this, and with your suggested change it seems

>> 10.1 - "It is important for implementers of real-time text to educate
>> users about real-time text. " - this doesn't really seem right.
> [Change Made] -- Good catch, I see the redundancy.
> "It is important for implementors to educate users about real-time text".

It wasn't about the redundant text - it's the spec saying that
implementors are required to start education programs for users! I
think the first sentence could be removed and for the following to be
"It is important for users of real-time text to be made aware..."

>> 10.1 - I think a sensible Privacy note would be to make RTT opt-in.
> [Comment]
> That depends on the market.  Mainstream client? (opt-in)
> Accessibility-market client? (opt-out)   Emergency mode?
> I am in contact with different implementers who will pounce on me if I
> suggest either direction (opt-in versus opt-out).

Note that I carefully said "opt-in" not "prompting the user to accept"
or anything. This is a (very significant) security consideration and
we would be remiss to suggest that it's acceptable for a client to
activate RTT unexpectedly. It is, however, fine for a client designed
for RTT audiences to not prompt a user - they have, after all, chosen
to download a client for its RTT support.

I suggest the following text.
"An implementation MUST NOT activate sending of RTT without
the user's consent".
How implementers choose to interpret 'user's consent' is up to them,
and seems to be a sensible balancing of allowing RTT-specific clients
to behave sensibly when their users are expecting RTT while still
requiring that mainstream clients don't start sending out people's
passwords unexpectedly.

>> 10.3 - "Use of this specification in the recommended way will cause a
>> load that is only marginally higher than a user communicating without
>> this specification." - do you have numbers for this? It seems quite
>> counterintuitive, I'd expect it to increase the server load due to
>> message routing roughly by a factor of the number of RTT transmitted
>> between each typical <body/>.
> [Comment]
> Not always necessarily true -- The average instant message is short, often 1
> to 5 words. (under 40 chars)
> Most people on chat networks don't type large messages.  (Programmer types
> like me do, though)

Note that I said a *typical* body, not the newly-engorged-bodies that
result from RTT.

> Therefore, it's frequently about 2 to 3x more stanzas than what would have
> happened without real-time text.

I think that a 2x increase in load isn't really congruent with "a load
that is only marginally higher".
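
A back-of-envelope model of the stanza count, as a Python sketch. The
assumptions are mine: continuous typing at a fixed speed, one <rtt/>
stanza per transmission interval (the approximately 0.7 second figure
mentioned earlier in the thread), and one final stanza carrying the
<body/>. Real typing is burstier, so treat the result as a rough guide
only.

```python
import math


def stanzas_per_message(chars, chars_per_second, interval_seconds=0.7):
    """Estimate stanzas sent for one message with real-time text enabled.

    Without real-time text the same message costs one stanza.
    """
    typing_time = chars / chars_per_second
    rtt_stanzas = math.ceil(typing_time / interval_seconds)
    return rtt_stanzas + 1  # the final stanza carrying <body/>
```

Under these assumptions a 40-character message typed at 10 characters
per second comes out at 7 stanzas instead of 1.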

I feel this last paragraph is trying to defend itself against claims
that it will increase bandwidth use. It /will/ increase load - almost
all XEPs define protocols that to some extent or another will increase
load. I think we can just say this and move on, we don't need to say
"Oh, but it's not so bad, really, look, look, look, IBB is worse!".

>  On top of this, for an optimized server,
> additional messages sent shortly after the previous message, have only a
> small additional 'cost' in resources.

I don't believe this to be true. Most server cost for an 'optimized
server' would be associated with the routing of the stanza, not the
number of similar messages that will follow it, with the possible
caveat that sending similar messages will compress quite well.

>> 10.3 - "Bandwidth overhead of real-time text is very low compared to
>> many other activities possible on XMPP networks including in-band file
>> transfers and audio" - This is a little disingenuous where IBB is a
>> fallback, and audio never travels over the XMPP network. I'd remove
>> the line completely.
> [Change Made]
> "Bandwidth overhead of real-time text is very low compared to many other
> activities possible on XMPP networks."
> It is more generic.  I actually get questions of how much bandwidth XEP-0301
> uses, at least as a relative basis to other XMPP technologies.  To go into
> further detail, I could insert some details from the document that I made
> for Darren Sturman who said bandwidth considerations make or break a
> standard -- and I could insert the bandwidth-estimation formula I developed
> -- or go into generalities such as "average typing speed consumes about X
> bytes per second".    But I think this sentence should be sufficient;
> questions can be addressed separately from the spec.   Comments?

I think that it would be better to either say nothing, or to say
something that's demonstrably true - the current line seems to be
misleading (see above).

>> 14 - I find the comment acknowledging the invention a bit odd. It's
>> assumed that the XEP is your own work, and "invention" is a term I've
>> more commonly come across in relation to patents - I assume there
>> isn't a patent associated with this that you're assigning to the XSF?
> [Comment & Question]
> There is no patent.
> Ideally, it would be nice to be acknowledged for this idea somehow
> *somewhere*, one way or another, even if it just generically says "Mark
> Rejhon came up with the method of preserving key press intervals, which is
> called "Natural Typing" at R3TF".   (The technique is called "Natural
> Typing" within all of us at R3TF)
> Comments?

I think that wording (or something similar) is less confusing.
"Invention" is a word that sets off alarm bells for me.

