[Standards] RTT, take 2

Mark Rejhon markybox at gmail.com
Fri Jun 24 02:40:27 UTC 2011


Re: XMPP In-Band Real-Time Text
http://www.xmpp.org/extensions/inbox/realtimetext.html

Peter -- Thank you again for your comments!  I've now made roughly 99% of
the minor edits, from everybody. I just have one slightly 'bigger' edit
(rewrite of "Internationalization Considerations"), which I'd like you to
comment on below:

On Thu, Jun 23, 2011 at 9:54 PM, Peter Saint-Andre <stpeter at stpeter.im> wrote:
[about message ID]

> > From what I recall, I think Kevin agreed on erring on the side of
> > simplifying, when I asked if I should remove the msg identifier from the
> > last spec.  It can be re-added as an optional feature, as an additional
> > integrity layer to error recovery.
>
> Kev was right about simplification. I just wanted to gain some clarity
> on whether we had a problem to be solved here.
>

In the future, it may be useful as an improved error-recovery enhancement,
especially to keep track of different real time messages if the specific
message with the <body> element gets dropped/lost somehow. However, we've
found reliability to be excellent, and the current simple error recovery is
solving >99% of our problem cases. So, we all agree, simpler is better.


> On the wire there is no such thing as a code point, there are only code points
> that are encoded using an encoding form like UTF-8 or UTF-16. For
> details, see:
> http://tools.ietf.org/html/draft-ietf-appsawg-rfc3536bis-02
> Given that XMPP is pure UTF-8, I don't see a compelling reason to count
> UTF-16-encoded code points or UTF-32-encoded code points.
>

We have to make it easier for the programmers "on average". Programming
platforms frequently use a different /internal/ format than for the /wire/
format (UTF-8):

If we process in UTF8 directly off the wire, then:
...Languages using UTF8: Easy
...Languages using UTF16: Complicated (more math)
...Languages using UCS4: Complicated (more math)

If we process in UTF16, then:
...Languages using UTF8: Somewhat Complicated (some math)
...Languages using UTF16: Easy
...Languages using UCS4: Minor math

If we process in code points instead, then:
...Languages using UTF8: Minor math
...Languages using UTF16: Minor math
...Languages using UCS4: Easy

This assumes that specific programming languages don't have easy access to
string length/index counting routines for a different Unicode encoding. Some
languages make it difficult, and require manual mathematics.
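
To illustrate why the counting unit matters, here is a rough Python 3 sketch
(not taken from the spec; the helper name unit_counts and the sample string
are made up for illustration). It assumes well-formed text and relies on
Python 3 strings being indexed by code point:

```python
def unit_counts(text):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for text."""
    code_points = len(text)                           # Python 3 str counts code points
    utf16_units = len(text.encode("utf-16-le")) // 2  # each UTF-16 code unit is 2 bytes
    utf8_bytes = len(text.encode("utf-8"))            # UTF-8 uses 1 to 4 bytes per code point
    return code_points, utf16_units, utf8_bytes

# 'a', e-acute, euro sign, and a musical G clef (outside the BMP)
sample = "a\u00e9\u20ac\U0001D11E"
print(unit_counts(sample))   # prints (4, 5, 10)
```

The same four characters count as 4, 5 or 10 depending on the unit, so both
ends have to agree on one unit before lengths or indexes can be compared.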

The advantage is that "Unicode code points" have exactly the same meaning for
all Unicode encodings, according to unicode.org, which gives the specification
consistency. We have therefore decided to go with code point
processing, and to eliminate UTF16. By saying "unicode code points" we don't
have to worry about mentioning "Unicode encodings", as the code points mean
the same thing for all Unicode encodings.
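
As a rough idea of the "minor math" involved, the Python 3 sketch below
converts an index expressed in code points into the matching offset in
UTF-16 code units or UTF-8 bytes. The function names are hypothetical and
not part of the spec, and the sketch assumes well-formed text (no lone
surrogates):

```python
def utf16_offset(text, cp_index):
    """UTF-16 code unit offset of the code point at cp_index."""
    # Code points above U+FFFF need a surrogate pair (two code units).
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_index])

def utf8_offset(text, cp_index):
    """UTF-8 byte offset of the code point at cp_index."""
    return len(text[:cp_index].encode("utf-8"))

text = "a\U0001D11Eb"          # 'a', musical G clef (outside the BMP), 'b'
print(utf16_offset(text, 2))   # prints 3
print(utf8_offset(text, 2))    # prints 5
```

A platform that stores strings as UTF-16 or UTF-8 internally would do the
equivalent walk over its own buffer; the point is only that the conversion
is a short loop in either direction.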

As you can see, we now believe that this is the only 'practical' alternative
to minimize "average complexity" spread across all possible programming
languages.  I would welcome an alternative that is simpler, but currently
our research appears to show that the "code point" method has the compelling
advantage of avoiding too much complexity for any specific programming
language.



> Yes, I figured that out, I just wasn't sure why we needed that kind of
> complexity. I'll look at it again.
>

I've added a small introduction paragraph to the top of the Use Cases, to
explain the purpose of the progressive examples leading up to the final
real-world example.

I plan to submit the document with all the minor edits by this weekend -- which
will make it on time

Thanks,
Mark Rejhon