[Standards] RTT, take 2

Mark Rejhon markybox at gmail.com
Wed Jun 22 19:24:02 UTC 2011


Regarding Real-Time Text: http://xmpp.org/extensions/inbox/realtimetext.html

On Wed, Jun 22, 2011 at 2:56 PM, Simon McVittie <
simon.mcvittie at collabora.co.uk> wrote:

> On Wed, 22 Jun 2011 at 13:05:48 -0400, Mark Rejhon wrote:
> > UTF16 and UTF16LE, and even UCS2 has same behaviour in my RTT spec, so I
> > just say "16-bit Unicode".  Java, C#, ObjectiveC stores strings in 16-bit,
> > and the various flavours of Unicode C++ STL and stdlib++ also store strings
> > in 16-bit as well. Extensive research and testing shows they all process in
> > flat mode like an array of 16-bit integers
>
> IMO you should either count Unicode codepoints (the underlying data model),
> or bytes of UTF-8 (the XMPP wire protocol). Counting in units of what a
> particular implementation uses internally, if it isn't one of those two,
> seems attractive if you use that particular implementation, but complicates
> things further for everyone who isn't.


About the two options you suggested:

-- Codepoints Method: Counting code points would probably be the ideal
choice, even though it would complicate programming in languages that use
16-bit Unicode strings. But it's a viable option.

-- UTF8 Method: This will make things even *more* complicated for some
languages. In Java, how do you quickly calculate the UTF8 size of an inserted
fragment of text that contains two displayable characters and five combining
characters? Yes, you could convert UTF16 to a byte array and count the
length of the byte array, but if you're doing real-time text insertions in
the middle of a string, we're now concerned with UTF8 indexes and offsets,
which requires further calculations, or researching which Java class library
can do the calculation...  (Same for C# and other 16-bit Unicode
languages.)
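
To make the UTF8 bookkeeping concrete, here is a minimal Java sketch of the
byte-array conversion mentioned above. The class and method names
(Utf8Offsets, utf8Length, utf8Offset) are hypothetical and not part of any
spec or library; only String.getBytes and StandardCharsets (Java 7+) are
standard calls.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Offsets {
    /** UTF8 byte length of a fragment, found by encoding it and counting bytes. */
    static int utf8Length(String fragment) {
        return fragment.getBytes(StandardCharsets.UTF_8).length;
    }

    /**
     * UTF8 byte offset that corresponds to a UTF16 index into text.
     * Assumption: utf16Index does not fall between the two halves of a
     * surrogate pair; a split pair would be encoded as a replacement byte.
     */
    static int utf8Offset(String text, int utf16Index) {
        return utf8Length(text.substring(0, utf16Index));
    }

    public static void main(String[] args) {
        // Two displayable characters ("a", "e") plus five combining marks.
        String fragment = "a\u0301\u0302\u0303e\u0300\u0308";
        System.out.println(fragment.length());     // UTF16 code units
        System.out.println(utf8Length(fragment));  // UTF8 bytes
    }
}
```

For this fragment the two counts are 7 UTF16 units and 12 UTF8 bytes, and
every insertion position needs a conversion like utf8Offset, which re-encodes
the whole prefix each time.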

If we went with your suggestion, it would obviously have to be code-point
based, for simplicity. Right now, XMPP RTT is essentially equivalent to
editing an array of 16-bit integers. Going to a codepoint method would make
XMPP RTT essentially equivalent to editing an array of 32-bit integers.
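
As a rough sketch of what code-point-based editing could look like in a
16-bit language, the Java below maps code point positions onto UTF16 indexes
with the standard String.offsetByCodePoints and String.codePointCount
methods. The helper names (CodePointEdits, insertAt, eraseBefore) are
hypothetical and are not taken from the RTT spec.

```java
final class CodePointEdits {
    /** Length of text measured in Unicode code points. */
    static int lengthInCodePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    /** Insert fragment so that it starts at the given code point position. */
    static String insertAt(String text, int codePointPos, String fragment) {
        int i = text.offsetByCodePoints(0, codePointPos);
        return text.substring(0, i) + fragment + text.substring(i);
    }

    /** Erase count code points that end just before the given code point position. */
    static String eraseBefore(String text, int codePointPos, int count) {
        int end = text.offsetByCodePoints(0, codePointPos);
        int start = text.offsetByCodePoints(end, -count);
        return text.substring(0, start) + text.substring(end);
    }

    public static void main(String[] args) {
        // "a", U+1D11E (stored as a surrogate pair), "b"
        String text = "a\uD834\uDD1Eb";
        System.out.println(lengthInCodePoints(text));
        System.out.println(eraseBefore(text, 2, 1));
    }
}
```

Here the string holds 3 code points in 4 UTF16 units, and erasing one code
point before position 2 removes the whole surrogate pair, leaving "ab".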

In fact, the same standard would apparently work exactly the same for
programming languages with either 16-bit or 32-bit Unicode strings right now
-- the differences only appear when somebody types a Unicode character at
U+10000 or above.  Any programming mistakes in an implementation would only
become apparent in this rare case (rare in most countries, which don't use
those characters). Even then, section 4.3.2 self-corrects the programmer's
mistake, because the correct <body> transmission replaces the flawed
real-time message.
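
A small Java check, offered only as an illustration, shows where the 16-bit
and code point counts diverge. The class and method names (UnitCounts,
countsDiffer) are hypothetical; String.length and String.codePointCount are
standard.

```java
final class UnitCounts {
    /** True when the UTF16 unit count and the code point count disagree. */
    static boolean countsDiffer(String text) {
        return text.length() != text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // Accented text below U+10000: both counts agree.
        System.out.println(countsDiffer("h\u00e9llo"));
        // Contains U+1D11E, stored as a surrogate pair: the counts disagree.
        System.out.println(countsDiffer("a\uD834\uDD1Eb"));
    }
}
```

The first call reports false and the second true, so an implementation that
mixes up the two units would only misbehave on text like the second string.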

It still somewhat complicates implementation in most of the programming
languages used by the current XMPP RTT projects (several are in progress),
since most of them use 16-bit Unicode.  But, yes, XMPP RTT is definitely a
long-term standard, as it is a candidate for a standard to replace deaf
TDD/TTY communications with a mass-market-compatible real-time communication
mechanism...  So I need to include long-term thinking!

Some thinking needed.

Sincerely,
Mark Rejhon