[Standards] RTT, take 2
gunnar.hellstrom at omnitor.se
Thu Jun 23 09:43:23 UTC 2011
Mark said in the UTF-8 / UTF-16 discussion:
> However, I am thinking of following Simon's excellent suggestion.
> What do you think of his suggestion of using "code point" counting for
> length and position attributes?
> That would essentially turn XMPP RTT into a standard for editing
> an array of 32-bit integers (allowing use of native UCS4 string
> functions in programming languages that store strings in UCS4
> format). It makes my 16-bit programming slightly more
> complicated, but much easier than counting in UTF8. It might be a
> better long term goal.
Yes, counting in code points is the right decision. You do not need to
comment on what that means for the programmer.
Some may want to work in native UTF-8. Then a Unicode code point is well
defined as a UTF-8 sequence of 1 to 4 bytes, easily isolated.
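A minimal Python sketch of the UTF-8 case, assuming well-formed input. Every code point's encoding begins with exactly one byte that is not a continuation byte (continuation bytes match the bit pattern 10xxxxxx), so counting the non-continuation bytes counts code points. The function names are hypothetical and not part of any RTT specification.

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count code points in well-formed UTF-8 by skipping continuation bytes."""
    # Continuation bytes have the form 10xxxxxx; every other byte starts a code point.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_offset_of_codepoint(data: bytes, p: int) -> int:
    """Return the byte offset at which code point p (0-based) starts.

    p equal to the number of code points returns len(data), i.e. end of text.
    """
    count = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:  # lead byte of a new code point
            if count == p:
                return i
            count += 1
    if count == p:
        return len(data)
    raise IndexError("position beyond end of text")


text = "a\u00e9\u20ac\U0001F600".encode("utf-8")  # 1 + 2 + 3 + 4 bytes
print(utf8_codepoint_count(text), utf8_offset_of_codepoint(text, 3))  # prints: 4 6
```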
Some may want to work in UTF-16. They then need to watch out for 16-bit
code units in the surrogate range 0xD800 to 0xDFFF and count each pair
of such units as 1 code point, while every other 16-bit unit is 1 code
point.
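The UTF-16 rule above, sketched in Python under the assumption that the text is held as a list of 16-bit code units. A high surrogate (0xD800 to 0xDBFF) followed by a low surrogate (0xDC00 to 0xDFFF) counts as one code point; every other unit counts as one. The helper names are hypothetical, and `utf16_units` exists only to build example input.

```python
def utf16_units(text: str) -> list[int]:
    """Encode text as a list of UTF-16 code units (helper to build input)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_codepoint_count(units: list[int]) -> int:
    """Count code points in a list of UTF-16 code units.

    A high surrogate followed by a low surrogate counts as one code point;
    any other unit (including an unpaired surrogate) counts as one.
    """
    count = 0
    i = 0
    while i < len(units):
        if (0xD800 <= units[i] <= 0xDBFF
                and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            i += 2  # surrogate pair: two units, one code point
        else:
            i += 1
        count += 1
    return count


units = utf16_units("a\u00e9\u20ac\U0001F600")
print(len(units), utf16_codepoint_count(units))  # prints: 5 4
```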
And some may want to work in 32-bit expanded Unicode (UCS-4).
Just specify in the protocol that p (position) and n (length) are
counted in Unicode code points.
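To illustrate what code point counting means for a 16-bit implementation, here is a hedged Python sketch that maps a code point position p to an index into a UTF-16 buffer and applies a hypothetical erase of n code points starting at p. The erase semantics and all function names are assumptions for illustration, not taken from the RTT draft.

```python
def utf16_units(text: str) -> list[int]:
    """Encode text as a list of UTF-16 code units (helper to build input)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def from_utf16_units(units: list[int]) -> str:
    """Decode a list of UTF-16 code units back to a Python string."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def utf16_index(units: list[int], p: int) -> int:
    """Map code point position p (0-based) to an index into UTF-16 units."""
    i = 0
    for _ in range(p):
        if i >= len(units):
            raise IndexError("position beyond end of text")
        # Step over a surrogate pair as a single code point.
        is_pair = (0xD800 <= units[i] <= 0xDBFF
                   and i + 1 < len(units)
                   and 0xDC00 <= units[i + 1] <= 0xDFFF)
        i += 2 if is_pair else 1
    return i


def erase(units: list[int], p: int, n: int) -> list[int]:
    """Hypothetical erase: remove n code points starting at code point p."""
    start = utf16_index(units, p)
    end = start + utf16_index(units[start:], n)
    return units[:start] + units[end:]


units = utf16_units("a\u00e9\u20ac\U0001F600")  # 4 code points, 5 UTF-16 units
print(from_utf16_units(erase(units, 2, 2)))     # removes the euro sign and the emoji
```

In a 32-bit (UCS-4) buffer, and in a Python 3.3+ `str`, which indexes by code point, p and n can be used as array indices directly, e.g. `text[:p] + text[p + n:]`, so no mapping step is needed.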