[Standards] RTT, take 2
dave at cridland.net
Fri Jun 24 09:38:40 UTC 2011
On Fri Jun 24 10:24:50 2011, Remko Tronçon wrote:
> > So I'd say that we should refer to characters in a string, and deal
> > with Unicode code-points in the abstract.
> I'm wondering whether 'code points' are any better than UTF-8 based
> positioning. Isn't it possible that a codepoint position also points
> inside a character/glyph/...? Peter could probably shed some light
As in, adding a "C" character after the fifth code-point of "Tronçon"
might give you "TronçCon" or "TroncÇon", depending on whether "ç" is
a single "c-with-cedilla" code-point or a "c" followed by a
"combining cedilla"?
Yes, I'm quite sure that's possible.
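A minimal Python sketch of that ambiguity, assuming the position counts
the code points that precede the inserted character. The helper
insert_at is hypothetical and only for illustration; it is not anything
defined by the RTT proposal.

```python
import unicodedata

def insert_at(text, index, new):
    """Insert `new` after the first `index` code points of `text` (hypothetical helper)."""
    return text[:index] + new + text[index:]

# The same visible string in two Unicode representations.
composed = unicodedata.normalize("NFC", "Tron\u00e7on")    # "ç" is the single code point U+00E7
decomposed = unicodedata.normalize("NFD", "Tron\u00e7on")  # "c" followed by U+0327 COMBINING CEDILLA

# The same edit, "insert C at position 5", applied to both forms.
result_composed = insert_at(composed, 5, "C")      # "C" lands after the whole "ç"
result_decomposed = insert_at(decomposed, 5, "C")  # "C" lands between "c" and the cedilla

print(len(composed), len(decomposed))
print(result_composed)
print(unicodedata.normalize("NFC", result_decomposed))
```

In the decomposed case the combining cedilla attaches to the newly
inserted "C", so the receiver ends up displaying a different string
from the one the sender intended.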
I don't have a solution either, except to note that the same problem
applies to UTF-8 octet offsets as well, unless you normalize all
strings first. But then it's really not clear to me how to translate
editing actions in a GUI into that normalized form.
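To illustrate the octet case, here is a rough Python sketch, assuming
both ends would agree on NFC as the common form (that choice is my
assumption, and splits_code_point is a hypothetical helper). It shows
only that an octet offset is no safer than a code-point offset, and
that normalizing first makes the two representations identical; it
does not address the GUI question.

```python
import unicodedata

nfc = unicodedata.normalize("NFC", "Tron\u00e7on")
nfd = unicodedata.normalize("NFD", "Tron\u00e7on")

nfc_bytes = nfc.encode("utf-8")
nfd_bytes = nfd.encode("utf-8")

def splits_code_point(data, offset):
    """True if `offset` lands on a UTF-8 continuation byte (10xxxxxx)."""
    return 0 < offset < len(data) and (data[offset] & 0xC0) == 0x80

# The octet lengths differ, so the same offset means different things.
print(len(nfc_bytes), len(nfd_bytes))

# Octet offset 5 falls inside the two-octet encoding of U+00E7 in the NFC form,
# and between "c" and the combining cedilla in the NFD form.
print(splits_code_point(nfc_bytes, 5))
print(splits_code_point(nfd_bytes, 5))

# Once both sides normalize to the same form, the strings (and offsets) agree.
same_after_normalizing = unicodedata.normalize("NFC", nfd) == nfc
print(same_after_normalizing)
```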
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade