[Standards] RTT, take 2

Dave Cridland dave at cridland.net
Fri Jun 24 09:38:40 UTC 2011


On Fri Jun 24 10:24:50 2011, Remko Tronçon wrote:
> > So I'd say that we should refer to characters in a string, and
> > deal with Unicode code-points in the abstract.
>
> I'm wondering whether 'code points' are any better than UTF-8 based
> positioning. Isn't it possible that a codepoint position also points
> inside a character/glyph/...? Peter could probably shed some light
> on this.
> 
> 
As in, inserting a "C" after the fifth code point of "Tronçon" might
give you "TronçCon" or "TroncÇon", depending on whether "ç" is a
single precomposed "c-with-cedilla" or a "c" followed by a
"combining cedilla"?

Yes, I'm quite sure that's possible.
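
To make the ambiguity concrete, here is a rough Python sketch. The
helper name insert_at_code_point is made up for illustration, and it
relies on Python 3 strings being indexed by code point. It is not
anything taken from the RTT proposal itself.

```python
import unicodedata


def insert_at_code_point(text: str, index: int, new: str) -> str:
    """Insert `new` before the code point at `index` (hypothetical helper)."""
    return text[:index] + new + text[index:]


# The same visible string in two Unicode forms.
composed = "Tron\u00e7on"      # "ç" as one code point (U+00E7)
decomposed = "Tronc\u0327on"   # "c" followed by combining cedilla (U+0327)

# Canonically equivalent, but the code point counts differ.
assert unicodedata.normalize("NFC", decomposed) == composed
assert (len(composed), len(decomposed)) == (7, 8)

# The same edit, "insert C after five code points", applied to each form.
a = insert_at_code_point(composed, 5, "C")
b = insert_at_code_point(decomposed, 5, "C")

# In `a` the C lands after the "ç". In `b` it lands between the "c" and
# its cedilla, so the cedilla now attaches to the new "C" instead.
b_nfc = unicodedata.normalize("NFC", b)
```

Here a comes out as "TronçCon", while b renders as "TroncÇon", since
the combining cedilla follows whatever base character precedes it.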

I don't have a solution either, except to note that this applies to
UTF-8 octets etc. as well, unless you normalize all strings first.
But then it's really not clear to me how to translate editing actions
in a GUI into that form.
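
As a rough illustration of the octet case, again in Python and again
with a made-up helper (insert_at_octet): an octet offset can split a
multi-octet sequence outright, and offsets only line up between the
two ends if both normalize the same way first. NFC is assumed here
purely as an example. This sketch does not address the GUI mapping
problem at all.

```python
import unicodedata


def insert_at_octet(text: str, offset: int, new: str) -> str:
    """Insert `new` at a UTF-8 octet offset (hypothetical helper).

    Raises UnicodeDecodeError if the offset splits a multi-octet sequence.
    """
    raw = text.encode("utf-8")
    patched = raw[:offset] + new.encode("utf-8") + raw[offset:]
    return patched.decode("utf-8")


composed = "Tron\u00e7on"      # "ç" is two octets (C3 A7)
decomposed = "Tronc\u0327on"   # "c" is one octet, the cedilla is two

# Offset 5 is a valid boundary in the decomposed form, but in the
# composed form it falls in the middle of the two-octet "ç".
try:
    insert_at_octet(composed, 5, "C")
    split_detected = False
except UnicodeDecodeError:
    split_detected = True

# If both sides normalize to NFC first, one offset means one place.
left = insert_at_octet(unicodedata.normalize("NFC", composed), 6, "C")
right = insert_at_octet(unicodedata.normalize("NFC", decomposed), 6, "C")
```

With normalization applied on both sides, left and right are the same
string; without it, the same offset is either a different position or
not a valid position at all.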

Dave.
-- 
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at dave.cridland.net
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


