[Standards] RTT, take 2

Florian Zeitz florian.zeitz at gmx.de
Fri Jun 24 13:04:33 UTC 2011


On 24.06.2011 11:24, Remko Tronçon wrote:
>> So I'd say that we should refer to characters in a string, and deal with
>> Unicode code-points in the abstract.
> 
> I'm wondering whether 'code points' are any better than UTF-8 based
> positioning. Isn't it possible that a codepoint position also points
> inside a character/glyph/...? Peter could probably shed some light on
> this.
> 
FWIW, I think using codepoints solves a somewhat different problem.

If we count codepoints, we can still delete "half a character", e.g.
remove the "combining cedilla" from ç. But if we count UTF-8 or UTF-16
code units, we can delete "half a codepoint", which leaves the result
undecodable, and that is far worse.
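
To illustrate the difference, here is a minimal Python sketch. It assumes
the ç is sent in decomposed form ("c" followed by U+0327 COMBINING
CEDILLA); the helper names delete_code_point and delete_utf8_byte are made
up for this example and are not taken from any spec.

```python
# Assumption: "ç" is in decomposed form, i.e. "c" + U+0327, followed by "a".
text = "c\u0327a"


def delete_code_point(s: str, index: int) -> str:
    """Delete one code point (Python str is indexed by code point)."""
    return s[:index] + s[index + 1:]


def delete_utf8_byte(s: str, index: int) -> bytes:
    """Delete one byte from the UTF-8 encoding of s."""
    data = s.encode("utf-8")
    return data[:index] + data[index + 1:]


# Code point positioning: removing the combining cedilla leaves "ca".
# The character was cut in half, but the string is still valid Unicode.
after_cp = delete_code_point(text, 1)
after_cp.encode("utf-8")  # encodes without error

# Byte positioning: U+0327 is encoded as the two bytes 0xCC 0xA7.
# Removing only 0xCC leaves a stray continuation byte.
damaged = delete_utf8_byte(text, 1)
try:
    damaged.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

print(repr(after_cp), decodable)
```

In the first case the receiver sees a different but well-formed string; in
the second it cannot decode the text at all.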


