[Standards] RTT, take 2

Kurt Zeilenga Kurt.Zeilenga at Isode.COM
Fri Jun 24 13:19:11 UTC 2011


On Jun 24, 2011, at 6:04 AM, Florian Zeitz wrote:

> On 24.06.2011 11:24, Remko Tronçon wrote:
>>> So I'd say that we should refer to characters in a string, and deal with
>>> Unicode code-points in the abstract.
>> 
>> I'm wondering whether 'code points' are any better than UTF-8 based
>> positioning. Isn't it possible that a codepoint position also points
>> inside a character/glyph/...? Peter could probably shed some light on
>> this.
>> 
> FWIW, I think using codepoints solves a somewhat different problem.
> 
> If we count codepoints we can delete "half a character", e.g. remove the
> "combining cedilla" from ç, but if we count UTF-(8,16) based we can
> delete "half a codepoint" rendering the result undecodeable which is far
> worse.
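To make the distinction above concrete, here is a minimal Python sketch using a decomposed "ç" (U+0063 followed by U+0327 COMBINING CEDILLA). The helper names `delete_codepoint` and `delete_utf8_byte` are hypothetical and exist only for illustration. Python's `str` indexes by code point, so slicing models code-point positioning, while slicing the `bytes` from `.encode("utf-8")` models UTF-8 positioning. UTF-16 behaves analogously when a surrogate pair is split.

```python
# "ç" written in decomposed form: LATIN SMALL LETTER C + COMBINING CEDILLA.
text = "c\u0327"

def delete_codepoint(s: str, index: int) -> str:
    """Delete one code point (Python str indexes by code point)."""
    return s[:index] + s[index + 1:]

def delete_utf8_byte(s: str, index: int) -> bytes:
    """Delete one byte from the UTF-8 encoding of s."""
    raw = s.encode("utf-8")
    return raw[:index] + raw[index + 1:]

# Code-point positioning: removing index 1 drops the cedilla and leaves "c".
# "Half a character" is gone, but the result is still valid text.
after_cp = delete_codepoint(text, 1)
print(repr(after_cp))  # 'c'

# UTF-8 positioning: the cedilla is encoded as two bytes, 0xCC 0xA7.
# Removing byte 2 leaves a truncated sequence that cannot be decoded.
after_byte = delete_utf8_byte(text, 2)
print(after_byte)  # b'c\xcc'
try:
    after_byte.decode("utf-8")
except UnicodeDecodeError as exc:
    print("undecodable:", exc.reason)
```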

The protocol ought to be defined in wire terms… but it should also state a few guidelines on handling characters composed of multiple code points.

For instance, if a character is sent as <X> <Y> (Y being a combining character), I have little problem with <Y> being edited away, so long as <X> by itself is valid… or with <Y> being replaced by <Z> (another combining character) without touching <X>.
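The guideline above can be sketched as a code-point edit plus a sanity check. This is a minimal illustration, not part of any RTT draft: `replace_codepoints` and `only_touches_combining_marks` are hypothetical helpers, and treating "combining character" as "nonzero canonical combining class" (via `unicodedata.combining`) is an approximation, not the full UAX #29 grapheme-cluster rules.

```python
import unicodedata

def replace_codepoints(text: str, pos: int, length: int, new: str = "") -> str:
    """Replace `length` code points starting at code-point index `pos` with `new`."""
    if pos < 0 or length < 0 or pos + length > len(text):
        raise ValueError("edit range outside text")
    return text[:pos] + new + text[pos + length:]

def only_touches_combining_marks(text: str, pos: int, length: int) -> bool:
    """True if the removed span is non-empty and consists only of combining marks.

    Approximation: a nonzero canonical combining class stands in for
    "combining character"; this is not full grapheme-cluster segmentation.
    """
    span = text[pos:pos + length]
    return bool(span) and all(unicodedata.combining(ch) != 0 for ch in span)

c_cedilla = "c\u0327"  # <X> = 'c', <Y> = COMBINING CEDILLA

# Editing <Y> away leaves <X>, which is valid on its own.
print(only_touches_combining_marks(c_cedilla, 1, 1), repr(replace_codepoints(c_cedilla, 1, 1)))  # True 'c'

# Replacing <Y> with <Z> (COMBINING ACUTE ACCENT, U+0301) without touching <X>.
swapped = replace_codepoints(c_cedilla, 1, 1, "\u0301")

# Deleting <X> instead would leave a dangling combining mark.
print(only_touches_combining_marks(c_cedilla, 0, 1))  # False
```

A client could run a check like this before emitting an edit, falling back to replacing the whole base-plus-marks sequence when the check fails.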

It's my view that the client needs to be aware enough of what's happening in the GUI and on the wire to ensure both are sane. If you try to design this such that clients don't have to be aware of what's really going on on the wire or in the GUI, it will be quite fragile and prone to interoperability problems.

-- Kurt

