[Standards] Proposed XMPP Extension: Character counting in message bodies

Sam Whited sam at samwhited.com
Fri Dec 4 20:58:35 UTC 2020

On Fri, Dec 4, 2020, at 20:53, Florian Schmaus wrote:
> If you count the bytes of the UTF-8 encoded representation, then there
> is no way to have any fallback (as the indexes would be wrong).

Maybe I don't understand the fallback you're proposing. I do understand
your example, and assert that it doesn't matter. You're not likely to
have an invalid offset, and if you do, we can define a fallback for
that. It might be "the range ends at the start of the codepoint" (so you
have to decode a single codepoint, not the entire range), or it might be
"this is an invalid range, don't display anything".
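A minimal sketch of both fallbacks, assuming offsets are UTF-8 byte
counts into the message body. The function names are hypothetical and
only illustrate the idea; they are not from any spec. It relies on the
fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so only
the byte at the offset needs to be inspected, not the whole range:

```python
def is_codepoint_boundary(data: bytes, offset: int) -> bool:
    """True if a byte offset falls on a code point boundary in UTF-8 data."""
    if not 0 <= offset <= len(data):
        return False
    # The end of the buffer is a boundary; otherwise the byte at the offset
    # must not be a UTF-8 continuation byte (bit pattern 10xxxxxx).
    return offset == len(data) or (data[offset] & 0xC0) != 0x80


def snap_to_codepoint_start(data: bytes, offset: int) -> int:
    """Fallback 1 (hypothetical): move an offset back to its code point start."""
    if not 0 <= offset <= len(data):
        raise ValueError("offset out of range")
    # Step backwards over continuation bytes; stop at 0 to avoid looping
    # forever on malformed data that starts with a continuation byte.
    while offset > 0 and not is_codepoint_boundary(data, offset):
        offset -= 1
    return offset


def slice_or_none(data: bytes, start: int, end: int):
    """Fallback 2 (hypothetical): reject any range that splits a code point."""
    if start > end:
        return None
    if not (is_codepoint_boundary(data, start) and is_codepoint_boundary(data, end)):
        return None
    return data[start:end].decode("utf-8")


body = "naïve café".encode("utf-8")
# "ï" is two bytes (0xC3 0xAF) at offsets 2 and 3, so offset 3 is mid-codepoint.
print(snap_to_codepoint_start(body, 3))  # 2
print(slice_or_none(body, 0, 3))         # None
print(slice_or_none(body, 0, 4))         # naï
```

Either way, a receiver only has to look at the bytes at the range ends
to detect or repair a bad offset.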

> This is, of course, because in the example the number of code points
> and graphemes is identical. But this allows developers to easily
> bootstrap this scheme by simply counting code points in the beginning.
> I wouldn't be surprised if it worked so well that they never
> even switch to grapheme counting.

We could also easily count bytes, and I wouldn't be surprised if that
worked well enough that we never have to switch to anything else.
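For comparison, a rough Python sketch of how UTF-8 byte counts and code
point counts diverge, and how to convert between the two. The helper
names are made up for illustration. Grapheme counting is deliberately
not shown, since it needs a Unicode text segmentation implementation
rather than a simple length:

```python
def count_units(text: str) -> tuple:
    """Return (UTF-8 byte count, code point count) for a message body."""
    # In Python 3, len() on a str counts code points.
    return len(text.encode("utf-8")), len(text)


def codepoint_to_byte_offset(text: str, cp_offset: int) -> int:
    """Convert a code point offset into a UTF-8 byte offset."""
    return len(text[:cp_offset].encode("utf-8"))


def byte_to_codepoint_offset(data: bytes, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a code point offset.

    Raises UnicodeDecodeError if byte_offset splits a code point.
    """
    return len(data[:byte_offset].decode("utf-8"))


for sample in ["hello", "café", "e\u0301", "\U0001F44D"]:
    # "e\u0301" (e + combining acute) renders as a single grapheme.
    print(repr(sample), count_units(sample))
# 'hello' (5, 5)
# 'café' (5, 4)
# 'é' (3, 2)
# '👍' (4, 1)
```

For plain ASCII all the counts agree, which is why either scheme can
appear to "just work" in simple tests.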

