[Standards] Proposed XMPP Extension: Character counting in message bodies

Tedd Sterr teddsterr at outlook.com
Fri Dec 4 14:49:40 UTC 2020


> FWIW I was a big proponent of doing it this way too, but I've changed my
> mind after seeing too many grapheme segmentation implementations be
> broken in small, different, ways. My new position is that we have to
> just count bytes and figure out a sane behavior in case someone sends us
> an invalid offset in the middle of a codepoint or something. This is
> encoding agnostic (not that it matters for XMPP) and makes it very easy
> to count: go to that byte offset, check if we're on any sort of UTF-8
> boundary, if so call it a day, if not do whatever the fallback is.
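
For reference, the byte-offset check described in the quote can be sketched roughly as below. This is only an illustration: it assumes the body is already valid UTF-8, and the function names and the "snap back to the previous boundary" fallback are hypothetical choices, not anything the proposal specifies.

```python
def is_utf8_boundary(data: bytes, offset: int) -> bool:
    """Return True if offset is at the start of a code point or at end of data."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (data[offset] & 0xC0) != 0x80


def snap_to_boundary(data: bytes, offset: int) -> int:
    """Hypothetical fallback: clamp, then step back to the preceding boundary."""
    offset = max(0, min(offset, len(data)))
    while offset > 0 and not is_utf8_boundary(data, offset):
        offset -= 1
    return offset


body = "héllo".encode("utf-8")   # 'é' takes two bytes, so len(body) == 6
print(is_utf8_boundary(body, 1))  # start of 'é'
print(is_utf8_boundary(body, 2))  # inside 'é'
print(snap_to_boundary(body, 2))  # falls back to the start of 'é'
```

Other fallbacks (rejecting the offset, or snapping forward) would fit the same shape; which one is sane is exactly the open question.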

Codepoints are preferable: https://mail.jabber.org/pipermail/standards/2019-October/036589.html
If you're indexing by grapheme clusters then you're just asking for trouble.
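
A small sketch of why the unit matters, assuming Python 3 where a str is a sequence of codepoints. The helper name is hypothetical and only there to show that a codepoint index maps to a byte offset without any segmentation rules:

```python
def codepoint_to_byte_offset(text: str, index: int) -> int:
    """Map a codepoint index to the matching UTF-8 byte offset (hypothetical helper)."""
    if index < 0 or index > len(text):
        raise ValueError("codepoint index out of range")
    return len(text[:index].encode("utf-8"))


# 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as one visible character.
text = "e\u0301"
codepoints = len(text)                  # counts codepoints
utf8_bytes = len(text.encode("utf-8"))  # counts UTF-8 bytes
print(codepoints, utf8_bytes)

print(codepoint_to_byte_offset("héllo", 2))  # byte offset just past 'é'
```

Both counts are fixed by the encoding alone, whereas the number of clusters depends on the segmentation rules and Unicode version each implementation ships.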
