[Standards] Proposed XMPP Extension: Character counting in message bodies

Sam Whited sam at samwhited.com
Fri Dec 4 14:27:55 UTC 2020


FWIW I was a big proponent of doing it this way too, but I've changed my
mind after seeing too many grapheme segmentation implementations be
broken in small, different ways. My new position is that we have to
just count bytes and figure out a sane behavior in case someone sends us
an invalid offset, e.g. one in the middle of a codepoint. This is
encoding-agnostic (not that it matters for XMPP) and makes it very easy
to count: go to that byte offset and check whether we're on a UTF-8
codepoint boundary. If so, call it a day; if not, do whatever the
fallback is.
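
To make the check concrete, here is a rough sketch in Python. It assumes
the body is already valid UTF-8 (which XMPP requires). The function names
and the fallback are my own assumptions: nothing has been decided about
the fallback, so rounding down to the previous boundary is only one
possible choice.

```python
def is_utf8_boundary(data: bytes, offset: int) -> bool:
    """Report whether offset falls between two codepoints of UTF-8 data."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        # The end of the body always counts as a boundary.
        return True
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx; any other
    # byte starts a new codepoint.
    return (data[offset] & 0xC0) != 0x80


def resolve_offset(data: bytes, offset: int) -> int:
    """Hypothetical fallback (an assumption, not from any spec): clamp
    the offset to the body, then back up to the previous boundary."""
    offset = max(0, min(offset, len(data)))
    while offset > 0 and not is_utf8_boundary(data, offset):
        offset -= 1
    return offset


body = "héllo 👍".encode("utf-8")  # "é" is 2 bytes, the emoji is 4
```

A codepoint is at most 4 bytes, so on valid input the fallback loop
backs up at most 3 bytes. No segmentation tables are involved at any
point.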

—Sam

On Fri, Dec 4, 2020, at 14:15, Florian Schmaus wrote:
> Reply containing rant about how unpractical grapheme cluster counting
> is in 3, 2, 1… :)

