[Standards] Support for stickers (custom emojis)

Jonathan Lennox lennox at cs.columbia.edu
Mon Oct 21 14:06:44 UTC 2019


On Saturday, October 19, 2019, "Sam Whited" wrote to "standards at xmpp.org" saying:

> On Sat, Oct 19, 2019, at 04:57, JC Brand wrote:
> > You might still have an offset in between two codepoints that should
> > ideally be shown together like "EU" making the EU flag, but this seems
> > less of an issue to me.
> 
> I don't know if this is better or not, and I'm still not sure how best
> to handle it. If you end up with text in the middle of a UTF-8 encoding,
> at least that's clearly an error. If it's in between the two letters in
> a flag emoji, that's not necessarily an error and there are tons of
> different ways you could handle it, which seems much more complex.
> Does this break the flag emoji back into the letter glyphs that are
> shown if it doesn't form a flag? What if it's between something and a
> zero-width joiner that would join it to another glyph, does that split
> that and now you have a dangling joiner? From a code perspective does
> this mean that highlighting always has to integrate with the text
> rendering engine? This seems like a *major* downside to me, as it likely
> makes the code much more complicated, and we may or may not even have
> the ability to manipulate how the text rendering engine handles things.

The right concept here is probably "grapheme clusters", as defined in
Unicode Standard Annex #29 ("Unicode Text Segmentation").  ICU has
support for this via BreakIterator.
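For illustration, here is a minimal sketch of cluster-based iteration
using the JDK's own regex construct `\X` (one extended grapheme cluster
per match, available since Java 9); ICU's
BreakIterator.getCharacterInstance() exposes the same UAX #29
segmentation.  The helper name is mine, not from any spec:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Graphemes {
    // Split a string into extended grapheme clusters (UAX #29).
    // \X matches exactly one cluster, so a regional-indicator pair,
    // a base character plus combining marks, or a ZWJ emoji sequence
    // each come back as a single element.
    static List<String> clusters(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\X").matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // U+1F1EA U+1F1FA: the two regional-indicator code points
        // that render together as the EU flag.
        String euFlag = "\uD83C\uDDEA\uD83C\uDDFA";
        // Two code points, but one grapheme cluster -- an offset
        // between them falls inside the cluster and should not be
        // treated as a valid text position.
        System.out.println(euFlag.codePoints().count()); // prints 2
        System.out.println(clusters(euFlag).size());     // prints 1
    }
}
```

Offsets computed against cluster boundaries can never land between the
two halves of a flag or in the middle of a ZWJ sequence, which sidesteps
the dangling-joiner cases described above.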

-- 
Jonathan Lennox
lennox at cs.columbia.edu
