[Standards] Proposed XMPP Extension: Character counting in message bodies

Florian Schmaus flo at geekplace.eu
Sat Dec 21 09:57:02 UTC 2019


On 18.12.19 16:00, Marvin W wrote:
> It's indeed a good question if anything in XMPP allows servers or
> in-between entities to do normalization. I was under the assumption that
> servers do not change the codepoints. In XML [1] Characters with
> multiple possible representations in ISO/IEC 10646 (e.g. characters with
> both precomposed and base+diacritic forms) match only if they have the
> same representation in both strings. Thus by XML specification,
> normalization is changing the body.

It seems a bit far-fetched to me to deduce from the XML "string match"
definition that XMPP entities have no freedom at all to transform the
Unicode representation of a string. At least I am currently missing the
link from the XML "string match" definition to "XMPP entities must use
this when serializing/de-serializing XML".

If we can make that link, then we do not need normalization. And we
probably want to clearly state that requirement in rfc6120bis, because
it is not obvious (at least for me).
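
To make the distinction concrete, here is a minimal Python sketch of the
two comparison rules under discussion. The function names are made up for
illustration, and NFC is only one possible choice of normalization form;
nothing here is mandated by XML or XMPP.

```python
import unicodedata

# The same visible word in two Unicode representations.
precomposed = "caf\u00e9"   # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT


def code_points_match(a: str, b: str) -> bool:
    """Compare code point by code point, in the spirit of the XML
    "string match" definition quoted above."""
    return a == b


def nfc_match(a: str, b: str) -> bool:
    """Compare after normalizing both sides to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print(code_points_match(precomposed, decomposed))
    print(nfc_match(precomposed, decomposed))
    print(len(precomposed), len(decomposed))
```

Under the first rule the two bodies are different strings with different
code point counts; under the second they are treated as the same string.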

> Also the main reason why we shouldn't ask for Unicode normalization to
> happen is that different Unicode versions have different normalizations.
> Thus if the sender normalizes with Unicode version X and calculates
> offsets from that, then receiver normalizes with Unicode version Y and
> determines the offsets there, they can end up in pointing to different
> characters.

We need Unicode agility in XMPP anyway, and I do not believe that to be
a big issue, especially since Unicode is likely to introduce fewer
changes with each future version of the standard.
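
The offset drift described in the quoted text can be sketched in Python
as follows. Python's unicodedata module is tied to a single Unicode
version, so this sketch uses NFC versus NFD as a stand-in for "sender and
receiver end up with different normalized bodies"; that substitution is
an assumption made purely for illustration, as is the example body.

```python
import unicodedata

# Unicode database version bundled with this Python build.
print(unicodedata.unidata_version)

body = "caf\u00e9 ok"

# Stand-in for two entities whose normalization results differ.
sender_view = unicodedata.normalize("NFC", body)
receiver_view = unicodedata.normalize("NFD", body)

# The sender computes a code point range covering the word "ok".
start = sender_view.index("ok")
end = start + len("ok")

# The same offsets select different characters on the receiving side.
print(repr(sender_view[start:end]))
print(repr(receiver_view[start:end]))
```

The range that covers "ok" for the sender covers a space and "o" for the
receiver, because the decomposed form has one more code point before it.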

- Florian
