[Standards] XEP-0301 0.5 comments - Unicode characters

Kevin Smith kevin at kismith.co.uk
Fri Jul 27 06:43:56 UTC 2012


On Thu, Jul 26, 2012 at 11:04 PM, Mark Rejhon <markybox at gmail.com> wrote:
> Generally, in most reasonable situations in XMPP, normalizing an
> already-normalized Unicode string results in no changes. Kevin says to
> specify a normalization format, but how do we know what normalization
> network equipment uses? So we have to carefully choose the normalization
> standard that is least likely to be affected by further unexpected passes of
> normalization.

My Unicode knowledge is hazy at best, but I think this works if we
normalise with e.g. NFC at both ends. The sender normalises before
calculating the edits (that is, it calculates the NFC pre-string and
the NFC post-string, so that what is sent on the wire is NFC), and the
recipient normalises the incoming packets. Then, even if the network
(or the language) has renormalised to e.g. NFD, the recipient will have
renormalised to the same form as the sender, and so will perform the
same transforms and end up with an NFC buffer identical to the sender's.
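
A rough Python sketch of that idea, for illustration only. The edit
model here (position, number of characters to erase, text to insert)
and the names compute_edit and apply_edit are made up for the example;
they are not the XEP-0301 action elements.

```python
import unicodedata


def nfc(text):
    """Normalise a string to NFC."""
    return unicodedata.normalize("NFC", text)


def compute_edit(old, new):
    """Sender side (hypothetical helper): diff the NFC pre-string
    against the NFC post-string and return (pos, n_erase, insert)."""
    old, new = nfc(old), nfc(new)
    # Length of the common prefix.
    p = 0
    while p < len(old) and p < len(new) and old[p] == new[p]:
        p += 1
    # Length of the common suffix that does not overlap the prefix.
    s = 0
    while (s < len(old) - p and s < len(new) - p
           and old[-1 - s] == new[-1 - s]):
        s += 1
    return p, len(old) - p - s, new[p:len(new) - s]


def apply_edit(buffer, edit):
    """Recipient side (hypothetical helper): renormalise whatever
    arrived to NFC, then apply the edit to the NFC buffer."""
    pos, n_erase, insert = edit
    buffer = nfc(buffer)
    insert = nfc(insert)  # undoes e.g. an NFD pass in transit
    return buffer[:pos] + insert + buffer[pos + n_erase:]


# The user types a combining acute accent after "Cafe".
edit = compute_edit("Cafe", "Cafe\u0301")
# Pretend something on the path renormalised the payload to NFD.
in_transit = (edit[0], edit[1], unicodedata.normalize("NFD", edit[2]))
result = apply_edit("Cafe", in_transit)
```

Here the recipient ends up with the same NFC string as the sender,
even though the inserted text was decomposed on the way.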

> - Again, rare normalization damage (which I have never seen, not even with
> realjabber.org, talk.l.google.com, or Openfire) is self repairing anyway via
> Message Reset.

I believe that some libraries, at least, will change the normal form
of strings, so without explicit normalisation rules someone
implementing a client in those situations would end up with RTT that
didn't quite work right. It's true that resets will fix it every 10
seconds or whenever, but if we can easily resolve the issue I think we
should. Normalisation won't be a particular code burden for devs: all
XMPP entities need to do Unicode mangling elsewhere anyway, so they
will have the relevant tools.
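
To illustrate the kind of mismatch I mean, here is a small Python
sketch. The erase helper and the position-based edit are assumptions
for the example, not XEP-0301 syntax. The same visible text has a
different length in NFC and NFD, so the same position points at
different characters.

```python
import unicodedata


def erase(buffer, pos):
    """Hypothetical edit: remove the single character at pos."""
    return buffer[:pos] + buffer[pos + 1:]


sender_buf = "caf\u00e9!"                                   # NFC, 5 characters
recipient_buf = unicodedata.normalize("NFD", sender_buf)   # 6 characters

# The sender erases the "!" at position 4 of its NFC buffer.
sender_after = erase(sender_buf, 4)

# A recipient whose library handed it NFD erases the combining
# accent instead, and the two buffers now disagree.
recipient_after = erase(recipient_buf, 4)

# Renormalising to NFC first makes the same edit land correctly.
fixed_after = erase(unicodedata.normalize("NFC", recipient_buf), 4)
```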

/K


