[Standards] XEP-0301 Real-Time Text: Unicode normalization, bidirectional, right-to-left text, etc. -- Comments needed

Mark Rejhon markybox at gmail.com
Mon Jul 2 21:35:25 UTC 2012


On 2012-07-02 2:45 PM, "Kurt Zeilenga" <Kurt.Zeilenga at isode.com> wrote:
> But I wonder if the XEP needs to say something about changes in valid
> text to valid text which might produce invalid text in the edit?
> Consider: if a user replaces a single glyph in a message, is it allowed to
> send just the code points that changed, or is it necessary to send all the
> code points of each glyph that was changed?  That is, consider the text
> "tschuss", then changed to add a diaeresis over the 'u'.  Using
> decomposed characters, that's a change of U+0075 to U+0075,U+0308.  Is it
> okay to send RTT that inserts U+0308 instead of replacing U+0075 with
> U+0075,U+0308?
>

Either way is allowed, though all my implementations use a "sends
differences only" methodology, with success on all public XMPP servers
tried so far.
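A minimal Python sketch of the "sends differences only" approach follows. It assumes the message is handled as a sequence of Unicode code points (Python `str` indexes by code point). The helper `diff_edit` is hypothetical and not part of any XMPP library: it trims the common prefix and suffix of the old and new text and reports the code points removed and inserted in between. For the decomposed "tschuss" example, only U+0308 needs to be sent, at code point position 5.

```python
import unicodedata


def diff_edit(old: str, new: str) -> tuple[int, str, str]:
    """Return (position, removed, inserted) that turns old into new.

    Hypothetical helper: positions count Unicode code points, not glyphs
    or UTF-8/UTF-16 code units. Only the differing middle span is reported.
    """
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the matched suffix would overlap the matched prefix.
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    removed = old[prefix:len(old) - suffix]
    inserted = new[prefix:len(new) - suffix]
    return prefix, removed, inserted


old = unicodedata.normalize("NFD", "tschuss")
new = unicodedata.normalize("NFD", "tsch\u00fcss")  # u + U+0308 after NFD

pos, removed, inserted = diff_edit(old, new)
print(pos, repr(removed), [f"U+{ord(c):04X}" for c in inserted])
# prints: 5 '' ['U+0308']
```

Replacing U+0075 with U+0075,U+0308 instead would produce the same final text while transmitting one extra code point; as stated above, either way is allowed.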

This case is already covered in the second paragraph of the rewritten Section
4.5.4.2, "Guideline for Recipients", shown below:

> > Note that [[Element <t/> – Insert Text]] is allowed to contain any
> > sequence of Unicode characters taken from the real-time message. This may
> > result in situations where the text transmitted in <t/> elements is
> > allowed to be, temporarily, an incorrectly formed Unicode string (e.g. an
> > orphaned standalone combining mark, an orphaned direction-change character
> > for bidi Unicode, etc.) that becomes correct when inserted into the middle
> > of the recipient's real-time message, and then passes recipient
> > validation/normalization with no character modifications. Note that a
> > compliant XML processor does not modify or fix Unicode errors caused by
> > taking only a subset of characters from correctly formed Unicode text. One
> > way for implementers to visualize this is to treat the Unicode text as an
> > array of individual code points, and interpret the p and n values
> > accordingly.
> >
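A minimal Python sketch of the "array of code points" view follows. It assumes XEP-0301 semantics in which p is a code point index, <t/> inserts its text at p, and <e/> erases the n code points immediately before p (backspace style), with p defaulting to the end of the message and n to 1. XML parsing is omitted, and the helpers `apply_insert` and `apply_erase` are hypothetical. The transmitted fragment is a lone U+0308, an orphaned combining mark on its own, yet the assembled message is already in NFD form and normalizes to "tschüss" under NFC.

```python
import unicodedata
from typing import Optional


def apply_insert(message: str, text: str, p: Optional[int] = None) -> str:
    """Apply a hypothetical <t p='...'>text</t> action.

    Assumes p counts Unicode code points and defaults to the end of the message.
    """
    if p is None:
        p = len(message)
    if not 0 <= p <= len(message):
        raise ValueError("insert position out of range")
    return message[:p] + text + message[p:]


def apply_erase(message: str, p: Optional[int] = None, n: int = 1) -> str:
    """Apply a hypothetical <e p='...' n='...'/> action.

    Assumes backspace semantics: the n code points just before p are removed.
    """
    if p is None:
        p = len(message)
    if not 0 <= n <= p <= len(message):
        raise ValueError("erase range out of range")
    return message[:p - n] + message[p:]


fragment = "\u0308"  # COMBINING DIAERESIS, transmitted alone in a <t/> element

# On its own, the fragment begins with a combining mark (combining class > 0).
print(unicodedata.combining(fragment[0]) > 0)  # True

# Differences-only edit: insert U+0308 after "tschu" (code point position 5).
message = apply_insert("tschuss", fragment, p=5)
print(unicodedata.normalize("NFD", message) == message)  # True: no changes needed
print(unicodedata.normalize("NFC", message) == "tsch\u00fcss")  # True

# Whole-glyph edit: erase the "u", then insert "u" + U+0308 in its place.
alt = apply_insert(apply_erase("tschuss", p=5, n=1), "u\u0308", p=4)
print(alt == message)  # True
```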

A minor edit to clarify this for multiple characters forming one glyph
would be to add "incompletely formed glyphs" to the list in the parentheses.
Would that make sense?

Thanks,
Mark Rejhon