[Standards] Proposed XMPP Extension: Character counting in message bodies
jonas at wielicki.name
Wed Dec 18 16:27:04 UTC 2019
On Wednesday, 18 December 2019 16:40:42 CET Marvin W wrote:
> On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
> > In the end we have settled for counting characters of escaped string, so
> This sounds like a terrible idea. In encoded XML, "&gt;", "&#62;", "&#x3E;"
> and "<![CDATA[>]]>" are equivalent. I just tried it out, and servers
> do indeed convert all of those to their shortest well-formed variant
> (which is ">"), so you cannot rely on their escaped length at all.
> Servers may also, at their discretion, convert non-ASCII characters to
> their character reference form (starting with &#). I have seen this
> happen at least once with emojis.
I’m 100% with Marvin (and Ralph) here. Counting on the escaped form makes no
sense: on a theoretical level, the character data of XML is the sequence of
codepoints after unescaping, not before, and on a practical level it fails
for the reasons Marvin noted.
Having that rule written down explicitly for safety is a good idea, though.
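A minimal sketch in Python illustrates the point. The language is chosen only for illustration, since nothing in this thread prescribes one. It parses several wire encodings of the same body with the standard library's xml.etree.ElementTree and counts codepoints of the parsed text. The helpers body_text and body_length are hypothetical, and a real client would take the body text from its XMPP library's parser rather than from a standalone string.

```python
import xml.etree.ElementTree as ET

# Different wire encodings of the same character data. A server may
# rewrite one into another in transit (Marvin's observation).
ENCODINGS = [
    "<body>&gt;</body>",
    "<body>&#62;</body>",
    "<body>&#x3E;</body>",
    "<body><![CDATA[>]]></body>",
    "<body>></body>",
]


def body_text(xml: str) -> str:
    """Hypothetical helper: character data of a <body/> after XML parsing."""
    return ET.fromstring(xml).text or ""


def body_length(xml: str) -> int:
    """Hypothetical helper: count codepoints of the parsed (unescaped) text.
    Python's len() on a str counts Unicode codepoints."""
    return len(body_text(xml))


if __name__ == "__main__":
    for raw in ENCODINGS:
        print(f"{raw!r}: raw length {len(raw)}, parsed length {body_length(raw)}")
    # An emoji sent literally or as a character reference is one codepoint either way.
    print(body_length("<body>\U0001F600</body>"), body_length("<body>&#128512;</body>"))
```

All five encodings parse to the same one-character text even though their raw lengths differ, so an offset computed on the escaped form breaks as soon as a server re-serializes the stanza.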
> > to draw *&&&* in a client we count it as a string with a length of 15,
> > thus the <bold> reference points to characters 0..14:
> > <reference xmlns="urn:xmpp:reference:0" begin="0" end="14"
> > type="markup"><bold /></reference>
> Luckily for you, this looks pretty non-standard, so you don't have to
> deal with your implementation being incompatible with others. Also, as
> soon as XEP-0372 actually becomes more stable, you will technically be
> non-compliant with the standard, because there is no <bold /> element defined for
> the namespace "urn:xmpp:reference:0". You are apparently mixing XEP-0372
> and XEP-0394.
> Also, that's a weird way of counting. Usually I would expect end to point
> to the position after the last referenced character; at least that's
> what you do in most programming languages (e.g. "&amp;&amp;&amp;"[0:14]
> will give you "&amp;&amp;&amp" without the last ";").
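Marvin's half-open convention can be illustrated with a short Python sketch. It assumes that begin and end index codepoints of the parsed body and that end is exclusive; XEP-0372's own wording is not quoted in this thread, so treat that convention as an assumption. referenced_text is a hypothetical helper, and xml.sax.saxutils.escape is used only to produce one possible escaped form of the body.

```python
from xml.sax.saxutils import escape


def referenced_text(text: str, begin: int, end: int) -> str:
    """Hypothetical helper returning the span [begin, end) of the parsed text.

    Assumption: end is exclusive, i.e. it points one past the last
    referenced character, as Marvin proposes.
    """
    if not 0 <= begin <= end <= len(text):
        raise ValueError("reference out of range")
    return text[begin:end]


plain = "&&&"         # what the user sees and what the parsed <body/> contains
wire = escape(plain)  # "&amp;&amp;&amp;": one possible escaped form, 15 chars

print(len(plain), len(wire))
print(wire[0:14])                      # the inclusive 0..14 reading drops the final ";"
print(referenced_text(plain, 0, 3))    # whole body under the half-open convention
```

With an exclusive end on the parsed text, bolding the whole "&&&" would be begin="0" end="3", regardless of how any server chooses to escape it on the wire.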