Proposed XMPP Extension: Character counting in message bodies

Marvin W xmpp at larma.de
Wed Dec 18 15:40:42 UTC 2019


On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
> In the end we have settled for counting characters of escaped string, so 

This sounds like a terrible idea. In encoded XML, "&gt;", "&#62;", "&#x3E;" 
and "<![CDATA[>]]>" are equivalent. I just tried it out, and servers 
indeed do convert all of those to their shortest well-formed variant 
(which is ">"), so you cannot rely on their escaped length at all. 
Servers may also, at their discretion, convert non-ASCII characters to 
their character reference form (starting with "&#"). I have seen this 
happen with emojis at least once.
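The equivalence is easy to check with Python's standard-library parser. This is only a sketch of what a conforming parser reports for each wire spelling (the sample strings are illustrative, not taken from any server's traffic): every form yields the same character data while the escaped length on the wire differs.

```python
import xml.etree.ElementTree as ET

# Different wire spellings of the same character data.
SPELLINGS = [
    ">",              # literal, legal in character data
    "&gt;",           # predefined entity
    "&#62;",          # decimal character reference
    "&#x3E;",         # hexadecimal character reference
    "<![CDATA[>]]>",  # CDATA section
    "\U0001F600",     # an emoji as a raw character
    "&#128512;",      # the same emoji as a character reference
]

def parsed_text(raw: str) -> str:
    """Character data a parser reports for `raw` placed inside <body/>."""
    return ET.fromstring(f"<body>{raw}</body>").text or ""

for raw in SPELLINGS:
    # ascii() keeps the output printable on any console.
    print(f"{ascii(raw):18} wire length={len(raw):2}  parsed={ascii(parsed_text(raw))}")
```

A receiving client only ever sees the parsed value, so it has no way to know which of these lengths the sender counted.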

> to draw *&&&* in a client we count it as string with a length of 15, 
> thus <bold> reference points to characters 0..14:
> <reference xmlns="urn:xmpp:reference:0" begin="0" end="14" 
> type="markup"><bold /></reference>

Luckily for you, this looks pretty non-standard, so you don't have to 
deal with your implementation being incompatible with others. Also, as 
soon as XEP-0372 actually becomes more stable, you will technically be 
non-compliant with the standard, because there is no <bold /> element 
defined for the namespace "urn:xmpp:reference:0". You are apparently 
mixing XEP-0372 and XEP-0394.
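For reference, the quoted count of 15 can be reproduced with Python's `xml.sax.saxutils.escape`. This is a sketch under the assumption that the sender escapes exactly "&", "<" and ">"; `escaped_length` is a hypothetical helper, and which characters get escaped is a serializer choice that the receiver cannot observe.

```python
from xml.sax.saxutils import escape

def escaped_length(text: str) -> int:
    """Hypothetical helper: length of `text` after escaping &, < and >."""
    return len(escape(text))

body = "&&&"
print(escape(body), escaped_length(body))  # &amp;&amp;&amp; 15
# The quoted <reference/> uses an inclusive end, i.e. the index of the last character.
print(escaped_length(body) - 1)            # 14

# ">" is legal unescaped in character data, so the same text has two wire
# lengths depending on whether the serializer chose to escape it.
for wire_form in (">", escape(">")):
    print(repr(wire_form), len(wire_form))
```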

That's also a weird way of counting. Usually I would expect end to point 
to the position after the last referenced character; at least that's 
how it works in most programming languages (e.g. "&amp;&amp;&amp;"[0:14] 
gives you "&amp;&amp;&amp" without the last ";").
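The half-open [begin, end) convention can be checked directly with Python slicing. `referenced` is a hypothetical helper for illustration, not part of any XEP; it shows that covering all 15 escaped characters needs end=15, and that counting parsed characters instead makes the whole bold span simply [0, 3).

```python
def referenced(text: str, begin: int, end: int) -> str:
    """Hypothetical helper: return text[begin:end] with half-open [begin, end) semantics."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError(f"range [{begin}, {end}) out of bounds for length {len(text)}")
    return text[begin:end]

escaped = "&amp;&amp;&amp;"
print(referenced(escaped, 0, 14))  # &amp;&amp;&amp  (the final ";" is cut off)
print(referenced(escaped, 0, 15))  # &amp;&amp;&amp;

# Counting parsed characters instead, the bold span covers "&&&" as [0, 3).
print(referenced("&&&", 0, 3))     # &&&
```

With half-open ranges the length of a span is just end - begin, and an empty span (begin == end) needs no special case.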

