[Standards] Proposed XMPP Extension: Character counting in message bodies

Marvin W xmpp at larma.de
Fri Dec 20 14:24:23 UTC 2019


On 12/20/19 1:15 PM, Andrew Nenakhov wrote:
> You have sent a string '>>>>>', which was escaped to 
> '&gt;&gt;&gt;&gt;&gt;' before sending to the server.

I sent ">>>>>" verbatim: exactly the stanza I sent you in my last mail 
is what went (TLS-encrypted) to the server. According to the XML 
standard, "the ampersand character (&) and the left angle bracket (<) 
must not appear in their literal form" [1], but nothing is wrong with 
having > in literal form (unless it appears right after "]]", in which 
case it has to be replaced with a reference).

Apparently either your server or your client silently replaced the 
character with a reference (I could probably do the same in the other 
direction). I also think this is completely fine, because changing ">" 
to "&gt;" does not change the XML document - again, they are the same in 
XML, so they should be the same in XMPP as well.
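
To illustrate that equivalence, here is a minimal Python sketch using 
the standard library's xml.etree.ElementTree. The <body> element is only 
a stand-in for a message body, not a complete stanza, and the variable 
names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same body: literal ">" versus the "&gt;" reference.
literal = ET.fromstring("<body>>>>>></body>")
escaped = ET.fromstring("<body>&gt;&gt;&gt;&gt;&gt;</body>")

# After parsing, both elements carry identical character data.
same_text = literal.text == escaped.text
print(repr(literal.text), repr(escaped.text), same_text)
```

Any receiver that works on the parsed document rather than on the raw 
bytes cannot tell which of the two forms was sent.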

> To me, it works as designed - a sending entity had sent an incorrect 
> reference and predictably Xabber for Web worked displaying it as it should.

I totally understand why this happened (I intentionally produced this, 
because I know that many XML serializers do indeed serialize ">" as 
"&gt;" even when it is not required).
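
As one example of such a serializer, Python's xml.sax.saxutils.escape 
replaces ">" along with "&" and "<", although only the latter two are 
mandatory in character data. This is just a sketch of the general 
behaviour; it says nothing about which library any particular XMPP 
client or server actually uses.

```python
from xml.sax.saxutils import escape, unescape

original = ">>>>>"

# escape() rewrites "&", "<" and ">" as references, including the optional ">".
wire_form = escape(original)

# unescape() reverses the substitution, so the round trip is lossless.
round_trip = unescape(wire_form)
print(wire_form, round_trip)
```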

The underlying reason why this happened is that your "standard" has 
flaws. And I wrote this ProtoXEP to ensure there is one source of truth 
regarding character counting so that such flaws don't happen again. I 
will certainly update it to make sure everyone understands that "&gt;" is 
to be counted as 1 character and not 4.
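
A rough sketch of what counting on the parsed text looks like, in 
Python. The helper name count_body_chars is hypothetical, and counting 
Unicode code points (what len() does on a Python str) is an assumption 
of this example, not a statement of what the ProtoXEP specifies.

```python
import xml.etree.ElementTree as ET

def count_body_chars(serialized_body: str) -> int:
    """Count characters in the parsed character data, not in the wire form.

    Hypothetical helper: assumes the unit is Unicode code points.
    """
    element = ET.fromstring(serialized_body)
    return len(element.text or "")

# Both wire forms of a single ">" count as one character.
as_reference = count_body_chars("<body>&gt;</body>")
as_literal = count_body_chars("<body>></body>")
print(as_reference, as_literal)
```

Counting on the serialized form instead would give 4 for the first body 
and 1 for the second, which is exactly the mismatch described above.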

> It is true, we're not really good at 
> writing formal XEPs, in part because we're extremely busy building real 
> products that work.

I wrote this ProtoXEP because I wanted to build real products and felt 
that this needs to be clarified. We need formal XEPs so that a real 
product is actually compatible with other real products in the same 
federated network and they do not cause issues for each other. If they 
are not compatible, they at most qualify as real broken products.

[1] https://www.w3.org/TR/REC-xml/#syntax

