[Standards] Proposed XMPP Extension: Character counting in message bodies

Florian Schmaus flo at geekplace.eu
Fri Dec 4 16:10:21 UTC 2020


On 12/4/20 4:01 PM, Sam Whited wrote:
> On Fri, Dec 4, 2020, at 14:50, Florian Schmaus wrote:
>> But this String will be represented in your programming language's
>> native String representation, which may or may not match the bytes on
>> the wire.
> 
> That's the point, we can't guarantee what the representation is.
> …
> it might be one of the various east Asian encodings that are still
> popular (or so I've been told).

XMPP uses Unicode because XML, upon which XMPP is built, uses Unicode.
Hence I doubt that you will ever find an API where e.g.
Message.getBody() returns data that is not Unicode but uses some other
encoding scheme.

So, I am sorry, but I do not see your point. Furthermore, the Strings of
all modern programming languages I am aware of allow you to derive the
Unicode code points they consist of. And from those code points one can
derive grapheme clusters.
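
As a rough sketch of those two steps in Java, using only the standard
library: the class and method names below are made up for illustration
and are not part of any XMPP library. How closely
java.text.BreakIterator follows Unicode extended grapheme clusters
depends on the JDK version, so treat the grapheme count as an
approximation.

```java
import java.text.BreakIterator;

// Hypothetical helper, for illustration only.
final class BodyCounts {

    // Number of Unicode code points, independent of the UTF-16 length.
    static int codePointCount(String body) {
        return body.codePointCount(0, body.length());
    }

    // Number of user-perceived characters, as segmented by the JDK's
    // character BreakIterator (an approximation of grapheme clusters).
    static int graphemeCount(String body) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(body);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT
        String body = "e\u0301";
        System.out.println(body.length());        // UTF-16 code units: 2
        System.out.println(codePointCount(body)); // code points: 2
        System.out.println(graphemeCount(body));  // grapheme clusters: 1
    }
}
```

The same String thus yields three different "lengths", none of which
depends on the bytes that were on the wire.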

- Florian
