[Standards] XEP-0372: References

Florian Schmaus flo at geekplace.eu
Mon Mar 12 17:01:39 UTC 2018

On 12.03.2018 16:17, Jonas Wielicki wrote:
> On Monday, 12 March 2018 15:56:04 CET Sam Whited wrote:
>> On Mon, Mar 12, 2018, at 09:20, Jonas Wielicki wrote:
>> because just as
>> scalar values can be made up of multiple bytes, glyphs (or "grapheme
>> clusters") may be made up of multiple scalar values (and, as you pointed
>> out, the range could end in the middle of a grapheme cluster that uses
>> multiple scalar values).
>> In my mind there are only two things that make sense here:
>> - Use bytes and come up with a way to handle bad ranges that end in the
>> middle of a UTF-8 sequence 
> That proposal does not make sense at all. It doesn’t solve the issue of having 
> a range start or end in the middle of a grapheme cluster, and it introduces 
> extra complexity by requiring implementations to re-obtain a UTF-8 
> representation of the character data (or keep it around). Sounds like the 
> worst of both worlds (Grapheme Clusters vs. Scalar Values). XML Character Data 
> is specified in Scalar Values (they call it Characters, but it really is a 
> Scalar Value minus \uFFFF and \uFFFE), so it makes most sense to re-use that.
>> - Use grapheme clusters and require that
>> everyone implement the segmentation algorithm
> This will bring us all kinds of issues with different Unicode versions.
>> I lean towards bytes because it keeps things simple and 
> Then let’s stay with Scalar Values, which is what XML works with, instead of 
> using a lower-level representation.

I'm also leaning towards this.
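
To make the difference between the candidate units concrete, here is a
rough sketch in Python. The helper name count_units is made up for this
illustration and is not part of any XEP. It relies on the fact that a
Python str is indexed by code points, which coincide with scalar values
for well-formed text.

```python
def count_units(text: str) -> dict:
    """Count the same string in the units discussed in this thread.

    Hypothetical helper for illustration only. Grapheme clusters are
    deliberately left out, because counting them needs the Unicode
    segmentation algorithm and its version-dependent data tables.
    """
    return {
        # Python indexes str by code point (scalar value for valid text).
        "scalar_values": len(text),
        # Byte offsets depend on the chosen encoding, here UTF-8.
        "utf8_bytes": len(text.encode("utf-8")),
        # UTF-16 code units, shown for comparison.
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


# "e" followed by U+0301 COMBINING ACUTE ACCENT: one visible glyph.
print(count_units("e\u0301"))

# A byte range that ends inside a UTF-8 sequence cannot be decoded.
try:
    "e\u0301".encode("utf-8")[:2].decode("utf-8")
except UnicodeDecodeError as exc:
    print("bad byte range:", exc.reason)
```

With scalar-value offsets every range is at least decodable, although it
can still split a grapheme cluster.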

And possibly specify that a pointer to the start or the middle of a
grapheme cluster is not recommended, and if found, should be treated as
a pointer to the cluster itself.
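
A minimal sketch of that fallback, in Python, under loud assumptions:
begin and end are scalar-value offsets with an exclusive end, the
function name snap_range is hypothetical, and a "cluster" is
approximated as a base character plus any following combining marks.
This is not the full segmentation algorithm from UAX #29, so emoji
sequences, Hangul jamo and similar cases are not handled.

```python
import unicodedata


def _is_combining(ch: str) -> bool:
    # Simplification: non-zero canonical combining class only.
    return unicodedata.combining(ch) != 0


def snap_range(text: str, begin: int, end: int) -> tuple:
    """Widen [begin, end) so it covers whole (approximate) clusters.

    Hypothetical helper: offsets are in scalar values, end is exclusive.
    """
    # Move begin back to the base character if it points at a mark.
    while 0 < begin < len(text) and _is_combining(text[begin]):
        begin -= 1
    # Move end forward past marks that belong to the last base character.
    while end < len(text) and _is_combining(text[end]):
        end += 1
    return begin, end


text = "ae\u0301b"  # "a", "e" + COMBINING ACUTE ACCENT, "b"
print(snap_range(text, 2, 3))  # range starting in the middle of the cluster
print(snap_range(text, 1, 2))  # range ending in the middle of the cluster
```

Both calls end up covering the whole "e" plus accent cluster, which is
the "treat it as a pointer to the cluster itself" behaviour.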

- Florian
