[Standards] Council Minutes 2020-05-27

Sam Whited sam at samwhited.com
Tue Jun 2 20:34:47 UTC 2020


On Tue, Jun 2, 2020, at 16:17, Marvin W wrote:
> 1. I don't think we should add new rules, especially none that are
>    hard to implement - and most people (and also many programming
>    languages) have trouble with unicode, so I'd qualify this as hard
>    to implement.

This does not add new rules, it just documents something that would work
with the spec as it is today. I would be curious how it would be hard to
implement, though. I'm aware of issues with converting between encodings
and whatnot in Python and JavaScript (for example), but not with
something as simple as inserting or removing a rune in a string. If
you're implementing XMPP you have to already be using UTF-8, so I would
think you'd already be able to deal with UTF-8 encoded text.
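
To make "inserting or removing a rune" concrete, here's a rough sketch in
Python. The helper names are made up for this example, and putting the
character right before the directive is only one possible placement; the
point is just that the operation is plain string slicing once the text is
decoded.

```python
# Hypothetical helpers; U+2060 is used only as an example code point.
WORD_JOINER = "\u2060"


def insert_before(text: str, index: int, char: str = WORD_JOINER) -> str:
    """Return text with char inserted before the code point at index."""
    return text[:index] + char + text[index:]


def remove_all(text: str, char: str = WORD_JOINER) -> str:
    """Return text with every occurrence of char removed."""
    return text.replace(char, "")


message = "this is *not* emphasis"
escaped = insert_before(message, message.index("*"))

# Round-trip through UTF-8, as the text would travel on the wire.
wire = escaped.encode("utf-8")
assert wire.decode("utf-8") == escaped

# Removing the inserted code point gives back the original text.
assert remove_all(escaped) == message
```

Python strings index by code point, so no byte offsets are involved; a
language that indexes by byte or UTF-16 unit would need to find the
boundary first, but that is the same work any UTF-8 handling already does.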


> 2. I also disagree that category C characters should not be allowed to
>    appear after the opening directive. An example where I think there
>    is valid use to put it after the opening directive are LRO/RLO.

Good point, I'll go with the other alternative or go back to the
drawing board if that doesn't work (read on). I was hoping to avoid
adding extra rules anyways, and making category C another thing that's
disallowed would count as one.
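
For reference, a quick way to see why a blanket category C rule would
catch LRO/RLO too. This is only a sketch using Python's standard
unicodedata module; the CANDIDATES table and is_category_c are names
made up for the example.

```python
import unicodedata

# Code points mentioned in this thread (labels are informal).
CANDIDATES = {
    "LRO": "\u202d",
    "RLO": "\u202e",
    "ZWSP": "\u200b",
    "WORD JOINER": "\u2060",
}


def is_category_c(char: str) -> bool:
    """True if the code point's general category starts with C (Cc, Cf, ...)."""
    return unicodedata.category(char).startswith("C")


for label, char in CANDIDATES.items():
    print(f"U+{ord(char):04X} {label}: {unicodedata.category(char)}")
```

All four report Cf (format), so a rule that rejects category C after the
opening directive could not tell the directional overrides apart from the
invisible characters it was meant to catch.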

> 3. After checking again the definition, I disagree that WORD JOINER is
>    a good way to indicate no formatting should happen: "The function
>    of character is to indicate that line breaks are not allowed
>    between the adjoining characters, except next to hard line breaks."
>    Thus, if this character is put behind a space, it stops line breaks
>    from happening at that space which would normally happen, and I
>    don't think that's what people wanted.

Interesting, I thought it just split the function of ZWNBSP from BOM,
which used to be the same scalar value, U+FEFF (it just depended on
whether it was at the start of the text or inside it). The definition I
read was more expansive than that and mentioned different types of
breaking up text, but Wikipedia at least agrees with you. Let me read a
bit further and see if I can find some uses in the wild or a better
character that would work. If not, we can always drop this section. I
liked it because of how simple it was and because it didn't require any
new rules to implement, but I'm not dead set on making sure that it's
easily possible to disable individual spans or blocks; it would just be
a nice-to-have if it doesn't require any more work from client authors.

> Solution could be:
> - If a space, the start of the string or a newline precedes the
>   opening directive, it can be disabled by prefixing it with U+200B
> - If another opening directive precedes the opening directive, it can
>   be disabled by prefixing it with U+2060
>
> Both are not sane solutions and I wasn't actually very serious when
> mentioning it. So maybe it's not a good idea to mention it in the XEP
> even though it technically works.

This would be adding a new rule, just as disallowing category C would,
by making the ZWSP (or whatever we end up using) a special case. I was
hoping to avoid that, but it may be worth considering. I'll keep
thinking about it. Thanks for the ideas!
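
Just so we're talking about the same thing, here is how I read the
two-case suggestion, as a rough Python sketch. The function names and
the DIRECTIVES set are assumptions for illustration, not anything from
the XEP text.

```python
ZWSP = "\u200b"         # U+200B ZERO WIDTH SPACE
WORD_JOINER = "\u2060"  # U+2060 WORD JOINER
DIRECTIVES = frozenset("*_~`")  # assumed span directives, for illustration


def disabling_prefix(text: str, index: int) -> str:
    """Pick the code point to put before the directive at text[index]."""
    if index == 0 or text[index - 1] in " \n":
        # Start of string, space, or newline precedes the directive.
        return ZWSP
    if text[index - 1] in DIRECTIVES:
        # Another opening directive precedes it.
        return WORD_JOINER
    # Anything else: this sketch assumes the directive would not open a span.
    return ""


def disable_directive(text: str, index: int) -> str:
    """Return text with the directive at index prefixed per the rule above."""
    return text[:index] + disabling_prefix(text, index) + text[index:]


assert disable_directive("a *b*", 2) == "a \u200b*b*"
assert disable_directive("*_b_*", 1) == "*\u2060_b_*"
```

The branch on the preceding character is the special case I mean: the
sender has to know which of the two characters applies where.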

—Sam

