[Standards] UPDATED: XEP-0301 (In-Band Real Time Text) -- "Unicode Character Counting"

Mark Rejhon markybox at gmail.com
Mon Jul 23 05:12:15 UTC 2012

On Mon, Jul 23, 2012 at 12:17 AM, Mark Rejhon <markybox at gmail.com> wrote:

>  19. Edit deferred -- Explanation given in previous email. It helps
>>> reader associate WHICH definition of "character" we are using. Even the
>>> RFCs say that the word has multiple interpretations, so it's appropriate
>>> here in the title. The title is like a glossary entry, and the contents
>>> explain we're using code points as the method of counting characters.
>>  I still regard this dangerous and confusing. We are counting Unicode
>> code points, and that needs to be clear in all explanations.
> We will have to agree to disagree -- I think it's safer and less confusing:
> Did you know there are 47 occurrences of the word "character" in the whole
> document?
> Therefore, I prefer not to remove the word "Character" in the heading
> "Unicode Character Counting".  Thus, it is like the heading of an extended
> *glossary* definition here -- and it is in my opinion safer and less
> confusing.   Obviously, the section is too big to move to the glossary
> section, but I am open to alternate ideas of defining the word "character"
> from this mailing list.
> For this, I defer to public comment (once 0.5 is up).

Referring to: http://unicode.org/glossary/ , which says the following:

*Code Point <http://unicode.org/glossary/#code_point>*. (1) Any value in
the Unicode codespace; that is, the range of integers from 0 to 10FFFF
(hexadecimal). (See definition D10 in Section 3.4, Characters and
Encoding.) Not all code points are assigned to encoded characters. See
*code point type* <http://unicode.org/glossary/#code_point_type>.
(2) *A value, or position, for a character, in any coded character set.*
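To illustrate why the counting unit matters, here is a minimal Python sketch (an illustration only, not text from XEP-0301) that measures the same string three ways. It assumes Python 3, whose `str` type is a sequence of code points; the helper name `count_units` is hypothetical.

```python
def count_units(text: str) -> dict:
    """Measure the length of `text` in three different units (hypothetical helper)."""
    return {
        # Python 3 str is a sequence of code points, so len() counts code points.
        "code_points": len(text),
        # UTF-16 code units: a code point above U+FFFF takes a surrogate pair (2 units).
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 bytes: the encoding XMPP uses on the wire.
        "utf8_bytes": len(text.encode("utf-8")),
    }


precomposed = "caf\u00e9 \U0001D11E"  # "café" with precomposed U+00E9, a space, then U+1D11E
decomposed = "cafe\u0301"             # "café" written as "e" + U+0301 COMBINING ACUTE ACCENT

print(count_units(precomposed))  # {'code_points': 6, 'utf16_units': 7, 'utf8_bytes': 10}
print(count_units(decomposed))   # {'code_points': 5, 'utf16_units': 5, 'utf8_bytes': 6}
```

The two spellings of "café" look identical on screen but differ by one code point, and the non-BMP symbol U+1D11E counts as 1 code point but 2 UTF-16 units. So "character" can mean several different counts, and an implementation must pick one explicitly.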

Other rationale:
- Other XEPs use "character" terminology.
- People are already familiar with "character" terminology.
- There are 47 occurrences of the word "character" in XEP-0301 (e.g.
"...Remove 1 character from...").
- Search-and-replacing all of them with "code points" would make the document
_even_ more confusing to those who are not familiar with "code point"
terminology.
- Therefore, I feel that the lesser evil is to treat "Unicode Character
Counting" as the definition of XEP-0301's use of the word "character". If an
implementer is unsure how to interpret the word "character", this section
clarifies it.
- If several people here agree with Gunnar that "Unicode Character
Counting" should be renamed to "Unicode Code Point Counting", they would
probably also agree that the word "character" still needs to be defined
somewhere else, such as in the Glossary section. Because "character" has
multiple interpretations, XEP-0301 has to define it from its own
perspective, and I chose the "Unicode Character Counting" section as that
definition. I am open to alternative ways of defining "character", but the
result needs to be less confusing, not more.

I'd like to hear opinions from others about this matter, as well as general
comments about "Accurate Processing of Action Elements" (which includes
"Unicode Character Counting").

Mark Rejhon
