[Standards] XEP-0301 Real-Time Text: Unicode normalization, bidirectional, right-to-left text, etc. -- Comments needed

Peter Saint-Andre stpeter at stpeter.im
Wed Jul 4 01:21:37 UTC 2012

On 7/3/12 5:10 PM, Mark Rejhon wrote:
> On Tue, Jul 3, 2012 at 3:06 PM, Peter Saint-Andre <stpeter at stpeter.im>
> wrote:
>     > A minor edit to clarify this for multiple characters forming one
>     > glyph is to add "incompletely formed glyphs" to the list in the
>     > parentheses.  Would that make sense?
>     Do you mean multiple code points forming one character? I still find the
>     use of the term 'glyph' confusing here and would prefer to leave it out
>     if possible, because it doesn't seem that we're really talking about
>     "The actual, concrete image of a glyph representation having been
>     rasterized or otherwise imaged onto some display surface." I think it's
>     best if RTT talks about characters and code points.
> Yes -- that is what I meant. 
> I'll replace the word glyph with character.  The problem is I am trying
> to be consistent with what "character" means.  RFC6365 has multiple
> interpretations for the word "character", too.  Is it a code point? 

A code point is a number (often in hex) assigned to a character in a
coded character set. For example, the code point for GREEK SMALL LETTER

> Is it a displayable character? 

It's not clear to me what makes a *displayable* character different from
any other kind of character. Are you making a contrast with control
characters and the like? But even they are characters.

> Is it the 'char' data type (which can be 1,
> 2 or 4 bytes each depending on platform)? 
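As a rough illustration of that point, the sketch below counts how many fixed-size storage units one character occupies under different Unicode encoding forms. The helper name code_units and the sample characters are my own choices for illustration, not anything taken from XEP-0301 or RFC 6365.

```python
def code_units(text: str, encoding: str, unit_size: int) -> int:
    """Return how many unit_size-byte code units `text` occupies in `encoding`."""
    return len(text.encode(encoding)) // unit_size

# One code point outside the Basic Multilingual Plane (U+1F600 GRINNING FACE).
face = "\U0001F600"

# The little-endian codec names are used so that no byte order mark is counted.
utf8_units = code_units(face, "utf-8", 1)       # 8-bit units
utf16_units = code_units(face, "utf-16-le", 2)  # 16-bit units (a surrogate pair)
utf32_units = code_units(face, "utf-32-le", 4)  # 32-bit units

print(utf8_units, utf16_units, utf32_units)
```

The same single code point takes 4, 2, or 1 storage units depending on the encoding form, which is why a platform's 'char' type is a poor stand-in for "character".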


> Therefore, I like to avoid
> using the word "character" outside the context of a Unicode code point,
> which is how XEP-0301 defines a character.

Characters can be coded differently in different coded character sets
(of which many existed before Unicode attempted to unify them all). For
example, HALFWIDTH KATAKANA LETTER KI (ｷ) is coded as U+FF77 in Unicode
but as B7 in Shift-JIS. Thus "character" can be more precise than "code
point" (since the latter depends on which coded character set is used).
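The example can be checked with a short Python sketch. It assumes the standard library's "shift_jis" codec as the Shift-JIS mapping; the variable names are only illustrative.

```python
import unicodedata

# The same abstract character, looked up in two coded character sets.
ki = "\uff77"

name = unicodedata.name(ki)               # the character's Unicode name
unicode_code_point = f"U+{ord(ki):04X}"   # its code point in Unicode
shift_jis_bytes = ki.encode("shift_jis")  # its single-byte code in Shift-JIS

print(name, unicode_code_point, shift_jis_bytes.hex().upper())
```

The name identifies the character, while the two numbers differ because each belongs to a different coded character set.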

> However, I've now removed the word "glyph" from the document. 

Good plan. :)


Peter Saint-Andre
