[Standards] roster schema
stpeter at jabber.org
Mon Jun 25 10:10:44 CDT 2007
Joe Hildebrand wrote:
> What do you mean by character?
The operative question is: what does XML mean by "character"? The
definition at http://www.w3.org/TR/2000/WD-xml-2e-20000814#dt-character
seems to set the scope:
[Definition: A character is an atomic unit of text as specified by
ISO/IEC 10646 [ISO/IEC 10646] [E67](see also [ISO/IEC 10646-2000]).
Legal characters are tab, carriage return, line feed, and the legal
characters of Unicode and ISO/IEC 10646. [E69]The versions of these
standards cited in A.1 Normative References were current at the time
this document was prepared. New characters may be added to these
standards by amendments or new editions. Consequently, XML processors
must accept any character in the range specified for Char. The use of
"compatibility characters", as defined in section 6.8 of [Unicode]
[E67](see also D21 in section 3.6 of [Unicode3]), is discouraged.]
  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
         | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
  /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
Perhaps you can parse that better than I can.
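
For what it's worth, the Char production quoted above can be read as a plain range check on code points. The following is only a minimal sketch in Python 3; the function names is_xml_char and all_xml_chars are hypothetical, chosen for illustration, and are not taken from any XMPP or XML library.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the quoted Char production."""
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # up to just below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD       # after the surrogates, excluding FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


def all_xml_chars(text: str) -> bool:
    """Return True if every code point in text is a legal XML character."""
    return all(is_xml_char(ord(ch)) for ch in text)
```

Read this way, a "character" in the XML sense is a single code point within those ranges, with no notion of combining sequences or canonicalization.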
> Do you have to perform some sort of canonicalization before counting?
> Combining characters make this particularly difficult, which is why we
> settled on something easy to describe and understand in JIDs.
Right. It may not be easy to specify in XML Schema, because the length
of an xs:string is measured in characters as defined above.
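
To illustrate why combining characters make counting awkward, here is a small sketch in Python 3, where len() on a str counts code points (which is how the XML definition of "character" reads). The helper name lengths is hypothetical, and the UTF-8 byte count is included only for comparison; nothing here is meant as the rule the roster schema should adopt.

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def lengths(text: str) -> tuple[int, int]:
    """Return (code point count, UTF-8 byte count) for text."""
    return len(text), len(text.encode("utf-8"))


# The two strings render identically but have different lengths,
# unless the text is normalized (here to NFC) before counting.
normalized = unicodedata.normalize("NFC", decomposed)
```

Here lengths(composed) is (1, 2) and lengths(decomposed) is (2, 3), while the normalized string equals composed. So a length limit expressed in characters gives different answers for text that looks the same, depending on whether any canonicalization happens first.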