[Standards] roster schema

Peter Saint-Andre stpeter at jabber.org
Mon Jun 25 10:10:44 CDT 2007


Joe Hildebrand wrote:
> What do you mean by character?

The operative question is: what does XML mean by character? The
definition at http://www.w3.org/TR/2000/WD-xml-2e-20000814#dt-character
seems to define the scope:

***

[Definition: A character is an atomic unit of text as specified by
ISO/IEC 10646 [ISO/IEC 10646] [E67](see also [ISO/IEC 10646-2000]).
Legal characters are tab, carriage return, line feed, and the legal
characters of Unicode and ISO/IEC 10646. [E69]The versions of these
standards cited in A.1 Normative References were current at the time
this document was prepared. New characters may be added to these
standards by amendments or new editions. Consequently, XML processors
must accept any character in the range specified for Char. The use of
"compatibility characters", as defined in section 6.8 of [Unicode]
[E67](see also D21 in section 3.6 of [Unicode3]), is discouraged.]

Character Range
[2]  Char  ::=  #x9 | #xA | #xD | [#x20-#xD7FF]
                | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

***
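
For what it's worth, the Char production quoted above can be read as a
simple range check on code points. Here is a minimal sketch in Python;
the function names is_xml_char and all_xml_chars are made up for
illustration and are not taken from any spec or library:

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char as quoted above."""
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # up to just below the surrogate blocks
        or 0xE000 <= cp <= 0xFFFD      # after the surrogates, excluding FFFE and FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def all_xml_chars(text: str) -> bool:
    """Return True if every code point in text is a legal XML character."""
    return all(is_xml_char(ord(ch)) for ch in text)
```

Note that this only says which code points are legal; it says nothing
about combining sequences or canonical equivalence.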

Perhaps you can parse that better than I can.

> Do you have to perform some sort of canonicalization before counting?
> Combining characters make this particularly difficult, which is why we
> settled on something easy to describe and understand in JIDs.

Right. It may not be easy to specify in XML Schema, because the length
of an xs:string is measured in characters as defined above, so each
combining character counts separately.
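
To illustrate the combining-character problem, here is a rough sketch
in Python, where len() on a str counts code points. The helper names
are hypothetical, and the UTF-8 byte count is included only for
comparison; this is not a statement of how any particular schema
validator behaves:

```python
import unicodedata


def code_point_length(text: str) -> int:
    """Length in code points, i.e. characters in the sense quoted above."""
    return len(text)


def nfc_length(text: str) -> int:
    """Length in code points after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))


def utf8_length(text: str) -> int:
    """Length in bytes of the UTF-8 encoding."""
    return len(text.encode("utf-8"))


composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT, two code points

# The two strings render identically but differ in length unless normalized first.
lengths = {
    "composed": (code_point_length(composed), nfc_length(composed), utf8_length(composed)),
    "decomposed": (code_point_length(decomposed), nfc_length(decomposed), utf8_length(decomposed)),
}
```

So a length limit expressed in characters gives different answers for
the same visible text depending on whether it was normalized before
counting.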

/psa
