On Tue, 06-11-2007 at 14:35 +0000, Dave Cridland wrote:
> > Let's take first 256 allowable UTF-8 characters [...]

> Can't do that, because many of those characters are going to be  
> illegal even in CDATA sections.

The first 256 _allowable_ UTF-8 characters are certainly legal in CDATA.

> But bear in mind that even then, to encode a single octet will yield  
> between 1 and 3 characters.

I would only use UTF-8 characters that encode to at most 2 bytes,
leaving out those that take 3 bytes or more.

And a better mapping:
Bytes that are valid UTF-8 characters on their own are mapped one-to-one.
Only the invalid ones are mapped to 2-byte characters.

This way if the "binary" data is ASCII text, it stays human readable.
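
A rough sketch of such a mapping, in Python, just to illustrate the idea.
The names (PASS_THROUGH, OFFSET, encode, decode) and the concrete choices
are my assumptions, not part of any spec: printable ASCII plus TAB and LF
pass through unchanged, and every other byte b becomes the code point
U+0100 + b, which always takes exactly 2 bytes in UTF-8. CR is escaped
because XML parsers normalize line endings. Markup characters such as '<'
and '&' are not handled here and would still need CDATA or entity escaping.

```python
# Hypothetical byte <-> character mapping; all constants are assumptions.

# Bytes kept as-is: TAB, LF, and printable ASCII (0x20..0x7E).
PASS_THROUGH = {0x09, 0x0A} | set(range(0x20, 0x7F))

# Every other byte b is mapped to U+0100 + b (range U+0100..U+01FF),
# which encodes to exactly 2 bytes in UTF-8.
OFFSET = 0x100

# The 256-row translation table and its inverse.
ENCODE_TABLE = [chr(b) if b in PASS_THROUGH else chr(OFFSET + b)
                for b in range(256)]
DECODE_TABLE = {ch: b for b, ch in enumerate(ENCODE_TABLE)}


def encode(data: bytes) -> str:
    """Map arbitrary bytes to a string of 1- or 2-byte UTF-8 characters."""
    return "".join(ENCODE_TABLE[b] for b in data)


def decode(text: str) -> bytes:
    """Invert encode(); raises KeyError on characters outside the table."""
    return bytes(DECODE_TABLE[ch] for ch in text)
```

With this table, ASCII text costs 1 byte per octet on the wire and any
other octet costs 2 bytes, so the worst case is a 100% size increase.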

This is a simple 256-row translation table that could be defined

