[JDEV] look for help about unicode in jabber system
timbeau_hk at yahoo.co.uk
Sat Aug 17 07:46:05 CDT 2002
Apologies about the <NULL> tag jibe - it was late and I have never forgiven
C for having to be extended to get around what I saw as ox-headed string
Well, I cannot speak for other ex-PASCAL programmers, but when I used it on
64-bit OpenVMS Alphas we had awareness of 64-bit processing, quad
pipelining, hits due to call stacks, local and remote jumps, indirection,
L1&2 cache behaviour, register use, soft and hard page faults and compiler
optimisation strengths and weaknesses. The AXP compiler was red hot and took
care to make fast code out of PASCAL and C alike. I dug in to the assembler
to see how on occasion and to compare programming styles for future
So, PASCAL programmers concerned with efficiency did exist, as now do
reliable and robust C programmers, which I notice in abundance here, in the
Jabber world (and why I feel at home).
<crosspost type="Warning" list="jig">
I see the problems of UTF-8 and binary headers as very similar - both are
bit-packed conditionally-sized data. Thus, if we can handle UTF-8 properly,
we can handle binary headers properly. It takes awareness at design time to
avoid placing data across obvious boundaries. I would even go so far as to say
that we need to be careful about embedded devices, so assuming 64-bit registers
may still be optimistic at this time.
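To illustrate the "conditionally-sized" nature of UTF-8 mentioned above, here is a minimal C sketch of a decoder that reads the sequence length from the lead byte and then unpacks the payload bits. The function names utf8_seq_len and utf8_decode are made up for this example and are not part of any Jabber library. It only uses 8-bit and 32-bit operations, so it makes no assumption about register width, and it is deliberately simplified: it does not reject overlong forms, surrogates, or values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the length in bytes (1-4) of the UTF-8 sequence that starts with
 * lead byte b, or 0 if b cannot start a sequence (a 10xxxxxx continuation
 * byte or a 11111xxx byte). */
size_t utf8_seq_len(uint8_t b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: plain ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Decode one code point from s (n bytes available) into *cp.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or a continuation byte is malformed.
 * Simplified sketch: overlong forms and surrogates are NOT rejected. */
size_t utf8_decode(const uint8_t *s, size_t n, uint32_t *cp)
{
    /* Payload mask for the lead byte, indexed by sequence length. */
    static const uint8_t lead_mask[5] = { 0x00, 0x7F, 0x1F, 0x0F, 0x07 };
    size_t len, i;
    uint32_t v;

    assert(s != NULL && cp != NULL);
    if (n == 0) return 0;
    len = utf8_seq_len(s[0]);
    if (len == 0 || len > n) return 0;

    v = s[0] & lead_mask[len];
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* must be 10xxxxxx */
        v = (v << 6) | (s[i] & 0x3F);         /* append 6 payload bits */
    }
    *cp = v;
    return len;
}
```

For example, the three bytes E4 B8 AD decode to U+4E2D, and the call reports that three bytes were consumed.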
My admittedly crude point about PASCAL vs. C was that we should seek out and use
systematic and 'tight' practices, e.g. interfaces, strong typing or
On 16/08/2002 10:50 pm, "Dave" <dave at dave.tj> wrote:
> C doesn't require NULL-terminated strings. It's just that the standard
> C string library assumes that strings end in NULL (since that method's
> proven to be very effective for many applications). There are plenty
> of enumerated-string libraries for C, and because strings aren't built
> into the language, those libraries can be every bit as efficient as
> the standard C routines (but then again, PASCAL people don't really
> care much about efficiency, anyway ... if they did, they wouldn't be
> PASCAL programmers, now, would they?). If anything, one of C's sons
> (that bastard created by Mr. Stroustrup) makes it ridiculously easy
> to use Unicode in the full UCS-4/UTF-32 format (or any of the other formats,
> for that matter), by creating a new character data type, and using the
> should've-been-in-STL basic_string template with that new UCS32Char type.
> If you'd prefer to avoid leaving C (a very wise choice, IMHO), you can
> use a wchar_t array ... or you can just stick with the extraordinarily
> simple (and very compatible) UTF-8 :-)
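As a rough illustration of the point that C does not force terminated strings, here is a minimal sketch of a length-counted string in plain C. This reads "enumerated-string" as "length-counted", which is an assumption, and the type counted_str and its functions are hypothetical names invented for this example, not taken from any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string: the length is stored explicitly, so the
 * data may contain zero bytes and needs no terminator. */
typedef struct {
    size_t len;           /* number of bytes in data */
    unsigned char *data;  /* NOT zero-terminated */
} counted_str;

/* Copy n bytes from src into a freshly allocated counted string.
 * Returns 0 on success, -1 if allocation fails. */
int counted_str_init(counted_str *s, const void *src, size_t n)
{
    assert(s != NULL && (src != NULL || n == 0));
    s->data = malloc(n ? n : 1);  /* avoid the malloc(0) ambiguity */
    if (s->data == NULL) return -1;
    if (n) memcpy(s->data, src, n);
    s->len = n;
    return 0;
}

/* Byte-wise equality; strings of different length are rejected
 * without looking at the data. */
int counted_str_equal(const counted_str *a, const counted_str *b)
{
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}

/* Release the storage and reset the string to empty. */
void counted_str_free(counted_str *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Because the length travels with the data, such a string can hold UTF-8 text or raw binary alike, and finding its length does not require scanning for a terminator.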
> As for alignment of structure elements, anything like that is guaranteed
> to cause portability headaches. If you really want to do it in C, you can
> either fake it using character arrays, or use an inline assembly block.
> Be aware that neither C nor PASCAL provides sufficient portability
> when you try to do that kind of stuff, because that requirement by
> definition violates any hopes of portability (which is not necessarily
> bad, but it's worth considering nonetheless). Also, the primary reason
> for system-dependent alignment is efficiency. If your 64-bit CPU has
> to fetch two separate 64-bit words just to get a 2-bit value, you're
> losing lots of potential speed.
> - Dave
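One way to read the "fake it using character arrays" suggestion is to treat the header as a byte array and pull fields out with shifts and masks, rather than relying on compiler-specific structure layout. The sketch below is only that: the helper get_bits is a hypothetical name, and the most-significant-bit-first ordering is an assumption about the wire format. It walks the field one bit at a time, which favours clarity over the speed concerns raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `width` bits (1-32) starting at absolute bit offset `bit_off`
 * in buf, most significant bit first. The field may straddle byte
 * boundaries. The caller must ensure the field lies inside the buffer. */
uint32_t get_bits(const uint8_t *buf, size_t bit_off, unsigned width)
{
    uint32_t v = 0;
    unsigned i;

    assert(buf != NULL && width >= 1 && width <= 32);
    for (i = 0; i < width; i++) {
        size_t pos = bit_off + i;
        /* Bit 0 of each byte is taken to be its most significant bit. */
        unsigned bit = (buf[pos / 8] >> (7 - (pos % 8))) & 1u;
        v = (v << 1) | bit;
    }
    return v;
}
```

With the two bytes A5 3C, an 8-bit field starting at bit offset 4 crosses the byte boundary and reads as 0x53. Since only byte accesses are used, the result does not depend on the host's endianness or alignment rules.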
> Timothy Carpenter wrote:
>> I do not think CHAR to UNICODE is the answer. CHAR is 8 bit, but UTF-8 is a
>> way of sending UNICODE without breaking 'text' streams with data that looks
>> like CR, LF, EOF, EOLN etc. SCSU is another mechanism that makes very
>> intelligent use of packing, processing and compromising between ASCII and
>> full 16-bit character sets, but I cannot recall if this protects text stream
>> handlers from shocks. UTF-8 is less compact, but simpler, with no sliding
>> To convert is not a huge task, to my memory - just a little masking and bit
>> shuffling...shame no one uses PASCAL, as apart from not using <NULL> end
>> tags for strings (yeah!), you can define structures to have conditional
>> contents nailed down to the bit position, and even crossing
>> byte/word/longword boundaries. Thus the data slots in without too much math
>> nonsense all over the place.
>> Maybe this is why many C programmers quail at the thought of binary
>> bit-packed headers and say they are unmaintainable. They probably are...in
>> C. ;-)
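The "little masking and bit shuffling" can be sketched in C as follows. The function name utf8_encode is chosen for this example only. Every byte of a multi-byte sequence has its high bit set, which is why encoded text never produces bytes that look like CR, LF or other ASCII control characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp as UTF-8 into out (room for 4 bytes required).
 * Returns the number of bytes written (1-4), or 0 if cp is not a valid
 * Unicode scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                    /* 7 bits: 0xxxxxxx */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 16 bits: 1110xxxx + 2 more */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 21 bits: 11110xxx + 3 more */
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For instance, U+4E2D (a common Chinese character) comes out as the three bytes E4 B8 AD, while plain ASCII passes through unchanged as a single byte.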
>> On 17/08/2002 12:38 pm, "ÕÅ Æé" <jabberjaist at hotmail.com> wrote:
>>> Does the Jabber system support East Asian glyphs: Chinese, Japanese
>>> and Korean? I want
>>> my Jabber server to support East Asian Unicode, but I have run into trouble. My
>>> friend told me
>>> the following. Is it right, or is there a better way to solve the problem?
>>> - Jabber uses UTF-8 encoding.
>>> - We have not faced any problems so far because we have been operating in
>>> the ASCII domain, which is a subset of UTF-8.
>>> - We need to find some kind of encoding algorithm/API which converts
>>> to UTF-8 before we send out strings to the server, and some kind of decoding
>>> algorithm/API which does the opposite when we receive strings.
>>> - We need some kind of rendering mechanism to map from
>>> Unicode to the actual character.
>>> - There are a couple of Microsoft APIs called MultiByteToWideChar and
>>> - There is an MLang API from Microsoft which has functions like
>>> ConvertStringToUnicode and ConvertUnicodeToString (I think this is our best
>>> bet. If we read this thoroughly we might be able to solve the problem)
>>> jdev mailing list
>>> jdev at jabber.org