Friday, July 08, 2005

Unicode Philosophy

There is a philosophical discussion under way on the Unicode List. It is a nice change from threads on the "fi ligature", "UCS-2/4 & BOM" and "JIS X 0208", not that those can't be very important too.

However, I wish to give you a taste of the current conversation, which is on a different plane altogether.

This is from Gregg Reynolds, Fri Jul 08 2005 - 18:58:27 CDT
Re: Demystifying the Politburo

"Seriously (I'll try), the question of participation of native speakers is (IMHO) an important and thorny one.

On the one hand, nothing says native speakers are the best informants. And as a matter of policy I see no reason why a *standards* body (especially an industry standard body) should have a requirement for native speaker participation; after all, the (industry-defined) goal is to get a standard, not to make everybody happy. No doubt such participation is desirable, but it's quite a different thing to say it's required. Standards have to work in the marketplace in order to become standards.

On the other hand, it's pretty obvious (to me at least) that participation of native speakers in standardization of cultural artifacts like written language is a Good Thing. (List: I know, I know, Unicode does not encode written language, it encodes characters/scripts/whatever. But the perception will always and inevitably be that it is an encoding or modeling of written language.)

I can't help drawing an analogy (if that's the right word) to the ideas often discussed by Edward Said, among others. He wrote extensively about how the West (that fearsome boogeyman) controls the narrative of/about the East. It doesn't really matter if I as a Westerner get it right; the East (South, Middle East, slightly East and a little South but ... etc.) should speak for itself. (Or something like that; it's been a while). Now, one may agree or disagree with his language (I'm not so crazy about it myself), but there is no denying that his views are supported by a large population in both East and West. Defining an encoding that models (in some way) non-Western languages without significant - and visible - participation of native speakers seems analogous to "us" telling their history instead of letting "them" tell their history.

On the third hand, it's clear (but maybe only to those who follow the Unicode list) that people like Mr. Everson work very closely with native speakers, so you can't really argue that the linguistic communities were/are not represented. We are clearly not the 19th century.

On the fourth hand, it's also clear (to me at least) that Unicode works great for some linguistic communities and not so great for others. (You knew it was coming, and here it is: Unicode is very bad indeed for the RTL community in general and Arabic in particular. ;-) This gets back to the design principles (and the interests that drive them) of Unicode, which work better for some languages than others.

And then there are the pragmatic issues which you have outlined concisely in another message.

Obviously I haven't quite wrapped my mind around these issues yet so I beg the indulgence of you and other Listerines. I (rashly?) assume that pretty much everybody on this list is interested in "getting it right" for everybody, and therefore might be a little interested in such considerations. It's not a case of blaming, but of understanding. I think.

Personally, I think Unicode is (well, may be) of enormous historical significance, yet it flies almost entirely under the cultural radar, at least in the US. I daresay most places in the world that will eventually be heavily influenced by Unicode are more or less oblivious to it.


Thanks, very interesting. I see many of the scripts being worked on list one "Everson" as the contact. Who is this mysterious and ubiquitous "Everson", anyway? Is it one person? Sounds an awful lot like the fictional Cecil Adams to me: (


To view this post in context, follow the link to the Unicode Mail List Archives here or in my sidebar. You can either join the list by following the instructions, in which case you can participate, or simply read the archives, which are password protected. The password protection is only there to deter spam; the list is public and the password is posted.

One of the current issues is about how well publicised the work of Unicode is. I want to do my bit.

