Sunday, September 11, 2005

Unicode Resources


I am trying to install language support on a new computer so I am seeing a lot of these little boxes.

This is a good time to mention that I have been expanding my sidebar to include a few more resources for working with Unicode. There are now Unicode Resources, Babelstone, many keyboards and pickers, and the home page of the Non Roman Script Initiative. Additional resources and comments are welcome as always.

Mike wore this T-shirt but I am passing on that for now since there is no one in my real life who has ever heard of Unicode. An image of the bumper sticker will have to do me. The T-shirts look rather nice though, organic cotton and all.


Addenda:

More little boxes - John Yunker talks about web globalization and Google.


I have been reminded of Wikipedia's home page and the List of Languages ordered by code.


Windows update, which allows one to display Inuktitut, is available here.

6 Comments:

Anonymous said...

The problem displaying Unicode is two-fold:

1. Some systems simply don't ship with adequate Unicode fonts (cough Windows, Linux).
2. Some programs and operating systems are not competent at finding Unicode glyphs from the available fonts.

All of which is too bad, because the vast Unicode character set is incredibly useful for both languages and symbols. Blogger Dave Shea experimented with this a bit in this post of his: http://www.mezzoblue.com/archives/2005/07/25/glyphs/

The only browser capable of finding and displaying all the correct glyphs in his example is Safari on OS X.

Another page I use as a test page for Unicode support is Wikipedia.org. Do all the dozens of languages appear in their proper scripts near the bottom of the page? They should, without any extra fiddling or software installation.
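
When some entries on a test page like that show up as little boxes, it can help to find out which characters the boxes stand for, and so which script the installed fonts are missing. A minimal sketch in Python, using only the standard `unicodedata` module; the helper name `describe` and the three sample characters (one each from Canadian syllabics, Ethiopic and Lao) are picked here purely for illustration:

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Illustrative sample: a syllabics character used in Inuktitut,
# an Ethiopic letter used in Amharic, and a Lao letter.
sample = "\u1403\u12A0\u0E81"
for code_point, name in describe(sample):
    print(code_point, name)
```

The character names begin with the script name, which is usually enough to tell which font or language pack to look for.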

8:47 PM  
Anonymous said...

Thanks. I had forgotten about Wikipedia. This page is even better than the home page.

http://meta.wikimedia.org/wiki/List_of_Wikipedias#List_of_language_names_ordered_by_code

I don't want to forget about the Windows update here.

http://www.microsoft.com/downloads/details.aspx?FamilyID=3fa7cdd1-506b-4ca0-bd47-b338e337a527&displaylang=en

It will add Inuktitut.

10:11 PM  
Anonymous said...

That Wikipedia list is a good page. :) Looks like I'm missing Amharic and Laotian, but the Inuktitut works in my browser.

11:06 PM  
Anonymous said...

I am missing Amharic, Inuktitut, Sinhala, Bengali, Lao, Khmer, Tamazight, etc. Quite a few actually.

Oddly, I notice that Cree is represented by a Roman orthography in the Wikipedia list.

The Trigeminal link in my sidebar http://www.trigeminal.com/samples/provincial.html has embedded fonts.

11:24 PM  
Anonymous said...

That Trigeminal link is amusing. It reminds me of a Japanese friend who once came to Canada and remarked in frustration one day that she wished Canadians would just speak Japanese. :)

11:53 PM  
Anonymous said...

anonym said:
> The problem displaying Unicode is two-fold:

Oh, there are many more problems.

1. Most operating systems and browser fonts support only Unicode 2.1 plus inconsistent parts of 3.0/3.1/3.2.
For example, I see the Cyrillic Supplement correctly with Netscape 7.2 and Firefox 1.5, but not with IE6. The older Unified Canadian Aboriginal Syllabics block from version 3.0 is missing in all three browsers. Syriac is also from 3.0, and it displays correctly.

2. Creating characters with numeric character references (&#x followed by the hexadecimal code point) is only reliably supported within the Basic Multilingual Plane, that is, for code points below 65536. Code points with higher numbers need to be encoded directly as UTF-8.

3. Some webmasters do not know the basics of character encoding. The page is saved as ASCII (or some other non-UTF-8 encoding), but the header contains

<meta http-equiv='content-type' content='text/html; charset=UTF-8'/>

and they ask: Why are äöü etc. wrong?
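
Points 2 and 3 above can be tried out in a few lines of Python. This is only a rough sketch with made-up helper names (`expand_reference`, `utf16_units`, `view_as_utf8`). It uses U+10400 as an arbitrary example of a character outside the Basic Multilingual Plane, and it assumes the mislabelled page was saved as Latin-1, which is one common case but not the only one:

```python
import html

def expand_reference(ref):
    """Resolve an HTML character reference such as '&#x10400;' to its character."""
    return html.unescape(ref)

def utf16_units(ch):
    """Number of 16-bit code units needed to store ch in UTF-16."""
    return len(ch.encode("utf-16-be")) // 2

def view_as_utf8(text, saved_as="latin-1"):
    """Simulate a page saved in one encoding but declared (and decoded) as UTF-8."""
    raw = text.encode(saved_as)
    return raw.decode("utf-8", errors="replace")

# Point 2: a code point above 65535 needs two UTF-16 units (a surrogate pair).
# That is one plausible reason software built around 16-bit characters stumbles.
ch = expand_reference("&#x10400;")
print(hex(ord(ch)), utf16_units(ch), ch.encode("utf-8"))

# Point 3: bytes written as Latin-1 are not valid UTF-8, so the umlauts are lost.
print(view_as_utf8("\u00e4\u00f6\u00fc"))

# Saving the file as UTF-8, matching the declaration, round-trips cleanly.
print(view_as_utf8("\u00e4\u00f6\u00fc", saved_as="utf-8"))
```

The fix for point 3 is the last call: make the encoding the file is saved in agree with the one the header declares.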

The samples are from my Unicode database, which lists all characters ordered by block or by category, together with their Unicode version. Seeing these problems was one reason to create this HTML version.

8:37 AM  
