Saturday, November 26, 2005

BabelStone Blog

Andrew West's recent post about What's New in Unicode 5.0 provided links to some interesting reading. First, he answered my question about Phoenician. You can read his answer here. I didn't bring this up to reopen a debate which I have no part in. Rather, I was away for the month of August and missed the end of that story.

However, I found a document called N2990 particularly useful. This document not only records votes but also records comments. Among the comments, I noticed this line.

Encoding Phoenician is redundant, and needlessly proliferates Canaanite diascripts.

I googled "diascripts" and came up with this document, which supplied a definition: "Diascript is to script as dialect is to language." Good, one more thing to think about.

Next, in the same document on page 9, I found an interesting item.

For character names and named UCS sequence identifiers, two names shall be considered unique and distinct if they are different even when SPACE and medial HYPHEN-MINUS characters are ignored and even when the words "LETTER", "CHARACTER", and "DIGIT" are ignored in comparison of the names.

EXAMPLE 1
The following hypothetical character names would not be unique and distinct:
MANICHAEAN CHARACTER A
MANICHAEAN LETTER A


That answers another question I had for Andrew about character names. Now I know that the part of the name that designates it a 'character' or a 'letter' is not to be considered significant.

However, this is tricky, because 'symbol' is not on that list of ignored words: if two names differ by the word 'letter' versus 'symbol', they are indeed separate characters.

U+03F0 : GREEK KAPPA SYMBOL

U+03BA : GREEK SMALL LETTER KAPPA
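To convince myself of how the rule plays out, here is a minimal Python sketch of the comparison as I read the quoted text. The helper names `name_key` and `names_clash` are my own invention, and treating a "medial" hyphen as one with a letter or digit on both sides within a word is an assumption on my part, not wording from the standard.

```python
import re

# Words the quoted rule says to ignore when comparing names.
IGNORED_WORDS = {"LETTER", "CHARACTER", "DIGIT"}

# Assumption: a "medial" hyphen has a letter or digit on both sides.
MEDIAL_HYPHEN = re.compile(r"(?<=[A-Z0-9])-(?=[A-Z0-9])")


def name_key(name):
    """Reduce a character name to the form used for the uniqueness check."""
    # Splitting on whitespace and rejoining without it ignores SPACE.
    kept = [w for w in name.upper().split() if w not in IGNORED_WORDS]
    return "".join(MEDIAL_HYPHEN.sub("", w) for w in kept)


def names_clash(a, b):
    """True if the two names would NOT count as unique and distinct."""
    return name_key(a) == name_key(b)


print(names_clash("MANICHAEAN CHARACTER A", "MANICHAEAN LETTER A"))
print(names_clash("GREEK KAPPA SYMBOL", "GREEK SMALL LETTER KAPPA"))
```

The two Manichaean names reduce to the same key, so they clash. The two kappa names do not, because SYMBOL and SMALL both survive the reduction.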

While Andrew has tallied up the number of characters in Unicode in How many Unicode characters are there?, I have entertained myself with another of my trivial tasks.

These little trivia games I play sometimes are simply to familiarize myself with a script or a technical detail and entertain myself at the same time. Many have no point at all. Neither does this. It is a tally of the names of characters used in Unicode, and it gave me a happy half-hour of playing with BabelMap.

Character Names by Block for a few representative blocks.

Arabic Letter
Latin Letter
Bengali Letter
Bopomofo Letter
Braille Pattern Dots
Cherokee Letter
CJK Unified Ideograph
Cypriot Syllable
Deseret Letter
Devanagari Letter
Ethiopic Syllable
Hangul Choseong
Hangul Syllable
Hiragana Letter
Katakana Letter
Linear B Ideogram
Canadian Syllabics
Linear B Syllable

This is just to condition myself so that in the middle of discovering Katakana at some future date I don't do a double take when I discover that they are letters and not syllables. Ethiopic, Cypriot and Hangul have syllables but Cherokee and Katakana do not. The name for Canadian Syllabics seems to feature the name of the block. Surely the character itself is a 'syllabic', while the system is 'syllabics'. I have to think about this too.
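I did my tally by hand in BabelMap, but the same kind of count can be sketched in Python with the standard `unicodedata` module. The function name `tally_name_prefixes` and the sample ranges are my own choices for illustration. They cover only part of each block, and the names returned depend on the Unicode version that ships with your Python.

```python
import unicodedata
from collections import Counter


def tally_name_prefixes(first, last, words=2):
    """Count the leading `words` words of each character name in a range."""
    counts = Counter()
    for cp in range(first, last + 1):
        # unicodedata.name returns the default ("") for unnamed code points.
        name = unicodedata.name(chr(cp), "")
        if name:
            counts[" ".join(name.split()[:words])] += 1
    return counts


# Sample ranges picked for illustration, not complete blocks.
samples = {
    "Cherokee": (0x13A0, 0x13F4),
    "Katakana": (0x30A1, 0x30FA),
    "Ethiopic": (0x1200, 0x1248),
    "Canadian Syllabics": (0x1401, 0x166C),
}

for label, (first, last) in samples.items():
    prefix, count = tally_name_prefixes(first, last).most_common(1)[0]
    print(f"{label}: {prefix} ({count} names)")
```

Knowing the prefix in advance is what makes a search by name work, which is the whole point of the exercise.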

However, there they are and I am taking a step towards becoming familiar with these names. It helps if you want to search for a character by name to know the name. I also explored many of the features of BabelPad described in this post.

I look forward to hearing more about Phags-pa some day.
