Friday, September 30, 2005

Square Scripts

Phags-pa font from BabelStone.

Several Eastern scripts are called square scripts: Hanzi, Hangul, Phags-pa and Hebrew. It is clear that Hangul and Japanese were made to match Chinese Han characters. Phags-pa also developed in the context of the Han characters. However, the Hebrew script is also called square script.

Monotype Imaging gives these descriptions of Hanzi and Hangul.

"Regardless of the number of constituent strokes, each character is drawn within the confines of an invisible square frame."

"Graphically, the syllables all fit within the invisible square frame used by Chinese characters. Each syllable is made by stacking all of its component parts, in a predictable sequence, into a square configuration. This characteristic has made it quite simple to visually mix Hangul and Hanja into the same text."

This reference from BabelStone indicates that Phags-pa was called "dörbelǰin üsüg", square script, in Mongolian and sometimes 'quadratic script' in English.
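The Hangul description quoted above, where each syllable is stacked from its component parts, can be seen directly in Unicode. This is only a small sketch in Python using the standard unicodedata module; the sample syllable 한 (U+D55C) and the function name hangul_parts are my own choices for illustration, not anything from the Monotype page.

```python
import unicodedata

def hangul_parts(syllable: str) -> list[str]:
    """Split one precomposed Hangul syllable into its component jamo (NFD)."""
    return list(unicodedata.normalize("NFD", syllable))

han = "\ud55c"  # the syllable block 'han', chosen only as an example
parts = hangul_parts(han)
for jamo in parts:
    print(f"U+{ord(jamo):04X} {unicodedata.name(jamo)}")

# Recomposing the parts gives back the single square syllable block.
print(unicodedata.normalize("NFC", "".join(parts)) == han)
```

The decomposition always comes out in the same order (initial consonant, vowel, optional final consonant), which is the "predictable sequence" the quotation mentions.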

Hebrew square script, ketab merubba`, can be seen here.

I noticed in passing that Phags-pa letters are called bāsībā zì 八思巴字 "'Phags-pa letters" in modern Chinese. First, the Chinese characters are used phonetically here to write Phags-pa, and second the word zi 字 is used to refer to alphabet letters, much as we would use alphabet to talk about any writing system. BabelStone.


This post has been modified to add "dörbelǰin üsüg", Mongolian for 'square script'. On BabelStone's page the j with caron did not display correctly for me and I saw an empty box. However, I have defined my font as Lucida Sans Unicode and the j with caron displays correctly in my preview window.
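When a letter shows up as an empty box, it can help to check what the character actually is before blaming the page. Here is a minimal sketch in Python, assuming the letter in "dörbelǰin" is the single code point U+01F0; that is my reading of the text, so treat it as an assumption.

```python
import unicodedata

j_caron = "\u01f0"  # assumed to be the letter used in the Mongolian term
print(f"U+{ord(j_caron):04X}", unicodedata.name(j_caron))

# The same letter can also be stored as a plain j followed by a combining caron.
decomposed = unicodedata.normalize("NFD", j_caron)
print([f"U+{ord(c):04X}" for c in decomposed])
```

Whether either form displays still depends on the font having a glyph for it, which the code cannot tell you.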

The next lesson I need to learn is how to close the font designation as it is skewing other font display throughout my blog. Quite the challenge. I am working on it. If you ever see empty boxes in my blog, other than those I acknowledge, please let me know and tell me what OS and browser you are using.

Addendum 2:

I have continued playing with the font on this page so that I can display the Mongolian term for square script properly without changing the font for the whole page.

Hôt Cuisine

I drove by a restaurant with this name yesterday and read the sign Hôt Cuisine as Haute Cuisine, and thought about how different Hot Cuisine sounds from Hôt Cuisine. When I googled the restaurant, it was listed as Hot Cuisine on the internet, and it serves spicy Eastern dishes. Now I am puzzled. I thought surely this was a play on Haute Cuisine (and it probably is), but it works better with the menu if you say Hot Cuisine. Hmm. On the other hand, maybe it isn't related to French at all and comes from some other language.

Now I really don't know how to pronounce it. I shall have to set up an experiment and say to a friend, "Have you ever eaten at that restaurant at such and such a corner that serves Malaysian food and so on, what is its name, do you remember?"

PS I am really losing it. Yesterday's post was supposed to be titled 'Peter Boodberg'. There is nothing else for it. I will have to read the article again, find a new quote, short and sweet this time, and post it with a link to yesterday's post. What a mess.

Thursday, September 29, 2005

Pinyin Accents


I shall have far more respect for accents now that I find that I can display them properly at least some of the time.

Mark from Pinyin News has sent me an article by Peter Boodberg. I have been absorbed in trying to understand at least some of it. His language is wonderful. He uses terms like 'the relation of graph to vocable,' and 'the living tissue of the Word' and 'graphic integument.'

If you read nothing else please read the last paragraph cited below. This is part of a classic debate that can be pursued further on the many pages of If I am not here I am probably over there. Reading.

"The investigation of the corner-stone problem of Chinese epigraphy, the relation of graph to vocable, has indeed been rather retarded than advanced by the new finds. Most students in the field have chosen to concentrate their efforts on the exotically fascinating questions of ‘graphic semantics’ and the study of the living tissue of the Word has almost completely been neglected in favour of that of the graphic integument encasing it.

It is in the hope of dispelling this fog of misunderstanding that the writer presents in the following pages for the consideration of Sinologists a few hypotheses on the evolution of ‘sound and symbol’ in archaic Chinese, hypotheses that have in view the preparation of the ground for the discussion of this all-important problem.

Pictograms [graphic representations of natural objects] and symbolic signs do not constitute in themselves Graphs, i.e. elements of a written language. In order to become such, they must be conventionally and habitually associated with certain semantic-phonetic values.

Apart from a few exceptional cases, then, ‘ideographic’ characters as a class, we make bold to assert, simply do not exist. Those characters which appear to be such in the later forms of the script are predominantly ‘learned’ creations of Chinese schoolmen, graphical modifications of either original pictograms and symbols or perverse rationalizations of ‘organically’ developed phonetic compounds."

Some Proleptical Remarks on the Evolution of Archaic Chinese. Peter A. Boodberg. Harvard Journal of Asiatic Studies vol.2, No. 3/4 (Dec. 1937) 329-372

Henrik Theiling's Script Teacher

Here is an application to help you learn the Latin alphabet equivalent for certain scripts. I don't mind showing the world how much Bopomofo I know - absolutely none. But that may change now that I have this website added to my favourites.

Here is Henrik's announcement in qalam,

"Currently, you can learn Hiragana, Katakana, Hangul, Bopomofo, Kangxi-Radicals, Hebrew and Armenian.

More scripts will follow. Besides natscripts, there is a section with conscripts currently featuring Kamakawi's script Kavaka i Oala.

Any comments or requests are very welcome!"


Natscripts!!! I love that. Thank you, Henrik.

Wednesday, September 28, 2005

September 26

Okay, I missed it but here is one more beautiful poster of scripts.


There are two particular glyphs that I have been thinking about for a long time.

(circumflex removed because it does not display)

U+5341 : CJK UNIFIED IDEOGRAPH-5341 : shí

In spite of appearances this has absolutely nothing to do with Unicode and display issues. The first is a character not an empty box; it is a Han character. These characters have the pronunciation of kou3 (meaning mouth) and shi2 (meaning ten). They are not soundless symbols representing ideas.

(In the preview window it becomes immediately apparent that 口kou3, above, is not an empty box since I have enlarged it by one size, but in my compose window it is still indistinguishable from an empty box. So that is a minor display issue with a happy ending.)
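One way to settle whether something is a Han character or an empty box is to ask for its code point. A small sketch in Python follows; I am assuming the mouth character is U+53E3 and using U+25A1 as a stand-in for a box-shaped symbol, and the helper name describe is just for illustration.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"

# kou3 (mouth), shi2 (ten), and a box-shaped symbol for comparison
for ch in ["\u53e3", "\u5341", "\u25a1"]:
    print(ch, describe(ch))
```

The first two report themselves as CJK unified ideographs; the third is a geometric shape that only looks similar.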

I hope that representing them to myself by sound, as well as meaning, will help me to think about these characters more efficiently. That is, I will be able to think of kou3, rather than about 'the character that means mouth'. Basically, I don't want to think about their meaning but about their shapes as glyphs. I would rather not refer to their meaning every time I mention them. But they have to have an associated sound or they can't be read and discussed. Period.

If I want to think about the letter 'a' I can do so, without thinking about an 'alligator'- if I am over five years old, of course. Well, the same for Chinese. This is about kou3 and shi2. That's it.

While I am building for myself a way of thinking about scripts, it seems worth reminding myself that Chinese characters can be called 漢字 Hanzi. 漢 Han for Han Chinese, I won't go further with that one, and 字 zi4 meaning letter/symbol/character/word.
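Since kou3, shi2 and zi4 are written here with tone numbers, this is a rough sketch in Python of how numbered pinyin might be turned into accented pinyin. It is simplified on purpose: one syllable at a time, no ü, and the common placement rule (a or e takes the mark, in "ou" the o does, otherwise the last vowel). The function name is made up for this example.

```python
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}

def numbered_to_accented(syllable: str) -> str:
    """Convert one syllable such as 'kou3' to its accented form (simplified)."""
    if not syllable or syllable[-1] not in TONE_MARKS:
        return syllable  # no tone digit 1-4: leave unchanged
    letters, mark = syllable[:-1], TONE_MARKS[syllable[-1]]
    lower = letters.lower()
    for vowel in ("a", "e"):
        if vowel in lower:
            pos = lower.index(vowel)
            break
    else:
        if "ou" in lower:
            pos = lower.index("ou")
        else:
            vowel_positions = [i for i, c in enumerate(lower) if c in "iou"]
            if not vowel_positions:
                return letters  # nothing to carry the mark
            pos = vowel_positions[-1]
    # Put the combining mark after the chosen vowel, then compose (NFC).
    marked = letters[: pos + 1] + mark + letters[pos + 1 :]
    return unicodedata.normalize("NFC", marked)

for s in ["kou3", "shi2", "zi4"]:
    print(s, "->", numbered_to_accented(s))
```

The NFC step matters for display: it replaces the letter plus combining mark with a single precomposed letter where one exists.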

I have just finished picking up a few ideas after reading some articles on reading in Chinese and I am testing out the notion that in reading, the first and strongest association is between graph and a unit of sound, (even if the sound is not immutable). Maybe this will help me remember these letter/symbol/character/word, these zi4. I am retraining my neural pathways.

I am not going to digress into reading theory; I will return to the shapes of these glyphs and how the notion of a square frame and a quadrant have influenced writing over the centuries, here and there. This is not some great theory of universals, just a collection of details, as I find them.

Addendum: I am getting some discrepancy between the posted display and the preview window. I will try not to be distracted by this.

Tuesday, September 27, 2005

Father and Son

I started a free translation of the passage in the preceding post using what might be called dynamic equivalence or a cognitive translation. However, I found that as the references became metaphorical it was hard to maintain the contemporary style.

This is Haimon talking to his father, Creon, in Antigone by Sophocles. Some things never change.

"You think you are right all the time,
That you are more articulate and more intelligent than everyone else,
But in fact you are pretty shallow,
If you were really smart you would be willing to learn more,
And not care so much about holding a tight rein all the time."

Here is a 1954 translation by Elizabeth Wyckoff.

"Whoever thinks that he alone is wise
His eloquence, his mind, above the rest,
Come the unfolding, shows his emptiness.
A man, though wise, should never be ashamed
Of learning more, and must unbend his mind.
Have you not seen the trees beside the torrent,
The ones that bend them saving every leaf,
While the resistant perish root and branch.
And so the ship that will not slacken sail,
The sheet drawn tight, unyielding, overturns,
She ends the voyage with the keel on top."

Antigone translated by Elizabeth Wyckoff, in Sophocles I, ed. David Grene and Richmond Lattimore, U. of Chicago Press. 1954.

Free translation, appearing first, produced with the help of Liddell and Scott, 1875, a tattered old copy of an edition intended for use in American schools "in preparation for college."

Polytonic Greek Fonts

Here are the next three lines of the passage from Antigone by Sophocles with no defined font.

ὁρᾷς παρὰ ῥείθροισι χειμάρροις ὅσα
δένδρων ὑπείκει, κλῶνας ὡς ἐκσῴζεται,
τὰ δ᾽ ἀντιτείνοντ᾽ αὐτόπρεμν᾽ ἀπόλλυται.

Tahoma is the defined font:

ὁρᾷς παρὰ ῥείθροισι χειμάρροις ὅσα
δένδρων ὑπείκει, κλῶνας ὡς ἐκσῴζεται,
τὰ δ᾽ ἀντιτείνοντ᾽ αὐτόπρεμν᾽ ἀπόλλυται.

Microsoft Sans Serif is the defined font:

ὁρᾷς παρὰ ῥείθροισι χειμάρροις ὅσα
δένδρων ὑπείκει, κλῶνας ὡς ἐκσῴζεται,
τὰ δ᾽ ἀντιτείνοντ᾽ αὐτόπρεμν᾽ ἀπόλλυται.

Palatino Linotype is the defined font:

ὁρᾷς παρὰ ῥείθροισι χειμάρροις ὅσα
δένδρων ὑπείκει, κλῶνας ὡς ἐκσῴζεται,
τὰ δ᾽ ἀντιτείνοντ᾽ αὐτόπρεμν᾽ ἀπόλλυται.

The first section with no defined font does not display properly in my browser, IE, but the rest do. At work my post from yesterday did not display properly although it was in IE on WinXP. I believe the full WinXP was not installed since it is also missing the character map. I cannot go to our tech support at work and complain that my browser does not display classical Greek. Too bad.

A translation of this passage from Antigone by Sophocles will be in my next post.

Addendum: The translation is in the next post, titled Father and Son.

Monday, September 26, 2005

Polytonic Greek

ὅστις γὰρ αὐτὸς ἢ φρονεῖν μόνος δοκεῖ,
ἢ γλῶσσαν, ἣν οὐκ ἄλλος, ἢ ψυχὴν ἔχειν,
οὗτοι διαπτυχθέντες ὤφθησαν κενοί
ἀλλ᾽ ἄνδρα, κεἴ τις ᾖ σοφός, τὸ μανθάνειν
πόλλ᾽, αἰσχρὸν οὐδὲν καὶ τὸ μὴ τείνειν ἄγαν.

I installed my Polytonic Greek keyboard tonight. This is the first keyboard and language support that I have installed on this computer. I don't have too much trouble using it. There are dead keys for the accents and spirits (breathing marks) - more about this keyboard later. My first objective was to make sure that it would display properly.

I went to the Better Bibles Blog and read very carefully. The first rule is that one must use precomposed Greek glyphs for accented and marked vowels. In order to do this you have to use the fonts that have them - Palatino Linotype is a common one but Tahoma is a bit more modern looking, I guess. These fonts are already in WinXP and I hope most people have them.
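To illustrate what "precomposed" means here, the following is a minimal sketch in Python, not a description of how any keyboard or blog actually does it. It takes an omicron followed by a combining rough breathing (two code points) and normalizes it to the single precomposed code point; the function name to_precomposed is mine.

```python
import unicodedata

def to_precomposed(text: str) -> str:
    """Normalize to NFC so letter + combining marks become one code point where possible."""
    return unicodedata.normalize("NFC", text)

decomposed = "\u03bf\u0314"  # omicron + combining rough breathing
composed = to_precomposed(decomposed)
print(len(decomposed), "code points ->", len(composed), "code point")
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```

A font still needs a glyph for the precomposed code point, which is why the choice of font matters as much as the normalization.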

So Polytonic Greek text must be inputted in Palatino Linotype, Tahoma or Microsoft Sans Serif. I settled on Microsoft Sans Serif. Now blogger doesn't offer any of these fonts as an option, but if the Better Bibles Blog can display Polytonic Greek then I guess I can. So I dragged out an old HTML guide and looked up fonts. I found that fonts can be described thus. Well, I can't do it, can I? But anyway, I did type the HTML for the Microsoft Sans Serif font right into the compose window, with the correct code for defining a font, and then tried out the other fonts as well. It worked.
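The markup itself cannot be shown as plain text in a post without escaping it, which is probably why it vanished above. As a hedged sketch in Python: the helper below builds an old-style font element (my guess at what an old HTML guide would suggest; the element is obsolete in current HTML and a styled span is the usual replacement) and then escapes it so the tags can be displayed rather than interpreted. The name in_font is hypothetical.

```python
from html import escape

def in_font(text: str, face: str) -> str:
    """Wrap text in a font element that is both opened and closed."""
    return f'<font face="{escape(face)}">{escape(text)}</font>'

snippet = in_font("\u1f41\u03c1\u1fb7\u03c2", "Microsoft Sans Serif")
print(snippet)          # markup to paste into the post
print(escape(snippet))  # escaped form, for showing the markup itself as text
```

Generating the opening and closing tags together is one way to avoid an unclosed font designation spilling over the rest of the page.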

I have posted a few lines from one of my favourite plays. Somehow the dialogue in this play always sounds as if it was written yesterday, especially when the son says to his father "You wish to speak but you never wish to hear!" Some things never change. I don't have a very philosophical taste in Greek literature - give me a good family quarrel. I don't really like David Grene's translation of the section I have posted - maybe someone will give me a better one.

I really have learned something today that I thought I never would. Thanks, Simon, for your comment. Now I will check out your website.

Addendum: The topic of this post continues in Polytonic Greek Fonts and Father and Son.

Sunday, September 25, 2005


It's neat to see that Roman Numerals have their own codepoints.
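For the curious, here is a short Python sketch that lists the first twelve of those codepoints with their names and numeric values, using only the standard unicodedata module. The range U+2160 to U+216B is the run from ROMAN NUMERAL ONE to ROMAN NUMERAL TWELVE.

```python
import unicodedata

for cp in range(0x2160, 0x216C):  # ROMAN NUMERAL ONE .. ROMAN NUMERAL TWELVE
    ch = chr(cp)
    print(f"U+{cp:04X} {ch} {unicodedata.name(ch)} = {unicodedata.numeric(ch):g}")

# Compatibility normalization maps the single character back to ordinary letters.
print(unicodedata.normalize("NFKC", "\u2163"))
```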

I have to post this before I forget it. I have no story prepared to go with this post. It is about wanting to find out what script you are looking at even if you are in an internet café or visiting friends and relatives, and not on your own computer, or don't even own your own computer - maybe you are at work and just have to know that character.

Go to this page Decimal, Hexadecimal Character Codes and paste in the character, then convert it to a hexadecimal codepoint. Then type the hexadecimal codepoint into this page The Unicode Character Code Charts By Script. This page will guide you to the appropriate code chart and you can then discover what you were looking at. In my case I was staring at a blank space. I found that U+0020 was a SPACE character and that there were several other space characters. Imagine different shapes and sizes of blank space. (Actually I do know what some of them do.)
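The same lookup can be done offline. This is a sketch in Python, not a replacement for the pages linked above: it converts a pasted character to its hexadecimal codepoint, and then lists every character that Python's copy of the Unicode data classifies as a space separator (category Zs). The helper name to_hex is my own.

```python
import sys
import unicodedata

def to_hex(ch: str) -> str:
    """Format a character's code point the way the code charts write it."""
    return f"U+{ord(ch):04X}"

print(to_hex(" "), unicodedata.name(" "))

# Every code point in general category Zs (space separator).
spaces = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]
for ch in spaces:
    print(to_hex(ch), unicodedata.name(ch))
```

The exact list depends on the Unicode version bundled with the Python in use, so the count may differ from machine to machine.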

3rdpageSearch makes these nifty multilingual input and search tools. Great! These pages offer clickable input for the entire Unicode range.

On the other hand don't open these pages at work - they are way too much fun, and these guys/girls are really into soccer/football!

Addenda: I have added an image since it was part of how I wrote the original post but blogger would not accept images at the time that I was posting.

Character Matrix

When you find something of striking beauty you know it is meant to be shared. Here is a guide to the Character Matrix and the homepage of Brill Academic Publishers.

Saturday, September 24, 2005

Finding Syriac

More from the Wikipedia language page. Or further adventures of Suzanne in Wikipedialand.

These are, in fact, the reconstituted pieces of what was intended to be my initial post about BabelMap. I confidently perused the list of languages in Wikipedia and identified one which I was interested in - I thought it might be in Syriac, another script found on Quanzhou tombstones.

However, I wanted to confirm that this was indeed Syriac. The language code is 'arc' and the language is labeled in English as Aramaic.

I copied the language name, which I supposed to be Aramaic in the Syriac script, into BabelMap's main page edit buffer. I then clicked the mode button to convert the characters to NCR hex (numeric character reference, hexadecimal) and the edit buffer displayed the codes.
#x0715 #x0725 #x0712 #x072A #x0738 #x071D #x071B #x0020 (I had to remove the semi-colons to make sure that the codes would display here rather than the characters themselves.)

I then typed the first code number '0715' into the Go to Code Point window and there it was U+0715 Syriac letter daleth. It was definitely Syriac script.
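The conversion to NCR hex and the name lookup can be sketched in a few lines of Python. This is not how BabelMap works internally, only an illustration of the same two steps; the function names are hypothetical, and the list of codes is copied from above in the form it was posted, without ampersands or semicolons.

```python
import re
import unicodedata

def to_ncr_hex(text: str) -> str:
    """Write each character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):04X};" for ch in text)

def from_ncr_hex(ncr: str) -> str:
    """Read back hex references; the & and ; are optional so the posted list also parses."""
    codes = re.findall(r"#x([0-9A-Fa-f]+)", ncr)
    return "".join(chr(int(code, 16)) for code in codes)

posted = "#x0715 #x0725 #x0712 #x072A #x0738 #x071D #x071B #x0020"
word = from_ncr_hex(posted)
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The names printed are the formal Unicode character names, which may be spelled a little differently from the transliteration used in this post.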

So far no bells went off. However, I persisted and managed to transliterate this name as 'daleth e beth rish zlama yudh teth'. 'debrit' hmm. Maybe 'devrit'. This did not look good. I could not post this as Aramaic without further investigation.

However, Wikipedia is bountiful, pretty amazing actually, and I was able to identify Aramaya and Syriac below. By cross-checking with the codepoints I feel confident that these are accurate.

Aramaya ܐܪܡܝܐ
Suryaya ܣܘܪܝܝܐ

I confirmed that Aramaic is indeed called Aramaya here and here,

"Greeks had called Aramaic by a word they coined, 'Syriac', and this artificial term was used in the West, but never in the East, where it has always been known by its own name 'Lishana Aramaya' (the Aramaic language.)" Paul Younan

So what does the Wikipedia page say? By reading the discussion page I find that Gareth Hughes has made a valiant attempt to figure out what the language name says. He suggests that it must be 'in Hebrew' misspelled. Gareth is one of the authors of the Syriac alphabet page.

D'ivrit? ܕܥܒܪܸܝܛ
‘Ivrit עברית That is Hebrew alright.

The language name in Syriac script does indeed look like a transliteration of the word Hebrew. So is Aramaic really Hebrew written in Syriac script? With respectful reference to the webpages cited above, I do believe that Aramaic is called Aramaya ܐܪܡܝܐ and that is what the Wikipedia language list ought to record.

However, some other unidentified correspondent on the discussion page still labours under the misconception that it already does say Aramaic.

I originally thought that this would be an easy post leading back to ancient Chinese monuments but they will have to wait. For now, I am relieved that I did not simply copy and paste an unidentified word from Wikipedia into my blog without cross-checking in BabelMap. Thank you BabelMap.

If anyone has another explanation for what appears in Syriac script in the Wikipedia language list, I would be happy to hear it.

It also occurs to me that I now know how to find the author of a Wiki page and I could check that first and occasionally use a Wiki page with discretion.

Addendum: Authors' names are few and far between in Wikipedia - I just hit it lucky with Gareth.

Friday, September 23, 2005

James Evans

Our garden near the Hudson’s Bay
Produced much more toil than pay
Potatoes thrive if they don’t freeze
And sometimes grow as large as peas

In their Own Voices

Here are a few little known details about James Evans. He was born in England where he apprenticed to a grocer, learning British shorthand at that time, circa 1820, well before Pitman's shorthand system was published in 1837.

He came to Canada at the age of 21 because his family had recently moved here. He became a school teacher, then married, and was later converted to Methodism. In 1833 he was ordained in New York City as a Wesleyan Methodist minister and, after that, became a minister to an Ojibwe congregation in Ontario, where he worked with Peter Jones.

It was after his ordination that the Wesleyan Church established a Canadian conference, and in 1840 the Hudson's Bay Company agreed to allow Wesleyan ministers in their territory.

"In 1838, the Canada Conference sent him on a tour of the north shore of Lake Superior. In 1839 he met Governor George Simpson of the Hudson's Bay Company, who in January 1840 agreed to support Methodist missionaries, named by the Wesleyan Methodist Missionary Society in Britain, in its territory." Victoria University, Toronto

James Evans then moved north to Norway House, Manitoba, where he implemented his syllabic writing system.

More here:

Dictionary of Canadian Biography online
Manitoba Historical Society

Thursday, September 22, 2005

Is Syllabics an Abugida?

In early August I noticed this new webpage at Chris Harvey’s Native Language, Font, and Keyboard Page –

Chris begins,

"I have seen, with increasing frequency online, statements proclaiming that Canadian Syllabics (often under the name “Cree Script”) is an Abugida. I would like to list some reasons for why this may not be an accurate description of the writing system."

Here is the offending statement from Wikipedia,

"Canadian syllabic writing schemes are for the most part abugidas, where consonants are always marked in a manner which implies a specific vowel."

After a lengthy discussion on the topic, where he argues passionately that Syllabics is not an abugida, Chris concludes,

"I would appreciate comments from others interested in this subject."

Feeling his pain, I promptly emailed him the following information, which he has posted. I add my further reflections in brackets.

"Peter T. Daniels, who invented the term abugida, calls Cree a ‘sophisticated grammatogeny’, certainly not an abugida. (I asked him this question myself!)

W. Bright and Robert Bringhurst have labeled syllabics an ‘alphasyllabary’.

(I made an error here - it should read 'RB has labeled syllabics an "alphasyllabary", a term coined by WB.')

John Nichols has called it both ‘syllabics’ and a ‘mixed alphabet and syllabary’. (cautious man)

James Fevrier and Marcel Cohen, writing in French, developed the idea of the ‘neosyllabary’, or ‘secondary syllabary’. They also use the term 'alphabet–syllabaire'. (Very interesting books by these authors)

Henry Rogers, in a new book this spring on writing systems, is calling syllabics ‘moraic’. (not good)

I have personally tried out the term ‘compositional syllabic notation’, a term I picked up from an Indic writing systems group." (how about 'systematically composed syllabary'?)

Chris replied to my email,

"This is of course the great danger of Wikis and such. .... It ends up being like an urban legend, that spreads quietly until it becomes common knowledge. When I have free time, I think I'll rewrite the Wiki page. "

Chris is a very busy person and may not have time to rewrite the wiki page. Aside from the use of the term 'abugida', there are many other peculiar details on this page. The Pitman shorthand reference here is another urban legend. Sigh.

Addendum: I just remembered that Unicode version 4 labels Syllabics a 'featural syllabary.' I have no idea what the logic or justification is for this term, nor have I ever heard it used before.

Addendum 2: I then asked the question about 'featural' in qalam and got this reply from Michael Everson.

"I said: "If you think about it you might suppose that it must have been because someone thought that regular rotations and superscription of base characters was a regular way of indicating relationships."

This is just me trying to interpret what "featural" might mean if applied to Syllabics. I did not apply the term to it. I don't think it is a particularly useful term with regard to the taxonomy of writing systems."

Well, there it is - maybe definitions are best left alone.

Wednesday, September 21, 2005

Tamil from Tranquebar

I was talking about early use of the printing press in India recently and how Tamil, recently recognized as a classical language, was the first script of India to be used in print. Here is Genesis published in Tranquebar in 1723. "The translation was probably commissioned by the Danish State Church." Wikipedia

"This New Testament in Tamil was the first to be printed in any of the languages of India. It was translated by Bartholomäus Ziegenbalg (1682-1719) and the text was revised with the help of Johann Ernst Gründler (1677-1720). .... By the summer of 1714, the Gospels had been completed but the large typeface that had been used meant that the supply of suitable printing paper was running low. A new, smaller typeface was cast and the complete New Testament, which was dedicated to Frederick IV, was issued in 1715." From "The Missionary Bible"

While Tamil was printed by the Danish as early as 1715, I have read elsewhere that the Portuguese Jesuits had a printing press for Tamil as early as 1578.

"TAMIL types had been used to print Doctrina Christam in Coolegio do Saluador at Cochin in 1578. Some years earlier in Lisbon, a Cartilha, or Christian Catichism, had been translitereated and printed in 1554. Those are known facts." Early Madras-Printed Tamil Books

Further discussion is found here.

"Though the Jesuits began to set up printing presses in several parts of Portuguese held India, trying their hands, with varying degrees of success, on the Kannada and Devanagari scripts, they did not succeed in establishing the idea of printing firmly on the subcontinent and toward the middle of 17th century all their efforts came to an end. Fifty years later in 1711, Bartholomaus Zieganbalg persuaded the society for promoting Christian knowledge in London to send a further Portuguese printing press to India and soon afterwards he was able to obtain a set of ' Malabari ' letters from Germany. From then on printing seems to have progressed steadily in India."

"European Missionaries and the Study of Dravidian Languages" (Notes on some books and manuscripts held in British Museum) Albertine Gaur, Assistant Keeper, Department of Oriental Printed Books and Manuscripts, British Museum, London, UK. N.B. This webpage is published by the Tamil Heritage Foundation.

I find it interesting to note that in 1723 there were no word divisions in the printed Tamil text. It would be interesting to find out who introduced that convention to Indic scripts. Thai is still written without word divisions.

Caveat: I have recently used Wikipedia as a source of images in the public domain and as examples of multilingual electronic text. However, any discussion of writing systems in Wikipedia is so sprinkled with imprecision (to put it politely) that I would never recommend it as an academic reference.

Tuesday, September 20, 2005

"Insert in Plain English"

Romanized keyboards are not always unwelcome. View this Tamil search engine which instructs the user to "Please insert/type your phrase in plain English."

I tried 'Microsoft' since it is a word I can easily recognize. மைக்ரோசாப்ட் Now how shall I write this in 'plain English'?



So why is roman or Latin alphabet input so popular? I was thinking about it and said to myself "I bet I can think of a dozen reasons offhand." Well, I am now going to respond to my own challenge.

You would rather use English QWERTY input on the keyboard for your own non-roman script if:

1. You learned to keyboard first in English.
2. You learned to keyboard before your script was encoded on the computer, so you used a transliteration instead of your own script.
3. There is no standard keyboard layout for your script.
4. It is cheaper and easier to buy a QWERTY keyboard.
5. The keyboard layout for your own script has changed with the new encoding.
6. You don't like using the shift key.
7. You can keyboard many different scripts with one keyboard layout, if you keyboard all the scripts by their romanization.
8. You find that there is more content to google in English.
9. You need English anyway for your job.
10. All other keyboard layouts for your own script are really awkward.
11. You travel a lot and use internet cafés.
12. You need to actually see the letters on the keys to type and you don't have a customized keyboard.

Um, not all completely different, but - how am I doing?

I have no problem with QWERTY input for any script. However, I don't think it is safe to assume that QWERTY should be considered universal input and acceptable as the only form of input for a non-alphabetic writing system.

The Cherokee keyboard

Paul sent me this screen shot of the Cherokee keyboard for Macs. (As always you must click on the image to enlarge it.) Here is his description.

"Here's how the built-in OS X Cherokee keyboard works: all four rows of the keyboard (including the number keys) type Cherokee characters, and holding shift gives you a different set of Cherokee characters. I counted 86 glyphs all together. If you need to type a numeral, Command-number key gives you the corresponding numeral (instead of a Cherokee character)."

I am very glad to have this image since I couldn't view the Mac keyboard otherwise. Here is a previous email which I wasn't able to follow up.

"Apple ships my Syllabic and Cherokee syllabic keyboard layouts. QWERTY and otherwise."
Michael Everson * * Everson Typography * *

So the 86 symbols of Cherokee fit on the keyboard and are input as syllables, one keystroke for each syllabic character. That is a glyph-based syllabic keyboard. Canadian Aboriginal Syllabics also has syllabic keyboards, but with fewer characters that is not so difficult.

The full Cherokee keyboard layout can be viewed here.

Monday, September 19, 2005

Bangla in BabelMap

Last week I was using the Windows Character Map to find all the Indic scripts bundled with Windows XP by selecting any font that sounded Indic and trying it out. However, I missed one: the Vrinda font for Bengali.

I found Vrinda with BabelMap, a Unicode character map for Windows. While BabelMap's homepage displays the main window of this application, I have chosen a screenshot of the font analysis utility because, ahem, it took me a few tries to find this. (It is in the Tools menu, for anyone like me who doesn't know to open the Tools menu first off.)

Click on this image to enlarge it and look in the checked dialog box for "List all fonts that cover this unicode block" (a Unicode block is usually a writing system). I tried Bengali right away because the blocks are listed in alphabetical order, and found that Vrinda was the font for that block; the Bengali characters displayed immediately in the sample text box.

By cross-checking with Alan Wood's Unicode Resources page for Bengali here I could see that I had complete Bengali support. Out of curiosity I went back to the main page where I selected the Vrinda font in the bottom right hand corner and selected Bengali under Unicode Block on the left above the edit buffer. The entire Bengali block appeared in the grid and I was able to see that the empty boxes represented 'reserved' spaces and not missing characters.

Indic scripts are quite intricate so I found the magnified character which appears on the right click a very attractive feature. Here is Bangla in BabelMap, main window, with U+09B2 : BENGALI LETTER LA magnified.
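The difference between a 'reserved' space and a missing character can also be checked without a font at all. Here is a minimal Python sketch, assuming the Bengali block runs from U+0980 to U+09FF; it asks Python's Unicode data which positions in the block are unassigned. It says nothing about what Vrinda itself contains, and the helper name is_reserved is made up.

```python
import unicodedata

BENGALI = range(0x0980, 0x0A00)  # the Bengali block, U+0980..U+09FF

def is_reserved(cp: int) -> bool:
    """True if the code point is unassigned in this Python's Unicode data."""
    return unicodedata.category(chr(cp)) == "Cn"

assigned = [cp for cp in BENGALI if not is_reserved(cp)]
reserved = [cp for cp in BENGALI if is_reserved(cp)]
print(len(assigned), "assigned;", len(reserved), "reserved")
print("U+09B2", unicodedata.name("\u09b2"))
```

An empty box at a reserved position is expected; an empty box at an assigned position means the font lacks that glyph.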

I was able to identify a few more writing systems and their fonts in BabelMap also but this is enough for tonight.

It wouldn't have been so difficult but I am working on a new computer which still lacks my favourite image editing software so these screenshots were produced using Paint. It worked - just took a little longer than I thought.

Addendum: Not the fault of Paint. It took me a while to find the resize button. Don't forget I am the same person who didn't know to open the Tools menu without prompting.

Sunday, September 18, 2005


On the Unicode Mail Archive today there was a post on dead keys.

From: Jukka K. Korpela
Date: Sun Sep 18 2005 - 01:58:43 CDT

"Dead keys are an important practical problem. People have difficulties in learning to use them. People may have used computers for many, many years without ever realizing how they can use dead keys to type letters with diacritic marks. They have just wondered why typing "~" or "^" behaves somewhat oddly, in a delayed manner. "

There was a very interesting and informative thread on dead keys on the A12n-collaboration list last July, which I passed on at the time, but it is definitely worth reading the pros and cons presented there. Actually this is a post that has been sitting on the back burner since July and I am just getting around to it.

Subject: [A12n-Collab] Key order in combining diacritics (Re: Font companies serving academia/linguists)
From: Don Osborn
Date: Mon, 4 Jul 2005 18:54:18 -0500

"Interesting how things have changed - in handwriting the accent always is added last and even on typewriters you have to backspace to overstrike something. Actually when dealing with new computer users in languages that use accents and tone marks, it would be more intuitive to have either a single keystroke or the diacritic added after. And in fact perhaps simpler for the user, at least at first, to have the latter approach."

Subject: [A12n-Collab] Re: Key order in combining diacritics (Re: Font companies serving academia/linguists)
From: Chris Harvey
Date: Tue, 05 Jul 2005 09:57:09 -0400

"I'm not sure I'm happy about Microsoft's (or anyone else's) Technology dictating how languages ought to be typed. The technology should be developed to implement what people want, not the other way around. Other key layout developers (Keyman on PC, XML keylayouts on Macs) give much much more flexibility. Furthermore, MSKLC is often not very useful for non-alphabetic scripts.

Dead key arrangements "are" an option, just not with MSKLC. As to which key-order is the most intuitive, that depends on the user. I've heard arguments/preferences both ways. If you think of accented capitals in Greek, the accents go to the left of the capital, so perhaps typing the accent key first makes sense. Personally, I agree with all of you, I prefer to type the accent after the base letter, especially for languages like Vietnamese, Kaska, Han, Tutchone which stack accents. "
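
The two key orders argued over above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how MSKLC, Keyman or any real keyboard driver works: the ACCENT_KEYS table and both function names are hypothetical, and an accent left hanging at the end of the dead key input is simply dropped. Both approaches lean on Unicode normalization (NFC) to turn a base letter plus combining marks into a single precomposed letter where one exists.

```python
import unicodedata

# Hypothetical mapping from accent keys to Unicode combining marks.
ACCENT_KEYS = {
    "`": "\u0300",  # combining grave accent
    "'": "\u0301",  # combining acute accent
    "^": "\u0302",  # combining circumflex accent
    "~": "\u0303",  # combining tilde
}


def dead_key_first(keys):
    """Accent keys are 'dead': nothing appears until a base letter follows."""
    output, pending = [], ""
    for key in keys:
        if key in ACCENT_KEYS:
            pending += ACCENT_KEYS[key]  # hold the accent, show nothing yet
        else:
            output.append(unicodedata.normalize("NFC", key + pending))
            pending = ""
    return "".join(output)


def accent_after(keys):
    """Accent keys attach to the letter that was typed just before them."""
    raw = "".join(ACCENT_KEYS.get(key, key) for key in keys)
    return unicodedata.normalize("NFC", raw)


print(dead_key_first(["^", "e"]))     # accent first, then letter
print(accent_after(["e", "^"]))       # letter first, then accent
print(accent_after(["e", "^", "'"]))  # stacked accents, Vietnamese style
```

Either order ends up with the same stored text, so the argument really is about what feels natural to the typist, not about what the computer can represent.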

I have some sympathy with Chris Harvey, a very experienced keyboard designer, but, on the other hand, dead keys! I can't say little enough about them myself. There is an interesting array of perspectives in these threads in any case.

For further discussion on dead keys see Mike's post Dead keys are not intuitive , Friday, December 17, 2004.

Addendum: If Chris Harvey says he needs dead keys to create keyboards for his clients, then I believe him. Take a look at this page of keyboards.

Saturday, September 17, 2005

Quanzhou Tombstones

Tombstone with cross on lotus flower and inscription in Phags-pa, photo by Ken Parry.

"Christian Angels on the South China Coast is an historically significant photographic exhibition featuring photos, never before seen outside China, of Christian tombstones of the Mongol Period from Quanzhou in South China.....The tombstones are unique and their significance lies in the fact that they provide evidence of a multicultural society in Quanzhou in the 13th and 14th centuries."

I was entranced by Andrew West's comment about Phags-pa,

"I was in Quanzhou (Marco Polo's Zayton) earlier this year, and the Quanzhou Maritime Museum has the most amazing collection of gravestones and architectural artefacts dating from the 13th and 14th centuries, when Zayton must have been the one of the most cosmopolitan cites in the world. In addition to a complete Hindu temple, a Tamil inscription dated 1281, and hundreds of gravestones inscribed in Arabic, Syriac and Uighur, there are a number of Christian tomb stones. The most important of them, in Latin script, was that of the 3rd bishop of Zayton, the Italian Andrew Perugia (Andreas Perusinus), dated 1332. However the gravestones for ordinary Christian Chinese had inscriptions written in Chinese using the Phags-pa script. These stones are a rare example of Phags-pa being used for private use rather than offical purposes, and I don't know of any non-Christian tomb stones or memorial stones that are written using the Phags-pa script."

I was trying to recount this story to a friend and, not knowing exactly where Quanzhou was, I had to read further. That is how I came on this radio program City of Light.

"Nearly all the Christian tombstones from South China are of the late medieval period, and they’re almost exactly contemporary with Marco Polo’s visit to China. And they are all early to mid 14th century..... They were all found in or near the medieval city port of Quanzhou near the modern city of Xiamen which is the capital city of the province. It’s just on the Taiwan Straits, as the nearest bit of China to Taiwan..... The Christian community in medieval Quanzhou came mainly from Central Asia, and by that time the Silk Road used a mixture of languages.

Well the first thing that’s apparent is the use of the cross, and of course this is unique to the Christian world. But the very interesting thing is that we find in the iconography is the cross is supported on a lotus flower, and we first find this actually in China dating back to the 8th century. So we find that on these tombstones, the iconography relates actually to earlier Christian iconography in China, and shows quite clearly I think, a continuity between the earlier and the later period."

Addendum: Visit Andrew's new webpage 14th Century Christian Tombstones from Quanzhou.

Thursday, September 15, 2005

A Vai Keyboard

I recently had a discussion with Michael Everson about the Vai keyboard.

It seems only equitable, however, that the Vai should be able, as the English are, to sit down and keyboard their own set of visual glyphs, transferring a visual image from mind to screen.

I will develop a QWERTY-based keyboard layout, because that is what they will have on their hardware, and as they are all familiar with the Latin alphabet (English being the official language of Liberia).

So the Vai are to have alphabet input for their syllabary. No glyph-based input method has been seriously considered. Michael felt that they would be happy to use the Latin alphabet to input their syllabary but I wasn't so sure. I thought it would be interesting to see what the Vai actually think so I emailed Tombekai Sherman, a Liberian who was consulted for the Vai Unicode Proposal.

This is the response from Tombekai Sherman:

"The indigenous Vai does not want to deal with English alphabets. They have rejected it up till now. The finding that those who learn to read first in a syllabic script, find it difficult to accept phonetic processing of the syllable is also true for Vai. It takes the average Vai about three months to become literate using the script. Using the English alphabet could take years."

And this is why I wonder if alternate input could not be developed for Vai. Of course, the focus of my blog is to talk about how glyph-based, or visual input can be made available for all scripts that are used as first scripts by any literate person.

I don't mean anything too complicated by "alternate input." A customized character palette or character picker that presented the characters in a familiar and traditional layout would be a good start. I am trying to imagine a 30 by 7 grid across the bottom of the screen where characters could be selected by the mouse and clicked in.
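
To make the 30 by 7 grid a little more concrete, here is a rough Python sketch of the data behind such a picker. It assumes the Vai syllables sit in a Unicode block starting at U+A500, as they do in current versions of the standard, and it simply fills the grid in code point order. A real picker would follow the traditional Vai chart order instead, and the names build_grid and pick are invented for this example.

```python
# Assumption: Vai syllables are taken from the Unicode Vai block, which
# starts at U+A500. Code point order is used purely for illustration;
# a real picker would follow the traditional Vai chart order.
VAI_START = 0xA500
COLUMNS, ROWS = 30, 7


def build_grid(first_code_point, columns=COLUMNS, rows=ROWS):
    """Lay out consecutive code points row by row in a rows x columns grid."""
    grid = []
    for r in range(rows):
        row_start = first_code_point + r * columns
        grid.append([chr(row_start + c) for c in range(columns)])
    return grid


def pick(grid, row, column):
    """Return the character under a 'click' at the given row and column."""
    return grid[row][column]


grid = build_grid(VAI_START)
typed = pick(grid, 0, 0) + pick(grid, 6, 29)  # first and last cells
print(typed)
```

The point of the design is that the user never touches a Latin letter: a click on a cell goes straight from the visual shape to the character.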

Wednesday, September 14, 2005

Character Map Windows XP

I have been having some fun with the Character Map in Windows XP by following the directions from PennState. I have been testing out the map and pasting Tamil into first WordPad and then Word 2003. Oh, I mean I have been trying to paste it into W2003. It only puts in one letter at a time and does not combine the characters so that doesn't work. Hm, I thought W2003 was supposed to accept these fonts. Oh well. Try again later.
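
I can't say what Word was doing, but a small Python sketch shows why Tamil needs its characters combined in the first place: one visible unit is often a base letter followed by one or more combining marks. The grouping rule below (attach every character whose category starts with "M" to the letter before it) is a simplification I am assuming for illustration, not the full Unicode segmentation algorithm, and the function name clusters is made up.

```python
import unicodedata


def clusters(text):
    """Group each base letter with the combining marks that follow it."""
    groups = []
    for ch in text:
        # Categories starting with "M" (Mn, Mc, Me) are combining marks.
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


# The word "Tamil" in Tamil script: TA, MA, VOWEL SIGN I, LLLA, VIRAMA.
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

print(len(word), "code points,", len(clusters(word)), "visible units")
```

If an application shows each code point on its own, the vowel sign and the virama appear as separate pieces, which matches the 'one letter at a time' effect.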

Next question, how do I know which font to choose to input Indic scripts? Tamil I know - it is the Latha font. No problem. But the others, trial and error. One font at a time. I have picked out all the Indic sounding font names and tried them out. These are my results for the Windows XP Character Map.

Fonts for Indic scripts in Windows XP.

Devanagari - Mangal
Gurmukhi - Raavi
Tamil - Latha
Telugu - Gautami
Thaana - MV Boli
Thai - Browallia and many more
Kannada - Tunga
Gujarati - Shruti
Malayalam - Kartika

I found this out the slow way - here is the page from Bhasha India.
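
The trial and error could be turned into a lookup. This is only a sketch: the font names are the ones in my list above, while the function font_for and the trick of reading the script from the first word of the Unicode character name are my own assumptions, and the trick will not hold for every script.

```python
import unicodedata

# Script-to-font pairs as listed above for Windows XP.
XP_FONTS = {
    "DEVANAGARI": "Mangal",
    "GURMUKHI": "Raavi",
    "TAMIL": "Latha",
    "TELUGU": "Gautami",
    "THAANA": "MV Boli",
    "THAI": "Browallia",
    "KANNADA": "Tunga",
    "GUJARATI": "Shruti",
    "MALAYALAM": "Kartika",
}


def font_for(char):
    """Guess the script from the character name and look up its font."""
    # Assumption: the first word of the Unicode character name is the script.
    script = unicodedata.name(char, "").split(" ")[0]
    return XP_FONTS.get(script)  # None when the script is not in the table


print(font_for("\u0ba4"))  # a Tamil letter
print(font_for("\u0905"))  # a Devanagari letter
```

A character from a script outside the table comes back as None, which is the code version of an empty box.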

Now I wonder if this matches what displays in IE? I listed the scripts that didn't display a few days ago. I'll check this with the Wikipedia page of languages by code. Looks like a match, I guess that would make sense.

Later, I will download and install some fonts mentioned in Alan Wood's Unicode Resources page. For now, I want to see what the baseline is.

Monday, September 12, 2005

Thai Input

I am testing some pickers tonight. I have the Thai Input Utility, sorry, it doesn't seem to have another name, open to my left. Richard Wordingham, who made it, says that it works in both IE and Firefox. Marco Cimarosti's book, Non Legitur, is off to one side. I have memorized three Thai words from Marco's book that I can type into this blog.

สปาก็ตตี and วิสกี are two fairly basic items. Marco wanted to know how to write the first word here in all 33 of the scripts in his book, but I had to break the news that it wasn't going to be so easy without some research. Check out Omniglot and see if you can figure out Marco's favourite word.

The third word, โฮเตล, is more cosmopolitan and when posted into Google brought me these results. So I know it is spelled properly, that's a good start. I find that inputting Thai is a very visual exercise, one piece at a time - in visual sequence - very different from Tamil.
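
Here is a small Python sketch of what 'visual sequence' means in the stored text. In Thai a vowel written to the left of its consonant is also stored before it, whereas in Tamil a vowel sign displayed on the left is still stored after the consonant. The helper spell_out is a made-up name, and the Tamil syllable is just an example I picked for the comparison.

```python
import unicodedata


def spell_out(text):
    """Return (code point, character name) pairs in stored order."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


thai_hotel = "\u0e42\u0e2e\u0e40\u0e15\u0e25"  # the Thai word for hotel
tamil_ke = "\u0b95\u0bc7"  # Tamil letter KA followed by vowel sign EE

for label, text in (("Thai", thai_hotel), ("Tamil", tamil_ke)):
    print(label)
    for code_point, name in spell_out(text):
        print(" ", code_point, name)
```

The Thai word starts with the vowel SARA O because that is the leftmost piece on the page. The Tamil pair starts with the consonant even though its vowel sign is drawn to the left of it.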

Here is another Thai picker, and although it actually gives the proper name for each letter I haven't attempted to use it. I have been reading the names of each letter from Marco's book instead.

Here is a very cool website called Learning Thai where I have learned how to spell my name in Thai ซูชาน. More on Thai here.

If you are interested in seeing how children learn to read in Thailand visit this site and run the video. View the interactive chalkboard for each lesson and see all the syllables in detail.

This is Thailand in Thai ประเทศไทย and it will get close to a million hits in Google.

PS. Marco just emailed to let me know he has a new webpage about his book, Non Legitur.

Image from Portsmouth EMAS

Sunday, September 11, 2005

Unicode Resources

I am trying to install language support on a new computer so I am seeing a lot of these little boxes.

This is a good time to mention that I have been expanding my sidebar to include a few more resources for working with Unicode. There are now Unicode Resources, Babelstone, many keyboards and pickers, and the home page of the Non Roman Script Initiative. Additional resources and comments are welcome as always.

Mike wore this T-shirt but I am passing on that for now since there is no one in my real life who has ever heard of Unicode. An image of the bumper sticker will have to do me. The T-shirts look rather nice though, organic cotton and all.


More little boxes - John Yunker talks about web globalization and google.

I have been reminded of Wikipedia's home page and the List of Languages ordered by code.

Windows update, which allows one to display Inuktitut, is available here.

Saturday, September 10, 2005

Greek and Hebrew

When I was commenting on someone else's blog last spring I put in a few words in Greek, one of them was 'οορς', so you know it wasn't anything profound. I was later asked how to keyboard polytonic (Classical) Greek, with all the breathing and accents. At the time I had a computer with about 12 different language keyboards installed so I just clicked on my language bar and switched keyboards, a bit cumbersome, but I was working on it. Now I use one of those little input utilities listed in my sidebar.

It is not so difficult to input classical Greek if you happen to be one of those people who can keep track of accents that serve no practical purpose, which I am not. The problem is how to make them display properly in blogger. These are matters beyond my ken. I am following the topic on other blogs.

The Better Bibles Blog posted Blog Experiments: Hebrew and Greek Unicode
Sorting it All Out had a recent post Getting at those Hebrew Vowels.

Each of these posts refers to other webpages which I haven't had time to investigate. The plain truth is that I am happy with Modern Greek and Hebrew orthography standards and have decided I can live without accents and points for now. But one day when I have some spare time on my hands I will return to these pages and mosey around.

Addenda and Errata

Here are some posts that have significant additions and/or corrections added to them. I welcome any such comments.

Genghis Khan
Vai Manuscript 1834
Saki Mafundikwa

Thursday, September 08, 2005


Phagspa Script from
This is the script invented for Khubilai Khan in 1269 and used by the Mongol Empire for 100 years.

This reflects the bureaucratic nature of the script.

"In 1278, Khubilai decreed that Pags-pa should replace Uighur on the metal tablets that served as passports in the Mongol empire, identifying officially authorized travelers and mandating their safe passage and supply. Pags-pa was sometimes used in the official seals stamped on paper money, which circulated throughout the empire. Europeans regarded both the passports and the money as exotic curiosities. Pags-pa is written vertically, and is now often called "square" or "quadratic" script after the shape of its letters."

This paragraph refers to the use of the script in an artistic cross-cultural and religious context.

"In a 1306 illustration of the Robe of Christ in Padua, the robe not only was made in the style and fabric of the Mongols, but the golden trim was painted in Mongol letters from the square Phagspa script commissioned by Khubilai Khan. ... Old Testament prophets were depicted holding scrolls open to long, but undecipherable, texts in Mongol script. The direct allusion to the writing and clothing from the court of Khubilai Khan showed an undeniable connection between Italian Renaissance art and the Mongol Empire." Genghis Khan. Jack Weatherford.

Andrew West's Babelstone gives the purpose of the script in its original context. There are many authentic examples of Phagspa at this site and a detailed list of documents and historical references.

Addendum: Bibliographic reference from Weatherford's book.

Tanaka, Hidemichi, "Giotto and the influence of the Mongols and Chinese on his Art: A new analysis of the Legend of St. Francis and the fresco paintings of the Scrovegni Chapel." Art History (Tohoku University) vol. 6 (1984)

Tanaka, Hidemichi, "Oriental Scripts in the Paintings of Giotto's period." Gazette des Beaux-arts vol. 113 (January - June 1989)

Other Links:

The Scrovegni Chapel, Padua
Artistic Exchange: Europe and the Islamic World at the National Gallery of Art, Washington

I looked at the frescoes in the Scrovegni Chapel online but the definition is not clear enough to see the detail. Any more information on this would be welcome.

Wednesday, September 07, 2005

Genghis Khan

Genghis Khan and the Making of the Modern World, by Jack Weatherford.

Granted that the subtitle is both a cliché and an overstatement, this is still a great book to expand one's knowledge of history and culture from a western and renaissance perspective to a wider outlook. I thoroughly enjoyed this book and it helped to provide the background for other books I have read recently, My Name is Red and The Kite Runner.

"The Mongol court maintained scribes not only for the Mongol language but also for Arabic, Persian, Uighur, Tangut, Jurched, Tibetan, Chinese, and lesser-known languages; still, they experienced perplexing difficulties with the variety of languages. With only their Mongol-Uighur alphabet, the Mongols found it difficult to record all the administrative information they needed from their vast empire. In everyday administration, clerks had to be able to spell names as diverse as those of Chinese towns, Russian princes, Persian mountains, Hindu sages, Vietnamese generals, Mulsim clerics, and Hungarian rivers. Because the subjects of the Mongol Empire used so many different languages, Khubilai Khan attempted one of the most innovative experiments in intellectual and administrative history. He sought to create a single alphabet that could be used to write all the languages of the world. He assigned this task to the Tibetan Buddhist lama Phagspa, who in 1269 presented the khan with a set of forty-one letters derived from the Tibetan alphabet. Khubilai Khan made Phagspa's script the empire's official script, but rather than force the system on anyone, he allowed the Chinese and all other subjects to continue using their own writing system as well in the hope that the new script would eventually replace the old by showing its superiority. Chinese scholars felt too attached to their own ancient language to allow themselves to be cut off from it by a new, and obviously barbarian, system of writing, and most subject people eventually abandoned the mongol writing system as soon as Mongol power waned." p. 205

"He sought to create a single alphabet that could be used to write all the languages of the world."

I distinctly remember that last week I read that Bell's Visible Speech was "the first system for notating the sounds of speech independent of any particular language or dialect." But was it really? Neither Phagspa nor Bell's Visible Speech became a permanent writing system. However, the idea of a universal phonetic writing system was common to both of them. And they were both invented at a strategic point in the history of an Empire, either the Mongol Empire or the British Empire. So rather than trust the Eurocentric assessment of Bell's Visible Speech and its place in history, I prefer to remember that in the 13th century the court of Khubilai Khan was coming up with similar solutions to similar problems.

For further information on Genghis Khan visit Genghis Khan on the Web by Tim Spalding.

Addendum: Thanks for this reality check from Andrew West of Babelstone

"A little misleading; more realistically the Phags-pa script was intended to be a national script for the Mongolian empire that could be used for writing both Mongolian and the other major languages spoken throughout the Mongolian empire. However, unlike Visible Speech or IPA, Phags-pa is not a generic writing system that can be applied to any language regardless of its phonetic makeup. It had a fixed set of 41 letters specifically intended for writing languages such as Mongolian, Chinese, Uighur and Persian, but it has no inbuilt mechanism for representing sounds not found in these languages. Thus, when Phags-pa was later used to write Sanskrit, a number of new letters had to be devised to represent the Sanskrit series of retroflex letters. So although the script was designed for writing multiple languages, it is not a language-independent script, and cannot be considered a "universal phonetic writing system".

In the edict promulgating the "new Mongolian script", as the Phags-pa script was known, Khubilai Khan explicitly notes that nations such as the Jurchen (Jin dynasty), Khitan (Liao dynasty) and Tanggut (Xi Xia dynasty) all had their own unique scripts reflecting their national identity, whereas the Mongolians, under Genghis Khan, had borrowed the ill-fitting Uighur script. This Uighur-derived script, which we now see as being quintessentially Mongolian, was seen by Khubilai Khan as a second-hand borrowing, and it was considered a matter of national shame that the Mongolians did not have their own unique script as other nations did. I believe that this was the real motivation behind the creation of the new script, not the desire to create a universal script that could be used for all languages."

Tuesday, September 06, 2005

Istanbul's Book Bazaar

"Printing machines came to Istanbul in 1729, reducing the number of handwritten books. The Sultan prohibited the printing of religious books, in order to preserve the art of calligraphy. But calligraphic art still diminished, sharply decreasing the number of hand-illustrated, handwritten books. Today books come to the market from the estates of deceased people, as they have for centuries. Fascinating auctions are held regularly. Anyone may attend and all the bookstall owners have schedules of the auctions. In the past, book sellers had guilds. Shopping was done according to religious rules; shops opened and closed with prayers. The book dealers' guild started with Abdullah Yetimi. Guild members were privileged to participate in an annual parade at the palace, where second-hand books were displayed for the Sultan."

By Jerri Clark Kirby (scroll down to the third article.)

The Old Book Bazaar is best seen in this photo.

This quote comes into focus as one reads the last page of My Name is Red. I really had to look this up to find out more about the decline of calligraphy after the 16th and 17th centuries. The prohibition on printing religious books is also something I had heard about recently.

Japanese Thumb Shift Keyboards

What is thumb shift? Yesterday I looked at the Typematrix keyboard which had moved the enter, tab and backspace key from the outside to the center. This keyboard moves the shift key to the center where it can be accessed by the thumb rather than the baby finger.

Here is a description of the thumb shift keyboard.

"Thumb shift tries to eliminate the irrationality described above. How was it made possible? The engineers at Fujitsu placed two keys (the taller ones in the picture) for the thumbs. These keys are supposed to be hit by thumb, simultaneously with the other keys. You can place three characters on one key (one without shift, one with shift on the same hand and one with shift by other hand's thumb (we call this cross shift)). In this way all the hiragana can be accommodated in the three rows, just like the English keyboard. "

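The three-characters-per-key idea in that description can be modelled as a small lookup table. This Python sketch is only an illustration: the key names and the kana assigned to them are hypothetical and are not the real Fujitsu layout, and the capacity figure assumes ten letter keys in each of the three rows.

```python
from enum import Enum


class Shift(Enum):
    NONE = 0   # key pressed alone
    SAME = 1   # pressed together with the thumb key of the same hand
    CROSS = 2  # pressed together with the thumb key of the other hand


# Hypothetical assignments for illustration only; this is not the real
# Fujitsu layout. Each key carries (alone, same-hand, cross-hand).
LAYOUT = {
    "key1": ("\u306f", "\u307f", "\u3070"),  # ha, mi, ba
    "key2": ("\u304b", "\u3048", "\u304c"),  # ka, e, ga
}


def resolve(key, shift=Shift.NONE):
    """Return the kana produced by a key under the given thumb shift."""
    return LAYOUT[key][shift.value]


# Assumption: three rows of ten keys, each key holding three characters.
capacity = 3 * 10 * len(Shift)

print(resolve("key1"), resolve("key1", Shift.SAME), resolve("key1", Shift.CROSS))
print(capacity, "character slots")
```

Under those assumptions three rows give 90 slots, which is why the thumb keys can do the work that otherwise needs a fourth row.
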
And here are some thoughts on transferring this technology to other languages.

"There are so many languages in the world. The IT revolution should benefit all the people in the world, regardless of the language they use. As I explained earlier, by using thumb shift, more characters can be packed in the keyboard and by selecting the most suitable layout the productivity will actually increase. This is how the IT revolution should work for the benefit of the people. "

Monday, September 05, 2005

Chinese at PN and LH

I have to mention these two posts since I don't want to lose them.

Taiwan’s first periodical in romanization from Pinyin News talks about the history of the roman alphabet as the 'first real script of Taiwan.'

Language Hat also has a great discussion with comments by Xiaolongnu and many others on the Chinese term for Jew. It's a good lesson in etymology.

Alternative Keyboards

Funky! That's it. Don't know if I can say anything more profound.

All right, what's this about? Alternative keyboards are relevant to two groups of people. Those who want an alternate input system for ergonomic reasons and those who want to work with something other than the good ole English alphabet. I am the latter. However, paths converge, scope broadens, and so on.

This image is from an educational site, (not a commercial site so not plugging any one brand) called the Typing Injury FAQ - very cool. There are fixed split and adjustable split keyboards, contoured keyboards and vertical keyboards, as well as a discussion about why the chording keyboard is not so fast after all. Other links here and here .

For those with further questions on specific keyboards there is a Yahoo group called altkeyboards. This keyboard on the right demonstrates the move towards putting the non-letter keys in the middle rather than off to the side. So enter, tab and backspace are centre rather than side keys. Shift, however, is left on the sidelines.

This keyboard, Morita, was discussed in the yahoo group altkeyboards in this message.

This discussion will connect somehow to a further post on the thumb shift keyboard, coming up some day - well I hate to promise anything since this is such a spontaneous hobby for me. However, there are numerous real writing system articles on the thumb shift page.

NB. Keyboards one and two are QWERTY keyboards so they are not alternative in layout. Morita and, of course, Dvorak are alternative layout keyboards. However, they are both discussed at the Typing Injury FAQ.

Sunday, September 04, 2005

Bell's Visible Speech

In 1864 Melville Bell developed a universal system for writing speech sounds called Visible Speech.

Alternative Handwriting and Shorthand Systems.

"Although not intended as a replacement for longhand, this system provides a means of recording human speech sounds, and not just those used to make words, but virtually any speech sound! Alexander Melville Bell, whose more famous son was Alexander Graham Bell of telephone fame, developed Visible Speech in 1864 as a kind of universal alphabet that reduces all vocal sounds into a series of symbols. He was working with the deaf and wanted to illustrate for them how speech sounds are made by using a shorthand system based on anatomical positions within the human vocal tract.

It was the first system for notating the sounds of speech independent of any particular language or dialect. The International Phonetic Alphabet is based on Bell's work."

Further details can be read here. Morris Halle makes these comments about the original vision and eventual fall into oblivion of Visible Speech.

"The prohibitive cost of casting the type was the reason for the replacement of Bell's alphabet with that of the International Phonetic Association (IPA), where sounds are represented by letters of the Roman alphabet and some diacritics available in most print shops. In fact, the IPA Principles expressly counsel writers against use of diacritics, wherever possible. The replacement of the Bell alphabet by that of the IPA had the unfortunate effect of obscuring and ultimately consigning to oblivion Bell's important discovery that the atoms of language are not the sounds, but the features."

Two features of this script attract my attention. It is presented as the first alphabet to represent speech apart from any particular language or dialect. In that sense it is a universal system for writing the sounds of speech. This is in direct contrast to other universal systems which are an attempt to communicate meaning without reference to sound.

See this review of Jill Lepore. A is for American: Letters and Other Characters in the Newly United States. New York: Alfred A. Knopf, 2002, for an interesting discussion on the difference in the work of Gallaudet and Alexander Graham Bell. Gallaudet believed that sign language was a universal system for communicating meaning.

The other reason for my interest in Bell's Visible Speech, 1864, is that its physical characteristics, its shapes or glyphs, use the feature of rotation or orientation in four directions. It takes its place in the Utopian family of scripts that I was writing about earlier this year, a direct descendant from More's Utopian alphabet and part of the same family as Cree Syllabics, invented in 1841.

Addendum: In Visible Speech the different orientations indicate the place of articulation of a consonant. This reflects the featural nature of the script. This bears no resemblance to the manner in which syllabics represents speech, where the shape represents the consonant and the orientation represents the vowel. Further links Omniglot, Fonts .
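
The syllabics half of that comparison is easy to see in Unicode character names. This is a minimal Python sketch, assuming the four code points below are the 'p' series of Canadian syllabics. The helper vowel_of is a made-up name, and stripping the first letter of the syllable only works for a single-letter consonant like this one.

```python
import unicodedata

# One consonant shape (the "p" series) in four orientations.
P_SERIES = "\u142f\u1431\u1433\u1438"

for ch in P_SERIES:
    print(ch, unicodedata.name(ch))


def vowel_of(ch):
    """Read the vowel off the character name, e.g. 'PE' gives 'E'."""
    syllable = unicodedata.name(ch).split()[-1]
    return syllable[1:]  # drop the single-letter consonant


print([vowel_of(ch) for ch in P_SERIES])
```

The shape stays the same across all four and only the direction it points changes, so the orientation is carrying the vowel.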

My use of the term 'Utopian' scripts is idiosyncratic and reflects a group of shapes not a type of phonology, which is consistent with my interest in writing systems as concrete shapes or glyphs. The term Utopian also represents a philosophy of universalism and equity.

Saturday, September 03, 2005

Pinyin: Word vs Syllable

I am catching up on some other blogs that I read. First, Mark, of Pinyin Info, has posted a new page on the role of the word vs the syllable in Pinyin input titled "Pinyin-to-Chinese Character Computer Conversion Systems and the Realization of Digraphia in China" by Yin Binyong. This is a very practical and helpful selection since anyone who uses Pinyin input knows that it has changed, but this explains how it has changed and what this actually means about the concept of words in Chinese. Here is an example.

"The second stage may be called “whole words and phrase-based conversion systems”, in which entire spoken words or set phrases are entered as input and Chinese characters appear as output. For example, if we input “fengguang”, this system will convert it into the Chinese characters 風光; if we input the phrase “fengherinuan”, meaning “warm and sunny weather”, the Chinese character output will be 風和日暖 , etc. Obviously, this approach does not follow the outdated myth of Chinese as a monosyllabic language, but is rather based on the realities of the present-day spoken language, and thus it has achieved some significant results. Nowadays, both in China and abroad, most Pinyin-to-Chinese character conversion systems are based on this principle. By taking whole words and phrases as the basic input units, the rate of confusion of homophonic characters has been greatly reduced."

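The gain from word-based conversion can be shown with a toy dictionary in Python. Everything here is a sketch under my own assumptions: the two dictionaries are tiny samples I made up for illustration (real systems hold thousands of entries and rank candidates by frequency), and the function names are hypothetical.

```python
from math import prod

# Tiny illustrative dictionaries; a real system would be far larger.
SYLLABLES = {
    "feng": ["風", "封", "豐", "峰"],
    "guang": ["光", "廣"],
}
WORDS = {
    "fengguang": ["風光"],
    "fengherinuan": ["風和日暖"],
}


def syllable_candidates(syllables):
    """Count the character strings possible when converting one syllable at a time."""
    return prod(len(SYLLABLES[s]) for s in syllables)


def convert_word(pinyin):
    """Look the whole word or phrase up in one step."""
    return WORDS.get(pinyin, [])


print(syllable_candidates(["feng", "guang"]), "candidates syllable by syllable")
print(convert_word("fengguang"), "as a whole word")
```

Even with this small sample, two syllables give eight possible character pairs, while the whole word gives one answer. That is the drop in homophone confusion the paper describes.
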
From Pinyin News I was directed by this post back to Gary Feng's Shadow. Here are two of his recent posts that intrigue me and link to issues I have thought of in the past.


"《小学汉语拼音教学研究》 is a collection of papers from a PinYin Instruction conference held in Dalian, China in 2001. Most of the papers were written by teachers and county educational institutes. Few touched on theoretical issues. Nonetheless it make available some of the arguments for the "whole-syllable" based method of teaching PinYin."

Teaching Pinyin and word parsing in Chinese

"I am reading 徐通锵’s new book on 字本位, an attempt to build Chinese linguistics on the basis of 字, which usually corresponds to a syllable in the language and a character in the script, and a concept (not morpheme) in the mind of the speaker. I hope it will provide a new angle on the age old debate on word parsing. "

Sadly, for me, the rest of these posts are in Chinese.

I only do 'pseudo' Pinyin input since I don't speak or read Chinese. However, I am actually beginning to put a few sounds to characters every once in a while. Scary thought! As far as words are concerned in Chinese, I can only remember a couple. When I was in Beijing last year I learned to say "I am a teacher" in Chinese. Teacher, 老师, lao shi, is a two syllable word, but I never once thought that I was saying two words instead of one when I said 'teacher' in Chinese.

Vai Manuscript 1834

This is the earliest known Vai manuscript, copied from Konrad Tuchscherer and P. E. Hair's article, Cherokee and West Africa: Examining the Origins of the Vai Script, History in Africa 29, 2002.

Here is a paragraph from John Singler's chapter on African Scripts in The World's Writing Systems.

"Most literates find the need for only forty to sixty characters. In many ways the participants at the 1962 conference 'filled in the blanks' creating symbols where none had existed before. Thus the conference largely introduced into the writing system distinctions between pairs of syllables beginning with s and z, f and v, wV andV, and the palatal consonants c, j, nj, and y. Very often, a contrast already existed between pairs of consonants with some vowels; now it was extended to all seven vowels. Thus most of the seeming systematicity in the shape of characters is artificial, imposed in 1962 and never in fact accepted by script users.(According to Welmers 1976: 11, the system did not originally distinguish between b (implosive) and mb (implosive), d (implosive)and nd (implosive), or [k] and [ng], these distinctions were only introduced into the writing system around 1900.) A further point about the relationship of the chart to ordinary use is that the usual form of some charcters represents an inversion, reversal, or turning of the version in the chart."

The World's Writing Systems edited by Peter T. Daniels and William Bright. Oxford University Press. NY. 1996.

This is a followup from yesterday's post about IBM's SHARK software. I can't offer any more details except to say that the 1834 manuscript uses approximately 60 symbols which corresponds to Singler's estimate. Therefore, the 1900 and 1962 standardization and the current Unicode proposal have all significantly increased the inventory of the Vai syllabary. This is just one more curious fact about writing systems.

Addendum: This comment comes from an email from Tombekai Sherman, a Liberian consultant to the Vai Unicode Proposal.

"I find myself around a minimum of 200 characters. All of these characters are not used in every piece of writing. But they are all needed in order to communicate fully at all times. Old writers who use very minimum characters are difficult to be understood. "

Friday, September 02, 2005

My Name is Red

My Name is Red by Orhan Pamuk was my best summer read. While it is not about writing systems, it is about books; in particular, book illustrations. As this review comments, it was a dense beginning, and I put it down. However, when I picked it up again, everything fell into place. Each chapter initially comes off as a short story on its own; then they start to fall into place one within another and so on.

This mystery is about 16th-century Istanbul book illustrators who are integrating artistic elements from China, the Mongol empire, Herat, Tabriz and Venice into their paintings. Romance and realism, sex, art and philosophy all blend together in a well-balanced and tastefully arranged banquet of themes.

The high point of the story for the true book lover will be when the main character accompanies an artist through the Sultan's treasury, leafing through book after book, seeking a particular illustration. Colour and line are so vividly described that rich images live on in the mind.

Well, I enjoyed this book, what more can I say? Oh yeah, the author speaks so authentically with the voices of both men and women that one has to wonder how he can get it so right.

IBM's Shark

This isn't exactly new but I wasn't familiar with it before. Someone described this to me as if it was handwriting recognition, but it is not; it is 'rapid keyboarding' - 'shorthand aided rapid keyboarding,' hence SHARK. It definitely offers an alternative to qwerty for text entry, but one would want to start with a standardized pattern right off the bat. Use a stylus instead of trying to push those teeny weeny buttons on the cellphone or BlackBerry.

The IBM Almaden Research Center has a good set of webpages to answer questions. Also Alphaworks. Now be sure to view the demo, promise. Here is a post from a nice tech blog I hadn't seen before either, jkOnTheRun.

My interest in this is whether it would work for writing systems other than the alphabet so that the qwerty keyboard would not be the defining parameter for a keyboard arrangement.

For example, what would the optimal number of letters be for this arrangement? How would the Cree, Cherokee or Vai syllabary look in this layout? 36 symbols, 85 symbols, 250 symbols, where is the cutoff? Can Vai be reduced to a smaller set of symbols necessary for writing Vai but not English loan words in Vai? That is the question.
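To get a rough feel for those three sizes, here is a minimal sketch that works out the smallest near-square grid of keys that would hold each inventory. It is only an illustration: the function name `grid_shape` is hypothetical, a plain rectangular grid is an assumption made for simplicity, and nothing here reflects how SHARK actually arranges its keys.

```python
import math


def grid_shape(symbol_count):
    """Return (columns, rows) for the smallest near-square grid
    that can hold symbol_count keys. Purely illustrative."""
    if symbol_count <= 0:
        return (0, 0)
    # Smallest whole number of columns whose square covers the inventory.
    columns = math.isqrt(symbol_count)
    if columns * columns < symbol_count:
        columns += 1
    # Just enough rows to fit every symbol at that width.
    rows = math.ceil(symbol_count / columns)
    return (columns, rows)


if __name__ == "__main__":
    # The three inventory sizes mentioned in the post.
    for count in (36, 85, 250):
        columns, rows = grid_shape(count)
        unused = columns * rows - count
        print(f"{count} symbols: {columns} x {rows} grid, {unused} unused keys")
```

On this simple assumption, 36 symbols fit a 6 by 6 grid, 85 symbols need 10 by 9, and 250 symbols need 16 by 16. The arithmetic says nothing about where the usability cutoff lies; it only shows how quickly the grid grows.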

Thursday, September 01, 2005

Universals in Writing

Wilkins' Real Character
I know there is some mail to go through and I haven't done it. However, I am simply going to indulge myself for a while and write about what pops into my head today. I have started on the trail of the 'universal' writing system and cannot change direction for a while.

First, I think it wise to say it in so many words, since I did not realize this before, that a 'universal writing system' is one of two completely different beasts. In the one case, it can be the search for a system with which to communicate meaning directly without reference to a particular sound system. This is the kind of system which European philosophers have been looking for, first, in Egyptian hieroglyphs, and second, in Chinese characters. They were disappointed each time in discovering that these writing systems were, in fact, attached to a particular representation of sound. (I know some people are having a hard time giving up this idea. The association with a Utopian state makes it a tenacious concept.)

Okay, the following is from someone's dissertation, but it is the clearest way that I can communicate the search for a universal writing system in 17th-century Europe. Thanks to Jaap Maat at the Institute for Logic, Language and Computation, U. of Amsterdam, for this precis.

"The creation of a universal and philosophical language was a widely discussed topic in the seventeenth century. One of the goals to be achieved by putting such a language into practice was to overcome language barriers. Another goal was to have a language that was more efficient and easier to learn than existing ones. Furthermore, the envisaged artificial languages were meant to incorporate an accurate representation of knowledge, so that learning the language would entail acquiring knowledge of the world of nature. Some authors even believed that a philosophical language could be instrumental in the growth of knowledge in being a tool that greatly improved our thinking. Many efforts were made towards the construction of artificial symbol systems of various kinds. Among the schemes that were completed, those of two English authors stand out for presenting fully-fledged artificial languages. These were 'Ars Signorum' (1661) by George Dalgarno (c. 1620-1687), and the 'Essay towards a Real Character and a Philosophical Language' (1668) by John Wilkins (1614-1672). The present dissertation provides detailed description and discussion of both languages. In addition, the work of Gottfried Wilhelm Leibniz (1646 - 1716) in this area is examined."

This sets the background for the Wilkins script pictured above, which I wrote about in Real Character last month. Each segment of the script represents a certain unit of meaning, not a sound.

The other kind of beast that a universal writing system might be is a system which represents sounds without reference to a particular language. This would be the International Phonetic Alphabet. More about these universal writing systems later.

Now, how is all this relevant? Am I going to wander around in the past for ever? Yes, probably. However, I will stop on the way to contemplate the role that Unicode now plays as a universal 'writing system'.