Wednesday, November 30, 2005

Syriac Vowels

Last time I posted on Syriac I was asking myself about Syriac vowels. The vowels in the Eastern and Western versions of Syriac are quite different. I had actually assumed that they would be reflected in different fonts. I was surprised when I found out that they are encoded separately. I have no idea if this is a good thing or a bad thing.

It seems to me that it would create two separate encodings for the same word and more difficulties for searching. Someone please tell me this is not so. I also suppose that there was some good reason that this was done. I'll be keeping an eye open for some discussion of this if it ever comes up.
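To make the searching worry concrete, here is a minimal Python sketch. It assumes that the Western and Eastern vowel marks are separate combining characters (I use U+0730 SYRIAC PTHAHA ABOVE and U+0732 SYRIAC PTHAHA DOTTED), and the pointing of the sample word is only illustrative, not checked against a grammar. The function name `skeleton` is my own invention, not part of any search engine.

```python
import unicodedata

def skeleton(text: str) -> str:
    """Drop nonspacing marks (vowel points and the like) so only base letters remain."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# The four letters of the word for "king": mim, lamadh, kaph, alaph.
MIM, LAMADH, KAPH, ALAPH = "\u0721", "\u0720", "\u071f", "\u0710"

# Illustrative pointing only: one vowel mark placed after the first letter.
western = MIM + "\u0730" + LAMADH + KAPH + ALAPH  # U+0730 SYRIAC PTHAHA ABOVE
eastern = MIM + "\u0732" + LAMADH + KAPH + ALAPH  # U+0732 SYRIAC PTHAHA DOTTED

print(western == eastern)                      # compares the raw code point sequences
print(skeleton(western) == skeleton(eastern))  # compares the unpointed letters
```

If a search tool compared skeletons like this rather than raw strings, the two pointings would find each other. Whether any real search engine does so for Syriac I do not know.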

The same effect occurs in Cree. Here are the Eastern Finals in the top row and the Western Finals below. They are also encoded separately.

Thanks to Omniglot for these images. I also see that Omniglot has a Cree text on this page, which represents Cree as I have seen it written. There are no points other than the mid-dot and Western Finals. It is a fast fluent way to write, close to shorthand, as each spoken syllable is represented by a simple stroke on paper and the final vowels are a brief tick. That was how it was originally used.

Well, I digress. This is enough for tonight since neither of these scripts is searchable on the internet yet. I wait to see what happens. Google is an established English way of life now, but for some scripts it is still a very long way off.

Tuesday, November 29, 2005

Syriac Keyboards

I thought that I would try out the Syriac keyboards tonight. There are two. The first one is not romanized and does not relate to any other keyboard I know.

I tried out the second one. It is called a 'phonetic' keyboard which seems to mean, in this case, a romanized keyboard. It matches the QWERTY keyboard as much as possible.

However, take a good look at these keyboards - these images are close to life size. Now I have to say that I have tried onscreen keyboards from lots of different developers and they are all the same in this respect - they are completely unreadable by anyone over 40 and by many children.

Frankly, it is somewhat reassuring for me to know that I have this onscreen keyboard in the accessibility options, as long as I don't actually intend to use it.

Next step. I opened Wordpad and set it for Estrangelo Edessa font size 26. Then I keyed in the letters across the QWERTY keyboard with this result. Beautiful. It was a keeper.

However, I had one more step to complete. I switched to BabelPad and keyed in the same sequence, then I clicked on u ̈ and produced the second image.

In this display the letters are in their 'logical' left to right order. Using right to left is no big deal for me since I have studied Hebrew ... once upon a time ... but if I work in logical order then the cursor goes with me and not against me. That makes it worth considering. The major advantage is that I now have the independent forms not the connected ones.

Now, if only I knew some Syriac to type. I have found the image from yesterday's post and typed in the wordlist. (Minus the two words which have letters that are too small for me to decipher.)

ܛܘܪܐ - turā mountain
ܡܕܝܬܐ - mdittā city
ܡܠܟܐ - malkā king
ܡܠܟܬܐ - malktā queen
ܥܡܐ - ʿammā people
ܟܬܒ - ktab to write
ܢܦܠ - npal to fall
ܥܪܩ - ʿraq to flee
ܫܡܥ - šmaʿ to hear

Syriac is absolutely beautiful and keying it in was a dream. Learning more Syriac actually seems possible. I don't have any unusual abilities in the area of visual memory so there are only a few scripts that I am truly comfortable with. I hope that Syriac will become one of those. I was pleasantly surprised by all the books on Syriac available from Amazon. Neat.

I did notice, however, that there were extra symbols, superscripts or diacritics in the text of the Syriac (Jacobite) script version of the Little Prince. I have no idea what they are. Vowels I would guess, but I don't really know.

The only difficulty I had with this post was that there is no SMALL LETTER T WITH DOT BELOW in the Lucida Sans Unicode Font, which is where I found the left half ring. A problem for another day.
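For what it is worth, the characters involved can be inspected from Python's standard `unicodedata` module. This is only a sketch: `describe` is a helper name I made up, and it says nothing about whether a particular font has a glyph for the character.

```python
import unicodedata

def describe(ch: str):
    """Return the character's Unicode name and the code points of its canonical decomposition."""
    parts = unicodedata.normalize("NFD", ch)
    return unicodedata.name(ch), [f"U+{ord(p):04X}" for p in parts]

# t with dot below, s with dot below, and the left half ring used for 'ayn
for ch in ("\u1e6d", "\u1e63", "\u02bf"):
    print(f"U+{ord(ch):04X}", *describe(ch))
```

The dotted letters decompose into a plain letter followed by U+0323 COMBINING DOT BELOW, so typing the two pieces separately is another way to enter them, assuming the font and the software can stack the dot.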

Sunday, November 27, 2005

Inside Malcuno Zcuro

Wolfgang has kindly sent a view of the inside of Malcuno Zcuro, both in the Syriac script and in the Latin script. The book also has a wordlist at the bottom of each page which makes it even more attractive for language learners. Click on these images to enlarge.

I am reposting Wolfgang's email since he provides this interesting information.

Saint-Exupery's "Le Petit Prince" was translated by the "Circle of Aramaic Students" at Heidelberg University, Germany.

I contacted the professor who initiated the Aramaic translation. He assured me that "zcuro" is the correct translation for "little" as far as the Tur Abdin dialect is concerned. He assumes that the persons who came up with "zeuro" must have consulted a dictionary of the Old Aramaic language.

BTW a copy of "Malkuno Zcuro" (ISBN 3-937467-15-7) can be obtained from the following book company:

And yes I did consult a dictionary of Old Aramaic. However, I have since looked at a few books that are available at on Syriac. These include a dictionary, grammar and various other books. In one I found an example of Syriac vocabulary transcribed with the left half ring for the 'ayn as had been suggested earlier by Simon. However, in the Little Prince orthography the 'ayn is written with a 'c'.

Books available at on Syriac are A Compendious Syriac Dictionary and an Introduction to Syriac: An Elementary Grammar With Readings from Syriac Literature with this editorial review.

Syriac is the Aramaic dialect of Edessa in Mesopotamia. Today it is the classical tongue of the Nestorians and Chaldeans of Iran and Iraq and the liturgical language of the Jacobites of Eastern Anatolia and the Maronites of Greater Syria.

Syriac is also the language of the Church of St. Thomas on the Malabar Coast of India. Syriac belongs to the Levantine group of the central branch of the West Semitic languages. Syriac literature flourished from the third century on and boasts of writers like Ephraem Syrus, Aphraates, Jacob of Sarug, John of Ephesus, Jacob of Edessa, and Barhebraeus.

After the Arab conquests, Syriac became the language of a tolerated but disenfranchised and diminishing community and began a long, slow decline both as a spoken tongue and as a literary medium in favor of Arabic. Syriac played an important role as the intermediary through which Greek learning passed to the Islamic world. Syriac translations also preserve much Middle Iranian wisdom literature that has been lost in the original.

Tim May has pointed out that Meltho OpenType Syriac fonts are available from Beth Mardutho.
Syriac is notable for being one of the scripts on the Xian Stele in China, as well as on the tombstones in Quanzhou. (I have not found an image for this yet.)

(Oddly the Estrangelo Syriac script also appears on the bookplate for the Gleason Moss Collection of H.A. Gleason, Jr., who was my first and well-loved linguistics professor. His father was the botanist H.A. Gleason.)

And finally a nice link here to look at a few related scripts and their transcriptions together in a table. And there is the right half ring and the left half ring. Now I get it.

Actually I intended to end here but really I have to identify the variant of Syriac script which appears in Malkuno Zcuro. It looks like Jacobite or Serto script from comparison with the Omniglot page. At Amazon dot com I have found a Syriac Bible in the Jacobite script.

Here is a clip from the Syriac Bible: Jacobite Script, Ancient and for comparison a chunk of non-continuous text from Malkuno Zcuro.

Would it be fair to say that Syriac has several diascripts? Hmm.

Saturday, November 26, 2005

BabelStone Blog

Andrew West's recent post about What's New in Unicode 5.0 provided links to some interesting reading. First, he answered my question about Phoenician. You can read his answer here. I didn't bring this up to reopen a debate which I have no part in. Rather, I was away for the month of August and missed the end of that story.

However, I found a document called N2990 particularly useful. This document not only records votes but also records comments. Among the comments, I noticed this line.

Encoding Phoenician is redundant, and needlessly proliferates Canaanite diascripts.

I googled diascripts and came up with this document which supplied a definition. "Diascript is to script as dialect is to language." Good, one more thing to think about.

Next, in the same document on page 9, I found an interesting item.

For character names and named UCS sequence identifiers, two names shall be considered unique and distinct if they are different even when SPACE and medial HYPHEN-MINUS characters are ignored and even when the words "LETTER", "CHARACTER", and "DIGIT" are ignored in comparison of the names.

The following hypothetical character names would not be unique and distinct:

That answers another question I had for Andrew about character names. Now I know that the part of the name that designates it a 'character' or a 'letter' is not to be considered significant.

However, this is tricky because if the name of the character differs by the word 'letter' or 'symbol' they are indeed separate characters.
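Here is how I picture the comparison, as a rough Python sketch of the rule quoted above. The names `name_key` and `IGNORED_WORDS` are mine, the sample character names are invented for illustration, and the real standard may have exceptions that this ignores.

```python
import re

# Words the quoted rule says to ignore when comparing names.
IGNORED_WORDS = {"LETTER", "CHARACTER", "DIGIT"}

def name_key(name: str) -> str:
    """Reduce a character name to the form used for the uniqueness comparison."""
    upper = name.upper()
    # Remove medial hyphens, i.e. a hyphen with a letter or digit on both sides.
    no_hyphens = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", upper)
    # Splitting on spaces and rejoining removes the spaces as well.
    return "".join(w for w in no_hyphens.split() if w not in IGNORED_WORDS)

# Hypothetical names, not real Unicode characters.
print(name_key("EXAMPLE LETTER A") == name_key("EXAMPLE CHARACTER A"))  # clash
print(name_key("EXAMPLE LETTER A") == name_key("EXAMPLE SYMBOL A"))     # distinct
```

So 'letter' against 'character' collapses to the same key, while 'letter' against 'symbol' does not, because 'symbol' is not on the ignore list.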



While Andrew has tallied up the number of characters in Unicode in How many Unicode characters are there? I have entertained myself with another of my trivial tasks.

These little trivia games I play sometimes are simply to familiarize myself with a script or a technical detail and entertain myself at the same time. Many have no point at all. Neither does this. It is a tally of the names of characters used in Unicode and gave me a happy half-hour of playing with BabelMap.

Character Names by Block for a few representative blocks.

Arabic Letter
Latin Letter
Bengali Letter
Bopomofo Letter
Braille Pattern Dots
Cherokee Letter
CJK Unified Ideograph
Cypriot Syllable
Deseret Letter
Devanagari Letter
Ethiopic Syllable
Hangul Choseong
Hangul Syllable
Hiragana Letter
Katakana Letter
Linear B Ideogram
Canadian Syllabics
Linear B Syllable

This is just to condition myself so that in the middle of discovering Katakana at some future date I don't do a double take when I discover that they are letters and not syllables. Ethiopic, Cypriot and Hangul have syllables but Cherokee and Katakana do not. The name for Canadian Syllabics seems to feature the name of the block. Surely the character itself is a 'syllabic', while the system is 'syllabics'. I have to think about this too.

However, there they are and I am taking a step towards becoming familiar with these names. It helps if you want to search for a character by name to know the name. I also explored many of the features of BabelPad described in this post.
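The same kind of tally can be roughed out without BabelMap, using Python's standard `unicodedata` module. This is a sketch under my own assumptions: `TYPE_WORDS`, `type_word` and `tally` are names I invented, the list of type words is just the handful mentioned above, and the results depend on the Unicode version bundled with your Python.

```python
import unicodedata
from collections import Counter

# The naming words I noticed in the blocks listed above (not an official list).
TYPE_WORDS = ("LETTER", "SYLLABLE", "SYLLABICS", "IDEOGRAM", "PATTERN")

def type_word(ch: str):
    """Return the first type word found in the character's name, or None."""
    for word in unicodedata.name(ch, "").split():
        if word in TYPE_WORDS:
            return word
    return None

def tally(first: int, last: int) -> Counter:
    """Count type words over a code point range, skipping unnamed code points."""
    return Counter(
        type_word(chr(cp))
        for cp in range(first, last + 1)
        if unicodedata.name(chr(cp), "")
    )

for ch in ("\u30ab", "\u13a0", "\u1200", "\uac00", "\u1401"):
    print(unicodedata.name(ch), "->", type_word(ch))

print(tally(0x30A0, 0x30FF))  # the Katakana block
```

Characters that are marks or punctuation inside a block come out as None, which is a reminder that not everything in a block follows the main naming pattern.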

I look forward to hearing more about Phags-pa some day.

Friday, November 25, 2005

Phoenician Alphabet

First, I have been reading, but not commenting on, the Tel Zayit Abecedary controversy. Somehow, conducting a functional literacy assessment for 1000 BC seemed a little daunting. However, I have now checked out all the links provided by Language Log of Nov. 14 and Nov. 21, 2005. (I can't seem to figure out how to link to these posts directly.)

I also note that Phoenician, among other writing systems, has been accepted for encoding in Unicode version 5. Proposed New Characters: Pipeline Table. So it doesn't seem out of the way to practise typing in Phoenician to get myself accustomed to a new keyboard.

Fortunately, Nizar Habash has posted a little demo here. Actually he is using some kind of frames on this site so follow Research> Human Computer Interface> Phoenician Nuun Demo (Phoenician-English Input Method.)

You can see that I have faithfully reproduced the sequence of letters from the Tel Zayit Abecedary. This keyboard is pretty intuitive and uses only two letters in the shift position: teth and sade. (I can't seem to get SMALL LETTER S WITH DOT BELOW to display for me in blogger. Maybe another day.)


I must have forgotten a snippet of code yesterday because when I went in today and defined the font as Microsoft Sans Serif the desired character was just fine, thank you very much. This is what I wanted: ṣādē.


I received this email a few days ago.

I wanted to ask you about something that I believe I once saw somewhere online but I can't find now. It pertains to a Hebrew and Arabic alphabet reform that someone was proposing, an odd combination of the two alphabets. Does that ring a bell? If so, I'd appreciate it if you could tell me who is behind this so I can look it up. Thanks!

This is something that I had never heard of and it doesn't google very well. So I posted this message in Qalam, the writing systems forum. In about half an hour I received a reply.

While I am delighted to receive such emails, flattered really, readers can themselves go straight to Qalam and bypass me altogether. There you will find 268 script enthusiasts. Right now it looks a little quiet. But here is good too - lots of commenters to augment my musings, thank goodness.

The answer might possibly be the Alphabet of Semitish by Nizar Habash. He has an interesting site to explore. He is an "Associate Research Scientist at the Center for Computational Learning systems in Columbia University." Habash has also invented the Delason Constructed Language and writing system. Of particular interest here is his Palisra Gallery with this introduction.

What is Palisra?

Palisra is an artistic exploration of the nature of a world where Palestinian and Israeli nationalisms never existed. They are replaced by a merged nationalism, that of the people of the Holy Land.

This is an ongoing project that includes creating all elements of an alternative merged nationalism: flag, money notes, stamps, religious art, and language (an Arabic-Hebrew esperanto we are calling Semitish).

Is this a vision of things to come or an elaborate escape of a bloody reality?

That's up to you to decide.

In my next post I hope to feature Nizar's Phoenician input utility!

Chinese Input Method Popularity

Here is an interesting post on Chinese input methods from Lee Sau Dan in Sci.Lang.

Jer writes:

Hi - Has anyone read statistics about input methods used in China? I assume the Pinyin systems would be the most popular, followed by Wubi. (Wubizixing).

Lee Sau Dan writes:

In the sphere of traditional characters, Cangjie is quite popular, because it's ubiquitous and it's what professional typists are trained to use. Zhuyin (based on the bopomofo phonetic transcription system) may come next, but there are many people using various other methods. e.g. People in Hong Kong like to use Jian3yi4, which is a sort of broken Cangjie. Many use Cantonese-based input methods. In Taiwan, some Minnan-based methods are popular, too.

Jer writes:

Anyone care to predict the future? Will it stay as it is now where most people prefer to type the pinyin pronunciation then choose the correct character, but more serious people put in the time to learn Wubi?

Lee Sau Dan writes:

Even those who don't bother to learn Wubi are using something other than plain Pinyin, because the latter is too slow to be used intensively. e.g. there is Jian3pin4, which substitutes some digraphs in Pinyin (e.g. "zh", "sh") with single keystrokes. And phrase-based input methods are gaining ground because of the increased inputting speed.

Jer writes:

I can't really picture a system faster than Wubi taking over.

Lee Sau Dan writes:

Go beyond the "type character by character" mindset and you'll be able to imagine faster methods. Is it too hard to imagine typing "i18n" and have the input method turn it into "internationalization" for you automagically? I don't think so.

Lee Sau Dan 李守敦

These last couple of lines hint at what is ahead in input methods.

Thursday, November 24, 2005

Addenda and Errata II

I hope no one thinks that this is an encyclopedia; or that I shouldn't be posting if I make the occasional error. Especially when I copy something verbatim from somewhere else without checking the tiny details.

This one I found quite interesting so I'll blog about how I have checked this out.

First, Simon commented here,

I believe the transcription of title should be "Malkuno Zeuro", not "Malkuno Zcuro". It's hard to tell whether it's a "c" or an "e" in the script on the second image, and also the "kaph" and "e" are quite similar in the first image, but ܙܥܘܪܐ makes more sense as "little".

I have to say that it still looks like a 'c' to me but ... I then checked out Simon's blog. Right, he posts in Hebrew so maybe there is something to this.

I then got out Holladay's Hebrew and Aramaic Lexicon. I am not sophisticated enough to find an online dictionary for Aramaic yet so this will have to do. It is just barely back on the shelf from checking out 'Emeth Hesed'. (Yes, it is an 'aleph' that was removed not an 'e' to turn 'emeth' into 'meth'. Another detail that I copied from someone else's story. Actually I knew it was an aleph but the story was being told in English so I went with it. Sloppy, sloppy!)

Anyway... in the Lexicon I found צעירו masculine singular for 'little' or 'small'. So 'zeuro' it is.

Next, to see how the confusion came about I checked the two possibilities that Simon mentioned in Syriac. They do indeed look somewhat similar.
ܙܟܘܪܐ zcuro (a non-existent word)
ܙܥܘܪܐ zeuro meaning little [Addenda: 'zcuro' would be the correct transliteration of this word since 'ayn is often transliterated with a 'c']

Okay, 'zcuro' is an error, [Addenda: zcuro is correct] and now I can see how the error came about. Checking in BabelMap I easily found that the first is 'zain, kaph, waw, rish, alaph' and the other is 'zain, e, waw, rish, alaph.'
[Addenda: The 'e' is better labeled 'ayn' and is pronounced as a pharyngeal fricative, transliterated by 'c']
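The BabelMap check can also be repeated in a few lines of Python, which is handy when the two words look alike on screen. This is just a sketch; `letter_names` is my own helper name, and I have written the words as escape sequences so that there is no doubt which code points are meant.

```python
import unicodedata

def letter_names(word: str):
    """List the Unicode names of the letters in a word, without the script prefix."""
    return [unicodedata.name(ch).replace("SYRIAC LETTER ", "") for ch in word]

with_kaph = "\u0719\u071f\u0718\u072a\u0710"  # zain, kaph, waw, rish, alaph
with_e = "\u0719\u0725\u0718\u072a\u0710"     # zain, e ('ayn), waw, rish, alaph

print(letter_names(with_kaph))
print(letter_names(with_e))
```

The second letter is the only difference, and Unicode does name it plainly as SYRIAC LETTER E.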

Really, no need to make that mistake, but I think the fact that it looked like a 'c' in English threw me off. [Addenda: Yes, it is a 'c'.]

No excuses though. One of the reasons I am blogging is in order to have this kind of give and take, and learn more. I found this little bit of research quite fun, and confirmation that one does not have to just let something go just because it is in another script and an image. Thanks, Simon. I assume that bloggers don't have to be infallible, do they?

I also have updates to these posts.
The Italic Ampersand
Vietnamese Revisited
'Qness' or the tradition of 'Q'
Greg Vilk

Now, where can one buy this book? Hmm. This is the info from Wolfgang Kuhl.

"Malkuno Zcuro" Antoine de Saint-Exupéry's "Le Petit Prince" (The Little Prince) in modern Aramaic language (Tur Abdin dialect) spoken in South East Turkey was printed in Germany and will be available in November 2005. The text is printed in Aramaic script (Syriac) with Latin transcription. The book also contains vocabularies in German, French, English, Turkish as well as in Swedish. BTW "Malkuno" means "prince".

You can find Wolfgang's original notice about this book on this webpage with his email address. Maybe the book is now available.

Endnote #1:

This comment from Lameen Souag has clarified that it is, in fact, zcuro. Thank you, Lameen.

This is etymologically correct; Proto-Semitic (and Arabic) s.aghiir > s.ghiir > zghiir by voicing assimilation > z`iir by regular sound shift. (Dunno why it's got -uu-.) However, it's not orthographically correct: that's a c, not an e, because Semitists often use a c to represent the pharyngeal `ayn.

Lameen also has a fascinating post today about Oldest African Dictionaries.

Endnote #2:

This is from Wolfgang Kuhl, who sent me the information in the first place. My apologies for doubting the original orthography, Wolfgang.

Saint-Exupery's "Le Petit Prince" was translated by the "Circle of Aramaic Students" at Heidelberg University, Germany.

I contacted the professor who initiated the Aramaic translation. He assured me that "zcuro" is the correct translation for "little" as far as the Tur Abdin dialect is concerned. He assumes that the persons who came up with "zeuro" must have consulted a dictionary of the Old Aramaic language.

BTW a copy of "Malkuno Zcuro" (ISBN 3-937467-15-7) can be obtained from the following book company:

Endnote #3:

Simon continues,

Though as a Unicode purist, I would myself prefer to write it as ʿyn, using U+02BF MODIFIER LETTER LEFT HALF RING

First, why is the 'ayn labeled Syriac letter e in Unicode?

[Paragraph removed to the comment section.]

Definitely Firefox is becoming increasingly necessary because these extra characters are not displaying well in IE especially in the comment section.

Wednesday, November 23, 2005

Where is Your Son?

சிற்றில் நற்றூண் பற்றி நின் மகன்
யாணடூளனோ ஏன வினவுதி ஏன் மகன்
யாண்டு உளன் ஆயினும் ஆறியேன் ஒரும்
புஸி சேரநது பொகிய கல் ஆலை போல
இன்ற வயிறோ இதுவே
தோன்றுவன் மாதோ போர்கள்ளத் தானே

'You stand against the pillar
of my hut and ask:
Where is your son?
I don't really know.
My womb was once
a lair
for that tiger
You can see him now
only in battlefields.'

Kavarpentu purananuru 86 (transl A.K.Ramanujan 1985:184)

This is a poem cited by Sanford Steever in his article on Tamil in World's Writing Systems edited by Peter T. Daniels and William Bright. This book has articles on 80 writing systems. My favourite characteristic of this book is the short selection provided in each writing system with a transliteration, transcription and translation.

(See the full version at the bottom of the page. I have omitted the transcription and left the transliteration unmarked by accents. I haven't learned to keyboard underdots and macrons yet. Sorry.)

I always find these selections reveal something about culture, human nature or both. I chose this poem to keyboard since I was in the mood to type a little Tamil.

I started with the Inscript keyboard and soon found that I needed to use the shift key for every second letter. I had the on-screen keyboard from Start> Programs> Accessories> Accessibility> On-screen keyboard open. However, it only displays either the base state *or* the shift state, not both at once. So hunt and peck didn't work. I then found that there were syllables in the text that I could not readily identify. This is not surprising given that World's Writing Systems uses a variant form of Tamil font.

This is the Tamil keyboard in Windows. I have put the two together myself just to have a way to view them both at once.

I finally ended up using the Tamil phonetic (romanized) keyboard here with syllable display and that went well. Pretty easy once you get used to it. Actually there are two vowels where the shift key is needed. I had forgotten that.

Tamil is where it all began for me. I was working on a multilingual computing project a couple of years ago when I tried getting young people, who were somewhat familiar with typing Tamil in a previous encoding, to use the Inscript keyboard for Unicode Tamil. No way.

It took me over a year to get things sorted out for Tamil - I dropped the project and the rest is history. But if it weren't for this keyboard I would not have felt the need to connect with others and find out more about Unicode and related issues. Most other languages that we needed i.e. Chinese, Russian, Greek, Hebrew, Japanese, Korean and other Latin keyboards were no problem. Vietnamese ... well yes and no. Other languages just didn't seem available at the time.

Text of poem with transliteration and literal translation.

சிற்றில் நற்றூண் பற்றி நின் மகன்
cirril narrun parri nin makan
small house pillar leaning your son

யாணடூளனோ ஏன வினவுதி ஏன் மகன்
yantulano ena vinavuti en makan
that you.ask my son

யாண்டு உளன் ஆயினும் ஆறியேன் ஒரும்
yantu ulan ayinum ariyen orum
where that I.don't.know once

புஸி சேரநது பொகிய கல் ஆலை போல
puli cerntu pokiya kal alai pola
tiger joining going stone lair like

இன்ற வயிறோ இதுவே
inra vayiro ituve
begot womb this

தோன்றுவன் மாதோ போர்கள்ளத் தானே
tonruvan mato porkallat tane
appear indeed battlefield only

You stand against the pillar
of my hut and ask:
Where is your son?
I don't really know.
My womb was once
a lair
for that tiger
You can see him now
only in battlefields.

Kavarpentu purananuru 86 (transl A.K.Ramanujan 1985:184)

From Poems of Love and War, selected and translated by A.K. Ramanujan, 1985. Columbia University Press.

Pater Noster

This is a long overdue post. The Christus Rex website displays the Lord's Prayer in 1322 different dialects and languages. Some of these are images of the Lord's Prayer in tiles from the Convent of Pater Noster. Here is the Lord's Prayer in Armenian.

"The Convent of the Pater Noster was built over the site where Jesus taught His disciples the Lord's Prayer. The walls are decorated with 140 ceramic tiles, each one inscribed with the Lord's Prayer in a different language."

If you can add to this internet collection, contact the Christus Rex website (email is on the website.) The website is well-known and has received many internet awards.

A collection of Hail Mary prayers on this website has been contributed by the Marian Library Collection in Dayton, Ohio.

Thanks to Wolfgang Kuhl who contributes to the Christus Rex website and told me about it last year. He also sent me information about the Little Prince in Syriac here.

Tuesday, November 22, 2005


Christopher Green has written me the following,

I study an African language called Senari for which a native speaker and myself are devising a standardized orthography in hopes of being able to develop computer programs to promote literacy in the language.

A graduate student in sociolinguistics at Florida State University, his blog is on "a wide range of linguistic topics, many of which are about language maintainance and policy."

This is from his post Language of the Week - "N"

The language is Nafara, a dialect of the Gur-language Senari spoken by a cultural group in the northern part of Côte d'Ivoire. I've had the privilege of studying Nafara alongside a native speaker of the language...who incidentally also speaks English, Dyula, French, and Yoruba! This may sound like an amazing and unusual talent, but a great deal of people living in multiethnic west Africa often know 4 or more languages fluently.

So why do I love Nafara so much? Well, back when I first decided that I wanted to be a linguist, I was introduced to Sidiky Diarrasouba, the native Nafara speaker I mentioned just above. He is an educator turned linguist, who decided to come to the United States to investigate a way to develop the necessary materials to revitalize his native language and to promote literacy within his culture.

I have been assisting Sidiky in analyzing the discourse structure of Nafara fables in order to determine a functional grammar and the rules of syntax of his language. We have also been attempting to find a practical orthography so that his language can begin to be written.

I thought that I would look up the little that is already available about this language for starters. Above is the Hail Mary in a previous orthography dated 1931. Next, according to this link, "Detailed dialect survey work is currently being carried out by the SIL in the area." The Rosetta Stone Project also records some kind of orthography for Senoufo (Senari) here.

However, the Ethnologue reports these rather bleak literacy rates so it doesn't sound as if any orthography has much currency at the moment. "Literacy rate in first language: 1% to 5%. Literacy rate in second language: 5% to 15%." and further references here. This is a bit of a reality check for some of us.

For a few dry details, the traditional issues in orthography creation or revision are whether the orthography is similar or dissimilar to the official language orthography; whether it will be phonemic or morphophonemic; at what level it will be standardized, i.e. village, region or district; and whether it will underdifferentiate or not. These are some of the linguistic considerations and there are dozens of books on this topic, so enough of that.

I spend most of my time now checking to see if an orthography 1. displays well on the internet, 2. is easy to search and 3. most of all how easy it is to keyboard.

Some people of interest when working on African orthographies are Don Osborn at who has written about Senufo here. Also Chris Harvey and Moyogo. Good luck, Chris!

Saturday, November 19, 2005

The All India Keyboard

Recently I wrote about the All India Alphabet. This alphabet has been replaced by an all India transliteration scheme called ITRANS.

There is also an all India keyboard called the Inscript keyboard. This keyboard works well for Devanagari, with its 34 consonants and 12 vowels. The vowels are encoded as both initials and diacritics so that makes 58 letters altogether and a few more symbols. No upper and lower case so all is well.

Tamil, on the other hand, has only 18 consonants and 12 vowels. These vowels have two forms, as in Devanagari. Because these forms are context dependent there is an argument that the two forms could both be input with the same keystroke. That would make 30 letters altogether. In that case, the basic Tamil writing system could be represented on the keyboard in the unshifted state.
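To show what "the same keystroke for both forms" could mean, here is a small Python sketch. Everything about the layout is hypothetical: the key labels, the three sample vowels and the function names are mine, and real keyboards such as Tamil99 have fuller rules than this.

```python
# Hypothetical key labels mapped to (independent vowel, dependent vowel sign).
VOWELS = {
    "aa": ("\u0b86", "\u0bbe"),  # TAMIL LETTER AA / TAMIL VOWEL SIGN AA
    "i": ("\u0b87", "\u0bbf"),   # TAMIL LETTER I  / TAMIL VOWEL SIGN I
    "u": ("\u0b89", "\u0bc1"),   # TAMIL LETTER U  / TAMIL VOWEL SIGN U
}

def is_consonant(ch: str) -> bool:
    """Tamil consonants sit between U+0B95 (KA) and U+0BB9 (HA) in Unicode."""
    return 0x0B95 <= ord(ch) <= 0x0BB9

def type_vowel(buffer: str, key: str) -> str:
    """Append the vowel form that fits the context: a sign after a consonant, a letter otherwise."""
    independent, sign = VOWELS[key]
    if buffer and is_consonant(buffer[-1]):
        return buffer + sign
    return buffer + independent

print(type_vowel("", "i"))        # start of text: the independent letter
print(type_vowel("\u0b95", "i"))  # after the consonant ka: the vowel sign
```

With the context deciding the form, the 18 consonants and 12 vowels need only 30 keys between them, which is the argument for keeping everything in the unshifted state.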

Using the Inscript keyboard for Tamil means using a keyboard with 4 blank spaces in the unshifted state, while 3 more keys in the unshifted state have Grantha letters on them. These are letters for writing Sanskrit and are not part of the basic Tamil alphabet. Likewise 7 of the basic Tamil consonants are in the shift state.

You really should be able to type Tamil without using the shift key at all. It may be hard to see but here in the Tamil99 keyboard all the basic letters are in the unshifted state.

In actual fact most Tamils probably use a transliteration IME since that means the shift key is never needed. Who can imagine anything better than that? However, direct input keyboards and typewriter keyboards (IME's) are necessary to provide input for those unfamiliar with the English alphabet or a transliteration scheme.

So why bother mentioning this oddity, the Tamil Inscript keyboard? First, because when I started learning to type in Tamil, I was told that this Inscript keyboard was the 'ordinary Tamil keyboard'. And second, because the Inscript keyboard for Tamil is the only Tamil keyboard packaged in Windows.

So there I was 2 years ago trying to learn this strange keyboard and getting more frustrated by the moment. People thought that I was a whiner for complaining about it at all. Now I know better and use an IME of some kind. I actually know how to use this keyboard but when I want to work with someone who is Tamil I generally give it the go-by.

More recently plenty of Tamil transliteration programs and other keyboards have become available as free downloads. My favourite is the online syllabic editor, of course, which was adapted by Richard Wordingham from a Hindi online keyboard, for me to use with Tamil children.

However, the Inscript keyboard remains as the only Tamil keyboard in Windows. If anyone knows what it is doing there, drop me a line.

Greg Vilk

Greg has sent me a copy of his new novel Golem so I have indulged myself for a few days in attempting to decipher the central puzzle of this novel. I have not succeeded in unraveling the mystery but I have spent some enjoyable hours trying.

This novel is set in Thule Bay in northern Greenland. This could only be Qaanaaq, a settlement whose name is a palindrome. Several clues point to the use of the palindrome in deciphering the two 'keywords' of the story, the words written on the scroll placed in the golem's mouth.

In trying to decide if these words were in Hebrew, Latin or English, I first researched the history of the palindrome. Palindromes are an ancient tradition, dating back to 275 BC. I found famous Greek and Latin palindromes but less use of the palindrome in Hebrew. Along with palindromes there are also reversible words. This offers much more scope for decipherment.

The first keyword is the 'word of creation' which brings the golem to life; and the second keyword, a reverse of the first, will destroy him. I found that the effect of the script, with its many reversed letters, (a realistic feature in my books, since I am familiar with many real scripts with reversed letters) distracted me from perceiving the sequence of the letters in reverse. Therefore I reconstructed the keywords by number.

I wrote down the 'word of creation' as 12134521 and its reverse as 21543211. To visualize this better I organized the letters like this 121-345-21 and 21-543-211.

Now, assuming first the simplest interpretation, that the words are understandable in English, I worked on combinations of letters that would fit this pattern. The double final letters could only be ll, ss, or ee. The other possibilities, zz, and ff, seem too improbable. However, maybe I am barking up the wrong tree.
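For anyone who wants to try the same kind of search, here is a minimal Python sketch of the pattern test. It only checks whether a candidate word fits a number pattern, where the same digit must always stand for the same letter and different digits for different letters. The function name `fits` and the sample strings are my own placeholders, not real solutions to Greg's puzzle.

```python
def fits(word: str, pattern: str) -> bool:
    """Return True if `word` matches `pattern`, one digit per letter.

    The same digit must always map to the same letter, and two
    different digits may never share a letter.
    """
    if len(word) != len(pattern):
        return False
    digit_to_letter = {}
    letter_to_digit = {}
    for letter, digit in zip(word.lower(), pattern):
        # setdefault stores the first pairing and returns it afterwards
        if digit_to_letter.setdefault(digit, letter) != letter:
            return False
        if letter_to_digit.setdefault(letter, digit) != digit:
            return False
    return True


# Placeholder strings, only to show the mechanics
print(fits("abacdeba", "12134521"))
print(fits("baedcbaa", "21543211"))
```

Joining the two candidates and the two patterns, as in fits(first + second, "12134521" + "21543211"), checks that both words use the same letter for the same digit.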

Next, I switched to researching the legend of the golem in history. I found out that one of the original 'words of creation' was 'emeth' (truth) written on the golem's forehead. With the erasure of the 'e' altering 'emeth' to read 'meth' (death), the golem was destroyed. I assume a similar method must work with Vilk's two keywords.

This was just the beginning of the investigations I pursued in working on this puzzle. Overall, the historic elements in this novel referring to the creation of the golem stand up as highly accurate to the original golem legend, which is a pleasant surprise these days. Good work, Greg.

While I have not succeeded in deciphering the ancient script, there are many more tantalizing clues embedded in the text. There are allusions to the first chapter of Genesis, the first chapter of John's gospel, the Lord's Prayer and other famous quotes. I have not ruled out the possibility that the names of the characters also provide clues. You have to read the novel and decide for yourself.

There is one little detail I do have to mention in the interests of 'herstoricity'. The female character should give up her pantyhose, since this item of attire was not invented until 1959, some 17 years after the setting for this novel.

There is an interesting discussion about 'speech' and the letters of the Hebrew alphabet here and here.

Update: In response to a comment on Language Hat I need to add that 'emeth' is אמת and without the aleph מת is 'dead'. This is actually the triliteral root מות. I think there is an expression דבר אמת 'word of truth'. However, in this novel certain conversation points in the direction of a 'word of creation.' Hmm. Help welcome.

On other points, I cannot guarantee that I am pointing anyone in the right direction on deciphering Greg's script.

Thursday, November 17, 2005

'Qness' or the tradition of 'Q'

I had a very positive reaction to the Telex input method mentioned by Michael Farris and quoted in my Unikey post. (f, s, r, x, j become the tone keys) After all, the index fingers on the 'f' and 'g' keys are made for multitasking.

However, Mark saw it differently. His reaction was "Ackj! Ohx myg eyesf!" and I thought "What does this have to do with his eyes?" His sensitive fingers maybe - but surely not his eyes.

This alerted me to the fact that not everyone perceives the relationship between the key and the letter stenciled on it in the same way. For me there is an arbitrary relationship at best between the letter portrayed on the key and the key itself.

A key may have a certain English letter stenciled on it but no one key has any one letter as its essential quality. The quality of the upper left lettered key is not 'Qness'; it simply happens to have 'Q' stenciled on it. It has no 'Qness' unless I am typing in the Latin alphabet on a QWERTY keyboard. Then I assign it temporary 'Qness'.

So I was surprised to read another post today on the Better Bibles Blog in which I discovered that indeed there are others who believe in essential 'Qness' or in "Wisdom in the Q Tradition".

From Announcing a perfectly accurate Bible Translation I heard for the first time about a new Bible translation theory in the tradition of 'Q'. Here is an oft-quoted verse in this new translation.


While Mike Sangrey, the author of this post, intends to publish a dictionary of neologisms to support this new translation, I believe that Mark S. would be able to shortcut that process significantly by teaching readers how to understand the essential quality of each key. They need to realize that the letter stenciled on the key is, in fact, the literal *signification* of that key, and any divergence from this literal truth is a perversion of the intent of the original author of the keyboard.

I, however, am not such a literalist, and tend to be more flexible in my assignment of essential qualities. I am a Thomas concerning the 'Qness' of Q and am open to considering the possibility that 'Q' may actually represent Θ in this context.

Note: Mike Sangrey offers a complimentary sushi knife for those who order this translation today.

Update #1:

I guess I should explain this. Q is not actually the input key for Greek theta when using a Greek Unicode keyboard. However, in the Symbol font, a Greek look-alike font for Latin, theta replaces q.

Here is the qwerty keyboard set for the Symbol font. I hope it works.




And there is the mysterious little digamma, (#6) I believe, fourth from the end in the 'v' position. Correct me if I am wrong.

Update #2

This is the same text as the quote above but with symbol as the defined font. It is the Latin character set with a Greek look-alike font. It had me fooled the first time I saw it. Somehow I learned to use Greek Unicode first and then I saw this. But for many people it is the other way around.

This is John 3:16. For God so loved the world...


Tuesday, November 15, 2005

Spelling in Chinese

After posting on Zhuang I went back and carefully reread the 9 methods of composing characters in Zhuang and non-standard Cantonese in this article.

A Comparison of the Graphical Conventions in the Written Representation of Zhuang and Cantonese by Prof. Robert S. Bauer

I left off with this last sentence,

For various reasons neither the old Zhuang script nor the written form of Cantonese has undergone the formal process of standardization; the lack of standardization has created the phenomenon of allography in both writing systems.

I don't want to go into all 9 conventions here but this is the last one cited.

9) Graphs whose pronunciations are "spelled" by their two component characters; that is, two (typically standard Chinese) characters are combined to form the target character, and the Zhuang or Cantonese reading of one of the characters represents the initial consonant of the target character, while the rime of the second character corresponds to the rime of the target character (this method resembles the 反切 principle that was employed in the ancient Chinese rime books).

I get the impression that rather than using two distinct characters as in fanqie, two components are combined in one character. This is described by the author as "spelling" out the pronunciation in a character.

I returned to Dylan Sung's website on the history of the Chinese language and script for a description of fanqie. (View his sitemap here.)

Splicing sounds

In order to fix the sounds of a character, we needed a method in which to do it. Very early on in the late Han period (25-220), splicing two characters for the initial and rhyme was the method to pin down the sounds. This is known as the FanQie (反切) method. Prior to the Sui (581 - 618) and early Tang (618 - 907) dynasties, the character "fan" 反 was used to symbolise this splicing. After the establishment of the Tang Dynasty, the character "qie" 切 was used.

Here is an example of how Fan and Qie splicing work.

[This character has the] old pronunciation "tung", and both methods use two extra characters, the first of which is the initial, and the second an exact rhyme to our example. The splicing works exactly the same way in both examples.
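To make the splicing concrete for myself, here is a small Python sketch. It assumes the traditional rime-book spelling 德紅 for 東, which may or may not be the pair shown in the example above, and the readings in the table are simplified romanizations chosen for illustration, not a scholarly reconstruction. The names `READINGS` and `splice` are mine.

```python
# Each speller character is split into (initial, rhyme).
# The readings are simplified, illustrative romanizations (an assumption).
READINGS = {
    "德": ("t", "ok"),
    "紅": ("h", "ung"),
}


def splice(first: str, second: str) -> str:
    """Take the initial of the first character and the rhyme of the second."""
    initial, _ = READINGS[first]
    _, rhyme = READINGS[second]
    return initial + rhyme


print(splice("德", "紅"))  # tung
```

The sketch leaves out tone, which in fanqie also comes from the second character.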

For a further discussion of fanqie I went here.

The fanqie spelling is a word-based analogical spelling system in which words are spelled in terms of other [familiar] words. Fanqie was never intended to, nor is it capable of, making distinctions beyond those of the words of any given speaker or reader. Neither the rhymes nor the fanqie spellings of the words of any given dialect or literary tradition can be arbitrarily extended (or "refined") so as to include the rhymes or words of another dialect which may have distinguished them differently or which did not distinguish them at all, as the Qieyun compilers indicate.

Or read the book.

I have recently made the delightful but necessarily time-consuming discovery that if a book is listed at Pinyin info it is likely available at the university library near me. I have a stack of these books on my desk, and some of them I have actually read.

Two thoughts from reading all this. First, different kinds of phonography were used to generate new characters or 字 zi. Second, allography is a great term for a phenomenon which fascinates us all - non-standard writing. (Well, most of us.) In the midst of the all-encompassing standardization that is happening as graphs and systems enter Unicode, many of us will be mourning 'allography' or trying to find ways to keep it alive in spite of itself.

Sunday, November 13, 2005


Here are some responses to the Vietnamese search problem that focus on the search engine and not the keyboard. I think this is an issue that anyone who is searching the internet needs to be aware of.

First, Andrew C. commented,

The key issue is that Google, like many web services does not bother to normalize Unicode strings. Google seems to take it byte by byte. The result is that the microsoft layout compared to a precomposed (NFC) string or even an NFD string produces different results. The W3C have released a draft version of part of their character model that tackles normalization.

Then Simon responded,

Actually Google makes the effort to normalise the search strings. For example, for Greek, Google knows about cases (does case mappings): ιστολόγιο, ΙΣΤΟΛΌΓΙΟ, ΙΣΤΟΛΟΓΙΟ, ιστολογιο and also can work irrespective of accents! This might come from the case mapping rules for Greek; when you capitalise words, the accents are often removed. For more, see

Then Andrew C. continued,

As Simon has indicated, Google has put a lot of work into some languages to optimise searching in those languages. But if you use a language they haven't optimised for, you tend to have problems. As far as I can tell, Google seems to operate on byte sequences rather than character sequences. One trap people fall into is the assumption that because Google has an interface translated into a language, then Google is a suitable search tool for that language.

Recently, I've been researching Khmer search engines. The Google interface has been translated into Khmer, but it doesn't seem to be possible to actually search successfully in Khmer unicode, even though there are Khmer unicode sites that have been indexed by Google.

I also know that I don't need accents to google in French. And this week I have been busily working away on my own little project on Andreas Müller (1630-1694). 'Muller', 'Müller' and 'Mueller' all give me the same search results. After a little testing it seems that the precomposed accents - acute, grave, circumflex and umlaut are normalized. However, maybe not the combining diacritics or even precomposed letters with two diacritics. Hmm. I can't really say.

However, here is another little problem - when I get to the page I want and use the edit:find feature, I have to be exact and use every little accent. I have to search the page using Muller, Müller and Mueller as separate searches. No normalization there! I wondered why all those pages gave me no results.
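Out of curiosity, here is a rough Python sketch of the kind of folding that would let a search ignore case and accents. It is only my guess at the general idea and not a description of what Google or any browser actually does. It uses the standard `unicodedata` module; the function name `fold` is my own.

```python
import unicodedata


def fold(text: str) -> str:
    """Lower-case the text and drop accents, for loose matching."""
    # NFD splits an accented letter into base letter + combining marks
    decomposed = unicodedata.normalize("NFD", text.casefold())
    # combining() is non-zero for combining marks, so those are dropped
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold("Müller") == fold("Muller"))   # True
print(fold("Müller") == fold("Mueller"))  # False
```

With this folding 'Müller' and 'Muller' match, and so do the Greek forms Simon lists, but 'Mueller' would still need a language-specific rule of its own.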

Well, Müller is not going anywhere so I can catch up with him now.

Additional Comments:

On another topic altogether, I don't have time to quote and comment on the many great posts that I read. I assume that if they are in my sidebar people will find them eventually.

However, here are a few things worth mentioning. First, Andrew West has made his first post Tibetan Extensions 1 : Astrological Pebble Symbols on his new BabelStone blog. Then there is Lameen Souag's post on A comparative linguist of the 10th century and finally the ongoing discussion of the Tel Zayit Alphabet on Language Log.

Update #1: See Mike's post for a more refined search engine experiment.

Update #2: See further comment here.


I haven't posted much about keyboards lately so this seems like a good time. This is about the Unikey Vietnamese keyboard which has "all 3 popular input methods: TELEX, VNI and VIQR." (Screenshots)

Michael Farris has made this comment about it.

Not exactly your comment, [that's okay. Michael, this is a blog, remember] but for Vietnamese, I use a non-microsoft keyboard called unikey. It has several options, I use unicode precomposed characters and telex input, a vietnamese system that takes a little getting used to.

Here's a list of some words, with the input on the left, output in the middle and English gloss on the right.

vieejt Việt Vietnamese
nguwowfi người person
tooi tôi I
owr ở at
sawsp sắp imm. future marker
ddax đã past marker

the tone keys (f, s, r, x, j) can be typed either after the vowel or after all the segmental letters of the word have been typed. The latter method is probably better as it assigns the tone marker better in ambiguous cases (but I'm used to writing tone as I go along). It's much faster than when I inputted a 100 or so pages of dictionary entries using keyboard shortcuts of my own devising in a floating accent system that I hate with a passion now (can you say awkward and time consuming and frustrating?)

Thanks, Michael, for explaining this. It sounded a little odd at first but entirely suitable kinesthetically. There is a big difference between just finding all the accents in the first place, and then finding an input method that can be easily typed. I still find French awkward. Especially since I have switched keyboards a few times over the years.

Here is another comment on the Telex input method.

That is also the case in vietnamese "telex style" input. A very popular input method as it allows very fast typing. The vowels with a circumflex, as well as the D stroke, are written by redoubling the letter. Then, unused letters of the latin alphabet (j,x,...) are used to indicate the different accents. But those letters can be typed almost anywhere on the syllable (vietnamese is written with syllables separated by spaces). For example "Vietnam" in vietnamese is written with the "e" having a circumflex accent and a dot below the letter. With the telex input method: "Vieejt Nam" but also must be accepted "Vieetj Nam" (yes, the accent is always on the last vowel of a syllable with several vowels).
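To see how the rules described in these comments fit together, here is a toy Python sketch of Telex-style conversion. It is certainly not how Unikey works inside. It only handles lower-case keys and a tone key typed directly after its vowel, so "Vieetj", with the tone key at the end of the syllable, is not covered, and a tone key is never typed as a plain letter after a vowel. The names `MODIFIERS`, `TONES` and `telex` are my own.

```python
import unicodedata

# The second key modifies the letter just typed (doubling, or w).
MODIFIERS = {
    ("a", "a"): "\u00e2",  # â
    ("e", "e"): "\u00ea",  # ê
    ("o", "o"): "\u00f4",  # ô
    ("d", "d"): "\u0111",  # đ
    ("a", "w"): "\u0103",  # ă
    ("o", "w"): "\u01a1",  # ơ
    ("u", "w"): "\u01b0",  # ư
}
# Tone keys mapped to Unicode combining marks.
TONES = {
    "s": "\u0301",  # acute
    "f": "\u0300",  # grave
    "r": "\u0309",  # hook above
    "x": "\u0303",  # tilde
    "j": "\u0323",  # dot below
}
VOWELS = set("aeiouy") | {v for v in MODIFIERS.values() if v != "\u0111"}


def telex(keys: str) -> str:
    """Convert a Telex key sequence to Vietnamese text (toy version)."""
    out = []
    for key in keys:
        prev = out[-1] if out else ""
        if (prev, key) in MODIFIERS:
            out[-1] = MODIFIERS[(prev, key)]
        elif key in TONES and prev in VOWELS:
            # compose vowel + tone mark into one precomposed character
            out[-1] = unicodedata.normalize("NFC", prev + TONES[key])
        else:
            out.append(key)
    return "".join(out)


for typed in ["Vieejt", "nguwowfi", "tooi", "owr", "sawsp", "ddax"]:
    print(typed, telex(typed))
```

The six inputs are the ones from Michael's list above, and the sketch should give back the middle column of that list.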

If you think about how the lettered keys will look as you type, this will throw you off. But think of what will display on the screen instead, as the accents are added either after the letter or after the syllable which they modify, up to you. More intuitive than dead keys and no long finger stretches to the top row.

However, the top row is way better than at the side on the quotation mark key. Some of us have very disobedient pinkies - they never do as they are told - better for drinking tea, really.

Another recommended Vietnamese keyboard is VPSKeys.

For Mark, look at this comment about using telex input for Pinyin. Have you ever seen that?

I'm off for a cup of tea. The power of suggestion!

Further from Michael Farris:

Unikey telex input is also forgiving in that you don't have to delete wrong accents. If I mistype owr as owf I just add the r after f (owfr) and it corrects the tone. And tone placement is a little tricky in words with, for example, the sequence -oa- as the tone mark goes on either the o or a depending on the final. Typing the tone right after the vowel is less accurate than typing the tone as the final element (which always places it correctly).

Also, of the five "tone" letters, three are used in Vietnamese: r, x and s are all initial consonants (so their use after vowels is unambiguous).

Friday, November 11, 2005


I have been thinking a lot about the Zhuang writing system lately mainly because it was new on Omniglot last month. Here is the paragraph which interests me.

A method of writing Zhuang based on the Wuming dialect and using a mixture of Latin and Cyrillic letters and a number of IPA symbols was devised in 1955. A reform in 1986 removed the non-Latin letters and replaced them with individual Latin letters or combinations of Latin letters.

Simon shows the difference between the earlier set of letters and the current set. The main difference is that now they are all Latin letters and easier to keyboard.

The earlier Zhuang system, which mixes letters from several alphabets, represents the same design model as the All India alphabet and comes from the same era. There must have been a sense that one could either just create typewriters with this mix of symbols or, more likely, create text from a printing process in which letter sets could easily be mixed. Then the computer came along!

This qalam post by Andrew West adds further details.

Actually, the unwieldy Zhuang phonetic alphabet devised in 1955 that uses a mixture of Latin, Cyrillic and IPA letters together with the special tone letters ... is no longer in official use, but since 1981 has been replaced by a new phonetic alphabet using ordinary Latin letters only.

There was indeed a tradition of writing Zhuang using a mixture of existing Chinese ideographs (to represent either the pronunciation or meaning of a Zhuang word) and specially devised ideographs that represent the meaning and/or the pronunciation of a Zhuang word in the same manner as the Vietnamese nom script. These Zhuang-usage ideographs are known as "saw ndip" in Zhuang or "fangkuaiZhuangzi" in Chinese.

However this seems to have been a rather ad hoc system, which varied considerably from manuscript to manuscript, and was never formalised as a systematic script. Educated Zhuang tended to use Chinese for written communication, and the Zhuang-usage ideographs were mainly used for writing down folk songs and such like.

I've not yet met anyone of the Zhuang nationality who is familiar with this form of writing. A dictionary of Zhuang-usage ideographs _Gu Zhuangzi Zidian_ was published by Guangxi Minzu Chubanshe in 1989, ...

"The Zhuang, with a population of about 18 million, are the largest ethnic group in China. Most of the Zhuang people live in compact communities in the Zhuang Autonomous Region in Guangxi, with the rest scattered throughout Yunnan, Guangdong, Guizhou and Hunan provinces."

Pinyin News has some interesting comments on Zhuang population statistics and official attitudes towards minorities.

Abstracts from the Workshop on Zhuang Language Department of Linguistics The University of Hong Kong 12 May 2005 give this information.

Some Zhuang speakers would prefer to write Zhuang with the old Zhuang script which is a combination of Chinese characters, Chinese-like characters, and other symbols. Dating from the Tang Dynasty, this written form of Zhuang has recorded folktales, myths, songs, play scripts, medical prescriptions, family genealogies, contracts, communist revolutionary propaganda, etc. One of the most astonishing features of the old Zhuang script is the large number of allographs (or variant graphs) — as many as a dozen or even more — that may be associated with one morphosyllable.

As for written Cantonese, only in Hong Kong is it widely used in newspapers, magazines, comic books, personal correspondence, play scripts, etc.; the Cantonese writing mixes together standard Chinese characters with nonstandard or dialect characters and letters of the English alphabet. For various reasons neither the old Zhuang script nor the written form of Cantonese has undergone the formal process of standardization; the lack of standardization has created the phenomenon of allography in both writing systems.

This is altogether a fascinating discussion of non-standard Chinese characters. Way too much information here for a blog post but this article is too good not to mention.

I hope nobody tries to read this unless they are very interested in either the Zhuang *or* the thousands of non-standard Chinese characters ... because, um, this blog post is a little long-winded. Thanks to Gary for mentioning non-standard Cantonese characters recently and giving this link.

Vietnamese Revisited

[Warning! This post suffers from inaccurate terminology. When I wrote different 'encoding', I really meant different 'character sequences.' ]

Last June I posted about how not to help someone do a google search in Vietnamese. It was one of my more frustrating experiences and I just walked away and forgot about it. However, Mike has made me think about it again. I went back and visited my post and saw what was wrong with it. So I am giving it another kick at the can.

I was asked to help a Vietnamese speaking social worker to do an internet search in Vietnamese. He said "You don't need the accents - just use the English keyboard." We tried that and got some hits. I didn't have a Vietnamese keyboard at that moment, so we went to VietDic and using that got an encoding for our search term in Vietnamese - many hits.

At home I tried the Microsoft Vietnamese keyboard. Not so many hits.

So once again here is my experiment from last June - updated.

First, I am using google:images results. The term bãi biển means beach. If I get pictures of beaches preferably in Vietnam I consider that a good hit.

Here is the test with terms displayed this time in Arial font.

1. VietDic site - bãi biển 654 hits

2. Microsoft Vietnamese keyboard - bãi biển 5 hits

3. Combining accents only - bai biên 473 hits (Not all beaches)

4. No accents - bai bien 207 hits (Not all beaches)

Okay, so I don't speak a word of Vietnamese but it does seem that something is not right with the MS Vietnamese keyboard. There must be two different encodings that look identical and no normalisation in the search engine. If anyone can explain this I would be interested in hearing the story.


Mike says in the comment section that "individual standards that cannot represent other languages are an evolutionary blind alley -- as is deciding the best encoding for a language by measuring google hits! :-)"

I accept your point, Mike, I won't defend using google hits to prove anything. They are basically for fun. With an image I know I have beaches. With a different encoding I still have beaches but not the same beaches. I concede this point.

However, what is meant by "individual standards that cannot represent other languages are an evolutionary blind alley"? Maybe someone thinks that I am not using Unicode. I wouldn't know how not to use Unicode.

Here are my Unicode codepoints (upgraded from my comment section) for the 'ể' in biển.


#2 'ể' is two characters: U+00EA : LATIN SMALL LETTER E WITH CIRCUMFLEX *and* U+0309 : COMBINING HOOK ABOVE

They are both Unicode aren't they, but shouldn't one of them be the standard? Who sets the standard?

Update #2: Sorry, I have not been using the right terminology. So I hope nobody thinks that this is a techblog. Instead of saying 'encodings', which indicates Unicode and some other encoding standard, I should say two different 'character sequences'. (There is a paper about all this terminology, &c.)

And it turns out that the two sequences for 'ể' are canonically equivalent. Thanks Andrew C. However, the normalization that should occur for 'canonically equivalent character sequences' doesn't appear to work in either google or yahoo.
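For the record, here is a short Python sketch of what 'canonically equivalent' means in practice, using the standard `unicodedata` module. The two strings below are the single precomposed character U+1EC3 and the two-character sequence U+00EA U+0309. The helper name `same_text` is my own, and this is only an illustration of normalization, not of what any search engine does.

```python
import unicodedata

precomposed = "\u1ec3"      # one character: e with circumflex and hook above
two_chars = "\u00ea\u0309"  # e with circumflex + combining hook above


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == two_chars)           # False: different code points
print(same_text(precomposed, two_chars))  # True: canonically equivalent
```

A search engine that compares the raw strings treats the two as different words, while one that normalizes first treats them as the same.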

Update #3: Two people have recommended alternate Vietnamese keyboards: Unikey and VPSKeys. Great! Thanks Michael for your description of Unikey.

Comments continue on Mike's post.

Exploring Christianity

I have been fascinated by how one shape can morph into another for some time. This applies to ancient as well as current writing systems. A while ago I posted on the Hanzi 口 and 十 . I had been thinking at the time of how easily ten in Chinese, morphs into X ten in English. I believe this is pure happenstance, no mysterious theory here, just an observation of how some glyphs, that is shapes, are basic - urglyphs, so to speak.

Here is an example of how X has morphed into a cross t.

(Okay, in century gothic it looks like a cross but it may not otherwise.) In any case I decided to see how many steps it takes to morph the 'x' in 'exploring' into the cross, passing through the Greek letter 'chi' χ on the way.

First I wrote 'exploring', then changed it to 'eχploring'. Next, I made the chi italic so 'eχploring', then I enlarged the chi, 'eχploring'. Then I changed the font to verdana 'eχploring'.

Now I had to paste a screenshot into an image editor and erase three of the crosspieces so that the lower one would appear elongated and create a tilted cross. Then I saved it as an image.

No, I did not do all this out of my own imagination - exactly. I saw a poster which used this effect and wondered how it was done. I followed the link to the exploringchristianity website and saw to my disappointment that this effect is not represented on the website. I am sorry I did not take a photograph of the poster. My image seems fuzzy and amateurish but I assure you that the poster was eχcellent.

Update: I have lost some formatting when I uploaded this post so the italic 'chi' cannot be viewed. It stays stable in Word and I pasted my Word image into the image editor.

The Italic Ampersand

I didn't quite get it right. When I opened the Arts & Life section of the Vancouver Sun this morning I found that in the text they did indeed use the & character. However, in the advertising the ampersand appears in its other form, as a distinct 'et' ligature. I understood that this might be the italic form of the & so I am checking a few fonts for the right shape.

I haven't done an exhaustive search but I did scroll through a few fonts and found that Palatino Linotype does the job! Here it is &. This is the bold italic version of the Palatino Linotype ampersand and shows the et ligature which is found in this movie poster.

Now for a few more images. Six different ampersands appear here. I can see from this that Palatino Linotype is not the font used in the poster. It is close - but not a match.

And the true derivation of the et ligature is demonstrated here. (Actually I am not too sure about this one.)

An even more wide-ranging discussion of this character appears here.

I would love to find out what this page in Japanese says about the use of the cross as 'and' in this Romeo + Juliet poster.

Update: Thanks to Emeth Hesed for providing a translation from the Japanese.

What is an ampersand?

Well, let’s take a look at the picture on the left. It is the DVD jacket for Leonardo DiCaprio and Claire Danes’ Romeo & Juliet.

And, because it’s in white it’s hard to see, but can you tell it says “hope & despair, tragedy & love”? Inside the red cross beneath there is a black “&”.
This “&” mark means “and” of course, but why?

Actually, this comes from Latin (a dead language not spoken by anyone anymore). It is a stylish way of writing “et” (meaning “and” in English). As you can see in the chart below, there are various designs.

This mark is called “ampersand.”

A long time ago, when learning the alphabet at school, children memorized it by saying from A, “A-per-se-A, B-per-se-B, ...” (A by itself A, B by itself B). And then, when they finished Z, there was an “&” and they said, “and-per-se-and.” That became “ampersand.” Continue here.

Wednesday, November 09, 2005


There is a new Pride and Prejudice movie out, titled 'Pride & Prejudice.' I haven't seen it yet. (Oops, I think it opens tomorrow?) Just one more thing to add to the list. I wouldn't mind checking out the costumes though and the real estate. However, I was curious to see how the ampersand is faring on the internet.

On the first page of Google results, only 3 out of ten sites use the ampersand in the movie title. The other sites talked about the movie Pride and Prejudice.

Here are the results.

Pride & Prejudice - Tucson Citizen, New York Daily News, SheKnows

Pride and Prejudice - San Francisco Bay Guardian, New Yorker, Monsters and Critics, Rolling Stone, Globe and Mail, Cinematical

On the second page only two out of ten used the ampersand and so on.

The ampersand is not hard to find, right up there in the middle above the 7, so what was so difficult about that? I found it myself without asking for help!

I can, however, being so cynical, think of about ten reasons why one might not use the ampersand. You think it is a lazy shortcut, a handwritten shorthand symbol that does not relate to printed text, an everyday equivalent of 'and'. Or maybe the person at the keyboard is processing phonetically, lexically, or kinesthetically, and not visually. Maybe the typist is making a grammar correction along the lines of 'In this context the ampersand really should not be used.'

But didn't someone make a big fuss about this being a distinguishing feature of the movie title? Who said what exactly on the subject of the ampersand?


I have been completely sidetracked by now because I have just discovered HTML Ampersand Character Codes. This is a site that explains how you can keyboard unusual characters in html that do not appear on your keyboard, using the ampersand and the name of the character (from the chart, of course).

I can now enter æ þ ¿ ¡ which I have never missed - as well as some I find it ridiculous to live without - ¢ ° ¹ ² . My keyboard has dollars but not cents.

The one that has me totally puzzled is the code for ampersand itself. Would someone please tell me what use it is to know how to enter ampersand in code, which requires the ampersand, unless you have the ampersand, in which case you don't need to enter it in code?

Thanks to my sister Liz for alerting me to the use of the ampersand in the movie title Pride & Prejudice. Fortunately this blog is about writing systems and not movies because I haven't checked out any details on the Pride & Prejudice movie yet. But then I can't afford the real estate.


Now I only need to know the sequence which would enable me to write about these codes without invoking them if you like. Off to a tech site for me.


Thanks to help from commenters I was able to open the HTML Ampersand character code page and using view:source, I was able to see how the code is written in order to display the code to write these characters. So &quot; for " and &thorn; for þ and so on. I also found out the use of the code for ampersand since it is essential to write these codes for display. Not that I can explain this properly but it does work if I just copy from the code displayed when I open source and don't think about it too much.
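Here is a small Python sketch that shows both directions, using the standard `html` module. It is only an illustration of the general idea and has nothing to do with how Blogger handles the codes.

```python
import html

# Decoding: named character codes turn into the characters themselves
print(html.unescape("&thorn; &cent; &deg;"))  # þ ¢ °

# Encoding: to show a code on the page instead of invoking it,
# the ampersand itself has to be written as &amp;
print(html.escape("&thorn;"))  # &amp;thorn;
print(html.escape('"'))        # &quot;
```

So the code for the ampersand is what lets a page display the other codes literally.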

Of course, none of this works in the blogger comment page itself.

Tuesday, November 08, 2005

The Aleph-Beth-Gimel

Archaeologists digging in July at the site, Tel Zayit, found the inscribed stone in the wall of an ancient building. After an analysis of the layers of ruins, the discoverers concluded that this was the earliest known specimen of the Hebrew alphabet and an important benchmark in the history of writing, they said this week.

If they are right, the stone bears the oldest reliably dated example of an abecedary - the letters of the alphabet written out in their traditional sequence. From the New York Times

More on Language Log.

The De Landa Abecedario

Enough already about how Apocalypto doesn't mean 'new beginning'. Did we think it did? If Gibson wants to create a new beginning out of an apocalypse, so be it.

What is more interesting is that on Language Log I read that Gibson gained inspiration from "texts by 16th century bishop Friar Diego de Landa y Calderon, who wrote the book 'La relacion de las cosas de Yucatan' (The relation of things of the Yucatan)."

So who was Diego de Landa? He is best known for two separate accomplishments in his life. First, in about 1560 he recorded information about Mayan religion, language and culture, capturing the sounds of the Spanish alphabet in Mayan glyphs. He recorded these signs in relation to letters of the alphabet, but the glyphs were eventually more correctly interpreted as representing syllables.

Second, de Landa was personally responsible for destroying as many Mayan books as he could get his hands on, possibly only 27, but maybe many more. Since many others have perished in the moist environment, this leaves 4 preserved manuscripts written in Mayan glyphs. While de Landa may have been witness to Mayan child sacrifice, he is himself known for his exceptional cruelty in a very cruel time.

Diego de Landa (1524-1579) was a Franciscan friar who arrived in Yucatán in 1549, and twelve years later became the Franciscan Provincial. He recorded many details of the Maya culture through the native informants Gaspar Antonio Chi and Nachi Cocom. His report, Relación de las Cosas de Yucatán ("Account of matters in Yucatán"), was written in 1566 after he was forced to return to Spain in 1563 by Bishop Toral, who had complained to the Council of the Indies about Landa's treatment of the Indians, including his burning of 27 Maya hieroglyphic codices at Maní in 1562 in protest against child sacrifice. Landa returned to Yucatán as Bishop in 1573, replacing the now-deceased Toral.

The Relación provides an essential chronicle of Maya life, reporting on their houses, farming practices, religious ceremonies, and calendrics, plus detailed information on Maya hieroglyphs including a partial syllabary. The manuscript for the Relación was probably seen by late 16th century Spanish historians Lopez de Cogolludo and Herrera y Tordesillas (Thompson 1963), and was rediscovered in 1863 by the French antiquary Abbe Brasseur de Bourbourg in the Madrid Biblioteca de la Academia de la Historia. This highly important source, first published in 1864, three centuries after Landa compiled it, has been central in deciphering Maya script. Athena Review

If you have time for a little more history, here is a short take on the decipherment of Mayan by Michael Coe:

Why did it take such an unconscionably long time before we could actually read the writings of the most brilliant civilization of pre-Columbian America? The fact of the matter is that a true "Rosetta Stone" key for the Maya decipherment challenge had been available since the mid-1860's when Bishop Landa's sixteenth-century account of the script was rediscovered in a Spanish archive. Recounting the testimony of his native informants, Landa had claimed that the system was based upon an abedeceario (sic), an alphabet, and he gave examples of how sentences could be written with it. Unfortunately, when this was applied to the then-known Maya codices, the results were ludicrous and were dismissed - along with Landa's "ABC" - by serious scholars. Michael Coe in Difficult Characters ed. by Mary S. Erbaugh

Until 1952 scholars interpreted Mayan glyphs as nonphonetic ideographs. It was Knorosov who showed that the Maya writing system was not an 'alphabet but a syllabary' ... 'similar in structure to other early scripts such as Sumerian, Egyptian, and Chinese'.

Athena Review
Journal of Historical Review
Canadian Museum of Civilization

Parenthetical: In Coe's article, 'abecedario' is actually spelled 'abedeceario'. Does this reflect a simple typing error, an inability to sequence the phonological syllables, or an ideographic relation between an idea and a word written in the English alphabet, bypassing the phonological processing route altogether?

What speaks against its being a typing error is the additional 'e' following the 'c' which has been added to make the word conform to English spelling rules. I have no idea who it was that made this error, but I remark on it only as a curiosity.

Sunday, November 06, 2005

La Plume Caporal

I was looking for a particular title in the university library today and my eye fell on a large old book on Chinese cursive writing. It was the Dictionnaire des Formes Cursives des Caractères Chinois par Stanislas Millot, Lieutenant de Vaisseau. 1909. Intrigued, I sat and read the preface.

French Original (Scroll down for English):

Les sinologues sont souvent arrêtés dans leurs travaux par la rencontre de caractères cursifs ou antiques et la moindre inscription sur un objet de collection peut les discréditer dans l'esprit des profanes.

Le 20 juin nous étions à Takou à bord du croiseur le Pascal au milieu d'une trentaine de bâtiments de guerre de diverses nationalités.

On manquait de nouvelles de la colonne Seymour et des assiégés de Tien-tsin et Pékin et l'anxiété était par suite à son comble lorsque l'on reçut un message en caractères cursifs à l'adresse d'un amiral chinois prisonnier. On espéra y trouver des renseignements intéressants mais les Japonais eux-mêmes malgré l'usage constant qu'ils font des caractères chinois déclarèrent n'y rien comprendre.

On était sur le point d'envoyer un croiseur à Tchefou pour y faire traduire le document lorsque l'on songea à nous le montrer. Grâce à l'étude spéciale que nous avions faite, par hasard, de l'écriture cursive nous pûmes, non sans peine, fournir l'interprétation désirée.

Ad Hoc Translation:

Sinologists are often stopped in their work when they meet cursive or antique characters and the least inscription on a collectible can discredit them in the minds of laypeople.

The 20th of June we were at Taku on board the cruiser Le Pascal in the middle of about 30 warships of different nationalities.

We lacked news of Seymour's column and the besieged at Tien-tsin and Peking, and anxiety was at its peak when we received a message in cursive characters addressed to an imprisoned Chinese admiral. (The officers) hoped to find interesting information in it, but even the Japanese, in spite of the constant use that they make of Chinese characters, declared that they understood none of it.

They were on the point of sending a cruiser to Chefoo to have the document translated when they thought of showing it to me. Thanks to the special study that I had made, by chance, of cursive writing, I could, not without difficulty, provide the desired interpretation.

There is some historic information on what a French vessel was doing on June 16, 1900 off Taku here and more about the Boxer Rebellion here.

More information about this unusual book was found in a comment on a post about Ricci on Pierre Haski's blog.

J'avais trouvé un recueil de toutes les formes cursives de hanzi, répertoriées par un lieutenant de vaisseau, stanislas millot en 1909. Le bouquin est intégralement écrit à la plume caporal, un truc incroyable. dans la bibliographie de ce type, outre ce "dictionnaire des formes cursives des caractères chinois" on trouve des ouvrages aux titres exotiques comme "notice sur deux abaques pour problèmes de tactique navale".


I found a collection of all the cursive forms of Hanzi, indexed by a ship's lieutenant, Stanislas Millot in 1909. The book is fully written with a corporal's plume, an incredible accomplishment - in this guy's bibliography, as well as 'dictionnaire des formes cursives des caracteres chinois' one can also find works with exotic titles like "notice sur deux abaques pour problemes de tactique navale."

I found that Millot's birthdate was 1875 which makes him 25 years old in 1900! And this book is written entirely in beautiful calligraphy. I missed the reference to the corporal's plume so I will have to go back and have a further look. I am not sure that I have translated this correctly but what else could it be? Fire away.

An interesting post on Chinese cursive script here.


Dictionnaire des Formes Cursives des Caractères Chinois par Stanislas Millot, Lieutenant de Vaisseau. 1909. Leroux. Paris.

Image of Cursive Script from Appreciation of the Art of Chinese Calligraphy

Pierre Haski's blog was quite a find for me since I was so taken by his book Ma Yan which I bought in Hong Kong last year on the way home from Beijing with my daughter - but I digress.

PS I returned a day later to the library to check on the reference to 'la plume caporal' and the book was not on the shelf. I have put a trace on it and will go back later.

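Update: I have found a reference to 'la plume caporal' here. "c'est que même si sur amazon ils proposent de rédiger un petit mot, je préfère prendre du canson, ma plume caporal et mon encre de chine et écrire moi-même un petit-mot..." (Roughly: "it's that even if on Amazon they offer to write a little note, I prefer to take some Canson paper, my plume caporal and my India ink and write a little note myself...")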