Saturday, April 15, 2006

Additions to the Myanmar Script in Unicode

I was surfing the internet looking for more on the diæresis (note that today I am using the more correct æ, U+00E6) when I received notification of a proposal that included this dear little thing. It is not a diæresis after all: it is the proposed character MYANMAR VOWEL SIGN GEBA KAREN I, U+1097.

The usual MYANMAR VOWEL SIGN I is U+102D. (I have just used Babelmap to confirm that I am reading this correctly.)
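
Babelmap is one way to check a name against its code point. As a minimal sketch, assuming Python 3 is installed, the standard `unicodedata` module can do the same check. `describe` is a hypothetical helper name, and the names reported come from whichever Unicode version ships with your Python. I only look up U+102D here, since a proposed code point is not final until the character is actually encoded.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for one character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

myanmar_i = "\u102D"
print(describe(myanmar_i))              # U+102D MYANMAR VOWEL SIGN I
print(unicodedata.category(myanmar_i))  # Mn: a non-spacing (combining) mark
print(describe("\u00E6"))               # U+00E6 LATIN SMALL LETTER AE
```

The category `Mn` is why the vowel sign sits over the preceding consonant rather than taking up its own space.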

The argument is that since both vowel signs can appear together in the same text (albeit a grammar text), they need separate code points, so that the different glyph (shape) can be represented in plain text and not just by a change of font.

The entire document, Preliminary proposal for encoding Karen, Shan, and Kayah characters in the UCS, was recently released here.

Here is a sample of text from this document showing the Karen vowel sign I and the Myanmar vowel sign I. I recommend reading the entire document; many new characters are proposed.

Update: The previous title, 'Myanmar Block Unicode Proposal', was truly terrible. I meant the Myanmar 'block' in Unicode, not only Myanmar 'users' of the script. Then I went back to change the title, and my wonderful spam blocker shut me out of Blogger for a while. Thanks, Paul, for mentioning this.

Friday, April 14, 2006

Windows Character Map

I was reading Language Hat's post on the peace jacket and let me tell you, I am very glad I am knitting my socks with variegated wool. There is no way I can have pretensions with something like that. I even paid money for a pattern instead of googling or making it up myself - ouch.

However, I always forget where to find all those letters with accents and whatnot on short notice, so here is a reminder. The Windows Character Map can help with all this stuff. I have set it to the font with the widest range, MS Reference Sans Serif. Shown here is the letter s with hook, and you can read its Unicode name on the map. The Character Map is under Programs > Accessories > System Tools. And this page is the best I know of for finding a letter to copy and paste when you need it. I will add it to my resources in the sidebar.
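
For anyone who would rather type a name than hunt through a grid, here is a rough sketch in Python 3 (my assumption; Character Map itself needs nothing extra). The standard `unicodedata.lookup` function and the `\N{...}` string escape both accept the Unicode names you can read on the map, written here in capitals.

```python
import unicodedata

# Find a character by its Unicode name, the same name Character Map shows.
s_hook = unicodedata.lookup("LATIN SMALL LETTER S WITH HOOK")
print(s_hook, f"U+{ord(s_hook):04X}")   # ʂ U+0282

# Python string literals accept the same names directly.
ae = "\N{LATIN SMALL LETTER AE}"
print(ae, f"U+{ord(ae):04X}")           # æ U+00E6
```

Either result can then be copied and pasted like any other text.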

But most people really do just find a piece of text elsewhere and copy and paste. It took me a while to figure that out.

History of the Diæresis

Rather than leaping into a search for the best way to keyboard Coptic, I have decided to step back and first examine each of the three essential diacritics in the Coptic alphabet: the diaeresis; the overline marking a nomen sacrum or an abbreviation; and the jinkim, which looks like a macron and gives the consonant a syllabic quality. There are other diacritics, but these three occur most often in the manuscripts and have such a diverse and obscure history that I will start here and not worry about the others for now.

The papyrus in this image, P52, is the earliest dated instance of the diaeresis that I have been able to find. This fragment preserves part of John 18, from the account of Jesus before Pilate, and is dated 125-150 CE.

In this case the diaeresis is at the beginning of a word. In the second line the text reads 'oudena ina', and the diaeresis indicates that the 'i' is pronounced separately from the preceding 'a'. It actually begins a new word and a new clause. There also appears to be a trace of a diaeresis on the third letter of the top line.

This is the only discussion of the early history of the diaeresis that I have found.

    [U]ndoubtedly the occurrence of diaeresis and the omission of iota adscript can be used as criteria of date and, comparatively rare at the beginning of the second century, were increasing in frequency with each successive decade. Statistics for these phenomena do not appear to have been collected (a systematic investigation of the subject might be of some value for palaeography), but such search as it has been possible to make shows that the date assigned to 1 is not affected by them.

    The use of diaeresis over i or u was exceedingly rare till the second century, but it was not entirely unknown before then. Originally introduced to distinguish as separately pronounced a vowel accompanying another vowel with which it would otherwise make a diphthong, the usage was soon extended to vowels standing alone, and therefore became meaningless.

    It is only the latter use which is relevant to the present case. P. Fay. 110 (A.D. 94) contains in ευϋπερβατον (l. 9) and τωι ϊδιωι (l. 2) instances of diaeresis which, though an extension of the original use, cannot be regarded as wholly incorrect, since adjoining vowels are being distinguished; but ϊνα (ibid., ll. 6, 9) is a clear case of the incorrect use, δυσι ϋδασι (l. 17) is at best a further extension of the use in ευϋπερβατον and τωι ϊδιωι.

    Systematic search might perhaps reveal other early examples, but so far as the statistics collected are concerned there are none in exactly dated documents before A.D. 110. (From Fragments of an Unknown Gospel.)

The article goes on to discuss the 'correct' versus the 'incorrect' way to use the diaeresis, which seems like a tedious approach to me. I am relieved to find in antiquity the occasional lack of respect for rigid spelling standards.

I have also scanned through pages of pre-Christian papyri without finding the shadow of a diaeresis. I would love to hear more about this history, and about earlier diaereses if they exist.

Update: I misspelled diaeresis in the title and almost left it, but 'diaresis' doesn't Google as well as 'dieresis' or 'diaeresis'.

Friday, April 07, 2006

The Coptic Writing System

The first image is the name 'Judas' from the Gospel of Judas, a Coptic manuscript of the third or fourth century (?). The second is 'Judas' from the Codex Alexandrinus, a major fifth-century Greek New Testament manuscript in the British Library.

Just to be clear: I am not commenting on these manuscripts beyond describing the script they are written in. Both are written in Greek uncials. To scholars studying these manuscripts, the two documents appear to be written in the same writing system, and they are. However, the Coptic script has a few more letters.

Greek letter shapes changed over the centuries, and uncials are no longer used, even for copies of the Greek New Testament; they survive in manuscripts studied in museums. Coptic, however, did not evolve in the same direction. The Coptic Church still uses a script that resembles the Greek uncials. This website is posted in English and Coptic, so there it is on the web: the Coptic writing system.

The Coptic and Greek writing systems were disunified in Unicode last year. Yesterday, I linked to the two relevant Unicode blocks. Scroll to the bottom of this page for a discussion of the disunification.

I haven't tried a Google search in Coptic yet; some other time. You should google 'Gospel of Judas' in English, not Coptic, to read all about it. For more on Coptic, with images of other manuscripts, see this site.

Now, to get down to business. I am having a few of the usual problems. I did not include the diacritics in my image in yesterday's post. The three that should have been included are the combining diaeresis U+0308, the combining macron U+0304 and the combining overline U+0305. The combining overline worked perfectly, and I will use it sometime, but I could not get the other two to sit in the right place. I am hoping for help on this.
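
Part of the problem is that where a combining mark lands on screen is decided by the font and the rendering software; the encoded text only records that the mark follows its base letter. As a minimal sketch in Python 3 (the helper name `codepoints` is mine), this shows how the sequences are stored. Greek iota plus diaeresis has a precomposed form that NFC normalization switches to, while a Coptic letter, here U+2C93 (the small Coptic iota, iauda), has none and stays as base letter plus mark.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
COMBINING_MACRON = "\u0304"
COMBINING_OVERLINE = "\u0305"

def codepoints(s: str) -> list:
    """List each character of s as a 'U+XXXX' string (hypothetical helper)."""
    return [f"U+{ord(c):04X}" for c in s]

# Greek small iota + diaeresis: NFC merges the pair into one precomposed letter.
greek = unicodedata.normalize("NFC", "\u03B9" + COMBINING_DIAERESIS)
print(codepoints(greek))    # ['U+03CA']

# Coptic small iauda + diaeresis: no precomposed form, so it stays two code points.
coptic = unicodedata.normalize("NFC", "\u2C93" + COMBINING_DIAERESIS)
print(codepoints(coptic))   # ['U+2C93', 'U+0308']

# All three marks are placed above the base (canonical combining class 230).
for mark in (COMBINING_DIAERESIS, COMBINING_MACRON, COMBINING_OVERLINE):
    print(unicodedata.name(mark), unicodedata.combining(mark))
```

Because the three marks share combining class 230, normalization never reorders them relative to one another, so if two are stacked on one letter they keep the order in which they were typed.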

I even tried selecting text from the PDF file supplied by the National Geographic Society and, do I even need to say it, it appears to use a non-Unicode font with precomposed characters. Then I went back to the Tenaspi Remenkimi site, and it isn't Unicode either.

So the Coptic writing system appears on the net. It is visible and it is in the process of being implemented as a Unicode writing system. It will be interesting to watch.

Please comment if you can add to or correct any of this information, or just to say "Hi".


Thursday, April 06, 2006

Gospel of Judas

These are the first three lines of the Gospel of Judas (PDF). I watched the news conference tonight and then found the National Geographic site. I wasn't able to open the actual manuscript image pages at the time, so I decided to work with the font.

I downloaded and installed the New Athena Unicode font from Dave McCreedy's Gallery of Unicode Fonts and copied these lines out for myself below. The Coptic alphabet is found in two places: the Coptic Unicode block and the Greek and Coptic block. Five of the seven Coptic characters in the Greek and Coptic block were needed for this text. These letters are basic to Coptic, so don't lose them. If you are looking to input Coptic, use both blocks. The disunification of Coptic from Greek in Unicode happened only within the past year.
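
Since Coptic text draws on two blocks, it can help to check which block each letter comes from. Python's `unicodedata` module does not report block names, so this rough Python 3 sketch hardcodes the two ranges (U+0370 to U+03FF for Greek and Coptic, U+2C80 to U+2CFF for Coptic); `block_of` and `blocks_used` are hypothetical helper names.

```python
from typing import Optional

# Block ranges hardcoded from the Unicode block list;
# unicodedata has no block lookup of its own.
BLOCKS = {
    "Greek and Coptic": range(0x0370, 0x0400),
    "Coptic": range(0x2C80, 0x2D00),
}

def block_of(ch: str) -> Optional[str]:
    """Name of the block holding ch, or None if it is in neither range."""
    cp = ord(ch)
    for name, cps in BLOCKS.items():
        if cp in cps:
            return name
    return None

def blocks_used(text: str) -> set:
    """Block names used by the letters of text; other characters are skipped."""
    return {b for b in map(block_of, text) if b is not None}

# Small iauda (Coptic block) followed by small shei (Greek and Coptic block).
print(sorted(blocks_used("\u2C93\u03E3")))  # ['Coptic', 'Greek and Coptic']
```

A Coptic word that comes back with both block names is behaving exactly as expected.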

My text is below: a simple exercise to figure out where to find the different characters. They are basically in the same order as Greek, with a few other characters such as U+03E2 COPTIC CAPITAL LETTER SHEI and U+03E4 COPTIC CAPITAL LETTER FEI.
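
The Coptic letters that stayed in the Greek and Coptic block sit together at U+03E2 to U+03EF as capital/small pairs, fourteen code points in all. Reading the 'seven characters' mentioned above as these seven letters (my interpretation), a short Python 3 sketch can list them with their official names; `coptic_in_greek_block` is a hypothetical helper name.

```python
import unicodedata

def coptic_in_greek_block() -> list:
    """(capital, small, name) for each Coptic letter pair at U+03E2..U+03EF."""
    letters = []
    for cp in range(0x03E2, 0x03F0, 2):   # capitals sit at the even code points
        cap, small = chr(cp), chr(cp + 1)
        letters.append((cap, small, unicodedata.name(cap)))
    return letters

for cap, small, name in coptic_in_greek_block():
    print(f"U+{ord(cap):04X}", cap, small, name)
# First two lines: U+03E2 Ϣ ϣ COPTIC CAPITAL LETTER SHEI
#                  U+03E4 Ϥ ϥ COPTIC CAPITAL LETTER FEI
```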

I copied out the name of Judas Iscariot underneath the text here; the name can be seen at the end of the second line and the beginning of the third.

One thing I did find was that it was much easier to copy a known name than the rest of the text. The text is in Sahidic, the major literary variety of Coptic.

I couldn't access the actual images of the manuscript when I started out this evening; however, they are available, and on a second try I was able to view a few select pages here. Some excerpts of the English text are here.

The National Geographic special is on this Sunday, April 9. Great previews of old manuscripts on the news tonight.

Update: Here is where you can download a Coptic keyboard.