I was working with a Vietnamese social worker recently when he asked if we could look up autism in Vietnamese on the internet. Since I had heard that keyboarding Vietnamese could be difficult, we went to the Vdict dictionary for a reference and then copied and pasted the word into Google. Sure enough, we got some hits.

However, he was watching all this intently and said "Oh, no, you don't need the accents - I never use them, just type in the word from the English keyboard." I demurred.

He said, "Look at this. I type 'bai bien' for beach, google and there they are, beaches." Hmmm.

So later, by myself, I ran a little test. I keyboarded 'bai bien' in four different ways and then tried each one in Google to see what I would get. Although these words do not display properly in this blog, they did display properly in Google and I did get hits.

bãi biển - copied from the Vdict dictionary - 403 hits
bai bien - no accents, right off the English keyboard - 103 hits
bãi biên - a mixed encoding, one level of diacritic but not the other - 218 hits
baÞi biêÒn - from the Microsoft Vietnamese keyboard - 1 hit
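
For anyone who wants to see what might actually differ between variants 1 and 4, here is a minimal Python sketch. The code point sequences are assumptions, not something captured during the test: it assumes the dictionary form uses precomposed letters and the keyboard form uses a vowel followed by a separate combining tone mark. The last function shows one possible way the wonky characters could arise, by reading Windows-1258 bytes as if they were Windows-1252. That is a guess as well, and the function names are made up for illustration.

```python
import unicodedata

# Assumed code point sequences (not captured from the original test).
# Dictionary form: precomposed letters (a-tilde, e-circumflex-hook).
DICTIONARY_FORM = "b\u00e3i bi\u1ec3n"
# Keyboard form: vowel followed by a separate combining tone mark.
KEYBOARD_FORM = "ba\u0303i bi\u00ea\u0309n"


def code_points(text: str) -> str:
    """List the code points in a string, e.g. 'U+0062 U+00E3 ...'."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def to_nfc(text: str) -> str:
    """Compose base letters and combining marks where Unicode allows it."""
    return unicodedata.normalize("NFC", text)


def simulate_mojibake(text: str) -> str:
    """Encode as Windows-1258 (Vietnamese), then misread as Windows-1252.

    This is only one hypothetical conversion path that would turn
    combining tone marks into unrelated Western European letters.
    """
    return text.encode("cp1258").decode("cp1252")


if __name__ == "__main__":
    print(code_points(DICTIONARY_FORM))
    print(code_points(KEYBOARD_FORM))
    # The two strings differ code point by code point...
    print(DICTIONARY_FORM == KEYBOARD_FORM)
    # ...but compare equal once both are normalized to NFC.
    print(to_nfc(DICTIONARY_FORM) == to_nfc(KEYBOARD_FORM))
    print(simulate_mojibake(KEYBOARD_FORM))
```

If the assumption holds, the two forms look identical on screen but are different strings until they are normalized, which would be enough for a search index that does not normalize to treat them as different words.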

The hits for bai bien, no accents, were approximately 75% Vietnamese beaches, some French horses and a few other things. Still lots of beaches. If I had designated the language, would I have done better? No further comment at this time.
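
As a rough illustration of why the unaccented spelling can still find beaches, here is a small Python sketch of accent folding, the kind of step a search index might apply before matching. I do not know what Google actually does; the function name `fold_accents` is hypothetical, and this is only one simple way to strip Vietnamese diacritics.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip diacritics so accented and unaccented spellings match.

    Decomposes each letter (NFD), drops the combining marks, and maps
    the Vietnamese d-with-stroke by hand because it has no decomposition.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.replace("\u0111", "d").replace("\u0110", "D")


if __name__ == "__main__":
    # Fully accented and partly accented spellings from the list above.
    for word in ["b\u00e3i bi\u1ec3n", "b\u00e3i bi\u00ean", "bai bien"]:
        print(fold_accents(word))
```

Folding throws information away: every accented variant collapses to the same plain letters, so unrelated words that differ only in their accents would also match, which could be one reason for the mixed results.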


Michael Kaplan said...

I would tend to think of it as a bug in Google's language-specific searching (also not sure what you meant by the MS keyboard -- those characters are not on that keyboard unless you are dealing with an app that is doing bad Unicode conversions? :-)

Anonymous said...

Okay, there is a definite problem with blogger because I have retested the word in the blogger post preview and it still messes up. However, it looks okay here in the comments box.

bãi biển

This word is straight from the Windows keyboard and it is exactly what I posted earlier with all those wonky characters.

This is also exactly what I put into Google as the #4 example. Still 1 hit.


