Fun with Computational Linguistics

I am smarter than a high schooler, or, at least, better at some types of quizzes. I ran across a mention of the North American Computational Linguistics Olympiad, and amused myself by taking the first-round test given in February. Depending on whether you count two of my answers as correct, I scored either 97.95 out of 100, beating all of the high schoolers who took the test; 95.97, placing second; or 95.18, placing third. (In one case, the question asked which verb was irregular; I wrote down the verb stem, but the solution book gave the conjugated form. In the other, my answer is correct, but the answer in the solution book is wrong. I count both as moral victories, but for the purpose of competition, I have to take into account how the tests are actually graded. And, to be completely fair, I should note I allowed myself a break while taking the test, although my total time was twenty minutes under the limit and I didn’t let myself revisit any questions I’d done before the break.)

To give you an idea of what the test was like, one question gave eight sentences in a Native American language, with the English translations in random order, and you had to work out which translation went with which sentence. Another gave two articles on the same subject, one in English and one in Indonesian, which were not translations of each other, and challenged you to figure out what various Indonesian words meant. The test designers deliberately chose obscure languages from distant language families to test your ability to tease out principles and correspondences. It’s a fun challenge, and my experience with Hungarian probably stood me in good stead by giving me a more flexible notion of what grammar can be like.

I also ran across a quick and easy vocabulary quiz which estimates how many English words you know. I got an estimated English vocabulary of 39,200 words. Then I retook the test and only checked the boxes when I knew the Hungarian translation of a word, and got an estimate of 4,280 words. Interestingly, if you look at the graph of results for non-native speakers, I’m right at the peak—4.7% of non-native speakers had a vocabulary between 4,250 and 4,749 words, although the median was higher, at 7,826 words. Taking the test the way I did isn’t really a valid measurement of anything, and comparing the results to a self-selected sample of non-native English speakers is even less valid, but it’s still kind of interesting.