Intro

Recently, I had the honor of being invited to give a training lecture to the Taiwan team for the International Linguistics Olympiad, an annual competition where participants from all over the world get together to solve linguistics puzzles. You don’t need formal training in linguistics to solve those puzzles, though knowing some linguistics does help. That’s why I was asked to give this lecture.

A past problem set

The competition tests not so much the breadth of one’s linguistic knowledge as the analytical ability to identify systematic patterns buried in the given data. The process is actually quite similar to solving programming problems on LeetCode.

Take this problem set from the 16th competition in 2018, for instance.[1]

The problem set is from Terêna, a language spoken in Brazil. Given the information provided here, participants are asked to fill in the gaps indicated by numbers 1 through 14. NLP models might be able to solve a problem like this one day, but that would require far more training data than is provided here. We humans, by contrast, are pretty good at generalizing from a limited amount of data. I encourage those of you who know little about linguistics to work on this puzzle before checking out the answer.[2]

Lecture slides

The Terêna problem set happens to be relevant to one of my favorite research topics, which is why I titled my lecture “verbal person marking & pronominal clitics”. The title may sound like gibberish if you don’t know any linguistics, but the patterns these technical terms describe are in fact quite straightforward. To see for yourself, check out the examples in the lecture slides below.

A problem set from Taiwan

In addition to the slides, I created a problem set based on Kavalan, a language of Taiwan. You’ve probably heard the name because of Kavalan Whisky, and yes, this is the same Kavalan. I wrote my MA thesis on the Kavalan language, but I’ve never tried Kavalan Whisky.

Here’s the Kavalan problem set, most examples of which are drawn from this paper.[3]

| No. | Kavalan | English |
|-----|---------|---------|
| 01 | pukunankuisu. | I beat you. |
| 02 | pukunansuiku. | You beat me. |
| 03 | qaRatannaiku. | He bit me. |
| 04 | maiiku pmukun wasusu. | I didn’t beat your dog. |
| 05 | pmukuniku sunisku. | I’m beating my child. |
| 06 | mai wasusu qmaRat wasuku ni? | Didn’t your dog bite mine? |
| 07 | maipamaisu maynep ni? | Aren’t you asleep yet? |
| 08 | mayneptiisu ni? | Are you already asleep? |
| 09 | pmukuntiisu wasusu ni? | Did you already beat your dog? |
| 10 | maiiku pukunanna. | He didn’t beat me. |
| 11 | qaRatansuiku ni? | (1) |
| 12 | maiiku qmaRat sunissu. | (2) |
| 13 | (3) | I haven’t beaten my child yet. |
| 14 | (4) | Didn’t you beat your child? |

The task is to fill in the gaps indicated by (1) through (4). If you go through the slides, you’ll be in a better position to solve this puzzle. I encourage you to share your solution in the comments below.
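If you prefer to approach the puzzle the way you would a LeetCode problem, a natural first step is to list the character sequences that recur across the Kavalan words. The Python sketch below does only that for sentences 01 through 10. It is an illustration rather than anything from the lecture or the paper: the names `split_words` and `recurring_substrings` and the thresholds `min_len` and `min_words` are assumptions made for this example, and a recurring string is merely a candidate for a meaningful unit, not an analysis.

```python
from collections import Counter

# Kavalan sentences 01-10 from the problem set above (those with known translations).
# Case is kept as-is because the data uses a capital "R".
SENTENCES = [
    "pukunankuisu.",
    "pukunansuiku.",
    "qaRatannaiku.",
    "maiiku pmukun wasusu.",
    "pmukuniku sunisku.",
    "mai wasusu qmaRat wasuku ni?",
    "maipamaisu maynep ni?",
    "mayneptiisu ni?",
    "pmukuntiisu wasusu ni?",
    "maiiku pukunanna.",
]


def split_words(sentence):
    """Drop sentence-final punctuation and split on spaces."""
    return sentence.rstrip(".?").split()


def recurring_substrings(sentences, min_len=3, min_words=2):
    """Map each substring of at least min_len characters to the number of
    distinct words containing it, keeping those found in min_words or more."""
    vocabulary = {w for s in sentences for w in split_words(s)}
    counts = Counter()
    for word in vocabulary:
        # A set, so a substring repeated inside one word is counted once.
        pieces = {
            word[i:j]
            for i in range(len(word))
            for j in range(i + min_len, len(word) + 1)
        }
        counts.update(pieces)
    return {sub: n for sub, n in counts.items() if n >= min_words}


if __name__ == "__main__":
    found = recurring_substrings(SENTENCES)
    # Longest candidates first, then the most widespread ones.
    for sub, n in sorted(found.items(), key=lambda kv: (-len(kv[0]), -kv[1], kv[0])):
        print(f"{sub}\t{n}")
```

The script won’t solve the puzzle for you. Deciding which candidates carry meaning, and in what order they attach, is still the part that takes a human comparing the forms against the English translations.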

Footnotes

  1. Here’s the complete Terêna problem set.

  2. Here’s the complete solution to the Terêna problem set. 

  3. The Cluster-internal Ordering of Clitics in Kavalan (East Formosan, Austronesian) by Doris Ching-jung Yen and Loren Billings, presented at the Annual Meeting of the Berkeley Linguistics Society.