For the past month I've been involved in the TIDES Surprise Language project, which is a cool new idea where a bunch of computational linguistics researchers get together and have exactly one month to develop various bits of NLP technology (machine translation, named entity extraction, etc.) for a given human language. The surprise bit is that they don't know which language it will be until the month begins; the only thing they know is that it will be a language that people haven't put a lot of effort into before (in terms of NLP).
Beyond getting people to start working on a new, presumably relevant, language, this is also a great way of assessing the ability of the research community to do rapid-start NLP, something the government has been very interested in as of late.
The language was Hindi. The month ended yesterday. Overall the project was pretty successful, I think. All the big players were involved. It was interesting to see a cooperation-based project rather than the competition-style events I've been involved in before for conferences. There's still competition in terms of who can develop the best system, but resources, tools, etc. were all shared between researchers.
We submitted a named-entity tagger. It'll be interesting to see how it fares.