Papers, projects, etc.

Projects

My current projects revolve around the theory and practice of representing linguistic data in electronic form. This means XML technologies in general and the TEI scheme in particular.

Since the autumn of 2006, together with Beata Wójtowicz of Oriental Studies, I've been involved in creating a new Swahili-Polish dictionary (with the Polish-Swahili part to follow). It's been Beata's dream for years and I talked her into trying to make it real. We're making some progress, with a few conference presentations and papers already behind us, though the real thing is still way ahead, and we count on some serious financing, because the end result will require a team of lexicographers and quite a few man-hours. Beata has been a teacher of Swahili for several years (apart from lecturing on language technology) and wrote her PhD dissertation on how she envisions this dictionary. I add to this my interests in morphology, lexicography and XML/TEI. And it works: we're in the process of producing a small test Swahili-English dictionary that we will release under a free license.

Since October 2007, together with Radek Moszczyński (my former student, if the fact that he participated in a few of my courses justifies saying that; hmm, I was also the reviewer of his thesis on multiword expressions), I've been getting deeper and deeper into another project, concerned with an XML representation of multiword expressions (which we understand as strong collocations and idioms). The project is set in the context of the English-Polish dictionary by Tadeusz Piotrowski and Zygmunt Saloni, which the authors kindly released under GNU licenses. We already have a small RELAX NG regular expression grammar of multiword expressions (still being polished) that can be plugged into the schema for the XML version of the dictionary. We have also learned quite a lot about how much cleaning the electronic form of the dictionary still needs, and that is going to take a lot of evenings yet.

My XML experience started with the project (headed by Adam Przepiórkowski) that created the IPI PAN Corpus of Modern Polish, where I attempted to advocate stand-off annotation for the corpus (following the guidelines of the Corpus Encoding Standard, an application of the early TEI). That was beautiful in theory but not yet manageable in practice: proper support for the XInclude and XPointer technologies was introduced only after the project was completed. Still, I wrote some HTML-to-XML converters for it, which exposed me to the intriguing charm of XSLT (back then, it was still version 1.0). I'm still under its spell, and version 2.0 is even more fun.

I also co-admin the TEI wiki. I feel I owe something to the nice and helpful TEI community, so this is my way of paying back :-)

My long-standing project of another sort is GLiP, a generative-grammar conference series with two fathers. GLiP was conceived over a pizza (as far as I recall; me eating with me fingers and washing it down with beer, and Adam P. using cold steel and drinking hot tea or some such stuff) in mid-1999. We have always managed to gather excellent guests and friendly audiences, and even to publish a proceedings volume each time. Then came a bit of burnout, a heap of other projects, and a few years' break, and now GLiP is coming back in April 2008.

The academic disorder [1]


[1] It has to do with the Latin scribere 'write' and Greek mania 'madness'. It doesn't really hurt the patient. (Ambiguity intended.)



Last modified: Wed Nov 21 01:58:52 2007 CET