"Communication in Slovene"

Training Corpus

The goal of this activity is a manually annotated and validated corpus for training statistical POS taggers and parsers.
The corpus will contain four levels of annotation:
lemmatization
part-of-speech tagging
dependency parsing
named entity recognition
The corpus will contain 400,000 words.
Standards for compiling the training corpus will be set from June 2008 to December 2008.
Annotation of the training corpus will run from January 2009 to December 2010.
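
As a rough illustration of how the four annotation levels could sit together on a single token, here is a minimal Python sketch. It does not reflect the project's actual format: the `Token` class, its field names, the tag labels (`PROPN`, `nsubj`, `PER`, and so on), and the English example sentence are all hypothetical and chosen only for readability.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    """One corpus token carrying all four annotation levels (hypothetical layout)."""
    form: str                     # surface word form
    lemma: str                    # level 1: lemmatization
    pos: str                      # level 2: part-of-speech tag (illustrative label)
    head: int                     # level 3: 1-based index of the syntactic head, 0 = root
    deprel: str                   # level 3: dependency relation (illustrative label)
    entity: Optional[str] = None  # level 4: named-entity label, None if outside any entity


def entity_spans(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Group consecutive tokens that share an entity label into (label, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_forms: List[str] = []
    for tok in tokens:
        if tok.entity != current_label:
            # The label changed: close the open span, if any, and start afresh.
            if current_label is not None:
                spans.append((current_label, " ".join(current_forms)))
            current_label = tok.entity
            current_forms = []
        if tok.entity is not None:
            current_forms.append(tok.form)
    if current_label is not None:
        spans.append((current_label, " ".join(current_forms)))
    return spans


# Toy sentence with made-up annotations, for illustration only.
sentence = [
    Token("Simon", "Simon", "PROPN", 3, "nsubj", "PER"),
    Token("Krek", "Krek", "PROPN", 1, "flat", "PER"),
    Token("leads", "lead", "VERB", 0, "root"),
    Token("the", "the", "DET", 5, "det"),
    Token("project", "project", "NOUN", 3, "obj"),
]

print(entity_spans(sentence))
```

Keeping all levels on one record makes it straightforward to validate a sentence level by level, for example by checking that exactly one token has `head == 0`.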

Leader: Simon Krek, Amebis, d.o.o., Jožef Stefan Institute
