"Communication in Slovene"

Indicator 10

Training Corpus – 400,000 words (December 2010)

Indicator 10 is a completed, manually annotated training corpus with morpho-syntactic and syntactic annotation: 400,000 words.

Morpho-syntactically annotated: 500,000 words.
Syntactically annotated: 200,000 words.
With named entities identified: 100,000 words.


The operation is partly financed by the European Union, the European Social Fund, and the Ministry of Education and Sport of the Republic of Slovenia. The operation is being carried out within the operational programme Human Resources Development for the period 2007–2013, development priority: improvement of the quality and efficiency of education and training systems 2007–2013.