Reference Corpus of Slovene and Slovene Lexical Database, with Grammatical Annotation Tool
The goal of this module is to produce:
a reference corpus of 100 million words, which will include a spoken subcorpus;
a Slovene lexical database, which will record lexical features such as frequency of occurrence, pronunciation, morphology, syntax, sense discrimination and phraseology;
a grammatical annotation tool, which will consist of a lexicon of inflected forms, as well as a tagger and a parser for Slovene text analysis.