The paper summarizes the results and experience gained in a complex research project: the creation of LTDIGITS, the first systematized Lithuanian speech corpus. All stages of the corpus creation are presented: selection of the acoustic and phonetic content, collection of the material (recordings), and development of tools for processing the corpus. We discuss the problems encountered and the experience gained while creating and processing this first systematized Lithuanian speech corpus, as well as other corpora used for the needs of VU KHF. We have tried to show how strongly speech signal databases influence the further development of speech technologies.
This paper presents our work on the Lithuanian speech corpus LTDIGITS. The database was created in 1998-2000 under a grant from the Lithuanian Science Foundation. It is well known that the successful development of speech technologies depends strongly on speech corpora, that is, properly designed and collected speech databases. LTDIGITS is the first attempt to create a systematic Lithuanian speech corpus. The paper presents all stages of LTDIGITS creation: design and selection of the phonetic content, collection of the recordings, development of a set of tools for corpus processing, and preparation of a corpus processing methodology. We used only proprietary software tools, since there is no standard technique for speech corpus collection and processing. LTDIGITS contains utterances from more than 200 male and more than 200 female speakers. Each speaker read a single set of phrases containing various combinations of Lithuanian digits, some phonetically difficult phrases, and a set of Lithuanian words suitable for voice-control applications. The value of a speech corpus increases significantly when the database is properly processed, that is, when the speech material it contains is acoustically and phonetically labeled. We describe the tools we developed for acoustic-phonetic labeling of speech: software for finding the boundaries between phonetic units and for checking the correctness of the labeling. Finally, we offer suggestions for the further development of LTDIGITS and other Lithuanian speech corpora. First, LTDIGITS should be converted into a telephone-quality database, preferably covering both wired and wireless channels. Tools for converting the corpus to the wired environment are ready, and tools for the wireless environment are under preparation. Telephone speech databases are important because telecommunication services are developing rapidly.
Further development of Lithuanian speech corpora, including new sets of phonemes and words, is necessary. We believe that collecting task-oriented and phoneme-group-oriented corpora is a promising direction. We hope to collect Lithuanian corpora of plosives in different contexts and of the most common diphthongs.
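The abstract does not say how the labeling-correctness tools work or what label format LTDIGITS uses. As a minimal sketch, assuming each utterance is labeled as a time-ordered list of (start, end, label) segments in seconds, a consistency check of the kind described might flag unknown symbols, empty segments, gaps, overlaps, and a mismatch between the last label and the recording length. The `Segment` type, the `check_labels` function, the 1 ms tolerance, and the toy phone set are all hypothetical illustrations, not the authors' tools.

```python
from dataclasses import dataclass


# Hypothetical label format; not the actual LTDIGITS format.
@dataclass
class Segment:
    start: float  # segment start, seconds
    end: float    # segment end, seconds
    label: str    # phonetic unit symbol


def check_labels(segments, phone_set, duration, tol=1e-3):
    """Return a list of human-readable problems in one utterance's labeling.

    Assumes segments are sorted by start time and should tile the whole
    recording [0, duration] without gaps or overlaps (within `tol` seconds).
    """
    problems = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        # Every label must belong to the agreed phonetic inventory.
        if seg.label not in phone_set:
            problems.append(f"segment {i}: unknown label {seg.label!r}")
        # A segment must have positive length.
        if seg.end <= seg.start:
            problems.append(f"segment {i}: non-positive duration")
        # Adjacent boundaries should coincide.
        gap = seg.start - prev_end
        if gap > tol:
            problems.append(f"segment {i}: gap of {gap:.3f} s before it")
        elif gap < -tol:
            problems.append(f"segment {i}: overlaps previous by {-gap:.3f} s")
        prev_end = seg.end
    # The last boundary should match the end of the audio.
    if abs(duration - prev_end) > tol:
        problems.append(
            f"labels end at {prev_end:.3f} s, audio ends at {duration:.3f} s"
        )
    return problems


if __name__ == "__main__":
    # Toy labeling of the digit "du" (two) with surrounding silence.
    phones = {"sil", "d", "u"}
    good = [Segment(0.0, 0.12, "sil"), Segment(0.12, 0.20, "d"),
            Segment(0.20, 0.35, "u"), Segment(0.35, 0.50, "sil")]
    print(check_labels(good, phones, 0.50))
```

In a real tool the thresholds and the phone inventory would come from the corpus labeling conventions; reporting problems as a list rather than stopping at the first one lets a labeler fix a whole utterance in one pass.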