Scientific publications

Rogov A.A., Sidorov Yu.V., Sedov A.V., Gurin G.B., Kotov A.A., Nekrasov M.Yu.
Some features of formation of digital corpus of texts with syntactic markup // Digital Libraries: Advanced Methods and Technologies, Digital Collections: Proceedings of the XI All-Russian Research Conference RCDL'2009. Petrozavodsk: KRC RAS, 2009. Pp. 276-283.
In this paper we describe a system for syntactic analysis. The markup is based on 39 structural schemes. The main markup unit is the clause, that is, a simple sentence.
A Delphi application was created for the markup. In it, the user can move from clause to clause and assign attributes to each one. We also analyze some ways to improve the efficiency of the program.
The structure of the dictionary database tables was developed. Query execution speed was monitored throughout the whole period of work, and the results of this analysis were taken into account when improving the table structure.
A web resource was created for online access to the database. The modules of the information system were implemented in PHP 4. To support the symbols of the pre-revolutionary Russian alphabet, all texts and word forms are stored in Unicode. The Palatino Linotype typeface is used for display.
At present, the database consists of works by F.M. Dostoevskij and his contemporaries from the magazines “Vremja”, “Epoha”, “Svetoch”, “Sovremennik”, “Molva”, “Biblioteka dlja chtenija”, “Zarja”, “Grazdanin”. It also contains texts by Dahl and some other publicistic texts.

Last modified: October 16, 2009