Abstract:
|
The main goal of this dissertation is to put different text classification tasks in the same frame by mapping the input data into a common vector space of linguistic attributes. Subsequently, several classification problems of great importance for natural language processing are solved by applying the appropriate classification algorithms. The dissertation deals with the problem of validating bilingual translation pairs, where the final goal is to construct a classifier that provides a substitute for human evaluation and that decides whether a pair is a proper translation between the appropriate languages by applying a variety of linguistic information and methods. In dictionaries it is useful to have a sentence that demonstrates the use of a particular dictionary entry. This task is called the classification of good dictionary examples. In this thesis, a method is developed that automatically estimates whether an example is good or bad for a specific dictionary entry. Two cases of short message classification are also discussed in this dissertation. In the first case, the classes are the authors of the messages, and the task is to assign each message to its author from that fixed set. This task is called authorship identification. The other classification of short messages considered is called opinion mining, or sentiment analysis. Starting from the assumption that a short message carries a positive or negative attitude about a thing, or is purely informative, the classes can be positive, negative, and neutral. These tasks are of great importance in the field of natural language processing, and the proposed solutions are language-independent, based on machine learning methods: support vector machines, decision trees, and gradient boosting. For all of these tasks, the effectiveness of the proposed methods is demonstrated for the Serbian language. |