Comparison of document classification approaches for Turkish texts
by
 
Çobanoğlu, Özlem Ece, author.

Title
Comparison of document classification approaches for Turkish texts

Author
Çobanoğlu, Özlem Ece, author.

Added Author Entry
Çobanoğlu, Özlem Ece, author.

Physical Description
xi, 71 leaves + 1 computer laser optical disc.

Abstract
Internet usage is growing exponentially. This rapid growth leads to an explosion in the number of electronic documents produced daily. The sheer volume of documents makes it difficult to access necessary and relevant information. Without a logical organization, retrieving and processing the desired information from a huge number of documents by human effort becomes a complex and time-consuming task. Document classification is therefore a significant task for managing and processing documents. In this thesis, the performance of different classification approaches built from several algorithms is thoroughly evaluated. The main goal of the thesis is to determine the best combination of document preprocessing steps and classification algorithms. Different feature weighting, construction and selection methods are tested on Turkish documents. Stemmed and original words, together with their bi-gram and tri-gram forms, are used to construct the features that represent the documents. The effects of several weighting algorithms, and of combining feature selection with weighting algorithms, on 3 different classification approaches are interpreted. The performance of 216 different classification process combinations is analyzed. Experimental results show that the C4.5 (C4.5 Decision Tree) classification algorithm has the highest accuracy in 95% of the experimental results. The SVM (Support Vector Machine) algorithm produces the closest results to C4.5 and provides the highest accuracy in the remaining 5% of the experimental results. The NB (Naive Bayes) algorithm always has the lowest accuracy among the 3 classification algorithms.
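
As an illustration of the feature construction and weighting steps the abstract mentions, the following is a minimal sketch, not the thesis's actual implementation. It builds word uni-gram, bi-gram and tri-gram counts for each document and applies a plain TF-IDF weighting. The function names (`word_ngrams`, `build_features`, `tfidf`), the whitespace tokenizer, the sample sentences and the choice of TF-IDF are all assumptions made for illustration; the record does not state which weighting schemes or stemmer the thesis used.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Return the word n-grams of a token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_features(text, max_n=3):
    """Count uni-grams up to max_n-grams of a whitespace-tokenized text.

    Assumption: input is already lowercased. Python's str.lower() is not
    Turkish-aware (it maps 'I' to 'i', not to dotless 'ı'), so real Turkish
    text would need locale-specific case folding and, optionally, stemming.
    """
    tokens = text.split()
    features = []
    for n in range(1, max_n + 1):
        features.extend(word_ngrams(tokens, n))
    return Counter(features)


def tfidf(doc_counts):
    """Weight each feature by (relative term frequency) * log(N / document frequency)."""
    n_docs = len(doc_counts)
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())  # each document contributes at most 1 per feature
    weighted = []
    for counts in doc_counts:
        total = sum(counts.values())
        weighted.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Hypothetical two-document corpus, used only to exercise the functions.
docs = ["ekonomi haberleri bugün", "spor haberleri bugün"]
counts = [build_features(d) for d in docs]
weights = tfidf(counts)
```

In this sketch, features shared by every document (such as "bugün") receive a weight of zero, while features unique to one document (such as "ekonomi") receive a positive weight. A feature selection step and a classifier such as a decision tree, SVM or Naive Bayes would then operate on these weighted vectors.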

Subject Heading
Text processing (Computer science).
 
Automatic classification.

Added Author Entry
Aslan, Burak Galip

Corporate Added Entry
İzmir Institute of Technology. Computer Engineering.

Uniform Title
Thesis (Master)--İzmir Institute of Technology: Computer Engineering.
 
İzmir Institute of Technology: Computer Engineering--Thesis (Master).

Electronic Access
Access to Electronic Version.


Library        Material Type    Accession Number    Call Number
IYTE Library   Thesis           T001394             QA76.9.T48 C65 2015