Login New user?  
Mathematical Sciences Letters
An International Journal


Volumes > Vol. 4 > No. 2


Persian and Arabic Text Recognition with NN, Decision Tree and K-Nearest Neighbor

PP: 209-217
Hamid Parvin, Reza Parvin,
A thesaurus is a reference work that lists words grouped together according to similarity of meaning (containing synonyms and sometimes antonyms), in contrast to a dictionary, which contains definitions and pronunciations. This paper proposes an innovative approach to improve the classification performance of Persian texts considering a very large thesaurus. The paper proposes a flexible method to recognize and categorize the Persian texts employing a thesaurus as a helpful knowledge. In the corpus, when utilizing the thesaurus the method obtains a more representative set of word-frequencies comparing to those obtained when the method disables the thesaurus. Two types of word relationships are considered in our used thesaurus. This is the first attempt to use a Persian thesaurus in the field of Persian information retrieval. The k-nearest neighbor classifier, decision tree classifier and k-means clustering algorithm are employed as classifier over the frequency based features. Experimental results indicate enabling thesaurus causes the method significantly outperforms in text classification and clustering.

  Home   About us   News   Journals   Conferences Contact us Copyright naturalspublishing.com. All Rights Reserved