Exposure Of Document Similarity Using Sequential Clustering Algorithm
Abstract: Document similarity measures are crucial components of many text processing tasks, including information retrieval, document categorization, and document clustering. Extracting features from documents is a preliminary step in text mining, and the similarity between two documents is then calculated from the extracted features. A new similarity measure is proposed to calculate the similarity between two documents with respect to a feature. The proposed measure takes the following three cases into account: the feature appears in both documents, the feature appears in only one document, or the feature appears in neither document. For a feature present in both documents, the similarity increases as the difference between the two values associated with that feature decreases; furthermore, the contribution of the difference is usually scaled. The similarity decreases as the number of presence-absence features increases, and a feature absent from both documents makes no contribution to the similarity. The proposed measure is also extended to estimate the similarity between two sets of documents.
Keywords: Document classification, document clustering, similarity measures, clustering algorithms
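The abstract lists the properties of the measure but does not give its formula. The sketch below is one way to satisfy all three cases. It is modeled on the SMTP-style formulation found in the text-similarity literature and is an assumption, not this paper's definition. The function name `smtp_similarity`, the per-feature scale `sigma` (for example, the standard deviation of each feature over the corpus), and the presence-absence penalty `lam` are hypothetical. Documents are assumed to be equal-length vectors of feature weights, where 0 means the feature is absent.

```python
import math
from typing import Sequence


def smtp_similarity(
    d1: Sequence[float],
    d2: Sequence[float],
    sigma: Sequence[float],
    lam: float = 1.0,
) -> float:
    """Similarity of two documents given as equal-length feature vectors.

    Hypothetical SMTP-style sketch: `sigma` is an assumed positive
    per-feature scale, `lam` an assumed positive presence-absence penalty.
    """
    if not (len(d1) == len(d2) == len(sigma)):
        raise ValueError("d1, d2 and sigma must have the same length")
    if lam <= 0 or any(s <= 0 for s in sigma):
        raise ValueError("lam and every sigma must be positive")

    numerator = 0.0    # summed per-feature contributions
    denominator = 0.0  # features present in at least one document
    for a, b, s in zip(d1, d2, sigma):
        if a != 0 and b != 0:
            # Case 1: feature in both documents. The contribution lies in
            # (0.5, 1] and grows as the scaled difference |a - b| / s shrinks.
            numerator += 0.5 * (1.0 + math.exp(-((a - b) / s) ** 2))
            denominator += 1.0
        elif a != 0 or b != 0:
            # Case 2: feature in only one document, so apply a fixed penalty.
            numerator -= lam
            denominator += 1.0
        # Case 3: feature in neither document, so it contributes nothing.

    if denominator == 0:
        return 0.0  # assumption: two empty documents get similarity 0
    f = numerator / denominator          # f lies in [-lam, 1]
    return (f + lam) / (1.0 + lam)       # rescale to [0, 1]


print(smtp_similarity([1.0, 2.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 1.0]))
print(smtp_similarity([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))
```

Because the denominator counts only features that appear in at least one document, a feature absent from both leaves the score unchanged. Each presence-absence feature adds `-lam` to the numerator, which pulls the score down. Dividing the difference by `sigma` implements the scaling of the difference's contribution.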
International Journal for Trends in Technology & Engineering © 2015 IJTET JOURNAL