Title Textual Similarity
Author Jensen, Mads
Supervisor Sharp, Robin (Embedded Systems Engineering, Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
Institution Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark
Thesis level Bachelor thesis
Year 2010
Abstract This thesis is about recognizing textual similarity between two texts. A wide range of methods for this purpose is discussed, and two of them have been chosen for implementation and in-depth discussion. The first is an ontology-based algorithm in which the important words of the text are identified and a tree is then generated by looking up the words' hypernyms in WordNet. The second is Levenshtein, a typical edit-distance algorithm, i.e., it computes how many edit operations are required to make two texts equivalent. The two methods are compared and discussed to see whether one of them is to be preferred over the other. It is concluded that Levenshtein may be good in some cases, and that the area of detecting semantic similarity is quite large. The ontology-based algorithm was not fully implemented, but it was thoroughly discussed.
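Illustration The edit-distance idea named in the abstract can be sketched as follows. This is a minimal, generic Levenshtein implementation in Python, not the thesis's own code; the function name `levenshtein` and the choice to accept any sequence (so that it works on characters or on lists of words) are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Return the minimum number of insertions, deletions, and
    substitutions needed to turn sequence a into sequence b.

    Works on strings (character level) or on lists of tokens
    (word level); this generality is an illustrative assumption.
    """
    # Keep the shorter sequence in the inner loop to save memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))

    for i, item_a in enumerate(a, start=1):
        # Distance between a[:i] and the empty prefix of b is i deletions.
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current

    return previous[-1]


if __name__ == "__main__":
    # Character-level comparison of two words.
    print(levenshtein("kitten", "sitting"))
    # Word-level comparison of two short texts.
    print(levenshtein("the cat sat".split(), "the dog sat".split()))
```

Splitting the texts into words before the call gives a word-level distance, while passing the raw strings gives a character-level one. Which granularity the thesis uses is not stated in this record.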
Imprint Technical University of Denmark (DTU) : Kgs. Lyngby, Denmark
Series IMM-B.Sc.-2010-12
Original PDF bac10_12_net.pdf (0.36 MB)
Admin Creation date: 2010-06-02    Update date: 2010-06-02    Source: dtu    ID: 262856