Title Classification of Protein Sequences using Markov Models
Author Thomsen, Claus
Larsen, Simon
Supervisor Fischer, Paul (Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark)
Institution Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark
Thesis level Master's thesis
Year 2004
Abstract This project deals with a specific classification problem in bioinformatics and biology. The problem, typically referred to as secondary structure prediction, concerns how the structure of protein sequences may be classified using a number of predefined structure classes. This project analyses the possible use of Markov models for this classification problem. Markov models are statistical models which may be used to infer the structure classes of protein sequences from training data. The performance of the developed models is compared to that of other known models in the area, specifically the GOR models, which resemble Markov models in that both are statistical models. The results show that Markov models can be used for secondary structure prediction and perform better than simply guessing the most frequent structure class. Starting from a simple Markov model able to predict around 51% of the structures correctly, the model has been extended and combined with other methods, resulting in a prediction accuracy of 57.2% (an increase of around 6 percentage points). The resulting model may be characterized as a first-generation secondary structure predictor. Given more time, several of the weaknesses found in the Markov models could be removed or at least reduced, possibly resulting in better performance. The models proposed in this project cannot compete directly with the best predictors currently available (which have prediction accuracies of around 80%). However, there may be room for further development by incorporating biological background knowledge into the proposed Markov models.
Imprint Department of Informatics and Mathematical Modeling, Technical University of Denmark, DTU : DK-2800 Kgs. Lyngby, Denmark
Fulltext
Original PDF imm3099.pdf (1.21 MB)
Admin Creation date: 2006-06-22    Update date: 2012-12-21    Source: dtu    ID: 154807