Abstract:
The protein folding problem is examined, specifically the problem of predicting protein secondary structure from the amino acid sequence. A literature study is presented of the protein folding process and of the existing techniques for predicting protein secondary structure. These techniques include expert rules, statistics, information theory and various computational intelligence techniques, such as neural networks, nearest neighbour methods, Hidden Markov Models and Support Vector Machines. A pattern recognition technique based on statistical analysis is developed to predict protein secondary structure from the amino acid sequence. The technique can be applied to any problem in which an input pattern is associated with an output pattern and each element of both patterns takes its value from a finite set. It is applied to discover the role that short sequences of amino acids play in the formation of protein secondary structures. The technique achieves a performance score of Q8 = 59.2%, with a corresponding Q3 score of 69.7%. This compares well with state-of-the-art techniques such as OSS-HMM and PSIPRED, which achieve Q3 scores of 67.9% and 66.8% respectively when predictions are made on single sequences.
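
For readers unfamiliar with the Q8 and Q3 scores, the following is a minimal sketch of how such per-residue accuracies are commonly computed. It is not the implementation used in this work. The function names, the DSSP-style eight-state alphabet and the particular eight-to-three reduction (H, G, I to helix; E, B to strand; everything else to coil) are assumptions made for illustration, and other reductions are in use.

```python
# Assumed reduction from eight DSSP-style states to three classes.
# This is one common convention, not necessarily the one used in this work.
DSSP_8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helix types -> helix
    "E": "E", "B": "E",             # strand and bridge -> strand
    "T": "C", "S": "C", "-": "C",   # turn, bend, other -> coil
}


def q_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def reduce_to_3(states: str) -> str:
    """Map a string of eight-state labels to three-state labels."""
    return "".join(DSSP_8_TO_3[s] for s in states)


def q8_and_q3(predicted8: str, observed8: str) -> tuple[float, float]:
    """Return (Q8, Q3) for eight-state predicted and observed strings."""
    q8 = q_score(predicted8, observed8)
    q3 = q_score(reduce_to_3(predicted8), reduce_to_3(observed8))
    return q8, q3


if __name__ == "__main__":
    # Hypothetical ten-residue example, not data from this work.
    observed = "HHHHGGEEBT"
    predicted = "HHHGGGEEEE"
    print(q8_and_q3(predicted, observed))
```

Because several eight-state labels collapse into one three-state class, a prediction that confuses, say, H with G counts as an error for Q8 but not for Q3. Under this reduction the Q3 score is therefore never lower than the Q8 score for the same prediction.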