Abstract:
Malaria causes over one million deaths per year, primarily among African children. The parasite responsible for the most virulent form of malaria is Plasmodium falciparum. Protein structure plays a pivotal role in elucidating the mechanisms of parasite function and of resistance to anti-malarial drugs. Protein structure also aids the determination of protein function, which, together with the structure, can be used to identify novel drug targets in the parasite. However, various structural features of P. falciparum proteins complicate the experimental determination of three-dimensional protein structures. Furthermore, the presence of parasite-specific inserts reduces the similarity of these proteins to orthologous proteins with experimentally determined structures. The lack of solved structures in the malaria parasite, together with limited similarities to proteins in the Protein Data Bank (PDB), necessitates genome-scale structural annotation of P. falciparum proteins. Additionally, the annotation of a range of structural features facilitates the identification of suitable targets for structural studies. An integrated structural annotation system was constructed and applied to all the predicted proteins in P. falciparum, Plasmodium vivax and Plasmodium yoelii. Similarity searches against the PDB, Pfam, Superfamily, PROSITE and PRINTS were included. In addition, the following predictions were made for the P. falciparum proteins: secondary structure, transmembrane helices, protein disorder, low complexity, coiled-coils and small molecule interactions. P. falciparum protein-protein interactions and proteins exported to the red blood cell (RBC) were annotated from the literature. Finally, a selection of proteins was threaded through a library of SCOP folds. All the results are stored in a relational PostgreSQL database and can be viewed through a web interface (http://deepthought.bi.up.ac.za:8080/Annotation).
To select groups of proteins that fulfill certain criteria with regard to structural and functional features, a query tool was constructed. Using this tool, criteria regarding the presence or absence of all the predicted features can be specified. Analysis of the results revealed that P. falciparum protein-interacting proteins contain a higher percentage of predicted disordered residues than non-interacting proteins. Proteins interacting with 10 or more proteins had a disorder content concentrated in the range of 60-100%, while the disorder distribution for proteins with only one interacting partner was more evenly spread. Comparisons of structural and sequence features between the three species revealed that P. falciparum proteins tend to be longer and to vary more in length than those of the other two species. P. falciparum proteins also contained more predicted low complexity and disorder content than proteins from P. yoelii and P. vivax. P. falciparum protein targets for experimental structure determination, comparative modeling and in silico docking studies were putatively identified based on structural features. For experimental structure determination, 178 targets were identified. These targets contain limited amounts of predicted transmembrane helix, disorder, coiled-coil, low complexity and signal peptide content, as these features may complicate steps in the experimental structure determination procedure. In addition, the targets display low similarity to proteins in the PDB. Comparisons of the targets to proteins with crystal structures revealed that the structures and the predicted targets had similar sequence properties and predicted structural features. A group of 373 proteins that displayed high levels of similarity to proteins in the PDB was identified as targets for comparative modeling studies. Finally, 197 targets for in silico docking were identified based on predicted small molecule interactions and the availability of a 3D structure.
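
The selection of targets for experimental structure determination can be read as a filter over per-protein annotations. The following is a minimal sketch in Python, not the query tool itself: the field names, the threshold values and the example records are all hypothetical, chosen only to illustrate the idea of excluding proteins whose predicted features may complicate structure determination, as well as proteins that already resemble PDB entries.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProteinAnnotation:
    """Hypothetical per-protein summary of predicted structural features."""
    protein_id: str
    tm_helices: int                 # number of predicted transmembrane helices
    disorder_fraction: float        # fraction of residues predicted disordered (0-1)
    low_complexity_fraction: float  # fraction of residues in low complexity regions
    coiled_coil_fraction: float     # fraction of residues in predicted coiled-coils
    has_signal_peptide: bool
    best_pdb_identity: float        # best sequence identity to a PDB entry (0-1)


def select_structure_targets(
    proteins: Iterable[ProteinAnnotation],
    max_tm_helices: int = 0,          # all thresholds are illustrative assumptions
    max_disorder: float = 0.2,
    max_low_complexity: float = 0.2,
    max_coiled_coil: float = 0.1,
    max_pdb_identity: float = 0.3,
) -> List[ProteinAnnotation]:
    """Keep proteins with few complicating features and low PDB similarity."""
    return [
        p for p in proteins
        if p.tm_helices <= max_tm_helices
        and p.disorder_fraction <= max_disorder
        and p.low_complexity_fraction <= max_low_complexity
        and p.coiled_coil_fraction <= max_coiled_coil
        and not p.has_signal_peptide
        and p.best_pdb_identity <= max_pdb_identity
    ]


# Synthetic records for illustration only; these are not real P. falciparum proteins.
EXAMPLES = [
    ProteinAnnotation("example_A", 0, 0.05, 0.10, 0.00, False, 0.15),
    ProteinAnnotation("example_B", 3, 0.05, 0.10, 0.00, False, 0.15),  # membrane protein
    ProteinAnnotation("example_C", 0, 0.65, 0.10, 0.00, False, 0.15),  # highly disordered
    ProteinAnnotation("example_D", 0, 0.05, 0.05, 0.00, False, 0.80),  # close PDB homolog
]

targets = select_structure_targets(EXAMPLES)
print([p.protein_id for p in targets])
```

With these assumed thresholds, only `example_A` passes the filter. A record such as `example_D`, which is excluded here because of its high PDB similarity, would instead be a candidate for comparative modeling under the criteria described above.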