TULIP software and web server : automatic classification of protein sequences based on pairwise comparisons and Z-value statistics

Show simple item record

dc.contributor.author Grando, Delphine
dc.contributor.author Ortet, Philippe
dc.contributor.author Joubert, Fourie
dc.contributor.author Marechal, Eric
dc.contributor.author Bastien, Olivier
dc.date.accessioned 2010-06-23T06:16:05Z
dc.date.available 2010-06-23T06:16:05Z
dc.date.issued 2009-03
dc.description.abstract A configuration space of homologous protein sequences (or CSHP) has recently been constructed based on pairwise comparisons, with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) and following evolutionary assumptions. A Z-value cut-off is applied so that proteins are placed in the CSHP only when the similarity of pairs of sequences is significant according to the Theorem of the Upper Limit of a score Probability (TULIP theorem). Based on the positions of similar protein sequences in the CSHP, a classification can be deduced, which can be visualized as trees, called TULIP trees. In previous case studies, TULIP trees were shown to be consistent with phylogenetic trees. To date, no tool has been made available to allow the computation of TULIP trees following this model. The availability of methods to cluster proteins based on pairwise comparisons and following evolutionary assumptions should be useful for evaluation and for the future improvements they might inspire. We developed a web server allowing the local or online computation of TULIP trees based on the CSHP probabilities. The input is a set of homologous protein sequences in multi-FASTA format. Pairwise comparisons are conducted using the Smith-Waterman method, with 100 to 1,000 sequence shufflings to estimate pairwise Z-values. The resulting Z-value matrix is used to infer a tree, which is then written to a file. The output therefore consists of a Z-value matrix, a distance matrix, a TULIP tree file in NEWICK format, and a TULIP tree visualisation. The TULIP server provides an easy-to-use interface to the TULIP software, and allows a classification of protein sequences based on pairwise alignments and following evolutionary assumptions. TULIP trees are consistent with phylogenies in numerous cases, but they can be inconsistent for multi-domain proteins in which some domains have been conserved in all branches.
Thus, following the MIAPA (Minimum Information About a Phylogenetic Analysis) recommendations, TULIP trees cannot be considered conventional phylogenetic trees. A major strength of the TULIP classification is its statistical validity when analysing samples that include both compositionally unbiased and biased sequences (i.e. with biased amino acid distributions), such as sequences from Plasmodium falciparum. The TULIP web server is a service of the Malaria Portal of the University of Pretoria, South Africa, and is available at http://malport.bi.up.ac.za/TULIP/ en
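Illustrative note (not part of the original record): the abstract describes estimating a pairwise Z-value by comparing a Smith-Waterman score with the scores obtained after shuffling one of the sequences. The Python sketch below shows that general idea only. It is not the TULIP implementation: the function names sw_score and z_value, the match/mismatch/gap values (used here in place of a substitution matrix) and the default of 100 shuffles are all assumptions made for the example.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring values are placeholders chosen for this sketch; a real
    protein comparison would normally use a substitution matrix.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell score never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Monte Carlo Z-value: (observed - mean of shuffled) / std of shuffled.

    Sequence b is shuffled n_shuffles times, which preserves its length
    and composition while destroying the residue order.
    """
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(sw_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    std = statistics.pstdev(shuffled_scores)
    if std == 0:
        # Degenerate case: every shuffle produced the same score.
        return 0.0 if observed == mean else float("inf")
    return (observed - mean) / std
```

Calling z_value on every pair of sequences in a multi-FASTA input would fill a Z-value matrix of the kind the abstract lists as output; how TULIP converts that matrix into distances and a tree is not specified in this record and is not sketched here.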
dc.identifier.citation Grando, D, Ortet, P, Joubert, F, Marechal, E & Bastien, O 2009, 'TULIP software and web server : automatic classification of protein sequences based on pairwise comparisons and Z-value statistics', Open Bioinformatics Journal, vol. 3, pp. 18-25. [http://bentham.org/open/tobioij/] en
dc.identifier.issn 1875-0362
dc.identifier.other 10.2174/1875036200903010018
dc.identifier.uri http://hdl.handle.net/2263/14325
dc.language.iso en en_US
dc.publisher Bentham Science en_US
dc.rights Bentham Open Science en_US
dc.subject TULIP software and web server en
dc.subject.lcsh Automatic classification en
dc.subject.lcsh TULIP (Information retrieval system) en
dc.subject.lcsh Amino acid sequence en
dc.subject.lcsh Sequence alignment (Bioinformatics) en
dc.subject.lcsh Configuration space en
dc.title TULIP software and web server : automatic classification of protein sequences based on pairwise comparisons and Z-value statistics en
dc.type Article en

