Abstract:
Long genes should be rare in archaea and eubacteria because of the high costs in time and resources of producing the encoded proteins. A search of 580 sequenced prokaryotic genomes, however, revealed that 0.2% of all genes (3732 genes in absolute numbers) are longer than 5 kb. Eighty giant bacterial genes of more than 20 kb in length were identified in 47 taxa that belong to the phyla Thermotogae (1), Chlorobi (3), Planctomycetes (1), Cyanobacteria (2), Firmicutes (7), Actinobacteria (9), Proteobacteria (23) or Euryarchaeota (1) (number of taxa in brackets). Giant genes are strain-specific, differ in their tetranucleotide usage from the bulk genome and occur preferentially in non-pathogenic environmental bacteria. The two longest bacterial genes known to date were detected in the green sulfur bacterium Chlorobium chlorochromatii CaD3. They encode proteins of 36 806 and 20 647 amino acids and are surpassed in length only by the human titin coding sequence. More than 90% of bacterial giant genes encode either a surface protein or a polyketide/non-ribosomal peptide synthetase. Most surface proteins are acidic and threonine-rich, lack cysteine and harbour multiple amino acid repeats. Giant proteins increase bacterial fitness by producing either weapons directed at, or shields against, animate competitors or hostile environments.