The species identification analysis pipeline relies on a bioinformatics tool chain to rapidly and accurately detect and identify microbial species from complex genomic data extracted from clinical or environmental samples.
The pipeline uses Prodigal, a gene prediction program designed for prokaryotic genomes, to identify protein-coding regions in FASTA genomic sequences. It then uses DIAMOND to compare the predicted coding sequences with the microbial genome database hosted on the website. This database contains genomic sequences of various microorganisms, selected and organized to cover a wide range of species and subspecies so that the comparison is as accurate and comprehensive as possible.
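The two steps described above can be sketched in Python as follows. This is a minimal illustration, not the website's actual implementation: the file names, the e-value cutoff, the thread count, and the helper names `build_prodigal_cmd`, `build_diamond_cmd`, and `run_pipeline` are all assumptions. It also assumes that the reference has already been formatted as a DIAMOND protein database (for example with `diamond makedb`) and that the predicted genes are searched as translated proteins with `diamond blastp`.

```python
import subprocess


def build_prodigal_cmd(genome_fasta, proteins_faa, genes_gff):
    """Build a Prodigal command that predicts genes in one genome."""
    return [
        "prodigal",
        "-i", str(genome_fasta),   # input genome in FASTA format
        "-a", str(proteins_faa),   # protein translations of predicted genes
        "-o", str(genes_gff),      # gene coordinates
        "-f", "gff",               # write coordinates as GFF
        "-p", "single",            # single-genome mode (assumed)
        "-q",                      # suppress progress output
    ]


def build_diamond_cmd(proteins_faa, database, hits_tsv,
                      evalue="1e-5", threads=4):
    """Build a DIAMOND blastp command that reports the best hit per query."""
    return [
        "diamond", "blastp",
        "--query", str(proteins_faa),
        "--db", str(database),         # hypothetical pre-built .dmnd database
        "--out", str(hits_tsv),
        "--outfmt", "6",               # 12-column tabular output
        "--max-target-seqs", "1",      # keep only the top hit (assumed)
        "--evalue", str(evalue),       # assumed cutoff, not from the source
        "--threads", str(threads),
    ]


def run_pipeline(genome_fasta, database, workdir="."):
    """Run gene prediction, then alignment; return the path of the hits table."""
    proteins = f"{workdir}/proteins.faa"
    genes = f"{workdir}/genes.gff"
    hits = f"{workdir}/hits.tsv"
    # check=True raises CalledProcessError if either tool exits with an error.
    subprocess.run(build_prodigal_cmd(genome_fasta, proteins, genes), check=True)
    subprocess.run(build_diamond_cmd(proteins, database, hits), check=True)
    return hits
```

With both tools installed and on the PATH, a call such as `run_pipeline("genome.fasta", "reference.dmnd")` would leave the alignment table in `hits.tsv`. Keeping command construction separate from execution makes the chosen flags easy to inspect without running either tool.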
During the comparison, the pipeline calculates the similarity between the uploaded sequences and the sequences of each species in the database, including the degree of sequence matching, the number of mismatches, and the insertions or deletions. The species is then identified from a combined analysis of these similarity measures.
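One way such similarity measures could be combined is sketched below. It is an illustration only; the source does not describe the actual scoring scheme. The sketch assumes that the alignments are available as 12-column BLAST-style tabular text (the format DIAMOND writes with `--outfmt 6`) and that a lookup table from subject sequence ID to species name exists. The function name `rank_species`, the mapping `subject_to_species`, and the ranking rule (number of hits, then length-weighted identity) are hypothetical.

```python
import csv
from collections import defaultdict

# Standard column order of BLAST/DIAMOND tabular output (--outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def rank_species(lines, subject_to_species):
    """Aggregate alignment statistics per species and rank the species.

    `lines` is any iterable of tab-separated text lines (for example an
    open file). Returns a list of tuples:
    (species, hits, length-weighted % identity, mismatches, gap openings).
    """
    totals = defaultdict(lambda: {"hits": 0, "aligned": 0, "identity_sum": 0.0,
                                  "mismatches": 0, "gap_opens": 0})
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(COLUMNS, row))
        species = subject_to_species.get(rec["sseqid"])
        if species is None:
            continue  # subject not in the (hypothetical) lookup table
        length = int(rec["length"])
        t = totals[species]
        t["hits"] += 1
        t["aligned"] += length
        # Weight each hit's percent identity by its alignment length.
        t["identity_sum"] += float(rec["pident"]) * length
        t["mismatches"] += int(rec["mismatch"])
        t["gap_opens"] += int(rec["gapopen"])

    ranked = [
        (species, t["hits"], round(t["identity_sum"] / t["aligned"], 2),
         t["mismatches"], t["gap_opens"])
        for species, t in totals.items()
    ]
    # Assumed ranking rule: most hits first, ties broken by higher identity.
    ranked.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return ranked
```

Passing an open hits file and the lookup table, for example `rank_species(open("hits.tsv"), subject_to_species)`, returns the candidate species in ranked order, with the first entry being the best-supported candidate under the assumed rule.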
Related links:
https://www.expasy.ch/resources/uniprotkb-swiss-prot
Ondov, B.D., Treangen, T.J., Melsted, P. et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 17, 132 (2016). https://doi.org/10.1186/s13059-016-0997-x