Whole genome annotation is the process of identifying features of interest in a set of genomic DNA sequences, and labelling them with useful information. Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files.
Prokka uses a variety of databases when trying to assign function to the predicted CDS features. It takes a hierarchical approach to make it fast.
A small, core set of well characterized proteins are first searched using BLAST+. This combination of small database and fast search typically completes about 70% of the workload. Then a series of slower but more sensitive HMM databases are searched using HMMER3.
The three core databases, applied in order, are:
ISfinder: Only the tranposase (protein) sequences; the whole transposon is not annotated.
NCBI Bacterial Antimicrobial Resistance Reference Gene Database: Antimicrobial resistance genes curated by NCBI.
UniProtKB (SwissProt): For each --kingdom we include curated proteins with evidence that (i) from Bacteria (or Archaea or Viruses); (ii) not be "Fragment" entries; and (iii) have an evidence level ("PE") of 2 or lower, which corresponds to experimental mRNA or proteomics evidence.
Related links:
https://github.com/tseemann/prokka
Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014;30(14):2068-2069. doi:10.1093/bioinformatics/btu153