Fig. 3: Bioinformatic analysis workflow. Genes, represented as colored “beads on a string”, are grouped based on 100% protein sequence identity. For each group of identical proteins, the location of every copy (plasmid, chromosome, or unassembled contig sequence) is recorded, along with the number of copies at each location. Protein sequences that occur more than once in a genome are called “duplicated”, while those that occur only once are called “single-copy”. Antibiotic resistance genes are scored based on their NCBI RefSeq protein product annotation. Each genome is assigned to one of twelve ecological categories, or labeled “Unannotated”, based on the host and isolation source metadata in its NCBI RefSeq record.
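
The grouping and counting step described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_proteins`, the `(genome_id, location, protein_sequence)` input tuples, and the toy sequences are all hypothetical, and exact string equality is assumed to stand in for 100% protein sequence identity.

```python
from collections import Counter, defaultdict


def classify_proteins(records):
    """Group identical proteins per genome and label them.

    records: iterable of (genome_id, location, protein_sequence) tuples,
    where location is e.g. "chromosome", "plasmid", or "contig".
    This input layout is an assumption made for illustration.
    """
    # genome_id -> protein_sequence -> Counter of copies per location
    counts = defaultdict(lambda: defaultdict(Counter))
    for genome_id, location, sequence in records:
        counts[genome_id][sequence][location] += 1

    result = {}
    for genome_id, by_sequence in counts.items():
        result[genome_id] = {}
        for sequence, by_location in by_sequence.items():
            total_copies = sum(by_location.values())
            result[genome_id][sequence] = {
                "copies_by_location": dict(by_location),
                # More than one identical copy in the genome means "duplicated".
                "status": "duplicated" if total_copies > 1 else "single-copy",
            }
    return result


# Toy input: one genome with a protein present on both replicons.
example_records = [
    ("G1", "chromosome", "MKTAYIAK"),
    ("G1", "plasmid", "MKTAYIAK"),
    ("G1", "chromosome", "MSDNELLQ"),
]
example_result = classify_proteins(example_records)
```

In this toy input, the first sequence appears once on the chromosome and once on the plasmid and is therefore labeled “duplicated”, while the second appears once and is labeled “single-copy”. Antibiotic resistance scoring and ecological categorization depend on annotation and metadata fields that the caption does not specify, so they are left out of the sketch.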