Mycobacterium tuberculosis

Organism

Mycobacterium tuberculosis (M. tb) is a species of pathogenic bacteria in the family Mycobacteriaceae and the causative agent of tuberculosis (wiki). M. tb is one of the most important human pathogens: tuberculosis killed 1.5 million people in 2020, making it second only to COVID-19 among infectious causes of death (WHO).

The genome of M. tb was first sequenced in 1998 (Cole, et al. Nature 1998), and genomic analysis continues to be extremely important for drug-susceptibility testing (Cox and Mizrahi, NEJM 2018; CRyPTIC Consortium and the 100,000 Genomes Project, NEJM 2018), surveillance (Votintseva, et al. JCM 2017), and historical analysis (Sabin, et al. Genome Biology 2020).

Map

A collection of 306 reference genomes was downloaded from the NCBI Genome database on January 4, 2022 as part of a general survey of bacteria with importance to human health.

Notes

As others have noted, the genomic diversity of M. tb is quite different from that of the other organisms that have been highlighted in this collection. Looking at the map above, we can see that the overall difference in nucleotide content between isolates is very low (the branch length on the tree only goes up to 0.002, or 0.2%). The relative sizes of the core and accessory genomes are also remarkable, because the vast majority of genes contained in any single isolate can be found across the entire species. For this reason, I chose to focus my own inspection on the smaller set of genes which are not so strictly conserved. In contrast to many of the organisms included in this library, the accessory genes of M. tb do not display the pattern that I would expect from horizontal gene transfer, which is consistent with the general impression that this organism does not contain as many mobile genetic elements.
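To make the core/accessory distinction concrete, here is a minimal sketch of how genes could be split by prevalence across isolates. This is not the method used to build the map above: the function name, the toy isolate and gene names, and the 95% prevalence cutoff are all assumptions chosen for illustration.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(
    gene_presence: Dict[str, Set[str]],
    isolates: Set[str],
    core_fraction: float = 0.95,  # assumed cutoff, not taken from the map
) -> Tuple[Set[str], Set[str]]:
    """Partition genes into core and accessory sets by prevalence.

    gene_presence maps each gene name to the set of isolates carrying it.
    A gene is called "core" if it is found in at least core_fraction of
    the isolates, and "accessory" otherwise.
    """
    core, accessory = set(), set()
    for gene, carriers in gene_presence.items():
        prevalence = len(carriers & isolates) / len(isolates)
        if prevalence >= core_fraction:
            core.add(gene)
        else:
            accessory.add(gene)
    return core, accessory


# Hypothetical toy data: four isolates and three genes.
isolates = {"iso1", "iso2", "iso3", "iso4"}
gene_presence = {
    "geneA": {"iso1", "iso2", "iso3", "iso4"},  # present everywhere
    "geneB": {"iso1", "iso2", "iso3", "iso4"},  # present everywhere
    "geneC": {"iso1", "iso2"},                  # present in half
}

core, accessory = split_core_accessory(gene_presence, isolates)
print(f"core: {sorted(core)}, accessory: {sorted(accessory)}")
```

For a species like M. tb, where most genes in any one isolate are shared by the whole species, the core set from a calculation like this would be expected to dwarf the accessory set.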

Looking at the overall shape of the map, there is a subset of isolates which form a distinct cluster (indicated with the left-hand box in the image below). By zooming in on that region in the interactive plot at the top of the page, we can see that this clade contains the subspecies M. tb bovis.

The box in the middle (above) indicates the broadly conserved core genome, while the box on the right (above) marks a region containing a number of genetic elements which are found in only a subset of isolates. The right-hand box is expanded in the image below.

In each of the images below, I’ve expanded a smaller region from the image above. With the interactive display used to render these maps, the full set of gene names is only visible in the most expanded view, and so these expanded images have also been rotated and cropped to make it as easy as possible to read those gene labels.

The image above shows a small collection of genes which are missing from a set of isolates that fall mostly into a single clade of M. tb. The gene functions in this group which stand out to me are CRISPR-associated proteins and type VII secretion systems, both of which may be related to intra- or extracellular antagonism.

The image above shows a group of genes which are not found in two clades: the M. tb bovis clade (on the left) and another clade on the right. Based on the presence of annotated phage and integrase proteins, I would guess that this is an integrated prophage of some sort. As with many phages, no function can be predicted for a large proportion of the genes.

The image above shows yet another group of genes which includes a number of annotated phage functions. The group of isolates which lack this particular element is larger than in the previous example. However, both sets of genes display patterns of presence/absence which are largely monophyletic, and are therefore less likely to be the result of horizontal gene transfer.

The final example in this series shows the opposite pattern: groups of genes which are found in only a subset of isolates and are otherwise absent from the species. Because the isolates carrying these genes group monophyletically, it is likely that the genes were gained by one (or a small number of) ancestors shared by the isolates which now contain them.
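The monophyly argument used in the last few examples can be sketched in code. This is an illustration rather than the analysis behind the images: it assumes a rooted tree written as nested tuples with isolate names at the leaves, and every name in it is hypothetical. A set of isolates counts as monophyletic here only if it exactly matches the leaves under one internal node, which is stricter than the "largely monophyletic" patterns described above.

```python
def leaves(node):
    """Return the set of leaf names under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    result = frozenset()
    for child in node:
        result |= leaves(child)
    return result


def clades(node):
    """Yield the leaf set of every subtree, including single leaves."""
    yield leaves(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group):
    """True if the isolates in `group` are exactly the leaves of one subtree."""
    return frozenset(group) in set(clades(tree))


# Hypothetical rooted tree of five isolates.
tree = ((("a", "b"), "c"), ("d", "e"))

# Isolates carrying (or lacking) a gene of interest.
single_clade = {"a", "b", "c"}   # matches one subtree
scattered = {"a", "d"}           # spread across two subtrees

print(is_monophyletic(tree, single_clade))
print(is_monophyletic(tree, scattered))
```

A gene whose carriers (or non-carriers) pass a check like this can be explained by a single gain or loss in a common ancestor, whereas a scattered pattern would need several independent events, which is what I would expect to see from horizontal gene transfer.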