Machine Learning Algorithms Identify Genes Responsible For Drug Resistance In Tuberculosis Bacteria

One of the larger challenges facing modern medicine is the rise of drug-resistant strains of bacteria. Overuse of antibiotics during the past century has produced adapted strains of bacteria that are resistant to common antibiotics. Developing new drugs to combat these strains is difficult, as scientists are often unsure exactly which cellular and genetic mechanisms contribute to the resistance of a new strain.

In a new study published in Nature, a team of researchers reports that they have created a machine learning algorithm that can identify and predict which genes are responsible for drug resistance in bacteria. The procedure was tested on drug-resistant strains of Mycobacterium tuberculosis and identified 33 known and 24 new genes that contribute to antibiotic resistance.

The approach is significant because it could change how medical professionals treat pathogens. Instead of delivering antibiotics that are effective against a wide range of bacteria, clinicians could use machine learning algorithms to identify the relevant genes in a particular strain and design a specialized, targeted treatment. “This could open up opportunities for personalized treatment for your pathogen. Every strain is different and should potentially be treated differently,” said co-author Bernhard Palsson. Although the procedure was only tested on the tuberculosis bacterium, the researchers claim that their platform is “reference strain-agnostic,” meaning that it can be applied to other species of bacteria.

Machine Learning Algorithms And Bacterial DNA

It is estimated that during the next 5 years, the total number of individual strains of the tuberculosis bacterium will reach over 60,000. Much of the data on these strains is publicly available, but the sheer volume makes it very difficult to analyze comprehensively and to pick out structural commonalities between strains that could explain drug resistance. Scientists employ machine learning algorithms to tackle this massive computational problem.

Existing machine learning approaches rely on a single reference strain, against which the algorithm compares each genome to find single-nucleotide differences. Once identified, those differences are compared to a database of sequences already known to cause drug resistance. While these methods have been shown to be effective in identifying some genetic mechanisms of drug resistance, many are too narrow in scope to deal with genome-wide changes that reflect antibiotic-stimulated adaptations. What is needed is an algorithm capable of performing a genome-wide functional analysis of the genes, and combinations of genes, that could confer antibiotic resistance.
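To make the reference-strain idea concrete, here is a minimal sketch in Python. It assumes the sample has already been aligned to the reference so that both sequences have the same length, and the sequences, the function names find_snps and known_resistance_hits, and the "known" variant set are all made up for illustration. It is not the pipeline used by any particular tool.

```python
def find_snps(reference, sample):
    """List single-nucleotide differences between two aligned sequences.

    Returns (position, reference_base, sample_base) tuples, with
    positions counted from 1. Assumes equal-length, pre-aligned input.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def known_resistance_hits(snps, known_variants):
    """Keep only the differences that appear in a set of known variants."""
    return [snp for snp in snps if snp in known_variants]


# Hypothetical toy data, not real M. tuberculosis sequence.
reference = "ATGCCGTA"
sample = "ATGACGTT"
known_variants = {(4, "C", "A")}  # illustrative "database" of one variant

snps = find_snps(reference, sample)
hits = known_resistance_hits(snps, known_variants)
print(snps)
print(hits)
```

The sketch also shows the limitation described above: a difference that is not already in the database, such as the change at position 8 here, is found but cannot be interpreted.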

To train the machine learning algorithm, the team fed a computer data on the genomes of more than 1,500 known strains of the tuberculosis bacterium. After combining the genomes into a single “pan-genome” (the set of all genes found across the strains), they divided the pan-genome into gene clusters according to amino acid sequence. Across these newly formed clusters, the team applied an association metric called mutual information to predict which clusters contain resistance-conferring genes. This method recovered 33 genes known to cause antibiotic resistance and predicted 24 novel ones that have not yet been experimentally tested. Many of the confirmed genes are correlated with phenotypic variation in metabolic pathways or in the structure of the cell wall, two well-known and well-studied families of variations that confer drug resistance.
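Mutual information measures how much knowing one variable reduces uncertainty about another. The Python sketch below scores each gene cluster by the mutual information between its presence in a strain and that strain's resistance label. The data are invented, the names mutual_information and rank_clusters are hypothetical, and the study's actual encoding of clusters and alleles may differ, so this is an illustration of the metric rather than a reproduction of the method.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information, in bits, between two discrete sequences."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_clusters(presence_by_cluster, resistant):
    """Sort clusters from most to least informative about resistance."""
    scores = {
        name: mutual_information(presence, resistant)
        for name, presence in presence_by_cluster.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four strains, 1 = resistant / cluster present.
resistant = [1, 1, 0, 0]
presence_by_cluster = {
    "cluster_A": [1, 1, 0, 0],  # tracks resistance exactly
    "cluster_B": [1, 0, 1, 0],  # unrelated to resistance
}

ranked = rank_clusters(presence_by_cluster, resistant)
for name, score in ranked:
    print(name, round(score, 3))
```

In this toy example, cluster_A scores 1 bit because it predicts the label perfectly, while cluster_B scores 0 bits because its presence says nothing about resistance. A high score only indicates a statistical association, which is why the predictions still need experimental confirmation.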

In addition to predicting unconfirmed resistance-conferring genes, the algorithm identified 94 potential interactions among alleles that could lead to drug resistance. In these epistatic interactions, the effect of one gene depends on the presence of one or more other genes in the genetic background. Though all 94 of the predicted epistatic interactions are novel, 74 involve known gene partners. The other 20, however, involve entirely novel gene products that scientists have not seen before. These findings could aid future experimental research into genetic interactions that confer antibiotic resistance.
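One simple way to picture an epistatic interaction is to tabulate how often strains are resistant for each combination of two alleles. The Python sketch below does this with invented data; the function name resistance_rate_by_combination is hypothetical, and this is not how the study detected its 94 interactions, only an illustration of the pattern such an analysis looks for.

```python
from collections import defaultdict


def resistance_rate_by_combination(allele_a, allele_b, resistant):
    """Fraction of resistant strains for each (allele_a, allele_b) pair.

    All three inputs are equal-length lists of 0/1 values, one per strain.
    """
    if not (len(allele_a) == len(allele_b) == len(resistant)):
        raise ValueError("inputs must have the same length")
    counts = defaultdict(lambda: [0, 0])  # combo -> [resistant, total]
    for a, b, r in zip(allele_a, allele_b, resistant):
        counts[(a, b)][0] += r
        counts[(a, b)][1] += 1
    return {combo: res / total for combo, (res, total) in counts.items()}


# Hypothetical toy data: eight strains, 1 = allele present / resistant.
allele_a = [1, 1, 1, 0, 0, 0, 1, 0]
allele_b = [1, 1, 0, 1, 0, 0, 0, 1]
resistant = [1, 1, 0, 0, 0, 0, 0, 0]

rates = resistance_rate_by_combination(allele_a, allele_b, resistant)
for combo in sorted(rates):
    print(combo, rates[combo])
```

Here neither allele is associated with resistance on its own, but every strain carrying both is resistant. That dependence of one allele's effect on the other is the signature of epistasis, and it would be missed by an analysis that scores each gene separately.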

A final finding was the relationship between drug resistance and the geographic region the bacteria came from. The researchers found that mutations conferring drug resistance were more heavily concentrated in regions with poor tuberculosis management. Poor tuberculosis management means longer treatment times, which leads to overexposure to antibiotics and the ensuing resistance adaptations. The result is that regions without the resources to manage tuberculosis infections effectively are at greater risk of developing drug-resistant strains. For example, among the strains sequenced, the study found that those with the highest concentration of resistance-conferring mutations were localized to Belarus, a country with relatively poor management of the disease.

The study does have some limitations. Primarily, it is so far purely computational; the newly predicted sequences still need to be tested experimentally to confirm that they actually confer drug resistance. Though computational solutions in science are extremely useful, the only way to know for sure is to observe real strains of bacteria. Additionally, the study relies on a database of previously known gene-antibiotic interactions, so it cannot identify the specific drug that may be effective against the regions that confer antibiotic resistance. This problem is compounded by the fact that the model includes little information about the actual mechanisms by which these genes produce drug resistance. Correlations among genome sequences can be used to predict gene combinations that could confer resistance, but it is another story entirely to know the cellular and molecular mechanisms underlying the pathway from gene to phenotypic expression.

That being said, the study provides a generalizable computational platform for biologists to test, compare, and predict genetic variations that could cause drug resistance in bacteria. Evolutionary adaptation is a complex business, but machine learning algorithms could allow scientists to preemptively engineer solutions to possible resistance-conferring allele combinations. Part of the difficulty of dealing with drug-resistant strains of bacteria is their ability to reproduce and mutate extremely quickly, so computational techniques could help scientists stay one step ahead. With the threat of drug-resistant bacteria growing ever larger, scientists now more than ever need to understand the genetics behind drug resistance.