Researchers develop India-specific cancer genome toolkit

A research team from the Advanced Centre for Treatment, Research and Education in Cancer (ACTREC) has developed a bioinformatics tool to analyze cancer-specific gene mutations in the Indian population. Their toolkit, TMC-SNPdb 2.0, comprises a complete dataset of genome sequences from 1800 individuals.

The study addresses the shortage of Indian-origin cancer genome data in global genomic datasets. Importantly, the study claims, the tool eliminates the false-positive mutation calls that arise when cancer genomes from Indian patients are analyzed against global databases lacking comprehensive India-specific data.

Mutations occur at the level of nucleotides, the building blocks of genes. A sequence of nucleotide bases forms a genetic code. These codes carry the instructions for making proteins, which carry out various cell functions.

Sometimes, a mistake happens during the coding process, and a single nucleotide is replaced or misplaced in the sequence, resulting in a gene mutation. Such a mutation is called a Single Nucleotide Polymorphism (SNP). Several such mutations occur routinely and may not be significant; some may even be the basis for evolutionary changes. However, some SNPs can be detrimental and give rise to diseases by disrupting cell functions. In addition, genes are hereditary, so these variations can be passed down the generations.
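As a toy illustration of the idea (not part of the ACTREC toolkit), a single-nucleotide substitution can be spotted by comparing a sample sequence against a reference base by base. The sequences and the function name `find_substitutions` below are invented for the example, and real variant calling works on aligned sequencing reads rather than two short strings.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical sequences: the sample differs from the reference at one base.
reference = "ATGCGTAC"
sample = "ATGCATAC"
substitutions = find_substitutions(reference, sample)
print(substitutions)
```

Here the only difference is at the fifth base, where G in the reference has been replaced by A in the sample.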

Cancer is a genetic disease caused by such mutations in genes. Cancer-causing mutations are of two types: i) germline variants (hereditary), which are present from birth in every cell of the body, and ii) somatic variants, which develop after birth due to environmental or lifestyle triggers and are found only in the diseased cells.

Genome sequencing is a method to read the genetic codes. Sequencing data not only helps detect errors but also identifies predispositions to diseases. Therefore, global health bodies are actively encouraging the storage of genome sequences in public databases such as dbSNP, the 1000 Genomes Project, gnomAD, and ExAC, to name a few. These genome databanks give researchers access to whole-genome and whole-exome (protein-coding) data.

However, cancer risk also varies by ethnicity and population: people in different regions are more likely to get certain cancers. Region-specific genomic data is therefore a valuable resource for monitoring public health and providing customized and targeted treatment for cancer.

Researchers identify the actual somatic mutations by comparing the genetic sequence of the cancer tissue with the germline variants found by analyzing blood or normal tissue samples from the same patient. They also compare the SNPs against a more extensive set of variations, such as those seen within a region or a population. “Region-specific comparison of genomic data is crucial to eliminate false positives,” said Prof Amit Dutt, the study’s lead researcher, while speaking to India Science Wire.

The team from ACTREC found that the global genome databases do not adequately represent the genetic variation of the Indian population, specifically the variants that provide insights into the frequency of common and rare types of genetic disease. “The germline variations (hereditary factors) in the global databases are limited to the European populations and fall short of having genomes recorded for the Indian population,” said Prof Dutt.

So, the team developed a complete India-specific germline dataset and a corresponding open-source toolkit to bridge this gap. Biologists can use the toolkit’s graphical user interface to analyze the genomes of their samples and derive their own reference datasets.

TMC-SNPdb 2.0 integrates the recent sequencing efforts undertaken by the Genomics for Public Health in India (IndiGen) program of CSIR, Govt of India, and the GenomeAsia 100K initiative. It incorporates SNP details of 1029 healthy individuals from the IndiGen program, 529 from GenomeAsia, and 173 samples obtained from cancer patients at ACTREC. Further, using the tool, the team analyzed the whole-exome sequences of 224 samples, covering seven different types of Indian-origin tumours, available at Dutt Lab, ACTREC.

“To identify cancer-specific mutations, the harmless single nucleotide mutations or SNPs present in a non-cancerous tissue from the same individual and those reported in a public database of healthy individuals need to be removed,” explained Prof Dutt.
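The filtering step described here can be sketched as a set subtraction. This is a minimal illustration under stated assumptions, not the TMC-SNPdb 2.0 implementation: the function name `filter_somatic`, the variant key format, and all of the data are hypothetical.

```python
# A variant is keyed by (chromosome, position, reference base, alternate base).
Variant = tuple[str, int, str, str]


def filter_somatic(
    tumour: set[Variant],
    matched_normal: set[Variant],
    population_db: set[Variant],
) -> set[Variant]:
    """Keep tumour variants absent from both the matched normal and the population set."""
    return tumour - matched_normal - population_db


# Hypothetical data, invented for illustration only.
tumour = {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T"), ("chr3", 3000, "G", "A")}
matched_normal = {("chr1", 1000, "A", "G")}  # germline variant seen in the patient's normal tissue
population_db = {("chr2", 2000, "C", "T")}   # variant reported in healthy individuals

candidates = filter_somatic(tumour, matched_normal, population_db)
print(candidates)
```

If the population set is missing variants that are common in Indian genomes, those harmless variants pass through this filter and look like cancer mutations, which is the false-positive problem the study sets out to address.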

The comparisons give a clearer picture of what caused the cancer and how it could progress, facilitating a customized treatment strategy, an increasingly sought-after option as individuals respond to treatments differently.

In all, the toolkit has identified 305,132 unique variants.

“Around 88.86% of the variations were seen in the non-coding region of the genome (these are codes that do not translate to proteins). The remaining 11.13% were within the coding region. We also identified 10614 missense variants – single-point variations that alter the function of a protein entirely. These can be specifically labelled as ‘novel’ or ‘variants of unknown significance’ in any of the somatic (post-birth cancer types) analyses,” said Prof Dutt.
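For a rough sense of scale, the quoted percentages can be converted into approximate counts. This assumes that the percentages apply to the 305,132 unique variants, which the article does not state explicitly. The results are estimates derived from rounded percentages, not figures reported by the study.

```python
total_variants = 305_132        # unique variants reported for the toolkit
non_coding_share = 88.86 / 100  # share quoted for non-coding regions
coding_share = 11.13 / 100      # share quoted for coding regions

# Assumption: both shares are fractions of total_variants.
approx_non_coding = round(total_variants * non_coding_share)
approx_coding = round(total_variants * coding_share)

print(f"non-coding: ~{approx_non_coding:,}")
print(f"coding:     ~{approx_coding:,}")
```

Under that assumption, roughly 271,000 variants would fall in non-coding regions and about 34,000 in coding regions.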

[The database and toolkit package is available for download at http://www.actrec.gov.in/pi-webpages/AmitDutt/TMCSNPdb2/TMCSNPdb2.html ]

The team comprised Sanket Desai, Rohit Mishra, Suhail Ahmad, Supriya Hait, Asim Joshi, and Amit Dutt. The study was funded by the Department of Biotechnology, Govt of India, and published in the journal Database (Oxford Academic). (India Science Wire)