Now researchers can explore genomic data across space and time

April 13, 2014

3 Bullets:

  • Understanding systems from a multiscale perspective gives us a more detailed and holistic view of how features or functions from each scale connect and interact in a given system.
  • The challenge is integrating the different types of information that come from each scale in an efficient way that yields the most insight.
  • ISB developed a new tool to make it easier for researchers to integrate, analyze and visualize human genome data at multiple resolutions.

By Varsha Dhankani

Most living and nonliving systems around us have important features at different resolutions – or scales – of space and time. Understanding such systems from a multiscale perspective gives us a more detailed and holistic view of how features or functions from each scale connect and interact in a given system.

Consider Google or Bing Maps. At the global scale of 30,000 feet, we can locate continents, countries and oceans. As we progressively zoom in, we can view cities, streets, homes and buildings. Having the ability to zoom in or out allows meteorologists, for example, to study weather patterns in a concentrated location or more broadly on the global scale.

Human genomes and the underlying biological mechanisms are no exception to this multiscale spatial and temporal nature. The challenge is integrating the different types of information that come from each scale in an efficient way that yields the most insight.

Scientists at ISB have developed a framework to generate a multiscale mapping of the human genome. The research “Multiscale representation of genomic signals” was published online on April 13 in the journal Nature Methods (in advance of the June print issue). The framework allows researchers to integrate, analyze and visualize information across multiple resolutions of the human genome. [https://github.com/tknijnen/MSR]

In general, biological information is encoded at scales of varying lengths of the genome, and it is crucial to integrate this information across multiple scales to gain a systems-level understanding of the biological mechanisms in health and disease.

This multiscale signal representation (MSR) method divides the genome into a hierarchical organization with single bases at the top progressively percolating information into larger segments of the DNA at the bottom of the hierarchy. This hierarchy enables researchers to summarize genomic information at multiple genomic scales.
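To make the idea of summarizing a signal at several scales concrete, here is a minimal Python sketch. It is a simplification and does not reproduce the published MSR method (see the paper and the repository for that): it assumes fixed windows that double in width at each level and a plain average within each window. The function name `multiscale_means` and the toy signal are hypothetical and exist only for illustration.

```python
from typing import List


def multiscale_means(signal: List[float], num_scales: int) -> List[List[float]]:
    """Summarize a per-base signal at window sizes of 1, 2, 4, ... bases.

    Illustrative assumption: fixed, doubling windows with a plain mean.
    This is NOT the published MSR algorithm, only a toy analogue.
    """
    if num_scales < 1:
        raise ValueError("num_scales must be at least 1")
    levels = []
    for scale in range(num_scales):
        window = 2 ** scale  # window width in bases at this scale
        level = []
        for start in range(0, len(signal), window):
            chunk = signal[start:start + window]  # last chunk may be shorter
            level.append(sum(chunk) / len(chunk))
        levels.append(level)
    return levels


# Hypothetical per-base signal (for example, a methylation fraction at 8 positions).
toy_signal = [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
levels = multiscale_means(toy_signal, num_scales=4)
for scale, level in enumerate(levels):
    print(f"window={2 ** scale} bases: {level}")
```

In this toy version the first level is the single-base signal itself, and each further level describes the same stretch of DNA with fewer, wider segments, which loosely mirrors the zooming-out described above.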

The Science

For example, at a scale of 10 to 100 bases, binding sites on the DNA regulate the transcription of genes, and exons within a gene encode parts of the protein produced from that gene. At a scale of thousands of bases, DNA segments called CpG islands are susceptible to a DNA alteration called methylation that has been linked with silencing of tumor suppressor genes. Finally, at a scale of megabases, scientists study tightly packed DNA and protein structures like heterochromatin and nuclear lamina-associated domains (LADs). Heterochromatin has been associated with gene regulation as well as protection of the integrity of chromosomes. LADs represent a repressive chromatin environment indicated by low gene-expression levels in these regions.

The MSR approach was used to analyze both murine (mouse-model) data and human colon cancer data to demonstrate that genomic signals contain functional information at multiple scales. For the murine data, predictive models based on a multiscale representation of histone acetylation (a gene regulation mechanism) predicted the activation state of genes more accurately than conventional single-scale approaches.

Similarly, a multiscale representation of DNA methylation based on human colon cancer data revealed that the relationship between DNA methylation and gene activation depends on the scale. DNA methylation has been recognized as an important component in organismal development as well as cancer onset and progression. The MSR approach confirmed the current understanding of the inverse correlation between methylation and gene expression at the scale of promoter regions of DNA.

By studying multiple scales concurrently, scientists at ISB observed a scale-dependent relationship between methylation and gene expression: Upregulated genes were associated with hypomethylation at small scales but hypermethylation at larger scales. Understanding such epigenetic processes at different scales can help scientists gain a better understanding of the organismal developmental process, as well as point to alternative targets for cancer therapy. The MSR approach also saves researchers from the complex task of having to narrow down the genomic areas of interest in advance.

Results from the MSR method encourage reformulation of biological experiments to expand beyond a predominantly gene-centric view of the genome to understand events that occur at much smaller or larger scales. MSR presents a novel and powerful way to unravel the biological information that can’t be observed at any single scale in isolation.

Just as with Google Maps, scientists can now zoom in and out of the human genome map and make discoveries that were not possible with a single-scale map.

– Theo Knijnenburg contributed to this report

Journal: Nature Methods, online April 13, 2014
Title: Multiscale representation of genomic signals
Authors: Theo A. Knijnenburg, Stephen A. Ramsey, Benjamin P. Berman, Kathleen A. Kennedy, Arian F.A. Smit, Lodewyk F.A. Wessels, Peter W. Laird, Alan Aderem, Ilya Shmulevich