The Most Powerful Tool for Reconstructing a Gene Network

isbscience.org/news/2015/04/17/the-most-powerful-tool-for-reconstructing-a-gene-network/

Posted on April 17, 2015

Scanning EM of bacteria being eaten by white blood cell
Photo Credit: Adrian Ozinsky

3 Bullets:

Nearly a decade ago, ISB’s Baliga Lab published a landmark paper describing cMonkey, an innovative method to accurately map gene networks within any organism from microbes to humans.
Two new papers describe the benchmark results of cMonkey and also the release of cMonkey2, which performs with higher accuracy.
Using this approach, genetic and molecular data generated from any organism, be it a bacterium or a birch tree, can be explored and analyzed from a network perspective.

By Arjun Raman

For most of the history of biology, data was a limiting quantity that was painstakingly gathered and meticulously curated. However, the meteoric rise of sequencing technology, accompanied by the parallel emergence of computing, has fueled a recent data explosion in biology. Such novel technologies allow scientists to ask new questions and revisit old ones with a fresh perspective, but they also bring new problems when interpreting the deluge of information.

Nearly a decade ago, members of the Baliga group at the Institute for Systems Biology published a landmark paper describing cMonkey, an innovative method to accurately map gene networks within any organism from microbes to humans. The method takes advantage of expanding biological datasets and computational power, and has since been applied to many different organisms across a huge range of publicly and privately generated experimental data.

Prior to cMonkey, it was common practice to group sets of genes that had similar expression levels across the experimental conditions using a process termed clustering. However, this practice inherently assumes that gene clusters are static. While this assumption may apply in limited circumstances, such as a binary treatment vs. control experimental setup, it may not be true across a wide variety of conditions. Thus, ISB researchers set out to create a novel method to cluster both genes and conditions simultaneously, an approach known as biclustering. Biclustering is a difficult computational problem: its common formulations are NP-hard, so practical algorithms rely on heuristics and no solution is guaranteed to be optimal. Nevertheless, biclustering algorithms including cMonkey have demonstrated practical value in dealing with real biological data.
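
To make the difference between clustering and biclustering concrete, here is a minimal sketch on made-up numbers. It is not the cMonkey algorithm; the gene names, expression values, and helper functions are all hypothetical. It only shows that a group of genes can look tightly co-expressed over a subset of conditions while looking less coherent over all of them, which is the situation a bicluster (a set of genes plus a set of conditions) is meant to capture.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def mean_pairwise_correlation(expr, genes, conditions):
    """Average correlation over all gene pairs, restricted to `conditions`."""
    profiles = [[expr[g][c] for c in conditions] for g in genes]
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical expression matrix: one row per gene, six conditions (columns 0-5).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 0.5, 3.0],
    "geneB": [1.1, 2.1, 2.9, 4.2, 3.0, 0.2],
    "geneC": [0.9, 1.8, 3.2, 3.9, 1.5, 1.6],
}
genes = list(expr)

# Ordinary clustering judges the genes over every condition.
overall = mean_pairwise_correlation(expr, genes, range(6))

# A bicluster pairs the genes with only the conditions where they agree.
within = mean_pairwise_correlation(expr, genes, [0, 1, 2, 3])

print(f"all conditions: {overall:.2f}, conditions 0-3 only: {within:.2f}")
```

In this toy case the correlation restricted to conditions 0-3 is much higher than the correlation over all six conditions. A real biclustering method must also search for the gene and condition subsets, which is where the computational difficulty lies.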

Journal: Nucleic Acids Research
Title: cMonkey2: Automated, systematic, integrated detection of co-regulated gene modules for any organism
Authors: David J. Reiss, Christopher L. Plaisier, Wei-Ju Wu, Nitin S. Baliga
Link: paper

In an update, published on April 15 in the journal Nucleic Acids Research, ISB researchers present the evolution of cMonkey. The paper describes the updated algorithm, cMonkey2, and assesses its performance against alternative platforms using three distinct datasets from two different bacterial organisms and human cancer cells. Detailed performance benchmarks demonstrate the efficacy of cMonkey2 in accurate network reconstruction as well as its broad applicability across cell types. Performance aside, a major change to the software is that it has been converted from being solely a biclustering algorithm to a biclustering and data integration platform. The new platform enables easy integration of many additional data types, allowing users to expand the data classes to include categories we have not thought of yet.

Indeed, one of the signature features of cMonkey is its ability to integrate additional relevant sources of information. For example, many metabolic pathways have been experimentally defined in baker's yeast and various bacteria. Such pathway information can be assimilated as an association network that guides the algorithm in finding parsimonious biclusters. Various association networks based on interactions among proteins, DNA, and other molecules can also be used to aid this process. In the end, the clusters that arise are the ones that can best account for all the disparate types of data that are fed into the algorithm. This integrative approach gave cMonkey an advantage compared to other biclustering algorithms when it was released, and continues to be a distinguishing characteristic in its updated form.
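
The idea of letting an association network guide cluster membership can be illustrated with a small sketch. This is an assumption-laden toy, not cMonkey's actual scoring function: the function names, the weights (0.7 and 0.3), the gene names, and the expression-similarity scores are all invented for illustration. It ranks candidate genes for an existing cluster by a weighted sum of an expression-based score and the fraction of cluster members the candidate is linked to in the network.

```python
def network_support(candidate, members, edges):
    """Fraction of current cluster members linked to the candidate gene."""
    linked = sum(1 for m in members if frozenset((candidate, m)) in edges)
    return linked / len(members)


def combined_score(expr_score, net_score, w_expr=0.7, w_net=0.3):
    """Weighted sum of two evidence types (weights are illustrative only)."""
    return w_expr * expr_score + w_net * net_score


# Hypothetical undirected association network (e.g. shared pathway membership).
edges = {
    frozenset(pair)
    for pair in [("geneA", "geneB"), ("geneA", "geneD"), ("geneB", "geneD")]
}

# Genes already in the cluster, and two candidates with made-up
# expression-similarity scores in [0, 1].
members = ["geneA", "geneB"]
expr_scores = {"geneD": 0.80, "geneE": 0.85}

# Rank candidates by the combined evidence, best first.
ranked = sorted(
    expr_scores,
    key=lambda g: combined_score(
        expr_scores[g], network_support(g, members, edges)
    ),
    reverse=True,
)
print(ranked)
```

Here geneE has the slightly better expression score, but geneD is linked to both cluster members in the network, so the combined score ranks geneD first. Changing the weights shifts how much each data type influences the result.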

Journal: BMC Systems Biology
Title: “Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions”
Authors: Samuel A Danziger, David J Reiss, Alexander V Ratushny, Jennifer J Smith, Christopher L Plaisier, John D Aitchison and Nitin S Baliga
Link: paper

There is a major opportunity to build bicluster networks from plentiful publicly available consortium datasets generated by multiple independent laboratories. This opportunity also presents a challenge because variation between the sources can introduce noise that reduces bicluster quality. In a paper published on April 15 in the journal BMC Systems Biology, ISB researchers have added a new metric to the cMonkey scoring algorithm to improve the quality of biclusters when dealing with highly variable source datasets. This metric improved the accuracy of condition-specific gene clustering, including a demonstrable enhancement in predicting a physiological response to nutrient shifts in yeast cells.

Beyond including the new bicluster quality metric, the revised cMonkey2 is now modularized to ease the incorporation of additional data types, as well as adjustment of the weights those data types receive in the biclustering calculations. The resulting biclusters are easily interrogated using an intuitive web-based framework, and the data can be further analyzed and visualized using additional software that has been developed by the Baliga laboratory, different groups at the ISB, and the rest of the scientific community. A final note worth mentioning is the programming language: originally built in the statistical modeling environment R, the updated version has been rewritten in Python, one of the most widely used programming languages today.

The benchmark results in the paper speak for themselves, but suffice it to say this is a uniquely comprehensive and powerful tool for modern systems biology research. To encourage widespread adoption of cMonkey2, the documentation for usage and development has been updated and expanded. Using this approach, genetic and molecular data generated from any organism, be it a bacterium or a birch tree, can be explored and analyzed from a network perspective.

About Dr. Arjun Raman: Arjun is a postdoc in the Baliga Lab. He is also a member of ISB’s Editorial Board.

