ISB News

ISB Releases Kaviar, World’s Largest Public Catalog of Human Genomic Variation

3 Bullets:

  • Kaviar is ISB’s comprehensive catalog of human genomic variation
  • Kaviar combines 31 data sources for a total of 151 million single nucleotide variants (SNVs), covering 5% of all the positions in the human genome
  • A researcher studying possible disease-causing variants can use Kaviar to answer the question, “Have these variants been observed before, and if so, how often?”

By Terry Farrah

Any two human genomes are typically 99.9% identical. To understand what makes one person more prone to disease than another, it is the differences that matter. At ISB we have collected these differences from data sources covering nearly 72,000 individuals and integrated them into a catalog called Kaviar (Known Variants).

At the turn of the century a massive effort was made to determine the complete 3 billion nucleotide sequence of a single human genome (actually a composite of the genomes of several individuals). The result is called the reference genome. A typical human has 2 to 7 million differences from the reference genome, most of which are single nucleotide variants (SNVs). A scientist studying a disease may take the genomes of many people with that disease and see which SNVs they have in common. The scientist will then want to know, for each such SNV, how often it has been observed in the population at large. Kaviar provides a useful initial estimate.

Kaviar was first created four years ago by ISB Senior Research Scientist Gustavo Glusman and co-workers. At the time, only a small number of human genome sequences were publicly available. Glusman and co-worker Denise Mauldin continued to develop Kaviar and added new genomes as they became available. Then, over the past six months, Glusman and co-worker Terry Farrah made a concerted effort to add to Kaviar all publicly available whole genome and exome data sources (the exome is the portion of the genome that encodes proteins). They also added the genomes of 780 unrelated individuals sequenced by ISB. The resulting database, along with an enhanced user interface, was released January 14, 2015. This release of Kaviar contains 151 million different SNVs: some occur in most genomes, some occur in a small percentage, and the majority occur quite rarely (e.g., <0.1%). Many of these rare SNVs are expected to be sequencing errors, and the Kaviar user must bear this in mind. Kaviar also catalogs 27 million small insertions, deletions, and substitutions.

Of the ~72,000 individuals covered, 12% are from whole genome sequence data sources such as the UK10K and 1000 Genomes Projects, ISB’s 780, the Genomes of the Netherlands, and the Wellderly Project. The remainder are from exome data sources, most notably the Exome Aggregation Consortium (ExAC). Anyone can look up variants in these sources directly, but doing so one source at a time is tedious. Kaviar lets anyone look up any variant through a simple, fast web interface. Further, Kaviar calculates the overall frequency of each variant across all Kaviar data sources. Results for a list of 30 variants are returned in about 2 seconds. The entire Kaviar database can also be downloaded in Variant Call Format (VCF), a format commonly used for describing the SNVs in one genome or a collection of genomes, for fast and private local use.
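To make the idea of an "overall frequency across data sources" concrete, here is a minimal Python sketch of one simple way such a number could be computed: pooling allele counts from every source. This is an illustration only, not a description of how Kaviar actually does it. The `SourceCount` class, the `pooled_frequency` function, and the numbers in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SourceCount:
    """Hypothetical per-source tally for one variant (not Kaviar's schema)."""
    source: str         # name of the data source
    allele_count: int   # number of times the variant allele was observed
    allele_number: int  # total number of alleles examined at that position


def pooled_frequency(counts: Iterable[SourceCount]) -> Optional[float]:
    """Pool observations across sources: sum of counts / sum of alleles examined.

    Returns None when no source covers the position, so that "never examined"
    is not confused with "examined but never observed".
    """
    counts = list(counts)
    total_observed = sum(c.allele_count for c in counts)
    total_examined = sum(c.allele_number for c in counts)
    if total_examined == 0:
        return None
    return total_observed / total_examined


# Made-up example: a variant seen 3 times among 1,000 alleles in one source
# and once among 3,000 alleles in another.
example = [
    SourceCount("source_a", 3, 1000),
    SourceCount("source_b", 1, 3000),
]
print(pooled_frequency(example))
```

Pooling counts weights each source by how many alleles it examined, so a large exome collection influences the result more than a small set of whole genomes. Averaging the per-source frequencies instead would give every source equal weight regardless of size.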

This work was funded by Inova Translational Medicine Institute.
