About the Lab

Much biological data is semantic in nature, which makes it a difficult substrate for computation. My research interests center on organizing biological data in ways that make it more amenable to computation. I am particularly interested in developing ontologies that describe biological knowledge and that provide a means for detailed analysis of the associated data.


  • Sequence Ontology. The Sequence Ontology (SO) aims to unify the terminology used to describe biological sequences. It has been developed in conjunction with the model organism database groups to simplify data exchange and promote the development of computable genomic annotations.

The SO is curated and maintained by this lab. We are also developing software to facilitate using the ontology, and are having fun exploring genomic annotations.
  • ClinGen Initiative. The lab is contributing to two of the three NHGRI ClinGen grants for variant databases. This involves collaboration with the NCBI's ClinVar team. I am co-leading the Data Standards and IT group together with Sandy Aronson from Partners.
  • Metagenomics. The lab is collaborating with Robert Schlaberg in Pathology and the Yandell lab to build tools for the analysis of clinical metagenomic RNA-seq data. Our focus is community-acquired pneumonia.

Previous projects

  • Disease Annotations for Variants in Personal Genomes. This project is in collaboration with Omicia, a personal genome software company.
  • Gene Ontology. The Gene Ontology (GO) has given the biological community a unified vocabulary, which allows researchers both to communicate with each other effectively and to analyze large quantities of data. The GO is an ontology that describes the classes of molecular function, biological process and cellular location, and the relationships that hold between them. Many of the model organism databases use it to label what gene products do, what processes they are involved in and where they are located. These functional annotations are then used to search across genomes based on semantics rather than sequence similarity.
We are part of the Gene Ontology Consortium.
  • Ontologies for Public Health Informatics. This project is in collaboration with Catherine Staes at the University of Utah.


This lab is funded by the NIH-NHGRI through three grants: 1R01HG004341-01, to develop software to facilitate the adoption of the Sequence Ontology; an NHGRI U41, A Unified Clinical Genomics Database; and an NHGRI U01, A knowledge base for clinically relevant genes and variants.