Biomedical and Translational Informatics Laboratory



Method Description:

The ever-growing wealth of biological information available through multiple comprehensive database repositories can be leveraged for advanced analysis of data. We have extensively revised and updated Biofilter, a multi-purpose software tool that allows researchers to annotate and/or filter data as well as generate gene-gene interaction models based on existing biological knowledge. Biofilter now includes the Library of Knowledge Integration (LOKI) for accessing and integrating existing comprehensive database information, with more flexibility in how ambiguity of gene identifiers is handled. We have also updated the way importance scores for interaction models are generated, and have used permutation testing to evaluate these scores. In addition, Biofilter 2.0 now works with a range of data types and formats, including single nucleotide polymorphism (SNP) identifiers, rare variant identifiers, base pair positions, gene symbols, genetic regions, and copy number variant (CNV) location information.


Biofilter provides a convenient single interface for accessing multiple publicly available human genetic data sources that have been compiled in LOKI, its supporting database. Information within LOKI includes genomic locations of SNPs and genes, as well as known relationships among genes and proteins such as interaction pairs, pathways, and ontological categories. Biofilter offers a flexible way to use the ever-expanding body of expert biological knowledge to direct filtering, annotation, and complex predictive model development for elucidating the etiology of complex phenotypic outcomes.

With Biofilter 2.0, researchers can:

  • Annotate genomic location or region-based data, such as results from association studies or CNV analyses, with relevant biological knowledge for deeper interpretation
  • Filter genomic location or region-based data on biological criteria, such as filtering a series of SNPs to retain only those present in specific genes within specific pathways of interest
  • Generate predictive models for gene-gene, SNP-SNP, or CNV-CNV interactions based on biological information, prioritizing the models to be tested by biological relevance, thus narrowing the search space and reducing the multiple hypothesis-testing burden
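
The three uses listed above can be illustrated with a small, self-contained Python sketch. It does not call Biofilter or LOKI: the gene regions, group memberships, SNP positions, and all function names are hypothetical toy values, and ranking gene pairs by the number of distinct supporting sources is a simplifying assumption, not Biofilter 2.0's importance score.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy knowledge; in practice this would come from LOKI.
GENE_REGIONS = {  # gene -> (chromosome, start, end)
    "GENE_A": ("1", 1000, 2000),
    "GENE_B": ("1", 5000, 6000),
    "GENE_C": ("2", 100, 900),
}
GROUPS = {  # (source, group name) -> member genes
    ("KEGG", "pathway_1"): {"GENE_A", "GENE_B"},
    ("Reactome", "pathway_2"): {"GENE_A", "GENE_B", "GENE_C"},
    ("GO", "term_3"): {"GENE_B", "GENE_C"},
}
SNPS = {  # SNP id -> (chromosome, position)
    "rs1": ("1", 1500),
    "rs2": ("1", 5500),
    "rs3": ("2", 500),
    "rs4": ("3", 42),
}


def annotate(snps, regions):
    """Map each SNP to the genes whose region contains its position."""
    annotation = {}
    for snp, (chrom, pos) in snps.items():
        annotation[snp] = sorted(
            gene
            for gene, (g_chrom, start, end) in regions.items()
            if g_chrom == chrom and start <= pos <= end
        )
    return annotation


def filter_by_group(annotation, groups, key):
    """Keep only SNPs that fall in a gene belonging to the chosen group."""
    members = groups[key]
    return sorted(
        snp for snp, genes in annotation.items() if members.intersection(genes)
    )


def gene_gene_models(groups):
    """Pair genes that share a group; rank pairs by distinct supporting sources."""
    support = defaultdict(set)
    for (source, _name), genes in groups.items():
        for pair in combinations(sorted(genes), 2):
            support[pair].add(source)
    ranked = [(pair, len(sources)) for pair, sources in support.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


annotation = annotate(SNPS, GENE_REGIONS)
kept = filter_by_group(annotation, GROUPS, ("KEGG", "pathway_1"))
models = gene_gene_models(GROUPS)
```

Restricting candidate pairs to genes that share at least one knowledge source is what shrinks the search space: only three gene pairs are proposed here, and the two with support from more than one source are ranked first.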

For more information on Biofilter features, please refer to the Biofilter manual.

The two available files, affy.txt.gz and illm.txt.gz, are tab-delimited descriptions of two-SNP genetic models for the Affymetrix SNP Chip 6.0 and the Illumina 1M HumanHap BeadArray. Each model is annotated with the databases that support it and the number of supporting data sources. Data sources include PFAM, KEGG, DIP, NetPath, Reactome, and GO.
Download Biofilter
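
Because the model files are gzip-compressed and tab-delimited, they can be read with the Python standard library alone. The following is a minimal sketch: the function name `read_models` is hypothetical, and since the column layout is not documented here, each row is returned as a plain list of strings without interpreting the fields.

```python
import csv
import gzip


def read_models(path):
    """Yield each non-empty row of a gzip-compressed, tab-delimited file.

    Rows are returned as lists of strings; the meaning of each column is
    not assumed here and should be checked against the file itself.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if row:
                yield row
```

For example, `next(read_models("affy.txt.gz"))` would return the first row of the Affymetrix file, which is a quick way to inspect the actual columns before further processing.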

Related Publications:

  • Ritchie MD. Using Biological Knowledge to Uncover the Mystery in the Search for Epistasis in Genome-Wide Association Studies. Annals of Human Genetics, 75(1): 172-82 (2011).
  • Ritchie MD. Using prior knowledge and genome-wide association to identify pathways involved in multiple sclerosis. Genome Medicine, 1:65 (2009). doi:10.1186/gm65
  • Grady BJ, Torstenson ES, McLaren PJ, de Bakker P, Haas DW, Robbins GK, Gulick RM, Haubrich R, Ribaudo H, Ritchie MD. Use of biological knowledge to inform the analysis of gene-gene interactions involved in modulating virologic failure with efavirenz-containing treatment regimens in ART-naïve ACTG clinical trials participants. Pacific Symposium on Biocomputing, 253-64 (2011).
  • Bush WS, Dudek SM, Ritchie MD. Biofilter: a knowledge-integration system for the multi-locus analysis of genome-wide association studies. Pacific Symposium on Biocomputing, 368-79 (2009).



Method Description:

Gene-centric analysis tools for genome-wide association study data are being developed both to annotate single locus statistics and to prioritize or group single nucleotide polymorphisms (SNPs) prior to analysis. These approaches require knowledge about the relationships between SNPs on a genotyping platform and genes in the human genome. SNPs in the genome can represent broader genomic regions via linkage disequilibrium (LD), and population-specific patterns of LD can be exploited to generate a data-driven map of SNPs to genes.

LD-Spline is a database routine that defines the genomic boundaries a particular SNP represents, using linkage disequilibrium statistics from the International HapMap Project. LD-Spline performs comparably to the four-gamete rule and the Gabriel et al. approach; however, as a SNP-centric approach, LD-Spline has the added benefit of systematically identifying a genomic boundary for every SNP, whereas global block-partitioning approaches may falter due to sampling variation in LD statistics.
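
To make the idea of a SNP-centric boundary concrete, the sketch below shows one simple way such a boundary could be derived from pairwise LD values. It is an illustration only and not LD-Spline's actual procedure: the function name `ld_boundary`, the input layout, and the 0.8 threshold are all assumptions.

```python
def ld_boundary(index_pos, ld_pairs, threshold=0.8):
    """Return the (left, right) positions an index SNP is taken to represent.

    index_pos: base pair position of the index SNP.
    ld_pairs:  iterable of (position, ld_value) for neighbouring SNPs, where
               ld_value is a pairwise LD statistic with the index SNP.
    threshold: assumed cutoff; neighbours at or above it count as linked.
    """
    linked = [pos for pos, ld in ld_pairs if ld >= threshold]
    # The index SNP always lies inside its own boundary.
    left = min(linked + [index_pos])
    right = max(linked + [index_pos])
    return left, right


neighbours = [(900, 0.9), (800, 0.5), (1200, 0.85), (1500, 0.95), (1600, 0.1)]
boundary = ld_boundary(1000, neighbours)
isolated = ld_boundary(1000, [(800, 0.5)])
```

Because the boundary is computed per SNP, every SNP receives an interval, even one with no linked neighbours, in which case the interval collapses to the SNP's own position.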

Download LD-Spline

Related Publication: