With the recent decrease in the cost of genome sequencing, interest has grown in rare variants and in methods to detect their association with disease. We developed BioBin, a flexible collapsing method guided by biological knowledge that automates the binning of low-frequency variants for association testing. We also built the Library of Knowledge Integration (LOKI), a repository of data assembled from public databases. LOKI contains resources such as dbSNP and Entrez Gene information from the National Center for Biotechnology Information (NCBI), pathway information from Gene Ontology (GO), Protein families database (Pfam), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, NetPath (signal transduction pathways), Open Regulatory Annotation Database (ORegAnno), Biological General Repository for Interaction Datasets (BioGRID), Pharmacogenomics Knowledge Base (PharmGKB), Molecular INTeraction database (MINT), and evolutionarily conserved regions (ECRs) from the UCSC Genome Browser. The novelty of BioBin is its access to comprehensive, knowledge-guided, multi-level binning. For example, bin boundaries can be formed using genomic locations from functional regions, evolutionarily conserved regions, genes, and/or pathways.
We tested BioBin using simulated data and 1000 Genomes Project low-coverage data, evaluating the method with simulated causative variants and with a pairwise comparison of rare variant (MAF < 0.03) burden differences between Yoruba individuals (YRI) and individuals of European descent (CEU). Lastly, we analyzed the NHLBI GO Exome Sequencing Project Kabuki dataset, using Complete Genomics data as controls. Kabuki syndrome is a congenital disorder that affects multiple organs and often involves intellectual disability.
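The general idea of collapsing rare variants into region-based bins can be illustrated with a minimal Python sketch. This is not BioBin's implementation: the gene names, coordinates, variant records, and the function `bin_rare_variants` are all hypothetical, and the region table merely stands in for boundaries that a knowledge source such as LOKI would supply. Only the MAF cutoff of 0.03 comes from the text above.

```python
from collections import defaultdict

# Hypothetical gene boundaries: name -> (chromosome, start, end), inclusive.
# A real analysis would obtain these from a knowledge source; these are made up.
GENE_REGIONS = {
    "GENE_A": ("1", 100, 500),
    "GENE_B": ("1", 800, 1200),
}

# Hypothetical variants: (chromosome, position, minor allele frequency).
VARIANTS = [
    ("1", 150, 0.01),
    ("1", 300, 0.20),   # common variant, excluded by the MAF cutoff
    ("1", 450, 0.02),
    ("1", 900, 0.005),
    ("1", 2000, 0.01),  # rare, but falls outside every region
]

MAF_THRESHOLD = 0.03  # rare variant cutoff mentioned in the text


def bin_rare_variants(variants, regions, maf_threshold=MAF_THRESHOLD):
    """Assign each rare variant to every region whose interval contains it."""
    bins = defaultdict(list)
    for chrom, pos, maf in variants:
        if maf >= maf_threshold:
            continue  # keep only variants below the rare variant cutoff
        for name, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos <= end:
                bins[name].append((chrom, pos))
    return dict(bins)


bins = bin_rare_variants(VARIANTS, GENE_REGIONS)
# Number of rare variants collapsed into each bin.
bin_counts = {name: len(members) for name, members in bins.items()}
print(bin_counts)
```

Swapping the hypothetical gene table for pathway or conserved-region intervals would give a different binning level with the same logic, which is the sense in which the binning is multi-level. Per-bin counts computed separately for two groups could then be compared in an association or burden test.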
- Using BioBin to explore rare variant population stratification. Carrie B. Moore, John R. Wallace, Alex T. Frase, Sarah A. Pendergrass, Marylyn D. Ritchie, 332-343, Pacific Symposium on Biocomputing, 2013, PMID: 23424138, PMCID: PMC3638724
- BioBin: a bioinformatics tool for automating the binning of rare variants using publicly available biological knowledge. Carrie B. Moore, John R. Wallace, Alex T. Frase, Sarah A. Pendergrass, Marylyn D. Ritchie, 6 Suppl 2, BMC Medical Genomics, 2013, PMID: 23819467, PMCID: PMC3654874