Recent technological advances enable the study of hundreds of thousands of human single-nucleotide polymorphisms (SNPs) at the population level. However, strategies for analyzing these data have not kept pace with the laboratory methods that generate them, so these advances are unlikely to lead immediately to a better understanding of the genetic contribution to common human disease and drug response. Currently, no single analytical method can extract all available information from a genome-wide association study (GWAS). Indeed, no single method can be optimal for every dataset, particularly because the genetic architecture of disease varies substantially from one trait to another. An integrative platform that accommodates multiple analytical methods is therefore needed, one that can grow as we learn more about genetic architecture. To meet this need, we are developing PLATO (PLatform for the Analysis, Translation, and Organization of large-scale data), a system for analyzing genome-wide association data that incorporates several analytical approaches as filters for selecting the important SNPs in a study, so that a scientist can choose whichever methods they wish to apply.
A filter is an analytical or knowledge-based method used to select a subset of interesting SNPs from a larger set. Version 1.0 of PLATO includes eight filters that can identify such SNP subsets and test them for association. The rationale for PLATO rests on the premise that any single analytical scheme may fail to reveal all important results, and that different filters can yield different results. The results from one filter can then be interpreted in light of those from other filters to better understand the meaning of the genetic data. Because multiple filters can be applied, no a priori assumptions need to be made about how the genetic components of a phenotype act, which allows the most general possible analysis and interpretation.
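To make the filter idea concrete, here is a minimal Python sketch, assuming a filter can be modeled as a function that takes per-SNP data and returns the set of SNP IDs it keeps. It is not PLATO's implementation: the function names (`make_allelic_filter`, `make_candidate_gene_filter`), the allele counts, and the gene labels are all hypothetical. The analytical filter uses a standard Pearson chi-square allelic test on a 2x2 table; the knowledge-based filter keeps SNPs mapped to a candidate-gene list. Comparing their outputs with set operations shows how results from different filters can be read side by side.

```python
import math
from typing import Callable, Dict, Set, Tuple

# Hypothetical per-SNP allele counts: (case_minor, case_major, control_minor, control_major).
AlleleCounts = Tuple[int, int, int, int]
# A filter maps per-SNP data to the subset of SNP IDs it selects.
Filter = Callable[[Dict[str, AlleleCounts]], Set[str]]


def allelic_chi_square(counts: AlleleCounts) -> float:
    """Pearson chi-square for a 2x2 allele-by-status table, without continuity correction."""
    a, b, c, d = counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def make_allelic_filter(alpha: float) -> Filter:
    """Analytical filter (hypothetical): keep SNPs whose allelic test p-value is below alpha."""
    def apply(data: Dict[str, AlleleCounts]) -> Set[str]:
        return {snp for snp, counts in data.items()
                if chi2_1df_pvalue(allelic_chi_square(counts)) < alpha}
    return apply


def make_candidate_gene_filter(snp_to_gene: Dict[str, str], genes: Set[str]) -> Filter:
    """Knowledge-based filter (hypothetical): keep SNPs that map to a candidate gene."""
    def apply(data: Dict[str, AlleleCounts]) -> Set[str]:
        return {snp for snp in data if snp_to_gene.get(snp) in genes}
    return apply


# Toy data, invented for illustration only.
data = {
    "rs1": (60, 140, 40, 160),
    "rs2": (50, 150, 50, 150),
    "rs3": (80, 120, 50, 150),
}
snp_to_gene = {"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C"}

by_test = make_allelic_filter(alpha=0.05)(data)
by_knowledge = make_candidate_gene_filter(snp_to_gene, {"GENE_B", "GENE_C"})(data)

print("selected by both:", sorted(by_test & by_knowledge))           # ['rs3']
print("selected by either:", sorted(by_test | by_knowledge))         # ['rs1', 'rs2', 'rs3']
print("only by the association test:", sorted(by_test - by_knowledge))  # ['rs1']
```

Because each filter returns a plain set, adding another method only means writing another function with the same signature; the comparison step does not change.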
Although PLATO has many filters equipped to test single-locus genetic associations, only one current filter can search for epistatic (gene-gene) interactions in genetic data. We are currently implementing several new filters for identifying gene-gene interactions. The theory behind applying multiple interaction-searching filters to a single dataset is that no single method is best equipped to detect all types of genetic models. Particularly for multi-locus interactions, the model space covering all possible types of genetic effects grows very large, and it is unlikely that any one search strategy will perform best across all of it. Applying multiple methods should therefore give us higher power to identify epistatic effects in a dataset.
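A quick calculation shows how fast this model space grows. The sketch below assumes 500,000 biallelic SNPs (a stand-in for the "hundreds of thousands" of SNPs described earlier) and, purely for illustration, counts a k-locus model as a choice of k SNPs plus a high- or low-risk label for each of the 3^k genotype combinations. Real interaction methods parameterize genetic effects in other ways, so treat the labeling count as one illustrative measure of model diversity, not a property of any PLATO filter.

```python
from math import comb
from typing import Tuple


def model_space(n_snps: int, k: int) -> Tuple[int, int, int]:
    """Size of the search space for k-SNP models among n_snps biallelic SNPs.

    Returns (SNP combinations, genotype cells per combination,
    high/low-risk labelings of those cells per combination).
    """
    combos = comb(n_snps, k)   # ways to choose k SNPs
    cells = 3 ** k             # each biallelic SNP has 3 genotypes
    labelings = 2 ** cells     # assumption: each cell is labeled high or low risk
    return combos, cells, labelings


for k in (1, 2, 3):
    combos, cells, labelings = model_space(500_000, k)
    print(f"k={k}: {combos:,} SNP combinations x {labelings:,} risk patterns over {cells} cells")
# k=1: 500,000 SNP combinations x 8 risk patterns over 3 cells
# k=2: 124,999,750,000 SNP combinations x 512 risk patterns over 9 cells
# k=3: 20,833,208,333,500,000 SNP combinations x 134,217,728 risk patterns over 27 cells
```

Both factors grow quickly with k, which is why a search strategy tuned to one region of this space can easily miss effects elsewhere.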
Two complicating factors in the statistical analysis of genome-wide association data are multiple testing and establishing the biological significance of a statistical result. To address these issues, we are applying a biofilter that draws on a database of metabolic and regulatory pathway information to identify biologically plausible genetic models. Layers of biological machinery lie between genetic variants and the phenotypes they influence; adding this extra dimension of known biological information to the statistical analysis may help identify relationships among genetic variants that contribute to common complex disease.
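The sketch below illustrates, under stated assumptions, how restricting tests to biologically plausible models can ease the multiple-testing burden. It assumes a simplified biofilter rule that keeps only SNP pairs whose genes share at least one pathway; the SNP-to-gene and gene-to-pathway tables are invented, and the function names are hypothetical rather than part of PLATO or the biofilter's actual interface. A Bonferroni correction is used only as a simple example of a multiple-testing threshold.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Invented annotations; a real biofilter would draw these from pathway databases.
snp_to_gene: Dict[str, str] = {
    "rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C", "rs4": "GENE_D", "rs5": "GENE_E",
}
gene_to_pathways: Dict[str, Set[str]] = {
    "GENE_A": {"PATHWAY_1"},
    "GENE_B": {"PATHWAY_1", "PATHWAY_2"},
    "GENE_C": {"PATHWAY_2"},
    "GENE_D": {"PATHWAY_3"},
    "GENE_E": set(),  # gene with no pathway annotation
}


def plausible_pairs(snps: List[str]) -> List[Tuple[str, str]]:
    """Keep SNP pairs whose mapped genes share at least one pathway (hypothetical rule)."""
    kept = []
    for s1, s2 in combinations(sorted(snps), 2):
        g1, g2 = snp_to_gene.get(s1), snp_to_gene.get(s2)
        if g1 is None or g2 is None:
            continue  # unmapped SNPs carry no pathway evidence
        if gene_to_pathways.get(g1, set()) & gene_to_pathways.get(g2, set()):
            kept.append((s1, s2))
    return kept


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at most alpha."""
    return alpha / n_tests


snps = list(snp_to_gene)
all_pairs = list(combinations(sorted(snps), 2))
bio_pairs = plausible_pairs(snps)

print(len(all_pairs), bonferroni_threshold(0.05, len(all_pairs)))  # exhaustive pairwise scan
print(len(bio_pairs), bonferroni_threshold(0.05, len(bio_pairs)))  # pathway-restricted scan
print(bio_pairs)
```

With these toy tables, an exhaustive pairwise scan needs 10 tests while the pathway restriction leaves 2, loosening the Bonferroni threshold by a factor of five. The trade-off is that interactions outside annotated pathways are never tested.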