Computational genomics for disease gene discovery is in its infancy, and existing methods, which focus on detecting and modeling multi-factor disease susceptibility, only scratch the surface of possible approaches. A rigorous analytical strategy includes three components: variable selection, statistical modeling, and model interpretation. Many methods have been developed to address some of these issues. However, few, if any, of these methods can incorporate multiple data types into a single analysis. In addition, most of them ignore domain knowledge that could help separate valuable biological signals from the dark sea of statistical noise.
ATHENA: the Analysis Tool for Heritable and Environmental Network Associations is a solution for the dissection of genetic architecture in common, complex disease. ATHENA will provide a mechanism to 1) perform variable selection from categorical and continuous independent variables, 2) model single factor and/or interaction effects to predict continuous or categorical outcomes, and 3) interpret or annotate the significant statistical models for use in biomedical research.
To provide this flexibility, we will build on our previously developed GENN (grammatical evolution neural network) strategy. GENN has been successful in the analysis of case-control studies with categorical independent variables (i.e., SNPs and categorical environmental factors). Because the computational structure of GENN is flexible, it can evolve into ATHENA through a series of logical and natural extensions to the existing architecture.
First, we will extend GENN to combine multiple data sources and data types in one comprehensive analysis. Neural networks (NN) can, in theory, approximate any complex function and accept both categorical and continuous inputs, which makes them well suited to this problem domain.
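As a rough illustration of how categorical and continuous variables can feed the same network, the sketch below one-hot encodes SNP genotypes and standardizes a continuous measurement before passing both to a single sigmoid node. The genotype coding (0, 1, 2), the function names, the weights, and the toy data are all assumptions made for this example; they do not describe the encoding ATHENA itself uses.

```python
import math

def encode_genotype(g):
    """One-hot encode a SNP genotype coded 0, 1, or 2 (assumed coding)."""
    if g not in (0, 1, 2):
        raise ValueError(f"unexpected genotype: {g}")
    return [1.0 if g == k else 0.0 for k in (0, 1, 2)]

def standardize(values):
    """Scale a continuous variable to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    sd = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant column
    return [(v - mean) / sd for v in values]

def build_inputs(genotypes, continuous):
    """Return one input row per sample: three genotype indicators plus a z-score."""
    z = standardize(continuous)
    return [encode_genotype(g) + [zi] for g, zi in zip(genotypes, z)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def node(inputs, weights, bias):
    """A single network node: weighted sum of inputs passed through a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

# Hypothetical toy data: four samples, one SNP and one continuous measurement.
genotypes = [0, 1, 2, 1]
expression = [5.0, 7.0, 9.0, 7.0]
rows = build_inputs(genotypes, expression)

# Hypothetical weights; in practice these would be evolved or trained.
weights = [0.2, -0.4, 0.9, 1.5]
outputs = [node(row, weights, bias=0.0) for row in rows]
```

Each row mixes indicator variables with a scaled continuous value, so the node treats both data types uniformly as numeric inputs.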
Next, we will incorporate biological knowledge from the public domain. This knowledge will include information about how genes are related to one another in pathways, the hierarchical relationships among genes, protein-protein interactions, and protein structure. To achieve this, we will extract information from The Gene Ontology, The Database of Interacting Proteins, The Protein Families Database, The Kyoto Encyclopedia of Genes and Genomes, Reactome, and Biopath. Dr. Ritchie's research lab has developed a Biofilter that integrates all of this public domain information, and it will be used in ATHENA. This information will be incorporated into the grammar to facilitate a more intelligent variable selection process.
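One simple way to picture knowledge-guided variable selection is to count how many independent sources support a relationship between two genes and to prioritize the best-supported pairs. The sketch below is only that: the gene names, the source-to-pair mapping, and the ranking rule are hypothetical, and it is not the Biofilter's actual data model or scoring method.

```python
from collections import defaultdict

# Hypothetical toy annotations: each source lists gene pairs it relates.
SOURCE_PAIRS = {
    "GO": [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")],
    "KEGG": [("GENE_A", "GENE_B")],
    "DIP": [("GENE_B", "GENE_A"), ("GENE_C", "GENE_D")],
}

def pair_support(source_pairs):
    """Count the distinct sources supporting each unordered gene pair."""
    support = defaultdict(set)
    for source, pairs in source_pairs.items():
        for a, b in pairs:
            support[frozenset((a, b))].add(source)
    return {pair: len(sources) for pair, sources in support.items()}

def rank_pairs(source_pairs):
    """Return (pair, n_sources) tuples, best-supported first, ties by name."""
    counts = pair_support(source_pairs)
    named = ((tuple(sorted(pair)), n) for pair, n in counts.items())
    return sorted(named, key=lambda item: (-item[1], item[0]))

ranked = rank_pairs(SOURCE_PAIRS)
for pair, n_sources in ranked:
    print(pair, n_sources)
```

A ranking like this could then bias which variables are offered to the model first, so that pairs with support from several sources are explored before unsupported ones.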
While NN are the initial analysis approach, the grammar will be adapted to include other computational models. NN are extremely flexible, but there are many other powerful methods as well. With grammatical evolution, changing the type of computer program that is evolved requires only adapting the grammar. We have selected three additional modeling strategies to build into the ATHENA framework alongside NN: support vector machines (SVM), decision trees, and regression. Finally, the set of significant statistical models will be annotated using the public databases in the Biofilter so that the models can be interpreted biologically. This will address the interpretation challenges that many computational tools currently face.
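To make the role of the grammar concrete, here is a minimal sketch of the genotype-to-phenotype mapping used in grammatical evolution: each integer codon selects a production rule (codon modulo the number of choices) for the leftmost unexpanded symbol. The two toy grammars, the symbol names, and the `map_genome` helper are assumptions for illustration and are far simpler than the grammars distributed with ATHENA.

```python
# Hypothetical toy grammars; terminals are any symbols that are not keys.
VARS = [["x1"], ["x2"], ["x3"]]
WEIGHTS = [["0.5"], ["1.0"], ["2.0"]]

NN_GRAMMAR = {
    "<expr>": [
        ["sig(", "<w>", "*", "<expr>", "+", "<w>", "*", "<expr>", ")"],
        ["<var>"],
    ],
    "<var>": VARS,
    "<w>": WEIGHTS,
}

REGRESSION_GRAMMAR = {
    "<expr>": [
        ["<w>", "*", "<var>", "+", "<expr>"],
        ["<w>", "*", "<var>"],
    ],
    "<var>": VARS,
    "<w>": WEIGHTS,
}

def map_genome(genome, grammar, start="<expr>", max_wraps=2):
    """Expand the start symbol into a model string, or return None on failure.

    One codon is consumed per non-terminal expansion; the genome is reused
    (wrapped) up to max_wraps times before the mapping is declared invalid.
    """
    symbols = [start]
    output = []
    used = 0
    limit = len(genome) * (max_wraps + 1)
    while symbols:
        sym = symbols.pop(0)
        if sym not in grammar:
            output.append(sym)  # terminal: emit as-is
            continue
        if used >= limit:
            return None  # ran out of codons before the expansion finished
        choices = grammar[sym]
        codon = genome[used % len(genome)]
        used += 1
        symbols = list(choices[codon % len(choices)]) + symbols
    return "".join(output)

genome = [0, 1, 1, 0, 2, 1, 2]
print(map_genome(genome, NN_GRAMMAR))          # sig(1.0*x1+2.0*x3)
print(map_genome(genome, REGRESSION_GRAMMAR))  # 1.0*x2+2.0*x2+0.5*x2+0.5*x3
```

The same genome yields a small network under one grammar and a linear model under the other, without any change to the mapping code. Only the grammar differs.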
- ATHENA: A knowledge-based hybrid backpropagation-grammatical evolution neural network algorithm for discovering epistasis among quantitative trait Loci. Turner SD, Dudek SM, Ritchie MD. BioData Min. 2010 Sep 27;3(1):5.
- Grammatical Evolution of Neural Networks for Discovering Epistasis among Quantitative Trait Loci. Turner SD, Dudek SM, Ritchie MD. Lect Notes Comput Sci. 2010 Apr 7;6023:86-97.
- Initialization Parameter Sweep in ATHENA: Optimizing Neural Networks for Detecting Gene-Gene Interactions in the Presence of Small Main Effects. Holzinger ER, Buchanan CC, Dudek SM, Torstenson EC, Turner SD, Ritchie MD. Genet Evol Comput Conf. 2010;12:203-210.
- Conquering the Needle-in-a-Haystack: How Correlated Input Variables Beneficially Alter the Fitness Landscape for Neural Networks. Turner SD, Ritchie MD, Bush WS. Lect Notes Comput Sci. 2009 May 20;5483:80-91.
- ATHENA Optimization: The Effect of Initial Parameter Settings Across Different Genetic Models. Holzinger ER, Dudek SM, Torstenson EC, Ritchie MD. EvoBIO 2011 Conference.
- Incorporating Domain Knowledge into Evolutionary Computing for Discovering Gene-Gene Interaction. Turner SD, Dudek SM, Ritchie MD. 11th Int'l Conference on Parallel Problem Solving From Nature (PPSN), Lecture Notes in Computer Science. 6238:394-403. (2011)
- Initialization Parameter Sweep in ATHENA: Optimizing Neural Networks for Detecting Gene-Gene Interactions in the Presence of Small Main Effects. Emily R. Holzinger, Carrie C. Buchanan, Scott M. Dudek, Eric C. Torstenson, Stephen D. Turner, Marylyn D. Ritchie.
- ATHENA optimization sweep configuration and grammar files