A biologically annotated neural network for proteomic discovery in Parkinson’s disease

medRxiv

Machine learning models that can use high-dimensional data to make predictions and derive biological insights can improve our understanding of disease. Here, we develop a biologically annotated neural network model for proteomics data (P-BANN) that has several practical advantages: (1) it incorporates known relationships between proteins and signaling pathways into its architecture; (2) it uses Bayesian principles to perform variable selection, identifying the proteins most important for a disease of interest; and (3) it combines structured and black-box variational inference to analyze different classes of phenotypes at scale. To demonstrate the value of the approach, we apply P-BANN to one of the most common neurodegenerative disorders, Parkinson’s disease (PD). We consider two biomarker-defined phenotypes within the PD population: the presence of neuronal-predominant aggregated α-synuclein in cerebrospinal fluid, and changes in striatal dopamine transporter binding on imaging. By considering biomarkers of both neuropathological hallmarks of PD, we can examine the extent to which their underlying biology is connected. Using the P-BANN framework, we discover sparse, statistically calibrated sets of proteins that map to pathways, enabling more straightforward interpretation and the generation of testable hypotheses.
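To make points (1) and (2) more concrete, the following is a minimal NumPy sketch of the general idea, not the P-BANN implementation. It assumes a binary protein-to-pathway annotation mask that zeroes out connections between proteins and pathways they are not annotated to, and it stands in for Bayesian variable selection with a per-protein inclusion probability that scales each protein's input (the expectation of a spike-and-slab style gate). The toy annotation, simulated data, 0.5 selection cutoff, and all function names are hypothetical, and the variational inference in point (3) is not shown.

```python
import numpy as np

# Hypothetical toy annotation: 6 proteins (rows) x 3 pathways (columns).
# MASK[j, k] = 1 means protein j is annotated to pathway k (assumed input).
MASK = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pathway_layer(X, W, b, logit_alpha, mask=MASK):
    """Protein -> pathway hidden layer (illustrative sketch only).

    Weights for protein-pathway pairs absent from the annotation are zeroed
    by the mask, so each hidden unit only receives its own pathway's proteins.
    Each protein input is scaled by its inclusion probability alpha_j, i.e.
    the expected value of a Bernoulli gate (an assumed simplification).
    """
    alpha = sigmoid(logit_alpha)          # shape (n_proteins,)
    Z = (X * alpha) @ (W * mask) + b      # shape (n_samples, n_pathways)
    return np.maximum(Z, 0.0)             # ReLU activation


def selected_proteins(logit_alpha, threshold=0.5):
    """Indices of proteins whose inclusion probability exceeds the
    (assumed) threshold."""
    return np.flatnonzero(sigmoid(logit_alpha) > threshold)


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6))               # 4 simulated samples x 6 proteins
W = rng.normal(size=(6, 3))               # protein -> pathway weights
b = np.zeros(3)
logit_alpha = np.array([3.0, -3.0, 2.0, 0.0, -1.0, 4.0])

H = pathway_layer(X, W, b, logit_alpha)
print(H.shape)                            # (4, 3)
print(selected_proteins(logit_alpha))     # [0 2 5]
```

Because the mask is applied elementwise to the weights, changing a weight for an unannotated protein-pathway pair has no effect on the output; this is what keeps each hidden unit interpretable as a pathway. In a full Bayesian treatment the inclusion probabilities would be learned as posterior quantities rather than fixed by hand as here.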