I am a computer science PhD student at the University of Pennsylvania, advised by Dr. Yoseph Barash. My research interests lie at the intersection of machine learning, healthcare, and biology. Part of my current work uses deep learning methods to detect important RNA secondary structures, such as G-quadruplexes, which have been shown to have significant regulatory roles and are associated with cancers.
Before coming to Penn, I received a B.S. in applied mathematics from UCSD and an M.S. in computer science from Duke, where I was advised by Dr. Raluca Gordan and worked on transcription factor binding models.
PhD in Computer Science, 2021-Present
University of Pennsylvania
MS in Computer Science, 2020
Duke University
BS in Applied Mathematics, 2018
UC San Diego
G4mer: Transcriptome-wide mapping of RNA G-quadruplexes with an RNA language model
RNA G-quadruplex (rG4) formations are secondary structures that have been known to play an important role in cells and affect gene regulation. rG4 formations, if unstable, have been found to be linked to genetic diseases. Here, we introduce G4mer, an RNA language model that improves upon the current state-of-the-art models in predicting rG4 formations. G4mer detected variants affecting rG4s in breast cancer-associated genes. G4mer variant predictions were validated for their functional and structural effects both in situ and in vivo.
Presented as a long talk and poster at ISMB 2022 and ASHG 2023
Status: Published October 2024 bioRxiv (in preparation for journal submission)
APAeval
Alternative polyadenylation (APA) is an RNA-processing mechanism that generates distinct 3’ termini on transcripts, allowing a single gene to encode multiple variants. Studies have reported a critical role for APA-mediated gene regulation during development and disease. Given the importance of APA, identifying polyadenylation sites (PAS) and quantifying their usage on a global scale is critical for understanding the underlying mechanisms of APA-mediated gene regulation. Here, we evaluate computational methods for detecting and quantifying poly(A) sites and for estimating their differential usage across RNA-seq samples.
Presented as a long talk and poster at ISMB 2022
Status: Published October 2023 RNA Journal
Modeling Transcription Factor Binding
Cooperative DNA-binding by transcription factor (TF) proteins is critical for gene expression regulation in eukaryotes. Currently, the rules that drive cooperative versus independent binding of TFs to neighboring sites are not well understood. Here, we developed models of cooperative DNA-binding for TF proteins, using human factors ETS1 and RUNX1 as our case study, based on a high-throughput assay designed specifically to detect cooperativity.
Presented as a poster at RECOMB 2019
Status: Published October 2023 NAR
GENESIS: Gene-specific machine learning models for variants of uncertain significance
Genetic mutations found in the heart have been linked to arrhythmic heart disease. Here, we developed gene-specific machine learning models for variants of uncertain significance found in genes associated with catecholaminergic polymorphic ventricular tachycardia and long QT syndrome. Results from the models were used to help determine suitable treatments for patients who suffered from hereditary irregular heartbeat but had unclear genetic testing results.
Status: Published April 2022 Circulation: Arrhythmia and Electrophysiology
Predict mortality and allocate palliative care for older patients with hip fracture
Each year, 300,000+ older adults are treated for hip fracture in the United States. For the majority who suffer a fractured hip, the likelihood of full functional recovery remains low, and related poor health outcomes include permanent nursing home placement and excess mortality. Here, we investigate the use of machine learning models for mortality prediction in older adult hip fracture patients, using Medicare assessment and administrative claims data, with the overall goal of allocating palliative care resources more effectively.
Presented as a talk at South Carolina National Big Data Health Science Conference 2020
Status: Published Feb 2021 JAMDA
Leave me a message if you’d like to get in touch!