Farica Zhuang

Computer science PhD student

University of Pennsylvania

About me

I am a computer science PhD student at the University of Pennsylvania, advised by Dr. Yoseph Barash. My research interests lie at the intersection of machine learning, healthcare, and biology. Part of my current work uses deep learning methods to detect important RNA secondary structures such as G-quadruplexes, which have been shown to play significant regulatory roles and are associated with cancers.

Before coming to Penn, I received a B.S. in applied mathematics from UCSD and an M.S. in computer science from Duke, where I was advised by Dr. Raluca Gordan and worked on transcription factor binding models.

Interests
  • Machine learning
  • Healthcare
  • Biology
Education
  • PhD in Computer Science, 2021-Present

    University of Pennsylvania

  • MS in Computer Science, 2020

    Duke University

  • BS in Applied Mathematics, 2018

    UC San Diego

Research Projects

G4mer: Transcriptome-wide mapping of RNA G-quadruplexes with an RNA language model

RNA G-quadruplexes (rG4s) are secondary structures known to play an important role in cells and to affect gene regulation. rG4 formations, when unstable, have been linked to genetic diseases. Here, we introduce G4mer, an RNA language model that improves upon current state-of-the-art models in predicting rG4 formation. G4mer detected variants affecting rG4s in breast cancer-associated genes, and its variant predictions were validated for their functional and structural effects both in situ and in vivo.

Presented as a long talk and poster at ISMB 2022 and ASHG 2023

Status: Published October 2024 bioRxiv (in preparation for journal submission)

APAeval

Alternative polyadenylation (APA) is an RNA-processing mechanism that generates distinct 3’ termini on transcripts, allowing a single gene to encode multiple variants. Studies have reported a critical role for APA-mediated gene regulation in development and disease. Given the importance of APA, identifying polyadenylation sites (PAS) and quantifying their usage on a global scale is critical for understanding the underlying mechanisms of APA-mediated gene regulation. Here, we evaluate computational methods for detecting and quantifying poly(A) sites and for estimating their differential usage across RNA-seq samples.

Presented as a long talk and poster at ISMB 2022

Status: Published October 2023 RNA Journal

Modeling Transcription Factor Binding

Cooperative DNA-binding by transcription factor (TF) proteins is critical for gene expression regulation in eukaryotes. Currently, the rules that drive cooperative versus independent binding of TFs to neighboring sites are not well understood. Here, we developed models of cooperative DNA-binding for TF proteins, using human factors ETS1 and RUNX1 as our case study, based on a high-throughput assay designed specifically to detect cooperativity.

Presented as a poster at RECOMB 2019

Status: Published October 2023 NAR

GENESIS: Gene-specific machine learning models for variants of uncertain significance

Genetic mutations found in the heart have been linked to arrhythmic heart disease. Here, we developed gene-specific machine learning models for variants of uncertain significance found in genes associated with catecholaminergic polymorphic ventricular tachycardia and long QT syndrome. Results from the models were used to help determine suitable treatments for patients who suffered from hereditary irregular heartbeat but had unclear genetic testing results.

Status: Published April 2022 Circulation: Arrhythmia and Electrophysiology

Predicting mortality and allocating palliative care for older patients with hip fracture

Each year, more than 300,000 older adults are treated for hip fracture in the United States. For the majority of those who suffer a fractured hip, the likelihood of full functional recovery remains low, and related poor health outcomes include permanent nursing home placement and excess mortality. Here, we investigate the use of machine learning models for mortality prediction in older adult hip fracture patients using Medicare assessment and administrative claims data, with the overall goal of allocating palliative care resources more effectively.

Presented as a talk at the South Carolina National Big Data Health Science Conference 2020

Status: Published Feb 2021 JAMDA

Awards

Outstanding Teaching Assistant Award
HackXX Beginner Award Champion
Outstanding Student Award

Contact me

Leave me a message if you’d like to get in touch!