Research
We develop novel methods to extract insights from big data in genomics
Overview
Our research sits at the intersection of human genetics, omic technologies, and AI/ML to advance biomedical science. The sections below describe the three key research areas in my lab, followed by key publications (click on cover images to see papers).
Human Genetics and Functional Genomics
Our collaborative research has laid the groundwork for understanding how genetic variation leads to complex diseases across 50 major human tissues (Nature, 2017). Building on this foundational work, our research further elucidated the genetic underpinnings of coronary artery disease (AJHG, 2018; Nature Medicine, 2019), leptomeningeal disease (JTO, 2018), rare diseases (Nature Medicine, 2019), age-related macular degeneration (Communications Biology, 2019), Alzheimer’s disease (Nature Genetics, 2020), obesity (Cell Genomics, 2025), and Parkinson’s disease (Nature Communications, 2025). Advances in single-cell technologies have enabled us to pinpoint the cell types involved in each complex disease. Our research identified cell-type-specific alternative splicing as a key determinant of how genetic differences predispose individuals to autoimmune diseases (Nature Genetics, 2024), and genetic ancestry as a key mediator of complex disease risk (Cell, 2025).
AI and Machine Learning
Our research also builds AI and statistical machine learning tools to enable biological discoveries. Our work has tackled approximation methods for hard-to-estimate random variables (R Journal, 2017), established one of the first neural networks to jointly predict cis and trans regulation in yeast (NeurIPS, 2017), pioneered causal inference methods for mapping complex disease target genes (Nature Genetics, 2019), produced autoencoding neural networks for single-cell imputation (Quantitative Biology, 2020), and constructed transformer models for biomedical NLP (BMC Medical Informatics, 2021; Frontiers in Artificial Intelligence, 2021). Building on this foundational work, my team has established comprehensive toolkits for mapping complex disease target genes (Nature Genetics, 2025), as well as geometric deep learning models for spatial multi-omic integration.
AI for Target Discovery and Drug Design
Using the same AI and ML techniques, our collaborative work has produced one of the best-performing models to optimize mRNA stability and protein expression for mRNA therapeutics (Nature, 2023), as well as graph neural networks to optimize siRNA efficacy (Briefings in Bioinformatics, 2024). During the COVID-19 pandemic, our work was licensed by Sanofi under a 5-million-dollar contract, and we designed SW-BIC-213, a COVID-19 vaccine that received emergency use authorization in Southeast Asia. Building on these foundations, my team is developing a next-generation AI platform to generate full-length mRNA sequences optimized for mRNA vaccine stability and protein expression (work in progress), with a focus on dengue and influenza. In addition, we have used data science to identify high-confidence drug targets for coronary artery disease (AJHG, 2018), and the targets we identified have been validated in mouse models (Nature Medicine, 2019).