SMS scnews item created by Linh Nghiem at Fri 19 Apr 2024 0905
Type: Seminar
Modified: Wed 24 Apr 2024 2000; Tue 21 May 2024 1411; Tue 21 May 2024 1412
Distribution: World
Expiry: 11 Jun 2024
Calendar1: 4 Jun 2024 1400-1500
Auth: linhn@220-245-52-174.tpgi.com.au (hngh7483) in SMS-SAML

Statistics Seminar

Meta-clustering of Gene Expression Data

Wei

Our next statistics seminar will be presented by A/Prof Yingying Wei from CUHK. This is a rescheduled seminar; the original session was cancelled in April.

Title: Meta-clustering of Gene Expression Data
Speaker: Yingying Wei
Time and location: 2-3pm on Tuesday 4 June at F10A.01.101, Law Lecture Theatre 101
Abstract: Traditional meta-analyses pool effect sizes across studies to improve statistical power. Likewise, there is growing interest in joint clustering across datasets to identify disease subtypes for bulk gene expression data and to discover cell types for single-cell RNA-sequencing (scRNA-seq) data. Unfortunately, because technical batch effects are prevalent, directly clustering samples from multiple gene expression datasets can lead to wrong results. As a result, the integration of multiple gene expression datasets has been a very active research area over the past several years. However, there has been little discussion of when multiple gene expression datasets can be integrated for joint clustering. Obviously, if different subtypes are assayed in distinct batches, then meta-clustering is impossible no matter which machine learning or statistical methods are used.
In this talk, I will present our Batch-effects-correction-with-Unknown-Subtypes (BUS) framework. BUS adjusts for batch effects explicitly, groups samples that share similar characteristics into subtypes, identifies genes that distinguish subtypes, and has linear-order computational complexity. The BUS framework can be adapted to perform meta-clustering for three settings: bulk gene expression data, scRNA-seq data collected from a single biological condition, and scRNA-seq data collected from multiple biological conditions. The proofs of model identifiability for the corresponding models provide insights into when multiple gene expression datasets can be integrated for meta-clustering. Simulation studies and real data analyses show the advantage of BUS over state-of-the-art methods.