The aim of the statistical bioinformatics seminar is to provide a forum for people working in the broad area of computation and statistics, and their application to various aspects of biology, to present their work and showcase their ongoing projects. It is intended to foster the exchange of ideas and build potential collaborations across multiple disciplines.

The seminars will be held at 1:00 pm on Mondays at the Charles Perkins Centre, Seminar Room (Level 3, large meeting room). Seminars in 2018 will begin in March. The format of each talk is 30 to 45 minutes plus questions.

Monday March 19, 2018

Speaker: Dana Pascovici (Macquarie University)

Title: DIA/SWATH - challenges and opportunities for bioinformatics

Abstract: Protein quantitation using DIA/SWATH mass spectrometry has been growing in popularity over the last few years. From the point of view of the bioinformatics involved, the data resulting from such experiments is quite easy to analyse, at least if the experiment is not too large, thanks to a much lower percentage of missing data and a data look and distribution that make existing methodology from other areas readily applicable. Put plainly, extracted SWATH data is quite nice to work with. However, that is because much of the difficulty has been pushed underneath, to the level of SWATH library building and data extraction, where it is somewhat hidden from view.

In this talk we will describe SWATH and its place in the landscape of quantitative proteomics (including broad comparisons with label-free techniques and labelled techniques such as iTRAQ and TMT), and the many positive aspects of the resulting SWATH datasets from the point of view of the data analyst. We will also focus on SWATH data extraction, which usually relies on high-quality peptide MS/MS spectral libraries; building such libraries to ensure good proteome coverage can be time consuming and expensive.
To address this issue, various computational approaches for merging archived or external libraries have been created and evaluated, including efforts from our group. We will describe the appeal of such methods, the issues that can ensue, and some approaches to tackling them so that proteins are reliably detected and their quantitation is consistent and reproducible. We will discuss these aspects in the context of several existing datasets, including a carefully designed spiked-in experiment and a recently published large plasma proteomics experiment containing samples from neonates, young children and adults.

About the speaker: I am currently a Biostatistician at the Australian Proteome Analysis Facility (APAF) at Macquarie University, where I help people generate biological insights from their proteomics data, especially in the context of complex experiments. Working in a proteomics facility, our focus has been on generating reliable methods of interpreting and analysing data from a variety of platforms, lately emphasizing SWATH and TMT, and wherever possible incorporating them into software workflows. Areas of particular relevance to us have been plasma proteomics and plant proteomics of agriculturally important species. Our work has benefitted from interactions with researchers, students and the APAF team of mass spectrometry specialists and analytical chemists.

I come from a mathematical and computational background, having completed a bachelor's degree in Mathematics and Computer Science at Dartmouth College in the US, followed by a PhD in Mathematics at MIT and a brief stint of teaching at Purdue. In Sydney I took a more practical turn and worked in industry in the area of speech recognition, before settling into biostatistics for the past 13 years, in both industry and research environments.