Alberto Pessia

Postdoctoral researcher

University of Helsinki

Hi!

I’m Alberto, a postdoc at the University of Helsinki. My current research focuses on statistical models for cancer combination drug therapies, but I also work on statistical population genomics and metabolomics data analysis.

I am mainly interested in statistical methodology and computational statistics. I love developing software and keeping up to date with the latest technology.

Interests

Bayesian statistics
Computational statistics
Combination drug therapy
Metabolomics
Cluster analysis
Clinical trials

Education

PhD in Statistics, 2017
University of Helsinki

MSc in Statistics, 2011
Sapienza University of Rome

BSc in Statistics, 2008
Sapienza University of Rome

Projects

BirthDeathProcess.jl

Julia package for fitting a simple birth-and-death process without migration.

kpax2

R package for bi-clustering multivariate categorical data.

Kpax3.jl

Julia package for bi-clustering multivariate categorical data.

Recent & Upcoming Talks

Numerical evaluation of the transition probability of the simple birth-and-death process

The simple birth-and-death process is a continuous-time Markov process that is commonly employed for describing changes over time of …

2019-07-25 15:15 Palermo, Italy


Bayesian bi-clustering of categorical data

Cluster analysis is a common statistical technique for partitioning the observed data into disjoint homogeneous groups. In the presence …

2017-07-27 12:10 Helsinki, Finland


Featured Publications

Pessia, A., Corander, J.

February 2018 Bioinformatics, 34(12): 2132–2133. doi: 10.1093/bioinformatics/bty056

Kpax3: Bayesian bi-clustering of large sequence datasets

Motivation
Estimation of the hidden population structure is an important step in many genetic studies. Often the aim is also to identify which sequence locations are the most discriminative between groups of samples for a given data partition. Automated discovery of interesting patterns that are present in the data can help to generate new biological hypotheses.
Results
We introduce Kpax3, a Bayesian method for bi-clustering multiple sequence alignments. Influence of individual sites will be determined in a supervised manner by using informative prior distributions for the model parameters. Our inference method uses an implementation of both split-merge and Gibbs sampler type MCMC algorithms to traverse the joint posterior of partitions of samples and variables. We use a large Rotavirus sequence dataset to demonstrate the ability of Kpax3 to generate biologically important hypotheses about differential selective pressures across a virus protein.
Availability and Implementation
Kpax3 is implemented as a Julia package and released under the MIT license. Source code and documentation are available at: https://github.com/albertopessia/Kpax3.jl


Pessia, A.

October 2017 Doctoral dissertation

Bayesian cluster analysis with applications to pathogen population genomics

Identifying similarity patterns in heterogeneous observations is a very common problem in many branches of science. When the similarities and dissimilarities are encoded by a group structure, the task of dividing the observed sample into an unknown number of homogeneous groups is known as cluster analysis. Among the many types of statistical data analyses, it is one of the most widely applied.
In evolutionary biology, for example, the population structure plays an important role. Groups naturally arise as the result of evolutionary processes and, depending on the resolution of the study, clusters might represent similar molecules, organisms, or even species. With the huge amount of genetic data now freely available in online databases, cluster analysis is a valuable technique to better understand the evolution of organisms.
In this dissertation we focus our attention on Bayesian approaches to model-based clustering. We review the mathematical formalization of the two most common methods, finite mixture models and product partition models, together with algorithms needed to draw inferences. We then introduce a novel Bayesian model which has been specifically designed to partition categorical data matrices. Finally, we show how cluster analysis is a very effective method for understanding the evolution of pathogens, and how this information is relevant to public health.
