Bayesian bi-clustering of categorical data

Start: 2017-07-27T12:10:00Z
Location: Helsinki, Finland

Abstract

Cluster analysis is a common statistical technique for partitioning observed data into disjoint, homogeneous groups. With multivariate data, it is often useful to identify which features best predict cluster membership. We formalize the problem as a bidirectional Bayesian cluster analysis, carried out in both the space of units and the space of features. The aim is to cluster the observed sample and, at the same time, to classify the variables according to prespecified levels of discriminatory power. Split-merge and Gibbs-sampler-type MCMC algorithms are employed to simultaneously traverse the posterior distribution over partitions of samples and variables. We show how the model can be used to cluster genetic data and to highlight sites under selective pressure. A software implementation for clustering categorical data matrices is freely available at https://github.com/albertopessia/Kpax3.jl

Event

European Meeting of Statisticians 2017

Alberto Pessia

Postdoctoral researcher