On Model-Based Learning and Directional Outlier Detection
Model-based clustering assumes that the data were generated from a convex combination of densities, so the choice of the component density is crucial. The multivariate contaminated normal (MCN) distribution was proposed to model data sets that contain outliers. The MCN is a two-component Gaussian mixture. One component, with a large prior probability, represents the good observations. The other, with a small prior probability, the same mean, and an inflated covariance matrix, represents the outliers. Mixtures of MCN distributions can detect outliers and cluster the data at the same time; they improve clustering performance compared with normal mixtures and offer an alternative to t mixtures. However, a mixture of MCN distributions uses scalar parameters for the proportion of outliers and for the degree of contamination (the inflation parameter), so both are shared by all the variables. This is a limitation, because the outliers may behave differently in each dimension. To overcome this issue, we propose a multiple scaled contaminated normal distribution with a p-dimensional proportion of outliers and p-dimensional degrees of contamination, where p is the number of variables.
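As an illustration only, here is a minimal Python/NumPy sketch of the single-scaled MCN density described above, f(x) = (1 - alpha) N(x; mu, Sigma) + alpha N(x; mu, eta Sigma), together with the posterior probability that a point belongs to the "good" component. The function names `gaussian_pdf`, `mcn_pdf`, and `prob_good` are hypothetical and are not taken from the talk. The names `alpha` (proportion of outliers) and `eta` (inflation parameter, assumed > 1) are likewise assumptions for this sketch. Flagging a point as an outlier when its good-component probability is low is the usual idea behind MCN-based outlier detection, but the exact decision rule used in the talk is not stated.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density N(x; mu, sigma) evaluated at a single point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.size
    diff = x - mu
    # Squared Mahalanobis distance via a linear solve instead of an explicit inverse
    maha = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return np.exp(-0.5 * (p * np.log(2 * np.pi) + logdet + maha))


def mcn_pdf(x, mu, sigma, alpha, eta):
    """Contaminated normal density (1 - alpha) N(mu, sigma) + alpha N(mu, eta * sigma).

    alpha and eta are scalars here, i.e. shared by every variable
    (the limitation the multiple scaled version is meant to remove).
    """
    sigma = np.asarray(sigma, dtype=float)
    good = (1 - alpha) * gaussian_pdf(x, mu, sigma)
    bad = alpha * gaussian_pdf(x, mu, eta * sigma)
    return good + bad


def prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that x was drawn from the good (uninflated) component."""
    good = (1 - alpha) * gaussian_pdf(x, mu, sigma)
    return good / mcn_pdf(x, mu, sigma, alpha, eta)


if __name__ == "__main__":
    mu = np.zeros(2)
    sigma = np.eye(2)
    # A point near the mean is almost certainly a good observation
    print(prob_good([0.1, -0.2], mu, sigma, alpha=0.05, eta=10.0))
    # A point far from the mean is attributed to the inflated (outlier) component
    print(prob_good([6.0, 6.0], mu, sigma, alpha=0.05, eta=10.0))
```

In the multiple scaled distribution proposed in the talk, the scalars `alpha` and `eta` would become p-dimensional vectors. The abstract does not say how those vectors enter the density, so this sketch deliberately covers only the single-scaled case.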
Thursday, February 27, 2020 at 4:40pm to 5:30pm
Eliot Hall, 314
3203 Southeast Woodstock Boulevard, Portland, Oregon 97202-8199