Poisson PCA: Poisson measurement error corrected PCA, with application to microbiome data
Title: | Poisson PCA: Poisson measurement error corrected PCA, with application to microbiome data |
---|---|
Authors: | Tianshu Huang, Hong Gu, Toby Kenney |
Source: | Biometrics. 77(4) |
Publication Year: | 2020 |
Subject Terms: | Statistics and Probability, FOS: Computer and information sciences, Computer science, Poisson distribution, 01 natural sciences, General Biochemistry, Genetics and Molecular Biology, Methodology (stat.ME), 010104 statistics & probability, 03 medical and health sciences, symbols.namesake, Applied mathematics, Poisson Distribution, 0101 mathematics, Statistics - Methodology, 030304 developmental biology, Parametric statistics, 0303 health sciences, Principal Component Analysis, General Immunology and Microbiology, Applied Mathematics, Dimensionality reduction, Microbiota, Shot noise, Estimator, General Medicine, Research Design, Outlier, Principal component analysis, symbols, General Agricultural and Biological Sciences, Count data |
Description: | In this paper, we study the problem of computing a Principal Component Analysis of data affected by Poisson noise. We assume samples are drawn from independent Poisson distributions. We want to estimate the principal components of a fixed transformation of the latent Poisson means. Our motivating example is microbiome data, though the methods apply to many other situations. We develop a semiparametric approach to correct the bias of variance estimators, both for untransformed and transformed (with particular attention to log-transformation) Poisson means. Furthermore, we incorporate methods for correcting for different exposure or sequencing depth in the data. In addition to identifying the principal components, we also address the non-trivial problem of computing the principal scores in this semiparametric framework. Most previous approaches tend to take a more parametric line, for example the Poisson-log-normal (PLN) model approach. We compare our method with the PLN approach and find that our method is better at identifying the main principal components of the latent log-transformed Poisson means and, as a further major advantage, takes far less time to compute. Comparing methods on real data, we see that our method also appears to be more robust to outliers than the parametric method. Comment: 32 pages, 11 figures |
ISSN: | 1541-0420 |
Access URL: | https://explore.openaire.eu/search/publication?articleId=doi_dedup___::24c245d47150926d0688313d66bc4c86 https://pubmed.ncbi.nlm.nih.gov/33006392 |
Rights: | OPEN |
Accession Number: | edsair.doi.dedup.....24c245d47150926d0688313d66bc4c86 |
Database: | OpenAIRE |
---|
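The bias correction mentioned in the description can be illustrated for the simplest (untransformed) case. If each count X_j is Poisson with latent mean Λ_j, independently given the latent means, the law of total covariance gives Cov(X) = Cov(Λ) + diag(E[Λ]), so the sample covariance of the counts overstates the variances of the latent means by roughly the sample means. The sketch below subtracts that term. It is a minimal illustration under these assumptions, not the authors' implementation: the function name `poisson_corrected_cov` is hypothetical, and the log-transformed case, sequencing-depth correction, and principal scores discussed in the paper are not covered.

```python
def poisson_corrected_cov(counts):
    """Estimate the covariance of latent Poisson means from raw counts.

    counts: list of n samples, each a list of p non-negative counts.
    Returns a p x p nested list. Hypothetical helper for illustration only.
    """
    n = len(counts)
    p = len(counts[0])

    # Column means of the observed counts (these also estimate E[Lambda_j]).
    means = [sum(row[j] for row in counts) / n for j in range(p)]

    # Ordinary unbiased sample covariance of the observed counts.
    cov = [
        [
            sum((row[j] - means[j]) * (row[k] - means[k]) for row in counts) / (n - 1)
            for k in range(p)
        ]
        for j in range(p)
    ]

    # Poisson noise only inflates the diagonal: Var(X_j) = Var(Lambda_j) + E[Lambda_j].
    for j in range(p):
        cov[j][j] -= means[j]
    return cov


if __name__ == "__main__":
    example = [[1, 2], [3, 2], [5, 8]]
    print(poisson_corrected_cov(example))
```

The principal components would then come from an eigendecomposition of the corrected matrix. Because a diagonal term is subtracted, the corrected matrix is not guaranteed to be positive semidefinite in small samples, so any downstream use should check for negative eigenvalues.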