Bibliographic Details
Title: |
Bayesian Additive Regression Trees using Bayesian Model Averaging |
Authors: |
Hernández, Belinda; Raftery, Adrian E.; Pennington, Stephen R.; Parnell, Andrew C. |
Publication Year: |
2015 |
Collection: |
Statistics |
Subject Terms: |
Statistics - Computation, Statistics - Methodology |
Description: |
Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods in which the individual trees are the base learners. However, for data sets where the number of variables $p$ is large (e.g. $p>5,000$), the algorithm can become computationally prohibitive. Another method that is popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, as it is not a statistical model, it cannot produce probabilistic estimates or predictions. We propose an alternative algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to produce a model which is much more efficient than BART for data sets with large $p$. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small $n$ large $p$" scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments: one to distinguish between patients with cardiovascular disease and controls, and another to classify aggressive from non-aggressive prostate cancer. We compare its results with those of its main competitors. Open-source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git |
Document Type: |
Working Paper |
Access URL: |
http://arxiv.org/abs/1507.00181 |
Accession Number: |
edsarx.1507.00181 |
Database: |
arXiv |