Statistical Inference for Sequential Feature Selection after Domain Adaptation

Bibliographic Details
Title: Statistical Inference for Sequential Feature Selection after Domain Adaptation
Authors: Loc, Duong Tan; Loi, Nguyen Thang; Duy, Vo Nguyen Le
Publication Year: 2025
Collection: Computer Science; Statistics
Subject Terms: Statistics - Machine Learning, Computer Science - Machine Learning
Description: In high-dimensional regression, feature selection methods, such as sequential feature selection (SeqFS), are commonly used to identify relevant features. When data is limited, domain adaptation (DA) becomes crucial for transferring knowledge from a related source domain to a target domain, improving generalization performance. Although SeqFS after DA is an important task in machine learning, none of the existing methods can guarantee the reliability of its results. In this paper, we propose a novel method for testing the features selected by SeqFS-DA. The main advantage of the proposed method is its capability to control the false positive rate (FPR) below a significance level $\alpha$ (e.g., 0.05). Additionally, a strategic approach is introduced to enhance the statistical power of the test. Furthermore, we provide extensions of the proposed method to SeqFS with model selection criteria including AIC, BIC, and adjusted R-squared. Extensive experiments are conducted on both synthetic and real-world datasets to validate the theoretical results and demonstrate the proposed method's superior performance.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2501.09933
Accession Number: edsarx.2501.09933
Database: arXiv