Greedy Variable Screening in Ultra-high Dimensional Partially Linear Varying Coefficient Models

    • Abstract: This paper considers ultra-high dimensional partially linear varying coefficient models, in which the dimension of the covariates in the linear part grows exponentially with the sample size. To account for correlation among the ultra-high dimensional covariates, a greedy profile forward regression (GPFR) method is proposed for screening the covariates in the linear part. Under certain regularity conditions, the GPFR method is shown to be screening consistent. The GPFR procedure produces a sequence of nested models; to decide whether a candidate predictor should enter the model, an extended Bayesian information criterion (EBIC) is used to select the "best" model. Simulation studies and a real data analysis examine the finite-sample performance of GPFR and show that the method has a clear advantage when the predictors are highly correlated and the signal-to-noise ratio is low.
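To make the "nested models plus EBIC" idea concrete, here is a minimal Python sketch of plain forward regression with an EBIC-type stopping rule. It is an illustration under stated assumptions, not the GPFR algorithm of the paper: it works on an ordinary linear model, so the profile step that removes the varying coefficient part and the greedy modification are both omitted. The score `log(RSS/n) + k (log n + 2 log p) / n` is one form commonly used with forward regression and is assumed here, since the abstract does not state the exact criterion. The function names `ebic` and `forward_regression_ebic` and the simulated data are hypothetical.

```python
import numpy as np


def ebic(rss, n, k, p):
    """EBIC-type score (assumed form): log(RSS/n) + k * (log n + 2 log p) / n."""
    return np.log(rss / n) + k * (np.log(n) + 2.0 * np.log(p)) / n


def forward_regression_ebic(X, y, max_steps=10):
    """Build a nested path by forward regression; return the EBIC-minimising model."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)  # centre the data so no intercept is needed
    yc = y - y.mean()
    active, path = [], []
    for _ in range(min(max_steps, p, n - 1)):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in active:
                continue
            cols = Xc[:, active + [j]]
            coef, *_ = np.linalg.lstsq(cols, yc, rcond=None)
            rss = float(np.sum((yc - cols @ coef) ** 2))
            if rss < best_rss:  # keep the candidate with the smallest RSS
                best_j, best_rss = j, rss
        active.append(best_j)
        path.append((list(active), ebic(best_rss, n, len(active), p)))
    selected, _ = min(path, key=lambda item: item[1])
    return selected, path


# Simulated example (hypothetical data): three true predictors out of fifty.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + 3.0 * X[:, 1] + 3.0 * X[:, 2] + 0.5 * rng.standard_normal(n)
selected, path = forward_regression_ebic(X, y, max_steps=8)
```

Here `path` holds the nested models in the order the predictors entered, and `selected` is the model on that path with the smallest score. Screening consistency, in this simplified setting, corresponds to `selected` containing all truly relevant predictors.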

       
