Using the glmnet and ncvreg packages, fits a Generalized Linear Model or Cox Proportional Hazards Model using various methods for choosing the regularization parameter \(\lambda\)
Source: R/SIS-tune.fit.R
This function is modified from SIS::tune.fit(). It is used to tune the regularization parameter for the regularized VAR models. We use this wrapper for the following reasons.
- The original SIS::tune.fit() function does not return the value of the information criteria that we would like to use.
- We use the ncvreg package exclusively (so we removed the code using the glmnet package). This makes the results more consistent, and the ncvreg package has better support for calculating information criteria.
- We also removed the generalized linear model (GLM) option and the cross-validation option because we do not use them.
- We use stats::AIC() and stats::BIC() instead of the versions based on SIS:::loglik() to make the calculation methods more consistent.
- We added ... to allow the user to pass additional arguments to the ncvreg::ncvreg() function.
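The information-criterion route described above can be sketched as follows. This is a minimal illustration on simulated data, not the body of this function: the data, the variable names, and the choice of a Gaussian fit are all assumptions made for the example. It relies on stats::AIC() and stats::BIC() returning one value per lambda on an ncvreg path.

```r
library(ncvreg)

# Simulated data (assumption for illustration only)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(n)

# Fit the whole SCAD solution path; gamma is ncvreg's concavity parameter
fit <- ncvreg(x, y, family = "gaussian", penalty = "SCAD", gamma = 3.7)

# One BIC value per lambda on the path
bic <- stats::BIC(fit)

# Pick the lambda that minimizes BIC
lambda.ind <- which.min(bic)
lambda <- fit$lambda[lambda.ind]

# Coefficients at the selected lambda; the first row of fit$beta is the intercept
a0 <- fit$beta[1, lambda.ind]
beta <- fit$beta[-1, lambda.ind]
ix <- which(beta != 0)
```

Swapping stats::BIC() for stats::AIC() gives the AIC-selected model along the same path.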
Arguments
- x
The design matrix, of dimensions n * p, without an intercept. Each row is an observation vector.
- y
The response vector of dimension n * 1. Quantitative for family='gaussian', non-negative counts for family='poisson', binary (0-1) for family='binomial'. For family='cox', y should be an object of class Surv, as provided by the function Surv() in the package survival.
- family
Response type (see above).
- penalty
The penalty to be applied in the regularized likelihood subproblems. 'SCAD' (the default), 'MCP', or 'lasso' are provided.
- concavity.parameter
The tuning parameter used to adjust the concavity of the SCAD/MCP penalty. Default is 3.7 for SCAD and 3 for MCP.
- tune
Method for selecting the regularization parameter along the solution path of the penalized likelihood problem. Options to provide a final model include tune='cv', tune='aic', tune='bic', and tune='ebic'. See references at the end for details.
- type.measure
Loss to use for cross-validation. Currently five options, not all available for all models. The default is type.measure='deviance', which uses squared-error for gaussian models (also equivalent to type.measure='mse' in this case), deviance for logistic and poisson regression, and partial-likelihood for the Cox model. Both type.measure='class' and type.measure='auc' apply only to logistic regression and give misclassification error and area under the ROC curve, respectively. type.measure='mse' or type.measure='mae' (mean absolute error) can be used by all models except 'cox'; they measure the deviation from the fitted mean to the response. For penalty='SCAD' and penalty='MCP', only type.measure='deviance' is available.
- gamma.ebic
Specifies the parameter in the Extended BIC criterion penalizing the size of the corresponding model space. The default is gamma.ebic=1. See references at the end for details.
- ...
Additional arguments to be passed to the ncvreg::ncvreg() function.
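For orientation, the Extended BIC of Chen and Chen (2008) is usually written as below, where \(\hat{L}\) is the maximized likelihood, \(\nu\) the number of nonzero coefficients, \(n\) the number of observations, \(p\) the number of candidate predictors, and \(\gamma\) plays the role of gamma.ebic. This is the standard published form; the exact expression evaluated inside this function is not spelled out on this page, so the correspondence should be read as an assumption.

```latex
\[
\mathrm{EBIC}_{\gamma} = -2 \log \hat{L} + \nu \log n + 2 \gamma \log \binom{p}{\nu}
\]
```

Setting \(\gamma = 0\) recovers the ordinary BIC, and larger values penalize models drawn from larger model spaces more heavily.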
Value
Returns an object with
- ix
The vector of indices of the nonzero coefficients selected by the maximum penalized likelihood procedure with tune as the method for choosing the regularization parameter.
- a0
The intercept of the final model selected by tune.
- beta
The vector of coefficients of the final model selected by tune.
- fit
The fitted penalized regression object.
- lambda
The corresponding lambda in the final model.
- lambda.ind
The index on the solution path for the final model.
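The sketch below shows one way the returned components might be used together. Because it has to be self-contained, it builds a stand-in list named res with the same component names directly from an ncvreg fit; res, the simulated data, and the BIC-based selection are assumptions for illustration and are not produced by this function.

```r
library(ncvreg)

# Simulated binary-response data (assumption for illustration only)
set.seed(2)
n <- 80
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

fit <- ncvreg(x, y, family = "binomial", penalty = "MCP")
lambda.ind <- which.min(stats::BIC(fit))
coefs <- fit$beta[, lambda.ind]   # intercept first, then slopes
slopes <- coefs[-1]

# Hypothetical stand-in with the component names listed above
res <- list(
  ix = which(slopes != 0),
  a0 = coefs[1],
  beta = slopes[slopes != 0],
  fit = fit,
  lambda = fit$lambda[lambda.ind],
  lambda.ind = lambda.ind
)

# Linear predictor rebuilt from ix, a0, and beta
eta <- as.numeric(res$a0 + x[, res$ix, drop = FALSE] %*% res$beta)

# The same quantity from the stored fit, addressed by its path index
eta.ref <- as.numeric(predict(res$fit, x, type = "link", which = res$lambda.ind))

# Fitted probabilities for the binomial family
prob <- plogis(eta)
```

Indexing the stored fit with lambda.ind rather than lambda avoids any interpolation between neighboring path values.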
Details
Original description from SIS::tune.fit():
This function fits a generalized linear model or a Cox proportional hazards model via penalized maximum likelihood, with available penalties as indicated in the glmnet and ncvreg packages. Instead of providing the whole regularization solution path, the function returns the solution at a unique value of \(\lambda\), the one optimizing the criterion specified in tune.
References
Jerome Friedman and Trevor Hastie and Rob Tibshirani (2010) Regularization Paths for Generalized Linear Models Via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22.
Noah Simon and Jerome Friedman and Trevor Hastie and Rob Tibshirani (2011) Regularization Paths for Cox's Proportional Hazards Model Via Coordinate Descent. Journal of Statistical Software, 39(5), 1-13.
Patrick Breheny and Jian Huang (2011) Coordinate Descent Algorithms for Nonconvex Penalized Regression, with Applications to Biological Feature Selection. The Annals of Applied Statistics, 5, 232-253.
Hirotugu Akaike (1973) Information Theory and an Extension of the Maximum Likelihood Principle. In Proceedings of the 2nd International Symposium on Information Theory, BN Petrov and F Csaki (eds.), 267-281.
Gideon Schwarz (1978) Estimating the Dimension of a Model. The Annals of Statistics, 6, 461-464.
Jiahua Chen and Zehua Chen (2008) Extended Bayesian Information Criteria for Model Selection with Large Model Spaces. Biometrika, 95, 759-771.
Examples
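set.seed(0)
# Load the leukemia training data shipped with the SIS package
data("leukemia.train", package = "SIS")
# The last column holds the response; the remaining columns are predictors
y.train <- leukemia.train[, ncol(leukemia.train)]
x.train <- as.matrix(leukemia.train[, -ncol(leukemia.train)])
x.train <- SIS::standardize(x.train)
# Fit on the first 3500 predictors and select lambda by BIC
model <- tune.fit(x.train[, 1:3500], y.train, family = "binomial", tune = "bic")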
#> Warning: Maximum number of iterations reached
model$ix
#> V461 V2020 V3320
#> 461 2020 3320
model$a0
#> (Intercept)
#> -1.00974
model$beta
#> V461 V2020 V3320
#> 0.411225 2.356301 2.615023