Title: Nonlinear Regression using Horseshoe Prior
Description: Provides the posterior estimates of the regression coefficients when a horseshoe prior is specified. The regression models considered here are the logistic model for binary responses and the log normal accelerated failure time model for right censored survival responses. The linear model analysis is also available for completeness. All models provide the deviance information criterion and the widely applicable information criterion. See Maity et al. (2019) <doi:10.1111/rssc.12377> and Maity et al. (2020) <doi:10.1111/biom.13132>.
Authors: Arnab Kumar Maity [aut, cre]
Maintainer: Arnab Kumar Maity <[email protected]>
License: GPL-3
Version: 0.0.6
Built: 2024-11-14 04:13:41 UTC
Source: https://github.com/maitya02/horseshoenlm
This function employs the algorithm provided by van der Pas et al. (2016) for the log normal Accelerated Failure Time (AFT) model to fit survival regression. The censored observations are updated according to the data augmentation approach described in Maity et al. (2019) and Maity et al. (2020).
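For intuition, here is a minimal sketch of that augmentation step, assuming the log normal AFT model t_i = x_i' beta + epsilon_i with epsilon_i ~ N(0, sigma^2) and assuming the truncnorm package; the helper name impute_censored is illustrative and is not the package's internal code.

# Illustrative data augmentation for right censored log times (a sketch,
# not horseshoenlm's internal sampler). Censored observations are redrawn
# from a normal truncated below at the observed censoring log time.
impute_censored <- function(logtime, status, mu, sigma) {
  cens <- which(status == 0)                 # right censored cases
  logtime[cens] <- truncnorm::rtruncnorm(
    length(cens),
    a = logtime[cens], b = Inf,              # truncated below at the censoring log time
    mean = mu[cens], sd = sigma              # current linear predictor and error sd
  )
  logtime                                    # uncensored cases are unchanged
}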
afths(
  ct,
  X,
  method.tau = c("fixed", "truncatedCauchy", "halfCauchy"),
  tau = 1,
  method.sigma = c("fixed", "Jeffreys"),
  Sigma2 = 1,
  burn = 1000,
  nmc = 5000,
  thin = 1,
  alpha = 0.05,
  Xtest = NULL
)
ct |
survival response, an n*2 matrix with the first column as the observed survival time and the second column as the censoring indicator (1 for an event time, 0 for a right censored time). |
X |
Matrix of covariates, dimension n*p. |
method.tau |
Method for handling tau. Select "truncatedCauchy" for full Bayes with the Cauchy prior truncated to [1/p, 1], "halfCauchy" for full Bayes with the half-Cauchy prior, or "fixed" to use a fixed value (an empirical Bayes estimate, for example). |
tau |
Use this argument to pass the (estimated) value of tau when "fixed" is selected for method.tau. Not necessary if method.tau is "halfCauchy" or "truncatedCauchy". The default (tau = 1) is not suitable for most purposes and should be replaced. |
method.sigma |
Select "Jeffreys" for full Bayes with Jeffreys' prior on the error variance sigma^2, or "fixed" to use a fixed value (an empirical Bayes estimate, for example). |
Sigma2 |
A fixed value for the error variance sigma^2. Not necessary if method.sigma is "Jeffreys". The default (Sigma2 = 1) is not suitable for most purposes and should be replaced. |
burn |
Number of burn-in MCMC samples. Default is 1000. |
nmc |
Number of posterior draws to be saved. Default is 5000. |
thin |
Thinning parameter of the chain. Default is 1 (no thinning). |
alpha |
Level for the credible intervals. For example, alpha = 0.05 results in 95% credible intervals. |
Xtest |
Test design matrix. |
The model is:
t_i = log T_i is the log survival time and c_i = log C_i is the log censoring time, so that t_i* = min(t_i, c_i) is the observed log time and delta_i = I(t_i <= c_i) is the censoring indicator; delta_i = 1 if t_i* is an event time and delta_i = 0 if t_i* is right censored. The regression model is t_i = x_i' beta + epsilon_i, with epsilon_i ~ N(0, sigma^2).
SurvivalHat |
Predictive survival probability. |
LogTimeHat |
Predictive log time. |
BetaHat |
Posterior mean of Beta, a p by 1 vector. |
LeftCI |
The left bounds of the credible intervals. |
RightCI |
The right bounds of the credible intervals. |
BetaMedian |
Posterior median of Beta, a p by 1 vector. |
LambdaHat |
Posterior samples of the local scale parameter lambda, a p by 1 vector. |
Sigma2Hat |
Posterior mean of the error variance sigma^2. |
TauHat |
Posterior mean of the global scale parameter tau, a positive scalar. |
BetaSamples |
Posterior samples of beta. |
TauSamples |
Posterior samples of tau. |
Sigma2Samples |
Posterior samples of Sigma2. |
LikelihoodSamples |
Posterior samples of the likelihood. |
DIC |
Deviance Information Criterion of the fitted model. |
WAIC |
Widely Applicable Information Criterion of the fitted model. |
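Both criteria are computed inside the sampler; the sketch below only illustrates the standard formulas, assuming a hypothetical S x n matrix loglik of pointwise log-likelihood draws (the package's LikelihoodSamples may be stored differently) and using the variance-based DIC penalty popularized by Gelman et al.

# Hedged sketch of DIC and WAIC from pointwise log-likelihood draws
# (rows = posterior samples, columns = observations); illustrative only.
dic_waic <- function(loglik) {
  dev  <- -2 * rowSums(loglik)              # deviance for each posterior draw
  p_d  <- var(dev) / 2                      # effective number of parameters
  dic  <- mean(dev) + p_d                   # variance-based DIC variant
  lppd <- sum(log(colMeans(exp(loglik))))   # log pointwise predictive density
  p_w  <- sum(apply(loglik, 2, var))        # WAIC penalty term
  waic <- -2 * (lppd - p_w)
  c(DIC = dic, WAIC = waic)
}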
Maity, A. K., Carroll, R. J., and Mallick, B. K. (2019) "Integration of Survival and Binary Data for Variable Selection and Prediction: A Bayesian Approach", Journal of the Royal Statistical Society: Series C (Applied Statistics).
Maity, A. K., Bhattacharya, A., Mallick, B. K., & Baladandayuthapani, V. (2020). Bayesian data integration and variable selection for pan cancer survival prediction using protein expression data. Biometrics, 76(1), 316-325.
Stephanie van der Pas, James Scott, Antik Chakraborty and Anirban Bhattacharya (2016). horseshoe: Implementation of the Horseshoe Prior. R package version 0.1.0. https://CRAN.R-project.org/package=horseshoe
Enes Makalic and Daniel Schmidt (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
burnin <- 500
nmc    <- 1000
thin   <- 1
y.sd   <- 1   # standard deviation of the response

p      <- 100              # number of predictors
ntrain <- 100              # training size
ntest  <- 50               # test size
n      <- ntest + ntrain   # sample size
q      <- 10               # number of true predictors

beta.t <- c(sample(x = c(1, -1), size = q, replace = TRUE), rep(0, p - q))
x      <- mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = diag(p))
tmean  <- x %*% beta.t
y      <- rnorm(n, mean = tmean, sd = y.sd)
X      <- scale(as.matrix(x))   # standardization

T      <- exp(y)                              # AFT model: event time
C      <- rgamma(n, shape = 1.75, scale = 3)  # censoring time (about 42% censoring)
time   <- pmin(T, C)                          # observed time is min of event and censoring time
status <- time == T                           # 1 if the event is observed
ct     <- as.matrix(cbind(time = time, status = status))  # censored response

# Training set
cttrain <- ct[1:ntrain, ]
Xtrain  <- X[1:ntrain, ]

# Test set
cttest <- ct[(ntrain + 1):n, ]
Xtest  <- X[(ntrain + 1):n, ]

posterior.fit <- afths(ct = cttrain, X = Xtrain, method.tau = "halfCauchy",
                       method.sigma = "Jeffreys", burn = burnin, nmc = nmc,
                       thin = 1, Xtest = Xtest)
posterior.fit$BetaHat

# Posterior processing to recover the true predictors
cluster  <- kmeans(abs(posterior.fit$BetaHat), centers = 2)$cluster
cluster1 <- which(cluster == 1)
cluster2 <- which(cluster == 2)
min.cluster <- ifelse(length(cluster1) < length(cluster2), 1, 2)
which(cluster == min.cluster)   # this matches the true variables
This function employs the algorithm provided by van der Pas et al. (2016) for the linear model to fit Bayesian regression.
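As background, the horseshoe prior placed on the regression coefficients has the hierarchy beta_j | lambda_j, tau, sigma ~ N(0, lambda_j^2 tau^2 sigma^2), with half-Cauchy priors on the local scales lambda_j and, under method.tau = "halfCauchy", on the global scale tau. A minimal prior simulation sketch (prior draws only, not the Gibbs sampler itself):

# Simulating coefficients from the horseshoe hierarchy; the fitting
# function instead samples these parameters from their posteriors.
p      <- 100
sigma  <- 1
tau    <- abs(rcauchy(1))   # global shrinkage, half-Cauchy(0, 1) draw
lambda <- abs(rcauchy(p))   # local shrinkage, one scale per coefficient
beta   <- rnorm(p, mean = 0, sd = lambda * tau * sigma)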
lmhs(
  y,
  X,
  method.tau = c("fixed", "truncatedCauchy", "halfCauchy"),
  tau = 1,
  method.sigma = c("fixed", "Jeffreys"),
  Sigma2 = 1,
  burn = 1000,
  nmc = 5000,
  thin = 1,
  alpha = 0.05,
  Xtest = NULL
)
y |
Response vector. |
X |
Matrix of covariates, dimension n*p. |
method.tau |
Method for handling tau. Select "truncatedCauchy" for full Bayes with the Cauchy prior truncated to [1/p, 1], "halfCauchy" for full Bayes with the half-Cauchy prior, or "fixed" to use a fixed value (an empirical Bayes estimate, for example). |
tau |
Use this argument to pass the (estimated) value of tau when "fixed" is selected for method.tau. Not necessary if method.tau is "halfCauchy" or "truncatedCauchy". The default (tau = 1) is not suitable for most purposes and should be replaced. |
method.sigma |
Select "Jeffreys" for full Bayes with Jeffreys' prior on the error variance sigma^2, or "fixed" to use a fixed value (an empirical Bayes estimate, for example). |
Sigma2 |
A fixed value for the error variance sigma^2. Not necessary if method.sigma is "Jeffreys". The default (Sigma2 = 1) is not suitable for most purposes and should be replaced. |
burn |
Number of burn-in MCMC samples. Default is 1000. |
nmc |
Number of posterior draws to be saved. Default is 5000. |
thin |
Thinning parameter of the chain. Default is 1 (no thinning). |
alpha |
Level for the credible intervals. For example, alpha = 0.05 results in 95% credible intervals. |
Xtest |
Test design matrix. |
The model is:
y_i is the response, modeled as y_i = x_i' beta + epsilon_i with epsilon_i ~ N(0, sigma^2).
yHat |
Predictive response. |
BetaHat |
Posterior mean of Beta, a p by 1 vector. |
LeftCI |
The left bounds of the credible intervals. |
RightCI |
The right bounds of the credible intervals. |
BetaMedian |
Posterior median of Beta, a p by 1 vector. |
LambdaHat |
Posterior samples of the local scale parameter lambda, a p by 1 vector. |
Sigma2Hat |
Posterior mean of the error variance sigma^2. |
TauHat |
Posterior mean of the global scale parameter tau, a positive scalar. |
BetaSamples |
Posterior samples of beta. |
TauSamples |
Posterior samples of tau. |
Sigma2Samples |
Posterior samples of Sigma2. |
LikelihoodSamples |
Posterior samples of the likelihood. |
DIC |
Deviance Information Criterion of the fitted model. |
WAIC |
Widely Applicable Information Criterion of the fitted model. |
Maity, A. K., Carroll, R. J., and Mallick, B. K. (2019) "Integration of Survival and Binary Data for Variable Selection and Prediction: A Bayesian Approach", Journal of the Royal Statistical Society: Series C (Applied Statistics).
Maity, A. K., Bhattacharya, A., Mallick, B. K., & Baladandayuthapani, V. (2020). Bayesian data integration and variable selection for pan cancer survival prediction using protein expression data. Biometrics, 76(1), 316-325.
Stephanie van der Pas, James Scott, Antik Chakraborty and Anirban Bhattacharya (2016). horseshoe: Implementation of the Horseshoe Prior. R package version 0.1.0. https://CRAN.R-project.org/package=horseshoe
Enes Makalic and Daniel Schmidt (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
burnin <- 500
nmc    <- 1000
thin   <- 1
y.sd   <- 1   # standard deviation of the response

p      <- 100              # number of predictors
ntrain <- 100              # training size
ntest  <- 50               # test size
n      <- ntest + ntrain   # sample size
q      <- 10               # number of true predictors

beta.t <- c(sample(x = c(1, -1), size = q, replace = TRUE), rep(0, p - q))
x      <- mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = diag(p))
tmean  <- x %*% beta.t
y      <- rnorm(n, mean = tmean, sd = y.sd)
X      <- scale(as.matrix(x))   # standardization

# Training set
ytrain <- y[1:ntrain]
Xtrain <- X[1:ntrain, ]

# Test set
ytest <- y[(ntrain + 1):n]
Xtest <- X[(ntrain + 1):n, ]

posterior.fit <- lmhs(y = ytrain, X = Xtrain, method.tau = "halfCauchy",
                      method.sigma = "Jeffreys", burn = burnin, nmc = nmc,
                      thin = 1, Xtest = Xtest)
posterior.fit$BetaHat

# Posterior processing to recover the true predictors
cluster  <- kmeans(abs(posterior.fit$BetaHat), centers = 2)$cluster
cluster1 <- which(cluster == 1)
cluster2 <- which(cluster == 2)
min.cluster <- ifelse(length(cluster1) < length(cluster2), 1, 2)
which(cluster == min.cluster)   # this matches the true variables
This function employs the algorithm provided by Makalic and Schmidt (2016) for the binary logistic model to fit Bayesian logistic regression. The observations are updated according to the Polya-Gamma data augmentation approach of Polson, Scott, and Windle (2014).
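For intuition, a hedged sketch of one such augmented update, assuming the BayesLogit package for Polya-Gamma draws and a placeholder prior precision; this illustrates the technique and is not logiths's internal code.

# One Polya-Gamma augmented Gibbs draw of beta for a logistic model.
set.seed(1)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- rep(0, p)                    # current state of the chain
z <- rbinom(n, 1, 0.5)               # binary response
omega <- BayesLogit::rpg(n, h = 1, z = as.numeric(X %*% beta))  # omega_i ~ PG(1, x_i' beta)
B0inv <- diag(p)                     # prior precision placeholder; the horseshoe
                                     # would use diag(1 / (lambda^2 * tau^2))
V <- solve(t(X) %*% (omega * X) + B0inv)   # conditional covariance of beta
m <- V %*% (t(X) %*% (z - 0.5))            # conditional mean, using kappa_i = z_i - 1/2
beta <- as.numeric(m + t(chol(V)) %*% rnorm(p))  # one Gaussian draw of beta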
logiths(
  z,
  X,
  method.tau = c("fixed", "truncatedCauchy", "halfCauchy"),
  tau = 1,
  burn = 1000,
  nmc = 5000,
  thin = 1,
  alpha = 0.05,
  Xtest = NULL
)
z |
Response, an n*1 vector of 0s and 1s. |
X |
Matrix of covariates, dimension n*p. |
method.tau |
Method for handling tau. Select "truncatedCauchy" for full Bayes with the Cauchy prior truncated to [1/p, 1], "halfCauchy" for full Bayes with the half-Cauchy prior, or "fixed" to use a fixed value (an empirical Bayes estimate, for example). |
tau |
Use this argument to pass the (estimated) value of tau when "fixed" is selected for method.tau. Not necessary if method.tau is "halfCauchy" or "truncatedCauchy". The default (tau = 1) is not suitable for most purposes and should be replaced. |
burn |
Number of burn-in MCMC samples. Default is 1000. |
nmc |
Number of posterior draws to be saved. Default is 5000. |
thin |
Thinning parameter of the chain. Default is 1 (no thinning). |
alpha |
Level for the credible intervals. For example, alpha = 0.05 results in 95% credible intervals. |
Xtest |
Test design matrix. |
The model is:
z_i is the response, either 1 or 0, with z_i ~ Bernoulli(p_i) and logit(p_i) = x_i' beta.
ProbHat |
Predictive probability. |
BetaHat |
Posterior mean of Beta, a p by 1 vector. |
LeftCI |
The left bounds of the credible intervals. |
RightCI |
The right bounds of the credible intervals. |
BetaMedian |
Posterior median of Beta, a p by 1 vector. |
LambdaHat |
Posterior samples of the local scale parameter lambda, a p by 1 vector. |
TauHat |
Posterior mean of the global scale parameter tau, a positive scalar. |
BetaSamples |
Posterior samples of beta. |
TauSamples |
Posterior samples of tau. |
LikelihoodSamples |
Posterior samples of the likelihood. |
DIC |
Deviance Information Criterion of the fitted model. |
WAIC |
Widely Applicable Information Criterion of the fitted model. |
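The predictive probability ProbHat can be recovered, up to Monte Carlo error, by averaging the inverse-logit of the test linear predictor over posterior draws. A sketch with placeholder inputs; it assumes BetaSamples is a p x S matrix of draws, which may not match the package's exact layout.

# Placeholder reconstruction of ProbHat from posterior coefficient draws.
p <- 5; S <- 100
Xtest       <- matrix(rnorm(10 * p), 10, p)   # placeholder test design
BetaSamples <- matrix(rnorm(p * S), p, S)     # placeholder posterior draws
ProbHat     <- rowMeans(plogis(Xtest %*% BetaSamples))  # one probability per test row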
Stephanie van der Pas, James Scott, Antik Chakraborty and Anirban Bhattacharya (2016). horseshoe: Implementation of the Horseshoe Prior. R package version 0.1.0. https://CRAN.R-project.org/package=horseshoe
Enes Makalic and Daniel Schmidt (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
Polson, N. G., Scott, J. G. and Windle, J. (2014). The Bayesian Bridge. Journal of the Royal Statistical Society, Series B, 76(4), 713-733.
burnin <- 100
nmc    <- 500
thin   <- 1

p      <- 100              # number of predictors
ntrain <- 250              # training size
ntest  <- 100              # test size
n      <- ntest + ntrain   # sample size
q      <- 10               # number of true predictors

beta.t <- c(sample(x = c(1, -1), size = q, replace = TRUE), rep(0, p - q))
x      <- mvtnorm::rmvnorm(n, mean = rep(0, p), sigma = diag(p))
zmean  <- x %*% beta.t
z      <- rbinom(n, size = 1, prob = boot::inv.logit(zmean))
X      <- scale(as.matrix(x))   # standardization

# Training set
ztrain <- z[1:ntrain]
Xtrain <- X[1:ntrain, ]

# Test set
ztest <- z[(ntrain + 1):n]
Xtest <- X[(ntrain + 1):n, ]

posterior.fit <- logiths(z = ztrain, X = Xtrain, method.tau = "halfCauchy",
                         burn = burnin, nmc = nmc, thin = 1, Xtest = Xtest)
posterior.fit$BetaHat

# Posterior processing to recover the true predictors
cluster  <- kmeans(abs(posterior.fit$BetaHat), centers = 2)$cluster
cluster1 <- which(cluster == 1)
cluster2 <- which(cluster == 2)
min.cluster <- ifelse(length(cluster1) < length(cluster2), 1, 2)
which(cluster == min.cluster)   # this matches the true variables
This function employs the algorithm provided by Makalic and Schmidt (2016) for the binary probit model to fit Bayesian probit regression. The observations are updated according to the data augmentation approach of Albert and Chib (1993).
The model is:
z_i is the response, either 1 or 0, with z_i ~ Bernoulli(p_i) and p_i = Phi(x_i' beta), where Phi denotes the standard normal distribution function.
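A minimal sketch of the Albert and Chib (1993) latent-variable update, assuming the truncnorm package; this illustrates the technique and is not probiths's internal code. The latent y_i ~ N(x_i' beta, 1) is drawn truncated to (0, Inf) when z_i = 1 and to (-Inf, 0) when z_i = 0.

set.seed(1)
n <- 50; p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- rep(0, p)                    # current state of the chain
z <- rbinom(n, 1, 0.5)               # binary response
mu <- as.numeric(X %*% beta)         # probit linear predictor
y  <- ifelse(z == 1,
             truncnorm::rtruncnorm(n, a = 0,    b = Inf, mean = mu, sd = 1),
             truncnorm::rtruncnorm(n, a = -Inf, b = 0,   mean = mu, sd = 1))
# beta is then drawn from its Gaussian full conditional given the latent y.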
probiths(
  z,
  X,
  method.tau = c("fixed", "truncatedCauchy", "halfCauchy"),
  tau = 1,
  burn = 1000,
  nmc = 5000,
  thin = 1,
  alpha = 0.05,
  Xtest = NULL
)
z |
Response, an n*1 vector of 0s and 1s. |
X |
Matrix of covariates, dimension n*p. |
method.tau |
Method for handling tau. Select "truncatedCauchy" for full Bayes with the Cauchy prior truncated to [1/p, 1], "halfCauchy" for full Bayes with the half-Cauchy prior, or "fixed" to use a fixed value (an empirical Bayes estimate, for example). |
tau |
Use this argument to pass the (estimated) value of tau when "fixed" is selected for method.tau. Not necessary if method.tau is "halfCauchy" or "truncatedCauchy". The default (tau = 1) is not suitable for most purposes and should be replaced. |
burn |
Number of burn-in MCMC samples. Default is 1000. |
nmc |
Number of posterior draws to be saved. Default is 5000. |
thin |
Thinning parameter of the chain. Default is 1 (no thinning). |
alpha |
Level for the credible intervals. For example, alpha = 0.05 results in 95% credible intervals. |
Xtest |
Test design matrix. |
ProbHat |
Predictive probability. |
BetaHat |
Posterior mean of Beta, a p by 1 vector. |
LeftCI |
The left bounds of the credible intervals. |
RightCI |
The right bounds of the credible intervals. |
BetaMedian |
Posterior median of Beta, a p by 1 vector. |
LambdaHat |
Posterior samples of the local scale parameter lambda, a p by 1 vector. |
TauHat |
Posterior mean of the global scale parameter tau, a positive scalar. |
BetaSamples |
Posterior samples of beta. |
TauSamples |
Posterior samples of tau. |
LikelihoodSamples |
Posterior samples of the likelihood. |
DIC |
Deviance Information Criterion of the fitted model. |
WAIC |
Widely Applicable Information Criterion of the fitted model. |
Stephanie van der Pas, James Scott, Antik Chakraborty and Anirban Bhattacharya (2016). horseshoe: Implementation of the Horseshoe Prior. R package version 0.1.0. https://CRAN.R-project.org/package=horseshoe
Enes Makalic and Daniel Schmidt (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669-679.
burnin <- 100
nmc    <- 200
thin   <- 1
y.sd   <- 1   # standard deviation of the latent response

p      <- 200              # number of predictors
ntrain <- 250              # training size
ntest  <- 100              # test size
n      <- ntest + ntrain   # sample size
q      <- 10               # number of true predictors

beta.t <- c(sample(x = c(1, -1), size = q, replace = TRUE), rep(0, p - q))
x      <- mvtnorm::rmvnorm(n, mean = rep(0, p))
zmean  <- x %*% beta.t
y      <- rnorm(n, mean = zmean, sd = y.sd)
z      <- ifelse(y > 0, 1, 0)
X      <- scale(as.matrix(x))   # standardization
z      <- as.numeric(z)

# Training set
ztrain <- z[1:ntrain]
Xtrain <- X[1:ntrain, ]

# Test set
ztest <- z[(ntrain + 1):n]
Xtest <- X[(ntrain + 1):n, ]

posterior.fit <- probiths(z = ztrain, X = Xtrain, method.tau = "halfCauchy",
                          burn = burnin, nmc = nmc, thin = 1, Xtest = Xtest)
posterior.fit$BetaHat

# Posterior processing to recover the significant predictors
cluster  <- kmeans(abs(posterior.fit$BetaHat), centers = 2)$cluster   # cluster indices
cluster1 <- which(cluster == 1)
cluster2 <- which(cluster == 2)
min.cluster <- ifelse(length(cluster1) < length(cluster2), 1, 2)
which(cluster == min.cluster)   # this matches the true variables