Title: | Hybrid Genetic and Simulated Annealing Algorithm for High Dimensional Linear Models with Interaction Effects |
---|---|
Description: | We provide a stage-wise selection method using genetic algorithms, designed to efficiently identify main effects and two-way interactions within high-dimensional linear regression models. Additionally, it applies a simulated annealing algorithm during the mutation process. The relevant paper can be found at: Ye, C., and Yang, Y. (2019) <doi:10.1109/TIT.2019.2913417>. |
Authors: | Leiyue Li [aut, cre], Chenglong Ye [aut] |
Maintainer: | Leiyue Li <[email protected]> |
License: | GPL-2 |
Version: | 1.2.1 |
Built: | 2025-02-17 05:00:35 UTC |
Source: | https://github.com/cran/hySAINT |
Gives the ABC score for each fitted model. For a model I, the ABC is defined as
When comparing the ABC of models fitted to the same dataset, the smaller the ABC, the better the fit.
ABC( X, y, heredity = "Strong", sigma, varind = NULL, interaction.ind = NULL, lambda = 10 )
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
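The examples in this manual build interaction.ind with t(combn(p, 2)). As a minimal sketch in base R, assuming that same convention (the variable subset.ind is only illustrative), the index matrix for p variables has one row per unordered pair:

```r
p <- 4
# One row per unordered pair (j, k) with j < k; columns hold the variable indices.
interaction.ind <- t(combn(p, 2))
dim(interaction.ind)   # choose(4, 2) rows and 2 columns

# Illustrative only: restrict the candidate interactions to variables 1, 2 and 4.
subset.ind <- t(combn(c(1, 2, 4), 2))
```

Passing a vector instead of a single number to combn() is one way to limit the candidate pairs when only some variables are of interest.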
A numeric value is returned. It represents the ABC score of the fitted model.
Ye, C. and Yang, Y., 2019. High-dimensional adaptive minimax sparse estimation with interactions.
    # When sigma is known
    set.seed(0)
    interaction.ind <- t(combn(4,2))
    X <- matrix(rnorm(50*4,1,0.1), 50, 4)
    epl <- rnorm(50,0,0.01)
    y <- 1+X[,1]+X[,2]+X[,1]*X[,2] + epl
    ABC(X, y, sigma = 0.01, varind = c(1,2,5), interaction.ind = interaction.ind)

    # When sigma is not known
    full <- Extract(X, varind = c(1:(dim(X)[2]+dim(interaction.ind)[1])), interaction.ind)
    sigma <- selectiveInference::estimateSigma(full, y)$sigmahat # Estimate sigma
This function generates offspring from parents. It performs crossover with a fixed probability of 0.6.
Crossover(X, myParent, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)
X |
Input data. An optional data frame, or numeric matrix of dimension
|
myParent |
A numeric matrix with dimension |
EVAoutput |
The output from function |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
numElite |
Number of elite parents. Default is 40. |
Offspring. If crossover occurred, it returns a numeric matrix with dimensions choose(numElite,2) by r1+r2. Otherwise, numElite by r1+r2.
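The two possible shapes follow from pairing the parents: numElite parents form choose(numElite, 2) unordered pairs. The sketch below is a hypothetical base R illustration, not the package's Crossover() implementation. It assumes each pair yields one offspring by mixing the two parents' entries uniformly at random, and it ignores heredity and EVAoutput entirely; crossover_sketch is an invented name.

```r
# Hypothetical illustration; not the package's Crossover() implementation.
crossover_sketch <- function(parents, prob = 0.6) {
  # With probability 1 - prob, skip crossover and return the parents unchanged.
  if (runif(1) >= prob) return(parents)
  # 2 x choose(n, 2) matrix; each column is a pair of parent row indices.
  pairs <- combn(nrow(parents), 2)
  offspring <- apply(pairs, 2, function(idx) {
    # Assumed rule: take each position from either parent with equal chance.
    from_first <- runif(ncol(parents)) < 0.5
    ifelse(from_first, parents[idx[1], ], parents[idx[2], ])
  })
  t(offspring)   # one row per pair
}

set.seed(1)
# Toy parents: numElite = 4 rows and r1 + r2 = 7 columns.
parents <- matrix(sample(0:20, 4 * 7, replace = TRUE), nrow = 4)
child <- crossover_sketch(parents)
dim(child)   # either 6 x 7 (crossover occurred) or 4 x 7 (it did not)
```

With the default numElite = 40, the crossover branch would produce choose(40, 2) = 780 rows.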
    set.seed(0)
    interaction.ind <- t(combn(10,2))
    X <- matrix(rnorm(100*10,1,0.1), 100, 10)
    epl <- rnorm(100,0,0.01)
    y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
    EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01, interaction.ind = interaction.ind)
    myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
    Offsprings <- Crossover(X, myParent, EVAoutput, r1 = 5, r2 = 2)
This function ranks each main and interaction effect. It also calculates the ABC score for each potential interaction under the chosen heredity structure. If heredity = "No" and the number of potential interactions exceeds choose(1000,2), the distance correlation between each variable in X and y is calculated first to reduce the running time.
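For readers unfamiliar with distance correlation, the following base R sketch computes the sample statistic from double-centered pairwise distance matrices and uses it to rank the columns of X against y. It is an illustration only: dcor_sketch is a hypothetical helper, and the source does not say which implementation the package relies on.

```r
# Hypothetical helper: sample distance correlation between two numeric vectors.
dcor_sketch <- function(x, y) {
  center <- function(d) {
    # Double-center: subtract row and column means, add back the grand mean.
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- center(as.matrix(dist(x)))
  B <- center(as.matrix(dist(y)))
  dcov2 <- mean(A * B)                       # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))   # product of distance standard deviations
  if (denom == 0) return(0)
  sqrt(max(dcov2, 0) / denom)
}

set.seed(0)
X <- matrix(rnorm(100 * 10, 1, 0.1), 100, 10)
y <- 1 + X[, 1] + X[, 2] + rnorm(100, 0, 0.01)
scores <- apply(X, 2, dcor_sketch, y = y)
order(scores, decreasing = TRUE)   # columns ranked from most to least dependent on y
```

Screening on such scores keeps only the highest-ranked variables, so far fewer candidate pairs need an ABC evaluation.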
EVA( X, y, heredity = "Strong", r1, sigma, varind = NULL, interaction.ind = NULL, lambda = 10 )
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
A list with two components: ranked.mainpool, the ranked main effects; and ranked.intermat, a 4-column matrix containing the potential interactions ranked by ABC score.
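The heredity argument determines which interactions are eligible. Under the usual definitions, strong heredity admits an interaction only if both of its parent main effects are in the model, weak heredity requires at least one parent, and no heredity imposes no restriction. The base R sketch below applies these definitions to an index matrix. The helper satisfies_heredity is hypothetical and not part of the package.

```r
# Hypothetical helper: does an interaction pair satisfy the heredity rule
# given the indices of the main effects currently in the model?
satisfies_heredity <- function(pair, main, heredity = c("Strong", "Weak", "No")) {
  heredity <- match.arg(heredity)
  present <- pair %in% main
  switch(heredity,
         Strong = all(present),   # both parents selected
         Weak   = any(present),   # at least one parent selected
         No     = TRUE)           # no restriction
}

interaction.ind <- t(combn(4, 2))
main <- c(1, 2)
keep <- apply(interaction.ind, 1, satisfies_heredity, main = main, heredity = "Weak")
interaction.ind[keep, , drop = FALSE]   # every pair except (3, 4)
```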
    # Strong heredity
    set.seed(0)
    interaction.ind <- t(combn(10,2))
    X <- matrix(rnorm(100*10,1,0.1), 100, 10)
    epl <- rnorm(100,0,0.01)
    y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
    EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01, interaction.ind = interaction.ind)
This function simplifies the data preparation process by enabling users to extract specific columns from their dataset X, and automatically generating any necessary interaction effects based on varind.
Extract(X, varind, interaction.ind = NULL)
X |
Input data. An optional data frame, or numeric matrix of dimension
|
varind |
A numeric vector that specifies the indices of variables to be
extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that |
A numeric matrix is returned.
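The examples for this function suggest an indexing convention: with p columns in X, indices 1 to p refer to main effects, and index p + k refers to the interaction in row k of interaction.ind (so varind = c(1, 5) with four columns gives X1 and X1X2). The base R sketch below mimics that convention. It is an inference from the examples rather than the package's code, extract_sketch is a hypothetical name, and it performs none of the input checks the real function may apply.

```r
# Hypothetical re-implementation of the indexing convention, for illustration.
extract_sketch <- function(X, varind, interaction.ind = NULL) {
  p <- ncol(X)
  cols <- lapply(varind, function(i) {
    if (i <= p) {
      X[, i]                             # main effect: column i of X
    } else {
      pair <- interaction.ind[i - p, ]   # row of the interaction index matrix
      X[, pair[1]] * X[, pair[2]]        # product of the two parent columns
    }
  })
  do.call(cbind, cols)
}

interaction.ind <- t(combn(4, 2))
set.seed(0)
X <- matrix(rnorm(20), ncol = 4)
Z <- extract_sketch(X, varind = c(1, 5), interaction.ind)
all.equal(Z[, 2], X[, 1] * X[, 2])
```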
    # Generate interaction.ind
    interaction.ind <- t(combn(4,2))

    # Generate data
    set.seed(0)
    X <- matrix(rnorm(20), ncol = 4)
    y <- X[, 2] + rnorm(5)

    # Extract X1 and X1X2 from X1, ..., X4
    Extract(X, varind = c(1,5), interaction.ind)

    # Extract index 5 (the interaction X1X2) from X1, ..., X4
    Extract(X, varind = 5, interaction.ind)

    # Extract using duplicated values
    try(Extract(X, varind = c(1,1), interaction.ind)) # this will not run
This is the main function of package hySAINT. It implements both a genetic algorithm and simulated annealing. The simulated annealing technique is used within the mutation operator.
hySAINT( X, y, heredity = "Strong", r1, r2, sigma, interaction.ind = NULL, varind = NULL, numElite = 40, max.iter = 500, initial.temp = 1000, cooling.rate = 0.95, lambda = 10 )
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
numElite |
Number of elite parents. Default is 40. |
max.iter |
Maximum number of iterations. Default is 500. |
initial.temp |
Initial temperature. Default is 1000. |
cooling.rate |
A numeric value representing the speed at which the temperature decreases. Default is 0.95. |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
An object with S3 class "hySAINT".
Final.variable.names |
Name of the selected effects. |
Final.variable.idx |
Index of the selected effects. |
Final.model.score |
Final Model ABC. |
All.iter.score |
Best ABC scores from initial parents and all iterations. |
ABC, EVA, Initial, Crossover, Mutation
    set.seed(0)
    interaction.ind <- t(combn(10,2))
    X <- matrix(rnorm(100*10,1,0.1), 100, 10)
    epl <- rnorm(100,0,0.01)
    y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
    hySAINT(X, y, r1 = 5, r2 = 2, sigma = 0.01, interaction.ind = interaction.ind, max.iter = 5)
This function generates the initial parents.
Initial(X, y, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
EVAoutput |
The output from function |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
numElite |
Number of elite parents. Default is 40. |
Initial parents. A numeric matrix with dimensions numElite by r1+r2.
    set.seed(0)
    interaction.ind <- t(combn(10,2))
    X <- matrix(rnorm(100*10,1,0.1), 100, 10)
    epl <- rnorm(100,0,0.01)
    y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
    EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01, interaction.ind = interaction.ind)
    myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
This function generates mutants from parents.
Mutation( myParent, EVAoutput, r1, r2, initial.temp = 1000, cooling.rate = 0.95, X, y, heredity = "Strong", sigma, varind = NULL, interaction.ind = NULL, lambda = 10 )
myParent |
A numeric matrix with dimension |
EVAoutput |
The output from function |
r1 |
At most how many main effects do you want to include in your model?
For high-dimensional data, |
r2 |
At most how many interaction effects do you want to include in your model? |
initial.temp |
Initial temperature. Default is 1000. |
cooling.rate |
A numeric value representing the speed at which the temperature decreases. Default is 0.95. |
X |
Input data. An optional data frame, or numeric matrix of dimension
|
y |
Response variable. A |
heredity |
Whether to enforce Strong, Weak, or No heredity. Default is "Strong". |
sigma |
The standard deviation of the noise term. In practice, sigma is usually
unknown. Users can estimate sigma from function |
varind |
A numeric vector that specifies the indices of variables to be extracted from |
interaction.ind |
A two-column numeric matrix. Each row represents a unique
interaction pair, with the columns indicating the index numbers of the variables
involved in each interaction. Note that interaction.ind must be generated
outside of this function using |
lambda |
A numeric value defined by users. The number needs to satisfy the condition:
|
Mutant. A numeric matrix with dimensions numElite by r1+r2.
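The roles of initial.temp and cooling.rate can be illustrated with a generic simulated annealing step. The sketch below assumes the common Metropolis acceptance rule and a geometric cooling schedule (temperature multiplied by cooling.rate after each step). The source does not state the exact rule used inside Mutation(), so this is an assumption. The name accept_move and the toy scores are hypothetical.

```r
# Hypothetical Metropolis acceptance rule; lower scores are better, as with ABC.
accept_move <- function(current_score, candidate_score, temp) {
  delta <- candidate_score - current_score
  # Always accept an improvement; accept a worse move with probability exp(-delta / temp).
  delta <= 0 || runif(1) < exp(-delta / temp)
}

set.seed(0)
temp <- 1000           # mirrors the initial.temp default
cooling.rate <- 0.95   # mirrors the cooling.rate default
score <- 50            # toy starting score
for (iter in 1:5) {
  candidate <- score + rnorm(1, sd = 5)   # stand-in for the score of a mutated model
  if (accept_move(score, candidate, temp)) score <- candidate
  temp <- temp * cooling.rate             # assumed geometric cooling
}
temp   # equals 1000 * 0.95^5 after five steps
```

At high temperatures nearly every candidate is accepted, which encourages exploration. As the temperature falls, worse candidates are accepted less often and the search settles toward low-score models.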
    set.seed(0)
    interaction.ind <- t(combn(10,2))
    X <- matrix(rnorm(100*10,1,0.1), 100, 10)
    epl <- rnorm(100,0,0.01)
    y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
    EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01, interaction.ind = interaction.ind)
    myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
    Mutation(myParent, EVAoutput, r1 = 5, r2 = 2, X = X, y = y, sigma = 0.1, interaction.ind = interaction.ind)