Package 'hySAINT' reference manual

Title:	Hybrid Genetic and Simulated Annealing Algorithm for High Dimensional Linear Models with Interaction Effects
Description:	We provide a stage-wise selection method using genetic algorithms, designed to efficiently identify main effects and two-way interactions in high-dimensional linear regression models. It also applies a simulated annealing algorithm during the mutation process. The relevant paper is Ye, C., and Yang, Y. (2019) <doi:10.1109/TIT.2019.2913417>.
Authors:	Leiyue Li [aut, cre], Chenglong Ye [aut]
Maintainer:	Leiyue Li <[email protected]>
License:	GPL-2
Version:	1.2.1
Built:	2025-03-19 04:47:17 UTC
Source:	https://github.com/cran/hySAINT

ABC Evaluation

Description

Computes the ABC score of a fitted model. For a model I, the ABC is defined as

$ABC(I)=\sum\limits_{i=1}^n\bigg(Y_i-\hat{Y}_i^{I}\bigg)^2+2r_I\sigma^2+\lambda\sigma^2C_I.$

When comparing the ABC of models fitted to the same dataset, the smaller the ABC, the better the fit.
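The formula above can be evaluated term by term. The following is a minimal base R sketch, not the package's implementation: `abc_sketch` is a hypothetical helper, and the quantities r_I and C_I are passed in as plain numbers (`r` and `C`) because this section does not describe how they are computed.

```r
# Hypothetical helper: evaluates the ABC formula for given inputs.
# `r` stands for r_I and `C` for C_I; both are supplied by the caller.
abc_sketch <- function(y, y_hat, r, C, sigma, lambda = 10) {
  rss <- sum((y - y_hat)^2)                      # residual sum of squares
  rss + 2 * r * sigma^2 + lambda * sigma^2 * C   # add the two penalty terms
}

y     <- c(1, 2, 3)
y_hat <- c(1.1, 1.9, 3.2)
score <- abc_sketch(y, y_hat, r = 2, C = 1.5, sigma = 0.1)
score
```

With these made-up numbers the residual sum of squares is 0.06 and the two penalty terms are 0.04 and 0.15.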

Usage

ABC(
  X,
  y,
  heredity = "Strong",
  sigma,
  varind = NULL,
  interaction.ind = NULL,
  lambda = 10
)

Arguments

`X`	Input data. An optional data frame, or numeric matrix of dimension `n` observations by `p` main effects.
`y`	Response variable. An `n`-dimensional vector.
`heredity`	Whether to enforce Strong, Weak, or No heredity. Default is "Strong".
`sigma`	The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate it with `selectiveInference::estimateSigma` and use the output as the sigma value. See Examples for details.
`varind`	A numeric vector that specifies the indices of variables to be extracted from `X`. Default is `NULL`.
`interaction.ind`	A two-column numeric matrix. Each row represents a unique interaction pair, with the columns giving the indices of the two variables involved. Note that `interaction.ind` must be generated outside of this function using `t(utils::combn(p,2))`. See Examples for details.
`lambda`	A user-defined numeric value. It must satisfy $\lambda \geq 5.1/\log(2)$. Default is 10.
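The lower bound on `lambda` can be checked numerically. This small sketch assumes that the logarithm in the condition is the natural logarithm, in which case the bound is about 7.36 and the default of 10 satisfies it. The variable names are illustrative only.

```r
# Assumption: log() in the condition is the natural logarithm.
lambda_min <- 5.1 / log(2)
lambda     <- 10                 # the documented default
lambda_ok  <- lambda >= lambda_min
lambda_ok
```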

Value

A numeric value is returned. It represents the ABC score of the fitted model.

References

Ye, C. and Yang, Y., 2019. High-dimensional adaptive minimax sparse estimation with interactions.

Examples

# When sigma is known
set.seed(0)
interaction.ind <- t(combn(4,2))
X <- matrix(rnorm(50*4,1,0.1), 50, 4)
epl <- rnorm(50,0,0.01)
y <- 1+X[,1]+X[,2]+X[,1]*X[,2] + epl
ABC(X, y, sigma = 0.01, varind = c(1,2,5), interaction.ind = interaction.ind)

# When sigma is not known
full <- Extract(X, varind = c(1:(dim(X)[2]+dim(interaction.ind)[1])), interaction.ind)
sigma <- selectiveInference::estimateSigma(full, y)$sigmahat # Estimate sigma

Performing crossover

Description

This function generates offspring from the parents. Crossover is performed with a fixed probability of 0.6.

Usage

Crossover(X, myParent, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)

Arguments

`X`	Input data. An optional data frame, or numeric matrix of dimension `n` observations by `p` main effects.
`myParent`	A numeric matrix with dimension `numElite` by `r1 + r2`.
`EVAoutput`	The output from function `EVA`.
`heredity`	Whether to enforce Strong, Weak, or No heredity. Default is "Strong".
`r1`	Maximum number of main effects to include in the model. For high-dimensional data, `r1` cannot be larger than the number of screened main effects.
`r2`	Maximum number of interaction effects to include in the model.
`numElite`	Number of elite parents. Default is 40.

Value

Offspring. If crossover occurs, a numeric matrix with dimensions `choose(numElite, 2)` by `r1 + r2` is returned; otherwise, the matrix has dimensions `numElite` by `r1 + r2`.
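The return dimensions can be illustrated with a toy crossover in base R. This is only a sketch under loud assumptions: `crossover_sketch` is a hypothetical function, and the one-point recombination rule it uses is a stand-in, since this page does not describe how the package actually combines two parents. Only the crossover probability and the output dimensions are taken from the text above.

```r
# Hypothetical one-point crossover over every pair of parents.
# Assumption: each pair yields one child; the real rule may differ.
crossover_sketch <- function(parents, prob = 0.6) {
  if (runif(1) >= prob) return(parents)        # no crossover: parents unchanged
  k     <- ncol(parents)                       # k = r1 + r2, must be >= 2
  pairs <- t(combn(nrow(parents), 2))          # all choose(numElite, 2) pairs
  cut   <- sample.int(k - 1, 1)                # random cut point
  t(apply(pairs, 1, function(pr) {
    c(parents[pr[1], seq_len(cut)], parents[pr[2], (cut + 1):k])
  }))
}

set.seed(1)
parents <- matrix(1:28, nrow = 4)              # 4 parents, r1 + r2 = 7
kids    <- crossover_sketch(parents, prob = 1) # force crossover
dim(kids)
```

Forcing `prob = 1` yields one row per pair of parents, matching the `choose(numElite, 2)` by `r1 + r2` shape described above.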

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
  interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
Offsprings <- Crossover(X, myParent, EVAoutput, r1 = 5, r2 = 2)

Evaluating main and interaction effects

Description

This function ranks each main and interaction effect. It also calculates the ABC score for each potential interaction across different heredity structures. If heredity = "No" and the number of potential interactions exceeds choose(1000,2), the distance correlation between each variable in X and y is calculated to reduce the running time and make the evaluation more efficient.

Usage

EVA(
  X,
  y,
  heredity = "Strong",
  r1,
  sigma,
  varind = NULL,
  interaction.ind = NULL,
  lambda = 10
)

Arguments

`X`	Input data. An optional data frame, or numeric matrix of dimension `n` observations by `p` main effects.
`y`	Response variable. An `n`-dimensional vector.
`heredity`	Whether to enforce Strong, Weak, or No heredity. Default is "Strong".
`r1`	Maximum number of main effects to include in the model. For high-dimensional data, `r1` cannot be larger than the number of screened main effects.
`sigma`	The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate it with `selectiveInference::estimateSigma` and use the output as the sigma value.
`varind`	A numeric vector that specifies the indices of variables to be extracted from `X`. Default is `NULL`.
`interaction.ind`	A two-column numeric matrix. Each row represents a unique interaction pair, with the columns giving the indices of the two variables involved. Note that `interaction.ind` must be generated outside of this function using `t(utils::combn(p,2))`. See Examples for details.
`lambda`	A user-defined numeric value. It must satisfy $\lambda \geq 5.1/\log(2)$. Default is 10.
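The `heredity` options restrict which interactions are eligible given a set of selected main effects. The sketch below uses the usual definitions (Strong: both parent main effects are selected; Weak: at least one is; No: no restriction). `allowed_interactions` is a hypothetical helper written for illustration and is not part of the package.

```r
# Hypothetical helper: filter candidate pairs by heredity structure.
allowed_interactions <- function(main, interaction.ind,
                                 heredity = c("Strong", "Weak", "No")) {
  heredity <- match.arg(heredity)
  in1 <- interaction.ind[, 1] %in% main   # first parent selected?
  in2 <- interaction.ind[, 2] %in% main   # second parent selected?
  keep <- switch(heredity,
    Strong = in1 & in2,
    Weak   = in1 | in2,
    No     = rep(TRUE, nrow(interaction.ind))
  )
  interaction.ind[keep, , drop = FALSE]
}

interaction.ind <- t(combn(4, 2))         # all pairs among X1, ..., X4
allowed_interactions(main = c(1, 2), interaction.ind, heredity = "Strong")
allowed_interactions(main = c(1, 2), interaction.ind, heredity = "Weak")
```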

Value

A list with two components: `ranked.mainpool`, the ranked main effects; and `ranked.intermat`, a 4-column matrix containing the potential interactions ranked by ABC score.

Examples

# Strong heredity
set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01, interaction.ind = interaction.ind)

Extracting columns and generating required interaction effects from data

Description

This function simplifies the data preparation process by enabling users to extract specific columns from their dataset X, and automatically generating any necessary interaction effects based on varind.

Usage

Extract(X, varind, interaction.ind = NULL)

Arguments

`X`	Input data. An optional data frame, or numeric matrix of dimension `n` observations by `p` main effects. Note that the interaction effects should not be included in `X` because this function automatically generates the corresponding interaction effects if needed.
`varind`	A numeric vector that specifies the indices of variables to be extracted from `X`. Duplicated values are not allowed. See Example for details.
`interaction.ind`	A two-column numeric matrix. Each row represents a unique interaction pair, with the columns indicating the index numbers of the variables involved in each interaction. Note that `interaction.ind` must be generated outside of this function using `t(utils::combn(p,2))`. See Example section for details.
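The indexing convention implied by the arguments can be sketched in base R: indices `1, ..., p` refer to columns of `X`, and index `p + k` refers to the product of the two columns named in row `k` of `interaction.ind`. `extract_sketch` is a hypothetical stand-in that mirrors this documented behaviour; it is an assumption about the mapping, not the package's source code.

```r
# Hypothetical re-implementation of the index mapping described above.
extract_sketch <- function(X, varind, interaction.ind) {
  X <- as.matrix(X)
  p <- ncol(X)
  if (anyDuplicated(varind) > 0) stop("duplicated values in varind are not allowed")
  cols <- lapply(varind, function(j) {
    if (j <= p) {
      X[, j]                               # main effect
    } else {
      pair <- interaction.ind[j - p, ]     # row of the interaction pair
      X[, pair[1]] * X[, pair[2]]          # product of the two main effects
    }
  })
  do.call(cbind, cols)
}

X   <- matrix(1:8, ncol = 4)               # 2 observations, 4 main effects
out <- extract_sketch(X, varind = c(1, 5), t(combn(4, 2)))
out                                        # columns: X1 and X1 * X2
```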

Value

A numeric matrix is returned.

Examples

# Generate interaction.ind
interaction.ind <- t(combn(4,2))

# Generate data
set.seed(0)
X <- matrix(rnorm(20), ncol = 4)
y <- X[, 2] + rnorm(5)

# Extract X1 and X1X2 from X1, ..., X4
Extract(X, varind = c(1,5), interaction.ind)

# Extract index 5 (the interaction X1X2) from X1, ..., X4
Extract(X, varind = 5, interaction.ind)

# Extract using duplicated values
try(Extract(X, varind = c(1,1), interaction.ind)) # fails: duplicated indices are not allowed

Hybrid Genetic and Simulated Annealing Algorithm

Description

This is the main function of package hySAINT. It implements both a genetic algorithm and simulated annealing. The simulated annealing technique is used within the mutation operator.

Usage

hySAINT(
  X,
  y,
  heredity = "Strong",
  r1,
  r2,
  sigma,
  interaction.ind = NULL,
  varind = NULL,
  numElite = 40,
  max.iter = 500,
  initial.temp = 1000,
  cooling.rate = 0.95,
  lambda = 10
)

Arguments

`X`	Input data. An optional data frame, or numeric matrix of dimension `n` observations by `p` main effects.
`y`	Response variable. An `n`-dimensional vector.
`heredity`	Whether to enforce Strong, Weak, or No heredity. Default is "Strong".
`r1`	Maximum number of main effects to include in the model. For high-dimensional data, `r1` cannot be larger than the number of screened main effects.
`r2`	Maximum number of interaction effects to include in the model.
`sigma`	The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate it with `selectiveInference::estimateSigma` and use the output as the sigma value.
`interaction.ind`	A two-column numeric matrix. Each row represents a unique interaction pair, with the columns giving the indices of the two variables involved. Note that `interaction.ind` must be generated outside of this function using `t(utils::combn(p,2))`. See Examples for details.
`varind`	A numeric vector that specifies the indices of variables to be extracted from `X`.
`numElite`	Number of elite parents. Default is 40.
`max.iter`	Maximum number of iterations. Default is 500.
`initial.temp`	Initial temperature. Default is 1000.
`cooling.rate`	A numeric value representing the rate at which the temperature decreases. Default is 0.95.
`lambda`	A user-defined numeric value. It must satisfy $\lambda \geq 5.1/\log(2)$. Default is 10.
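To get a feel for `initial.temp` and `cooling.rate`, the sketch below prints the first few temperatures under a geometric cooling schedule, in which the temperature is multiplied by `cooling.rate` at every step. Geometric cooling is an assumption made for illustration; this page does not state the exact schedule the package uses.

```r
# Assumption: geometric cooling, T_k = initial.temp * cooling.rate^k.
initial.temp <- 1000
cooling.rate <- 0.95
steps <- 0:5
temps <- initial.temp * cooling.rate^steps
temps
```

Under this assumption, a `cooling.rate` closer to 1 cools more slowly and keeps the search exploratory for longer.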

Value

An object with S3 class "hySAINT".

`Final.variable.names`	Name of the selected effects.
`Final.variable.idx`	Index of the selected effects.
`Final.model.score`	Final Model ABC.
`All.iter.score`	Best ABC scores from initial parents and all iterations.

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
hySAINT(X, y, r1 = 5, r2 = 2, sigma = 0.01, interaction.ind = interaction.ind, max.iter = 5)

Creating initial parents

Description

This function generates the initial parents.

Usage

Initial(X, y, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)

Arguments

`X`	Input data. An optional data frame, or numeric matrix of dimension `n` observations by `p` main effects.
`y`	Response variable. An `n`-dimensional vector.
`EVAoutput`	The output from function `EVA`.
`heredity`	Whether to enforce Strong, Weak, or No heredity. Default is "Strong".
`r1`	Maximum number of main effects to include in the model. For high-dimensional data, `r1` cannot be larger than the number of screened main effects.
`r2`	Maximum number of interaction effects to include in the model.
`numElite`	Number of elite parents. Default is 40.

Value

Initial parents. A numeric matrix with dimensions numElite by r1+r2.

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
  interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)

Performing mutation

Description

This function generates mutants from the parents.

Usage

Mutation(
  myParent,
  EVAoutput,
  r1,
  r2,
  initial.temp = 1000,
  cooling.rate = 0.95,
  X,
  y,
  heredity = "Strong",
  sigma,
  varind = NULL,
  interaction.ind = NULL,
  lambda = 10
)

Arguments

`myParent`	A numeric matrix with dimension `numElite` by `r1 + r2`.
`EVAoutput`	The output from function `EVA`.
`r1`	Maximum number of main effects to include in the model. For high-dimensional data, `r1` cannot be larger than the number of screened main effects.
`r2`	Maximum number of interaction effects to include in the model.
`initial.temp`	Initial temperature. Default is 1000.
`cooling.rate`	A numeric value representing the rate at which the temperature decreases. Default is 0.95.
`X`	Input data. An optional data frame, or numeric matrix of dimension `n` observations by `p` main effects.
`y`	Response variable. An `n`-dimensional vector.
`heredity`	Whether to enforce Strong, Weak, or No heredity. Default is "Strong".
`sigma`	The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate it with `selectiveInference::estimateSigma` and use the output as the sigma value.
`varind`	A numeric vector that specifies the indices of variables to be extracted from `X`.
`interaction.ind`	A two-column numeric matrix. Each row represents a unique interaction pair, with the columns giving the indices of the two variables involved. Note that `interaction.ind` must be generated outside of this function using `t(utils::combn(p,2))`.
`lambda`	A user-defined numeric value. It must satisfy $\lambda \geq 5.1/\log(2)$. Default is 10.
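In simulated annealing, the temperature typically governs how often a worse candidate is accepted. The sketch below shows the standard Metropolis acceptance rule for a score that is minimized, as the ABC is. `accept_move` is a hypothetical helper, and whether the package uses exactly this rule is an assumption; the page only states that simulated annealing is applied during mutation.

```r
# Hypothetical Metropolis rule for a minimized score such as the ABC.
accept_move <- function(score_new, score_old, temp) {
  if (score_new <= score_old) return(TRUE)           # always accept improvements
  runif(1) < exp(-(score_new - score_old) / temp)    # sometimes accept worse moves
}

set.seed(0)
accept_move(score_new = 1.0, score_old = 2.0, temp = 1000)   # an improvement
accept_move(score_new = 2.0, score_old = 1.0, temp = 1e-12)  # worse move, cold system
```

At a high temperature the acceptance probability for a slightly worse mutant is close to 1, and it shrinks toward 0 as the temperature falls.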

Value

Mutant. A numeric matrix with dimensions numElite by r1+r2.

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
  interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
Mutation(myParent, EVAoutput, r1 = 5, r2 = 2, X = X, y = y,
  sigma = 0.01, interaction.ind = interaction.ind)
