Package 'hySAINT'

Title: Hybrid Genetic and Simulated Annealing Algorithm for High Dimensional Linear Models with Interaction Effects
Description: We provide a stage-wise selection method using genetic algorithms, designed to efficiently identify main effects and two-way interactions within high-dimensional linear regression models. Additionally, it implements a simulated annealing algorithm during the mutation process. The relevant paper can be found at: Ye, C., and Yang, Y. (2019) <doi:10.1109/TIT.2019.2913417>.
Authors: Leiyue Li [aut, cre], Chenglong Ye [aut]
Maintainer: Leiyue Li <[email protected]>
License: GPL-2
Version: 1.2.1
Built: 2025-02-17 05:00:35 UTC
Source: https://github.com/cran/hySAINT


ABC Evaluation

Description

Gives the ABC score of a fitted model. For a model I, the ABC is defined as

$$ABC(I)=\sum_{i=1}^{n}\left(Y_i-\hat{Y}_i^{I}\right)^2+2r_I\sigma^2+\lambda\sigma^2C_I.$$

When comparing the ABC of models fitted to the same dataset, a smaller ABC indicates a better fit.
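
As a rough illustration of the formula above, the sketch below evaluates its three terms in base R. It is not the package's implementation: the helper name abc_sketch is hypothetical, r_I is assumed to be the rank of the fitted design matrix (including an intercept), and the complexity term C_I, which this page does not define, is passed in by the caller.

```r
# Hypothetical helper: evaluates RSS + 2 * r_I * sigma^2 + lambda * sigma^2 * C_I.
# Assumptions: r_I is the rank of the design matrix (with intercept);
# C_I is supplied by the caller because its definition is not given here.
abc_sketch <- function(X_sel, y, sigma, C_I, lambda = 10) {
  design <- cbind(1, X_sel)
  fit <- lm.fit(design, y)
  rss <- sum(fit$residuals^2)   # residual sum of squares
  r_I <- fit$rank               # assumed meaning of r_I
  rss + 2 * r_I * sigma^2 + lambda * sigma^2 * C_I
}

set.seed(0)
X <- matrix(rnorm(50 * 4, 1, 0.1), 50, 4)
y <- 1 + X[, 1] + X[, 2] + rnorm(50, 0, 0.01)

# Same C_I for both models, so only the fit and rank terms differ
score_true  <- abc_sketch(X[, 1:2], y, sigma = 0.01, C_I = 1)
score_wrong <- abc_sketch(X[, 3:4], y, sigma = 0.01, C_I = 1)
score_true < score_wrong
```

The model built from the columns that actually generate y should receive the smaller score, which matches the "smaller is better" reading of the ABC.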

Usage

ABC(
  X,
  y,
  heredity = "Strong",
  sigma,
  varind = NULL,
  interaction.ind = NULL,
  lambda = 10
)

Arguments

X

Input data. An optional data frame, or numeric matrix of dimension n observations by p main effects.

y

Response variable. An n-dimensional vector.

heredity

Whether to enforce Strong, Weak, or No heredity. Default is "Strong".

sigma

The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate sigma from function selectiveInference::estimateSigma, then use the output as the sigma value. See examples for details.

varind

A numeric vector that specifies the indices of variables to be extracted from X. Default is NULL.

interaction.ind

A two-column numeric matrix. Each row represents a unique interaction pair, with the columns indicating the index numbers of the variables involved in each interaction. Note that interaction.ind must be generated outside of this function using t(utils::combn(p,2)). See Example section for details.

lambda

A numeric value defined by users. The number needs to satisfy the condition: $\lambda \geq 5.1/\log(2)$. Default is 10.
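
The bound on lambda can be checked numerically before calling the function. This is a small sketch, assuming log denotes the natural logarithm (as R's log() does); check_lambda is a hypothetical helper, not part of the package.

```r
# Lower bound on lambda, assuming log() is the natural logarithm
lambda_min <- 5.1 / log(2)
lambda_min            # approximately 7.358

# Hypothetical helper, not part of the package
check_lambda <- function(lambda) {
  stopifnot(is.numeric(lambda), length(lambda) == 1)
  lambda >= 5.1 / log(2)
}

check_lambda(10)   # the default satisfies the bound
check_lambda(5)    # too small
```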

Value

A numeric value is returned. It represents the ABC score of the fitted model.

References

Ye, C. and Yang, Y., 2019. High-dimensional adaptive minimax sparse estimation with interactions.

Examples

# When sigma is known
set.seed(0)
interaction.ind <- t(combn(4,2))
X <- matrix(rnorm(50*4,1,0.1), 50, 4)
epl <- rnorm(50,0,0.01)
y <- 1+X[,1]+X[,2]+X[,1]*X[,2] + epl
ABC(X, y, sigma = 0.01, varind = c(1,2,5), interaction.ind = interaction.ind)

# When sigma is not known
full <- Extract(X, varind = c(1:(dim(X)[2]+dim(interaction.ind)[1])), interaction.ind)
sigma <- selectiveInference::estimateSigma(full, y)$sigmahat # Estimate sigma
ABC(X, y, sigma = sigma, varind = c(1,2,5), interaction.ind = interaction.ind) # Use the estimate

Performing crossover

Description

This function gives offspring from parents. It performs crossover at a fixed probability of 0.6.

Usage

Crossover(X, myParent, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)

Arguments

X

Input data. An optional data frame, or numeric matrix of dimension n observations by p main effects.

myParent

A numeric matrix with dimension numElite by r1 + r2.

EVAoutput

The output from function EVA.

heredity

Whether to enforce Strong, Weak, or No heredity. Default is "Strong".

r1

Maximum number of main effects to include in the model. For high-dimensional data, r1 cannot be larger than the number of screened main effects.

r2

Maximum number of interaction effects to include in the model.

numElite

Number of elite parents. Default is 40.

Value

Offspring. If crossover occurred, a numeric matrix with dimensions choose(numElite,2) by r1+r2 is returned. Otherwise, the returned matrix is numElite by r1+r2.
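
To make the two possible output shapes concrete, here is a minimal base R sketch of a crossover step that fires with a fixed probability and produces one offspring per pair of parents. The pooling rule, the zero padding of unused slots, and the name crossover_sketch are assumptions made for illustration; the sketch ignores heredity and the split between main and interaction slots, and it is not the package's operator.

```r
# Hypothetical crossover: each row of `parents` holds effect indices,
# with 0 marking an unused slot (an assumption of this sketch).
crossover_sketch <- function(parents, prob = 0.6) {
  # With probability 1 - prob, no crossover: return the parents unchanged
  if (runif(1) >= prob) return(parents)

  pairs <- t(combn(nrow(parents), 2))   # every unordered pair of parents
  width <- ncol(parents)
  offspring <- matrix(0, nrow(pairs), width)

  for (k in seq_len(nrow(pairs))) {
    # Pool the effects carried by either parent, dropping the 0 padding
    pool <- unique(c(parents[pairs[k, 1], ], parents[pairs[k, 2], ]))
    pool <- pool[pool != 0]
    # Keep at most `width` effects, chosen at random from the pool
    if (length(pool) > width) pool <- pool[sample.int(length(pool), width)]
    offspring[k, seq_along(pool)] <- sort(pool)
  }
  offspring
}

set.seed(1)
parents <- rbind(c(1, 2, 11), c(1, 3, 12), c(2, 3, 19), c(4, 5, 0))
kids <- crossover_sketch(parents, prob = 1)  # prob = 1 forces crossover
dim(kids)                                    # choose(4, 2) rows by 3 columns
```

With prob = 0 the same function returns the parents as they are, which corresponds to the numElite by r1+r2 case.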

See Also

EVA, Initial.

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
  interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
Offsprings <- Crossover(X, myParent, EVAoutput, r1 = 5, r2 = 2)

Evaluating main and interaction effects

Description

This function ranks each main and interaction effect. It also calculates the ABC score for each potential interaction across different heredity structures. If heredity = "No" and the number of potential interactions exceeds choose(1000,2), the distance correlation between each variable in X and y is calculated to reduce the running time.
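
For readers unfamiliar with distance correlation, the following base R sketch computes the sample statistic from double-centered distance matrices and uses it to rank the columns of X against y. The function name dcor_sketch is hypothetical, and how the package computes or uses the statistic internally is not stated on this page, so treat this only as an illustration of the quantity itself.

```r
# Sample distance correlation between two numeric vectors (base R only)
dcor_sketch <- function(x, y) {
  center <- function(v) {
    d <- as.matrix(dist(v))                              # pairwise |v_i - v_j|
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)   # double centering
  }
  A <- center(x)
  B <- center(y)
  dcov2 <- max(mean(A * B), 0)               # squared distance covariance
  denom <- sqrt(mean(A * A) * mean(B * B))   # product of distance variances
  if (denom == 0) return(0)
  sqrt(dcov2 / denom)
}

set.seed(0)
X <- matrix(rnorm(100 * 10, 1, 0.1), 100, 10)
y <- 1 + X[, 1] + X[, 2] + rnorm(100, 0, 0.01)

scores <- apply(X, 2, dcor_sketch, y = y)
order(scores, decreasing = TRUE)   # columns ranked from most to least dependent
```

Ranking p variables this way costs p evaluations, whereas scoring every pair costs choose(p,2), which is where the saving in running time comes from.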

Usage

EVA(
  X,
  y,
  heredity = "Strong",
  r1,
  sigma,
  varind = NULL,
  interaction.ind = NULL,
  lambda = 10
)

Arguments

X

Input data. An optional data frame, or numeric matrix of dimension n observations by p main effects.

y

Response variable. An n-dimensional vector.

heredity

Whether to enforce Strong, Weak, or No heredity. Default is "Strong".

r1

Maximum number of main effects to include in the model. For high-dimensional data, r1 cannot be larger than the number of screened main effects.

sigma

The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate sigma from function selectiveInference::estimateSigma, then use the output as the sigma value.

varind

A numeric vector that specifies the indices of variables to be extracted from X. Default is NULL.

interaction.ind

A two-column numeric matrix. Each row represents a unique interaction pair, with the columns indicating the index numbers of the variables involved in each interaction. Note that interaction.ind must be generated outside of this function using t(utils::combn(p,2)). See Example section for details.

lambda

A numeric value defined by users. The number needs to satisfy the condition: $\lambda \geq 5.1/\log(2)$. Default is 10.

Value

A list with two components: ranked.mainpool, the ranked main effects; and ranked.intermat, a 4-column matrix containing the potential interactions ranked by ABC score.
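
Which interactions count as potential depends on the heredity argument. The sketch below filters the rows of interaction.ind under the usual definitions: strong heredity requires both parent main effects to be selected, weak heredity requires at least one, and no heredity imposes no restriction. The helper allowed_interactions is hypothetical and is not how the package necessarily enforces the constraint.

```r
# Hypothetical helper: which candidate interactions are allowed, given the
# selected main effects and a heredity rule (standard definitions assumed)
allowed_interactions <- function(main, interaction.ind,
                                 heredity = c("Strong", "Weak", "No")) {
  heredity <- match.arg(heredity)
  # TRUE where a member of the pair is among the selected main effects
  in_main <- matrix(interaction.ind %in% main, ncol = 2)
  n_parents <- rowSums(in_main)
  keep <- switch(heredity,
    Strong = n_parents == 2,   # both parent main effects are selected
    Weak   = n_parents >= 1,   # at least one parent is selected
    No     = rep(TRUE, nrow(interaction.ind))
  )
  interaction.ind[keep, , drop = FALSE]
}

interaction.ind <- t(combn(4, 2))
allowed_interactions(c(1, 2), interaction.ind, "Strong")      # only the pair (1, 2)
nrow(allowed_interactions(c(1, 2), interaction.ind, "Weak"))  # 5: every pair touching 1 or 2
nrow(allowed_interactions(c(1, 2), interaction.ind, "No"))    # 6: all pairs
```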

See Also

ABC, Extract

Examples

# Strong heredity
set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01, interaction.ind = interaction.ind)

Extracting columns and generating required interaction effects from data

Description

This function simplifies the data preparation process by enabling users to extract specific columns from their dataset X, and automatically generating any necessary interaction effects based on varind.

Usage

Extract(X, varind, interaction.ind = NULL)

Arguments

X

Input data. An optional data frame, or numeric matrix of dimension n observations by p main effects. Note that the interaction effects should not be included in X because this function automatically generates the corresponding interaction effects if needed.

varind

A numeric vector that specifies the indices of variables to be extracted from X. Duplicated values are not allowed. See Example for details.

interaction.ind

A two-column numeric matrix. Each row represents a unique interaction pair, with the columns indicating the index numbers of the variables involved in each interaction. Note that interaction.ind must be generated outside of this function using t(utils::combn(p,2)). See Example section for details.

Value

A numeric matrix is returned.
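
The indexing convention can be sketched in base R. The example on this page extracts X1 and X1X2 with varind = c(1,5) when p = 4, so the sketch assumes that indices 1 to p select main effects and that index p + k selects the product of the two columns listed in row k of interaction.ind. The function extract_sketch is a hypothetical stand-in written for illustration, not the package's code.

```r
# Hypothetical re-implementation for illustration only.
# Assumption: indices 1..p select main effects, and index p + k selects the
# product of the two columns named in row k of interaction.ind.
extract_sketch <- function(X, varind, interaction.ind = NULL) {
  X <- as.matrix(X)
  p <- ncol(X)
  if (anyDuplicated(varind)) stop("varind must not contain duplicated values")
  cols <- lapply(varind, function(j) {
    if (j <= p) return(X[, j])            # main effect
    pair <- interaction.ind[j - p, ]      # interaction: look up its two parents
    X[, pair[1]] * X[, pair[2]]
  })
  do.call(cbind, cols)
}

set.seed(0)
X <- matrix(rnorm(20), ncol = 4)
interaction.ind <- t(combn(4, 2))
out <- extract_sketch(X, varind = c(1, 5), interaction.ind)
all.equal(out[, 2], X[, 1] * X[, 2])   # index 5 maps to the X1 * X2 product
```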

Examples

# Generate interaction.ind
interaction.ind <- t(combn(4,2))

# Generate data
set.seed(0)
X <- matrix(rnorm(20), ncol = 4)
y <- X[, 2] + rnorm(5)

# Extract X1 and X1X2 from X1, ..., X4
Extract(X, varind = c(1,5), interaction.ind)

# Extract the effect with index 5 (the interaction X1X2) from X1, ..., X4
Extract(X, varind = 5, interaction.ind)

# Extract using duplicated values
try(Extract(X, varind = c(1,1), interaction.ind)) # raises an error: duplicated values are not allowed

Hybrid Genetic and Simulated Annealing Algorithm

Description

This is the main function of package hySAINT. It implements both a genetic algorithm and simulated annealing. The simulated annealing technique is used within the mutation operator.

Usage

hySAINT(
  X,
  y,
  heredity = "Strong",
  r1,
  r2,
  sigma,
  interaction.ind = NULL,
  varind = NULL,
  numElite = 40,
  max.iter = 500,
  initial.temp = 1000,
  cooling.rate = 0.95,
  lambda = 10
)

Arguments

X

Input data. An optional data frame, or numeric matrix of dimension n observations by p main effects.

y

Response variable. An n-dimensional vector.

heredity

Whether to enforce Strong, Weak, or No heredity. Default is "Strong".

r1

Maximum number of main effects to include in the model. For high-dimensional data, r1 cannot be larger than the number of screened main effects.

r2

Maximum number of interaction effects to include in the model.

sigma

The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate sigma from function selectiveInference::estimateSigma, then use the output as the sigma value.

interaction.ind

A two-column numeric matrix. Each row represents a unique interaction pair, with the columns indicating the index numbers of the variables involved in each interaction. Note that interaction.ind must be generated outside of this function using t(utils::combn(p,2)). See Example section for details.

varind

A numeric vector that specifies the indices of variables to be extracted from X.

numElite

Number of elite parents. Default is 40.

max.iter

Maximum number of iterations. Default is 500.

initial.temp

Initial temperature. Default is 1000.

cooling.rate

A numeric value representing the rate at which the temperature decreases. Default is 0.95.

lambda

A numeric value defined by users. The number needs to satisfy the condition: $\lambda \geq 5.1/\log(2)$. Default is 10.
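
To get a feel for how initial.temp, cooling.rate, and max.iter interact, the sketch below tabulates a geometric cooling schedule in which the temperature is multiplied by cooling.rate at every iteration. This page does not spell out the exact schedule the package uses, so the geometric form is an assumption.

```r
# Assumed geometric cooling: temperature at iteration k (k = 0, 1, ...)
initial.temp <- 1000
cooling.rate <- 0.95
max.iter <- 500

temps <- initial.temp * cooling.rate^(0:(max.iter - 1))
temps[1:3]          # 1000.0 950.0 902.5

# Position in the schedule at which the temperature first falls below 1
which(temps < 1)[1]
```

Under this assumption, a cooling rate closer to 1 keeps the search exploratory for more iterations, while a smaller rate makes it greedy sooner.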

Value

An object with S3 class "hySAINT".

Final.variable.names

Names of the selected effects.

Final.variable.idx

Indices of the selected effects.

Final.model.score

ABC score of the final model.

All.iter.score

Best ABC scores from initial parents and all iterations.

See Also

ABC, EVA, Initial, Crossover, Mutation

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
hySAINT(X, y, r1 = 5, r2 = 2, sigma = 0.01, interaction.ind = interaction.ind, max.iter = 5)

Creating initial parents

Description

This function gives initial parents.

Usage

Initial(X, y, EVAoutput, heredity = "Strong", r1, r2, numElite = 40)

Arguments

X

Input data. An optional data frame, or numeric matrix of dimension n observations by p main effects.

y

Response variable. An n-dimensional vector.

EVAoutput

The output from function EVA.

heredity

Whether to enforce Strong, Weak, or No heredity. Default is "Strong".

r1

Maximum number of main effects to include in the model. For high-dimensional data, r1 cannot be larger than the number of screened main effects.

r2

Maximum number of interaction effects to include in the model.

numElite

Number of elite parents. Default is 40.

Value

Initial parents. A numeric matrix with dimensions numElite by r1+r2.

See Also

EVA

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
  interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)

Performing mutation

Description

This function generates mutants from parents.

Usage

Mutation(
  myParent,
  EVAoutput,
  r1,
  r2,
  initial.temp = 1000,
  cooling.rate = 0.95,
  X,
  y,
  heredity = "Strong",
  sigma,
  varind = NULL,
  interaction.ind = NULL,
  lambda = 10
)

Arguments

myParent

A numeric matrix with dimension numElite by r1 + r2.

EVAoutput

The output from function EVA.

r1

Maximum number of main effects to include in the model. For high-dimensional data, r1 cannot be larger than the number of screened main effects.

r2

Maximum number of interaction effects to include in the model.

initial.temp

Initial temperature. Default is 1000.

cooling.rate

A numeric value representing the rate at which the temperature decreases. Default is 0.95.

X

Input data. An optional data frame, or numeric matrix of dimension n observations by p main effects.

y

Response variable. An n-dimensional vector.

heredity

Whether to enforce Strong, Weak, or No heredity. Default is "Strong".

sigma

The standard deviation of the noise term. In practice, sigma is usually unknown. Users can estimate sigma from function selectiveInference::estimateSigma, then use the output as the sigma value.

varind

A numeric vector that specifies the indices of variables to be extracted from X.

interaction.ind

A two-column numeric matrix. Each row represents a unique interaction pair, with the columns indicating the index numbers of the variables involved in each interaction. Note that interaction.ind must be generated outside of this function using t(utils::combn(p,2)).

lambda

A numeric value defined by users. The number needs to satisfy the condition: $\lambda \geq 5.1/\log(2)$. Default is 10.

Value

Mutant. A numeric matrix with dimensions numElite by r1+r2.
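
Simulated annealing inside a mutation step usually comes down to an acceptance rule: a mutant with a better score always replaces its parent, and a worse one replaces it with a probability that shrinks as the temperature drops. The sketch below shows a Metropolis-style rule of that kind, assuming lower ABC is better. The helper accept_mutant is hypothetical, and the package's actual rule may differ.

```r
# Metropolis-style acceptance (assumed rule, lower score is better)
accept_mutant <- function(score_parent, score_mutant, temp) {
  if (score_mutant <= score_parent) return(TRUE)         # improvements always pass
  runif(1) < exp(-(score_mutant - score_parent) / temp)  # worse: accept by chance
}

set.seed(0)
accept_mutant(5, 4, temp = 1000)   # TRUE, the mutant is better

# A worse mutant is accepted often when hot and rarely when cold
hot  <- mean(replicate(2000, accept_mutant(4, 5, temp = 1000)))
cold <- mean(replicate(2000, accept_mutant(4, 5, temp = 0.01)))
c(hot = hot, cold = cold)
```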

See Also

EVA, Initial.

Examples

set.seed(0)
interaction.ind <- t(combn(10,2))
X <- matrix(rnorm(100*10,1,0.1), 100, 10)
epl <- rnorm(100,0,0.01)
y <- 1+X[,1]+X[,2]+X[,3]+X[,1]*X[,2]+X[,1]*X[,3]+epl
EVAoutput <- EVA(X, y, r1 = 5, sigma = 0.01,
  interaction.ind = interaction.ind)
myParent <- Initial(X = X, y = y, EVAoutput, r1 = 5, r2 = 2)
Mutation(myParent, EVAoutput, r1 = 5, r2 = 2, X = X, y = y,
  sigma = 0.01, interaction.ind = interaction.ind)