All the source code of the aggregation methods is available here.
To run the code, you can clone the repository directly,
or simply load the R script
from the repository using the devtools
package in RStudio
as follows:
Install the devtools package with the command:
install.packages("devtools")
Then load the source code from the GitHub
repository using the source_url
function:
devtools::source_url("https://raw.githubusercontent.com/hassothea/AggregationMethods/main/MixCOBRARegressor.R")
✎ Note: All the code contained in this Rmarkdown is built with recent versions of R (version \(>\) 4.1, available here) and RStudio (version > 2022.02.2+485, available here). Note also that the code chunks are hidden by default.
To see the code, you can either click the Code button at the top of the page and choose Show All Code to show all the code, or click the Code button of each section to show the code of that specific section.

This Rmarkdown provides the implementation of an aggregation method based on an input-output trade-off by Fischer and Mougeot (2019). Let \(\mathcal{D}_n=\{(x_1,y_1),...,(x_n,y_n)\}\)
be a training dataset of size \(n\), where
the input-output couples \((x_i,y_i)\in\mathbb{R}^d\times\mathbb{R}\)
for all \(i=1,...,n\). \(\mathcal{D}_{n}\) is first randomly
partitioned into \(\mathcal{D}_{k}\)
and \(\mathcal{D}_{\ell}\) of sizes
\(k\) and \(\ell\) respectively, such that \(k+\ell=n\). We construct \(M\) regression estimators (machines) \(r_1,...,r_M\) using only \(\mathcal{D}_{k}\). Let \({\bf
r}(x)=(r_1(x),...,r_M(x))^T\in\mathbb{R}^M\) be the vector of
their predictions at a point \(x\in\mathbb{R}^d\).
The aggregation method evaluated at \(x\) is defined by
\[\begin{equation} g_n(x)=\frac{\sum_{i=1}^{\ell}y_iK_{\alpha,\beta}(x-x_i,{\bf r}(x)-{\bf r}(x_i))}{\sum_{j=1}^{\ell}K_{\alpha,\beta}(x-x_j,{\bf r}(x)-{\bf r}(x_j))} \end{equation}\] where \(K:\mathbb{R}^{d+M}\to\mathbb{R}_+\) is a non-increasing kernel function with \(K_{\alpha,\beta}(u,v)=K(\frac{u}{\alpha},\frac{v}{\beta})\) for some smoothing parameter \(\alpha,\beta>0\) to be tuned, with the convention \(0/0=0\).
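To illustrate the formula, here is a minimal sketch on a toy dataset with two hypothetical machines, following the Gaussian weighting used later in this document (where \(\alpha\) and \(\beta\) multiply the squared distances). All names and values below are purely illustrative; the actual implementation follows in the next sections.

# Toy illustration of g_n(x): 50 points of D_ell in dimension 2
set.seed(1)
n_ell <- 50
x_ell <- matrix(runif(n_ell * 2), ncol = 2)            # inputs of D_ell
y_ell <- rowSums(x_ell) + rnorm(n_ell, sd = 0.1)       # responses of D_ell
r <- function(x) c(sum(x), mean(x))                    # two toy "machines": r(x) = (r_1(x), r_2(x))
R_ell <- t(apply(x_ell, 1, r))                         # predictions of the points of D_ell
g_n <- function(x, alpha, beta){
  d_input   <- rowSums((x_ell - matrix(x, n_ell, 2, byrow = TRUE))^2)    # ||x - x_i||^2
  d_machine <- rowSums((R_ell - matrix(r(x), n_ell, 2, byrow = TRUE))^2) # ||r(x) - r(x_i)||^2
  w <- exp(-(alpha * d_input + beta * d_machine) / 2)  # Gaussian kernel weights K_{alpha,beta}
  if(sum(w) == 0) return(0)                            # convention 0/0 = 0
  sum(w * y_ell) / sum(w)
}
g_n(c(0.3, 0.7), alpha = 1, beta = 1)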
We prepare all the necessary tools for this Rmarkdown.
The pacman
package allows us to load (if already installed) or install (if
not yet installed) any available package from the Comprehensive R Archive Network
(CRAN) of R.
# Check if package "pacman" is already installed
lookup_packages <- installed.packages()[,1]
if(!("pacman" %in% lookup_packages)){
install.packages("pacman")
}
# To be installed or loaded
pacman::p_load(magrittr)
pacman::p_load(ggplot2)
pacman::p_load(tidyverse)
## package for "generateMachines"
pacman::p_load(tree)
pacman::p_load(glmnet)
pacman::p_load(randomForest)
pacman::p_load(FNN)
pacman::p_load(xgboost)
pacman::p_load(keras)
pacman::p_load(pracma)
pacman::p_load(latex2exp)
pacman::p_load(plotly)
pacman::p_load(dash)
rm(lookup_packages)
This section provides functions to generate basic machines (regressors) to be aggregated.
setBasicParameter_Mix
This function allows us to set the values of some key parameters of the basic machines.
Argument:

- lambda: the penalty parameter \(\lambda\) used in the penalized linear models ridge and lasso.
- k: the parameter \(k\) of the \(k\)NN (knn) regression model. By default, k = 5.
- ntree: the number of trees in random forest (rf). By default, ntree = 300.
- mtry: the number of random features chosen in each split of the random forest procedure. By default, mtry = NULL and the default value of mtry of the randomForest function from the randomForest library is used.
- eta_xgb: the learning rate \(\eta>0\) in the gradient step of the extreme gradient boosting method (xgb) of the xgboost library.
- nrounds_xgb: the parameter nrounds indicating the maximum number of boosting iterations. By default, nrounds_xgb = 100.
- early_stop_xgb: the early stopping round criterion of the xgboost function. By default, early_stop_xgb = NULL and early stopping is not triggered.
- max_depth_xgb: the maximum depth of the trees constructed in xgboost. By default, max_depth_xgb = 3.

Value:

This function returns a list of all the parameters given in
its arguments, to be fed to the basicMachineParam argument
of the function generateMachines_Mix defined in the next
section.
🧾 Remark.1: lambda, k and ntree can be a single value or a vector. In other words, each type of model can be constructed several times, once for each given value of the corresponding parameter; a short illustration is given right after the function definition below.
setBasicParameter_Mix <- function(lambda = NULL,
k = 5,
ntree = 300,
mtry = NULL,
eta_xgb = 1,
nrounds_xgb = 100,
early_stop_xgb = NULL,
max_depth_xgb = 3){
return(list(
lambda = lambda,
k = k,
ntree = ntree,
mtry = mtry,
eta_xgb = eta_xgb,
nrounds_xgb = nrounds_xgb,
early_stop_xgb = early_stop_xgb,
max_depth_xgb = max_depth_xgb)
)
}
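For instance, feeding vectors to these arguments asks for several machines of the same type (here three kNN models and two random forests); the values below are purely illustrative.

# Three kNN machines (k = 5, 10, 15) and two random forests (300 and 500 trees)
params_example <- setBasicParameter_Mix(k = c(5, 10, 15), ntree = c(300, 500))
str(params_example[c("k", "ntree")])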
generateMachines_Mix
This function generates all the basic machines to be aggregated.
Argument:

- train_input: a matrix or data frame of the training input data.
- train_response: a vector of the training response variable corresponding to train_input.
- scale_input: a logical value specifying whether to scale the input data (to be between \(0\) and \(1\)) or not. By default, scale_input = FALSE.
- scale_machine: a logical value specifying whether to scale the predictions of the remaining part \(\mathcal{D}_{\ell}\) of the training data (to be between \(0\) and \(1\)) or not. By default, scale_machine = FALSE.
- machines: the types of basic machines to be constructed. It is a subset of {“lasso”, “ridge”, “knn”, “tree”, “rf”, “xgb”}. By default, machines = NULL and all six types of basic machines are built.
- splits: a real number between \(0\) and \(1\) specifying the proportion of training data used to train the basic machines (\(\mathcal{D}_k\)). The remaining proportion (\(1-\) splits) is used for the aggregation (\(\mathcal{D}_{\ell}\)). By default, splits = 0.5.
- basicMachineParam: the option used to set up the values of the parameters of each machine. One should feed the output of the function setBasicParameter_Mix() defined above to this argument.
- silent: a logical value to silence all the messages or information printed during the process of the method. By default, silent = FALSE.

Value:

This function returns a list of the following objects.

- fitted_remain: the predictions of the remaining part (\(\mathcal{D}_{\ell}\)) of the training data, used for the aggregation.
- models: all the constructed basic machines (it contains only the values of the parameter \(k\) for knn).
- id2: a logical vector of size equal to the number of rows of the training data, indicating the location of the points used to build the basic machines (FALSE) and the remaining ones (TRUE).
- train_data: a list of the following objects:
  - train_input: the training input data (scaled or not, accordingly).
  - predict_remain_org: the predictions of the second part \({\cal D}_{\ell}\) of the training data, without scaling.
  - train_response: the training response variable.
  - min_machine, max_machine: vectors of minimum and maximum predicted values of the remaining part \(\mathcal{D}_{\ell}\) of the training data. They are NULL if scale_machine = FALSE.
  - min_input, max_input: vectors of minimum and maximum values of each variable of the training input. They are NULL if scale_input = FALSE.

✎ Note: You may need to modify the function accordingly if you want to build different types of basic machines.
generateMachines_Mix <- function(train_input,
train_response,
scale_input = FALSE,
scale_machine = FALSE,
machines = NULL,
splits = 0.5,
basicMachineParam = setBasicParameter_Mix(),
silent = FALSE){
lambda = basicMachineParam$lambda
k <- basicMachineParam$k
ntree <- basicMachineParam$ntree
mtry <- basicMachineParam$mtry
eta_xgb <- basicMachineParam$eta_xgb
nrounds_xgb <- basicMachineParam$nrounds_xgb
early_stop_xgb <- basicMachineParam$early_stop_xgb
max_depth_xgb <- basicMachineParam$max_depth_xgb
# Packages
pacman::p_load(tree)
pacman::p_load(glmnet)
pacman::p_load(randomForest)
pacman::p_load(FNN)
pacman::p_load(xgboost)
# pacman::p_load(keras)
# Preparing data
input_names <- colnames(train_input)
input_size <- dim(train_input)
df_input <- train_input_scale <- train_input
maxs <- mins <- NULL
if(scale_input){
maxs <- map_dbl(.x = df_input, .f = max)
mins <- map_dbl(.x = df_input, .f = min)
train_input_scale <- scale(train_input, center = mins, scale = maxs - mins)
}
if(is.matrix(train_input_scale)){
df_input <- as_tibble(train_input_scale)
matrix_input <- train_input_scale
} else{
df_input <- train_input_scale
matrix_input <- as.matrix(train_input_scale)
}
# Machines
lasso_machine <- function(x, lambda0){
if(is.null(lambda)){
cv <- cv.glmnet(matrix_train_x1, train_y1, alpha = 1, lambda = 10^(seq(-3,2,length.out = 50)))
mod <- glmnet(matrix_train_x1, train_y1, alpha = 1, lambda = cv$lambda.min)
} else{
mod <- glmnet(matrix_train_x1, train_y1, alpha = 1, lambda = lambda0)
}
res <- predict.glmnet(mod, newx = x)
return(list(pred = res,
model = mod))
}
ridge_machine <- function(x, lambda0){
if(is.null(lambda)){
cv <- cv.glmnet(matrix_train_x1, train_y1, alpha = 0, lambda = 10^(seq(-3,2,length.out = 50)))
mod <- glmnet(matrix_train_x1, train_y1, alpha = 0, lambda = cv$lambda.min)
} else{
mod <- glmnet(matrix_train_x1, train_y1, alpha = 0, lambda = lambda0)
}
res <- predict.glmnet(mod, newx = x)
return(list(pred = res,
model = mod))
}
tree_machine <- function(x, pa = NULL) {
mod <- tree(as.formula(paste("train_y1~",
paste(input_names, sep = "", collapse = "+"),
collapse = "",
sep = "")),
data = df_train_x1)
res <- as.vector(predict(mod, x))
return(list(pred = res,
model = mod))
}
knn_machine <- function(x, k0) {
mod <- knn.reg(train = matrix_train_x1, test = x, y = train_y1, k = k0)
res = mod$pred
return(list(pred = res,
model = k0))
}
RF_machine <- function(x, ntree0) {
if(is.null(mtry)){
mod <- randomForest(x = df_train_x1, y = train_y1, ntree = ntree0)
}else{
mod <- randomForest(x = df_train_x1, y = train_y1, ntree = ntree0, mtry = mtry)
}
res <- as.vector(predict(mod, x))
return(list(pred = res,
model = mod))
}
xgb_machine = function(x, nrounds_xgb0){
mod <- xgboost(data = matrix_train_x1,
label = train_y1,
eta = eta_xgb,
nrounds = nrounds_xgb0,
objective = "reg:squarederror",
early_stopping_rounds = early_stop_xgb,
max_depth = max_depth_xgb,
verbose = 0)
res <- predict(mod, x)
return(list(pred = res,
model = mod))
}
# All machines
all_machines <- list(lasso = lasso_machine,
ridge = ridge_machine,
knn = knn_machine,
tree = tree_machine,
rf = RF_machine,
xgb = xgb_machine)
# All parameters
all_parameters <- list(lasso = lambda,
ridge = lambda,
knn = k,
tree = 1,
rf = ntree,
xgb = nrounds_xgb)
if(is.null(machines)){
mach <- c("lasso", "ridge", "knn", "tree", "rf", "xgb")
}else{
mach <- machines
}
# Extracting data
M <- length(mach)
size_D1 <- floor(splits*input_size[1])
id_D1 <- logical(input_size[1])
id_D1[sample(input_size[1], size_D1)] <- TRUE
df_train_x1 <- df_input[id_D1,]
matrix_train_x1 <- matrix_input[id_D1,]
train_y1 <- train_response[id_D1]
df_train_x2 <- df_input[!id_D1,]
matrix_train_x2 <- matrix_input[!id_D1,]
# Function to extract df and model from 'map' function
extr_df <- function(x, id){
return(tibble("r_{{id}}":= as.vector(pred_m[[x]]$pred)))
}
extr_mod <- function(x, id){
return(pred_m[[x]]$model)
}
pred_D2 <- c()
all_mod <- c()
cat("\n* Building basic machines ...\n")
cat("\t~ Progress:")
for(m in 1:M){
if(mach[m] %in% c("tree", "rf")){
x0_test <- df_train_x2
} else {
x0_test <- matrix_train_x2
}
if(is.null(all_parameters[[mach[m]]])){
para_ <- 1
}else{
para_ <- all_parameters[[mach[m]]]
}
pred_m <- map(para_,
.f = ~ all_machines[[mach[m]]](x0_test, .x))
tem0 <- imap_dfc(.x = 1:length(para_),
.f = extr_df)
tem1 <- imap(.x = 1:length(para_),
.f = extr_mod)
names(tem0) <- names(tem1) <- paste0(mach[m], 1:length(para_))
pred_D2 <- bind_cols(pred_D2, as_tibble(tem0))
all_mod[[mach[m]]] <- tem1
cat(" ... ", round(m/M, 2)*100L,"%", sep = "")
}
max_M <- min_M <- NULL
pred_D2_ <- pred_D2
if(scale_machine){
max_M <- map_dbl(.x = pred_D2, .f = max)
min_M <- map_dbl(.x = pred_D2, .f = min)
pred_D2 <- scale(pred_D2, center = min_M, scale = max_M - min_M)
}
return(list(fitted_remain = pred_D2,
models = all_mod,
id2 = !id_D1,
train_data = list(train_input = train_input_scale,
train_response = train_response,
predict_remain_org = pred_D2_,
min_machine = min_M,
max_machine = max_M,
min_input = mins,
max_input = maxs)))
}
Example.1: In this example, the method is implemented on the
Boston
dataset of the MASS
library. The basic machines “rf”, “knn” and “xgb” are built on the first part of the training data (\(\mathcal{D}_{k}\)), and the Root Mean Square Errors (RMSE), evaluated on the second part of the training data (\(\mathcal{D}_{\ell}\), used for the aggregation), are reported.
pacman::p_load(MASS)
df <- Boston
basic_machines <- generateMachines_Mix(train_input = df[,1:13],
train_response = df[,14],
scale_input = TRUE,
machines = c("rf", "knn", "xgb"),
basicMachineParam = setBasicParameter_Mix(lambda = 1:10/10,
ntree = 10:20 * 25,
k = c(2:10)))
* Building basic machines ...
~ Progress: ... 33% ... 67% ... 100%
basic_machines$train_data$predict_remain_org %>%
sweep(1, df[basic_machines$id2, "medv"]) %>%
.^2 %>%
colMeans %>%
t %>%
sqrt %>%
as_tibble
This part provides functions to approximate the key parameters \((\alpha,\beta)\in(\mathbb{R}_+^*)^2\) of
the aggregation. Two important optimization methods are implemented:
gradient descent algorithm (grad
) and
grid search (grid
).
setGradParameter_Mix
This function allows us to set the values of parameters needed to process the gradient descent algorithm to approximate the hyperparameter of the method.
Argument:

- val_init: a 2D vector of initial values for the gradient descent iteration. By default, val_init = NULL and the algorithm selects the best value (with the smallest cost function) among the alpha_range and beta_range values of the parameters.
- rate: a 2D real-valued vector or a string specifying the learning rate of the gradient descent algorithm. By default, rate = NULL (or “auto”) and the value coef_auto = c(1, 1) is used. It can also be a functional rate, given as a string among {“logarithm”, “sqrtroot”, “linear”, “polynomial”, “exponential”}. Each rate is defined according to the coef_ type arguments below.
- alpha_range: a vector of \(\alpha\) values to be considered for the initial value of the gradient step. By default, alpha_range = seq(0.0001, 10, length.out = 5).
- beta_range: a vector of \(\beta\) values to be considered for the initial value of the gradient step. By default, beta_range = seq(0.0001, 50, length.out = 5).
- max_iter: the maximum number of iterations of the gradient descent algorithm. By default, max_iter = 100.
- print_step: a logical value controlling whether to print the result of each gradient step, in order to keep track of the algorithm. By default, print_step = TRUE.
- print_result: a logical value controlling whether to print the result of the algorithm. By default, print_result = TRUE.
- figure: a logical value controlling whether to plot a graphic of the result. By default, figure = TRUE.
- coef_auto: the constant learning rate used when rate = NULL. By default, coef_auto = c(1, 1).
- coef_log: the coefficient multiplying the logarithmic increment of the learning rate, i.e., rate \(=\) coef_log \(\times \log(1+t)\) where \(t\) is the iteration number. By default, coef_log = 1.
- coef_sqrt: the coefficient multiplying the square-root increment of the learning rate, i.e., rate \(=\) coef_sqrt \(\times \sqrt{t}\). By default, coef_sqrt = 1.
- coef_lm: the coefficient multiplying the linear increment of the learning rate, i.e., rate \(=\) coef_lm \(\times t\). By default, coef_lm = 1.
- deg_poly: the degree of the polynomial increment of the learning rate, i.e., rate \(=t^{\texttt{deg_poly}}\). By default, deg_poly = 2.
- base_exp: the base of the exponential increment of the learning rate, i.e., rate \(=\) base_exp \(^t\). By default, base_exp = 1.5.
- axes: the names of the \(x\), \(y\) and \(z\)-axis respectively. By default, axes = c("alpha", "beta", "L1 norm of gradient").
- title: the title of the plot. By default, title = NULL and the default title is Gradient step.
- threshold: the threshold to stop the algorithm when the relative change is smaller than this value. By default, threshold = 1e-10.

Value:

This function returns a list of all the parameters given in its arguments.
setGradParameter_Mix <- function(val_init = NULL,
rate = NULL,
alpha_range = seq(0.0001, 10, length.out = 5),
beta_range = seq(0.0001, 50, length.out = 5),
max_iter = 100,
print_step = TRUE,
print_result = TRUE,
figure = TRUE,
coef_auto = c(1, 1),
coef_log = 1,
coef_sqrt = 1,
coef_lm = 1,
deg_poly = 2,
base_exp = 1.5,
axes = c("alpha", "beta", "L1 norm of gradient"),
title = NULL,
threshold = 1e-10) {
return(
list(val_init = val_init,
rate = rate,
alpha_range = alpha_range,
beta_range = beta_range,
max_iter = max_iter,
print_step = print_step,
print_result = print_result,
figure = figure,
coef_auto = coef_auto,
coef_log = coef_log,
coef_sqrt = coef_sqrt,
coef_lm = coef_lm,
deg_poly = deg_poly,
base_exp = base_exp,
axes = axes,
title = title,
threshold = threshold
)
)
}
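As an illustration of the learning-rate options above, the following call (with arbitrarily chosen coefficients) uses a square-root rate and disables the per-step printing; the result can be fed to the setParameter argument of the function gradOptimizer_Mix defined below.

# Square-root learning rate with coefficient 0.5, no printing of intermediate steps
grad_setting <- setGradParameter_Mix(rate = "sqrtroot",
                                     coef_sqrt = 0.5,
                                     max_iter = 200,
                                     print_step = FALSE)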
gradOptimizer_Mix
This function performs the gradient descent algorithm to approximate the minimizer of any given function (convex or locally convex around its optimizer).

Argument:

- obj_fun: the objective function whose minimizer is to be estimated. It should take a 2D vector as input.
- setParameter: the control of the gradient descent parameters, which should be the output of the function setGradParameter_Mix() defined earlier.
- silent: a logical value to silence all the messages or information printed during the process of the method. By default, silent = FALSE.

Value:

This function returns a list of the following objects:

- opt_param: the observed value of the minimizer.
- opt_error: the value of the optimal risk.
- all_grad: the matrix of all the gradients collected along the path of the algorithm (by row).
- all_param: the matrix of all the parameters collected along the path of the algorithm (by row).
- run_time: the running time of the algorithm.

gradOptimizer_Mix <- function(obj_fun,
setParameter = setGradParameter_Mix(),
silent = FALSE) {
start.time <- Sys.time()
# Optimization step:
# ==================
spec_print <- function(x, dig = 5) return(ifelse(x > 1e-6,
format(x, digit = dig, nsmall = dig),
format(x, scientific = TRUE, digit = dig, nsmall = dig)))
collect_val <- c()
gradients <- c()
if (is.null(setParameter$val_init)){
range_alp <- rep(setParameter$alpha_range, length(setParameter$beta_range))
range_bet <- rep(setParameter$beta_range, length(setParameter$alpha_range))
tem <- map2_dbl(.x = range_alp,
.y = range_bet,
.f = ~ obj_fun(c(.x, .y)))
id0 <- which.min(tem)
val <- val0 <- c(range_alp[id0], range_bet[id0])
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3))
} else{
val <- val0 <- setParameter$val_init
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3))
}
if(setParameter$print_step & !silent){
cat("\n* Gradient descent algorithm ...")
cat("\n Step\t| alpha ; beta \t| Gradient (alpha ; beta)\t| Threshold \n")
cat(" ", rep("-", 80), sep = "")
cat("\n 0 \t| ", spec_print(val0[1])," ; ", spec_print(val0[2]),
"\t| ", spec_print(grad_[1], 6), " ; ", spec_print(grad_[2], 5),
" \t| ", setParameter$threshold, "\n")
cat(" ", rep("-",80), sep = "")
}
if (is.numeric(setParameter$rate)){
lambda0 <- setParameter$rate / abs(grad_)
rate_GD <- "auto"
} else{
r0 <- setParameter$coef_auto / abs(grad_)
# Rate functions
rate_func <- list(auto = r0,
logarithm = function(i) setParameter$coef_log * log(2 + i) * r0,
sqrtroot = function(i) setParameter$coef_sqrt * sqrt(i) * r0,
linear = function(i) setParameter$coef_lm * (i) * r0,
polynomial = function(i) i ^ setParameter$deg_poly * r0,
exponential = function(i) setParameter$base_exp ^ i * r0)
rate_GD <- match.arg(setParameter$rate,
c("auto",
"logarithm",
"sqrtroot",
"linear",
"polynomial",
"exponential"))
lambda0 <- rate_func[[rate_GD]]
}
i <- 0
grad0 <- 10*grad_
if (is.numeric(setParameter$rate) | rate_GD == "auto") {
while (i < setParameter$max_iter) {
if(any(is.na(grad_))){
val0 <- c(runif(1, val0[1]*0.99, val0[1]*1.01),
runif(1, val0[2]*0.99, val0[2]*1.01))
grad_ = pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
}
val <- val0 - lambda0 * grad_
if (any(val < 0)){
val[val < 0] <- val0[val < 0]/2
lambda0[val < 0] <- lambda0[val < 0] / 2
}
if(i > 5){
sign_ <- sign(grad_) != sign(grad0)
if(any(sign_)){
lambda0[sign_] = lambda0[sign_]/2
}
}
relative <- sum(abs(val - val0)) / sum(abs(val0))
test_threshold <- max(relative, sum(abs(grad_ - grad0)))
if (test_threshold > setParameter$threshold){
val0 <- val
grad0 <- grad_
} else{
break
}
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
i <- i + 1
if(setParameter$print_step & !silent){
cat("\n ", i, "\t| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t| ", spec_print(grad_[1], 5), " ; ", spec_print(grad_[2], 5),
"\t| ", test_threshold, "\r")
}
collect_val <- rbind(collect_val, val)
gradients <- rbind(gradients, grad_)
}
}
else{
while (i < setParameter$max_iter) {
if(any(is.na(grad_))){
val0 <- c(runif(1, val0[1]*0.99, val0[1]*1.01),
runif(1, val0[2]*0.99, val0[2]*1.01))
grad_ = pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
}
val <- val0 - lambda0(i) * grad_
if (any(val < 0)){
val[val < 0] <- val0[val < 0]/2
r0[val < 0] <- r0[val < 0] / 2
}
if(i > 5){
sign_ <- sign(grad_) != sign(grad0)
if(any(sign_)){
r0[sign_] <- r0[sign_] / 2
}
}
relative <- sum(abs(val - val0)) / sum(abs(val0))
test_threshold <- max(relative, sum(abs(grad_ - grad0)))
if (test_threshold > setParameter$threshold){
val0 <- val
grad0 <- grad_
}else{
break
}
grad_ <- pracma::grad(
f = obj_fun,
x0 = val0,
heps = .Machine$double.eps ^ (1 / 3)
)
if(setParameter$print_step & !silent){
cat("\n ", i, "\t| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t| ", spec_print(grad_[1], 5), " ; ", spec_print(grad_[2], 5),
"\t| ", test_threshold, "\r")
}
i <- i + 1
collect_val <- rbind(collect_val, val)
gradients <- rbind(gradients, grad_)
}
}
opt_ep <- val
opt_risk <- obj_fun(opt_ep)
if(setParameter$print_step & !silent){
cat(rep("-", 80), sep = "")
if(sum(abs(grad_)) == 0){
cat("\n Stopped| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t|\t ", 0,
"\t\t| ", test_threshold)
}else{
cat("\n Stopped| ", spec_print(val[1], 4), " ; ", spec_print(val[2], 4),
"\t| ", spec_print(grad_[1]), " ; ", spec_print(grad_[2]),
"\t| ", test_threshold)
}
}
if(setParameter$print_result & !silent){
cat("\n ~ Observed parameter: (alpha, beta) = (", opt_ep[1], ", ", opt_ep[2], ") in",i, "itertaions.")
}
if (setParameter$figure) {
if(is.null(setParameter$title)){
tit <- paste("<b> L1 norm of gradient as a function of</b> (",
setParameter$axes[1],",",
setParameter$axes[2],
")")
} else{
tit <- setParameter$title
}
siz = length(collect_val[,1])
fig <- tibble(x = collect_val[,1],
y = collect_val[,2],
z = apply(abs(gradients), 1, sum)) %>%
plot_ly(x = ~x, y = ~y) %>%
add_trace(z = ~z,
type = "scatter3d",
mode = "lines",
line = list(width = 6,
color = ~z,
colorscale = 'Viridis'),
name = "Gradient step") %>%
add_trace(x = c(opt_ep[1], opt_ep[1]),
y = c(0, opt_ep[2]),
z = ~c(z[siz], z[siz]),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#5E88FC",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#5E88FC", "#38DE25")),
name = paste("Optimal",setParameter$axes[1])) %>%
add_trace(x = c(0, opt_ep[1]),
y = c(opt_ep[2], opt_ep[2]),
z = ~c(z[siz], z[siz]),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#F31536",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#F31536", "#38DE25")),
name = paste("Optimal",setParameter$axes[2])) %>%
add_trace(x = opt_ep[1],
y = opt_ep[2],
z = ~z[siz],
type = "scatter3d",
mode = 'markers',
marker = list(
size = 5,
color = "#38DE25"),
name = "Optimal point") %>%
layout(title = list(text = tit,
x = 0.075,
y = 0.925,
font = list(family = "Verdana",
color = "#5E88FC")),
legend = list(x = 100, y = 0.5),
scene = list(
xaxis = list(title = setParameter$axes[1]),
yaxis = list(title = setParameter$axes[2]),
zaxis = list( title = setParameter$axes[3])))
fig %>% print
}
end.time = Sys.time()
return(list(
opt_param = opt_ep,
opt_error = opt_risk,
all_grad = gradients,
all_param = collect_val,
run_time = difftime(end.time,
start.time,
units = "secs")[[1]]
))
}
Example.2: Approximate \[(x^*,y^*)=\text{arg}\min_{(x,y)\in\mathbb{R}^2}f(x,y),\] where \[f(x,y)=(x-1)^2(1+\sin^2(2.5(x-1)))+(y-1)^2(1+\sin^2(2.5(y-1))).\] Note that the argument
val_init
is crucial since \(f\) is not convex.
object_func <- function(x) sum((x-1)^2*(1+sin(2.5*(x-1))^2))
p <- tibble::tibble(x = rep(seq(-4,6, length.out = 30),30),
y = rep(seq(-3,7, length.out = 30), each = 30)) %>%
mutate(z = map2_dbl(.x = x, .y = y, .f = ~ object_func(c(.x,.y)))) %>%
plot_ly(x = ~x, y = ~y, z = ~z, type = "mesh3d") %>%
add_trace(x = 1,
y = 1,
z = 0,
type = "scatter3d", mode = "markers",
name = "Optimal point") %>%
layout(title = "Cost function")
show(p)
gd <- gradOptimizer_Mix(obj_fun = object_func,
setParameter = setGradParameter_Mix(val_init = c(2.4, 3.5),
rate = "log",
coef_auto = c(0.7, 0.7),
print_step = TRUE,
figure = TRUE,
axes = c("x", "y")))
* Gradient descent algorithm ...
Step | alpha ; beta | Gradient (alpha ; beta) | Threshold
--------------------------------------------------------------------------------
0 | 2.40000 ; 3.50000 | 6.363771 ; 3.96922 | 1e-10
--------------------------------------------------------------------------------
0 | 1.9148 ; 3.0148 | 0.79847 ; 1.51397 | 92.99696
1 | 1.8183 ; 2.7215 | 1.56931 ; 11.74586 | 8.020555
2 | 1.5790 ; 1.3607 | 2.50307 ; 1.48199 | 11.00273
3 | 1.1359 ; 0.9401 | 0.33091 ; -1.2513e-01 | 11.19763
4 | 1.0707 ; 0.9796 | 0.14999 ; -4.0947e-02 | 3.779282
5 | 1.0385 ; 0.9937 | 0.078525 ; -1.2638e-02 | 0.2651093
6 | 1.0206 ; 0.9983 | 0.041395 ; -3.3626e-03 | 0.09977258
7 | 1.0106 ; 0.9996 | 0.021197 ; -7.565e-04 | 0.04640585
8 | 1.0052 ; 0.9999 | 0.010433 ; -1.421e-04 | 0.02280361
9 | 1.0025 ; 1.0000 | 0.0049263 ; -2.1917e-05 | 0.01137799
10 | 1.0011 ; 1.0000 | 0.0022329 ; -2.7076e-06 | 0.005627254
11 | 1.0005 ; 1.0000 | 0.00097291 ; -2.5805e-07 | 0.002712624
12 | 1.0002 ; 1.0000 | 0.00040805 ; -1.7849e-08 | 0.001262474
13 | 1.0001 ; 1.0000 | 0.00016495 ; -8.0023e-10 | 0.0005650944
14 | 1.0000 ; 1.0000 | 6.4339e-05 ; -1.7661e-11 | 0.000243119
15 | 1.0000 ; 1.0000 | 2.4237e-05 ; -1.2212e-14 | 0.0001006146
16 | 1.0000 ; 1.0000 | 8.8254e-06 ; 3.3307e-16 | 4.010183e-05
17 | 1.0000 ; 1.0000 | 3.1086e-06 ; -1.1102e-16 | 1.541137e-05
18 | 1.0000 ; 1.0000 | 1.0599e-06 ; -1.1102e-16 | 5.716742e-06
19 | 1.0000 ; 1.0000 | 3.5e-07 ; -1.1102e-16 | 2.048728e-06
20 | 1.0000 ; 1.0000 | 1.1199e-07 ; -1.1102e-16 | 7.09896e-07
21 | 1.0000 ; 1.0000 | 3.4741e-08 ; -1.1102e-16 | 2.380033e-07
22 | 1.0000 ; 1.0000 | 1.0452e-08 ; -1.1102e-16 | 7.72527e-08
23 | 1.0000 ; 1.0000 | 3.0504e-09 ; -1.1102e-16 | 2.428952e-08
24 | 1.0000 ; 1.0000 | 8.6399e-10 ; -1.1102e-16 | 7.401194e-09
25 | 1.0000 ; 1.0000 | 2.3754e-10 ; -1.1102e-16 | 2.18645e-09
26 | 1.0000 ; 1.0000 | 6.3406e-11 ; -1.1102e-16 | 6.264504e-10
27 | 1.0000 ; 1.0000 | 1.6436e-11 ; -1.1102e-16 | 1.741309e-10
--------------------------------------------------------------------------------
Stopped| 1.0000 ; 1.0000 | 1.6436e-11 ; -1.1102e-16 | 4.697043e-11
~ Observed parameter: (alpha, beta) = ( 1 , 1 ) in 28 iterations.
setGridParameter_Mix
This function allows us to set the values of parameters needed to process the grid search algorithm to approximate the hyperparameter of the method.
Argument:

- min_alpha: the minimum value of \(\alpha\) in the grid. By default, min_alpha = 1e-5.
- max_alpha: the maximum value of \(\alpha\) in the grid. By default, max_alpha = 10.
- min_beta: the minimum value of \(\beta\) in the grid. By default, min_beta = 0.1.
- max_beta: the maximum value of \(\beta\) in the grid. By default, max_beta = 50.
- n_alpha, n_beta: the number of \(\alpha\) and \(\beta\) values respectively in the grid. By default, n_alpha = n_beta = 30.
- parameters: the list of \(\alpha\) and \(\beta\) values in case a non-uniform grid is considered. It should be a list of two vectors containing the values of \(\alpha\) and \(\beta\) respectively. By default, parameters = NULL and the default uniform grid is used.
- axes: the names of the \(x\), \(y\) and \(z\)-axis respectively. By default, axes = c("alpha", "beta", "Risk").
- title: the title of the plot. By default, title = NULL and the default title is Cross-validation risk VS \((\alpha, \beta)\).
- print_result: a logical value specifying whether to print the observed result or not.
- figure: a logical value specifying whether to plot the graphic of the cross-validation error or not.
- silent: a logical value to silence all the messages or information printed during the process of the method. By default, silent = FALSE.

Value:

This function returns a list of all the parameters given in its arguments.
setGridParameter_Mix <- function(min_alpha = 1e-5,
max_alpha = 10,
min_beta = 0.1,
max_beta = 50,
n_alpha = 30,
n_beta = 30,
parameters = NULL,
axes = c("alpha", "beta", "Risk"),
title = NULL,
print_result = TRUE,
figure = TRUE,
silent = FALSE){
return(list(min_alpha = min_alpha,
max_alpha = max_alpha,
min_beta = min_beta,
max_beta = max_beta,
n_alpha = n_alpha,
n_beta = n_beta,
axes = axes,
title = title,
parameters = parameters,
print_result = print_result,
figure = figure))
}
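For example, a non-uniform grid (here logarithmically spaced, chosen only for illustration) can be supplied through the parameters argument, in which case the min/max and n_alpha/n_beta options are ignored; the result can then be fed to the setParameter argument of the function gridOptimizer_Mix defined below.

# Logarithmically spaced grid of 20 values for alpha and for beta
grid_setting <- setGridParameter_Mix(parameters = list(10^seq(-3, 1, length.out = 20),
                                                       10^seq(-2, 2, length.out = 20)))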
gridOptimizer_Mix
This function performs the grid search algorithm to approximate the values of the parameters \(\alpha, \beta\) of the method.

Argument:

- obj_func: the objective function whose minimizer is to be estimated. It should take a 2D vector of positive real values as input.
- setParameter: the control of the grid search parameters, which should be the output of the function setGridParameter_Mix() defined above.
- silent: a logical value to silence all the messages or information printed during the process of the method. By default, silent = FALSE.

Value:

This function returns a list of the following objects:

- opt_param: the observed value of the minimizer.
- opt_error: the value of the optimal risk.
- all_risk: the vector of all the errors evaluated at all the values of the considered parameters.
- run_time: the running time of the algorithm.

gridOptimizer_Mix <- function(obj_func,
setParameter = setGridParameter_Mix(),
silent = FALSE){
t0 <- Sys.time()
if(is.null(setParameter$parameters)){
param_list <- list(alpha = rep(seq(setParameter$min_alpha,
setParameter$max_alpha,
length.out = setParameter$n_alpha),
setParameter$n_beta),
beta = rep(seq(setParameter$min_beta,
setParameter$max_beta,
length.out = setParameter$n_beta),
each = setParameter$n_alpha))
} else{
param_list <- list(alpha = rep(setParameter$parameters[[1]],
length(setParameter$parameters[[2]])),
beta = rep(setParameter$parameters[[2]],
each = length(setParameter$parameters[[1]])))
}
risk <- map2_dbl(.x = param_list$alpha,
.y = param_list$beta,
.f = ~ obj_func(c(.x, .y)))
id_opt <- which.min(risk)
opt_ep <- c(param_list$alpha[id_opt], param_list$beta[id_opt])
opt_risk <- risk[id_opt]
if(setParameter$print_result & !silent){
cat("\n* Grid search algorithm...", "\n ~ Observed parameter: (alpha, beta) = (", opt_ep[1],
", ",
opt_ep[2], ")",
sep = "")
}
if(setParameter$figure){
if(is.null(setParameter$title)){
tit <- paste("<b> Cross-validation risk as a function of</b> (",
setParameter$axes[1],",",
setParameter$axes[2],
")")
} else{
tit <- setParameter$title
}
fig <- tibble(alpha = param_list$alpha,
beta = param_list$beta,
risk = risk) %>%
plot_ly(x = ~alpha, y = ~beta, z = ~risk, type = "mesh3d") %>%
add_trace(x = c(opt_ep[1], opt_ep[1]),
y = c(0, opt_ep[2]),
z = c(opt_risk, opt_risk),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#5E88FC",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#5E88FC", "#38DE25")),
name = paste("Optimal",setParameter$axes[1])) %>%
add_trace(x = c(0, opt_ep[1]),
y = c(opt_ep[2], opt_ep[2]),
z = c(opt_risk, opt_risk),
type = "scatter3d",
mode = 'lines+markers',
line = list(
width = 2,
color = "#F31536",
dash = TRUE),
marker = list(
size = 4,
color = ~c("#F31536", "#38DE25")),
name = paste("Optimal",setParameter$axes[2])) %>%
add_trace(x = opt_ep[1],
y = opt_ep[2],
z = opt_risk,
type = "scatter3d",
mode = 'markers',
marker = list(
size = 5,
color = "#38DE25"),
name = "Optimal point") %>%
layout(title = list(text = tit,
x = 0.075,
y = 0.925,
font = list(family = "Verdana",
color = "#5E88FC")),
legend = list(x = 100, y = 0.5),
scene = list(xaxis = list(title = setParameter$axes[1]),
yaxis = list(title = setParameter$axes[2]),
zaxis = list( title = setParameter$axes[3])))
print(fig)
}
t1 <- Sys.time()
return(list(opt_param = opt_ep,
opt_error = opt_risk,
all_risk = risk,
run_time = difftime(t1,
t0,
units = "secs")[[1]])
)
}
Example.2 (continued): The same function is minimized again, this time with the grid search algorithm.
grid <- gridOptimizer_Mix(obj_fun = object_func,
setParameter = setGridParameter_Mix(min_alpha = -2,
max_alpha = 4,
min_beta = -2,
max_beta = 4,
n_alpha = 150,
n_beta = 150,
axes = c("x", "y", "z"),
title = "z = f(x,y)",
figure = TRUE))
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (1.020134, 1.020134)
Constructing the aggregation method is equivalent to approximating the optimal value of the parameters \((\alpha,\beta)\in(\mathbb{R}_+^*)^2\) introduced in section 1.1 by minimizing some loss function. In this study, we propose the \(\kappa\)-fold cross-validation loss function defined by
\[\begin{equation} \label{eq:kappa} \varphi^{\kappa}(\alpha, \beta)=\frac{1}{\kappa}\sum_{k=1}^{\kappa}\sum_{(x_j,y_j)\in F_k}(g_n(x_j)-y_j)^2 \end{equation}\] where \(F_k\) denotes the \(k\)th validation fold of \(\mathcal{D}_{\ell}\) and \(g_n\) is built on the remaining folds \(\mathcal{D}_{\ell}-F_k\).
dist_matrix_Mix
This function computes distances between the data points of each training fold (\(\mathcal{D}_{\ell}-F_k\)) and the corresponding validation fold \(F_k\), for \(k=1,\dots,\kappa\), both in the input space and in the space of the machines' predictions. For each fold, distance matrices of the form \(D_k=(d[{\bf r}(x_i),{\bf r}(x_j)])_{i,j}\) are computed, where the distance \(d\) is defined according to the type of kernel function.
Argument:

- basicMachines: the basic machine object, which is an output of the generateMachines_Mix function.
- n_cv: the number \(\kappa\) of cross-validation folds. By default, n_cv = 5.
- kernel: the kernel function used for the aggregation, which is an element of {“gaussian”, “epanechnikov”, “biweight”, “triweight”, “triangular”, “naive”}. By default, kernel = "gaussian".
- id_shuffle: the shuffled fold indices of the cross-validation. By default, id_shuffle = NULL and a new shuffle is generated.

Value:

This function returns a list of the following objects:

- dist_input: a list of n_cv data frames \(D_k\), for \(k=1,\dots,\kappa\), containing the distances between the input points of the validation fold \(F_k\) (along the columns) and of the \(\kappa-1\) remaining folds of the training data (along the rows). The type of distance depends on the kernel function: if kernel = "naive", the maximum (coordinate-wise) distance \(\|x_i-x_j\|_{\max}\) is used; if kernel = "triangular", the \(L_1\) distance \(\|x_i-x_j\|_1\) is used; otherwise, the squared Euclidean distance is used.
- dist_machine: a list of n_cv data frames defined analogously on the machines' predictions, i.e., with \(x_i\) and \(x_j\) replaced by \({\bf r}(x_i)\) and \({\bf r}(x_j)\), containing the distances between the data points in \(F_k\) (by column) and in \(\mathcal{D}_{\ell}-F_k\) (by row) for \(k=1,...,\kappa\).
- id_shuffle: the shuffled indices of the cross-validation.
- n_cv: the number \(\kappa\) of cross-validation folds.

dist_matrix_Mix <- function(basicMachines,
n_cv = 5,
kernel = "gausian",
id_shuffle = NULL){
n <- nrow(basicMachines$fitted_remain)
n_each_fold <- floor(n/n_cv)
# shuffled indices
if(is.null(id_shuffle)){
shuffle <- 1:(n_cv-1) %>%
rep(n_each_fold) %>%
c(., rep(n_cv, n - n_each_fold * (n_cv - 1))) %>%
sample
}else{
shuffle <- id_shuffle
}
# the prediction matrix D_l
df_mach <- as.matrix(basicMachines$fitted_remain)
df_input <- as.matrix(basicMachines$train_data$train_input[basicMachines$id2,])
if(! (kernel %in% c("naive", "triangular"))){
pair_dist <- function(M, N){
n_N <- dim(N)
n_M <- dim(M)
res_ <- 1:nrow(N) %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums((M - matrix(rep(N[id,], n_M[1]), ncol = n_M[2], byrow = TRUE))^2)))))
return(res_)
}
}
if(kernel == "triangular"){
pair_dist <- function(M, N){
n_N <- dim(N)
n_M <- dim(M)
res_ <- 1:nrow(N) %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums(abs(M - matrix(rep(N[id,], n_M[1]), ncol = n_M[2], byrow = TRUE)))))))
return(res_)
}
}
if(kernel == "naive"){
pair_dist <- function(M, N){
n_N <- dim(N)
n_M <- dim(M)
res_ <- 1:nrow(N) %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(apply(abs(M - matrix(rep(N[id,], n_M[1]), ncol = n_M[2], byrow = TRUE)), 1, max)))))
return(res_)
}
}
L1 <- 1:n_cv %>%
map(.f = (\(x) pair_dist(df_input[shuffle != x,],
df_input[shuffle == x,])))
L2 <- 1:n_cv %>%
map(.f = (\(x) pair_dist(df_mach[shuffle != x,],
df_mach[shuffle == x,])))
return(list(dist_input = L1,
dist_machine = L2,
id_shuffle = shuffle,
n_cv = n_cv))
}
Example.3: The function
dist_matrix_Mix
is applied to the basic machines built in Example.1, with the Gaussian kernel function.
dis <- dist_matrix_Mix(basicMachines = basic_machines,
n_cv = 3,
kernel = "gaussian")
dis$n_cv
[1] 3
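As a quick sanity check, the distance matrices of the first fold can be inspected: the validation points of \(F_1\) are along the columns and the remaining points of \(\mathcal{D}_{\ell}-F_1\) along the rows (the exact dimensions depend on the random split).

# Dimensions of the input and machine distance matrices of the first fold
dim(dis$dist_input[[1]])
dim(dis$dist_machine[[1]])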
Example.4: From the distance matrices, we can compute the cross-validation error corresponding to the Gaussian kernel function, then use both optimization methods to approximate the smoothing parameters in this case.
# Gaussian kernel
gaussian_kern <- function(.ep = c(.05, 0.005),
.dist_matrix,
.train_response2,
.inv_sigma = sqrt(.5),
.alpha = 2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(exp(- (x[1]*D1+x[2]*D2)^(.alpha/2)*.inv_sigma^.alpha))
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Kappa cross-validation error
cost_fun <- function(x,
.dist_matrix = dis,
.kernel_func = gaussian_kern,
.train_response2 = basic_machines$train_data$train_response[basic_machines$id2],
.inv_sigma = sqrt(.5),
.alpha = 2){
return(.kernel_func(.ep = x,
.dist_matrix = .dist_matrix,
.train_response2 = .train_response2,
.inv_sigma = .inv_sigma,
.alpha = .alpha))
}
# Optimization
opt_param_gd <- gradOptimizer_Mix(obj_fun = cost_fun,
setParameter = setGradParameter_Mix(rate = "linear",
print_step = TRUE,
print_result = TRUE,
figure = TRUE))
* Gradient descent algorithm ...
Step | alpha ; beta | Gradient (alpha ; beta) | Threshold
--------------------------------------------------------------------------------
0 | 2.50008 ; 12.57500 | 814.991255 ; 30.47029 | 1e-10
--------------------------------------------------------------------------------
0 | 2.5001 ; 12.5750 | 814.99125 ; 30.47029 | 7609.154
1 | 2.4001 ; 12.4750 | 813.80540 ; 27.59426 | 0.01326693
2 | 2.2004 ; 12.2939 | 797.36331 ; 21.19662 | 4.061877
3 | 1.9069 ; 12.0852 | 739.40969 ; 10.92453 | 22.83973
4 | 1.5440 ; 11.9418 | 619.17286 ; -1.4531e+00 | 68.22571
5 | 1.1641 ; 11.9656 | 460.99729 ; -1.1812e+01 | 132.6145
6 | 0.8247 ; 12.1982 | 325.28348 ; -1.7511e+01 | 168.5342
7 | 0.5453 ; 12.6005 | 234.77194 ; -1.9486e+01 | 141.4133
8 | 0.3149 ; 13.1121 | 177.46993 ; -1.9536e+01 | 92.48655
9 | 0.1189 ; 13.6891 | 139.73801 ; -1.8766e+01 | 57.35153
10 | 0.05944 ; 14.3050 | 132.00474 ; -1.721e+01 | 38.5016
11 | 0.02972 ; 14.9263 | 129.75559 ; -1.5735e+01 | 9.289188
12 | 0.01486 ; 15.5460 | 130.16449 ; -1.4399e+01 | 3.723879
13 | 0.00743 ; 16.1603 | 131.94900 ; -1.3202e+01 | 1.745649
14 | 0.003715 ; 16.7669 | 134.48365 ; -1.2132e+01 | 2.981046
15 | 0.001857 ; 17.3642 | 137.44866 ; -1.1172e+01 | 3.604488
16 | 0.0009287 ; 17.9508 | 140.67331 ; -1.0307e+01 | 3.924898
17 | 0.0004644 ; 18.5259 | 144.06119 ; -9.5214e+00 | 4.090395
18 | 0.0002322 ; 19.0883 | 147.55338 ; -8.8052e+00 | 4.173159
19 | 0.0001161 ; 19.6374 | 151.11012 ; -8.1489e+00 | 4.208369
20 | 5.804e-05 ; 20.1723 | 154.70164 ; -7.5449e+00 | 4.213066
21 | 2.902e-05 ; 20.6923 | 158.30370 ; -6.9873e+00 | 4.195479
22 | 1.451e-05 ; 21.1967 | 161.89545 ; -6.471e+00 | 4.159704
23 | 7.256e-06 ; 21.6852 | 165.45847 ; -5.9921e+00 | 4.108007
24 | 3.628e-06 ; 22.1572 | 168.97637 ; -5.5472e+00 | 4.041933
25 | 1.814e-06 ; 22.6123 | 172.43458 ; -5.1334e+00 | 3.962818
26 | 9.069e-07 ; 23.0503 | 175.82033 ; -4.7483e+00 | 3.871989
27 | 4.535e-07 ; 23.4711 | 179.12258 ; -4.3899e+00 | 3.770819
28 | 2.267e-07 ; 23.8745 | 182.33196 ; -4.0561e+00 | 3.660727
29 | 1.134e-07 ; 24.2605 | 185.44067 ; -3.7453e+00 | 3.543141
30 | 5.668e-08 ; 24.6293 | 188.44245 ; -3.4561e+00 | 3.419468
31 | 2.834e-08 ; 24.9809 | 191.33243 ; -3.1869e+00 | 3.291055
32 | 1.417e-08 ; 25.3156 | 194.10700 ; -2.9365e+00 | 3.159172
33 | 7.086e-09 ; 25.6336 | 196.76372 ; -2.7036e+00 | 3.024992
34 | 3.543e-09 ; 25.9353 | 199.30119 ; -2.4872e+00 | 2.889581
35 | 1.771e-09 ; 26.2210 | 201.71894 ; -2.2861e+00 | 2.753895
36 | 8.857e-10 ; 26.4911 | 204.01730 ; -2.0995e+00 | 2.618783
37 | 4.428e-10 ; 26.7460 | 206.19729 ; -1.9264e+00 | 2.484988
38 | 2.214e-10 ; 26.9863 | 208.26057 ; -1.7658e+00 | 2.353152
39 | 1.107e-10 ; 27.2123 | 210.20928 ; -1.617e+00 | 2.223826
40 | 5.536e-11 ; 27.4246 | 212.04604 ; -1.4793e+00 | 2.097477
41 | 2.768e-11 ; 27.6236 | 213.77381 ; -1.3519e+00 | 1.974494
42 | 1.384e-11 ; 27.8099 | 215.39585 ; -1.2341e+00 | 1.855197
43 | 6.919e-12 ; 27.9841 | 216.91569 ; -1.1253e+00 | 1.739842
44 | 3.46e-12 ; 28.1466 | 218.33701 ; -1.0249e+00 | 1.628634
45 | 1.73e-12 ; 28.2980 | 219.66366 ; -9.3231e-01 | 1.521724
46 | 8.649e-13 ; 28.4387 | 220.89961 ; -8.4705e-01 | 1.419224
47 | 4.325e-13 ; 28.5694 | 222.04887 ; -7.6861e-01 | 1.321203
48 | 2.162e-13 ; 28.6904 | 223.11549 ; -6.9652e-01 | 1.227699
49 | 1.081e-13 ; 28.8024 | 224.10355 ; -6.3034e-01 | 1.138718
50 | 5.406e-14 ; 28.9059 | 225.01710 ; -5.6965e-01 | 1.054241
51 | 2.703e-14 ; 29.0012 | 225.86013 ; -5.1408e-01 | 0.9742259
52 | 1.351e-14 ; 29.0890 | 226.63662 ; -4.6325e-01 | 0.8986095
53 | 6.757e-15 ; 29.1695 | 227.35042 ; -4.1683e-01 | 0.8273103
54 | 3.379e-15 ; 29.2434 | 228.00535 ; -3.7449e-01 | 0.7602317
55 | 1.689e-15 ; 29.3110 | 228.60507 ; -3.3592e-01 | 0.6972661
56 | 8.447e-16 ; 29.3727 | 229.15318 ; -3.0085e-01 | 0.6382913
57 | 4.223e-16 ; 29.4290 | 229.65312 ; -2.69e-01 | 0.5831785
58 | 2.112e-16 ; 29.4802 | 230.10822 ; -2.4013e-01 | 0.5317869
59 | 1.056e-16 ; 29.5267 | 230.52168 ; -2.14e-01 | 0.4839742
60 | 5.279e-17 ; 29.5689 | 230.89655 ; -1.9039e-01 | 0.4395896
61 | 2.64e-17 ; 29.6070 | 231.23573 ; -1.6908e-01 | 0.3984804
62 | 1.32e-17 ; 29.6414 | 231.54202 ; -1.499e-01 | 0.3604903
63 | 6.599e-18 ; 29.6724 | 231.81801 ; -1.3266e-01 | 0.3254641
64 | 3.299e-18 ; 29.7002 | 232.06621 ; -1.1719e-01 | 0.2932405
65 | 1.65e-18 ; 29.7252 | 232.28894 ; -1.0333e-01 | 0.2636668
66 | 8.249e-19 ; 29.7476 | 232.48840 ; -9.0944e-02 | 0.236585
67 | 4.124e-19 ; 29.7676 | 232.66664 ; -7.9894e-02 | 0.2118453
68 | 2.062e-19 ; 29.7855 | 232.82559 ; -7.0053e-02 | 0.1892938
69 | 1.031e-19 ; 29.8013 | 232.96703 ; -6.1307e-02 | 0.1687891
70 | 5.155e-20 ; 29.8154 | 233.09262 ; -5.355e-02 | 0.150188
71 | 2.578e-20 ; 29.8279 | 233.20391 ; -4.6683e-02 | 0.1333514
72 | 1.289e-20 ; 29.8389 | 233.30230 ; -4.0618e-02 | 0.1181491
73 | 6.444e-21 ; 29.8486 | 233.38910 ; -3.5271e-02 | 0.1044558
74 | 3.222e-21 ; 29.8572 | 233.46552 ; -3.0567e-02 | 0.09215096
75 | 1.611e-21 ; 29.8647 | 233.53264 ; -2.6438e-02 | 0.08111911
76 | 8.055e-22 ; 29.8713 | 233.59148 ; -2.2821e-02 | 0.07125375
77 | 4.028e-22 ; 29.8771 | 233.64293 ; -1.9658e-02 | 0.06245214
78 | 2.014e-22 ; 29.8821 | 233.68783 ; -1.69e-02 | 0.05461838
79 | 1.007e-22 ; 29.8865 | 233.72693 ; -1.4499e-02 | 0.04766063
80 | 5.035e-23 ; 29.8903 | 233.76090 ; -1.2413e-02 | 0.04149966
81 | 2.517e-23 ; 29.8936 | 233.79035 ; -1.0606e-02 | 0.03605504
82 | 1.259e-23 ; 29.8965 | 233.81582 ; -9.0427e-03 | 0.03125491
83 | 6.293e-24 ; 29.8989 | 233.83780 ; -7.694e-03 | 0.027034
84 | 3.147e-24 ; 29.9010 | 233.85673 ; -6.5332e-03 | 0.02333178
85 | 1.573e-24 ; 29.9029 | 233.87300 ; -5.5354e-03 | 0.02009066
86 | 7.867e-25 ; 29.9044 | 233.88694 ; -4.6806e-03 | 0.01726348
87 | 3.933e-25 ; 29.9058 | 233.89887 ; -3.9491e-03 | 0.01479849
88 | 1.967e-25 ; 29.9069 | 233.90905 ; -3.3252e-03 | 0.01265898
89 | 9.833e-26 ; 29.9079 | 233.91772 ; -2.7938e-03 | 0.0108034
90 | 4.917e-26 ; 29.9087 | 233.92508 ; -2.3419e-03 | 0.009199476
91 | 2.458e-26 ; 29.9094 | 233.93133 ; -1.9594e-03 | 0.007817463
92 | 1.229e-26 ; 29.9100 | 233.93661 ; -1.6358e-03 | 0.00662477
93 | 6.146e-27 ; 29.9105 | 233.94106 ; -1.3628e-03 | 0.005603899
94 | 3.073e-27 ; 29.9109 | 233.94481 ; -1.1325e-03 | 0.004729318
95 | 1.536e-27 ; 29.9113 | 233.94797 ; -9.3954e-04 | 0.003982552
96 | 7.682e-28 ; 29.9116 | 233.95061 ; -7.7748e-04 | 0.003344677
97 | 3.841e-28 ; 29.9118 | 233.95282 ; -6.4208e-04 | 0.002804203
98 | 1.921e-28 ; 29.9120 | 233.95466 ; -5.2921e-04 | 0.002344683
99 | 9.603e-29 ; 29.9122 | 233.95620 ; -4.3526e-04 | 0.00195598
100 | 4.801e-29 ; 29.9123 | 233.95747 ; -3.5694e-04 | 0.001628782
101 | 2.401e-29 ; 29.9125 | 233.95853 ; -2.9228e-04 | 0.001353401
102 | 1.2e-29 ; 29.9126 | 233.95940 ; -2.3896e-04 | 0.001120525
103 | 6.002e-30 ; 29.9126 | 233.96012 ; -1.948e-04 | 0.0009266237
104 | 3.001e-30 ; 29.9127 | 233.96071 ; -1.5808e-04 | 0.000765315
105 | 1.5e-30 ; 29.9128 | 233.96120 ; -1.2857e-04 | 0.0006299149
106 | 7.502e-31 ; 29.9128 | 233.96160 ; -1.0386e-04 | 0.0005159174
107 | 3.751e-31 ; 29.9128 | 233.96193 ; -8.4484e-05 | 0.0004239985
108 | 1.876e-31 ; 29.9129 | 233.96219 ; -6.7963e-05 | 0.0003448461
109 | 9.378e-32 ; 29.9129 | 233.96241 ; -5.4671e-05 | 0.000283717
110 | 4.689e-32 ; 29.9129 | 233.96259 ; -4.3707e-05 | 0.0002302479
111 | 2.344e-32 ; 29.9129 | 233.96273 ; -3.477e-05 | 0.0001872923
112 | 1.172e-32 ; 29.9129 | 233.96284 ; -2.7936e-05 | 0.0001508702
113 | 5.861e-33 ; 29.9129 | 233.96293 ; -2.2229e-05 | 0.0001209815
114 | 2.931e-33 ; 29.9130 | 233.96301 ; -1.7948e-05 | 9.837726e-05
115 | 1.465e-33 ; 29.9130 | 233.96307 ; -1.4043e-05 | 7.802594e-05
116 | 7.326e-34 ; 29.9130 | 233.96312 ; -1.0964e-05 | 6.450845e-05
117 | 3.663e-34 ; 29.9130 | 233.96315 ; -8.9366e-06 | 5.106606e-05
118 | 1.832e-34 ; 29.9130 | 233.96319 ; -6.984e-06 | 3.9426e-05
119 | 9.158e-35 ; 29.9130 | 233.96321 ; -5.407e-06 | 3.296765e-05
120 | 4.579e-35 ; 29.9130 | 233.96323 ; -4.0552e-06 | 2.568322e-05
121 | 2.289e-35 ; 29.9130 | 233.96324 ; -3.3043e-06 | 2.050152e-05
122 | 1.145e-35 ; 29.9130 | 233.96326 ; -2.7035e-06 | 1.509453e-05
123 | 5.724e-36 ; 29.9130 | 233.96326 ; -2.1027e-06 | 1.261632e-05
124 | 2.862e-36 ; 29.9130 | 233.96327 ; -1.6521e-06 | 9.98792e-06
125 | 1.431e-36 ; 29.9130 | 233.96328 ; -1.1265e-06 | 8.110492e-06
126 | 7.155e-37 ; 29.9130 | 233.96328 ; -8.2607e-07 | 6.83384e-06
127 | 3.577e-37 ; 29.9130 | 233.96329 ; -8.2607e-07 | 4.280537e-06
128 | 1.789e-37 ; 29.9130 | 233.96329 ; -5.2568e-07 | 3.15408e-06
129 | 8.943e-38 ; 29.9130 | 233.96329 ; -5.2568e-07 | 3.379372e-06
130 | 4.472e-38 ; 29.9130 | 233.96329 ; -3.0039e-07 | 1.802332e-06
131 | 2.236e-38 ; 29.9130 | 233.96329 ; -2.2529e-07 | 2.553303e-06
132 | 1.118e-38 ; 29.9130 | 233.96329 ; -7.5097e-08 | 1.126457e-06
133 | 5.589e-39 ; 29.9130 | 233.96330 ; -7.5097e-08 | 8.260686e-07
134 | 2.795e-39 ; 29.9130 | 233.96330 ; -2.2529e-07 | 6.007772e-07
135 | 1.397e-39 ; 29.9130 | 233.96330 ; 0e+00 | 2.252914e-07
136 | 6.987e-40 ; 29.9130 | 233.96330 ; 0e+00 | 1.501943e-06
--------------------------------------------------------------------------------
Stopped| 3.493e-40 ; 29.9130 | 233.96330 ; 0e+00 | 1.167866e-41
~ Observed parameter: (alpha, beta) = ( 3.493436e-40 , 29.91299 ) in 137 iterations.
# Optimization
opt_param_grid <- gridOptimizer_Mix(obj_fun = cost_fun,
setParameter = setGridParameter_Mix(min_alpha = 0.0001,
max_alpha = 20,
min_beta = 0.001,
max_beta = 100,
n_beta = 30,
n_alpha = 30,
figure = TRUE))
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (1e-04, 31.03517)
cat('* Observed parameter:\n\t - Gradient descent\t: (alpha, beta) = (',
opt_param_gd$opt_param[1],",",opt_param_gd$opt_param[2], ")",
'\t with error:', cost_fun(opt_param_gd$opt_param),
'\n\t - Grid search\t\t: (alpha, beta) = (',opt_param_grid$opt_param[1],",",
opt_param_grid$opt_param[2],
') \t with error:', cost_fun(opt_param_grid$opt_param))
* Observed parameter:
- Gradient descent : (alpha, beta) = ( 3.493436e-40 , 29.91299 ) with error: 4657.818
- Grid search : (alpha, beta) = ( 1e-04 , 31.03517 ) with error: 4658.179
fit_parameter_Mix
This function gathers the constructed machines and performs an optimization algorithm to approximate the smoothing parameters for the aggregation, using only the remaining part \({\cal D}_{\ell}\) of the training data.

Argument:

- train_input: a matrix or data frame of the training input data.
- train_response: a vector of the corresponding response variable of train_input.
- train_predictions: an optional matrix or data frame of precomputed predictions of the remaining part of the training data. By default, train_predictions = NULL and the basic machines are built by generateMachines_Mix; if provided, no basic machines are built and the given predictions are used directly for the aggregation.
- machines: a vector of basic machines to be constructed. It must be a subset of {“lasso”, “ridge”, “knn”, “tree”, “rf”, “xgb”}. By default, machines = NULL and all six types of basic machines are built.
- scale_input: a logical value specifying whether or not to scale the input data before building the basic regression predictors. By default, scale_input = TRUE.
- scale_machine: a logical value specifying whether or not to scale the predicted features given by all the basic regression predictors, for the aggregation. By default, scale_machine = TRUE.
- splits: the proportion of training data (the proportion of \({\cal D}_k\subset {\cal D}_n\)) used to build the basic machines. By default, splits = 0.5.
- n_cv: the number of cross-validation folds used to tune the smoothing parameters.
- inv_sigma, alp: the inverse normalizing constant \(\sigma^{-1}>0\) and the exponent \(\alpha>0\) of the exponential kernel \(K(x)=e^{-\|x/\sigma\|^{\alpha}}\) for any \(x\in\mathbb{R}^d\). By default, inv_sigma = \(\sqrt{1/2}\) and alp = 2, which corresponds to the Gaussian kernel.
- kernels: the kernel function or vector of kernel functions used for the aggregation. By default, kernels = "gaussian".
- optimizeMethod: the optimization methods used to learn the smoothing parameters. By default, optimizeMethod = "grad", which stands for the gradient descent algorithm; if optimizeMethod = "grid", the grid search algorithm is used. Note that this argument should be of the same size as the kernels argument, otherwise the “grid” method is used for all the kernel functions.
- setBasicMachineParam: an option used to set the values of the parameters of the basic machines. The output of the setBasicParameter_Mix function should be fed to this argument.
- setGradParam: an option used to set the values of the parameters of the gradient descent algorithm. The output of the setGradParameter_Mix function should be fed to it.
- setGridParam: an option used to set the values of the parameters of the grid search algorithm. The output of the setGridParameter_Mix function should be fed to it.
- silent: a logical value to silence all the messages or information printed during the process of the method. By default, silent = FALSE.

Value:

This function returns a list of the following objects:

- opt_parameters: the observed optimal parameters.
- add_parameters: other additional parameters such as the scaling options, the parameters of the kernel functions and the optimization methods used.
- basic_machines: the basic machine object.

fit_parameter_Mix <- function(train_input,
train_response,
train_predictions = NULL,
machines = NULL,
scale_input = TRUE,
scale_machine = TRUE,
splits = 0.5,
n_cv = 5,
inv_sigma = sqrt(.5),
alp = 2,
kernels = "gaussian",
optimizeMethod = "grad",
setBasicMachineParam = setBasicParameter_Mix(),
setGradParam = setGradParameter_Mix(),
setGridParam = setGridParameter_Mix(),
silent = FALSE){
kernels_lookup <- c("gaussian", "epanechnikov", "biweight", "triweight", "triangular", "naive")
kernel_real <- kernels %>%
sapply(FUN = function(x) return(match.arg(x, kernels_lookup)))
if(is.null(train_predictions)){
mach2 <- generateMachines_Mix(train_input = train_input,
train_response = train_response,
scale_input = scale_input,
scale_machine = scale_machine,
machines = machines,
splits = splits,
basicMachineParam = setBasicMachineParam,
silent = silent)
}else{
mach2 <- list(fitted_remain = train_predictions,
models = NULL,
id2 = rep(TRUE, nrow(train_input)),
train_data = list(train_input = train_input,
train_response = train_response,
predict_remain_org = train_predictions,
min_machine = NULL,
max_machine = NULL,
min_input = NULL,
max_input = NULL))
if(scale_machine){
min_ <- map_dbl(train_predictions, .f = min)
max_ <- map_dbl(train_predictions, .f = max)
mach2$train_data$min_machine = min_
mach2$train_data$max_machine = max_
mach2$fitted_remain <- scale(train_predictions,
center = min_,
scale = max_ - min_)
}
if(scale_input){
min_ <- map_dbl(train_input, .f = min)
max_ <- map_dbl(train_input, .f = max)
mach2$train_data$min_input = min_
mach2$train_data$max_input = max_
mach2$train_data$train_input <- scale(train_input,
center = min_,
scale = max_ - min_)
}
}
# distance matrix to compute loss function
if_euclid <- FALSE
id_euclid <- NULL
n_ker <- length(kernels)
dist_all <- list()
id_shuf <- NULL
for (k_ in 1:n_ker){
ker <- kernel_real[k_]
if(ker == "naive"){
dist_all[["naive"]] <- dist_matrix_Mix(basicMachines = mach2,
n_cv = n_cv,
kernel = "naive",
id_shuffle = id_shuf)
} else{
if(ker == "triangular"){
dist_all[["triangular"]] <- dist_matrix_Mix(basicMachines = mach2,
n_cv = n_cv,
kernel = "triangular",
id_shuffle = id_shuf)
} else{
if(if_euclid){
dist_all[[ker]] <- dist_all[[id_euclid]]
} else{
dist_all[[ker]] <- dist_matrix_Mix(basicMachines = mach2,
n_cv = n_cv,
kernel = ker,
id_shuffle = id_shuf)
id_euclid <- ker
if_euclid <- TRUE
}
}
}
id_shuf <- dist_all[[1]]$id_shuffle
}
# Kernel functions
# ================
# Gaussian
gaussian_kernel <- function(.ep,
.dist_matrix,
.train_response2,
.inv_sigma = inv_sigma,
.alpha = alp){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(exp(- (x[1]*D1+x[2]*D2)^(.alpha/2)*.inv_sigma^.alpha))
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Epanechnikov
epanechnikov_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1- (x[1]*D1+x[2]*D2))
tem0[tem0 < 0] = 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Biweight
biweight_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1- (x[1]*D1+x[2]*D2))
tem0[tem0 < 0] = 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0^2/colSums(tem0^2)
y_hat[is.na(y_hat)] <- 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Triweight
triweight_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1- (x[1]*D1+x[2]*D2))
tem0[tem0 < 0] = 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0^3/colSums(tem0^3)
y_hat[is.na(y_hat)] <- 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Triangular
triangular_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- as.matrix(1- (x[1]*D1+x[2]*D2))
tem0[tem0 < 0] <- 0
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# Naive
naive_kernel <- function(.ep,
.dist_matrix,
.train_response2){
kern_fun <- function(x, id, D1, D2){
tem0 <- (as.matrix((x[1]*D1+x[2]*D2)) < 1)
y_hat <- .train_response2[.dist_matrix$id_shuffle != id] %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(sum((y_hat - .train_response2[.dist_matrix$id_shuffle == id])^2))
}
temp <- map(.x = 1:.dist_matrix$n_cv,
.f = ~ kern_fun(x = .ep,
id = .x,
D1 = .dist_matrix$dist_input[[.x]],
D2 = .dist_matrix$dist_machine[[.x]]))
return(Reduce("+", temp))
}
# list of kernel functions
list_funs <- list(gaussian = gaussian_kernel,
epanechnikov = epanechnikov_kernel,
biweight = biweight_kernel,
triweight = triweight_kernel,
triangular = triangular_kernel,
naive = naive_kernel)
# error for all kernel functions
error_func <- kernel_real %>%
map(.f = ~ (\(x) list_funs[[.x]](.ep = x,
.dist_matrix = dist_all[[.x]],
.train_response2 = train_response[mach2$id2])/n_cv))
names(error_func) <- kernels
# list of parameter setups
list_param <- list(grad = setGradParam,
GD = setGradParam,
grid = setGridParam)
# list of optimizer
list_optimizer <- list(grad = gradOptimizer_Mix,
GD = gradOptimizer_Mix,
grid = gridOptimizer_Mix)
optMethods <- optimizeMethod
if(length(kernels) != length(optMethods)){
warning("* kernels and optimization methods differ in sides! Grid search will be used!")
optMethods = rep("grid", length(kernels))
}
# Optimization
parameters <- map2(.x = kernels,
.y = optMethods,
.f = ~ list_optimizer[[.y]](obj_fun = error_func[[.x]],
setParameter = list_param[[.y]],
silent = silent))
names(parameters) <- paste0(kernel_real, "_", optMethods)
return(list(opt_parameters = parameters,
add_parameters = list(inv_sigma = inv_sigma,
alp = alp,
opt_methods = optimizeMethod),
basic_machines = mach2))
}
Example.5: We approximate the optimal smoothing parameters on the Boston dataset.
param <- fit_parameter_Mix(train_input = df[train, 1:13],
train_response = df[train, 14],
machines = c("knn", "rf", "xgb"),
splits = .50,
kernels = c("gaussian", "biweight", "triangular"),
optimizeMethod = c("grad", "grid", "grid"),
setBasicMachineParam = setBasicParameter_Mix(k = 2:6,
ntree = 1:5*100,
nrounds_xgb = 1:5*100),
setGradParam = setGradParameter_Mix(rate = "linear",
print_step = FALSE,
coef_auto = c(0.3, 0.3)),
setGridParam = setGridParameter_Mix(min_alpha = 0.0001,
max_alpha = 3,
min_beta = 0.0001,
max_beta = 3))
* Building basic machines ...
~ Progress: ... 33% ... 67% ... 100%
~ Observed parameter: (alpha, beta) = ( 0.2723926 , 34.52247 ) in 112 iterations.
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (0.1035448, 1.862107)
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (1e-04, 0.4138793)
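The returned object stores, for each kernel, the tuned pair \((\alpha, \beta)\). A quick way to inspect it (a minimal sketch, assuming the kernel_method naming used when building the list of optimizers above):
# Names of the tuned kernel/optimizer pairs
names(param$opt_parameters)
# Tuned (alpha, beta) of the Gaussian kernel obtained by gradient descent
param$opt_parameters$gaussian_grad$opt_param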
The smoothing parameters obtained in the previous section can be used to make the final predictions. Several types of kernel functions used for the aggregation are defined in this section.
kernel_pred_Mix
Argument:
- theta: a 2D-vector of the parameters \((\alpha, \beta)\).
- .y2: the vector of response values of the second part \(\mathcal{D}_{\ell}\) of the training data, which is used for the aggregation.
- .dist1, .dist2: the input and prediction distance matrices between the points to be predicted and the points of \(\mathcal{D}_{\ell}\).
- .kern: the string specifying the kernel function. By default, .kern = "gaussian".
- .inv_sig, .alp: the parameters of the exponential (Gaussian) kernel function.
- .meth: the string of the optimization method, appended to the kernel name whenever a particular kernel is used more than once (for example, the "gaussian" kernel may be tuned with both the "grad" and "grid" optimization methods).
Value:
This function returns the predictions of the aggregation method evaluated with the given parameters.
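In the notation of the code below, let \(D=\theta_1 d_{x}+\theta_2 d_{r}\), where \(d_{x}\) and \(d_{r}\) are the entries of the input and prediction distance matrices and \(\theta=(\alpha,\beta)\) is the smoothing parameter. With the convention \(0/0=0\), the kernel profiles implemented below are
\[\begin{aligned}
K_{\text{gaussian}}(D)&=\exp\big(-\sigma_0^{-\alpha_0}\,D^{\alpha_0/2}\big), & K_{\text{epanechnikov}}(D)&=(1-D)_{+},\\
K_{\text{biweight}}(D)&=(1-D)_{+}^{2}, & K_{\text{triweight}}(D)&=(1-D)_{+}^{3},\\
K_{\text{triangular}}(D)&=(1-D)_{+}, & K_{\text{naive}}(D)&=\mathbb{1}_{\{D<1\}},
\end{aligned}\]
where \((u)_{+}=\max(u,0)\), and \(\alpha_0\) and \(\sigma_0^{-1}\) correspond to the arguments .alp and .inv_sig (with defaults \(2\) and \(\sqrt{0.5}\)). The triangular and naive kernels are evaluated with \(\ell_1\) and \(\ell_{\infty}\) distances respectively, while the other kernels use squared Euclidean distances.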
kernel_pred_Mix <- function(theta,
.y2,
.dist1,
.dist2,
.kern = "gaussian",
.inv_sig = sqrt(.5),
.alp = 2,
.meth = NA){
distD <- as.matrix(theta[1]*.dist1+theta[2]*.dist2)
# Kernel functions
# ================
gaussian_kernel <- function(D,
.inv_sigma = .inv_sig,
.alpha = .alp){
tem0 <- exp(- D^(.alpha/2)*.inv_sigma^.alpha)
y_hat <- .y2 %*% tem0/colSums(tem0)
return(t(y_hat))
}
# Epanechnikov
epanechnikov_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] = 0
y_hat <- .y2 %*% tem0/colSums(tem0)
return(t(y_hat))
}
# Biweight
biweight_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] = 0
y_hat <- .y2 %*% tem0^2/colSums(tem0^2)
y_hat[is.na(y_hat)] <- 0
return(t(y_hat))
}
# Triweight
triweight_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] = 0
y_hat <- .y2 %*% tem0^3/colSums(tem0^3)
y_hat[is.na(y_hat)] <- 0
return(t(y_hat))
}
# Triangular
triangular_kernel <- function(D){
tem0 <- 1- D
tem0[tem0 < 0] <- 0
y_hat <- .y2 %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(t(y_hat))
}
# Naive
naive_kernel <- function(D){
tem0 <- (D < 1)
y_hat <- .y2 %*% tem0/colSums(tem0)
y_hat[is.na(y_hat)] = 0
return(t(y_hat))
}
# Prediction
kernel_list <- list(gaussian = gaussian_kernel,
epanechnikov = epanechnikov_kernel,
biweight = biweight_kernel,
triweight = triweight_kernel,
triangular = triangular_kernel,
naive = naive_kernel)
res <- tibble(as.vector(kernel_list[[.kern]](D = distD)))
names(res) <- ifelse(is.na(.meth),
.kern,
paste0(.kern, '_', .meth))
return(res)
}
predict_Mix
This function allows us to predict new data points.
Argument:
- fitted_models: the object obtained from the fit_parameter_Mix function.
- new_data: the testing data to be predicted.
- new_pred: the predictions of the testing data new_data (used when the basic machines are not constructed).
- test_response: the testing response variable; it is optional. If it is given, the mean square error on the testing data is also computed. By default, test_response = NULL.
Value:
This function returns a list of the following objects:
- fitted_aggregate: the predictions given by the aggregation methods.
- fitted_machine: the predictions given by all the basic machines.
- mse: the mean square error, computed only if the test_response argument is not NULL.
# Prediction
predict_Mix <- function(fitted_models,
new_data,
new_pred = NULL,
test_response = NULL){
opt_param <- fitted_models$opt_parameters
add_param <- fitted_models$add_parameters
basic_mach <- fitted_models$basic_machines
kern0 <- names(opt_param)
kernel0 <- stringr::str_split(kern0, "_") %>%
imap_dfc(.f = ~ tibble("{.y}" := .x))
kerns <- kernel0[1,] %>%
as.character
opt_meths <- kernel0[2,] %>%
as.character
new_data_ <- new_data
mat_input <- as.matrix(basic_mach$train_data$train_input)
if(!is.null(basic_mach$train_data$min_input)){
new_data_ <- scale(new_data,
center = basic_mach$train_data$min_input,
scale = basic_mach$train_data$max_input - basic_mach$train_data$min_input)
}
if(is.matrix(new_data_)){
mat_test <- new_data_
df_test <- as_tibble(new_data_)
} else {
mat_test <- as.matrix(new_data_)
df_test <- new_data_
}
if(is.null(basic_mach$models)){
pred_test_all <- new_pred
pred_test0 <- new_pred
} else{
# Prediction test by basic machines
built_models <- basic_mach$models
pred_test <- function(meth){
if(meth == "knn"){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := FNN::knn.reg(train = mat_input[!basic_mach$id2,],
test = mat_test,
y = basic_mach$train_data$train_response[!basic_mach$id2],
k = built_models[[meth]][[k]])$pred)))
}
if(meth %in% c("tree", "rf")){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := as.vector(predict(built_models[[meth]][[k]], df_test)))))
}
if(meth %in% c("lasso", "ridge")){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := as.vector(predict.glmnet(built_models[[meth]][[k]], mat_test)))))
}
if(meth == "xgb"){
pre <- 1:length(built_models[[meth]]) %>%
map_dfc(.f = (\(k) tibble('{{k}}' := as.vector(predict(built_models[[meth]][[k]], mat_test)))))
}
names(pre) <- names(built_models[[meth]])
return(pre)
}
pred_test_all <- names(built_models) %>%
map_dfc(.f = pred_test)
pred_test0 <- pred_test_all
}
if(!is.null(basic_mach$train_data$min_machine)){
pred_test_all <- scale(pred_test0,
center = basic_mach$train_data$min_machine,
scale = basic_mach$train_data$max_machine - basic_mach$train_data$min_machine)
}
# Prediction train2
pred_train_all <- basic_mach$fitted_remain
colnames(pred_test_all) <- colnames(pred_train_all)
d_train <- dim(pred_train_all)
d_test <- dim(pred_test_all)
d_train_input <- dim(mat_input[basic_mach$id2,])
d_test_input <- dim(new_data_)
pred_test_mat <- as.matrix(pred_test_all)
pred_train_mat <- as.matrix(pred_train_all)
# Distance matrix
dist_mat <- function(kernel = "gausian"){
res_1 <- res_2 <- NULL
if(!(kernel %in% c("naive", "triangular"))){
res_1 <- 1:d_test_input[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums((mat_input[basic_mach$id2,] - matrix(rep(new_data_[id,],
d_train_input[1]),
ncol = d_train_input[2],
byrow = TRUE))^2)))))
res_2 <- 1:d_test[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums((pred_train_mat - matrix(rep(pred_test_mat[id,],
d_train[1]),
ncol = d_train[2],
byrow = TRUE))^2)))))
}
if(kernel == "triangular"){
res_1 <- 1:d_test_input[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums(abs(mat_input[basic_mach$id2,] - matrix(rep(new_data_[id,],
d_train_input[1]),
ncol = d_train_input[2],
byrow = TRUE)))))))
res_2 <- 1:d_test[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(rowSums(abs(pred_train_mat - matrix(rep(pred_test_mat[id,],
d_train[1]),
ncol = d_train[2],
byrow = TRUE)))))))
}
if(kernel == "naive"){
res_1 <- 1:d_test_input[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(apply(abs(mat_input[basic_mach$id2,] - matrix(rep(new_data_[id,],
d_train_input[1]),
ncol = d_train_input[2],
byrow = TRUE)), 1, max)))))
res_2 <- 1:d_test[1] %>%
map_dfc(.f = (\(id) tibble('{{id}}' := as.vector(apply(abs(pred_train_mat - matrix(rep(pred_test_mat[id,], d_train[1]),
ncol = d_train[2],
byrow = TRUE)), 1, max)))))
}
return(list(dist_input = res_1,
dist_machine = res_2))
}
dists <- 1:length(kerns) %>%
map(.f = ~ dist_mat(kerns[.x]))
tab_nam <- table(kerns)
nam <- names(tab_nam[tab_nam > 1])
vec <- rep(NA, length(kerns))
for(id in nam){
id_ <- kerns == id
if(!is.null(id_)){
vec[id_] = add_param$opt_methods[id_]
}
}
prediction <- 1:length(kerns) %>%
map_dfc(.f = ~ kernel_pred_Mix(theta = opt_param[[kern0[.x]]]$opt_param,
.y2 = basic_mach$train_data$train_response[basic_mach$id2],
.dist1 = dists[[.x]]$dist_input,
.dist2 = dists[[.x]]$dist_machine,
.kern = kerns[.x],
.inv_sig = add_param$inv_sigma,
.alp = add_param$alp,
.meth = vec[.x]))
if(is.null(test_response)){
return(list(fitted_aggregate = prediction,
fitted_machine = pred_test0))
} else{
error <- cbind(pred_test0, prediction) %>%
dplyr::mutate(y_test = test_response) %>%
dplyr::summarise_all(.funs = ~ (. - y_test)) %>%
dplyr::select(-y_test) %>%
dplyr::summarise_all(.funs = ~ mean(.^2))
return(list(fitted_aggregate = prediction,
fitted_machine = pred_test0,
mse = error))
}
}
Example.6: Aggregation on the Boston dataset.
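A minimal sketch of such an aggregation (not the original chunk), assuming the fitted param object and the Boston df/train split from Example.5:
# Predict the Boston testing set with the parameters tuned in Example.5
pred_boston <- predict_Mix(fitted_models = param,
                           new_data = df[!train, 1:13],
                           test_response = df[!train, 14])
# Root mean square errors of the basic machines and of the aggregation methods
sqrt(pred_boston$mse)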
MixCOBRARegressor (direct aggregation)
This function puts together all the functions above and provides the desired result of the MixCobra aggregation method.
Argument: all of its arguments are already described in the previous parts.
Value:
This function returns a list of the following objects:
- fitted_aggregate: the predicted values of the aggregation method.
- fitted_machine: the predicted values of all the basic machines built on \(\mathcal{D}_{k}\).
- pred_train2: the predictions of \(\mathcal{D}_{\ell}\) given by all the basic machines.
- opt_parameter: the observed optimal parameters.
- mse: the mean square error evaluated on the testing data if test_response is given.
- kernels: a vector of the kernel functions used.
- ind_D2: a logical vector indicating the part of the training data corresponding to \(\mathcal{D}_{\ell}\) (TRUE).
MixCOBRARegressor <- function(train_input,
train_response,
test_input,
train_predictions = NULL,
test_predictions = NULL,
test_response = NULL,
machines = NULL,
scale_input = TRUE,
scale_machine = TRUE,
splits = 0.5,
n_cv = 5,
inv_sigma = sqrt(.5),
alp = 2,
kernels = "gaussian",
optimizeMethod = "grad",
setBasicMachineParam = setBasicParameter_Mix(),
setGradParam = setGradParameter_Mix(),
setGridParam = setGridParameter_Mix(),
silent = FALSE){
# build machines + tune parameter
fit_mod <- fit_parameter_Mix(train_input = train_input,
train_response = train_response,
train_predictions = train_predictions,
machines = machines,
scale_input = scale_input,
scale_machine = scale_machine,
splits = splits,
n_cv = n_cv,
inv_sigma = inv_sigma,
alp = alp,
kernels = kernels,
optimizeMethod = optimizeMethod,
setBasicMachineParam = setBasicMachineParam,
setGradParam = setGradParam,
setGridParam = setGridParam,
silent = silent)
# prediction
pred <- predict_Mix(fitted_models = fit_mod,
new_data = test_input,
new_pred = test_predictions,
test_response = test_response)
return(list(fitted_aggregate = pred$fitted_aggregate,
fitted_machine = pred$fitted_machine,
pred_train2 = fit_mod$basic_machines$fitted_remain,
opt_parameter = fit_mod$opt_parameters,
mse = pred$mse,
kernels = kernels,
ind_D2 = fit_mod$basic_machines$id2))
}
Example.7: A complete aggregation is implemented on the Abalone dataset. Three types of basic machines and three different kernel functions are used.
agg <- MixCOBRARegressor(train_input = df[train, 2:8],
train_response = df$Rings[train],
test_input = df[!train,2:8],
test_response = df$Rings[!train],
n_cv = 3,
machines = c("knn", "rf", "xgb"),
splits = .5,
kernels = c("gaussian",
"naive",
"triangular"),
optimizeMethod = c("grad",
"grid",
"grid"),
setBasicMachineParam = setBasicParameter_Mix(k = c(2,5,7,10),
ntree = 2:5*100,
nrounds_xgb = c(1,3,5)*100),
setGradParam = setGradParameter_Mix(rate = "linear",
coef_auto = c(0.5, 0.5),
print_step = TRUE,
print_result = TRUE,
figure = TRUE),
setGridParam = setGridParameter_Mix(parameters = list(alpha = seq(0.0001, 3, length.out = 20),
beta = seq(0.0001, 5, length.out = 20))))
* Building basic machines ...
~ Progress: ... 33% ... 67% ... 100%
* Gradient descent algorithm ...
Step | alpha ; beta | Gradient (alpha ; beta) | Threshold
--------------------------------------------------------------------------------
0 | 10.00000 ; 50.00000 | -1.28882e+00 ; 0.25457 | 1e-10
--------------------------------------------------------------------------------
0 | 10.0000 ; 50.0000 | -1.2888e+00 ; 0.25457 | 13.89048
1 | 10.5000 ; 49.5000 | -1.3154e+00 ; 0.30085 | 0.01666667
2 | 11.5206 ; 48.3182 | -1.3902e+00 ; 0.38099 | 0.07286875
3 | 13.1386 ; 46.0733 | -1.4596e+00 ; 0.42388 | 0.154925
4 | 15.4036 ; 42.7431 | -1.201e+00 ; 0.22624 | 0.1122472
5 | 17.7331 ; 40.5213 | -6.9773e-01 ; -7.5933e-02 | 0.4562241
6 | 19.3573 ; 41.4162 | -4.52e-01 ; -4.1175e-02 | 0.8054029
7 | 20.5848 ; 41.6992 | -2.7488e-01 ; -5.52e-02 | 0.2804899
8 | 21.4379 ; 42.1329 | -1.7249e-01 ; -3.9204e-02 | 0.191143
9 | 22.0402 ; 42.4794 | -1.0728e-01 ; -2.3637e-02 | 0.1183893
10 | 22.4564 ; 42.7115 | -6.475e-02 ; -1.345e-02 | 0.08077813
11 | 22.7327 ; 42.8568 | -3.7509e-02 ; -7.4236e-03 | 0.05271782
12 | 22.9073 ; 42.9443 | -2.0689e-02 ; -3.9679e-03 | 0.03326777
13 | 23.0116 ; 42.9950 | -1.0786e-02 ; -2.0311e-03 | 0.02027503
14 | 23.0702 ; 43.0229 | -5.2749e-03 ; -9.8347e-04 | 0.01184012
15 | 23.1009 ; 43.0374 | -2.4012e-03 ; -4.457e-04 | 0.006558684
16 | 23.1158 ; 43.0444 | -1.0097e-03 ; -1.8688e-04 | 0.003411438
17 | 23.1225 ; 43.0475 | -3.8859e-04 ; -7.1943e-05 | 0.001650372
18 | 23.1252 ; 43.0488 | -1.357e-04 ; -2.5007e-05 | 0.0007360271
19 | 23.1262 ; 43.0492 | -4.2317e-05 ; -7.8852e-06 | 0.0002998254
20 | 23.1265 ; 43.0494 | -1.194e-05 ; -2.1778e-06 | 0.0001105055
21 | 23.1266 ; 43.0494 | -2.7786e-06 ; -4.5058e-07 | 3.608418e-05
22 | 23.1266 ; 43.0495 | -5.2568e-07 ; -2.6284e-07 | 1.088909e-05
23 | 23.1267 ; 43.0495 | -2.2529e-07 ; 1.8774e-07 | 2.440657e-06
24 | 23.1267 ; 43.0495 | 7.5097e-08 ; -3.0039e-07 | 7.509715e-07
25 | 23.1267 ; 43.0495 | -7.5097e-08 ; 1.1265e-07 | 7.8852e-07
26 | 23.1267 ; 43.0495 | 0e+00 ; 0e+00 | 5.632286e-07
27 | 23.1267 ; 43.0495 | 0e+00 ; 0e+00 | 1.877429e-07
--------------------------------------------------------------------------------
Stopped| 23.1267 ; 43.0495 | 0 | 0
~ Observed parameter: (alpha, beta) = ( 23.12665 , 43.04946 ) in 28 iterations.
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (0.9474368, 4.736847)
* Grid search algorithm...
~ Observed parameter: (alpha, beta) = (0.4737684, 0.5264053)
sqrt(agg$mse)
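Beyond the test errors, the aggregated predictions themselves can be inspected. A minimal sketch, assuming the df/train split of Example.7 and that each kernel is used only once, so that the columns of fitted_aggregate are simply named after the kernels:
# Compare the Gaussian-kernel aggregated predictions with the observed test responses
y_test <- df$Rings[!train]
y_agg <- agg$fitted_aggregate$gaussian
tibble(observed = y_test, predicted = y_agg) %>%
  ggplot(aes(x = observed, y = predicted)) +
  geom_point(alpha = 0.4) +
  geom_abline(slope = 1, intercept = 0, color = "red") +
  labs(title = "MixCobra aggregation on the Abalone testing data")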
📖 Read also MixCobraClass, KernelAggReg and KernelAggClass methods.