Description
Thank you for this amazing package! I ran the hyperparameter tuning code for xgboost, and the result is sometimes reproducible and sometimes not. Is this because I run it in parallel?
scoringFunction <- function(max_depth, min_child_weight, subsample) {

  # agaricus.train and Folds are created earlier in the session (not shown
  # in this post) and exported to the workers further down.
  dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

  Pars <- list(
      booster = "gbtree"
    , eta = 0.001
    , max_depth = max_depth
    , min_child_weight = min_child_weight
    , subsample = subsample
    , objective = "binary:logistic"
    , eval_metric = "auc"
  )

  # Cross-validate with the fixed folds and stop early when AUC stalls.
  xgbcv <- xgb.cv(
      params = Pars
    , data = dtrain
    , nround = 100
    , folds = Folds
    , early_stopping_rounds = 5
    , maximize = TRUE
    , verbose = 0
  )

  # bayesOpt maximizes Score; nrounds is returned as extra information.
  return(
    list(
        Score = max(xgbcv$evaluation_log$test_auc_mean)
      , nrounds = xgbcv$best_iteration
    )
  )
}
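bounds <- list(
    max_depth = c(1L, 5L)
  , min_child_weight = c(0, 25)
  , subsample = c(0.25, 1)
)

set.seed(42)

# Start two worker processes and register them as the foreach backend.
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)

# Make the data, the folds, and xgboost available on every worker.
clusterExport(cl, c('Folds', 'agaricus.train'))
clusterEvalQ(cl, expr = {
  library(xgboost)
})

# This seeds the main R session only.
set.seed(42)
tWithPar <- system.time(
  optObj <- bayesOpt(
      FUN = scoringFunction
    , bounds = bounds
    , initPoints = 4
    , iters.n = 4
    , iters.k = 2
    , parallel = TRUE
  )
)

# Shut the workers down and return to sequential execution.
stopCluster(cl)
registerDoSEQ()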
The code is shown above. getBestPars(optObj) returns a different result every time I run exactly the same code. The score summary is similar to the one in #52: the parameters chosen are the same, but the score is different. Is this caused by the parallelization, or is there another reason?
I also ran the code you mentioned in #7 and the result is FALSE, although the score summary tables for optobj and optobj2 appear to be the same. So my guess is that the different results across runs of the same code come from the parallelization. Is that right?
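One way to test the parallelization guess is to check whether the random number state on the workers is actually controlled. The sketch below is a standalone illustration using only the base `parallel` package, not the tuning code itself. It assumes that workers created with `makeCluster()` are separate R processes whose RNG state is not set by calling `set.seed()` in the main session, and it uses `clusterSetRNGStream()` to seed them explicitly. The variable names (`unseeded_1`, `seeded_1`, and so on) are made up for the example.

```r
library(parallel)

cl <- makeCluster(2)

# Seed the main session only, then draw one number on each worker.
set.seed(42)
unseeded_1 <- unlist(clusterEvalQ(cl, runif(1)))
set.seed(42)
unseeded_2 <- unlist(clusterEvalQ(cl, runif(1)))

# Seed the workers themselves: each worker gets its own reproducible stream.
clusterSetRNGStream(cl, iseed = 42)
seeded_1 <- unlist(clusterEvalQ(cl, runif(1)))
clusterSetRNGStream(cl, iseed = 42)
seeded_2 <- unlist(clusterEvalQ(cl, runif(1)))

stopCluster(cl)

# The unseeded draws are not expected to match across the two rounds.
print(identical(unseeded_1, unseeded_2))

# The seeded draws repeat exactly.
print(identical(seeded_1, seeded_2))  # TRUE
```

If this is what happens in the tuning run, any randomness inside scoringFunction (for example row sampling when subsample is below 1) would differ between runs even though the main session is seeded. Seeding the workers may still not be sufficient on its own, because the order in which tasks reach each worker is not guaranteed to be the same from run to run; calling set.seed() at the top of scoringFunction is another option worth trying, under the same assumption.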