Results are not reproducible when running with parallelization

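Thank you for this amazing package! I ran the hyperparameter tuning code for xgboost, and the result is sometimes reproducible and sometimes not. Is this because I am running it in parallel?

library(xgboost)
library(ParBayesianOptimization)
data(agaricus.train, package = "xgboost")

# Note: `Folds` (the list of CV folds passed to xgb.cv) is defined elsewhere and is not shown here.
scoringFunction <- function(max_depth, min_child_weight, subsample) {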

  dtrain <- xgb.DMatrix(agaricus.train$data,label = agaricus.train$label)
  
  Pars <- list( 
      booster = "gbtree"
    , eta = 0.001
    , max_depth = max_depth
    , min_child_weight = min_child_weight
    , subsample = subsample
    , objective = "binary:logistic"
    , eval_metric = "auc"
  )

  xgbcv <- xgb.cv(
      params = Pars
    , data = dtrain
    , nrounds = 100
    , folds = Folds
    , early_stopping_rounds = 5
    , maximize = TRUE
    , verbose = 0
  )

  return(list(Score = max(xgbcv$evaluation_log$test_auc_mean)
             , nrounds = xgbcv$best_iteration
             )
         )
}

bounds <- list( 
    max_depth = c(1L, 5L)
  , min_child_weight = c(0, 25)
  , subsample = c(0.25, 1)
)

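set.seed(42)
library(doParallel)
cl <- makeCluster(2)
registerDoParallel(cl)
# Make the fold definitions and the training data available on each worker
clusterExport(cl, c("Folds", "agaricus.train"))
# Load xgboost on each worker
clusterEvalQ(cl, expr = {
  library(xgboost)
})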

set.seed(42)
tWithPar <- system.time(
  optObj <- bayesOpt(
      FUN = scoringFunction
    , bounds = bounds
    , initPoints = 4
    , iters.n = 4
    , iters.k = 2
    , parallel = TRUE
  )
)
stopCluster(cl)
registerDoSEQ()

The code is as shown above, but `getBestPars(optObj)` returns something different every time I run exactly the same code. The score summary is similar to #52: the parameters chosen are the same, but the scores differ. I wonder whether this is caused by the parallelization or by something else.

I also ran the code you mentioned in #7, and the result is FALSE, but the score summary tables for optobj and optobj2 appear to be the same. So I guess the different results across repeated runs of the same code are caused by the parallelization?
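
For reference, here is a minimal sketch of one general reason parallel runs can differ, using only base R's `parallel` package. `set.seed()` in the main session does not seed the worker processes, whereas `clusterSetRNGStream()` does. This only illustrates that mechanism. It has not been confirmed as the cause of the behaviour described in this issue, and the variable names are made up for the example.

```r
library(parallel)

cl <- makeCluster(2)

# Seeding only the main session: the workers keep their own RNG state,
# so the two calls below are not guaranteed to return the same numbers.
set.seed(42)
unseeded_1 <- parSapply(cl, 1:2, function(i) runif(1))
set.seed(42)
unseeded_2 <- parSapply(cl, 1:2, function(i) runif(1))
print(identical(unseeded_1, unseeded_2))

# Seeding the workers themselves: each worker gets its own reproducible
# L'Ecuyer stream, so repeating the same call gives the same numbers.
clusterSetRNGStream(cl, iseed = 42)
seeded_1 <- parSapply(cl, 1:2, function(i) runif(1))
clusterSetRNGStream(cl, iseed = 42)
seeded_2 <- parSapply(cl, 1:2, function(i) runif(1))
print(identical(seeded_1, seeded_2))

stopCluster(cl)
```

Whether worker-side seeding is enough for a `foreach`-based loop depends on how tasks are assigned to workers, so this should be read as a diagnostic starting point rather than a fix.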

after running parallelization the result is not reproducible #59

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

after running parallelization the result is not reproducible #59

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions