
Parallel computing MXL

Posted: 11 Jun 2020, 17:02
by svenne
Dear Stephane,

I recently came across the following error message

Preparing workersError in makePSOCKcluster(names = spec, ...) :
Cluster setup failed. 4 of 4 workers failed to connect.

when estimating an MNL or MXL (error message is identical) using nCores>1.
For nCores = 1 everything works fine (but somewhat slowly).

Below is a working example using the Car data from the mlogit package.

Any suggestions are very welcome.

Best
Sven

Example

Code:

library(apollo)
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName ="car_purchase_mxl",
  modelDescr ="Mixed logit model on Train's car purchase data",
  indivID   ="ID",  
  mixing    = TRUE, 
  nCores    = 1 ## here is the trouble maker: any value > 1 triggers the error
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #
data("Car", package = "mlogit") ### load data from the mlogit package
Car$ID      = 1:nrow(Car)
Car$Choice  = as.numeric(substring(Car$choice, 7))
database = Car

# ################################################################# #
#### ANALYSIS OF CHOICES                                         ####
# ################################################################# #

choiceAnalysis_settings <- list(
  alternatives = c(car1=1, car2=2,car3=3,car4=4,car5=5, car6=6),
  avail        = 1,
  explanators  = database[,c("college","hsg2","coml5")],
  choiceVar    = database$Choice
  #rows         = database$income>30000
)

apollo_choiceAnalysis(choiceAnalysis_settings, apollo_control, database)

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(priceLogInc    = 0,
                range          = 0,
                mu_log_b_oc    = 0,
                sigma_log_b_oc = -2,
                meth           = 0,
                cng            = 0,
                mu_b_ev        = 0,
                sigma_b_ev     = -2,
                truck          = 0,
                mu_b_suv       = 0,
                sigma_b_suv    = -2,
                van            = 0
                )

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c()

# ################################################################# #
#### DEFINE RANDOM COMPONENTS                                    ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
 interDrawsType = "halton",
 interNDraws    = 0,
 interUnifDraws = c(),
 interNormDraws = c(),
 intraDrawsType = "halton",
 intraNDraws    = 10,
 intraUnifDraws = c(),
 intraNormDraws = c("draws_oc", "draws_ev", "draws_suv")
)

### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
  randcoeff = list()

  randcoeff[["opcost"]] = -exp( mu_log_b_oc + sigma_log_b_oc * draws_oc )
  randcoeff[["ev"]]     =  mu_b_ev + sigma_b_ev * draws_ev 
  randcoeff[["suv"]]    =  mu_b_suv + sigma_b_suv * draws_suv 

  return(randcoeff)
}

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Function initialisation: do not change the following three commands
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  V[['car1']] = suv * (type1 == "sportuv") + truck * (type1 == "truck") + van * (type1 == "van") + meth * (fuel1 == "methanol") + cng * (fuel1 == "cng") + ev * (fuel1 == "electric") + priceLogInc * price1 + range * range1 + opcost * cost1 
  V[['car2']] = suv * (type2 == "sportuv") + truck * (type2 == "truck") + van * (type2 == "van") + meth * (fuel2 == "methanol") + cng * (fuel2 == "cng") + ev * (fuel2 == "electric") + priceLogInc * price2 + range * range2 + opcost * cost2
  V[['car3']] = suv * (type3 == "sportuv") + truck * (type3 == "truck") + van * (type3 == "van") + meth * (fuel3 == "methanol") + cng * (fuel3 == "cng") + ev * (fuel3 == "electric") + priceLogInc * price3 + range * range3 + opcost * cost3
  V[['car4']] = suv * (type4 == "sportuv") + truck * (type4 == "truck") + van * (type4 == "van") + meth * (fuel4 == "methanol") + cng * (fuel4 == "cng") + ev * (fuel4 == "electric") + priceLogInc * price4 + range * range4 + opcost * cost4
  V[['car5']] = suv * (type5 == "sportuv") + truck * (type5 == "truck") + van * (type5 == "van") + meth * (fuel5 == "methanol") + cng * (fuel5 == "cng") + ev * (fuel5 == "electric") + priceLogInc * price5 + range * range5 + opcost * cost5
  V[['car6']] = suv * (type6 == "sportuv") + truck * (type6 == "truck") + van * (type6 == "van") + meth * (fuel6 == "methanol") + cng * (fuel6 == "cng") + ev * (fuel6 == "electric") + priceLogInc * price6 + range * range6 + opcost * cost6
  
  ### Define settings for MNL model component
  mnl_settings    = list(
    alternatives  = c(car1=1, car2=2, car3=3, car4=4, car5=5, car6=6),
    avail         = 1,
    choiceVar     = Choice,
    V             = V
  )
  
  
  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)
  
  ### Take product across observation for same individual
  #P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Average across intra-individual draws
  P = apollo_avgIntraDraws(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed,
                        apollo_probabilities, apollo_inputs, 
                        estimate_settings=list(hessianRoutine="maxLik"))

# ################################################################# #
#### MODEL OUTPUTS                                               ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN)                               ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model)

Re: Parallel computing MXL

Posted: 11 Jun 2020, 17:07
by stephanehess
Hi Sven

to try and diagnose whether this is an issue with your model or with your installation of R/your machine, can you please try running this example from the website, which uses multi-core too: http://apollochoicemodelling.com/files/ ... ample_14.r

Thanks

Stephane
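
A quick way to separate the two cases without involving Apollo at all is to call the function named in the error message directly. The sketch below is only an illustration: can_start_cluster() is a hypothetical helper, not part of Apollo or of the parallel package. If it returns FALSE in a fresh session, the problem most likely lies with the R installation or the environment rather than with the model.

```r
library(parallel)

# Hypothetical helper (not part of Apollo): try to start n PSOCK workers
# and report whether that succeeded.
can_start_cluster <- function(n = 2L) {
  cl <- tryCatch(
    makePSOCKcluster(n),
    error = function(e) {
      message("Cluster setup failed: ", conditionMessage(e))
      NULL
    }
  )
  if (is.null(cl)) return(FALSE)
  stopCluster(cl)  # shut the workers down again
  TRUE
}
```

Calling can_start_cluster(4L) mirrors the four workers from the original error. Running it once inside RStudio and once from a plain R terminal session may also show whether the front end plays a role.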

Re: Parallel computing MXL

Posted: 12 Jun 2020, 11:22
by svenne
Dear Stephane,

this yields the same result. I use Apollo 0.1.0 under R version 4.0.0 (2020-04-24) on a MacBook Pro running macOS 10.15.5.

I know that the parallel computing facility of Apollo worked for me some time ago. So I guess something changed with an R or macOS update.

Do you have any idea?

Best regards
Sven

Re: Parallel computing MXL

Posted: 12 Jun 2020, 11:33
by stephanehess
Sven

we've had problems with R 4.0, though it now works on our machines, including in parallel. Can I suggest that you try reinstalling Apollo as well as updating the libraries that Apollo uses (Rcpp, maxLik, mnormt, mvtnorm, graphics, coda, sandwich, randtoolbox, numDeriv, RSGHB, parallel, Deriv)?

Stephane
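
A minimal sketch of that update step, assuming the package list quoted in the post above. Note that graphics and parallel are base packages that ship with R itself, so they are updated by updating R rather than through CRAN; the sketch filters them out. The variable names are illustrative only.

```r
# Packages named in the post above
deps <- c("Rcpp", "maxLik", "mnormt", "mvtnorm", "graphics", "coda",
          "sandwich", "randtoolbox", "numDeriv", "RSGHB", "parallel", "Deriv")

# Base packages are part of the R installation and are not installed from CRAN
base_pkgs <- rownames(installed.packages(priority = "base"))
from_cran <- setdiff(deps, base_pkgs)
print(from_cran)

# Uncomment to reinstall the dependencies and then Apollo (needs internet access):
# install.packages(c(from_cran, "apollo"))
```

Restarting R before and after the installation avoids replacing packages that are currently loaded.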

Re: Parallel computing MXL

Posted: 12 Jun 2020, 12:34
by svenne
Things are getting more complicated:
- CRAN seems to have mixed up r-oldrel and r-release for the macOS binaries of mvtnorm (macOS binaries: r-release: mvtnorm_1.1-0.tgz, r-oldrel: mvtnorm_1.1-1.tgz). However, I downloaded the binary manually.
- After installing the current version of mvtnorm, I try to load mvtnorm and get the following:

$version.string
[1] "R version 4.0.1 (2020-06-06)"
> remove.packages("mvtnorm")
Removing package from ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library’
(as ‘lib’ is unspecified)
> install.packages("~/Downloads/mvtnorm_1.1-1.tgz", repos = NULL, type = .Platform$pkgType)
> library(mvtnorm)
Error: package or namespace load failed for ‘mvtnorm’:
package ‘mvtnorm’ was installed before R 4.0.0: please re-install it

I did reinstall mvtnorm, but I still get the same result. Any ideas?

Best
Sven
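
The "installed before R 4.0.0" message refers to the R version a package was built under, which installed.packages() reports in its "Built" column. As a rough sketch, the following lists every installed package whose build predates a given R version; built_before() is a hypothetical helper written for this illustration.

```r
# Hypothetical helper: rows of an installed.packages() matrix whose "Built"
# field (the R version the package was built under) is older than 'cutoff'.
built_before <- function(pkgs, cutoff = "4.0.0") {
  built <- package_version(pkgs[, "Built"])
  pkgs[built < cutoff, c("Package", "Version", "Built"), drop = FALSE]
}

stale <- built_before(installed.packages())
print(unname(stale[, "Package"]))
```

Any package that shows up here would need to be reinstalled with a build made for the running R version.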

Re: Parallel computing MXL

Posted: 15 Jun 2020, 13:21
by dpalma
Hi Sven,

I would recommend trying to install mvtnorm again from CRAN. I know you already did that, but I just checked CRAN and now both the r-release and r-oldrel are 1.1-1. So, with a bit of luck, installing it again might help.

Also, do you have full administrator rights on your computer? Installing Apollo (and other R packages) on institutional computers has caused problems before. These computers usually have restrictions on installing software that do not play well with R. Please check section 2 of the FAQ if this is your case.

Best
David

Re: Parallel computing MXL

Posted: 15 Jun 2020, 13:52
by svenne
David,

thank you so much. Yes, I have full rights on my computer. However, https://cran.r-project.org/web/packages ... index.html still shows macOS binaries: r-release: mvtnorm_1.1-0.tgz, r-oldrel: mvtnorm_1.1-1.tgz

I could install from source, but there I run into a Fortran error. I have now installed a Fortran compiler, but the problem remains. It might have to do with some path issues, but I have not been able to figure it out so far.

Best
Sven

Re: Parallel computing MXL

Posted: 17 Jun 2020, 11:07
by svenne
Dear David & Stephane,

now I have the most recent versions of all packages that you mentioned. I also reinstalled apollo afterwards. However, I still get

Code:

Attempting to split data into 4 pieces.
 Number of observations per worker (thread):
  1164, 1164, 1164, 1162
 Writing pieces to disk.... Done. 248.6MB of RAM in use.
Preparing workersError in makePSOCKcluster(names = spec, ...) : 
  Cluster setup failed. 4 of 4 workers failed to connect.

Any ideas?

Best
Sven

Re: Parallel computing MXL

Posted: 17 Jun 2020, 11:37
by dpalma
Hi Sven,

I did some googling, and apparently you have run into an RStudio bug, as reported in the following forum:
https://github.com/rstudio/rstudio/issues/6692

I didn't go through the whole thread, but they do say not to expect a fix earlier than a few weeks from now.

In the meantime, you could try using an older version of RStudio. With a bit of luck the error is not present in the older version.

Sorry we couldn't be of more help
David
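
A workaround that was, as far as I am aware, widely shared for this RStudio issue is to switch the parallel package back to the "sequential" worker setup that R used before 4.0.0. The sketch below rests on two assumptions: that Apollo creates its workers through the default cluster options of the parallel package (not verified here), and that the failure is indeed the one described in the linked issue. use_sequential_setup() is a hypothetical helper name.

```r
library(parallel)

# Hypothetical helper: revert PSOCK cluster setup to the pre-4.0.0
# "sequential" strategy. Returns TRUE if the option was changed.
use_sequential_setup <- function() {
  # The setup_strategy option was introduced in R 4.0.0
  if (getRversion() < "4.0.0") return(FALSE)
  # setDefaultClusterOptions() is not exported, hence the triple colon
  parallel:::setDefaultClusterOptions(setup_strategy = "sequential")
  TRUE
}

use_sequential_setup()
```

The call would go at the top of the script, before apollo_estimate() is run with nCores > 1. The setting only lasts for the current R session.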

Re: Parallel computing MXL

Posted: 17 Jun 2020, 13:04
by svenne
Hi David,

thank you so much. What a pity. Hope that the update comes fast ...

Best
Sven