Important: Read this before posting to this forum

  1. This forum is for questions related to the use of Apollo. We will answer some general choice modelling questions too, where appropriate, and time permitting. We cannot answer questions about how to estimate choice models with other software packages.
  2. There is a very detailed manual for Apollo available at http://www.ApolloChoiceModelling.com/manual.html. This contains detailed descriptions of the various Apollo functions, and numerous examples are available at http://www.ApolloChoiceModelling.com/examples.html. In addition, help files are available for all functions, using e.g. ?apollo_mnl.
  3. Before asking a question on the forum, users are kindly requested to follow these steps:
    1. Check that the same issue has not already been addressed in the forum - there is a search tool.
    2. Ensure that the correct syntax has been used. For any function, detailed instructions are available directly in Apollo, e.g. by using ?apollo_mnl for apollo_mnl.
    3. Check the frequently asked questions section on the Apollo website, which discusses some common issues/failures. Please see http://www.apollochoicemodelling.com/faq.html
    4. Make sure that R is using the latest official release of Apollo.
  4. If the above steps do not resolve the issue, then users should follow these steps when posting a question:
    1. Provide full details on the issue, including the entire code and output, as well as any error messages.
    2. Posts will not immediately appear on the forum, but will be checked by a moderator first. This may take a day or two at busy times. There is no need to submit the post multiple times.

Any example to simulate data for EMDC model?

Ask questions about existing examples and put in requests to software developers and users for other example implementations of models.
TWayne
Posts: 4
Joined: 04 Dec 2023, 06:49

Any example to simulate data for EMDC model?

Post by TWayne »

Hi authors,

Thank you very much for the package and the great forum. It is super helpful.

I am hoping to try the EMDC model that considers complementarity, based on Palma and Hess (2022). I am wondering how we can simulate the budget allocation data. For example, I would like to simulate some budget allocation data first, and then use the package to estimate the model on the simulated data to recover the parameters. I understand how to do the estimation, but I do not know how to do the data simulation.

Thank you.

Best,
TW
stephanehess
Site Admin
Posts: 1142
Joined: 24 Apr 2020, 16:29

Re: Any example to simulate data for EMDC model?

Post by stephanehess »

Hi

You can use apollo_prediction to simulate some choices, but as the predictions are averaged across the draws for the error terms, they won't contain any corner solutions (i.e. zero consumption of an alternative).

If you want predictions at the draw level, you could include rawPrediction=TRUE in emdc_settings.
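The effect of averaging over draws can be seen in a small base-R sketch. It is not Apollo's code and not the eMDC model of Palma and Hess (2022): it simulates a simpler gamma-profile MDCEV with an outside good and no complementarity/substitution terms, U = psi_0 ln(x_0) + sum_k gamma_k psi_k ln(x_k/gamma_k + 1), with psi_k = exp(V_k + sigma*eps_k), eps iid Gumbel, unit prices and a 24-hour budget. For one error draw, the optimal allocation follows from the Kuhn-Tucker conditions: activities are added in decreasing order of psi_k while psi_k exceeds the current marginal utility of the budget, lambda. The function simulateMDCEV and all parameter values are hypothetical.

```r
# Hypothetical sketch: draw-level simulation for a gamma-profile MDCEV with an
# outside good (NOT Apollo's eMDC; no complementarity/substitution terms).
set.seed(2023)

# Allocate `budget` hours for ONE error draw, assuming unit prices.
simulateMDCEV <- function(V, satiation, sigma, budget) {
  K    <- length(V)
  eps  <- -log(-log(runif(K + 1)))      # iid standard Gumbel draws
  psi0 <- exp(sigma * eps[1])           # outside good, deterministic utility 0
  psi  <- exp(V + sigma * eps[-1])      # baseline marginal utilities of activities

  # Kuhn-Tucker solution: add activities in decreasing order of psi while
  # psi exceeds lambda, the marginal utility of the budget.
  lambdaNum <- psi0
  lambdaDen <- budget
  lambda    <- lambdaNum / lambdaDen
  chosen    <- integer(0)
  for (k in order(psi, decreasing = TRUE)) {
    if (psi[[k]] <= lambda) break       # this and all remaining activities stay at zero
    chosen    <- c(chosen, k)
    lambdaNum <- lambdaNum + satiation[[k]] * psi[[k]]
    lambdaDen <- lambdaDen + satiation[[k]]
    lambda    <- lambdaNum / lambdaDen
  }

  x <- setNames(numeric(K), names(V))
  x[chosen] <- satiation[chosen] * (psi[chosen] / lambda - 1)
  c(outside = psi0 / lambda, x)         # allocations sum to `budget`
}

V         <- c(work = -3.4, school = -5.3, shopping = -3.7, private = -4.0, leisure = -3.3)
satiation <- c(work = 10.4, school = 10.0, shopping = 2.1, private = 4.4, leisure = 6.9)

# 1000 independent draws for one hypothetical person (rows = draws)
draws <- t(replicate(1000, simulateMDCEV(V, satiation, sigma = 0.71, budget = 24)))

simulated <- draws[1, ]         # one draw: what a simulated dataset would record
averaged  <- colMeans(draws)    # what a prediction averaged over draws reports
print(round(rbind(oneDraw = simulated, averaged = averaged), 2))
```

Individual draws can leave activities at exactly zero, while the averaged row gives every activity some time, so a dataset built from averaged predictions contains no corner solutions.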

Stephane & David
--------------------------------
Stephane Hess
www.stephanehess.me.uk
TWayne
Posts: 4
Joined: 04 Dec 2023, 06:49

Re: Any example to simulate data for EMDC model?

Post by TWayne »

Hi Stephane and David,

Thank you very much for your quick response. This makes a lot of sense. I tried to use this function to simulate/predict, and I noticed that prediction takes much longer than estimation (about 30 minutes versus about 1 minute, using the data in the sample code). I believe this is because it requires a lot of draws. So I am wondering: what is the minimum number of draws required for prediction, and how can I change it?

Best,
TW
TWayne
Posts: 4
Joined: 04 Dec 2023, 06:49

Re: Any example to simulate data for EMDC model?

Post by TWayne »

I also tried to use the simulated data to estimate the model again to see whether the true parameters can be recovered. Specifically, what I did was:

Given the sample code, I obtained the true parameters (I also fixed all the complementarity/substitution deltas to 0 to make it simpler). Then I used these parameters to do the simulation (prediction). After that, I used the simulated choices to estimate the model again to see whether I get the same set of estimates (in this step I also used the true parameters as the starting values). However, I cannot seem to get the same estimates. Did I do it correctly?

Here is my code. Thank you!


# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Clear memory and initialise
rm(list = ls())
library(apollo)
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  ="eMDC_with_budget",
  modelDescr ="Extended MDC with complementarity and substitution, with observed budget and socio-demographics",
  indivID    ="indivID",
  outputDirectory="output"
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

### Load data from within the Apollo package
database = apollo_timeUseData

### Create consumption variables for combined activities
# outside good: time spent at home and travelling
database$t_outside = rowSums(database[,c("t_a01", "t_a06", "t_a10", "t_a11", "t_a12")]) 
database$t_leisure = rowSums(database[,c("t_a07", "t_a08", "t_a09")])

# ### Randomly split dataset into estimation (70%) and validation (30%)
# set.seed(1)
# database$validation <- runif(nrow(database))>0.7
# dbVal    <- database[ database$validation,] # validation sample
# database <- database[!database$validation,] # estimation sample

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Parameters starting values c(name1=value1, name2=value2, ...)
apollo_beta = c(  sigma = 0.71,   ###Here I remove aFemale
                  # Satiation
                  gWork    = 10.425, 
                  gSchool  = 9.955, 
                  gShopping= 2.065, 
                  gPrivate = 4.419, 
                  gLeisure = 6.927, 
                  # Base utility
                  bWork     =-3.372, bSchool   =-5.250, 
                  bShopping =-3.663, bPrivate  =-3.973, 
                  bLeisure  =-3.300, bWork_FT  = 0.708, 
                  bWork_wknd=-1.600, bSchool_young= 0.948, 
                  bLeisure_wknd= 0.155, 
                  # Compl/subst
                  # dWorkScho=-0.008, dWorkShop= 0.000, 
                  # dWorkPriv= 0.000, dWorkLeis= 0.000, 
                  # dSchoShop= 0.000, dSchoPriv= 0.000, 
                  # dSchoLeis= 0.000, dShopPriv= 0.010, 
                  # dShopLeis= 0.012, dPrivLeis= 0.012
                  
                  dWorkScho=-0.000, dWorkShop= 0.000, 
                  dWorkPriv= 0.000, dWorkLeis= 0.000, 
                  dSchoShop= 0.000, dSchoPriv= 0.000, 
                  dSchoLeis= 0.000, dShopPriv= 0.000, 
                  dShopLeis= 0.000, dPrivLeis= 0.000)



### Names of fixed parameters
apollo_fixed = c('dWorkShop', 'dWorkPriv', 'dSchoShop', 
                 'dSchoPriv', 'dSchoLeis', 'dWorkLeis'   ,
                 "dWorkScho" , "dShopPriv" , "dShopLeis" , "dPrivLeis")  ### note: sigma is not in this list, so it is estimated

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){
    
  ### Initialise
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  P = list()
  
  ### Prepare Inputs
  alts  = c("work", "school", "shopping", "private", "leisure")
  nAlt = length(alts)
  ones = setNames(as.list(rep(1, nAlt)), alts)
  continuousChoice = list(work     =     t_a02/60,
                          school   =     t_a03/60,
                          shopping =     t_a04/60,
                          private  =     t_a05/60,
                          leisure  = t_leisure/60)
  utilities = list(
    work     = bWork     + bWork_FT*occ_full_time + bWork_wknd*weekend,
    school   = bSchool   + bSchool_young*(age<=30), 
    shopping = bShopping, 
    private  = bPrivate, 
    leisure  = bLeisure  + bLeisure_wknd*weekend
  )
  gamma = list(work     = gWork,    
               school   = gSchool,
               shopping = gShopping,
               private  = gPrivate,
               leisure  = gLeisure)
  delta <- c(0,                 0,         0,         0, 0,
             dWorkScho,         0,         0,         0, 0,
             dWorkShop, dSchoShop,         0,         0, 0,
             dWorkPriv, dSchoPriv, dShopPriv,         0, 0,
             dWorkLeis, dSchoLeis, dShopLeis, dPrivLeis, 0)
  delta <- matrix(delta, nrow=nAlt, ncol=nAlt, byrow=TRUE)
  emdc_settings <- list(continuousChoice = continuousChoice, 
                        avail            = ones,
                        utilityOutside   = 0, 
                        utilities        = utilities, 
                        budget           = 24,
                        sigma            = sigma, 
                        gamma            = gamma, 
                        delta            = delta, 
                        cost             = ones)
  P[["model"]] = apollo_emdc(emdc_settings, functionality)
  
  ### Comment out as necessary
  P = apollo_panelProd(P, apollo_inputs, functionality)
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION & OUTPUT                                   ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, apollo_inputs)

apollo_modelOutput(model)

apollo_saveOutput(model)


# ################################################################# #
#### PREDICTION                                                  ####
# ################################################################# #
#############################Use the model estimates to predict and see whether it is close to what is observed

model <- apollo_loadModel(apollo_control$modelName)
apollo_inputs <- apollo_validateInputs(database=database)
apollo_inputs$apollo_control$nCores <- 4
pred <- apollo_prediction(model, apollo_probabilities, apollo_inputs)


XObs <- cbind(work     =     database$t_a02/60,
              school   =     database$t_a03/60,
              shopping =     database$t_a04/60,
              private  =     database$t_a05/60,
              leisure  = database$t_leisure/60)
XPre <- pred[,4:8]

round(sqrt(colMeans((XObs - XPre)^2)),2)  # RMSE per product: 3.59 0.81 1.96 1.86 3.74
round(sqrt(mean((colSums(XObs) - colSums(XPre))^2)),2) # RMSE of the per-activity totals (summed over all rows): 372.38

###############Now use the predicted choices to estimate again

database$t_a02 = XPre$work*60
database$t_a03 = XPre$school*60
database$t_a04 = XPre$shopping*60
database$t_a05 = XPre$private*60
database$t_leisure = XPre$leisure*60

### Parameters starting values c(name1=value1, name2=value2, ...)
apollo_beta = c(  sigma = 1.944,   ###Here I remove aFemale
                  # Satiation
                  gWork    = 3.2425, 
                  gSchool  = 3.7222, 
                  gShopping= 0.3727, 
                  gPrivate = 0.6274, 
                  gLeisure = 1.5038, 
                  # Base utility
                  bWork     =-3.6871, bSchool   =-7.4054, 
                  bShopping =-4.0058, bPrivate  =-4.5592, 
                  bLeisure  =-3.5019, bWork_FT  = 1.2540, 
                  bWork_wknd=-2.9429, bSchool_young= 1.8492, 
                  bLeisure_wknd= 0.3962, 
                  # Compl/subst
                  # dWorkScho=-0.008, dWorkShop= 0.000, 
                  # dWorkPriv= 0.000, dWorkLeis= 0.000, 
                  # dSchoShop= 0.000, dSchoPriv= 0.000, 
                  # dSchoLeis= 0.000, dShopPriv= 0.010, 
                  # dShopLeis= 0.012, dPrivLeis= 0.012
                  
                  dWorkScho=-0.000, dWorkShop= 0.000, 
                  dWorkPriv= 0.000, dWorkLeis= 0.000, 
                  dSchoShop= 0.000, dSchoPriv= 0.000, 
                  dSchoLeis= 0.000, dShopPriv= 0.000, 
                  dShopLeis= 0.000, dPrivLeis= 0.000)



### Names of fixed parameters
apollo_fixed = c('dWorkShop', 'dWorkPriv', 'dSchoShop', 
                 'dSchoPriv', 'dSchoLeis', 'dWorkLeis'   ,
                 "dWorkScho" , "dShopPriv" , "dShopLeis" , "dPrivLeis")  ### note: sigma is not in this list, so it is estimated

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){
  
  ### Initialise
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  P = list()
  
  ### Prepare Inputs
  alts  = c("work", "school", "shopping", "private", "leisure")
  nAlt = length(alts)
  ones = setNames(as.list(rep(1, nAlt)), alts)
  continuousChoice = list(work     =     t_a02/60,
                          school   =     t_a03/60,
                          shopping =     t_a04/60,
                          private  =     t_a05/60,
                          leisure  = t_leisure/60)
  utilities = list(
    work     = bWork     + bWork_FT*occ_full_time + bWork_wknd*weekend,
    school   = bSchool   + bSchool_young*(age<=30), 
    shopping = bShopping, 
    private  = bPrivate, 
    leisure  = bLeisure  + bLeisure_wknd*weekend
  )
  gamma = list(work     = gWork,    
               school   = gSchool,
               shopping = gShopping,
               private  = gPrivate,
               leisure  = gLeisure)
  delta <- c(0,                 0,         0,         0, 0,
             dWorkScho,         0,         0,         0, 0,
             dWorkShop, dSchoShop,         0,         0, 0,
             dWorkPriv, dSchoPriv, dShopPriv,         0, 0,
             dWorkLeis, dSchoLeis, dShopLeis, dPrivLeis, 0)
  delta <- matrix(delta, nrow=nAlt, ncol=nAlt, byrow=TRUE)
  emdc_settings <- list(continuousChoice = continuousChoice, 
                        avail            = ones,
                        utilityOutside   = 0, 
                        utilities        = utilities, 
                        budget           = 24,
                        sigma            = sigma, 
                        gamma            = gamma, 
                        delta            = delta, 
                        cost             = ones)
  P[["model"]] = apollo_emdc(emdc_settings, functionality)
  
  ### Comment out as necessary
  P = apollo_panelProd(P, apollo_inputs, functionality)
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION & OUTPUT                                   ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, apollo_inputs)

apollo_modelOutput(model)



dpalma
Posts: 194
Joined: 24 Apr 2020, 17:54

Re: Any example to simulate data for EMDC model?

Post by dpalma »

Hi,

Sorry for the long delay in our reply.
TWayne wrote: 07 Dec 2023, 07:21 I tried to use this function to simulate/predict, and I noticed that prediction takes much longer than estimation (about 30 minutes versus about 1 minute, using the data in the sample code). I believe this is because it requires a lot of draws. So I am wondering: what is the minimum number of draws required for prediction, and how can I change it?
When you simulate, you only need one draw. However, the forecasting code might not work if you only use one draw (it assumes that it must average over draws, and averaging over a single element may lead to issues). So if you want to simulate data, I would recommend you use a very small number of draws (e.g. 2). Sadly, we have not implemented a rawPrediction option in emdc yet (only the mdcev model has it). But it is actually a good idea, and we will put it on our "to do" list.
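The idea that one draw is enough for simulation, while averaging code may misbehave with a single draw, can be illustrated in base R. The sketch assumes hypothetical draw-level predictions stored as an observations × activities × draws array (this layout is an assumption; it has not been checked against anything Apollo returns). One simulated dataset is then one slice along the draw dimension. It also shows a generic R pitfall: indexing a single draw silently drops the draw dimension unless drop = FALSE is used. Whether this is the actual issue inside Apollo's forecasting code is not known here.

```r
set.seed(1)
alts   <- c("work", "school", "shopping", "private", "leisure")
nObs   <- 10
nDraws <- 2    # a very small number of draws

# Hypothetical draw-level predictions: observations x activities x draws
raw <- array(runif(nObs * length(alts) * nDraws),
             dim = c(nObs, length(alts), nDraws),
             dimnames = list(NULL, alts, NULL))

# One simulated dataset = the first draw for every observation.
# Plain indexing drops the draw dimension: sim1 is a 2-D obs x activities matrix.
sim1 <- raw[, , 1]

# Prediction-style output: average over the draw dimension (fine for nDraws >= 2)
avg <- apply(raw, c(1, 2), mean)

# Code written for a 3-D array fails on the 2-D slice ...
err <- tryCatch({ sim1[, , 1]; "no error" },
                error = function(e) conditionMessage(e))

# ... so keep the draw dimension explicitly when only one draw is used
oneKept <- raw[, , 1, drop = FALSE]          # dim 10 x 5 x 1
avgOne  <- apply(oneKept, c(1, 2), mean)     # averaging a single draw returns that draw
```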

Best wishes
David
dpalma
Posts: 194
Joined: 24 Apr 2020, 17:54

Re: Any example to simulate data for EMDC model?

Post by dpalma »

Hi,

So about the following post:
TWayne wrote: 11 Dec 2023, 21:15 I also tried to use the simulated data to estimate the model again to see whether the true parameters can be recovered. Specifically, what I did was:

Given the sample code, I obtained the true parameters (I also fixed all the complementarity/substitution deltas to 0 to make it simpler). Then I used these parameters to do the simulation (prediction). After that, I used the simulated choices to estimate the model again to see whether I get the same set of estimates (in this step I also used the true parameters as the starting values). However, I cannot seem to get the same estimates. Did I do it correctly?

Here is my code. Thank you!
(...)
If I understand correctly, what you want to do is use the parameters defined in lines 41-65 of your code (the apollo_beta definition) to simulate data, and then estimate the model on that data to recover the parameters. If so, then you should not estimate the model in line 140 (the first call to apollo_estimate), because this will change the parameters. Instead, you should estimate it with a limit of zero iterations. You can do that as follows:


# zero iterations, so the model keeps the parameter values in apollo_beta
estimate_settings=list(maxIterations=0, estimationRoutine="bfgs")
model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, apollo_inputs, estimate_settings)

I believe that should help with the recovery of the parameters.
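The simulate-then-re-estimate check can be tried end to end on a model with a closed-form likelihood. The sketch below uses a simple Tobit (censored at zero) purely as a stand-in; it is not the eMDC model and does not use Apollo. Parameters are fixed, data are simulated with one error draw per observation (so corner solutions are kept, unlike averaged predictions, which as noted earlier in the thread have none), and the model is re-estimated by maximum likelihood with base R's optim. The true values and sample size are arbitrary choices for illustration.

```r
set.seed(123)
n         <- 5000
trueBeta  <- c(b0 = -0.5, b1 = 1.0)
trueSigma <- 1.0

# 1. Simulate data from known parameters (one error draw per observation)
x     <- rnorm(n)
yStar <- trueBeta[["b0"]] + trueBeta[["b1"]] * x + rnorm(n, sd = trueSigma)
y     <- pmax(yStar, 0)   # corner solutions: negative latent values censored at zero

# 2. Tobit negative log-likelihood; sigma is estimated on the log scale
negLL <- function(par) {
  xb    <- par[1] + par[2] * x
  sigma <- exp(par[3])
  ll <- ifelse(y <= 0,
               pnorm(-xb / sigma, log.p = TRUE),                   # P(y* <= 0)
               dnorm((y - xb) / sigma, log = TRUE) - log(sigma))   # density of y > 0
  -sum(ll)
}

# 3. Re-estimate, starting from the true values
start <- c(trueBeta, logSigma = log(trueSigma))
fit   <- optim(start, negLL, method = "BFGS")
est   <- c(fit$par[1:2], sigma = exp(fit$par[[3]]))

# 4. Compare estimates with the values used to simulate
recovery <- rbind(true = c(trueBeta, sigma = trueSigma), estimate = est)
print(round(recovery, 3))
```

A single simulated dataset only recovers the parameters up to sampling error, so the estimates will be close to, but not identical to, the true values. A fuller check repeats steps 1 to 3 over many seeds and looks at the mean and spread of the estimates.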

Best wishes
David