How to integrate out the prediction error of an imputed variable?

t.busschots
Posts: 1
Joined: 07 Oct 2025, 12:40

Post by t.busschots »

First of all, many thanks for providing such a great open-source tool to the research community and providing detailed documentation and assistance.

Question in short

How can I use the apollo package to estimate a mixed logit model in which, in addition to inter-observation draws for the random parameters, I integrate over intra-observation draws of an imputed independent variable? Does anyone have or know of example code for this?

Also, can the function be adapted so that I supply a set of pre-computed values of the imputed independent variable, instead of relying on values generated from the draws created within the Apollo functions? In other words, I do not want to use the apollo_draws() and apollo_randCoeff() functions, but instead use apollo_avgIntraDraws() to average over a set of pre-computed values resulting from draws of an imputed variable.

With more context

I am using the apollo package to estimate a discrete choice labor supply model where households choose the number of days per week to work, following Van Soest (1995). In each discrete choice, disposable income is calculated by multiplying hourly wage by the hours worked in that option and passing the resulting gross income through a detailed tax-benefit calculator.

For the unemployed, hourly wage has to be imputed. As is standard in the discrete choice structural labor supply literature, I want to integrate out the wage imputation error by including intra-person random draws from the wage imputation and averaging out over them in the maximum likelihood function. It is not possible to simultaneously estimate wages and the discrete choice model, as I first need to pass simulated gross income based on (imputed) wage through the tax-benefit calculator.

This raises two issues that, to my knowledge, have not been discussed in the manual. In this post (https://www.apollochoicemodelling.com/f ... mpute#p842), the issue of multiple imputation is discussed. However, I believe that simply stacking copied observations with added disturbances to the imputed variable yields biased results, as the likelihood contributions of the observation-draw combinations should first be averaged at the observation level, before taking the log and summing across observations (i.e., the mean of logs is not the same as the log of means).
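
In symbols, the distinction can be sketched as follows. The notation is introduced here for illustration only: P_n(w) is the probability of the observed choice of observation n given wage w, and w_n^{(r)}, r = 1, ..., R, are the wage draws for that observation.

```latex
\begin{align*}
LL_{\text{sim}}     &= \sum_{n=1}^{N} \log\left( \frac{1}{R} \sum_{r=1}^{R} P_n\big(w_n^{(r)}\big) \right), \\
LL_{\text{stacked}} &= \sum_{n=1}^{N} \sum_{r=1}^{R} \log P_n\big(w_n^{(r)}\big).
\end{align*}
```

By Jensen's inequality, the log of a mean is at least the mean of the logs, so LL_stacked / R <= LL_sim, with equality only when the probabilities do not vary across draws. The two objectives are therefore generally maximised at different parameter values.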

First, the Apollo functions are designed to generate intra- and inter-observation draws for random parameters. I am struggling to understand how to adapt the functions to a setting where it is the covariate that has a distribution. It also appears that the error "intra-person draws are used without a panel structure. This is not allowed!" would need to be overruled somehow; when taking draws of an imputed variable, intra-person draws with one observation per person are possible.

Second, I am not sure how to replace the interaction between apollo_draws(), apollo_randCoeff() and apollo_mnl() in a way that allows me to use a pre-generated set of covariate values to perform apollo_avgIntraDraws() over. Section 6.1.3 discusses how to use pre-computed draws, but not how one can use a set of pre-computed values that result from those draws.
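
As an illustration of the averaging described above, here is a minimal base-R sketch that works outside Apollo. It does not show how Apollo would handle pre-computed values. All names (incomeDraws, choice, b_inc) and all numbers are hypothetical, and the simulated incomes merely stand in for the output of a tax-benefit calculator.

```r
# Hypothetical setup: N persons, J alternatives, R pre-computed draws
set.seed(1)
N <- 4; J <- 3; R <- 50

# incomeDraws[n, j, r]: income of person n in alternative j under draw r
# (simulated here; in practice these values would be computed beforehand)
incomeDraws <- array(rnorm(N * J * R, mean = 2, sd = 0.5), dim = c(N, J, R))
choice <- c(1, 3, 2, 3)   # chosen alternative of each person
b_inc  <- 0.8             # illustrative parameter value

# MNL probability of the chosen alternative, per person and draw (N x R)
probChosen <- sapply(seq_len(R), function(r) {
  V  <- b_inc * incomeDraws[, , r]      # N x J utilities under draw r
  eV <- exp(V - apply(V, 1, max))       # subtract row maximum for stability
  eV[cbind(seq_len(N), choice)] / rowSums(eV)
})

# Average across draws first, then take logs (simulated log-likelihood)
LL_sim <- sum(log(rowMeans(probChosen)))

# Take logs first, then average across draws (stacked data, rescaled by 1/R)
LL_stacked <- sum(rowMeans(log(probChosen)))
```

In this sketch LL_sim is never smaller than LL_stacked, which is the "log of means versus mean of logs" point made earlier.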

I am hoping someone has faced the same situation and can help me get started, or has advice on how to implement this in Apollo.
dpalma
Posts: 227
Joined: 24 Apr 2020, 17:54

Re: How to integrate out the prediction error of an imputed variable?

Post by dpalma »

Hi,

I am not sure I got all the details right of what you want to do, but let’s start with the following case and take it from there.

Let’s say your choice model has 5 alternatives: working 1, 2, 3, 4, or 5 days a week. And you only have two explanatory variables: income_j (the income that the respondent would get under each alternative) and age (the respondent’s age). Therefore, your “deterministic” utilities look as follows:

V_1 = b_inc*income_1
V_j = asc_j + b_inc*income_j + b_age*age, for j > 1

However, the explanatory variable “income” is imputed, so you want to represent it by something like:

income = expIncome + eta_inc

Where “expIncome” is the expected income, calculated using another model (e.g. a linear regression). However, from the perspective of the choice model, it is an exogenous non-random variable, so you add the random error component “eta_inc” to account for the estimation error. Note that eta_inc is an intra-respondent error term, meaning it varies from one observation to the next.

On top of this, you want to make b_age follow a random distribution, meaning b_age should be a random coefficient. So you make b_age = mu_age + sigma_age*eta_age, where eta_age is an inter-respondent random error term. This means that eta_age varies from respondent to respondent, but it is the same for all observations of the same respondent.
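
Written out, the likelihood contribution of respondent n implied by this setup can be sketched as follows. The notation is not from the original post: t indexes the observations of respondent n, i_nt is the chosen alternative, phi is the standard normal density, and the income error is scaled by s_inc as in the code.

```latex
\[
L_n = \int \left[ \prod_{t=1}^{T_n} \int
      P_{nt}\big( i_{nt} \,\big|\, \mathrm{expIncome}_{nt} + s_{inc}\,\eta_{inc},\;
                  \mu_{age} + \sigma_{age}\,\eta_{age} \big)\,
      \phi(\eta_{inc})\, d\eta_{inc} \right]
      \phi(\eta_{age})\, d\eta_{age}
\]
```

The inner integral is approximated by averaging across intra-individual draws, the product runs over the observations of the same respondent, and the outer integral is approximated by averaging across inter-individual draws.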

Then, the code would look very similar to the one in example “MMNL_wtp_space_inter_intra”, but with the intra draws inside explanatory variable “income”, as follows:

Code:

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Initialise
rm(list = ls())
library(apollo)
apollo_setWorkDir()
apollo_initialise()

### Set core controls
apollo_control = list(
  modelDescr      = "Mixed logit with random explanatory variable",
  indivID         = "ID",  
  nCores          = 1,
  analyticGrad    = TRUE # Needs to be manually set when using inter-intra
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

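### Loading data from package
### (placeholder: replace with your own data, which needs the columns
###  ID, choice, age and the expected income under each alternative)
database = apollo_swissRouteChoiceData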

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(asc_2 = 0, asc_3 = 0, asc_4 = 0, asc_5 = 0, 
                b_inc = 0, s_inc = 0, 
                m_age = 0, s_age = 0)

### Name of parameters to be kept fixed at their starting values
apollo_fixed = c()

# ################################################################# #
#### DEFINE RANDOM COMPONENTS                                    ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
  interDrawsType = "halton",
  interNDraws    = 100,
  interUnifDraws = c(),
  interNormDraws = c("eta_age"),
  intraDrawsType = "mlhs",
  intraNDraws    = 100,
  intraUnifDraws = c(),
  intraNormDraws = c("eta_inc")
)

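### Create random components
### (income_1 ... income_5 are random explanatory variables, not coefficients;
###  expIncome_1 ... expIncome_5 are assumed column names in the database)
apollo_randCoeff = function(apollo_beta, apollo_inputs){
  randcoeff = list()
  # Imputed income per alternative, sharing one intra-individual error term
  randcoeff[["income_1"]] = expIncome_1 + s_inc*eta_inc
  randcoeff[["income_2"]] = expIncome_2 + s_inc*eta_inc
  randcoeff[["income_3"]] = expIncome_3 + s_inc*eta_inc
  randcoeff[["income_4"]] = expIncome_4 + s_inc*eta_inc
  randcoeff[["income_5"]] = expIncome_5 + s_inc*eta_inc
  # Random coefficient with an inter-individual error term
  randcoeff[["b_age"]]    = m_age + s_age*eta_age
  return(randcoeff)
}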

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities = function(apollo_beta, apollo_inputs, 
                                functionality="estimate"){
  
  ### Initialise
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  P = list()
  
  ### MNL settings
  V = list(
    alt1 =         b_inc*income_1, 
    alt2 = asc_2 + b_inc*income_2 + b_age*age, 
    alt3 = asc_3 + b_inc*income_3 + b_age*age, 
    alt4 = asc_4 + b_inc*income_4 + b_age*age, 
    alt5 = asc_5 + b_inc*income_5 + b_age*age)
  mnl_settings = list(
    alternatives  = c(alt1=1, alt2=2, alt3=3, alt4=4, alt5=5), 
    choiceVar     = choice,
    utilities     = V
  )
  
  ### Compute probabilities using MNL model
  P[["model"]] = apollo_mnl(mnl_settings, functionality)
  
  ### Average across intra-individual draws
  P = apollo_avgIntraDraws(P, apollo_inputs, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Average across inter-individual draws
  P = apollo_avgInterDraws(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION AND OUTPUT                                 ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, apollo_inputs)

### Output to screen
apollo_modelOutput(model)

### Output to file
apollo_saveOutput(model)

In the code above, you may want to fix the value of s_inc if you already know it from the imputation model.
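
A minimal sketch of how that could look, assuming the imputation model is a linear regression. The data frame impData, its columns and all numbers are hypothetical. Using the residual standard deviation for s_inc ignores the uncertainty in the regression coefficients, which is a simplification.

```r
# Hypothetical imputation sample in which income is observed
set.seed(42)
impData <- data.frame(educ = rnorm(200), exper = rnorm(200))
impData$income <- 1 + 0.5 * impData$educ + 0.3 * impData$exper +
  rnorm(200, sd = 0.4)

# Imputation model: plain linear regression
fit <- lm(income ~ educ + exper, data = impData)

# Residual standard deviation of the imputation model
s_inc_hat <- sigma(fit)

# Start s_inc at that value and keep it fixed during estimation
apollo_beta <- c(asc_2 = 0, asc_3 = 0, asc_4 = 0, asc_5 = 0,
                 b_inc = 0, s_inc = s_inc_hat,
                 m_age = 0, s_age = 0)
apollo_fixed <- c("s_inc")
```

The expected income itself would then come from predict(fit, newdata = ...) applied to the respondents whose income is imputed.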
I hope this helps.

Best wishes,
David