correlation between latent variables in hybrid model

Post by elyh1992 »

I am using Apollo to estimate a hybrid choice model with two latent variables. The example in the manual includes only one latent variable, so I adjusted the code to accommodate two, assuming a standard normal distribution for each error term. Does this approach take into account the correlation between the latent variables? If not, how can I incorporate a bivariate normal distribution for the error terms so that the latent variables are allowed to be correlated?
Additionally, I would like to know if it's acceptable for some of the threshold parameters to turn out insignificant. Furthermore, could you recommend a source or reference for writing the formulas in the methodology section for a hybrid choice model with more than one latent variable?

Thanks in advance for your assistance.


### Clear memory
rm(list = ls())
start_time <- Sys.time()
#setwd('C:/Users/ehajhashemi/Desktop/EV public data')
### Load Apollo library
library(apollo)
##x = apollo_drugChoiceData
##Y = apollo_modeChoiceData

### Initialise code
apollo_initialise()

#predictions_base = apollo_prediction(model, apollo_probabilities, apollo_inputs, prediction_settings = list(runs=30))

### Set core controls
apollo_control = list(
  modelName = "Hybrid",
  modelDescr = "Hybrid choice model",
  indivID = "ID",
  mixing = TRUE,
  nCores = 18,
  outputDirectory = "output_solar_EV_new_data_test_17"
)


### Loading data from a file

database = read.csv("data_new.csv",header=TRUE)

# ################################################################# #
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(asc_solar = 0,
asc_EV = 0,
asc_PV_and_EV = 0,

b_gender_EV = 0,
b_gender_PV_and_EV = 0,

b_house_type_solar = 0,
b_house_type_PV_and_EV = 0,

b_home_status_solar = 0,
b_home_status_PV_and_EV = 0,

b_HH_size_solar = 0,
b_HH_size_PV_and_EV = 0,

b_empoyment_retired_solar = 0,

b_more_working_from_home_EV = 0,

b_smart_meter_PV_and_EV = 0,

b_energy_managemnet_solar = 0,
b_energy_managemnet_PV_and_EV = 0,

b_have_pool_solar = 0,

b_have_pool_PV_and_EV = 0,

b_have_freezers_solar = 0,

b_location_1_EV = 0,

lambda_env_EV = 0,
lambda_env_PV_and_EV = 0,

lambda_tech_solar = 0,
lambda_tech_EV = 0,
lambda_tech_PV_and_EV = 0,

gamma_age_1_env = 0,
gamma_age_2_env = 0,
gamma_age_3_env = 0,

gamma_age_1_tech = 0,
gamma_age_2_tech = 0,

gamma_income_3_env = 0,

gamma_income_1_tech = 0,
gamma_income_2_tech = 0,
gamma_income_3_tech = 0,

gamma_gender_env = 0,
gamma_gender_tech = 0,

gamma_HH_size_env = 0,
gamma_HH_size_tech = 0,

zeta_env_1 = 1,
zeta_env_2 = 1,
zeta_env_3 = 1,
zeta_env_4 = 1,
zeta_env_5 = 1,

zeta_tech_1 = 1,
zeta_tech_2 = 1,
zeta_tech_3 = 1,

tau_env_1_1 = -2,
tau_env_1_2 = -1,
tau_env_1_3 = 1,
tau_env_1_4 = 2,

tau_env_2_1 =-2,
tau_env_2_2 =-1,
tau_env_2_3 = 1,
tau_env_2_4 = 2,

tau_env_3_1 =-2,
tau_env_3_2 =-1,
tau_env_3_3 = 1,
tau_env_3_4 = 2,

tau_env_4_1 =-2,
tau_env_4_2 =-1,
tau_env_4_3 = 1,
tau_env_4_4 = 2,

tau_env_5_1 =-2,
tau_env_5_2 =-1,
tau_env_5_3 = 1,
tau_env_5_4 = 2,

tau_tech_1_1 =-2,
tau_tech_1_2 =-1,
tau_tech_1_3 = 1,
tau_tech_1_4 = 2,

tau_tech_2_1 =-2,
tau_tech_2_2 =-1,
tau_tech_2_3 = 1,
tau_tech_2_4 = 2,

tau_tech_3_1 =-2,
tau_tech_3_2 =-1,
tau_tech_3_3 = 1,
tau_tech_3_4 = 2)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c()

# ################################################################# #
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(


### Create random parameters
apollo_randCoeff=function(apollo_beta, apollo_inputs){
randcoeff = list()

randcoeff[['LV_env']] = (gamma_age_1_env * age_18_34_l + gamma_age_2_env * age_35_54_l + gamma_age_3_env * age_55_74_l +
gamma_income_3_env * X120.000.or.more_l +gamma_gender_env * gender_l +
gamma_HH_size_env * HH_size_continuous_l +

randcoeff[['LV_tech']] = ( gamma_age_1_tech * age_18_34_l_t + gamma_age_2_tech * age_35_54_l_t +
#gamma_age_3_tech * age_55_74 +
gamma_income_1_tech * + gamma_income_2_tech * +
gamma_income_3_tech * X120.000.or.more_l_t + gamma_gender_tech * gender_l_t +
gamma_HH_size_tech * HH_size_continuous_l_t + gamma_work_status_1_tech * full_time_l_t +



# ################################################################# #
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))

### Create list of probabilities P
P = list()

## Likelihood of indicators
ol_settings1 = list(outcomeOrdered = env_1,
V = zeta_env_1*LV_env,
tau = list(tau_env_1_1, tau_env_1_2, tau_env_1_3, tau_env_1_4),
componentName = "indic_env_1")

ol_settings2 = list(outcomeOrdered = env_2,
V = zeta_env_2*LV_env,
tau = list(tau_env_2_1, tau_env_2_2, tau_env_2_3, tau_env_2_4),
componentName = "indic_env_2")

ol_settings3 = list(outcomeOrdered = env_3,
V = zeta_env_3*LV_env,
tau = list(tau_env_3_1, tau_env_3_2, tau_env_3_3, tau_env_3_4),
componentName = "indic_env_3")

ol_settings4 = list(outcomeOrdered = env_4,
V = zeta_env_4*LV_env,
tau = list(tau_env_4_1, tau_env_4_2, tau_env_4_3, tau_env_4_4),
componentName = "indic_env_4")

ol_settings5 = list(outcomeOrdered = env_5,
V = zeta_env_5*LV_env,
tau = list(tau_env_5_1, tau_env_5_2, tau_env_5_3, tau_env_5_4),
componentName = "indic_env_5")

ol_settings6 = list(outcomeOrdered = U12r1,
V = zeta_tech_1*LV_tech,
tau = list(tau_tech_1_1, tau_tech_1_2, tau_tech_1_3, tau_tech_1_4),
componentName = "indic_tech_1")

ol_settings7 = list(outcomeOrdered = U12r2,
V = zeta_tech_2*LV_tech,
tau = list(tau_tech_2_1, tau_tech_2_2, tau_tech_2_3, tau_tech_2_4),
componentName = "indic_tech_2")

ol_settings8 = list(outcomeOrdered = U12r3,
V = zeta_tech_3*LV_tech,
tau = list(tau_tech_3_1, tau_tech_3_2, tau_tech_3_3, tau_tech_3_4),
componentName = "indic_tech_3")

P[["indic_env_1"]] = apollo_ol(ol_settings1, functionality)
P[["indic_env_2"]] = apollo_ol(ol_settings2, functionality)
P[["indic_env_3"]] = apollo_ol(ol_settings3, functionality)
P[["indic_env_4"]] = apollo_ol(ol_settings4, functionality)
P[["indic_env_5"]] = apollo_ol(ol_settings5, functionality)
P[["indic_tech_1"]] = apollo_ol(ol_settings6, functionality)
P[["indic_tech_2"]] = apollo_ol(ol_settings7, functionality)
P[["indic_tech_3"]] = apollo_ol(ol_settings8, functionality)

### Likelihood of choices
### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
V = list()

V[["adopting_none"]] = 0
V[["only_adopt_solar"]] = (asc_solar +
#b_age_solar_1 * age_18_34 + b_age_solar_2 * age_35_54 + b_age_solar_3 * age_55_74 +
b_house_type_solar * house_type + b_home_status_solar * home_status +
b_HH_size_solar * HH_size_continuous +
b_empoyment_retired_solar * Retired +
b_energy_managemnet_solar * energy_management_system +
b_have_pool_solar * swimming_pool +
b_have_freezers_solar * freezers +
#b_have_dishwasher_solar * Dishwasher +
lambda_tech_solar * LV_tech)

V[["only_adopt_EV"]] = (asc_EV + b_gender_EV * gender +
b_income_EV_3 * X120.000.or.more +
b_more_working_from_home_EV * more_working_home +
b_location_1_EV * Inner.Metro +
lambda_env_EV * LV_env + lambda_tech_EV * LV_tech)

V[["adopt_PV_and_EV"]] = (asc_PV_and_EV + b_gender_PV_and_EV * gender +
b_house_type_PV_and_EV * house_type + b_home_status_PV_and_EV * home_status +
b_HH_size_PV_and_EV * HH_size_continuous +
b_smart_meter_PV_and_EV * smart_meter + b_energy_managemnet_PV_and_EV * energy_management_system +
b_have_pool_PV_and_EV * swimming_pool +
lambda_env_PV_and_EV * LV_env + lambda_tech_PV_and_EV * LV_tech)

### Define settings for MNL model component
mnl_settings = list(
alternatives = c(adopting_none=1, only_adopt_solar=2, only_adopt_EV=3, adopt_PV_and_EV=4),
avail = list(adopting_none=1, only_adopt_solar=1, only_adopt_EV=1, adopt_PV_and_EV=1),
choiceVar = EV_solar_decision_coded_new,
utilities = V,
componentName = "choice"
)

### Compute probabilities for MNL model component
P[["choice"]] = apollo_mnl(mnl_settings, functionality)

### Likelihood of the whole model
P = apollo_combineModels(P, apollo_inputs, functionality)

### Take product across observation for same individual
###P = apollo_panelProd(P, apollo_inputs, functionality)

### Average across inter-individual draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}

# ################################################################# #
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

# ################################################################# #
# ################################################################# #

# ----------------------------------------------------------------- #
# ----------------------------------------------------------------- #


# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO FILE, using model name) ----
# ----------------------------------------------------------------- #


# ################################################################# #
# ################################################################# #

### Print outputs of additional diagnostics to new output file (remember to close file writing when complete)

# ----------------------------------------------------------------- #
# ----------------------------------------------------------------- #
# conditionals <- apollo_conditionals(model,apollo_probabilities,apollo_inputs)
# summary(conditionals)
# unconditionals <-
# apollo_unconditionals(model,apollo_probabilities,apollo_inputs)
# mean(unconditionals[[1]]) sd(unconditionals[[1]])

# ----------------------------------------------------------------- #
#---- switch off writing to file ----
# ----------------------------------------------------------------- #

end_time <- Sys.time()
end_time - start_time
apollo_modelOutput(model, modelOutput_settings=list(printPVal = 2))

Re: correlation between latent variables in hybrid model

Post by stephanehess »


Your specification does not take into account correlation between the LVs: you have specified two sets of independent draws, and each set is used in only one LV. You could incorporate correlation by specifying the model along these lines:

randcoeff[['LV_env']] = (... + eta_env)

randcoeff[['LV_tech']] = ( ... + eta_tech + sigma * eta_env)

You then need to consider whether an additional normalisation is required. I suggest you look at the references cited in the manual for ICLV normalisation, and also for how to write the equations.
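
A minimal sketch of how this structure might look in the Apollo code, assuming the draws are named eta_env and eta_tech as in the snippet above. The Halton draw type, the number of draws (500), the parameter name sigma_env_tech and the use of a single covariate per latent variable are illustrative assumptions, not taken from the original post; sigma_env_tech would also need a starting value in apollo_beta.

```r
### Sketch only: draw settings and structural equations with correlated LVs.
### Draw type, number of draws and sigma_env_tech are assumptions.
apollo_draws = list(
  interDrawsType = "halton",
  interNDraws    = 500,
  interUnifDraws = c(),
  interNormDraws = c("eta_env", "eta_tech"),
  intraDrawsType = "halton",
  intraNDraws    = 0,
  intraUnifDraws = c(),
  intraNormDraws = c()
)

apollo_randCoeff = function(apollo_beta, apollo_inputs){
  randcoeff = list()
  # eta_env enters both latent variables, which induces the correlation
  randcoeff[["LV_env"]]  = gamma_gender_env  * gender_l   + eta_env
  randcoeff[["LV_tech"]] = gamma_gender_tech * gender_l_t + eta_tech + sigma_env_tech * eta_env
  return(randcoeff)
}
```

With this construction the random part of LV_tech has variance 1 + sigma_env_tech^2 rather than 1, which is one reason the normalisation question arises.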

The thresholds simply replicate the distribution of answers in your data. T-tests against 0 have little meaning here, and if some thresholds are not different from each other, then that just reflects what is happening in your data.
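
As a small illustration of why the thresholds mirror the answer distribution: in an ordered logit with the latent index at zero, each threshold is the logistic quantile of the cumulative share of answers up to that level. The response shares below are made up for the example and do not come from the data discussed here.

```r
# Hypothetical shares of answers on a 5-point indicator (made-up numbers)
shares = c(0.10, 0.20, 0.40, 0.20, 0.10)

# With V = 0, P(y <= k) = plogis(tau_k), so tau_k = qlogis(cumulative share)
cum_shares = cumsum(shares)[1:4]
tau = qlogis(cum_shares)

# Recover the shares from the thresholds
implied = diff(c(0, plogis(tau), 1))

print(round(tau, 3))
print(round(implied, 2))
```

A threshold that happens to sit near zero is therefore not a sign of a problem; it only says that the cumulative share at that level is close to one half.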

Stephane Hess

Re: correlation between latent variables in hybrid model

Post by elyh1992 »

Hi Stephane,

Thank you for your response. I did what you suggested: sigma is estimated at 0.28 and is significant, so I guess I should report that my latent variables are positively correlated. However, I'm unsure whether normalization is necessary and how to proceed with it. Could you please clarify and specify the reference you mentioned?
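
To translate the estimate into a correlation, a short calculation may help. It assumes eta_env and eta_tech are independent standard normal draws and that sigma multiplies eta_env inside LV_tech, as in the suggestion above; the result refers to the random parts of the two latent variables only.

```r
sigma = 0.28  # estimate reported in the post

# Random parts: e_env = eta_env, e_tech = eta_tech + sigma * eta_env
# Cov(e_env, e_tech) = sigma, Var(e_env) = 1, Var(e_tech) = 1 + sigma^2
implied_cor = sigma / sqrt(1 + sigma^2)

# Simulation check with independent standard normal draws
set.seed(1)
eta_env  = rnorm(1e5)
eta_tech = rnorm(1e5)
sim_cor  = cor(eta_env, eta_tech + sigma * eta_env)

print(round(implied_cor, 3))
print(round(sim_cor, 3))
```

Under these assumptions sigma = 0.28 corresponds to a correlation of roughly 0.27, and the random part of LV_tech has variance 1 + 0.28^2 rather than 1.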

I also have one other question. I got this comment from a reviewer: "How do you account for the correlation between the three alternatives? Adopt PV and EV would share unobservables with both PV Only and EV Only. Perhaps the latent variable accounts for it since the reduced form has error terms shared between the different alternatives, but do you mention this? Do you think this form accurately handles the correlation?"

I shared the entire code of my model earlier. Does this code account for the correlation between alternatives? When I was searching the forum I found this answer from you about calculating the correlation between alternatives: "Keep using an MNL kernel for the choice part, but add error components to the utility of each alternative that are correlated across alternatives. This would be analogous to what is done in example 15, but instead of making random coefficients correlated across alternatives, it would be an additional error term (different for each alternative, as discussed by Daly & Hess)".
However, I'm uncertain about how to implement this in my code. Could you advise if the following is correct?
I added the following parts to my code:


V[["adopting_none"]] = 0
V[["only_adopt_solar"]] = (asc_solar + .... + lambda_tech_solar * LV_tech + sigma_1 * eta_PV )
V[["only_adopt_EV"]] = (asc_EV + ... + lambda_tech_EV * LV_tech + sigma_2_1 * eta_PV + sigma_2 * eta_EV )
V[["adopt_PV_and_EV"]] = (asc_PV_and_EV + ...+ lambda_tech_PV_and_EV * LV_tech + sigma_3_1 * eta_PV + sigma_3_2 * eta_EV + sigma_3 * eta_PV_EV)

When I checked the results, only sigma_3_1 is significant. Does this mean that only V[["only_adopt_solar"]] and V[["adopt_PV_and_EV"]] are correlated and their correlation is (sigma_3_1)? Should I remove the insignificant ones from my model?
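
One way to see what these parameters imply is to write the loadings as a lower-triangular matrix and compute the covariance of the added error terms. The sketch assumes eta_PV, eta_EV and eta_PV_EV are independent standard normal draws; the numeric values are hypothetical and are not estimates from this model.

```r
# Hypothetical values for the loadings (not estimates from the post)
sigma_1 = 0.8; sigma_2_1 = 0.3; sigma_2 = 0.6
sigma_3_1 = 0.5; sigma_3_2 = 0.2; sigma_3 = 0.4

# Rows: only_adopt_solar, only_adopt_EV, adopt_PV_and_EV
# Columns: eta_PV, eta_EV, eta_PV_EV
L = matrix(c(sigma_1,   0,         0,
             sigma_2_1, sigma_2,   0,
             sigma_3_1, sigma_3_2, sigma_3),
           nrow = 3, byrow = TRUE)
alts = c("only_adopt_solar", "only_adopt_EV", "adopt_PV_and_EV")

# Covariance and correlation of the error components across alternatives
Omega = L %*% t(L)
dimnames(Omega) = list(alts, alts)

print(round(Omega, 3))
print(round(cov2cor(Omega), 3))
```

Under this structure the covariance between only_adopt_solar and adopt_PV_and_EV is sigma_1 * sigma_3_1, so it depends on both parameters rather than on sigma_3_1 alone. These are the covariances of the added error components only; the latent variables entering several utilities contribute further shared variation.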
Thanks a lot for your assistance in advance.
Last edited by elyh1992 on 08 Mar 2024, 07:14, edited 1 time in total.

Re: correlation between latent variables in hybrid model

Post by stephanehess »


Yes, you're finding correlation. If your model estimates fine and gives you a covariance matrix, then you likely do not face an identification issue.

In relation to correlation between the alternatives, I would recommend that you consider a GEV kernel such as Nested Logit instead of MNL inside your hybrid model.
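
A rough sketch of what the settings for a Nested Logit kernel might look like, using apollo_nl in place of apollo_mnl. The nesting structure (the three adoption alternatives in one nest), the nest name adopt and the parameter lambda_adopt are assumptions for illustration; the utilities and the choice are toy stand-ins so that the fragment runs on its own.

```r
# Toy stand-ins so the fragment is self-contained (not from the post)
V = list(adopting_none = 0, only_adopt_solar = 0.2,
         only_adopt_EV = -0.1, adopt_PV_and_EV = 0.3)
lambda_adopt = 0.7   # hypothetical nesting parameter, would go in apollo_beta
choice_toy   = 2     # stand-in for EV_solar_decision_coded_new

# Nests: the root plus one hypothetical nest for the adoption alternatives
nlNests = list(root = 1, adopt = lambda_adopt)

# Tree structure: each nest lists its children (alternatives or other nests)
nlStructure = list()
nlStructure[["root"]]  = c("adopting_none", "adopt")
nlStructure[["adopt"]] = c("only_adopt_solar", "only_adopt_EV", "adopt_PV_and_EV")

nl_settings = list(
  alternatives  = c(adopting_none=1, only_adopt_solar=2, only_adopt_EV=3, adopt_PV_and_EV=4),
  avail         = list(adopting_none=1, only_adopt_solar=1, only_adopt_EV=1, adopt_PV_and_EV=1),
  choiceVar     = choice_toy,
  utilities     = V,
  nlNests       = nlNests,
  nlStructure   = nlStructure,
  componentName = "choice"
)

# Inside apollo_probabilities this would replace the apollo_mnl call:
# P[["choice"]] = apollo_nl(nl_settings, functionality)
```

A single nest treats the three adoption alternatives symmetrically. If the combined alternative should share unobservables separately with each single-technology alternative, a cross-nested structure (Apollo provides apollo_cnl) may be worth considering instead.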

Stephane Hess