
ICLV model questions

Posted: 03 May 2020, 22:04
by metrokim217
Dear Sir or Madam,

Hello! I hope this message finds you well. I am a current Apollo user interested in estimating an ICLV model on my commuting mode choice data. I have successfully specified and run the model, but I have some questions about the results:

1. The attached image file shows the results of the latent variable part of my ICLV model. My question is: is there any way to "standardize" these estimates in a similar way to what the lavaan R package does? (For "pro-car attitudes", for example, I would like to fix the factor loading of item 1 to 1.0 and scale the other loadings accordingly.) Related to this, I am also wondering why I obtain different results (e.g. factor loadings) from the ICLV model and a traditional CFA.

2. I am also wondering whether, for the ICLV model, the Apollo package provides any way to calculate other model fit indices, such as McFadden's R-squared.

Thanks in advance for your help.

Best,
Junghwan

*** The code attached below includes the apollo_randCoeff and apollo_probabilities functions.
========================


### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
  randcoeff = list()

  randcoeff[["PROCAR"]] = CAR_gamma_female*female + CAR_gamma_inc*IncLOG + CAR_gamma_child*young_chil + CAR_gamma_age*age + CAR_gamma_nonwhite*nonwhite + CAR_gamma_vehphh*vehphh + CAR_gamma_parking*PARKING + PROCAR_eta
  randcoeff[["ENVCON"]] = ENV_gamma_female*female + ENV_gamma_inc*IncLOG + ENV_gamma_child*young_chil + ENV_gamma_age*age + ENV_gamma_nonwhite*nonwhite + ENV_gamma_vehphh*vehphh + ENV_gamma_parking*PARKING + ENVCON_eta
  randcoeff[["PROBUS"]] = BUS_gamma_female*female + BUS_gamma_inc*IncLOG + BUS_gamma_child*young_chil + BUS_gamma_age*age + BUS_gamma_nonwhite*nonwhite + BUS_gamma_vehphh*vehphh + BUS_gamma_parking*PARKING + PROBUS_eta

  return(randcoeff)
}

# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #
apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #
apollo_probabilities = function(apollo_beta, apollo_inputs, functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))

  ### Create list of probabilities P
  P = list()

  ol_settings1  = list(outcomeOrdered = ITEM6,
                       V    = CAR_zeta_i6*PROCAR,
                       tau  = c(CAR_tau_i6_1, CAR_tau_i6_2, CAR_tau_i6_3, CAR_tau_i6_4),
                       rows = (task==1))
  ol_settings2  = list(outcomeOrdered = ITEM17,
                       V    = CAR_zeta_i17*PROCAR,
                       tau  = c(CAR_tau_i17_1, CAR_tau_i17_2, CAR_tau_i17_3, CAR_tau_i17_4),
                       rows = (task==1))
  ol_settings3  = list(outcomeOrdered = ITEM21,
                       V    = CAR_zeta_i21*PROCAR,
                       tau  = c(CAR_tau_i21_1, CAR_tau_i21_2, CAR_tau_i21_3, CAR_tau_i21_4),
                       rows = (task==1))
  ol_settings4  = list(outcomeOrdered = ITEM25,
                       V    = CAR_zeta_i25*PROCAR,
                       tau  = c(CAR_tau_i25_1, CAR_tau_i25_2, CAR_tau_i25_3, CAR_tau_i25_4),
                       rows = (task==1))
  ol_settings5  = list(outcomeOrdered = ITEM2,
                       V    = ENV_zeta_i2*ENVCON,
                       tau  = c(ENV_tau_i2_1, ENV_tau_i2_2, ENV_tau_i2_3, ENV_tau_i2_4),
                       rows = (task==1))
  ol_settings6  = list(outcomeOrdered = ITEM9,
                       V    = ENV_zeta_i9*ENVCON,
                       tau  = c(ENV_tau_i9_1, ENV_tau_i9_2, ENV_tau_i9_3, ENV_tau_i9_4),
                       rows = (task==1))
  ol_settings7  = list(outcomeOrdered = ITEM20,
                       V    = ENV_zeta_i20*ENVCON,
                       tau  = c(ENV_tau_i20_1, ENV_tau_i20_2, ENV_tau_i20_3, ENV_tau_i20_4),
                       rows = (task==1))
  ol_settings8  = list(outcomeOrdered = ITEM8,
                       V    = BUS_zeta_i8*PROBUS,
                       tau  = c(BUS_tau_i8_1, BUS_tau_i8_2, BUS_tau_i8_3, BUS_tau_i8_4),
                       rows = (task==1))
  ol_settings9  = list(outcomeOrdered = ITEM13,
                       V    = BUS_zeta_i13*PROBUS,
                       tau  = c(BUS_tau_i13_1, BUS_tau_i13_2, BUS_tau_i13_3, BUS_tau_i13_4),
                       rows = (task==1))
  ol_settings10 = list(outcomeOrdered = ITEM18,
                       V    = BUS_zeta_i18*PROBUS,
                       tau  = c(BUS_tau_i18_1, BUS_tau_i18_2, BUS_tau_i18_3, BUS_tau_i18_4),
                       rows = (task==1))
  ol_settings11 = list(outcomeOrdered = ITEM23,
                       V    = BUS_zeta_i23*PROBUS,
                       tau  = c(BUS_tau_i23_1, BUS_tau_i23_2, BUS_tau_i23_3, BUS_tau_i23_4),
                       rows = (task==1))

  P[["indic_PROCAR1"]] = apollo_ol(ol_settings1,  functionality)
  P[["indic_PROCAR2"]] = apollo_ol(ol_settings2,  functionality)
  P[["indic_PROCAR3"]] = apollo_ol(ol_settings3,  functionality)
  P[["indic_PROCAR4"]] = apollo_ol(ol_settings4,  functionality)

  P[["indic_ENVCON1"]] = apollo_ol(ol_settings5,  functionality)
  P[["indic_ENVCON2"]] = apollo_ol(ol_settings6,  functionality)
  P[["indic_ENVCON3"]] = apollo_ol(ol_settings7,  functionality)

  P[["indic_PROBUS1"]] = apollo_ol(ol_settings8,  functionality)
  P[["indic_PROBUS2"]] = apollo_ol(ol_settings9,  functionality)
  P[["indic_PROBUS3"]] = apollo_ol(ol_settings10, functionality)
  P[["indic_PROBUS4"]] = apollo_ol(ol_settings11, functionality)

  ### Likelihood of choices
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  V[['car']]  = asc_car + b_tt * AUTO_TIME
  V[['bus']]  = asc_bus + b_tt * BUS_TIME + b_age_bus*age + b_female_bus*female + b_inc_bus*IncLOG + b_child_bus*young_chil + b_nonwhite_bus*nonwhite + b_vehicle_bus*vehphh + b_parking_bus*PARKING + CAR_lambda_bus * PROCAR + BUS_lambda_bus * PROBUS + ENV_lambda_bus * ENVCON
  V[['bike']] = asc_bike + b_tt * BIKE_TIME + b_age_bike*age + b_female_bike*female + b_inc_bike*IncLOG + b_child_bike*young_chil + b_nonwhite_bike*nonwhite + b_vehicle_bike*vehphh + b_parking_bike*PARKING + CAR_lambda_bike * PROCAR + BUS_lambda_bike * PROBUS + ENV_lambda_bike * ENVCON
  V[['walk']] = asc_walk + b_tt * WALK_TIME + b_age_walk*age + b_female_walk*female + b_inc_walk*IncLOG + b_child_walk*young_chil + b_nonwhite_walk*nonwhite + b_vehicle_walk*vehphh + b_parking_walk*PARKING + CAR_lambda_walk * PROCAR + BUS_lambda_walk * PROBUS + ENV_lambda_walk * ENVCON

  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives = c(car=1, bus=2, bike=3, walk=4),
    avail        = 1,
    choiceVar    = O_Code,
    V            = V
  )

  ### Compute probabilities for MNL model component
  P[["choice"]] = apollo_mnl(mnl_settings, functionality)

  ### Likelihood of the whole model
  P = apollo_combineModels(P, apollo_inputs, functionality)

  ### Average across inter-individual draws
  P = apollo_avgInterDraws(P, apollo_inputs, functionality)

  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

Re: ICLV model questions

Posted: 03 May 2020, 23:38
by stephanehess
Hi

On point 1, what you could do is to use the alternative normalisation where you fix one zeta parameter to 1 per latent variable, but then estimate the standard deviation of the latent variable. So e.g., for the first LV, you could fix CAR_zeta_i6 to 1, but then estimate a standard deviation for PROCAR_eta rather than fixing that to 1.
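Using the parameter names from the code earlier in this thread, that alternative normalisation could be sketched as follows (sigma_PROCAR is a newly introduced parameter; this is a sketch of the idea, not a complete specification):

```r
### Sketch of the alternative normalisation for the first LV.
### sigma_PROCAR is a new parameter (started at e.g. 1 in apollo_beta);
### CAR_zeta_i6 is set to 1 in apollo_beta and kept in apollo_fixed:
# apollo_fixed = c(apollo_fixed, "CAR_zeta_i6")

### In apollo_randCoeff, the disturbance is scaled by the estimated s.d.:
randcoeff[["PROCAR"]] = CAR_gamma_female*female + CAR_gamma_inc*IncLOG +
                        CAR_gamma_child*young_chil + CAR_gamma_age*age +
                        CAR_gamma_nonwhite*nonwhite + CAR_gamma_vehphh*vehphh +
                        CAR_gamma_parking*PARKING + sigma_PROCAR * PROCAR_eta
```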

In relation to differences with CFA, what you need to realise is that here, you're jointly estimating a model for the choice data and the indicators, so the latent variable is used for both, and there is no expectation that the impacts of the LV will be the same as if you were only using it to explain the indicators.

On point 2, Apollo calculates rho2 only for models that are pure discrete choice. It would not be meaningful to calculate rho2 for the overall model. However, Apollo does give you the log-likelihood for the choice part alone, and you could use that to calculate a rho2 for the choice part alone. But be careful with any comparisons with a model fitted to the choice data alone, as that model should always explain the choices at least as well as a model that also explains the values of indicators.
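As a purely numeric illustration of that last suggestion (all values hypothetical): with four always-available alternatives, the null log-likelihood is N*log(1/4), and a rho-squared for the choice component follows directly from the choice-part log-likelihood:

```python
import math

# Hypothetical values: 2000 choice observations, four always-available alternatives
N = 2000
LL_choice = -1800.0         # log-likelihood of the choice component at convergence
LL_0 = N * math.log(1 / 4)  # null log-likelihood under equal choice probabilities

rho2_choice = 1 - LL_choice / LL_0
print(round(rho2_choice, 4))  # → 0.3508
```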

Hope this helps

Stephane

Re: ICLV model questions

Posted: 04 Nov 2020, 09:49
by cybey
Hello everyone,

sorry to bring this old forum thread back to life, but I didn't want to open a new thread for my question:

I have a data set where the ICLV model works fine if I only use socio-demographic variables in the structural equations. On the other hand, I read the following in the "Handbook of Choice Modelling":

"Second, one issue with the structural equations of the latent variables is that these equations usually have low explanatory power in most empirical applications as usually indicated by insignificant variables and low pseudo R2 values. This is because latent variables like attitudes and perceptions are usually expressed as a function of socio-demographic variables. However, it is doubtful whether latent variables such as attitudes are actually a function of socio-demographic variables (see for example, Anable, 2005). They are more likely to be shaped by people’s life experiences, lifestyles, and so on."

Source: Abou-Zeid, Maya; Ben-Akiva, Moshe (2014): Hybrid choice models. In: Stephane Hess and Andrew Daly (eds.): Handbook of Choice Modelling. Edward Elgar Publishing, pp. 383–412.

Now I wanted to use dummy variables in the structural equations that reflect real-life behaviour, such as "buys environmentally friendly products (0/1)" or "is a customer of supplier A (0/1)". However, I now find that the parameter estimates for these variables become very large and the influence of the latent variables on utility is not significant.

Let me give you an example:

Code:

gamma_Gender_RegionalProducts                 41.992723
gamma_Age_RegionalProducts                    15.587692
gamma_Education_RegionalProducts              35.550692
gamma_Residence_RegionalProducts              -8.036421
gamma_RenewablesInvestment_RegionalProducts   94.057277
gamma_CurrentSupplierKVU_RegionalProducts    -38.036003
gamma_CurrentSupplierBEG_RegionalProducts     27.321266
gamma_CurrentMix_RegionalProducts            161.461377

The aim is to determine whether environmentally friendly behaviour (dummies, 0/1) has an effect on the latent variable "RegionalProducts". However, the influence of the latent variable on the utility of the attribute "Regio" is not significantly different from 0:

Code:

lambda_RegionalProducts_Regio2                 0.001247
lambda_RegionalProducts_Regio3               3.1524e-04

If, on the other hand, I use only the socio-demographic variables, their influence on the latent variable is nearly zero, but the latent variable has a significant influence on the attribute "Regio" (lambda != 0).

My conclusion now would be that the model specification is simply not correct, i.e. the latent variable cannot be built using these variables for environmentally friendly behaviour. There may be other variables that explain the latent variable, but these are unknown (at least to me).

Is this conclusion correct?

I look forward to your answer! =)

Re: ICLV model questions

Posted: 09 Nov 2020, 18:27
by stephanehess
Nico

these large parameters could suggest a near-deterministic process, i.e. people who have these values make very specific choices, and the model can use the variable alone to explain those choices. But in my view these variables should not be used as explanatory variables, but as dependent variables. We shouldn't explain the choices in the data as a function of whether someone also buys environmentally friendly goods; rather, the same latent variable should explain these outcomes as well as the choices.

Stephane

Re: ICLV model questions

Posted: 10 Nov 2020, 07:06
by cybey
That makes sense. Thank you very much for your answer! :)

EDIT:

I have another question related to metrokim217's question about scaling: is there a reason to fix the variance of the LV rather than setting the scale of the LV equal to that of an indicator (e.g. the indicator assumed to be most reliable)?

[Image: equation screenshot from the chapter cited below, not reproduced here]
Source: Abou-Zeid, Maya; Ben-Akiva, Moshe (2014): Hybrid choice models. In: Stephane Hess and Andrew Daly (eds.): Handbook of Choice Modelling. Edward Elgar Publishing, pp. 383–412.

Nico

Re: ICLV model questions

Posted: 20 Nov 2020, 18:32
by stephanehess
Hi

I tend to fix the variance of the LV rather than needing to make a judgement on which indicator is most reliable. Further details in this paper of ours: https://link.springer.com/article/10.10 ... 011-9351-z

Stephane

Re: ICLV model questions

Posted: 23 Nov 2020, 11:45
by cybey
Hi Stephane,

as always: Thank you very much for your great support. I wonder if I will ever ask a question that has not already been answered in one of your papers.

However, as you can imagine, I do have two more questions. :oops:

Question 1:
Earlier you wrote that you would integrate past behavior into an HCM as a dependent variable. Do I do this with a simple MNL model if it is a dummy variable or with a linear regression for a continuous variable?
Example for “CurrentMix”, indicating whether a respondent has purchased green electricity in the past (0=no, 1=yes):

Code:

# ----------------------------------------------------------------- #
#---- Choice Model (MNL CurrentMix)
# ----------------------------------------------------------------- #
  V[['alt0']] =  MNL_CurrentMix_asc_0
  V[['alt1']] =  MNL_CurrentMix_asc_1 + MNL_CurrentMix_b_LV_GreenProducts * LV_GreenProducts
  
  mnl_settings = list(
    alternatives  = c(alt0=0, alt1=1),
    avail         = list(alt0=1, alt1=1),
    choiceVar     = COV_CurrentMix,
    V             = V,
    rows          = (Task==1),
    componentName = "MNL CurrentMix"
  )
  
  # Compute probabilities using MNL model
  P[['MNL CurrentMix']] = apollo_mnl(mnl_settings, functionality)

Finally, the MNL model is integrated via apollo_combineModels(), like the other model parts (e.g. the MIXL, structural equation, and measurement components).
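For what it's worth, the continuous-variable analogue of the MNL component above could be sketched with apollo_normalDensity (the variable COV_Spend and all ND_* parameters here are hypothetical placeholders, and the settings names should be checked against the Apollo manual for your version):

```r
# ----------------------------------------------------------------- #
#---- Linear regression (continuous past behaviour, sketch)
# ----------------------------------------------------------------- #
  normalDensity_settings = list(
    outcomeNormal = COV_Spend,       # observed continuous outcome (hypothetical)
    xNormal       = ND_Spend_asc + ND_Spend_b_LV_GreenProducts * LV_GreenProducts,
    mu            = 0,
    sigma         = ND_Spend_sigma,  # estimated error standard deviation
    rows          = (Task==1),
    componentName = "LR Spend"
  )
  P[['LR Spend']] = apollo_normalDensity(normalDensity_settings, functionality)
```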

Question 2:
In the choice modelling workshop and in the literature it was/is recommended to integrate sociodemographic characteristics of respondents directly into both the LV and the utilities. Now I would like to know how to handle characteristics like income that are modelled as a multiplicative effect on the price coefficient, since sociodemographics are usually modelled as additive influences.

Examples:

Code:

b_Comfort_value = b_Comfort + b_Age_Comfort * Age
So …

Code:

LV_Comfort = gamma_Age_Comfort * Age
But …

Code:

b_Price_value = b_Price * (PriceMonthly / PriceMonthly_Mean) ^ elast_Price_PriceMonthly
My idea is …

Code:

LV_CheapProducts = 1 * (PriceMonthly / PriceMonthly_Mean) ^ elast_CheapProducts_PriceMonthly

Is that correct?

Nico

Re: ICLV model questions

Posted: 25 Nov 2020, 14:58
by stephanehess
Hi Nico

on point 1, yes, that is fine.

on point 2, when you include such continuous effects, you are of course also changing the mean of the LV. Centring the attribute on zero can help with that, but it would affect your ability to use non-linear transforms.

Stephane

Re: ICLV model questions

Posted: 25 Nov 2020, 15:43
by cybey
Hi Stephane,

thanks again!

On point 2: how would you handle this? Would an appropriate approach be to use a linear specification rather than an elasticity for the covariates of the price coefficient, even though the marginal utility of money is not constant?

Code:

b_Price_value = b_Price + b_Income_Price * ( (Income/Income_Mean)-1)
or

Code:

b_Price_value = b_Price + b_Income_Price * Income_centered

The same specification for the LV, i.e.

Code:

LV_PriceSensitivity = gamma_Income_PriceSensitivity * ( (Income/Income_Mean)-1)
or

Code:

LV_PriceSensitivity = gamma_Income_PriceSensitivity * Income_centered

Thanks in advance!

Nico

Re: ICLV model questions

Posted: 26 Nov 2020, 19:58
by stephanehess
Nico

you do not technically need to do any of this, i.e. mean-centre them. You just need to be aware that it will affect, for example, the constants in your model. But that's fine as long as you recognise it when analysing your results.
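A quick numeric check of that point (hypothetical numbers): shifting a covariate by its mean is exactly absorbed by the constant, so only the constant's estimate changes, not the fit or the slope.

```python
# Hypothetical linear-in-attributes utility: V = asc + b * x
x = [20.0, 35.0, 50.0]           # e.g. incomes in 1000s
x_mean = sum(x) / len(x)

asc, b = 0.5, -0.02
V_raw = [asc + b * xi for xi in x]

# Mean-centred specification with an adjusted constant gives identical utilities
asc_c = asc + b * x_mean
V_centred = [asc_c + b * (xi - x_mean) for xi in x]

print(all(abs(u - v) < 1e-12 for u, v in zip(V_raw, V_centred)))  # → True
```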

Stephane