Latent variable estimation

Post by **bokapatsila** » 06 Oct 2021, 00:24

Hi Stephane and David,

Is it possible to use Apollo for the estimation of latent variables only?

I know how to code 4 latent variables that are based on 16 Likert-scale questions and some demographics, using the structural equation part of a hybrid choice model. However, I am not interested in the measurement part of the model, but only need estimates for those 4 latent variables. Can they be estimated using Apollo? If yes, what changes have to be made to the ICLV example (#24) available for Apollo? Thanks!

Post by **stephanehess** » 07 Oct 2021, 08:20

Hi

it's not quite clear what you mean here.

The estimation of a model requires a dependent variable. In a hybrid choice model, you have choices and Likert-scale questions as the dependent variables. Which of these would you keep?

Stephane

Post by **bokapatsila** » 07 Oct 2021, 14:42

Hi Stephane,

I'm sorry I wasn't clear enough. I don't want to estimate a hybrid choice model at this point. I want to estimate the 4 latent variables only and use these estimates to experiment with their influence on different dependent variables, to see which of them works best before running everything simultaneously as a hybrid choice model. Is this possible in Apollo? Thanks!

Post by **stephanehess** » 07 Oct 2021, 17:09

Hi

I understand that what you are after here are the parameters for the structural equation of the latent variable. However, my point is that for estimation, you need a dependent variable. So you can't drop both the choices and the indicators. Maybe what you want to do is to keep the indicators only as the dependent variable, like in SEM.

Stephane

Post by **bokapatsila** » 07 Oct 2021, 22:57

That's actually what I'm after, thanks for this suggestion. In that case, can I simultaneously estimate 4 ordered logit models with normally distributed error terms in Apollo? Or would the only way to do it in Apollo be to estimate each of them separately? Thank you for the suggestions!

Post by **stephanehess** » 07 Oct 2021, 23:00

An ordered logit model does not have a normally distributed error term. But I guess what you mean is to include the latent variable in the utility function of the ordered logit models. But you'll need to use the LV in at least two OLs, so there can't be 4 OLs in your case.
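For readers following along, the structure being described can be written out as below. This is a sketch in generic notation rather than anything taken from the Apollo manual: the symbols $z_n$, $\gamma$, $\zeta_k$, $\tau_{k,s}$ and $I_{kn}$ are labels chosen here, and the unit variance of $\eta_n$ is an assumption that matches a standard normal draw entering the LV without a scale parameter. The normal error sits in the structural equation of the LV, while each ordered logit keeps its own logistic error.

```latex
% Structural equation: latent variable for respondent n
LV_n = \gamma^{\top} z_n + \eta_n, \qquad \eta_n \sim N(0,1)

% Measurement equation: ordered logit for indicator k with S levels,
% with \tau_{k,0} = -\infty and \tau_{k,S} = +\infty
P(I_{kn} = s \mid LV_n)
  = \frac{1}{1 + e^{\,\zeta_k LV_n - \tau_{k,s}}}
  - \frac{1}{1 + e^{\,\zeta_k LV_n - \tau_{k,s-1}}}

% Likelihood of respondent n: indicators only, integrated over the random part of the LV
L_n = \int \prod_{k} P(I_{kn} \mid LV_n)\, \phi(\eta_n)\, d\eta_n
```

With the indicators as the only dependent variables, the integral is what the simulated likelihood approximates by averaging over draws of $\eta_n$.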

Post by **bokapatsila** » 15 Oct 2021, 16:42

> stephanehess wrote: ↑07 Oct 2021, 17:09
>
> Hi
>
> I understand that what you are after here are the parameters for the structural equation of the latent variable. However, my point is that for estimation, you need a dependent variable. So you can't drop both the choices and the indicators. Maybe what you want to do is to keep the indicators only as the dependent variable, like in SEM.
>
> Stephane

Thank you, Stephane. In this case, when I define latent variable 1 (LV1) as:

randcoeff[["LV1"]] = gamma_LV1_female * FEMALE + gamma_LV1_wave2 * WAVE2 + gamma_LV1_ampeak * TIME_BC_AM +
  gamma_LV1_fulltime * EMPLOY_FULL + gamma_LV1_incomelow * INCOME_LOW + gamma_LV1_age65O * AGE_65O +
  gamma_LV1_edbach * EDUCATION_BACH + gamma_LV1_kids * HOUSEHOLD_CHILD_CLEAN_N + gamma_LV1_car * CAR_B +
  gamma_LV1_ptno * PT_C_NO +
  eta1

Which is then used in indicators:

ol_settings9 = list(outcomeOrdered=STATE_APP_R,
                    V=zeta_app*LV1,
                    tau=list(tau_app_1, tau_app_2, tau_app_3, tau_app_4))
ol_settings10 = list(outcomeOrdered=STATE_MPAY_R,
                     V=zeta_mpay*LV1,
                     tau=list(tau_mpay_1, tau_mpay_2, tau_mpay_3, tau_mpay_4))

P[["indic_app"]] = apollo_ol(ol_settings9, functionality)
P[["indic_mpay"]] = apollo_ol(ol_settings10, functionality)

Can I obtain an estimate for LV1 for each individual in my dataset? If yes, how would I do that?

Post by **stephanehess** » 18 Oct 2021, 13:59

Hi

there is no such thing as an estimate for each person unless you estimate person-specific models, for which you would need very large amounts of data per person. I assume what you are referring to is the Bayesian idea of posteriors from the sample-level distribution. For this, you can use apollo_conditionals. There is a discussion in the manual.

Best wishes

Stephane
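To make the idea of a posterior from the sample-level distribution concrete, here is a minimal base-R sketch for a single respondent. It is not Apollo's implementation of apollo_conditionals, only an illustration of the principle: draws of the LV are reweighted by the likelihood of that person's observed indicator answers. All numbers (`lv_mean`, `zeta`, `tau`, `y`) and the helper `ol_prob` are hypothetical.

```r
# Illustrative only: conditional (posterior) mean of a latent variable
# for one respondent, computed by simulation. All values are made up.
set.seed(1)

# Ordered logit probability of category y, given utility V and thresholds tau
ol_prob <- function(y, V, tau) {
  cuts <- c(-Inf, tau, Inf)
  plogis(cuts[y + 1] - V) - plogis(cuts[y] - V)
}

n_draws <- 5000
eta     <- rnorm(n_draws)      # random part of the LV
lv_mean <- 0.3                 # hypothetical deterministic part (gamma * covariates)
LV      <- lv_mean + eta       # one LV value per draw

zeta <- c(1.2, 0.8)            # hypothetical loadings on two indicators
tau  <- c(-2, -1, 1, 2)        # hypothetical thresholds for a 5-point scale
y    <- c(5, 4)                # this respondent's observed answers

# Likelihood of the observed answers at each draw of the LV
lik <- ol_prob(y[1], zeta[1] * LV, tau) * ol_prob(y[2], zeta[2] * LV, tau)

# Reweight the draws by their likelihood to obtain the conditional distribution
w         <- lik / sum(lik)
post_mean <- sum(w * LV)
post_sd   <- sqrt(sum(w * (LV - post_mean)^2))

cat("unconditional mean:", lv_mean, "\n")
cat("conditional mean:  ", post_mean, " (sd ", post_sd, ")\n", sep = "")
```

Because this respondent gave high answers on indicators with positive loadings, the conditional mean moves above the unconditional one, but it remains the mean of a distribution, not a point estimate of that person's LV.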

Post by **bokapatsila** » 22 Oct 2021, 06:16

Thank you, Stephane. Your comments and answers have finally guided me to what I'm after.

To zoom out, I'm trying to classify the respondents in my dataset into distinct behavioural groups based on the estimates for their latent variables. To do that, I want to estimate the class allocation probabilities based on the values of latent variables; in other words (or, as I think of it in simple terms), estimate only the indicators and class allocation probabilities of a Latent Variable Latent Class model. I then want to use apollo_lcUnconditionals to pull out allocation probabilities for each individual and create a dichotomous variable that will be 1 for those individuals who fall into that class (have a probability above a certain threshold) and 0 for those that don't (probability is below a certain threshold). I then want to use that variable as a predictor in a series of choice models. If you're curious, the reason I want this estimation to be sequential is that I want the class allocation to be the same for all the different dependent variables that I will use in different choice models.

While browsing this forum for a relevant example, I stumbled upon this answer - http://www.apollochoicemodelling.com/fo ... +class#p74 and used it as guidance for my case. In the code below I tried to adapt that example to my needs, with the intent of using one latent variable to predict class allocation. Obviously, it didn't work, since inClassProb=P has a different length from classProb=pi_values, and without the choice component I don't know how and when to state that I'm interested in 2 classes only. My hunch is that for my purposes I don't even need inClassProb, but without it lc_settings doesn't work (which you probably know well).

Code: Select all

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load libraries
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  ="Translink_Time_Covid_OL_1LV_Est",
  modelDescr ="ICLV for Translink with 1 LV Estimation",
  indivID    ="UNID",
  panelData = FALSE,
  mixing     = TRUE,
  nCores     = 3)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

setwd("D:/Research/2020_TransLink_Overcrowding/Data/Time_Covid")

database = read.csv("translink_time_covid.csv",header=TRUE)

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta=c(zeta_acon = 1, 
              zeta_scon = 1,
              zeta_both = 1,
              zeta_seat = 1,
              zeta_offpeak = 1, 
              zeta_alt = 1, 
              tau_acon_1 =-2, 
              tau_acon_2 =-1, 
              tau_acon_3 = 1, 
              tau_acon_4 = 2,
              tau_scon_1 =-2, 
              tau_scon_2 =-1, 
              tau_scon_3 = 1, 
              tau_scon_4 = 2,
              tau_both_1 =-2, 
              tau_both_2 =-1, 
              tau_both_3 = 1, 
              tau_both_4 = 2,
              tau_seat_1 =-2, 
              tau_seat_2 =-1, 
              tau_seat_3 = 1, 
              tau_seat_4 = 2,
              tau_offpeak_1 =-2, 
              tau_offpeak_2 =-1,
              tau_offpeak_3 = 1, 
              tau_offpeak_4 = 2,
              tau_alt_1 =-2, 
              tau_alt_2 =-1,
              tau_alt_3 = 1, 
              tau_alt_4 = 2,
              gamma_LV1_female  = 0, 
              gamma_LV1_wave2  = 0,
              gamma_LV1_ampeak  = 0, 
              gamma_LV1_fulltime  = 0,
              gamma_LV1_incomelow  = 0,
              gamma_LV1_age65O  = 0,
              gamma_LV1_edbach = 0,
              gamma_LV1_ptno = 0,
              gamma_LV1_kids = 0,
              gamma_LV1_car = 0,
              
              piCons  = 1, 
              piLV1   = 1)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c()

# ################################################################# #
#### DEFINE RANDOM COMPONENTS                                    ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
  interDrawsType="halton", 
  interNDraws=100,          
  interUnifDraws=c(),      
  interNormDraws=c("eta1"), 
  
  intraDrawsType='',
  intraNDraws=0,          
  intraUnifDraws=c(),     
  intraNormDraws=c()      
)

### Create random parameters
apollo_randCoeff=function(apollo_beta, apollo_inputs){
  randcoeff = list()
  
  randcoeff[["LV1"]] = gamma_LV1_female * FEMALE + gamma_LV1_wave2 * WAVE2 + gamma_LV1_ampeak * TIME_BC_AM +
    gamma_LV1_fulltime * EMPLOY_FULL + gamma_LV1_incomelow * INCOME_LOW + gamma_LV1_age65O * AGE_65O + 
    gamma_LV1_edbach * EDUCATION_BACH + gamma_LV1_kids * HOUSEHOLD_CHILD_CLEAN_N + gamma_LV1_car * CAR_B +
    gamma_LV1_ptno * PT_C_NO + 
    eta1

  return(randcoeff)
}


# ################################################################# #
#### DEFINE LATENT CLASS COMPONENTS                              ####
# ################################################################# #

apollo_lcPars=function(apollo_beta, apollo_inputs){
  lcpars = list()

  ### Class allocation probabilities
  ### These are the probabilities of a binary logit model
  ### apollo_mnl could be used too (with functionality="raw" 
  ### and choice=NA), but explicitly writing the probability 
  ### is easier.
  VA  = piCons + piLV1*LV1
  VB  = 0
  piA = exp(VA)/(exp(VA) + exp(VB))
  piB = 1 - piA
  lcpars[["pi_values"]] = apollo_firstRow(list(piA, piB), apollo_inputs)
  
  return(lcpars)
}


# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))

  ### Create list of probabilities P
  P = list()

  ### Likelihood of indicators
  ol_settings1 = list(outcomeOrdered=AGREE_BOTH_CROWD_BC_R, 
                      V=zeta_both*LV1, 
                      tau=list(tau_both_1, tau_both_2, tau_both_3, tau_both_4))
  ol_settings2 = list(outcomeOrdered=AGREE_CONCERN_BC_R, 
                      V=zeta_acon*LV1, 
                      tau=list(tau_acon_1, tau_acon_2, tau_acon_3, tau_acon_4))
  ol_settings3 = list(outcomeOrdered=AGREE_SEAT_BC_R, 
                      V=zeta_seat*LV1, 
                      tau=list(tau_seat_1, tau_seat_2, tau_seat_3, tau_seat_4))
  ol_settings4 = list(outcomeOrdered=STATE_CONCERNED_R, 
                      V=zeta_scon*LV1, 
                      tau=list(tau_scon_1, tau_scon_2, tau_scon_3, tau_scon_4))
  ol_settings5 = list(outcomeOrdered=AGREE_OFFPEAK_BC_R, 
                      V=zeta_offpeak*LV1, 
                      tau=list(tau_offpeak_1, tau_offpeak_2, tau_offpeak_3, tau_offpeak_4))
  ol_settings6 = list(outcomeOrdered=AGREE_ALT_BC_R, 
                      V=zeta_alt*LV1, 
                      tau=list(tau_alt_1, tau_alt_2, tau_alt_3, tau_alt_4))

  P[["indic_both"]]    = apollo_ol(ol_settings1, functionality)
  P[["indic_acon"]]    = apollo_ol(ol_settings2, functionality)
  P[["indic_seat"]]    = apollo_ol(ol_settings3, functionality)
  P[["indic_scon"]]    = apollo_ol(ol_settings4, functionality)
  P[["indic_offpeak"]] = apollo_ol(ol_settings5, functionality)
  P[["indic_alt"]]     = apollo_ol(ol_settings6, functionality)
  
  ### Compute latent class model probabilities
  lc_settings   = list(inClassProb=P, classProb=pi_values)
  P[["model"]] = apollo_lc(lc_settings, apollo_inputs, functionality)

  ### Likelihood of the whole model
  P = apollo_combineModels(P, apollo_inputs, functionality)

  ### Take product across observation for same individual
  #P = apollo_panelProd(P, apollo_inputs, functionality)

  ### Average across inter-individual draws
  P = apollo_avgInterDraws(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### ESTIMATE SETTINGS                                           ####
# ################################################################# #

estimate_settings = list(maxIterations  = 250)

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs, estimate_settings)

# ################################################################# #
#### MODEL OUTPUTS                                               ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN)                               ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model)

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO FILE, using model name)               ----
# ----------------------------------------------------------------- #

apollo_saveOutput(model)

Can you please advise me on how to modify the code above to achieve what I'm after? Also, is it possible to streamline the process and assign each of the respondents to a respective latent class using allocation thresholds right away?

Post by **stephanehess** » 25 Oct 2021, 09:01

Hi

while you can of course do this, to me, it is a wrong thing to try and do. The latent variable is not deterministic, so using it to deterministically classify people means that you are ignoring the random part of the LV.
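A small base-R sketch of what ignoring the random part means in practice. The values of `piCons`, `piLV1` and `lv_mean` are made up for illustration; only the binary logit form of the class allocation follows the code posted above.

```r
# Illustrative only: class allocation probability as a function of a random LV.
set.seed(42)

piCons  <- 0.2                 # hypothetical class allocation constant
piLV1   <- 1.0                 # hypothetical effect of the LV on allocation
lv_mean <- 0.1                 # hypothetical deterministic part of the LV

eta <- rnorm(10000)            # random part of the LV
piA <- plogis(piCons + piLV1 * (lv_mean + eta))   # P(class A) at each draw

# Allocation probability if the random part is ignored
piA_at_mean <- plogis(piCons + piLV1 * lv_mean)

# Share of draws in which a 0.5 threshold would assign the other class
share_below <- mean(piA < 0.5)

cat("P(class A) ignoring eta:      ", round(piA_at_mean, 3), "\n")
cat("share of draws with P(A) < 0.5:", round(share_below, 3), "\n")
```

Under these made-up values, a 0.5 threshold applied to the deterministic part would place the person in class A, yet a substantial share of the draws point the other way.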

In terms of the other part of your question, if you have a latent class model, you need something that varies across classes, which you don't seem to have here, as you only have a single value for each parameter.

Stephane
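The need for something that varies across classes can be checked numerically. The base-R sketch below is not Apollo code: the helper `ol_prob`, the loadings and the thresholds are hypothetical. It shows that when both classes share the same parameters, the mixture likelihood does not depend on the class allocation probability at all, so the allocation parameters cannot be estimated.

```r
# Illustrative only: a two-class mixture of ordered logit likelihoods.

# Ordered logit probability of category y, given utility V and thresholds tau
ol_prob <- function(y, V, tau) {
  cuts <- c(-Inf, tau, Inf)
  plogis(cuts[y + 1] - V) - plogis(cuts[y] - V)
}

tau <- c(-2, -1, 1, 2)         # hypothetical thresholds
LV  <- 0.5                     # one value of the latent variable
y   <- 4                       # observed answer

lik_A      <- ol_prob(y,  1.0 * LV, tau)  # class A, loading 1.0
lik_B_same <- ol_prob(y,  1.0 * LV, tau)  # class B with the same loading
lik_B_diff <- ol_prob(y, -0.5 * LV, tau)  # class B with its own loading

# Latent class likelihood: allocation-weighted sum over classes
mix <- function(piA, LA, LB) piA * LA + (1 - piA) * LB

# Identical classes: the likelihood is flat in piA
same_lo <- mix(0.2, lik_A, lik_B_same)
same_hi <- mix(0.9, lik_A, lik_B_same)

# Class-specific loading: the likelihood responds to piA
diff_lo <- mix(0.2, lik_A, lik_B_diff)
diff_hi <- mix(0.9, lik_A, lik_B_diff)

cat("identical classes:      ", same_lo, same_hi, "\n")
cat("class-specific loadings:", diff_lo, diff_hi, "\n")
```

In Apollo's latent class examples this is typically handled by defining class-specific parameters in apollo_lcPars and passing apollo_lc one in-class likelihood per class, which would also resolve the length mismatch between inClassProb and classProb; the manual is the place to confirm the exact setup.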

ApolloChoiceModelling forum
