I want to estimate a latent class model for a choice situation with 4 alternatives (alt4 being the opt-out with u=0) and three classes.
I have adapted this example: https://www.apollochoicemodelling.com/f ... variates.r
My dataset is relatively large (5700 observations from 1400 individuals).
The intuitive rationale:
There might be a group of "power users" who are less price sensitive, a group of customers who choose a product when the price is right, and a rather uninterested group.
A latent class model with 2 classes produces results that align with the rationale: Class 1 is drawn to choose one of the products (alt1-3), whereas class 2 has negative ASCs. The socio-demographics for class-allocation also behave in a way I would expect.
However, when I introduce a third class, one ASC becomes highly negative (-40 or even more negative, depending on the starting value). It seems the model allocates the classes in a way that the choice probability becomes close to 0 for this class and alternative, which is generally in line with the expected behavior. I have tried different starting values, and while I can get the model to converge, it always produces a singular Hessian, so standard errors cannot be calculated.
Do you have any advice on how to deal with this situation?
On the one hand, this close to 0 probability of selection is a result I am interested in. On the other hand, it should not cause the singular Hessian that prevents calculating standard errors.
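To illustrate what I think is happening, here is a minimal sketch in plain R (no Apollo needed). It is not my estimated model: `mnl_prob`, `ll_optout` and `d2` are hypothetical helpers, and the utilities of the other alternatives are made-up values chosen only to resemble class c. It compares the curvature of the log-likelihood with respect to the alt3 ASC at a moderate value and at -44.

```r
# Hypothetical illustration, not the estimated model: MNL probabilities for
# four alternatives where only the ASC of alt3 varies. The other utilities
# are assumed values, held fixed.
mnl_prob <- function(asc3, V_other = c(alt1 = -0.05, alt2 = -3.7, optout = 0)) {
  V <- c(V_other, alt3 = asc3)
  exp(V) / sum(exp(V))
}

# Log-likelihood contribution of one observation that chose the opt-out,
# written as a function of the alt3 ASC
ll_optout <- function(asc3) log(mnl_prob(asc3)[["optout"]])

# Central finite-difference approximation of the second derivative
d2 <- function(f, x, h = 1e-2) (f(x + h) - 2 * f(x) + f(x - h)) / h^2

curv_moderate <- d2(ll_optout, -2)   # clearly negative curvature
curv_extreme  <- d2(ll_optout, -44)  # numerically flat
print(c(moderate = curv_moderate, extreme = curv_extreme))
```

If the same happens in the full model, the row and column of the Hessian belonging to asc_alt3_c would be numerically zero, which would be consistent with the maximum eigenvalue of 0 reported for the 3-class model.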
Model with 2 classes:
Model name : LC_with_covariates
Model description : LC model with covariates
Model run at : 2025-01-05 17:19:37.849555
Estimation method : bgw
Model diagnosis : Relative function convergence
Optimisation diagnosis : Maximum found
hessian properties : Negative definite
maximum eigenvalue : -0.93269
reciprocal of condition number : 1.28311e-07
Number of individuals : 1365
Number of rows in database : 5708
Number of modelled outcomes : 5708
Number of cores used : 16
Model without mixing
LL(start) : -4942.68
LL (whole model) at equal shares, LL(0) : -6270.88
LL (whole model) at observed shares, LL(C) : -6457.21
LL(final, whole model) : -3762.22
Rho-squared vs equal shares : 0.4
Adj.Rho-squared vs equal shares : 0.3967
Rho-squared vs observed shares : 0.4174
Adj.Rho-squared vs observed shares : 0.415
AIC : 7566.44
BIC : 7706.08
LL(0,Class_1) : -6270.88
LL(final,Class_1) : -7055.57
LL(0,Class_2) : -6270.88
LL(final,Class_2) : -12290.69
Estimated parameters : 21
Time taken (hh:mm:ss) : 00:00:15.96
pre-estimation : 00:00:8.04
estimation : 00:00:1.05
post-estimation : 00:00:6.88
Iterations : 14
Unconstrained optimisation.
Estimates:
Estimate s.e. t.rat.(0) Rob.s.e. Rob.t.rat.(0)
asc_alt1_a 2.241996 0.129835 17.2680 0.159014 14.0994
asc_alt1_b -1.984309 0.774941 -2.5606 1.025579 -1.9348
asc_alt2_a 1.304957 0.073943 17.6481 0.098459 13.2538
asc_alt2_b -2.709017 0.266642 -10.1597 0.355732 -7.6153
asc_alt3_a 1.130627 0.121517 9.3043 0.143100 7.9009
asc_alt3_b -2.675159 0.597399 -4.4780 0.656607 -4.0742
asc_optout 0.000000 NA NA NA NA
beta_cost_a -0.061275 0.003352 -18.2809 0.003975 -15.4153
beta_cost_b -0.189954 0.044894 -4.2311 0.064143 -2.9614
beta_min_a 0.002651 7.6848e-04 3.4501 7.7064e-04 3.4404
beta_min_b 0.004484 0.003243 1.3825 0.003394 1.3211
delta_a -2.710631 0.331778 -8.1700 0.294274 -9.2112
gamma_gender_a 0.262983 0.184719 1.4237 0.188837 1.3926
gamma_student_a -1.389877 0.252792 -5.4981 0.279823 -4.9670
gamma_rural_a -0.175207 0.196980 -0.8895 0.194762 -0.8996
gamma_ptsub_a 0.236563 0.230957 1.0243 0.260822 0.9070
gamma_18_29_a 1.452012 0.372704 3.8959 0.374219 3.8801
gamma_30_44_a 0.441497 0.310437 1.4222 0.294830 1.4975
gamma_45_65_a 0.535397 0.298272 1.7950 0.266544 2.0087
gamma_65plus_a 0.000000 NA NA NA NA
delta_b 0.000000 NA NA NA NA
gamma_gender_b 0.000000 NA NA NA NA
gamma_student_b 0.000000 NA NA NA NA
gamma_rural_b 0.000000 NA NA NA NA
gamma_ptsub_b 0.000000 NA NA NA NA
gamma_18_29_b 0.000000 NA NA NA NA
gamma_30_44_b 0.000000 NA NA NA NA
gamma_45_65_b 0.000000 NA NA NA NA
gamma_65plus_b 0.000000 NA NA NA NA
gamma_use_freq_1_a 6.851905 1.031303 6.6439 1.088037 6.2975
gamma_use_freq_2_a 3.800776 0.281886 13.4834 0.308779 12.3090
gamma_use_freq_3_a 3.247105 0.228326 14.2214 0.244988 13.2541
gamma_use_freq_1_b 0.000000 NA NA NA NA
gamma_use_freq_2_b 0.000000 NA NA NA NA
gamma_use_freq_3_b 0.000000 NA NA NA NA
Summary of class allocation for model component :
Mean prob.
Class_1 0.5206
Class_2 0.4794
Model with 3 classes:
Model name : LC_with_covariates
Model description : LC model with covariates
Model run at : 2025-01-05 17:12:39.280796
Estimation method : bgw
Model diagnosis : Relative function convergence
Optimisation diagnosis : Maximum found
hessian properties : Negative definite
maximum eigenvalue : 0
reciprocal of condition number : 4.63004e-17
Number of individuals : 1365
Number of rows in database : 5708
Number of modelled outcomes : 5708
Number of cores used : 16
Model without mixing
LL(start) : -5841.64
LL (whole model) at equal shares, LL(0) : -6270.88
LL (whole model) at observed shares, LL(C) : -6457.21
LL(final, whole model) : -3596.96
Rho-squared vs equal shares : 0.4264
Adj.Rho-squared vs equal shares : 0.4205
Rho-squared vs observed shares : 0.443
Adj.Rho-squared vs observed shares : 0.4386
AIC : 7267.91
BIC : 7513.95
LL(0,Class_1) : -6270.88
LL(final,Class_1) : -11201.42
LL(0,Class_2) : -6270.88
LL(final,Class_2) : -5728.87
LL(0,Class_3) : -6270.88
LL(final,Class_3) : -37709.73
Estimated parameters : 37
Time taken (hh:mm:ss) : 00:00:57.78
pre-estimation : 00:00:19.87
estimation : 00:00:2.68
post-estimation : 00:00:35.23
Iterations : 28
Unconstrained optimisation.
Estimates:
Estimate s.e. t.rat.(0) Rob.s.e. Rob.t.rat.(0)
asc_alt1_a 3.551748 NA NA NA NA
asc_alt1_b 1.378953 NA NA NA NA
asc_alt2_a 2.676837 NA NA NA NA
asc_alt2_b 0.523842 NA NA NA NA
asc_alt3_a 2.531044 NA NA NA NA
asc_alt3_b 0.329199 NA NA NA NA
asc_alt1_c -0.051339 NA NA NA NA
asc_alt2_c -3.683159 NA NA NA NA
asc_alt3_c -44.232978 NA NA NA NA
asc_optout 0.000000 NA NA NA NA
beta_cost_a -0.044982 NA NA NA NA
beta_cost_b -0.096358 NA NA NA NA
beta_cost_c -0.535203 NA NA NA NA
beta_min_a 0.001505 NA NA NA NA
beta_min_b 0.004365 NA NA NA NA
beta_min_c 0.212491 NA NA NA NA
delta_a -3.309667 NA NA NA NA
gamma_gender_a 0.385924 NA NA NA NA
gamma_student_a -1.950435 NA NA NA NA
gamma_rural_a -0.100137 NA NA NA NA
gamma_ptsub_a -0.037152 NA NA NA NA
gamma_18_29_a 2.009102 NA NA NA NA
gamma_30_44_a 0.376384 NA NA NA NA
gamma_45_65_a 0.611426 NA NA NA NA
gamma_65plus_a 0.000000 NA NA NA NA
delta_b -3.806103 NA NA NA NA
gamma_gender_b 0.184908 NA NA NA NA
gamma_student_b -1.289224 NA NA NA NA
gamma_rural_b -0.169371 NA NA NA NA
gamma_ptsub_b 0.768332 NA NA NA NA
gamma_18_29_b 2.067346 NA NA NA NA
gamma_30_44_b 0.879920 NA NA NA NA
gamma_45_65_b 1.045906 NA NA NA NA
gamma_65plus_b 0.000000 NA NA NA NA
gamma_use_freq_1_a 8.636814 NA NA NA NA
gamma_use_freq_2_a 4.393848 NA NA NA NA
gamma_use_freq_3_a 3.680043 NA NA NA NA
gamma_use_freq_1_b 7.641568 NA NA NA NA
gamma_use_freq_2_b 4.414366 NA NA NA NA
gamma_use_freq_3_b 3.810765 NA NA NA NA
delta_c 0.000000 NA NA NA NA
gamma_gender_c 0.000000 NA NA NA NA
gamma_student_c 0.000000 NA NA NA NA
gamma_rural_c 0.000000 NA NA NA NA
gamma_ptsub_c 0.000000 NA NA NA NA
gamma_18_29_c 0.000000 NA NA NA NA
gamma_30_44_c 0.000000 NA NA NA NA
gamma_45_65_c 0.000000 NA NA NA NA
gamma_65plus_c 0.000000 NA NA NA NA
gamma_use_freq_1_c 0.000000 NA NA NA NA
gamma_use_freq_2_c 0.000000 NA NA NA NA
gamma_use_freq_3_c 0.000000 NA NA NA NA
Summary of class allocation for model component :
Mean prob.
Class_1 0.2744
Class_2 0.3056
Class_3 0.4200
Model with 3 classes R code:
# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS ####
# ################################################################# #
### Load Apollo library
library(apollo)
### Initialise code
apollo_initialise()
apollo_control = list(
  modelName       = "LC_with_covariates",
  modelDescr      = "LC model with covariates",
  indivID         = "ID",
  nCores          = 16,
  outputDirectory = "output"
)
# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS ####
# ################################################################# #
### Data is already in memory as data_csv
### if data is to be loaded from a file (e.g. called data.csv),
### the code would be: database = read.csv("data.csv",header=TRUE)
database = data_csv
### Sort by individual ID so that each individual's rows are contiguous
database <- database[order(database$ID), ]
# ################################################################# #
#### DEFINE MODEL PARAMETERS ####
# ################################################################# #
### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(asc_alt1_a = 3,
asc_alt1_b = 1,
asc_alt2_a = 2.6,
asc_alt2_b = 0.5,
asc_alt3_a = 2.5,
asc_alt3_b = 0.3,
asc_alt1_c = 0,
asc_alt2_c = -3,
asc_alt3_c = -20,
asc_optout = 0,
beta_cost_a = 0,
beta_cost_b = 0,
beta_cost_c = 0,
beta_min_a = 0,
beta_min_b = 0,
beta_min_c = 0,
delta_a = -3,
gamma_gender_a = 0,
gamma_student_a = 0,
gamma_rural_a = 0,
gamma_ptsub_a = 0,
gamma_18_29_a = 0,
gamma_30_44_a = 0,
gamma_45_65_a = 0,
gamma_65plus_a = 0,
delta_b = -3,
gamma_gender_b = 0,
gamma_student_b = 0,
gamma_rural_b = 0,
gamma_ptsub_b = 0,
gamma_18_29_b = 0,
gamma_30_44_b = 0,
gamma_45_65_b = 0,
gamma_65plus_b = 0,
gamma_use_freq_1_a = 5,
gamma_use_freq_2_a = 4,
gamma_use_freq_3_a = 2,
gamma_use_freq_1_b = 0,
gamma_use_freq_2_b = 0,
gamma_use_freq_3_b = 0,
delta_c = 0,
gamma_gender_c = 0,
gamma_student_c = 0,
gamma_rural_c = 0,
gamma_ptsub_c = 0,
gamma_18_29_c = 0,
gamma_30_44_c = 0,
gamma_45_65_c = 0,
gamma_65plus_c = 0,
gamma_use_freq_1_c = 0,
gamma_use_freq_2_c = 0,
gamma_use_freq_3_c = 0)
### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("asc_optout",
                 "delta_c", "gamma_gender_c", "gamma_student_c", "gamma_rural_c", "gamma_ptsub_c",
                 "gamma_18_29_c", "gamma_30_44_c", "gamma_45_65_c", "gamma_65plus_c",
                 "gamma_65plus_a", "gamma_65plus_b",
                 "gamma_use_freq_3_c", "gamma_use_freq_2_c", "gamma_use_freq_1_c")
# ################################################################# #
#### DEFINE LATENT CLASS COMPONENTS ####
# ################################################################# #
apollo_lcPars=function(apollo_beta, apollo_inputs){
  lcpars = list()
  lcpars[["beta_cost"]] = list(beta_cost_a, beta_cost_b, beta_cost_c)
  lcpars[["asc_alt1"]]  = list(asc_alt1_a, asc_alt1_b, asc_alt1_c)
  lcpars[["asc_alt2"]]  = list(asc_alt2_a, asc_alt2_b, asc_alt2_c)
  lcpars[["asc_alt3"]]  = list(asc_alt3_a, asc_alt3_b, asc_alt3_c)
  lcpars[["beta_min"]]  = list(beta_min_a, beta_min_b, beta_min_c)

  ### Utilities of class allocation model
  V=list()
  V[["class_a"]] = delta_a + gamma_gender_a * gender + gamma_student_a * student +
    gamma_rural_a * rural + gamma_ptsub_a * ptsub +
    gamma_18_29_a * age_18_29 + gamma_30_44_a * age_30_44 + gamma_45_65_a * age_45_65 +
    gamma_use_freq_1_a * use_freq_1 + gamma_use_freq_2_a * use_freq_2 + gamma_use_freq_3_a * use_freq_3
  V[["class_b"]] = delta_b + gamma_gender_b * gender + gamma_student_b * student +
    gamma_rural_b * rural + gamma_ptsub_b * ptsub +
    gamma_18_29_b * age_18_29 + gamma_30_44_b * age_30_44 + gamma_45_65_b * age_45_65 +
    gamma_use_freq_1_b * use_freq_1 + gamma_use_freq_2_b * use_freq_2 + gamma_use_freq_3_b * use_freq_3
  V[["class_c"]] = delta_c + gamma_gender_c * gender + gamma_student_c * student +
    gamma_rural_c * rural + gamma_ptsub_c * ptsub +
    gamma_18_29_c * age_18_29 + gamma_30_44_c * age_30_44 + gamma_45_65_c * age_45_65 +
    gamma_use_freq_1_c * use_freq_1 + gamma_use_freq_2_c * use_freq_2 + gamma_use_freq_3_c * use_freq_3

  classAlloc_settings = list(
    classes   = c(class_a=1, class_b=2, class_c=3),
    utilities = V
  )

  lcpars[["pi_values"]] = apollo_classAlloc(classAlloc_settings)

  return(lcpars)
}
# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #
apollo_inputs = apollo_validateInputs()
# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #
apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))

  ### Create list of probabilities P
  P = list()

  ### Define settings for MNL model component that are generic across classes
  mnl_settings = list(
    alternatives = c(alt1=1, alt2=2, alt3=3, optout=4),
    avail        = list(alt1=avail_alt1, alt2=avail_alt2, alt3=avail_alt3, optout=1),
    choiceVar    = pref1
  )

  ### Loop over classes
  for(s in 1:3){

    ### Compute class-specific utilities
    V=list()
    V[["alt1"]]   = asc_alt1[[s]] + beta_cost[[s]]*price_alt1
    V[["alt2"]]   = asc_alt2[[s]] + beta_cost[[s]]*price_alt2
    V[["alt3"]]   = asc_alt3[[s]] + beta_cost[[s]]*price_min + beta_min[[s]]*min
    V[["optout"]] = asc_optout

    mnl_settings$utilities     = V
    mnl_settings$componentName = paste0("Class_",s)

    ### Compute within-class choice probabilities using MNL model
    P[[paste0("Class_",s)]] = apollo_mnl(mnl_settings, functionality)

    ### Take product across observation for same individual
    P[[paste0("Class_",s)]] = apollo_panelProd(P[[paste0("Class_",s)]], apollo_inputs, functionality)
  }

  ### Compute latent class model probabilities
  lc_settings = list(inClassProb = P, classProb=pi_values)
  P[["model"]] = apollo_lc(lc_settings, apollo_inputs, functionality)

  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}
# ################################################################# #
#### MODEL ESTIMATION ####
# ################################################################# #
### Estimate model
model = apollo_estimate(apollo_beta, apollo_fixed,
apollo_probabilities, apollo_inputs)
### Show output in screen
apollo_modelOutput(model)
### Save output to file(s)
apollo_saveOutput(model)