Could the LL of a full model be worse than that for the reduced model?

Post by arohamirai »

Dear Stephane and David,

Thank you very much for developing the Apollo package and for creating this forum where users can communicate.

I recently came across a question in choice modelling using the Apollo package. The setting of my DCE scenario is as follows: there is a health attribute, and it is described as either an improvement or a deterioration from the current health situation. The model setting is: I have an asymmetric model in which one parameter (i.e., health) is split into two according to whether the attribute is described as an improvement (i.e., health_increase) or a deterioration (i.e., health_decrease) from a reference point. The idea is similar to Hess et al. (2008). The reduced model is the symmetric model, where the health parameter is specified as linear regardless of its position relative to the reference point.

Theoretically, the two models should be nested, as the asymmetric model can be restricted to the symmetric model by imposing the constraint beta[health_increase] = -beta[health_decrease]. The LL (log-likelihood) for the asymmetric model is therefore expected to be at least as good as that for the symmetric model when both models are specified as basic MNL, and I did empirically find this using MNL.
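
A minimal sketch of this nesting argument in R (hypothetical values, and assuming health_increase and health_decrease are both coded as positive deviations from the reference point):

### Imposing b_inc = -b_dec collapses the asymmetric utility into the symmetric one
V_asym = function(b_inc, b_dec, health_inc, health_dec) b_inc*health_inc + b_dec*health_dec
V_sym = function(b, health) b*health  # where health = health_inc - health_dec
b = 0.3; inc = 2; dec = 1
all.equal(V_asym(b, -b, inc, dec), V_sym(b, inc - dec))  # TRUE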

HOWEVER, I expected this to also hold (i.e., the LL for the asymmetric model being better than that for the symmetric model) when the two models are specified as RPL, with the health parameters following normal distributions. But the result suggests that the LL for the asymmetric model (in RPL form) is WORSE. This is a bit of a surprise, because given that the two models are nested, I thought the model fit of the asymmetric model should be better in any case (although perhaps not significantly better).

I don't know if this is because the two models are no longer nested once they are specified as RPL (e.g., because the draws for the linear health parameter in the symmetric model and for the health_increase and health_decrease parameters in the asymmetric model come from different independent normal distributions), or because of estimation issues (e.g., one of the models reaching a local maximum, or my needing to try different distributions). Code is attached below. Thank you very much.


Reference
Hess, S., Rose, J. M., & Hensher, D. A. (2008). Asymmetric preference formation in willingness to pay estimates in discrete choice models. Transportation Research Part E: Logistics and Transportation Review, 44(5), 847-863.


Code

Asymmetric model

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
modelName ="RPL_asymmetric",
modelDescr ="RPL_asymmetric",
mixing = TRUE,
indivID ="ID",
nCores = 3
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS ####
# ################################################################# #

database = read.csv("R_T3.csv",header=TRUE)


##renames##
names(database)[names(database)=="id"] <- "ID"
database= subset(database, database$same_answersG==0&same_answersQ==0)
#
names(database)[names(database)=="E_h_1_inc"]<-"h_1_inc"
names(database)[names(database)=="E_h_2_inc"]<-"h_2_inc"
names(database)[names(database)=="E_h_1_dec"]<-"h_1_dec"
names(database)[names(database)=="E_h_2_dec"]<-"h_2_dec"
# ################################################################# #
#### DEFINE MODEL PARAMETERS ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(asc_1 = 0,
                asc_3_mu = 0,
                asc_3_sig = 0,
                h_inc_mu = 0,
                h_inc_sig = 0,
                h_dec_mu = 0,
                h_dec_sig = 0,
                v_mu = 0,
                v_sig = 0,
                c = 0)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("asc_1")


# ################################################################# #
#### DEFINE RANDOM COMPONENTS ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
interDrawsType="mlhs",
interNDraws=500,
interUnifDraws=c(),
interNormDraws=c("draws_v",
"draws_h_inc","draws_h_dec","draws_asc_3"),

intraDrawsType="mlhs",
intraNDraws=0,
intraUnifDraws=c(),
intraNormDraws=c()
)

### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
randcoeff = list()
randcoeff[["hinc"]] = h_inc_mu + h_inc_sig*draws_h_inc
randcoeff[["hdec"]] = h_dec_mu + h_dec_sig*draws_h_dec
randcoeff[["v"]] = v_mu + v_sig*draws_v
randcoeff[["asc_3"]] = asc_3_mu + asc_3_sig*draws_asc_3
return(randcoeff)
}

# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))

### Create list of probabilities P
P = list()

### Compute utilities
V = list()
V[['alt1']] = asc_1 + hinc*h_1_inc + hdec*h_1_dec + v*v_1 + c*c_1
V[['alt2']] = asc_1 + hinc*h_2_inc + hdec*h_2_dec + v*v_2 + c*c_2
V[['alt3']] = asc_3 + v*v_3 + c*c_3


### Define settings for MNL model component
mnl_settings = list(
  alternatives = c(alt1=1, alt2=2, alt3=3),
  avail = list(alt1=1, alt2=1, alt3=1),
  choiceVar = choice,
  V = V
)

### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)

### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)

### Average across inter-individual draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}

# ################################################################# #
#### MODEL ESTIMATION ####
# ################################################################# #

### Optional starting values search
# apollo_beta=apollo_searchStart(apollo_beta, apollo_fixed,apollo_probabilities, apollo_inputs)

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)


# ################################################################# #
#### MODEL OUTPUTS ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN) ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model)


Symmetric model

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
modelName ="RPL_symmetric",
modelDescr ="RPL_symmetric",
mixing = TRUE,
indivID ="ID",
nCores = 3
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS ####
# ################################################################# #

database = read.csv("R_T3.csv",header=TRUE)


##renames##
names(database)[names(database)=="id"] <- "ID"
database= subset(database, database$same_answersG==0&same_answersQ==0)
#
names(database)[names(database)=="E_h_1_inc"]<-"h_1_inc"
names(database)[names(database)=="E_h_2_inc"]<-"h_2_inc"
names(database)[names(database)=="E_h_1_dec"]<-"h_1_dec"
names(database)[names(database)=="E_h_2_dec"]<-"h_2_dec"
# ################################################################# #
#### DEFINE MODEL PARAMETERS ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(asc_1 = 0,
                asc_3_mu = 0,
                asc_3_sig = 0,
                h_mu = 0,  # h_mu and h_sig (rather than a single h) to match apollo_randCoeff below
                h_sig = 0,
                v_mu = 0,
                v_sig = 0,
                c = 0)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("asc_1")


# ################################################################# #
#### DEFINE RANDOM COMPONENTS ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
interDrawsType="mlhs",
interNDraws=500,
interUnifDraws=c(),
interNormDraws=c("draws_v",
"draws_h","draws_asc_3"),

intraDrawsType="mlhs",
intraNDraws=0,
intraUnifDraws=c(),
intraNormDraws=c()
)

### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
randcoeff = list()
randcoeff[["h"]] = h_mu + h_sig*draws_h
randcoeff[["v"]] = v_mu + v_sig*draws_v
randcoeff[["asc_3"]] = asc_3_mu + asc_3_sig*draws_asc_3
return(randcoeff)
}

# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))

### Create list of probabilities P
P = list()

### Compute utilities
V = list()
V[['alt1']] = asc_1 + h*h_1 + v*v_1 + c*c_1
V[['alt2']] = asc_1 + h*h_2 + v*v_2 + c*c_2
V[['alt3']] = asc_3 + h*h_3 + v*v_3 + c*c_3


### Define settings for MNL model component
mnl_settings = list(
  alternatives = c(alt1=1, alt2=2, alt3=3),
  avail = list(alt1=1, alt2=1, alt3=1),
  choiceVar = choice,
  V = V
)

### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)

### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)

### Average across inter-individual draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}

# ################################################################# #
#### MODEL ESTIMATION ####
# ################################################################# #

### Optional starting values search
# apollo_beta=apollo_searchStart(apollo_beta, apollo_fixed,apollo_probabilities, apollo_inputs)

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)


# ################################################################# #
#### MODEL OUTPUTS ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN) ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model)


----------
Tim

Re: Could the LL of a full model be worse than that for the reduced model?

Post by stephanehess »

Tim

could you please also share the output?

Thanks
--------------------------------
Stephane Hess
www.stephanehess.me.uk

Re: Could the LL of a full model be worse than that for the reduced model?

Post by arohamirai »

stephanehess wrote (01 Jul 2020, 13:17):
Tim

could you please also share the output?

Thanks
Hi Stephane,

Thank you. The model I posted above actually serves as an example; I would like your expert view on the underlying question (i.e., whether the two RPL models are nested or not). My actual model is a bit more complicated, which is why I asked the question using a simplified example. Sorry for the confusion.

My actual model, which has the same issue, is as follows: again, there are a reduced model and a full model. In the full model, parameters are split into two according to whether the (health) attribute is specified with a 20% probability of achieving the goal or a 90% probability of achieving the goal; in the reduced model, the attribute is specified as linear regardless of the probability. I said my actual model is more complicated because I apply asymmetric specifications in both models and account for scale differences (as I have combined two data sets). But I think the only difference between the full and the reduced models is whether the parameter is split into two or not, according to the presentation of the probability in the choice cards.

In a similar fashion to the simplified example in my previous post, the full and the reduced models are theoretically nested as MNL models. I have tested this empirically using MNL, and LL(full) is better than LL(reduced); this is all as expected. HOWEVER, LL(full) is WORSE than LL(reduced) when using RPL. I attach my code below, and the estimation results can be found in the attachment.

In the estimation results, hinc_T3 is split into two parameters in the full model, namely hinc_20 and hinc_90, and the same holds for the hdec_T3 parameter, but the LL (shaded red) for the full model is worse than that for the reduced model. I expected them to be the other way around. I note that the models may no longer be nested when using RPL; I'm wondering if this is the case, or whether there are other reasons that could cause the issue? Thank you very much.



Code for the reduced model

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
modelName ="RPL_treatment-specific",
modelDescr ="RPL_treatment-specific",
indivID ="id1",
nCores = 3,
mixing = TRUE,
noDiagnostics = TRUE
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS ####
# ################################################################# #

database = read.csv("R_T1T3.csv",header=TRUE)


##renames##
names(database)[names(database)=="id"] <- "ID"

database= subset(database, database$same_answersG==0&same_answersQ==0)
#
names(database)[names(database)=="E_h_1_inc"]<-"h_1_inc"
names(database)[names(database)=="E_h_2_inc"]<-"h_2_inc"
names(database)[names(database)=="E_h_1_dec"]<-"h_1_dec"
names(database)[names(database)=="E_h_2_dec"]<-"h_2_dec"

database$certain=ifelse(database$T3_dummy==0, 1, 0)
database$uncertain=ifelse(database$T3_dummy==1, 1, 0)

##create treatment-specific variables##
database$h_1_inc_T3=database$h_1_inc*database$T3_dummy
database$h_1_inc_T1=database$h_1_inc*(1-database$T3_dummy)
database$h_1_dec_T3=database$h_1_dec*database$T3_dummy
database$h_1_dec_T1=database$h_1_dec*(1-database$T3_dummy)
database$h_2_inc_T3=database$h_2_inc*database$T3_dummy
database$h_2_inc_T1=database$h_2_inc*(1-database$T3_dummy)
database$h_2_dec_T3=database$h_2_dec*database$T3_dummy
database$h_2_dec_T1=database$h_2_dec*(1-database$T3_dummy)

# ################################################################# #
#### DEFINE MODEL PARAMETERS ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(asc_I = 0,
                asc_mu_III = -0.2333,
                asc_sig_III = 0,
                hinc_mu_T1 = 0.4488,
                hinc_sig_T1 = 0,
                hinc_mu_T3 = 0.8375,
                hinc_sig_T3 = 0,
                hdec_mu_T1 = -0.3152,
                hdec_sig_T1 = 0,
                hdec_mu_T3 = -0.1770,
                hdec_sig_T3 = 0,
                v_mu = -0.0728,
                v_sig = 0,
                c = -0.0001,
                Uncert_scale = 1,
                Cert_scale = 1)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c( "asc_I","Cert_scale")


# ################################################################# #
#### DEFINE RANDOM COMPONENTS ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
interDrawsType = "mlhs",
interNDraws = 500,
interUnifDraws = c(),
interNormDraws = c("draws_v","draws_hdec_T1","draws_hdec_T3","draws_hinc_T1","draws_hinc_T3","draws_asc"


),
intraDrawsType = "halton",
intraNDraws = 0,
intraUnifDraws = c(),
intraNormDraws = c()
)

### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
randcoeff = list()

randcoeff[["asc_III"]] = asc_mu_III + asc_sig_III*draws_asc

randcoeff[["v"]] = v_mu + v_sig*draws_v
randcoeff[["hinc_T1"]] = hinc_mu_T1 + hinc_sig_T1*draws_hinc_T1
randcoeff[["hinc_T3"]] = hinc_mu_T3 + hinc_sig_T3*draws_hinc_T3
randcoeff[["hdec_T1"]] = hdec_mu_T1 + hdec_sig_T1*draws_hdec_T1
randcoeff[["hdec_T3"]] = hdec_mu_T3 + hdec_sig_T3*draws_hdec_T3

return(randcoeff)
}


# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))

### Create list of probabilities P
P = list()



### Compute utilities
V=list()
V[['alt1']] = ((exp(Uncert_scale)*uncertain)+(Cert_scale*certain))*(asc_I +
hinc_T1*h_1_inc_T1 + hdec_T1*h_1_dec_T1 +
hinc_T3*h_1_inc_T3 + hdec_T3*h_1_dec_T3 +
v*v_1 + c*c_1)

V[['alt2']] = ((exp(Uncert_scale)*uncertain)+(Cert_scale*certain))* (asc_I +
hinc_T1*h_2_inc_T1 + hdec_T1*h_2_dec_T1 +
hinc_T3*h_2_inc_T3 + hdec_T3*h_2_dec_T3 +
v*v_2 + c*c_2)

V[['alt3']] = ((exp(Uncert_scale)*uncertain)+(Cert_scale*certain))*(asc_III + v*v_3 + c*c_3)

### Define settings for MNL model component
mnl_settings = list(
  alternatives = c(alt1=1, alt2=2, alt3=3),
  avail = list(alt1=1, alt2=1, alt3=1),
  choiceVar = choice,
  V = V
)


### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)

### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)

### Average across inter-individual draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}

# ################################################################# #
#### MODEL ESTIMATION ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed,
apollo_probabilities, apollo_inputs)

# ################################################################# #
#### MODEL OUTPUTS ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN) ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model)


Code for the full model

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
modelName ="RPL_prob-specific",
modelDescr ="RPL_prob-specific",
indivID ="id1",
nCores = 3,
mixing = TRUE,
noDiagnostics = TRUE
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS ####
# ################################################################# #

database = read.csv("R_T1T3.csv",header=TRUE)




##renames##
names(database)[names(database)=="id"] <- "ID"

database= subset(database, database$same_answersG==0&same_answersQ==0)
#
names(database)[names(database)=="E_h_1_inc"]<-"h_1_inc"
names(database)[names(database)=="E_h_2_inc"]<-"h_2_inc"
names(database)[names(database)=="E_h_1_dec"]<-"h_1_dec"
names(database)[names(database)=="E_h_2_dec"]<-"h_2_dec"

database$certain=ifelse(database$T3_dummy==0, 1, 0)
database$uncertain=ifelse(database$T3_dummy==1, 1, 0)

##create treatment-specific variables##
database$h_1_inc_T3=database$h_1_inc*database$T3_dummy
database$h_1_inc_T1=database$h_1_inc*(1-database$T3_dummy)
database$h_1_dec_T3=database$h_1_dec*database$T3_dummy
database$h_1_dec_T1=database$h_1_dec*(1-database$T3_dummy)
database$h_2_inc_T3=database$h_2_inc*database$T3_dummy
database$h_2_inc_T1=database$h_2_inc*(1-database$T3_dummy)
database$h_2_dec_T3=database$h_2_dec*database$T3_dummy
database$h_2_dec_T1=database$h_2_dec*(1-database$T3_dummy)

##create prob-specific variables##
database$h_1_inc_20=ifelse(database$h_1==3|database$h_1==5.5|database$h_1==8,database$h_1_inc,0)
database$h_2_inc_20=ifelse(database$h_2==3|database$h_2==5.5|database$h_2==8,database$h_2_inc,0)
database$h_1_inc_90=ifelse(database$h_1==10.8|database$h_1==11.3|database$h_1==11.9,database$h_1_inc,0)
database$h_2_inc_90=ifelse(database$h_2==10.8|database$h_2==11.3|database$h_2==11.9,database$h_2_inc,0)
database$h_1_dec_20=ifelse(database$h_1==18|database$h_1==20.5|database$h_1==23,database$h_1_dec,0)
database$h_2_dec_20=ifelse(database$h_2==18|database$h_2==20.5|database$h_2==23,database$h_2_dec,0)
database$h_1_dec_90=ifelse(database$h_1==14.1|database$h_1==14.7|database$h_1==15.2,database$h_1_dec,0)
database$h_2_dec_90=ifelse(database$h_2==14.1|database$h_2==14.7|database$h_2==15.2,database$h_2_dec,0)


##rescaling

database$c_1= database$c_1/100
database$c_2= database$c_2/100
database$c_3= database$c_3/100
# ################################################################# #
#### DEFINE MODEL PARAMETERS ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(asc_I = 0,
                asc_mu_III = -0.2370,
                asc_sig_III = 0,
                hinc_mu_T1 = 0.4488,
                hinc_sig_T1 = 0,
                hinc_mu_20 = 0,
                hinc_sig_20 = 0,
                hinc_mu_90 = 0,
                hinc_sig_90 = 0,
                hdec_mu_T1 = -0.3152,
                hdec_sig_T1 = 0,
                hdec_mu_20 = 0,
                hdec_sig_20 = 0,
                hdec_mu_90 = 0,
                hdec_sig_90 = 0,
                v_mu = -0.0728,
                v_sig = 0,
                c = 0,
                Uncert_scale = 1,
                Cert_scale = 1)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c( "asc_I","Cert_scale")


# ################################################################# #
#### DEFINE RANDOM COMPONENTS ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
interDrawsType = "mlhs",
interNDraws = 500,
interUnifDraws = c(),
interNormDraws = c("draws_v","draws_asc",
"draws_hdec_T1","draws_hdec_20","draws_hdec_90",
"draws_hinc_T1","draws_hinc_20","draws_hinc_90"


),
intraDrawsType = "halton",
intraNDraws = 0,
intraUnifDraws = c(),
intraNormDraws = c()
)

### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
randcoeff = list()
randcoeff[["asc_III"]] = asc_mu_III + asc_sig_III*draws_asc
randcoeff[["v"]] = v_mu + v_sig*draws_v
randcoeff[["hinc_T1"]] = hinc_mu_T1 + hinc_sig_T1*draws_hinc_T1
randcoeff[["hinc_20"]] = hinc_mu_20 + hinc_sig_20*draws_hinc_20
randcoeff[["hinc_90"]] = hinc_mu_90 + hinc_sig_90*draws_hinc_90
randcoeff[["hdec_T1"]] = hdec_mu_T1 + hdec_sig_T1*draws_hdec_T1
randcoeff[["hdec_20"]] = hdec_mu_20 + hdec_sig_20*draws_hdec_20
randcoeff[["hdec_90"]] = hdec_mu_90 + hdec_sig_90*draws_hdec_90
return(randcoeff)
}


# ################################################################# #
#### GROUP AND VALIDATE INPUTS ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))

### Create list of probabilities P
P = list()



### Compute utilities
V=list()
V[['alt1']] = ((exp(Uncert_scale)*uncertain)+(Cert_scale*certain))*(asc_I +
hinc_T1*h_1_inc_T1 + hdec_T1*h_1_dec_T1 +
hinc_20*h_1_inc_20 + hdec_20*h_1_dec_20 +
hinc_90*h_1_inc_90 + hdec_90*h_1_dec_90 +
v*v_1 + c*c_1)

V[['alt2']] = ((exp(Uncert_scale)*uncertain)+(Cert_scale*certain))* (asc_I +
hinc_T1*h_2_inc_T1 + hdec_T1*h_2_dec_T1 +
hinc_20*h_2_inc_20 + hdec_20*h_2_dec_20 +
hinc_90*h_2_inc_90 + hdec_90*h_2_dec_90 +
v*v_2 + c*c_2)

V[['alt3']] = ((exp(Uncert_scale)*uncertain)+(Cert_scale*certain))*(asc_III + v*v_3 + c*c_3)

### Define settings for MNL model component
mnl_settings = list(
  alternatives = c(alt1=1, alt2=2, alt3=3),
  avail = list(alt1=1, alt2=1, alt3=1),
  choiceVar = choice,
  V = V
)


### Compute probabilities using MNL model
P[['model']] = apollo_mnl(mnl_settings, functionality)

### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)

### Average across inter-individual draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}

# ################################################################# #
#### MODEL ESTIMATION ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed,
apollo_probabilities, apollo_inputs)

# ################################################################# #
#### MODEL OUTPUTS ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN) ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model)

-----
Tim
Attachments
reduced vs full.png

Re: Could the LL of a full model be worse than that for the reduced model?

Post by stephanehess »

Hi Tim

let us assume you have the following situation:

model 1:

V = beta_x * x + ...

model 2:

V = beta_xinc * x_inc + beta_xdec * x_dec + ...

where x_inc = (x>x0)*(x-x0) and x_dec = (x<x0)*(x-x0)

Note that in this notation, x_dec will be negative, which will help us with the comparison.

As only differences in utility matter, a model where beta_xinc=beta_xdec will give you the same LL as model 1
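
Spelling out the collapse: x_inc + x_dec = (x>x0)*(x-x0) + (x<x0)*(x-x0) = x - x0, so imposing beta_xinc = beta_xdec = beta_x gives V = beta_x*(x - x0) + ... = beta_x*x - beta_x*x0 + ... Since x0 is the same reference for all alternatives, the beta_x*x0 term drops out of all utility differences, leaving exactly model 1.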

This is what you confirm seeing in the MNL models. In your MMNL models, I can see two reasons why this might not happen:

1. you have ended up in a poor local optimum
2. the approximation given by draws you are using is different for the two betas

An easy way to test what is wrong is to take your results from model 1 and use them as starting values in model 2, i.e. for both the increase and decrease parameters. If you haven't taken the differences in the same direction as I did above, then you may need to apply a sign change to one of the betas. Your LL at these starting values should then be the same as in model 1, and if estimation improves on it from there, that would confirm a poor local optimum as what has happened. If not, then you have a different issue somewhere, either in the code or due to the draws.
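
Using the parameter names from the simplified example in the first post, such a test could look as follows (a sketch only; model_symmetric is assumed to be the estimated reduced model still in memory, and the sign change assumes the increase and decrease variables are both coded as positive deviations):

est = model_symmetric$estimate  # estimates from the reduced (symmetric) model
apollo_beta = c(asc_1 = 0,
                asc_3_mu = est[["asc_3_mu"]],
                asc_3_sig = est[["asc_3_sig"]],
                h_inc_mu = est[["h_mu"]],
                h_inc_sig = est[["h_sig"]],
                h_dec_mu = -est[["h_mu"]],  # sign change on the decrease side
                h_dec_sig = est[["h_sig"]],
                v_mu = est[["v_mu"]],
                v_sig = est[["v_sig"]],
                c = est[["c"]])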

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk

Re: Could the LL of a full model be worse than that for the reduced model?

Post by arohamirai »

Dear Stephane,

Thank you very much indeed. That is a great help. I finally found that I didn't use the same draws for the two betas of x_inc and x_dec.

One more question on this: what is the rationale behind the decision to use the same draws for different parameters? Is it because we believe that the x_inc and x_dec parameters relate to the same attribute x, so people should have consistent preferences towards this attribute (e.g., if it is a health attribute, those who like to see health improvement should also dislike seeing health deterioration)? On the other hand, if we believe people have different preferences for two parameters, say a health improvement from a low to a medium level and an improvement from a low to a high level, can we then assume two independent sets of draws for beta(low to medium) and beta(low to high) respectively?

Thank you again for your help.


Tim

Re: Could the LL of a full model be worse than that for the reduced model?

Post by stephanehess »

Tim

I wasn't necessarily implying that you should use the same draws for increases and decreases, but that this could be a reason for the difference in fit. Using different draws arguably makes more sense, but let's try and understand what you're seeing a little better.

1. What draws are you using, and how many?
2. How big a difference in fit are you seeing between the two models?

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk

Re: Could the LL of a full model be worse than that for the reduced model?

Post by arohamirai »

Hi Stephane,

For the models I posted in my second post:

1. I use 500 MLHS draws from a normal distribution for all the mentioned analyses where random parameters are assumed.

2. Using different draws for the two parameters, the LL is -3960 for the reduced model and -4021 for the full model. This is counter-intuitive, as shown in my previous post.

Now, using the same draws, the LL is -3951 for the full model, and the issue seems to disappear. My understanding is that if I claim the two models are nested and a likelihood ratio test is appropriate, I should use the same draws; otherwise I can still compare the models using AIC/BIC statistics, but cannot say they are nested. What do you think?

Thank you.

Tim

Re: Could the LL of a full model be worse than that for the reduced model?

Post by stephanehess »

Tim

that's a much larger difference in fit than one would expect from different draws alone, but it is in line with recent tests I've done on antithetic draws and changing the signs of sigmas. What happens when you use your model with different draws, but take the results from the reduced model as starting values (obviously with a sign change on one of the parameters)?
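
To illustrate why the sign of a sigma can matter with a finite set of draws (assumed values, unrelated to the models above): mu + sig*d and mu - sig*d imply the same normal distribution, but with a finite, non-antithetic set of draws the simulated probabilities can differ slightly.

set.seed(7)
d = rnorm(200)  # a finite set of draws
mu = 0.5; sig = 0.8; x = 1
p = function(b) exp(b*x)/(1 + exp(b*x))  # binary logit probability
mean(p(mu + sig*d))  # simulated probability with +sigma
mean(p(mu - sig*d))  # a slightly different value with the same draws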

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk

Re: Could the LL of a full model be worse than that for the reduced model?

Post by arohamirai »

Hi Stephane,

Thank you very much. Please see the attached file for the LL results of the full model with and without using the reduced model's coefficients as starting values. It seems that the problem is not that the LL ends up at a local maximum, but that I didn't use the same draws for the two parameters. However, I don't quite understand the reason behind this. What do you think?

Thank you.


Tim
Attachments
reduced vs full (LL).png

Re: Could the LL of a full model be worse than that for the reduced model?

Post by stephanehess »

Hi Tim

I've looked into your model in detail and I believe I have found the reason this is happening.

In your base specification, you use the same coefficient whether the treatment is 20 or 90. In your more general model, you use separate parameters. Now, you see that in the MNL model, the more general model gives a better fit than the model that constrains the parameters for the 20 treatment and the 90 treatment to be the same. In the MMNL model, this does not happen when you use separate draws for the two treatments.

What happens here is that, in your data, the two treatments are not specific to individual people but to individual choices. So for the same person, you can have treatment 20 apply in some tasks and treatment 90 in other tasks. Now, in a mixed logit model, the random coefficients create correlation across choices for the same person for the affected coefficient, and hence for the utilities/probabilities. In the simulated LL, this arises because the probability for all of a person's choices is calculated with a given draw and then multiplied across choices; this is done for each draw before the average is taken. But if you use two different sets of draws for the 20 treatment and the 90 treatment, then you lose the correlation for that coefficient between those choice tasks. As a result, it is not unexpected that your model fit is worse.
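
A stylised sketch of this mechanism (not Apollo internals; one random coefficient, a binary logit and assumed values): the probability of a person's choices is multiplied across tasks within each draw before averaging, which is what creates the within-person correlation.

set.seed(1)
R = 1000  # inter-individual draws
beta_r = 0.5 + 0.8*rnorm(R)  # one random coefficient
x = c(1, -1, 2)  # the same person's attribute levels across 3 tasks
p_task = function(b, x) exp(b*x)/(1 + exp(b*x))  # per-task choice probability
P_r = sapply(beta_r, function(b) prod(p_task(b, x)))  # product across tasks within each draw
log(mean(P_r))  # this person's simulated log-likelihood contribution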

As you saw, when you set the draws to be the same, the model obtains the same log-likelihood as the simpler model at the same starting values for the random terms, and then converges to a better solution. What this tells you is that it makes sense to have different means and sigmas for the 20 treatment and the 90 treatment. That is of course part of what you want to find out.

You also see that if you don't use the same draws for the two treatments, you get a worse fit. What you are doing here is comparing two extremes. Remember that your beta20 and beta90 are distributed across the sample population. The model that uses the same draws imposes that the correlation between these two distributions is 1. On the other hand, if you use separate draws, you impose that the correlation is 0. That is of course not very reasonable, as we would expect an individual who has a high sensitivity under treatment 20 to also have a high sensitivity under treatment 90.

You could of course refine your model further by doing something like this:

randcoeff[["hinc_20"]] = hinc_mu_20 + hinc_sig_20*draws_hinc_20
randcoeff[["hinc_90"]] = hinc_mu_90 + hinc_sig_20_90*draws_hinc_20 + hinc_sig_90*draws_hinc_90

In that case:
1. the standard deviation for hinc_20 is still given by hinc_sig_20.
2. The standard deviation for hinc_90 is given by sqrt(hinc_sig_20_90^2+hinc_sig_90^2)
3. The covariance between the two is hinc_sig_20 * hinc_sig_20_90

In your basic model using the same draws, you have hinc_sig_90 fixed to zero, so you have perfect correlation. In your model using separate draws, you have hinc_sig_20_90 fixed to zero, so you have no correlation.
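
A quick simulation check of points 1-3 (hypothetical sigma values):

set.seed(42)
n = 1e6
d20 = rnorm(n); d90 = rnorm(n)  # independent standard normal draws
s20 = 0.5; s2090 = 0.3; s90 = 0.4  # hypothetical sigmas
b20 = s20*d20
b90 = s2090*d20 + s90*d90
sd(b20)  # ~ s20 = 0.5
sd(b90)  # ~ sqrt(s2090^2 + s90^2) = 0.5
cov(b20, b90)  # ~ s20*s2090 = 0.15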

Does this make sense?

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk