dummy coding of variables

maa033
Posts: 41
Joined: 23 Jul 2020, 14:00

dummy coding of variables

Post by maa033 »

Hi

We are dummy coding a model, and the question has arisen whether we have to exclude the utility function for the status quo (SQ) alternative, V[[altSQ]]. For each attribute we use the SQ level as the reference level, and none of the attributes vary in level within the SQ alternative.

The problem is that when we include the SQ alternative, along with two other alternatives whose attribute levels do vary across choice situations, the model doesn't converge, and we get the output below. In that model we did not include the ASC for the SQ alternative, although we usually do. So the question is: would it be a solution to exclude the SQ alternative, or alternatively to include only the ASC in the SQ alternative, given that there is no attribute level variation in this alternative?

Thank you in advance for your input.

best regards,
Margrethe

***** Function evaluation limit *****

Estimated parameters:
Estimate
a_b_SB_2 -7.34007
b_b_SB_2 4.56510
a_b_SB_3 -6.88600
b_b_SB_3 5.34108
a_b_SAL_2 4.29063
b_b_SAL_2 -3.12090
a_b_SAL_3 4.52387
b_b_SAL_3 -3.20453
a_b_JOB_2 5.81977
b_b_JOB_2 0.25630
a_b_JOB_3 5.60373
b_b_JOB_3 -0.20408
a_b_JOB_4 5.77565
b_b_JOB_4 -0.02408
a_b_JOB_5 5.41023
b_b_JOB_5 -0.89961
a_b_JOB_6 5.41988
b_b_JOB_6 -0.70856
a_b_COST_2 -18.42823
b_b_COST_2 -7.64385
a_b_COST_3 -737.81001
b_b_COST_3 -305.42652
a_b_COST_4 -1.41074
b_b_COST_4 -1.89788
a_b_COST_5 -2.37992
b_b_COST_5 -4.68789
a_b_COST_6 -45.77526
b_b_COST_6 -163.25629

Final LL: -6095.3657

WARNING: Your model did not converge properly, and some of your parameter values are tending
to +/- infinity. This could point to an identification issue. If you want to retain
these parameters in the model, you may wish to set their value(s) in apollo_beta to
the estimated value(s), include the parameter name(s) in apollo_fixed, and
re-estimate the model.
WARNING: Function evaluation limit exceeded. No covariance matrix will be computed. You may
wish to use the current estimates as starting values for a new estimation with a
higher limit for functional evaluations.
stephanehess
Site Admin
Posts: 1189
Joined: 24 Apr 2020, 16:29

Re: dummy coding of variables

Post by stephanehess »

Hi

Please show us the code too. It is impossible to find the issue from the outputs alone.

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
maa033
Posts: 41
Joined: 23 Jul 2020, 14:00

Re: dummy coding of variables

Post by maa033 »

Hi again

Yes, sorry for forgetting the code.
Here it is. I also attach the output, including the final warning, from running the attached code.

### Load Apollo library
install.packages("apollo")
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
modelName = "Mining_MXL_dummy",
modelDescr = "Mixed logit model on mining data using dummy coding",
indivID = "ID",
nCores = 4,
outputDirectory = "output MXL local")

database = read_excel("Mining_survey_long_NEW")
database = survey #%>% filter(distancecounty == 1)
database <- database[order(database$ID),]
database <- as.data.frame(database)
database <- database[database$distancecounty == 1,]

apollo_beta = c(a_b_asc = -7.0,
b_b_asc = 0,
a_b_SB_2 = -0.05,
a_b_SB_3 = -0.05,
b_b_SB_2 = 0,
b_b_SB_3 = 0,
a_b_SAL_2 = -0.05,
a_b_SAL_3 = -0.05,
b_b_SAL_2 = 0,
b_b_SAL_3 = 0,
a_b_JOB_1 = 0,
a_b_JOB_2 = 0,
a_b_JOB_3 = 0.01,
a_b_JOB_4 = 0.01,
a_b_JOB_5 = 0.01,
b_b_JOB_1 = 0,
b_b_JOB_2 = 0,
b_b_JOB_3 = 0,
b_b_JOB_4 = 0,
b_b_JOB_5 = 0,
a_b_COST = -7,
b_b_COST = -2.0)


apollo_fixed = c() #nothing is fixed at starting value

### Define random components
set.seed(1)

### Set parameters for generating draws
apollo_draws = list(
interDrawsType = "sobol",
interNDraws = 500,
interUnifDraws = c(), #only inter-heterogeneity for COST, Uniformly distributed
interNormDraws = c("draws_COST", "draws_SB_2", "draws_SB_3", "draws_SAL_2", "draws_SAL_3", "draws_JOB_1", "draws_JOB_2", "draws_JOB_3", "draws_JOB_4", "draws_JOB_5", "draws_asc")#, #for normal distribution of inter-heterogeneity
#intraDrawsType = "halton",
#intraNDraws = 0,
#intraUnifDraws = c(),
#intraNormDraws = c()
)

### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
randcoeff = list()

randcoeff[["b_COST"]] = -exp(a_b_COST + b_b_COST * draws_COST)
randcoeff[["b_SB_2"]] = a_b_SB_2 + b_b_SB_2 * draws_SB_2
randcoeff[["b_SB_3"]] = a_b_SB_3 + b_b_SB_3 * draws_SB_3
randcoeff[["b_SAL_2"]] = a_b_SAL_2 + b_b_SAL_2 * draws_SAL_2
randcoeff[["b_SAL_3"]] = a_b_SAL_3 + b_b_SAL_3 * draws_SAL_3
randcoeff[["b_JOB_1"]] = a_b_JOB_1 + b_b_JOB_1 * draws_JOB_1
randcoeff[["b_JOB_2"]] = a_b_JOB_2 + b_b_JOB_2 * draws_JOB_2
randcoeff[["b_JOB_3"]] = a_b_JOB_3 + b_b_JOB_3 * draws_JOB_3
randcoeff[["b_JOB_4"]] = a_b_JOB_4 + b_b_JOB_4 * draws_JOB_4
randcoeff[["b_JOB_5"]] = a_b_JOB_5 + b_b_JOB_5 * draws_JOB_5
randcoeff[["asc"]] = a_b_asc + b_b_asc * draws_asc

return(randcoeff)
}

### Group and validate inputs
apollo_inputs = apollo_validateInputs()


### Define model and likelihood function

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))

### Create list of probabilities P and List of utilities V
P = list()

V = list()

V[["alternative1"]] = asc+b_SB_2*SB1_20+b_SB_3*SB1_30+b_JOB_1*JOB1_10+b_JOB_2*JOB1_20+b_JOB_3*JOB1_30+b_JOB_4*JOB1_40+b_JOB_5*JOB1_50+b_SAL_2*SAL1_5+b_SAL_3*SAL1_10+b_COST*COST1
V[["alternative2"]] = b_SB_2*SB2_20+b_SB_3*SB2_30+b_JOB_1*JOB2_10+b_JOB_2*JOB2_20+b_JOB_3*JOB2_30+b_JOB_4*JOB2_40+b_JOB_5*JOB2_50+b_SAL_2*SAL2_5+b_SAL_3*SAL2_10+b_COST*COST2
V[["alternative3"]] = b_SB_2*SB3_20+b_SB_3*SB3_30+b_JOB_1*JOB3_10+b_JOB_2*JOB3_20+b_JOB_3*JOB3_30+b_JOB_4*JOB3_40+b_JOB_5*JOB3_50+b_SAL_2*SAL3_5+b_SAL_3*SAL3_10+b_COST*COST3



### Define settings for MNL model component
mnl_settings = list(
alternatives = c(alternative1=1, alternative2=2, alternative3=3),
avail = list(alternative1=1, alternative2=1, alternative3=1), #not required
choiceVar = choice,
utilities = V
)

### Compute probabilities using MNL model
P[["model"]] = apollo_mnl(mnl_settings, functionality)

### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)

### Average across inter-individual draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)

return(P)

}

model = apollo_estimate(apollo_beta, apollo_fixed,apollo_probabilities, apollo_inputs)



Preparing user-defined functions.

Testing likelihood function...

Overview of choices for MNL model component :
alternative1 alternative2 alternative3
Times available 8411.00 8411.00 8411.00
Times chosen 2489.00 3114.00 2808.00
Percentage chosen overall 29.59 37.02 33.38
Percentage chosen when available 29.59 37.02 33.38


Pre-processing likelihood function...
Creating cluster...
Preparing workers for multithreading...

Testing influence of parameters
Starting main estimation

BGW using analytic model derivatives supplied by caller...


Iterates will be written to:
output MXL local/Mining_MXL_dummy_iterations.csv
it nf F RELDF PRELDF RELDX MODEL stppar
0 1 8.795975854e+03
1 2 8.753934507e+03 4.780e-03 4.452e-03 7.95e-04 G 3.98e+01
2 3 8.478735227e+03 3.144e-02 1.509e-02 3.39e-03 G 6.03e+00
3 4 7.921624107e+03 6.571e-02 2.981e-02 6.70e-02 G -7.16e-08
4 5 7.432825033e+03 6.170e-02 1.896e-02 4.15e-02 G -4.83e-08
5 6 7.178648353e+03 3.420e-02 1.608e-02 1.18e-02 G -4.83e-11
6 7 6.711280742e+03 6.511e-02 2.859e-01 1.25e-01 S 5.51e-01
7 8 6.493101855e+03 3.251e-02 4.639e-02 1.57e-01 S -5.51e-07
8 10 6.374881708e+03 1.821e-02 2.387e-02 3.53e-02 S 4.41e-01
9 11 6.270189002e+03 1.642e-02 1.694e-02 3.77e-02 G 7.56e-02
10 12 6.254650434e+03 2.478e-03 1.205e-02 6.70e-02 G 5.23e-03
11 13 6.228709026e+03 4.148e-03 1.255e-02 1.09e-01 G 4.22e-03
12 15 6.225072729e+03 5.838e-04 9.948e-03 8.89e-02 G-S -4.22e-06
13 16 6.196661670e+03 4.564e-03 6.673e-03 2.55e-02 S 4.44e-01
14 17 6.185438797e+03 1.811e-03 4.674e-03 3.86e-02 S 9.51e-04
15 18 6.174296310e+03 1.801e-03 1.679e-03 8.83e-03 S -9.51e-04
16 19 6.174025968e+03 4.379e-05 6.278e-04 8.95e-03 S -9.51e-07
17 20 6.172022641e+03 3.245e-04 3.505e-04 4.52e-03 S 5.03e-06
18 21 6.171844798e+03 2.881e-05 7.879e-05 3.38e-03 S -4.61e-06
19 22 6.171698855e+03 2.365e-05 3.644e-05 1.76e-03 S -4.61e-06
20 23 6.171569102e+03 2.102e-05 1.722e-05 1.29e-03 S -4.61e-06
21 24 6.171461824e+03 1.738e-05 1.270e-05 2.24e-03 S -4.61e-06
22 27 6.171406442e+03 8.974e-06 1.191e-05 9.41e-04 G-S 2.52e-01
23 28 6.171381851e+03 3.985e-06 5.167e-06 1.14e-03 S 2.32e-02
24 29 6.171353871e+03 4.534e-06 3.168e-06 4.97e-04 S -2.31e-05
25 31 6.171331379e+03 3.645e-06 5.834e-06 1.01e-03 G-S 2.71e-02
26 32 6.171291236e+03 6.505e-06 3.582e-06 6.13e-04 S -2.71e-05
27 34 6.171224931e+03 1.074e-05 1.398e-05 5.83e-04 G 1.00e+00
28 35 6.171081553e+03 2.323e-05 1.604e-05 1.04e-03 G 1.08e-01
29 36 6.170408272e+03 1.091e-04 5.377e-05 6.02e-03 G 4.07e-02
30 38 6.170000649e+03 6.606e-05 1.055e-04 3.34e-03 G 8.78e-02
31 39 6.169652482e+03 5.643e-05 1.676e-04 2.69e-03 G 1.39e-01
32 40 6.169648396e+03 6.623e-07 7.520e-05 4.95e-03 S 1.23e-02
33 41 6.169365574e+03 4.584e-05 7.363e-05 2.97e-03 S 1.12e-01
34 43 6.169272239e+03 1.513e-05 1.976e-05 6.90e-04 S 8.34e-01
35 44 6.169241176e+03 5.035e-06 7.013e-06 6.10e-04 S 9.68e-02
36 45 6.169236915e+03 6.907e-07 3.037e-06 8.47e-04 S 8.65e-05
37 46 6.169228330e+03 1.392e-06 2.560e-06 3.78e-04 S -8.65e-05
38 47 6.169226481e+03 2.996e-07 1.578e-06 4.28e-04 S -8.65e-05
39 48 6.169222923e+03 5.769e-07 9.577e-07 2.34e-04 S -8.65e-05
40 49 6.169221720e+03 1.950e-07 3.250e-07 1.35e-04 S -8.65e-05
41 50 6.169221441e+03 4.526e-08 6.631e-08 5.58e-05 S -8.65e-08
42 51 6.169221400e+03 6.517e-09 1.881e-08 4.42e-05 S -8.65e-08
43 52 6.169221362e+03 6.153e-09 6.613e-09 2.21e-05 S -8.65e-08
44 53 6.169221362e+03 1.372e-10 2.495e-10 8.17e-06 S -8.65e-08
45 54 6.169221361e+03 4.743e-11 4.732e-11 2.10e-06 S -8.65e-14

***** Singular convergence *****

Estimated parameters:
Estimate
a_b_asc -6.370847
b_b_asc 4.060868
a_b_SB_2 -0.773047
a_b_SB_3 0.584889
b_b_SB_2 0.688278
b_b_SB_3 1.625482
a_b_SAL_2 -0.387136
a_b_SAL_3 0.573328
b_b_SAL_2 -0.008551
b_b_SAL_3 -2.092937
a_b_JOB_1 0.132275
a_b_JOB_2 0.141044
a_b_JOB_3 -0.077773
a_b_JOB_4 0.068794
a_b_JOB_5 0.639390
b_b_JOB_1 -0.093039
b_b_JOB_2 0.031118
b_b_JOB_3 0.012914
b_b_JOB_4 0.084082
b_b_JOB_5 -2.482313
a_b_COST -7.825042
b_b_COST -1.798416

Final LL: -6169.2214

WARNING: Estimation failed. No covariance matrix to compute.
Calculating log-likelihood at equal shares (LL(0)) for applicable models...
Calculating log-likelihood at observed shares from estimation data (LL(c)) for applicable models...
Calculating LL of each model component...
Calculating other model fit measures
INFORMATION: Your model took more than 10 minutes to estimate, so it was saved to file output MXL
local/Mining_MXL_dummy_model.rds before calculating its covariance matrix. If calculation of the
covariance matrix fails or is stopped before finishing, you can load the model up to this point using
apollo_loadModel.

Your model was estimated using the BGW algorithm. Please acknowledge this by citing Bunch et al. (1993) -
DOI 10.1145/151271.151279
stephanehess
Site Admin
Posts: 1189
Joined: 24 Apr 2020, 16:29

Re: dummy coding of variables

Post by stephanehess »

Did the MNL version work?
--------------------------------
Stephane Hess
www.stephanehess.me.uk
maa033
Posts: 41
Joined: 23 Jul 2020, 14:00

Re: dummy coding of variables

Post by maa033 »

Hi again

Sorry for the delayed response; there was a misunderstanding between me and the PhD student who runs the model. Attached is the MNL script.
Note that we also ran it with other starting values (all zeros). It does not run, and we still get the same error message.

best regards,
Margrethe

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
modelName = "Mining_MNL_pref_Dummies",
modelDescr = "Multinomial logit model on mining data, non-cost attributes dummy-coded",
indivID = "ID",
nCores = 4,
outputDirectory = "output MNL dummies")

database = survey #%>% filter(split == 1) %>% filter(distancecounty == 1)
database <- database[order(database$ID),]
database <- as.data.frame(database)



### Define model parameters

apollo_beta = c(asc_1 = -5,
b_SB_1 = -0.5,
b_SB_2 = -0.4,
b_SAL_1 = -0.5,
b_SAL_2 = -0.5,
b_JOB_1 = 0.5,
b_JOB_2 = 0.5,
b_JOB_3 = 0.5,
b_JOB_4 = 0.5,
b_JOB_5= 0.5,
b_COST = -5)


apollo_fixed = c() #nothing is fixed at starting value

### Group and validate inputs
apollo_inputs = apollo_validateInputs()


### Define model and likelihood function

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

### Attach inputs and detach after function exit
apollo_attach(apollo_beta, apollo_inputs)
on.exit(apollo_detach(apollo_beta, apollo_inputs))

### Create list of probabilities P and List of utilities V
P = list()

V = list()
V[["alternative1"]] = asc_1 + b_SB_1*SB1_1+b_JOB_1*JOB1_1+b_SAL_1*SAL1_1+b_COST*COST1
V[["alternative2"]] = b_SB_2*SB2_2+b_JOB_2*JOB2_2+b_JOB_3*JOB2_3+b_JOB_4*JOB2_4+b_JOB_5*JOB2_5+b_SAL_2*SAL2_2+b_COST*COST2
V[["alternative3"]] = b_SB_2*SB3_2+b_JOB_2*JOB3_2+b_JOB_3*JOB3_3+b_JOB_4*JOB3_4+b_JOB_5*JOB3_5+b_SAL_2*SAL3_2+b_COST*COST3



### Define settings for MNL model component
mnl_settings = list(
alternatives = c(alternative1=1, alternative2=2, alternative3=3),
avail = list(alternative1=1, alternative2=1, alternative3=1),
choiceVar = choice,
utilities = V
)

### Compute probabilities using MNL model
P[["model"]] = apollo_mnl(mnl_settings, functionality)

### Take product across observation for same individual
P = apollo_panelProd(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}

model = apollo_estimate(apollo_beta, apollo_fixed,apollo_probabilities, apollo_inputs)

apollo_saveOutput(model)
stephanehess
Site Admin
Posts: 1189
Joined: 24 Apr 2020, 16:29

Re: dummy coding of variables

Post by stephanehess »

Can you tell us more about the attributes and their levels, and how they vary across alternatives? Thanks.
--------------------------------
Stephane Hess
www.stephanehess.me.uk
maa033
Posts: 41
Joined: 23 Jul 2020, 14:00

Re: dummy coding of variables

Post by maa033 »

Yes, of course.

SB (seabed) takes 3 levels: 30 (used only in alternative 1, the SQ), and 20 and 10 (used only in alternatives 2 & 3, i.e. the non-SQ alternatives)
SAL (salmon) takes 3 levels: 10 (used only in alternative 1, the SQ), and 5 and 0 (used only in alternatives 2 & 3, i.e. the non-SQ alternatives)
JOB takes 6 levels: 50 (used only in alternative 1, the SQ), and 40, 30, 20, 10 and 0 (used only in alternatives 2 & 3, i.e. the non-SQ alternatives)
COST is not dummy coded. It takes 6 levels: 0 (SQ), 500, 1000, 2000, 3000, 4000
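
A quick way to see how the levels line up with the alternatives is to cross-tabulate an attribute against SQ status. The sketch below uses a small made-up long-format data set in base R; the `toy` data frame and its column names are assumptions for illustration, not the survey data.

```r
# Hypothetical toy data: one row per alternative in each of four choice tasks.
# Alternative 1 is the SQ; the SB levels follow the description above.
toy <- data.frame(
  alt = rep(1:3, times = 4),
  SB  = c(30, 20, 10,
          30, 10, 20,
          30, 20, 20,
          30, 10, 10)
)
toy$is_SQ <- toy$alt == 1

# Cross-tabulate attribute level against SQ status
level_by_sq <- table(SB = toy$SB, SQ = toy$is_SQ)
print(level_by_sq)

# Levels that occur in both the SQ and the non-SQ alternatives
shared_levels <- intersect(toy$SB[toy$is_SQ], toy$SB[!toy$is_SQ])
print(shared_levels)
```

An empty `shared_levels` means no SB level appears on both sides, so the effect of an SQ level cannot be separated from the effect of the SQ alternative itself.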

BR,
Margrethe
stephanehess
Site Admin
Posts: 1189
Joined: 24 Apr 2020, 16:29

Re: dummy coding of variables

Post by stephanehess »

Hi

That means your base levels are confounded with the status quo alternative. As a result, you cannot have both an ASC and a specification in which only one level per attribute is set to zero.
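
One way to see this confounding numerically is to check the rank of the columns that enter the utilities. The following is a minimal sketch in base R on made-up data (all object names are hypothetical), with an SQ constant and SB dummy coded using SB = 10 as the base.

```r
# Hypothetical toy design: one row per alternative in each of four choice tasks.
# Alternative 1 is the SQ (SB always 30); alternatives 2 and 3 take SB = 20 or 10.
design <- data.frame(
  alt = rep(1:3, times = 4),
  SB  = c(30, 20, 10,
          30, 10, 20,
          30, 20, 20,
          30, 10, 10)
)

# Columns entering the utilities: SQ constant plus SB dummies (base: SB = 10)
X <- cbind(
  asc_SQ = as.numeric(design$alt == 1),
  SB_20  = as.numeric(design$SB == 20),
  SB_30  = as.numeric(design$SB == 30)
)

# The SQ constant and the SB_30 dummy are the same column
identical_cols <- all(X[, "asc_SQ"] == X[, "SB_30"])
print(identical_cols)

# Number of parameters versus the rank of the design
rank_X <- qr(X)$rank
print(c(parameters = ncol(X), rank = rank_X))
```

Because the `asc_SQ` and `SB_30` columns coincide, only the sum of the two coefficients enters the likelihood. Taking differences between alternatives within a choice task does not change this, since identical columns stay identical.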

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
maa033
Posts: 41
Joined: 23 Jul 2020, 14:00

Re: dummy coding of variables

Post by maa033 »

Hi again

We tried removing the ASC from the utility function of alternative1 (the SQ alternative) and not using the SQ attribute level as the base level.
We also tried using the SQ attribute levels as base levels, so that alternative1 (SQ) included only the ASC.
Neither worked.
Are you saying that when the SQ attribute level is not used in the non-SQ alternatives, it is not possible to estimate a dummy coded model?
If not, how should the utility functions for the 3 alternatives be formulated?


Best regards,
Margrethe
stephanehess
Site Admin
Posts: 1189
Joined: 24 Apr 2020, 16:29

Re: dummy coding of variables

Post by stephanehess »

Your base levels are confounded with the SQ. So, for example, for SB you can only estimate one of the levels, not two.

I believe that what you would be able to do is estimate a model with an SQ constant and then a normalisation within the levels for the non-SQ alternatives, i.e. with one of the non-SQ levels of each attribute serving as the base.
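
As a rough illustration of that suggestion, the sketch below writes out the three utilities in base R for a single made-up choice task. The parameter names, the parameter values and the choice of base levels (SB = 10, SAL = 0, JOB = 0) are assumptions for illustration only. In Apollo, the same expressions would go into `V[["alternative1"]]` to `V[["alternative3"]]` inside `apollo_probabilities`, using the dummy variables in the database.

```r
# Hypothetical parameter values, for illustration only (not estimates)
beta <- c(asc_SQ   = -0.5,
          b_SB_20  =  0.3,                    # base: SB = 10
          b_SAL_5  = -0.2,                    # base: SAL = 0
          b_JOB_10 =  0.1, b_JOB_20 = 0.2,
          b_JOB_30 =  0.3, b_JOB_40 = 0.4,    # base: JOB = 0
          b_COST   = -0.0005)

# Utility of a non-SQ alternative given its attribute levels
v_nonSQ <- function(SB, SAL, JOB, COST, b) {
  unname(
    b["b_SB_20"]  * (SB  == 20) +
    b["b_SAL_5"]  * (SAL == 5)  +
    b["b_JOB_10"] * (JOB == 10) + b["b_JOB_20"] * (JOB == 20) +
    b["b_JOB_30"] * (JOB == 30) + b["b_JOB_40"] * (JOB == 40) +
    b["b_COST"]   * COST
  )
}

# One made-up choice task
V <- c(
  alternative1 = unname(beta["asc_SQ"]),   # SQ: constant only, COST = 0
  alternative2 = v_nonSQ(SB = 20, SAL = 5, JOB = 40, COST = 1000, b = beta),
  alternative3 = v_nonSQ(SB = 10, SAL = 0, JOB = 0,  COST = 500,  b = beta)
)

# MNL choice probabilities
P <- exp(V) / sum(exp(V))
print(round(P, 3))
```

With this coding, `asc_SQ` absorbs the combined effect of the SQ levels (SB = 30, SAL = 10, JOB = 50) relative to the base levels, so those levels get no separate coefficients.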
--------------------------------
Stephane Hess
www.stephanehess.me.uk