Doubts about model specification with an alternative vs. status quo

Post by **lucamariani96** » 10 Oct 2025, 11:32

Hello!

First of all, thanks Prof. Hess and Prof. Palma for creating such a useful package and for being so helpful towards us normal people who use it.

I am writing here to ask about a DCE that I have created and that I am trying to analyse using Apollo.

I have to admit that statistics is not my greatest strength so that's where my doubts come from. So forgive me for asking possibly dumb questions.

Here's the idea: we ask people whether they want to invest in a public project described by four attributes, or whether they prefer the default choice (i.e. not investing in anything and staying with the status quo).

The attributes are: A, the cost of the project (continuous variable); B, the CO2 savings (three levels, which I am not sure whether to treat as continuous or categorical); C, the type of construction (three types, so categorical); D, an attribute which takes the values 50% and 100%. The second alternative (no project) logically has all attributes set to 0, as it is the status quo.

So, my doubts are the following:
1) In this case, is an MNL model still a valid choice? Or would a logit model still be suitable, or even better?
2) Is it better to code the CO2 levels as a continuous or a categorical variable? Note that the three levels are equally spaced.
3) How do I add socio-demographic variables to the model? Is it correct to interact them with the two alternatives? E.g. the two coefficients for gender would tell me how much being female increases the chance of choosing the first alternative, and the same for the second alternative.
4) Is it correct to define the utility functions as in the following code?

```r
library(apollo)

### Initialise Apollo
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName           = "DCE",
  modelDescr          = "dcetest",
  indivID             = "RID",
  outputDirectory     = "output",
  estimationRoutine   = "ml",
  nCores              = 4
)

### NOTE: the data must already be loaded into an object called 'database'
### at this point (the loading step is not shown in this post)

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(
  asc_opt1 = 0,
  asc_opt2 = 0,
  b_A = 0,
  b_B = 0,
  b_D = 0,
  b_C1 = 0,
  b_C2 = 0,
  b_C3 = 0
)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("asc_opt2","b_C3")

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))

  ### Create list of probabilities P
  P = list()

  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  ### Project alternative: constant, attributes A, B, D and the dummies for attribute C
  V[['opt1']] = asc_opt1 +
    b_A * A_opt1 +
    b_B * B_opt1 +
    b_D * D_opt1 +
    b_C1 * C_C1_opt1 +
    b_C2 * C_C2_opt1 +
    b_C3 * C_C3_opt1

  ### Status quo: all attributes are zero, so only the (fixed) constant remains
  V[['opt2']] = asc_opt2

  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = c(opt1=1, opt2=0),
    avail         = 1,
    choiceVar     = Choice,
    V             = V
  )

  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)

  ### Take product across observations for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)

  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)
```

So basically, in the utility function for the second alternative I only included the ASC and not the attributes, as they are all set to zero. I set the ASC for opt2 as the baseline, and the third level of attribute C as the baseline so that I can compare the other levels to it (e.g. type A is 50% more preferred than type C).

Final question: does it all make sense to you guys?

I would really appreciate any help from you about this.

Thanks in advance for any answer you might give me.

Post by **stephanehess** » 13 Oct 2025, 08:14

Hi

in relation to your questions:

1) In this case, is an MNL model still a valid choice? Or would a logit model still be suitable, or even better?

==> MNL is a logit model. So it's not clear what you mean here
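
For reference, a sketch of the standard MNL choice probability and its two-alternative special case. The notation is not from the thread: $V_j$ is the utility of alternative $j$ and $J$ is the number of alternatives.

```latex
P(i) = \frac{e^{V_i}}{\sum_{j=1}^{J} e^{V_j}}
\qquad\Longrightarrow\qquad
P(1)\,\Big|_{J=2} = \frac{e^{V_1}}{e^{V_1} + e^{V_2}} = \frac{1}{1 + e^{-(V_1 - V_2)}}
```

With two alternatives the expression on the right is the familiar binary logit probability, written in terms of the utility difference.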

2) Is it better to code the CO2 levels as a continuous or a categorical variable? Note that the three levels are equally spaced.

==> you could try as categorical first to see if you spot non-linearity and then decide depending on that
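
A minimal sketch of what the categorical coding could look like in R. The column name `B_opt1` comes from the utility function in the first post; the level values 0.05, 0.40 and 0.75, the new column names and the toy data are assumptions for illustration only.

```r
# Toy data: B_opt1 holds the CO2 saving shown for the project alternative.
# The level values (0.05, 0.40, 0.75) are assumed, not taken from the real design.
database <- data.frame(B_opt1 = c(0.05, 0.40, 0.75, 0.40, 0.05))

# One 0/1 dummy per level; the lowest level is left out as the base.
# The column names B_mid_opt1 and B_high_opt1 are hypothetical.
database$B_mid_opt1  <- as.numeric(database$B_opt1 == 0.40)
database$B_high_opt1 <- as.numeric(database$B_opt1 == 0.75)

print(database)
```

In the utility function, `b_B * B_opt1` would then be replaced by something like `b_B_mid * B_mid_opt1 + b_B_high * B_high_opt1`, with both (hypothetically named) parameters added to `apollo_beta`.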

3) How do I add socio-demographic variables to the model? Is it correct to interact them with the two alternatives? E.g. the two coefficients for gender would tell me how much being female increases the chance of choosing the first alternative, and the same for the second alternative.

==> this depends on whether you want to capture whether the likelihood of choosing the first alternative per se differs across socio-demographic groups (in which case interacting with the constants in this way makes sense) or whether you also expect differences in how they react to the attributes. Or maybe both happen. You can have a look at https://apollochoicemodelling.com/files ... variates.r
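
The two kinds of interaction can be sketched in plain R, outside `apollo_probabilities`. The dummy `female`, the parameter names `asc_opt1_shift_female` and `b_A_shift_female`, and all numeric values are hypothetical and are not estimates.

```r
# Hypothetical parameter values, purely for illustration (not estimates).
asc_opt1              <- 0.5
asc_opt1_shift_female <- 0.2    # shift in the project constant for women
b_A                   <- -0.1
b_A_shift_female      <- -0.05  # shift in cost sensitivity for women

# Toy observations: same cost, one male and one female respondent.
A_opt1 <- c(10, 10)
female <- c(0, 1)

# (a) socio-demographic interacted with the constant
asc_opt1_n <- asc_opt1 + asc_opt1_shift_female * female
# (b) socio-demographic interacted with an attribute coefficient
b_A_n <- b_A + b_A_shift_female * female

V_opt1 <- asc_opt1_n + b_A_n * A_opt1
V_opt2 <- 0  # status quo, constant fixed to zero

# Probability of choosing the project for each respondent
P_opt1 <- exp(V_opt1) / (exp(V_opt1) + exp(V_opt2))
print(P_opt1)
```

Because only differences in utility matter, a shift like `asc_opt1_shift_female` can enter one of the two utilities only; a separate gender coefficient for each alternative would not be identified.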

Best wishes

Stephane

Post by **lucamariani96** » 13 Oct 2025, 15:41

Hello Stephane,
thanks for your answer and your invaluable help.
Regarding the questions:
1) I posed the question poorly. My doubt is about using a logit model (so a binary choice) versus a multinomial logit (a choice among three or more alternatives). In my case the choice is between two alternatives, so I guess that makes it a binary logit. I was wondering whether a command for a simple logit model (say, glm) would have been more suitable, and whether the estimation process differs between the two (glm vs Apollo).

2) Okay, I will. How would you suggest I check for linearity? By looking at the coefficients (e.g. if the one for 0.05 is 1, the one for 0.40 is 8 and the one for 0.75 is 15, can I assume it is linear?), or is there something more complicated that I can't think of?

3) Yes, I was thinking about the second option. I'll look at that guide, thanks.

To close, can I ask whether you notice any particular problem in this model specification? Is there anything you would suggest I improve?

Thank you so much again and I send you my best regards.

Luca

Post by **stephanehess** » 23 Oct 2025, 14:36

Hi

1) there is no difference between binary and multinomial. It's the same formula, the denominator is a sum across all alternatives. So no concern there.
2) you could plot them
3) great, let us know if you need help
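
One way to do the plot in point 2, as a sketch in base R. The levels and coefficient values below are the made-up numbers from Luca's example above, not real estimates; in practice the coefficients would come from the estimated categorical model, with the base level at zero.

```r
# Made-up values from the example in the thread (not estimates).
levels_B <- c(0.05, 0.40, 0.75)
coef_B   <- c(1, 8, 15)

# Visual check: points lying on a straight line suggest a linear effect.
plot(levels_B, coef_B, type = "b",
     xlab = "CO2 saving level", ylab = "Estimated coefficient")

# Numerical check: slope between each pair of neighbouring levels.
slopes <- diff(coef_B) / diff(levels_B)
print(slopes)
```

Roughly equal slopes point towards a linear (continuous) specification, while clearly different slopes point towards keeping the categorical coding. Standard errors matter here too, so a likelihood ratio test between the two specifications is a more formal complement to the plot.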

Stephane

ApolloChoiceModelling forum
