MNL for residential choices

lolabdn · Post by **lolabdn** » 09 May 2023, 09:27

Hi,

I am working on a MNL model for residential choices. I have 1000 alternatives and each represent a municipality. This work is based on a survey I have been conducting, and all the respondent declared their municipality of residence, and also other information about this choice.

I started work on my model by using the MNL_iterative_coding example. It allowed me to code the utilities with the different attributes of the municipalities. I also want to add a fixed spatial effect and some socio-demo variables (that are not related to the municipalities but to individuals). Yet, when I do it, I obtain NA for the corresponding estimates.

Is there a specific way to do it ? Is there another example I should explore ?

Thank you for your help
Lola

Post by **stephanehess** » 09 May 2023, 09:54

Hi

we really need to see your code and output in order to be able to help

Stephane

lolabdn · Post by **lolabdn** » 09 May 2023, 10:53

Hi,

Thanks for your answer
Here is my code :

Code: Select all

###################################################################
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ##
###################################################################
#### Library and data 
library(apollo) 

### Clear memory
#rm(list = ls())

#Initialise code
apollo_initialise()

#Set core control
apollo_control = list(
  modelName       = "MNL_RP",
  modelDescr      = "Simple MNL model on mode choice RP data",
  indivID         = "Obs", #Identifiant ménage
  outputDirectory = "output")

###################################################################
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ##
###################################################################

N = nrow(data)
J = 1316 #nb alternatives
K = 8 #nb attributes

database = read_rds("output/data_final_4.rds")#data_final

###################################################################
#### DEFINE MODEL PARAMETERS                                     ##
###################################################################

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(alpha = 1,
                #beta_ch = 0,
                beta_ruc = 0,
                beta_work = 0,
                beta_mont = 0,
                beta_mer = 0,
                #beta_poll = 0,
                #beta_icu = 0,
                #beta_bruit = 0,
                setNames(rep(0,K),paste0("beta_",1:K)))

### Vector with names (in quotes) of parameters to be kept fixed at their 
# starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c()


###################################################################
#### GROUP AND VALIDATE INPUTS                                   ##
###################################################################

apollo_inputs = apollo_validateInputs()
apollo_inputs$J = J # need to retain J (number of alternatives) for use
                    # inside apollo_probabilities
apollo_inputs$K = K # need to retain K (number of attributes) for use 
                    # inside apollo_probabilities


###################################################################
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ##
###################################################################

apollo_probabilities=function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as in mnl_settings, 
  #order is irrelevant
  V = list()
  # for(j in 1:apollo_inputs$J){
  #   V[[paste0("alt_",j)]] = 0
  #   for(k in 1:apollo_inputs$K) V[[paste0("alt_",j)]] = V[[paste0("alt_",j)]] + 
  #       get(paste0("beta_",k))*get(paste0("x_",j,"_",k))
  # }
  
  for(j in 1:apollo_inputs$J){
    V[[paste0("alt_",j)]] = 
       alpha +
       #beta_ch * nb_chambres + #taille_log 
       beta_ruc * log_ruc +
       beta_work * log_dwork +
       beta_mont * vue_montagne +
       beta_mer * vue_mer +
       #beta_poll * pollution +
       #beta_icu * icu_1 +
       #beta_bruit * bruit +
      get(paste0("beta_",1))*get(paste0("x_",j,"_",1)) + #prix m2
      get(paste0("beta_",2))*get(paste0("x_",j,"_",2)) + #log(pop)
      get(paste0("beta_",3))*get(paste0("x_",j,"_",3)) + #densite
      get(paste0("beta_",4))*get(paste0("x_",j,"_",4)) + #jobs
      get(paste0("beta_",5))*get(paste0("x_",j,"_",5)) + #taille men
      get(paste0("beta_",6))*get(paste0("x_",j,"_",6)) + #parc logements
      get(paste0("beta_",7))*get(paste0("x_",j,"_",7)) + #ecoles
      get(paste0("beta_",8))*get(paste0("x_",j,"_",8)) #distance cbd
  }
  
  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = setNames(1:apollo_inputs$J, names(V)), 
    avail         = setNames(apollo_inputs$database[,paste0("av_",1:apollo_inputs$J)], names(V)),
    choiceVar     = Choice, #Alternative choisie 
    utilities     = V
  )
  
  ### Compute probabilities using MNL model
  P[["model"]] = apollo_mnl(mnl_settings, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

###################################################################
#### MODEL ESTIMATION                                            ##
###################################################################

model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, apollo_inputs)

And the output :

Code: Select all

Calculating log-likelihood at equal shares (LL(0)) for applicable
  models...
Calculating log-likelihood at observed shares from estimation data
  (LL(c)) for applicable models...
Calculating LL of each model component...
Calculating other model fit measures
Computing covariance matrix using analytical gradient.
 0%....25%....50%....75%....100%
WARNING: Some eigenvalues of the Hessian are positive, indicating
  convergence to a saddle point!
Computing score matrix...
Warning message:
In sqrt(diag(varcov)) : NaNs produced

Model run by blandin using Apollo 0.2.8 on R 4.1.0 for Windows.
www.ApolloChoiceModelling.com

Model name                                  : MNL_RP
Model description                           : Simple MNL model on mode choice RP data
Model run at                                : 2023-05-09 11:40:23
Estimation method                           : bfgs
Model diagnosis                             : successful convergence 
Number of individuals                       : 1871
Number of rows in database                  : 1871
Number of modelled outcomes                 : 1871

Number of cores used                        :  1 
Model without mixing

LL(start)                                   : -7319.4
LL at equal shares, LL(0)                   : -7319.4
LL at observed shares, LL(C)                : -12959.74
LL(final)                                   : -7119.83
Rho-squared vs equal shares                  :  0.0273 
Adj.Rho-squared vs equal shares              :  0.0256 
Rho-squared vs observed shares               :  0.4506 
Adj.Rho-squared vs observed shares           :  0.4497 
AIC                                         :  14263.66 
BIC                                         :  14330.07 

Estimated parameters                        :  12
Time taken (hh:mm:ss)                       :  00:05:9.09 
     pre-estimation                         :  00:02:42 
     estimation                             :  00:00:51.72 
     post-estimation                        :  00:01:35.38 
Iterations                                  :  16  
Min abs eigenvalue of Hessian               :  627.5277 
Some eigenvalues of Hessian are positive, indicating potential
  problems!

Unconstrained optimisation.

These outputs have had the scaling used in estimation applied to them.
Estimates:
             Estimate        s.e.   t.rat.(0)    Rob.s.e. Rob.t.rat.(0)
beta_ruc   -7.069e-13    0.003174  -2.228e-10    0.069879    -1.012e-11
beta_work  -3.324e-14         NaN         NaN    1.261392    -2.635e-14
beta_mont  -3.640e-14         NaN         NaN    5.128068    -7.098e-15
beta_mer   -3.212e-14    0.242989  -1.322e-13   10.101880    -3.180e-15
beta_1      -0.046473    0.021337     -2.1780    0.018858       -2.4644
beta_2       0.827930    0.099239      8.3428    0.113729        7.2799
beta_3      -0.070419    0.012101     -5.8190    0.011746       -5.9952
beta_4      -0.001739  3.3729e-04     -5.1558  3.3722e-04       -5.1570
beta_5      -1.637894    0.155710    -10.5189    0.154312      -10.6142
beta_6      -0.739536    0.108651     -6.8066    0.122292       -6.0473
beta_7       0.013343    0.001704      7.8289    0.001796        7.4290
beta_8      -0.002197    0.003162     -0.6949    0.003004       -0.7313

If I use different socio-demo variables, sometimes there are no NA...

Thank you again

Lola

dpalma · Post by **dpalma** » 18 May 2023, 13:47

Hi Lola,

From looking at your code, I reckon that your deterministic utility functions are as follows:

V_j = a + b_z*z + b_x*x_j

Where j is the index of the alternative, and z are sociodemographic characteristics of the respondent.

The problem with this formulation is that z variables are the same across all alternatives, so when you do (for example) V_1 - V_2, b_z cancels out. As you are probably aware, only differences in utility matter in MNL models, so your b_z parameters (beta_ruc, beta_wor, beta_mont, and beta_mer) would not be identified. And these are precisely the parameters that are causing problems in your results. For the same reason you do not need the alpha parameter either.

The most traditional way to introduce sociodemographics in a model like yours would be to interact them with the b_x coefficients. So for example, people currently with view to the sea might have a different sensibility to price than other people. Yo would then do something like:

Code: Select all

V[[paste0("alt_",j)]] = 
      (get(paste0("beta_",1)) + beta_mer * vue_mer)*get(paste0("x_",j,"_",1)) + #prix m2
      get(paste0("beta_",2))*get(paste0("x_",j,"_",2)) + #log(pop)
      get(paste0("beta_",3))*get(paste0("x_",j,"_",3)) + #densite
      get(paste0("beta_",4))*get(paste0("x_",j,"_",4)) + #jobs
      get(paste0("beta_",5))*get(paste0("x_",j,"_",5)) + #taille men
      get(paste0("beta_",6))*get(paste0("x_",j,"_",6)) + #parc logements
      get(paste0("beta_",7))*get(paste0("x_",j,"_",7)) + #ecoles
      get(paste0("beta_",8))*get(paste0("x_",j,"_",8)) #distance cbd
  }

Section 2.5 from Train's book [Train (2009) Discrete choice methods with simulation, Cambridge University Press] contains more details on model identification.

Best wishes
David

lolabdn · Post by **lolabdn** » 02 Jun 2023, 13:35

Hi,

Thank you so much for your help David!

Best wishes
Lola

ApolloChoiceModelling forum

MNL for residential choices

MNL for residential choices

Re: MNL for residential choices

Re: MNL for residential choices

Re: MNL for residential choices

Re: MNL for residential choices