Important: Read this before posting to this forum

  1. This forum is for questions related to the use of Apollo. We will answer some general choice modelling questions too, where appropriate, and time permitting. We cannot answer questions about how to estimate choice models with other software packages.
  2. There is a very detailed manual for Apollo available at http://www.ApolloChoiceModelling.com/manual.html. This contains detailed descriptions of the various Apollo functions, and numerous examples are available at http://www.ApolloChoiceModelling.com/examples.html. In addition, help files are available for all functions, using e.g. ?apollo_mnl
  3. Before asking a question on the forum, users are kindly requested to follow these steps:
    1. Check that the same issue has not already been addressed in the forum - there is a search tool.
    2. Ensure that the correct syntax has been used. For any function, detailed instructions are available directly in Apollo, e.g. by using ?apollo_mnl for apollo_mnl
    3. Check the frequently asked questions section on the Apollo website, which discusses some common issues/failures. Please see http://www.apollochoicemodelling.com/faq.html
    4. Make sure that R is using the latest official release of Apollo.
  4. If the above steps do not resolve the issue, then users should follow these steps when posting a question:
    1. provide full details on the issue, including the entire code and output, including any error messages
    2. posts will not immediately appear on the forum, but will be checked by a moderator first. We check the forum at least twice a week. It may thus take a couple of days for your post to appear and before we reply. There is no need to submit the post multiple times.

Huge difference if estimated with or without dummy variables

Ask questions about the results reported after estimation. If the output includes errors, please include your model code if possible.
Post Reply
kristof
Posts: 3
Joined: 09 Aug 2024, 11:34

Huge difference if estimated with or without dummy variables

Post by kristof »

Hello,

I do a classical mode choice experiment for my master thesis.
For some parameters I use dummy variables (number of changes, seat availability and days with congestion). I wanted to test the model with and without the dummy variables. Surprisingly, the estimates are very different, especially for parameters without dummy variables used in both cases. Is there an error in the code?

Estimation result without dummy variables:

Code: Select all

===============================================================
           Estimate Std.Error t_value p_value CI_lower CI_upper
---------------------------------------------------------------
asc_car       0                                                
asc_bus     0.568     0.095    5.983     0     0.382    0.754  
asc_train   1.252     0.120   10.471     0     1.017    1.486  
b_c_car     -0.228    0.010   -22.436    0     -0.248   -0.208 
b_c_bus     -0.202    0.007   -28.497    0     -0.216   -0.188 
b_c_train   -0.167    0.009   -18.137    0     -0.186   -0.149 
b_tt_car    -0.030    0.003   -9.669     0     -0.036   -0.024 
b_tt_bus    -0.051    0.002   -22.417    0     -0.056   -0.047 
b_tt_train  -0.057    0.002   -23.986    0     -0.062   -0.053 
b_wt_bus    0.028     0.016    1.783   0.075   -0.003   0.059  
b_wt_train  -0.041    0.009   -4.772  0.00000  -0.057   -0.024 
b_r_car     -0.088    0.016   -5.597  0.00000  -0.119   -0.057 
b_r_bus     -0.088    0.019   -4.574  0.00000  -0.125   -0.050 
b_r_train   -0.202    0.018   -11.342    0     -0.237   -0.167 
b_x_bus     -0.424    0.021   -20.279    0     -0.465   -0.383 
b_x_train   -0.175    0.034   -5.119  0.00000  -0.242   -0.108 
b_h_bus     -0.012    0.001   -12.069    0     -0.014   -0.010 
b_h_train   -0.008    0.001   -11.284    0     -0.010   -0.007 
b_s_bus     0.080     0.029    2.818   0.005   0.024    0.136  
b_s_train   0.002     0.025    0.073   0.942   -0.047   0.050 
Estimation results with dummy variables:

Code: Select all

================================================================
            Estimate Std.Error t_value p_value CI_lower CI_upper
----------------------------------------------------------------
asc_car        0                                                
asc_bus      -0.444    0.110   -4.029  0.0001   -0.660   -0.228 
asc_train    0.535     0.137    3.901  0.0001   0.266    0.804  
b_c_car      -0.149    0.013   -11.056    0     -0.175   -0.123 
b_c_bus      -0.139    0.010   -14.320    0     -0.158   -0.120 
b_c_train    -0.092    0.011   -8.419     0     -0.113   -0.070 
b_tt_car     -0.022    0.003   -6.478     0     -0.029   -0.016 
b_tt_bus     -0.052    0.003   -15.738    0     -0.058   -0.045 
b_tt_train   -0.033    0.003   -11.032    0     -0.039   -0.027 
b_wt_bus     0.115     0.018    6.477     0     0.080    0.149  
b_wt_train   -0.048    0.009   -5.276  0.00000  -0.065   -0.030 
b_r_car_1    -0.446    0.045   -9.921     0     -0.534   -0.358 
b_r_car_3    -0.309    0.050   -6.182     0     -0.407   -0.211 
b_r_bus_1    0.683     0.087    7.874     0     0.513    0.853  
b_r_bus_3    0.118     0.078    1.512   0.130   -0.035   0.271  
b_r_train_1  -0.648    0.053   -12.200    0     -0.752   -0.544 
b_r_train_3  -0.461    0.058   -7.913     0     -0.576   -0.347 
b_x_bus_1    0.186     0.059    3.142   0.002   0.070    0.302  
b_x_bus_2    -0.872    0.085   -10.203    0     -1.040   -0.705 
b_x_bus_3    -1.154    0.079   -14.594    0     -1.309   -0.999 
b_x_train_1  0.350     0.084    4.180  0.00003  0.186    0.515  
b_x_train_2  0.286     0.095    3.002   0.003   0.099    0.472  
b_x_train_3  0.027     0.124    0.216   0.829   -0.216   0.270  
b_h_bus     -0.0004    0.002   -0.236   0.813   -0.004   0.003  
b_h_train    -0.014    0.001   -15.304    0     -0.016   -0.012 
b_s_bus_1    -0.457    0.067   -6.803     0     -0.588   -0.325 
b_s_bus_2    0.199     0.067    2.961   0.003   0.067    0.331  
b_s_train_1  0.021     0.045    0.457   0.647   -0.068   0.110  
b_s_train_2  0.157     0.052    3.034   0.002   0.055    0.258  
----------------------------------------------------------------
The full R code without dummy variables:

Code: Select all

setwd("D:/2 Studium/Masterarbeit/R/Modell")

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  = "MNL",
  modelDescr = "MNL Logit Masterarbeit",
  indivID    = "ID",    # Identifikationsvariable für Personen
  panelData  = TRUE     # Panel-Daten werden verwendet
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

database = read.csv("Choices_Panel_Data.csv", header = TRUE)

### Create Dummy Variables

# X_BUS
#database$X_BUS_1 <- ifelse(database$X_BUS == 1, 1, 0)
#database$X_BUS_2 <- ifelse(database$X_BUS == 2, 1, 0)
#database$X_BUS_3 <- ifelse(database$X_BUS == 3, 1, 0)

# X_TRAIN
#database$X_TRAIN_1 <- ifelse(database$X_TRAIN == 1, 1, 0)
#database$X_TRAIN_2 <- ifelse(database$X_TRAIN == 2, 1, 0)
#database$X_TRAIN_3 <- ifelse(database$X_TRAIN == 3, 1, 0)

# S_BUS
#database$S_BUS_1 <- ifelse(database$S_BUS == 1, 1, 0)
#database$S_BUS_2 <- ifelse(database$S_BUS == 2, 1, 0)

# S_TRAIN
#database$S_TRAIN_1 <- ifelse(database$S_TRAIN == 1, 1, 0)
#database$S_TRAIN_2 <- ifelse(database$S_TRAIN == 2, 1, 0)

# R_CAR
#database$R_CAR_1 <- ifelse(database$R_CAR == 1, 1, 0)
#database$R_CAR_3 <- ifelse(database$R_CAR == 3, 1, 0)

# R_BUS
#database$R_BUS_1 <- ifelse(database$R_BUS == 1, 1, 0)
#database$R_BUS_3 <- ifelse(database$R_BUS == 3, 1, 0)

# R_TRAIN
#database$R_TRAIN_1 <- ifelse(database$R_TRAIN == 1, 1, 0)
#database$R_TRAIN_3 <- ifelse(database$R_TRAIN == 3, 1, 0)

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(
  asc_car       = 0,
  asc_bus       = 0,
  asc_train     = 0,
  b_c_car       = 0,
  b_c_bus       = 0,
  b_c_train     = 0,
  b_tt_car      = 0,
  b_tt_bus      = 0,
  b_tt_train    = 0,
  b_wt_bus      = 0,
  b_wt_train    = 0,
  b_r_car       = 0,
  b_r_bus       = 0,
  b_r_train     = 0,
  #b_r_car_1     = 0,
  #b_r_car_3     = 0,
  #b_r_bus_1     = 0,
  #b_r_bus_3     = 0,
  #b_r_train_1   = 0,
  #b_r_train_3   = 0,
  b_x_bus       = 0,
  b_x_train     = 0,
  #b_x_bus_1     = 0,
  #b_x_bus_2     = 0,
  #b_x_bus_3     = 0,
  #b_x_train_1   = 0,
  #b_x_train_2   = 0,
  #b_x_train_3   = 0,
  b_h_bus       = 0,
  b_h_train     = 0,
  b_s_bus       = 0,
  b_s_train     = 0
  #b_s_bus_1     = 0,
  #b_s_bus_2     = 0,
  #b_s_train_1   = 0,
  #b_s_train_2   = 0
)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c("asc_car")

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  
  V[['car']] = 
    asc_car +
    b_c_car * C_CAR +
    b_tt_car * TT_CAR  +
    b_r_car * R_CAR
    #b_r_car_1 * R_CAR_1 +
    #b_r_car_3 * R_CAR_3
  
  V[['bus']] =
    asc_bus +
    b_c_bus * C_BUS +
    b_tt_bus * TT_BUS +
    b_wt_bus * WT_BUS +
    b_r_bus * R_BUS +
    b_x_bus * X_BUS +
    #b_r_bus_1 * R_BUS_1 +
    #b_r_bus_3 * R_BUS_3 +
    #b_x_bus_1 * X_BUS_1 +
    #b_x_bus_2 * X_BUS_2 +
    #b_x_bus_3 * X_BUS_3 +
    b_h_bus * H_BUS +
    b_s_bus * S_BUS
    #b_s_bus_1 * S_BUS_1 +
    #b_s_bus_2 * S_BUS_2
  
  V[['train']] =
    asc_train +
    b_c_train * C_TRAIN +
    b_tt_train * TT_TRAIN +
    b_wt_train * WT_TRAIN + 
    b_r_train * R_TRAIN +
    b_x_train * X_TRAIN +
    #b_r_train_1 * R_TRAIN_1 +
    #b_r_train_3 * R_TRAIN_3 +
    #b_x_train_1 * X_TRAIN_1 +
    #b_x_train_2 * X_TRAIN_2 +
    #b_x_train_3 * X_TRAIN_3 +
    b_h_train * H_TRAIN + 
    b_s_train * S_TRAIN
    #b_s_train_1 * S_TRAIN_1 +
    #b_s_train_2 * S_TRAIN_2
  
  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = c(car = 1, bus = 2, train = 3), 
    avail         = list(car = AV_CAR, bus = AV_BUS, train = AV_TRAIN), 
    choiceVar     = CHOICE,
    V             = V
  )
  
  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality) 
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

# ################################################################# #
#### MODEL OUTPUTS                                               ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN)                               ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model, list(printPVal = 2))

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO FILE, using model name)               ----
# ----------------------------------------------------------------- #

apollo_saveOutput(model)

##########

# Extract estimates
estimates <- model$estimate

# Calculate utilities
V_car   <- estimates['asc_car'] +
  estimates['b_c_car'] * database$C_CAR +
  estimates['b_tt_car'] * database$TT_CAR +
  estimates['b_r_car'] * database$R_CAR
  #estimates['b_r_car_1'] * database$R_CAR_1 +
  #estimates['b_r_car_3'] * database$R_CAR_3

V_bus   <- estimates['asc_bus'] +
  estimates['b_c_bus'] * database$C_BUS +
  estimates['b_tt_bus'] * database$TT_BUS +
  estimates['b_wt_bus'] * database$WT_BUS +
  estimates['b_r_bus'] * database$R_BUS +
  estimates['b_x_bus'] * database$X_BUS +
  #estimates['b_r_bus_1'] * database$R_BUS_1 +
  #estimates['b_r_bus_3'] * database$R_BUS_3 +
  #estimates['b_x_bus_1'] * database$X_BUS_1 +
  #estimates['b_x_bus_2'] * database$X_BUS_2 +
  #estimates['b_x_bus_3'] * database$X_BUS_3 +
  estimates['b_h_bus'] * database$H_BUS +
  estimates['b_s_bus'] * database$S_BUS
  #estimates['b_s_bus_1'] * database$S_BUS_1 +
  #estimates['b_s_bus_2'] * database$S_BUS_2

V_train <- estimates['asc_train'] +
  estimates['b_c_train'] * database$C_TRAIN +
  estimates['b_tt_train'] * database$TT_TRAIN +
  estimates['b_wt_train'] * database$WT_TRAIN +
  estimates['b_r_train'] * database$R_TRAIN +
  estimates['b_x_train'] * database$X_TRAIN +
  #estimates['b_r_train_1'] * database$R_TRAIN_1 +
  #estimates['b_r_train_3'] * database$R_TRAIN_3 +
  #estimates['b_x_train_1'] * database$X_TRAIN_1 +
  #estimates['b_x_train_2'] * database$X_TRAIN_2 +
  #estimates['b_x_train_3'] * database$X_TRAIN_3 +
  estimates['b_h_train'] * database$H_TRAIN +
  estimates['b_s_train'] * database$S_TRAIN
  #estimates['b_s_train_1'] * database$S_TRAIN_1 +
  #estimates['b_s_train_2'] * database$S_TRAIN_2

# Calculate exponential utilities
exp_V_car   <- exp(V_car)
exp_V_bus   <- exp(V_bus)
exp_V_train <- exp(V_train)

# Calculate total utility
sum_exp_V <- exp_V_car + exp_V_bus + exp_V_train

# Calculate choice probabilities
P_car <- exp_V_car / sum_exp_V
P_bus <- exp_V_bus / sum_exp_V
P_train <- exp_V_train / sum_exp_V

# Save utilities and probabilities in the dataset
database$V_car <- V_car
database$V_bus <- V_bus
database$V_train <- V_train
database$P_car <- P_car
database$P_bus <- P_bus
database$P_train <- P_train

# Summarize the results
summary(database[, c('V_car', 'V_bus', 'V_train', 'P_car', 'P_bus', 'P_train')])

##########

# Extract relevant coefficients from estimates
model_results <- data.frame(
  estimate = estimates[names(estimates) != "call"],
  SE = sapply(estimates[names(estimates) != "call"], sd),  # Assuming standard errors are available
  statistic = "z",
  p.value = 2 * pnorm(-abs(estimates[names(estimates) != "call"] / sapply(estimates[names(estimates) != "call"], sd)))  # Assuming z-statistics
)

# Use stargazer with the created data frame

# Extrahiere Schätzungen und Standardfehler aus dem Modell
estimates <- model$estimate
se <- model$se

# Benenne die Spalten
names(estimates) <- gsub("`", "", names(estimates)) # Entfernt Backticks aus den Namen, falls vorhanden
names(se) <- gsub("`", "", names(se))

# Berechne t-Werte und p-Werte
t_values <- estimates / se
p_values <- 2 * (1 - pnorm(abs(t_values)))

# Berechne 95% Konfidenzintervalle
ci_lower <- estimates - 1.96 * se
ci_upper <- estimates + 1.96 * se

# Erstelle einen Datenrahmen der Schätzungen, Standardfehler, t-Werte, p-Werte und Konfidenzintervalle
results_df <- data.frame(
  Estimate = estimates,
  Std.Error = se,
  t_value = t_values,
  p_value = p_values,
  CI_lower = ci_lower,
  CI_upper = ci_upper
)

# Benutze stargazer, um die Ergebnisse anzuzeigen
library(stargazer)
stargazer(results_df, type = "text", summary = FALSE)
The full code with dummy variables:

Code: Select all

setwd("D:/2 Studium/Masterarbeit/R/Modell")

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  = "MNL",
  modelDescr = "MNL Logit Masterarbeit",
  indivID    = "ID",    # Identifikationsvariable für Personen
  panelData  = TRUE     # Panel-Daten werden verwendet
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

database = read.csv("Choices_Panel_Data.csv", header = TRUE)

#database <- subset(database, STADT == 1)

#database$STADT <- as.numeric(database$STADT)
#database = subset(database,database$STADT==1)

### Create Dummy Variables

# X_BUS
database$X_BUS_1 <- ifelse(database$X_BUS == 1, 1, 0)
database$X_BUS_2 <- ifelse(database$X_BUS == 2, 1, 0)
database$X_BUS_3 <- ifelse(database$X_BUS == 3, 1, 0)

# X_TRAIN
database$X_TRAIN_1 <- ifelse(database$X_TRAIN == 1, 1, 0)
database$X_TRAIN_2 <- ifelse(database$X_TRAIN == 2, 1, 0)
database$X_TRAIN_3 <- ifelse(database$X_TRAIN == 3, 1, 0)

# S_BUS
database$S_BUS_1 <- ifelse(database$S_BUS == 1, 1, 0)
database$S_BUS_2 <- ifelse(database$S_BUS == 2, 1, 0)

# S_TRAIN
database$S_TRAIN_1 <- ifelse(database$S_TRAIN == 1, 1, 0)
database$S_TRAIN_2 <- ifelse(database$S_TRAIN == 2, 1, 0)

# R_CAR
database$R_CAR_1 <- ifelse(database$R_CAR == 1, 1, 0)
database$R_CAR_3 <- ifelse(database$R_CAR == 3, 1, 0)

# R_BUS
database$R_BUS_1 <- ifelse(database$R_BUS == 1, 1, 0)
database$R_BUS_3 <- ifelse(database$R_BUS == 3, 1, 0)

# R_TRAIN
database$R_TRAIN_1 <- ifelse(database$R_TRAIN == 1, 1, 0)
database$R_TRAIN_3 <- ifelse(database$R_TRAIN == 3, 1, 0)

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(
  asc_car       = 0,
  asc_bus       = 0,
  asc_train     = 0,
  b_c_car       = 0,
  b_c_bus       = 0,
  b_c_train     = 0,
  b_tt_car      = 0,
  b_tt_bus      = 0,
  b_tt_train    = 0,
  b_wt_bus      = 0,
  b_wt_train    = 0,
  b_r_car_1     = 0,
  b_r_car_3     = 0,
  b_r_bus_1     = 0,
  b_r_bus_3     = 0,
  b_r_train_1   = 0,
  b_r_train_3   = 0,
  b_x_bus_1     = 0,
  b_x_bus_2     = 0,
  b_x_bus_3     = 0,
  b_x_train_1   = 0,
  b_x_train_2   = 0,
  b_x_train_3   = 0,
  b_h_bus       = 0,
  b_h_train     = 0,
  b_s_bus_1     = 0,
  b_s_bus_2     = 0,
  b_s_train_1   = 0,
  b_s_train_2   = 0
)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c("asc_car")

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  
  V[['car']] = 
    asc_car +
    b_c_car * C_CAR +
    b_tt_car * TT_CAR  +
    b_r_car_1 * R_CAR_1 +
    b_r_car_3 * R_CAR_3
  
  V[['bus']] =
    asc_bus +
    b_c_bus * C_BUS +
    b_tt_bus * TT_BUS +
    b_wt_bus * WT_BUS + 
    b_r_bus_1 * R_BUS_1 +
    b_r_bus_3 * R_BUS_3 +
    b_x_bus_1 * X_BUS_1 +
    b_x_bus_2 * X_BUS_2 +
    b_x_bus_3 * X_BUS_3 +
    b_h_bus * H_BUS +
    b_s_bus_1 * S_BUS_1 +
    b_s_bus_2 * S_BUS_2
  
  V[['train']] =
    asc_train +
    b_c_train * C_TRAIN +
    b_tt_train * TT_TRAIN +
    b_wt_train * WT_TRAIN + 
    b_r_train_1 * R_TRAIN_1 +
    b_r_train_3 * R_TRAIN_3 +
    b_x_train_1 * X_TRAIN_1 +
    b_x_train_2 * X_TRAIN_2 +
    b_x_train_3 * X_TRAIN_3 +
    b_h_train * H_TRAIN + 
    b_s_train_1 * S_TRAIN_1 +
    b_s_train_2 * S_TRAIN_2
  
  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = c(car = 1, bus = 2, train = 3), 
    avail         = list(car = AV_CAR, bus = AV_BUS, train = AV_TRAIN), 
    choiceVar     = CHOICE,
    V             = V
  )
  
  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality) 
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

# ################################################################# #
#### MODEL OUTPUTS                                               ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN)                               ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model, list(printPVal = 2))

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO FILE, using model name)               ----
# ----------------------------------------------------------------- #

apollo_saveOutput(model)

##########

# Extract estimates
estimates <- model$estimate

# Calculate utilities
V_car   <- estimates['asc_car'] +
  estimates['b_c_car'] * database$C_CAR +
  estimates['b_tt_car'] * database$TT_CAR + 
  estimates['b_r_car_1'] * database$R_CAR_1 +
  estimates['b_r_car_3'] * database$R_CAR_3

V_bus   <- estimates['asc_bus'] +
  estimates['b_c_bus'] * database$C_BUS +
  estimates['b_tt_bus'] * database$TT_BUS +
  estimates['b_wt_bus'] * database$WT_BUS + 
  estimates['b_r_bus_1'] * database$R_BUS_1 +
  estimates['b_r_bus_3'] * database$R_BUS_3 +
  estimates['b_x_bus_1'] * database$X_BUS_1 +
  estimates['b_x_bus_2'] * database$X_BUS_2 +
  estimates['b_x_bus_3'] * database$X_BUS_3 +
  estimates['b_h_bus'] * database$H_BUS +  
  estimates['b_s_bus_1'] * database$S_BUS_1 +
  estimates['b_s_bus_2'] * database$S_BUS_2

V_train <- estimates['asc_train'] +
  estimates['b_c_train'] * database$C_TRAIN +
  estimates['b_tt_train'] * database$TT_TRAIN +
  estimates['b_wt_train'] * database$WT_TRAIN + 
  estimates['b_r_train_1'] * database$R_TRAIN_1 +
  estimates['b_r_train_3'] * database$R_TRAIN_3 +
  estimates['b_x_train_1'] * database$X_TRAIN_1 +
  estimates['b_x_train_2'] * database$X_TRAIN_2 +
  estimates['b_x_train_3'] * database$X_TRAIN_3 +
  estimates['b_h_train'] * database$H_TRAIN +
  estimates['b_s_train_1'] * database$S_TRAIN_1 +
  estimates['b_s_train_2'] * database$S_TRAIN_2

# Calculate exponential utilities
exp_V_car   <- exp(V_car)
exp_V_bus   <- exp(V_bus)
exp_V_train <- exp(V_train)

# Calculate total utility
sum_exp_V <- exp_V_car + exp_V_bus + exp_V_train

# Calculate choice probabilities
P_car <- exp_V_car / sum_exp_V
P_bus <- exp_V_bus / sum_exp_V
P_train <- exp_V_train / sum_exp_V

# Save utilities and probabilities in the dataset
database$V_car <- V_car
database$V_bus <- V_bus
database$V_train <- V_train
database$P_car <- P_car
database$P_bus <- P_bus
database$P_train <- P_train

# Summarize the results
summary(database[, c('V_car', 'V_bus', 'V_train', 'P_car', 'P_bus', 'P_train')])

##########

# Extract relevant coefficients from estimates
model_results <- data.frame(
  estimate = estimates[names(estimates) != "call"],
  SE = sapply(estimates[names(estimates) != "call"], sd),  # Assuming standard errors are available
  statistic = "z",
  p.value = 2 * pnorm(-abs(estimates[names(estimates) != "call"] / sapply(estimates[names(estimates) != "call"], sd)))  # Assuming z-statistics
)

# Use stargazer with the created data frame

# Extrahiere Schätzungen und Standardfehler aus dem Modell
estimates <- model$estimate
se <- model$se

# Benenne die Spalten
names(estimates) <- gsub("`", "", names(estimates)) # Entfernt Backticks aus den Namen, falls vorhanden
names(se) <- gsub("`", "", names(se))

# Berechne t-Werte und p-Werte
t_values <- estimates / se
p_values <- 2 * (1 - pnorm(abs(t_values)))

# Berechne 95% Konfidenzintervalle
ci_lower <- estimates - 1.96 * se
ci_upper <- estimates + 1.96 * se

# Erstelle einen Datenrahmen der Schätzungen, Standardfehler, t-Werte, p-Werte und Konfidenzintervalle
results_df <- data.frame(
  Estimate = estimates,
  Std.Error = se,
  t_value = t_values,
  p_value = p_values,
  CI_lower = ci_lower,
  CI_upper = ci_upper
)

# Benutze stargazer, um die Ergebnisse anzuzeigen
library(stargazer)
stargazer(results_df, type = "text", summary = FALSE)
stephanehess
Site Admin
Posts: 1355
Joined: 24 Apr 2020, 16:29

Re: Huge difference if estimated with or without dummy variables

Post by stephanehess »

Hi

can you please add the model fit to your output?

Thanks

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
kristof
Posts: 3
Joined: 09 Aug 2024, 11:34

Re: Huge difference if estimated with or without dummy variables

Post by kristof »

Yes of course.

Model fit with Dummies:

Code: Select all

Model name                                  : MNL
Model description                           : MNL Logit Masterarbeit
Model run at                                : 2024-08-12 12:42:56.945836
Estimation method                           : bgw
Model diagnosis                             : Relative function convergence
Optimisation diagnosis                      : Maximum found
     hessian properties                     : Negative definite
     maximum eigenvalue                     : -19.837196
     reciprocal of condition number         : 1.08227e-06
Number of individuals                       : 891
Number of rows in database                  : 21384
Number of modelled outcomes                 : 21384

Number of cores used                        :  1 
Model without mixing

LL(start)                                   : -23492.73
LL at equal shares, LL(0)                   : -23492.73
LL at observed shares, LL(C)                : -23188.03
LL(final)                                   : -21339.12
Rho-squared vs equal shares                  :  0.0917 
Adj.Rho-squared vs equal shares              :  0.0905 
Rho-squared vs observed shares               :  0.0797 
Adj.Rho-squared vs observed shares           :  0.0786 
AIC                                         :  42734.24 
BIC                                         :  42957.41 

Estimated parameters                        : 28
Time taken (hh:mm:ss)                       :  00:00:7.55 
     pre-estimation                         :  00:00:0.48 
     estimation                             :  00:00:1.26 
          initial estimation                :  00:00:0.93 
          estimation after rescaling        :  00:00:0.33 
     post-estimation                        :  00:00:5.81 
Iterations                                  :  11  
     initial estimation                     :  10 
     estimation after rescaling             :  1 
     
Model fit without Dummies:

Code: Select all

Model name                                  : MNL
Model description                           : MNL Logit Masterarbeit
Model run at                                : 2024-08-12 12:44:25.882344
Estimation method                           : bgw
Model diagnosis                             : Relative function convergence
Optimisation diagnosis                      : Maximum found
     hessian properties                     : Negative definite
     maximum eigenvalue                     : -50.670775
     reciprocal of condition number         : 2.79803e-06
Number of individuals                       : 891
Number of rows in database                  : 21384
Number of modelled outcomes                 : 21384

Number of cores used                        :  1 
Model without mixing

LL(start)                                   : -23492.73
LL at equal shares, LL(0)                   : -23492.73
LL at observed shares, LL(C)                : -23188.03
LL(final)                                   : -21552.33
Rho-squared vs equal shares                  :  0.0826 
Adj.Rho-squared vs equal shares              :  0.0818 
Rho-squared vs observed shares               :  0.0705 
Adj.Rho-squared vs observed shares           :  0.0698 
AIC                                         :  43142.67 
BIC                                         :  43294.11 

Estimated parameters                        : 19
Time taken (hh:mm:ss)                       :  00:00:4.13 
     pre-estimation                         :  00:00:0.37 
     estimation                             :  00:00:0.83 
          initial estimation                :  00:00:0.55 
          estimation after rescaling        :  00:00:0.29 
     post-estimation                        :  00:00:2.93 
Iterations                                  :  10  
     initial estimation                     :  9 
     estimation after rescaling             :  1 
stephanehess
Site Admin
Posts: 1355
Joined: 24 Apr 2020, 16:29

Re: Huge difference if estimated with or without dummy variables

Post by stephanehess »

Hi

so there's nothing to worry about here. The model with the categorical treatment is clearly better. There is evidence of non-linearity in your data, and the linear treatment couldn't handle that of course

Hope that helps

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
kristof
Posts: 3
Joined: 09 Aug 2024, 11:34

Re: Huge difference if estimated with or without dummy variables

Post by kristof »

Hi Stephane,

thank you very much for your great support!

So the model with the dummy variables is working fine - which means if some estimates are not plausible or not significant, I should exclude them in the estimation or in the utility function?

Kristof
stephanehess
Site Admin
Posts: 1355
Joined: 24 Apr 2020, 16:29

Re: Huge difference if estimated with or without dummy variables

Post by stephanehess »

If you remove parameters, you will lose model fit. So it's probably not a good idea in general. and it may also bias other parameters, if the original parameter was statistically different from zero
--------------------------------
Stephane Hess
www.stephanehess.me.uk
Post Reply