Page 1 of 1

Huge difference if estimated with or without dummy variables

Posted: 12 Aug 2024, 08:08
by kristof
Hello,

I do a classical mode choice experiment for my master thesis.
For some parameters I use dummy variables (number of changes, seat availability and days with congestion). I wanted to test the model with and without the dummy variables. Surprisingly, the estimates are very different, especially for parameters without dummy variables used in both cases. Is there an error in the code?

Estimation result without dummy variables:

Code: Select all

===============================================================
           Estimate Std.Error t_value p_value CI_lower CI_upper
---------------------------------------------------------------
asc_car       0                                                
asc_bus     0.568     0.095    5.983     0     0.382    0.754  
asc_train   1.252     0.120   10.471     0     1.017    1.486  
b_c_car     -0.228    0.010   -22.436    0     -0.248   -0.208 
b_c_bus     -0.202    0.007   -28.497    0     -0.216   -0.188 
b_c_train   -0.167    0.009   -18.137    0     -0.186   -0.149 
b_tt_car    -0.030    0.003   -9.669     0     -0.036   -0.024 
b_tt_bus    -0.051    0.002   -22.417    0     -0.056   -0.047 
b_tt_train  -0.057    0.002   -23.986    0     -0.062   -0.053 
b_wt_bus    0.028     0.016    1.783   0.075   -0.003   0.059  
b_wt_train  -0.041    0.009   -4.772  0.00000  -0.057   -0.024 
b_r_car     -0.088    0.016   -5.597  0.00000  -0.119   -0.057 
b_r_bus     -0.088    0.019   -4.574  0.00000  -0.125   -0.050 
b_r_train   -0.202    0.018   -11.342    0     -0.237   -0.167 
b_x_bus     -0.424    0.021   -20.279    0     -0.465   -0.383 
b_x_train   -0.175    0.034   -5.119  0.00000  -0.242   -0.108 
b_h_bus     -0.012    0.001   -12.069    0     -0.014   -0.010 
b_h_train   -0.008    0.001   -11.284    0     -0.010   -0.007 
b_s_bus     0.080     0.029    2.818   0.005   0.024    0.136  
b_s_train   0.002     0.025    0.073   0.942   -0.047   0.050 
Estimation results with dummy variables:

Code: Select all

================================================================
            Estimate Std.Error t_value p_value CI_lower CI_upper
----------------------------------------------------------------
asc_car        0                                                
asc_bus      -0.444    0.110   -4.029  0.0001   -0.660   -0.228 
asc_train    0.535     0.137    3.901  0.0001   0.266    0.804  
b_c_car      -0.149    0.013   -11.056    0     -0.175   -0.123 
b_c_bus      -0.139    0.010   -14.320    0     -0.158   -0.120 
b_c_train    -0.092    0.011   -8.419     0     -0.113   -0.070 
b_tt_car     -0.022    0.003   -6.478     0     -0.029   -0.016 
b_tt_bus     -0.052    0.003   -15.738    0     -0.058   -0.045 
b_tt_train   -0.033    0.003   -11.032    0     -0.039   -0.027 
b_wt_bus     0.115     0.018    6.477     0     0.080    0.149  
b_wt_train   -0.048    0.009   -5.276  0.00000  -0.065   -0.030 
b_r_car_1    -0.446    0.045   -9.921     0     -0.534   -0.358 
b_r_car_3    -0.309    0.050   -6.182     0     -0.407   -0.211 
b_r_bus_1    0.683     0.087    7.874     0     0.513    0.853  
b_r_bus_3    0.118     0.078    1.512   0.130   -0.035   0.271  
b_r_train_1  -0.648    0.053   -12.200    0     -0.752   -0.544 
b_r_train_3  -0.461    0.058   -7.913     0     -0.576   -0.347 
b_x_bus_1    0.186     0.059    3.142   0.002   0.070    0.302  
b_x_bus_2    -0.872    0.085   -10.203    0     -1.040   -0.705 
b_x_bus_3    -1.154    0.079   -14.594    0     -1.309   -0.999 
b_x_train_1  0.350     0.084    4.180  0.00003  0.186    0.515  
b_x_train_2  0.286     0.095    3.002   0.003   0.099    0.472  
b_x_train_3  0.027     0.124    0.216   0.829   -0.216   0.270  
b_h_bus     -0.0004    0.002   -0.236   0.813   -0.004   0.003  
b_h_train    -0.014    0.001   -15.304    0     -0.016   -0.012 
b_s_bus_1    -0.457    0.067   -6.803     0     -0.588   -0.325 
b_s_bus_2    0.199     0.067    2.961   0.003   0.067    0.331  
b_s_train_1  0.021     0.045    0.457   0.647   -0.068   0.110  
b_s_train_2  0.157     0.052    3.034   0.002   0.055    0.258  
----------------------------------------------------------------
The full R code without dummy variables:

Code: Select all

setwd("D:/2 Studium/Masterarbeit/R/Modell")

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  = "MNL",
  modelDescr = "MNL Logit Masterarbeit",
  indivID    = "ID",    # Identifikationsvariable für Personen
  panelData  = TRUE     # Panel-Daten werden verwendet
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

database = read.csv("Choices_Panel_Data.csv", header = TRUE)

### Create Dummy Variables

# X_BUS
#database$X_BUS_1 <- ifelse(database$X_BUS == 1, 1, 0)
#database$X_BUS_2 <- ifelse(database$X_BUS == 2, 1, 0)
#database$X_BUS_3 <- ifelse(database$X_BUS == 3, 1, 0)

# X_TRAIN
#database$X_TRAIN_1 <- ifelse(database$X_TRAIN == 1, 1, 0)
#database$X_TRAIN_2 <- ifelse(database$X_TRAIN == 2, 1, 0)
#database$X_TRAIN_3 <- ifelse(database$X_TRAIN == 3, 1, 0)

# S_BUS
#database$S_BUS_1 <- ifelse(database$S_BUS == 1, 1, 0)
#database$S_BUS_2 <- ifelse(database$S_BUS == 2, 1, 0)

# S_TRAIN
#database$S_TRAIN_1 <- ifelse(database$S_TRAIN == 1, 1, 0)
#database$S_TRAIN_2 <- ifelse(database$S_TRAIN == 2, 1, 0)

# R_CAR
#database$R_CAR_1 <- ifelse(database$R_CAR == 1, 1, 0)
#database$R_CAR_3 <- ifelse(database$R_CAR == 3, 1, 0)

# R_BUS
#database$R_BUS_1 <- ifelse(database$R_BUS == 1, 1, 0)
#database$R_BUS_3 <- ifelse(database$R_BUS == 3, 1, 0)

# R_TRAIN
#database$R_TRAIN_1 <- ifelse(database$R_TRAIN == 1, 1, 0)
#database$R_TRAIN_3 <- ifelse(database$R_TRAIN == 3, 1, 0)

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(
  asc_car       = 0,
  asc_bus       = 0,
  asc_train     = 0,
  b_c_car       = 0,
  b_c_bus       = 0,
  b_c_train     = 0,
  b_tt_car      = 0,
  b_tt_bus      = 0,
  b_tt_train    = 0,
  b_wt_bus      = 0,
  b_wt_train    = 0,
  b_r_car       = 0,
  b_r_bus       = 0,
  b_r_train     = 0,
  #b_r_car_1     = 0,
  #b_r_car_3     = 0,
  #b_r_bus_1     = 0,
  #b_r_bus_3     = 0,
  #b_r_train_1   = 0,
  #b_r_train_3   = 0,
  b_x_bus       = 0,
  b_x_train     = 0,
  #b_x_bus_1     = 0,
  #b_x_bus_2     = 0,
  #b_x_bus_3     = 0,
  #b_x_train_1   = 0,
  #b_x_train_2   = 0,
  #b_x_train_3   = 0,
  b_h_bus       = 0,
  b_h_train     = 0,
  b_s_bus       = 0,
  b_s_train     = 0
  #b_s_bus_1     = 0,
  #b_s_bus_2     = 0,
  #b_s_train_1   = 0,
  #b_s_train_2   = 0
)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c("asc_car")

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  
  V[['car']] = 
    asc_car +
    b_c_car * C_CAR +
    b_tt_car * TT_CAR  +
    b_r_car * R_CAR
    #b_r_car_1 * R_CAR_1 +
    #b_r_car_3 * R_CAR_3
  
  V[['bus']] =
    asc_bus +
    b_c_bus * C_BUS +
    b_tt_bus * TT_BUS +
    b_wt_bus * WT_BUS +
    b_r_bus * R_BUS +
    b_x_bus * X_BUS +
    #b_r_bus_1 * R_BUS_1 +
    #b_r_bus_3 * R_BUS_3 +
    #b_x_bus_1 * X_BUS_1 +
    #b_x_bus_2 * X_BUS_2 +
    #b_x_bus_3 * X_BUS_3 +
    b_h_bus * H_BUS +
    b_s_bus * S_BUS
    #b_s_bus_1 * S_BUS_1 +
    #b_s_bus_2 * S_BUS_2
  
  V[['train']] =
    asc_train +
    b_c_train * C_TRAIN +
    b_tt_train * TT_TRAIN +
    b_wt_train * WT_TRAIN + 
    b_r_train * R_TRAIN +
    b_x_train * X_TRAIN +
    #b_r_train_1 * R_TRAIN_1 +
    #b_r_train_3 * R_TRAIN_3 +
    #b_x_train_1 * X_TRAIN_1 +
    #b_x_train_2 * X_TRAIN_2 +
    #b_x_train_3 * X_TRAIN_3 +
    b_h_train * H_TRAIN + 
    b_s_train * S_TRAIN
    #b_s_train_1 * S_TRAIN_1 +
    #b_s_train_2 * S_TRAIN_2
  
  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = c(car = 1, bus = 2, train = 3), 
    avail         = list(car = AV_CAR, bus = AV_BUS, train = AV_TRAIN), 
    choiceVar     = CHOICE,
    V             = V
  )
  
  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality) 
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

# ################################################################# #
#### MODEL OUTPUTS                                               ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN)                               ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model, list(printPVal = 2))

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO FILE, using model name)               ----
# ----------------------------------------------------------------- #

apollo_saveOutput(model)

##########

# Extract estimates
estimates <- model$estimate

# Calculate utilities
V_car   <- estimates['asc_car'] +
  estimates['b_c_car'] * database$C_CAR +
  estimates['b_tt_car'] * database$TT_CAR +
  estimates['b_r_car'] * database$R_CAR
  #estimates['b_r_car_1'] * database$R_CAR_1 +
  #estimates['b_r_car_3'] * database$R_CAR_3

V_bus   <- estimates['asc_bus'] +
  estimates['b_c_bus'] * database$C_BUS +
  estimates['b_tt_bus'] * database$TT_BUS +
  estimates['b_wt_bus'] * database$WT_BUS +
  estimates['b_r_bus'] * database$R_BUS +
  estimates['b_x_bus'] * database$X_BUS +
  #estimates['b_r_bus_1'] * database$R_BUS_1 +
  #estimates['b_r_bus_3'] * database$R_BUS_3 +
  #estimates['b_x_bus_1'] * database$X_BUS_1 +
  #estimates['b_x_bus_2'] * database$X_BUS_2 +
  #estimates['b_x_bus_3'] * database$X_BUS_3 +
  estimates['b_h_bus'] * database$H_BUS +
  estimates['b_s_bus'] * database$S_BUS
  #estimates['b_s_bus_1'] * database$S_BUS_1 +
  #estimates['b_s_bus_2'] * database$S_BUS_2

V_train <- estimates['asc_train'] +
  estimates['b_c_train'] * database$C_TRAIN +
  estimates['b_tt_train'] * database$TT_TRAIN +
  estimates['b_wt_train'] * database$WT_TRAIN +
  estimates['b_r_train'] * database$R_TRAIN +
  estimates['b_x_train'] * database$X_TRAIN +
  #estimates['b_r_train_1'] * database$R_TRAIN_1 +
  #estimates['b_r_train_3'] * database$R_TRAIN_3 +
  #estimates['b_x_train_1'] * database$X_TRAIN_1 +
  #estimates['b_x_train_2'] * database$X_TRAIN_2 +
  #estimates['b_x_train_3'] * database$X_TRAIN_3 +
  estimates['b_h_train'] * database$H_TRAIN +
  estimates['b_s_train'] * database$S_TRAIN
  #estimates['b_s_train_1'] * database$S_TRAIN_1 +
  #estimates['b_s_train_2'] * database$S_TRAIN_2

# Calculate exponential utilities
exp_V_car   <- exp(V_car)
exp_V_bus   <- exp(V_bus)
exp_V_train <- exp(V_train)

# Calculate total utility
sum_exp_V <- exp_V_car + exp_V_bus + exp_V_train

# Calculate choice probabilities
P_car <- exp_V_car / sum_exp_V
P_bus <- exp_V_bus / sum_exp_V
P_train <- exp_V_train / sum_exp_V

# Save utilities and probabilities in the dataset
database$V_car <- V_car
database$V_bus <- V_bus
database$V_train <- V_train
database$P_car <- P_car
database$P_bus <- P_bus
database$P_train <- P_train

# Summarize the results
summary(database[, c('V_car', 'V_bus', 'V_train', 'P_car', 'P_bus', 'P_train')])

##########

# Extract relevant coefficients from estimates
model_results <- data.frame(
  estimate = estimates[names(estimates) != "call"],
  SE = sapply(estimates[names(estimates) != "call"], sd),  # Assuming standard errors are available
  statistic = "z",
  p.value = 2 * pnorm(-abs(estimates[names(estimates) != "call"] / sapply(estimates[names(estimates) != "call"], sd)))  # Assuming z-statistics
)

# Use stargazer with the created data frame

# Extrahiere Schätzungen und Standardfehler aus dem Modell
estimates <- model$estimate
se <- model$se

# Benenne die Spalten
names(estimates) <- gsub("`", "", names(estimates)) # Entfernt Backticks aus den Namen, falls vorhanden
names(se) <- gsub("`", "", names(se))

# Berechne t-Werte und p-Werte
t_values <- estimates / se
p_values <- 2 * (1 - pnorm(abs(t_values)))

# Berechne 95% Konfidenzintervalle
ci_lower <- estimates - 1.96 * se
ci_upper <- estimates + 1.96 * se

# Erstelle einen Datenrahmen der Schätzungen, Standardfehler, t-Werte, p-Werte und Konfidenzintervalle
results_df <- data.frame(
  Estimate = estimates,
  Std.Error = se,
  t_value = t_values,
  p_value = p_values,
  CI_lower = ci_lower,
  CI_upper = ci_upper
)

# Benutze stargazer, um die Ergebnisse anzuzeigen
library(stargazer)
stargazer(results_df, type = "text", summary = FALSE)
The full code with dummy variables:

Code: Select all

setwd("D:/2 Studium/Masterarbeit/R/Modell")

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Clear memory
rm(list = ls())

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  = "MNL",
  modelDescr = "MNL Logit Masterarbeit",
  indivID    = "ID",    # Identifikationsvariable für Personen
  panelData  = TRUE     # Panel-Daten werden verwendet
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

database = read.csv("Choices_Panel_Data.csv", header = TRUE)

#database <- subset(database, STADT == 1)

#database$STADT <- as.numeric(database$STADT)
#database = subset(database,database$STADT==1)

### Create Dummy Variables

# X_BUS
database$X_BUS_1 <- ifelse(database$X_BUS == 1, 1, 0)
database$X_BUS_2 <- ifelse(database$X_BUS == 2, 1, 0)
database$X_BUS_3 <- ifelse(database$X_BUS == 3, 1, 0)

# X_TRAIN
database$X_TRAIN_1 <- ifelse(database$X_TRAIN == 1, 1, 0)
database$X_TRAIN_2 <- ifelse(database$X_TRAIN == 2, 1, 0)
database$X_TRAIN_3 <- ifelse(database$X_TRAIN == 3, 1, 0)

# S_BUS
database$S_BUS_1 <- ifelse(database$S_BUS == 1, 1, 0)
database$S_BUS_2 <- ifelse(database$S_BUS == 2, 1, 0)

# S_TRAIN
database$S_TRAIN_1 <- ifelse(database$S_TRAIN == 1, 1, 0)
database$S_TRAIN_2 <- ifelse(database$S_TRAIN == 2, 1, 0)

# R_CAR
database$R_CAR_1 <- ifelse(database$R_CAR == 1, 1, 0)
database$R_CAR_3 <- ifelse(database$R_CAR == 3, 1, 0)

# R_BUS
database$R_BUS_1 <- ifelse(database$R_BUS == 1, 1, 0)
database$R_BUS_3 <- ifelse(database$R_BUS == 3, 1, 0)

# R_TRAIN
database$R_TRAIN_1 <- ifelse(database$R_TRAIN == 1, 1, 0)
database$R_TRAIN_3 <- ifelse(database$R_TRAIN == 3, 1, 0)

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(
  asc_car       = 0,
  asc_bus       = 0,
  asc_train     = 0,
  b_c_car       = 0,
  b_c_bus       = 0,
  b_c_train     = 0,
  b_tt_car      = 0,
  b_tt_bus      = 0,
  b_tt_train    = 0,
  b_wt_bus      = 0,
  b_wt_train    = 0,
  b_r_car_1     = 0,
  b_r_car_3     = 0,
  b_r_bus_1     = 0,
  b_r_bus_3     = 0,
  b_r_train_1   = 0,
  b_r_train_3   = 0,
  b_x_bus_1     = 0,
  b_x_bus_2     = 0,
  b_x_bus_3     = 0,
  b_x_train_1   = 0,
  b_x_train_2   = 0,
  b_x_train_3   = 0,
  b_h_bus       = 0,
  b_h_train     = 0,
  b_s_bus_1     = 0,
  b_s_bus_2     = 0,
  b_s_train_1   = 0,
  b_s_train_2   = 0
)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c("asc_car")

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  
  V[['car']] = 
    asc_car +
    b_c_car * C_CAR +
    b_tt_car * TT_CAR  +
    b_r_car_1 * R_CAR_1 +
    b_r_car_3 * R_CAR_3
  
  V[['bus']] =
    asc_bus +
    b_c_bus * C_BUS +
    b_tt_bus * TT_BUS +
    b_wt_bus * WT_BUS + 
    b_r_bus_1 * R_BUS_1 +
    b_r_bus_3 * R_BUS_3 +
    b_x_bus_1 * X_BUS_1 +
    b_x_bus_2 * X_BUS_2 +
    b_x_bus_3 * X_BUS_3 +
    b_h_bus * H_BUS +
    b_s_bus_1 * S_BUS_1 +
    b_s_bus_2 * S_BUS_2
  
  V[['train']] =
    asc_train +
    b_c_train * C_TRAIN +
    b_tt_train * TT_TRAIN +
    b_wt_train * WT_TRAIN + 
    b_r_train_1 * R_TRAIN_1 +
    b_r_train_3 * R_TRAIN_3 +
    b_x_train_1 * X_TRAIN_1 +
    b_x_train_2 * X_TRAIN_2 +
    b_x_train_3 * X_TRAIN_3 +
    b_h_train * H_TRAIN + 
    b_s_train_1 * S_TRAIN_1 +
    b_s_train_2 * S_TRAIN_2
  
  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = c(car = 1, bus = 2, train = 3), 
    avail         = list(car = AV_CAR, bus = AV_BUS, train = AV_TRAIN), 
    choiceVar     = CHOICE,
    V             = V
  )
  
  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality) 
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

# ################################################################# #
#### MODEL OUTPUTS                                               ####
# ################################################################# #

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO SCREEN)                               ----
# ----------------------------------------------------------------- #

apollo_modelOutput(model, list(printPVal = 2))

# ----------------------------------------------------------------- #
#---- FORMATTED OUTPUT (TO FILE, using model name)               ----
# ----------------------------------------------------------------- #

apollo_saveOutput(model)

##########

# Extract estimates
estimates <- model$estimate

# Calculate utilities
V_car   <- estimates['asc_car'] +
  estimates['b_c_car'] * database$C_CAR +
  estimates['b_tt_car'] * database$TT_CAR + 
  estimates['b_r_car_1'] * database$R_CAR_1 +
  estimates['b_r_car_3'] * database$R_CAR_3

V_bus   <- estimates['asc_bus'] +
  estimates['b_c_bus'] * database$C_BUS +
  estimates['b_tt_bus'] * database$TT_BUS +
  estimates['b_wt_bus'] * database$WT_BUS + 
  estimates['b_r_bus_1'] * database$R_BUS_1 +
  estimates['b_r_bus_3'] * database$R_BUS_3 +
  estimates['b_x_bus_1'] * database$X_BUS_1 +
  estimates['b_x_bus_2'] * database$X_BUS_2 +
  estimates['b_x_bus_3'] * database$X_BUS_3 +
  estimates['b_h_bus'] * database$H_BUS +  
  estimates['b_s_bus_1'] * database$S_BUS_1 +
  estimates['b_s_bus_2'] * database$S_BUS_2

V_train <- estimates['asc_train'] +
  estimates['b_c_train'] * database$C_TRAIN +
  estimates['b_tt_train'] * database$TT_TRAIN +
  estimates['b_wt_train'] * database$WT_TRAIN + 
  estimates['b_r_train_1'] * database$R_TRAIN_1 +
  estimates['b_r_train_3'] * database$R_TRAIN_3 +
  estimates['b_x_train_1'] * database$X_TRAIN_1 +
  estimates['b_x_train_2'] * database$X_TRAIN_2 +
  estimates['b_x_train_3'] * database$X_TRAIN_3 +
  estimates['b_h_train'] * database$H_TRAIN +
  estimates['b_s_train_1'] * database$S_TRAIN_1 +
  estimates['b_s_train_2'] * database$S_TRAIN_2

# Calculate exponential utilities
exp_V_car   <- exp(V_car)
exp_V_bus   <- exp(V_bus)
exp_V_train <- exp(V_train)

# Calculate total utility
sum_exp_V <- exp_V_car + exp_V_bus + exp_V_train

# Calculate choice probabilities
P_car <- exp_V_car / sum_exp_V
P_bus <- exp_V_bus / sum_exp_V
P_train <- exp_V_train / sum_exp_V

# Save utilities and probabilities in the dataset
database$V_car <- V_car
database$V_bus <- V_bus
database$V_train <- V_train
database$P_car <- P_car
database$P_bus <- P_bus
database$P_train <- P_train

# Summarize the results
summary(database[, c('V_car', 'V_bus', 'V_train', 'P_car', 'P_bus', 'P_train')])

##########

# Extract relevant coefficients from estimates
model_results <- data.frame(
  estimate = estimates[names(estimates) != "call"],
  SE = sapply(estimates[names(estimates) != "call"], sd),  # Assuming standard errors are available
  statistic = "z",
  p.value = 2 * pnorm(-abs(estimates[names(estimates) != "call"] / sapply(estimates[names(estimates) != "call"], sd)))  # Assuming z-statistics
)

# Use stargazer with the created data frame

# Extrahiere Schätzungen und Standardfehler aus dem Modell
estimates <- model$estimate
se <- model$se

# Benenne die Spalten
names(estimates) <- gsub("`", "", names(estimates)) # Entfernt Backticks aus den Namen, falls vorhanden
names(se) <- gsub("`", "", names(se))

# Berechne t-Werte und p-Werte
t_values <- estimates / se
p_values <- 2 * (1 - pnorm(abs(t_values)))

# Berechne 95% Konfidenzintervalle
ci_lower <- estimates - 1.96 * se
ci_upper <- estimates + 1.96 * se

# Erstelle einen Datenrahmen der Schätzungen, Standardfehler, t-Werte, p-Werte und Konfidenzintervalle
results_df <- data.frame(
  Estimate = estimates,
  Std.Error = se,
  t_value = t_values,
  p_value = p_values,
  CI_lower = ci_lower,
  CI_upper = ci_upper
)

# Benutze stargazer, um die Ergebnisse anzuzeigen
library(stargazer)
stargazer(results_df, type = "text", summary = FALSE)

Re: Huge difference if estimated with or without dummy variables

Posted: 12 Aug 2024, 10:42
by stephanehess
Hi

can you please add the model fit to your output?

Thanks

Stephane

Re: Huge difference if estimated with or without dummy variables

Posted: 12 Aug 2024, 11:45
by kristof
Yes of course.

Model fit with Dummies:

Code: Select all

Model name                                  : MNL
Model description                           : MNL Logit Masterarbeit
Model run at                                : 2024-08-12 12:42:56.945836
Estimation method                           : bgw
Model diagnosis                             : Relative function convergence
Optimisation diagnosis                      : Maximum found
     hessian properties                     : Negative definite
     maximum eigenvalue                     : -19.837196
     reciprocal of condition number         : 1.08227e-06
Number of individuals                       : 891
Number of rows in database                  : 21384
Number of modelled outcomes                 : 21384

Number of cores used                        :  1 
Model without mixing

LL(start)                                   : -23492.73
LL at equal shares, LL(0)                   : -23492.73
LL at observed shares, LL(C)                : -23188.03
LL(final)                                   : -21339.12
Rho-squared vs equal shares                  :  0.0917 
Adj.Rho-squared vs equal shares              :  0.0905 
Rho-squared vs observed shares               :  0.0797 
Adj.Rho-squared vs observed shares           :  0.0786 
AIC                                         :  42734.24 
BIC                                         :  42957.41 

Estimated parameters                        : 28
Time taken (hh:mm:ss)                       :  00:00:7.55 
     pre-estimation                         :  00:00:0.48 
     estimation                             :  00:00:1.26 
          initial estimation                :  00:00:0.93 
          estimation after rescaling        :  00:00:0.33 
     post-estimation                        :  00:00:5.81 
Iterations                                  :  11  
     initial estimation                     :  10 
     estimation after rescaling             :  1 
     
Model fit without Dummies:

Code: Select all

Model name                                  : MNL
Model description                           : MNL Logit Masterarbeit
Model run at                                : 2024-08-12 12:44:25.882344
Estimation method                           : bgw
Model diagnosis                             : Relative function convergence
Optimisation diagnosis                      : Maximum found
     hessian properties                     : Negative definite
     maximum eigenvalue                     : -50.670775
     reciprocal of condition number         : 2.79803e-06
Number of individuals                       : 891
Number of rows in database                  : 21384
Number of modelled outcomes                 : 21384

Number of cores used                        :  1 
Model without mixing

LL(start)                                   : -23492.73
LL at equal shares, LL(0)                   : -23492.73
LL at observed shares, LL(C)                : -23188.03
LL(final)                                   : -21552.33
Rho-squared vs equal shares                  :  0.0826 
Adj.Rho-squared vs equal shares              :  0.0818 
Rho-squared vs observed shares               :  0.0705 
Adj.Rho-squared vs observed shares           :  0.0698 
AIC                                         :  43142.67 
BIC                                         :  43294.11 

Estimated parameters                        : 19
Time taken (hh:mm:ss)                       :  00:00:4.13 
     pre-estimation                         :  00:00:0.37 
     estimation                             :  00:00:0.83 
          initial estimation                :  00:00:0.55 
          estimation after rescaling        :  00:00:0.29 
     post-estimation                        :  00:00:2.93 
Iterations                                  :  10  
     initial estimation                     :  9 
     estimation after rescaling             :  1 

Re: Huge difference if estimated with or without dummy variables

Posted: 12 Aug 2024, 13:32
by stephanehess
Hi

so there's nothing to worry about here. The model with the categorical treatment is clearly better. There is evidence of non-linearity in your data, and the linear treatment couldn't handle that of course

Hope that helps

Stephane

Re: Huge difference if estimated with or without dummy variables

Posted: 12 Aug 2024, 14:11
by kristof
Hi Stephane,

thank you very much for your great support!

So the model with the dummy variables is working fine - which means if some estimates are not plausible or not significant, I should exclude them in the estimation or in the utility function?

Kristof

Re: Huge difference if estimated with or without dummy variables

Posted: 13 Aug 2024, 17:17
by stephanehess
If you remove parameters, you will lose model fit. So it's probably not a good idea in general. and it may also bias other parameters, if the original parameter was statistically different from zero