
When applying weights at the individual level, weights should be the same for all observations of each individual

Posted: 28 Mar 2022, 18:00
by alemitrani
Good afternoon

I am working with SP-RP data and trying to replicate results I got with Stata. The replication is almost exact, but a small issue with how the weights are applied leaves some of the robust t-statistics slightly different.

I would like to weight the SP responses and RP responses differently for each person in the data, so that the SP data and RP data have approximately equal weight in the estimation (each person provides 9 SP choices but only 1 RP choice). Apollo does not allow this; the error message I get is:
Error in apollo_weighting(P, apollo_inputs, functionality) :
When applying weights at the individual level, weights should be the same for all observations of each individual.
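
To make the situation concrete, here is a minimal sketch on toy data. The data frame `toy` and its `id` column are made up for illustration; `tipo_datos` and `fexp` only mirror the column names in my data, and the weights here are not my real ones. Each observation gets a weight of 1 over the number of observations of that data type for that person, so RP and SP carry equal total weight, and then the distinct weights per person are counted. This is only my reading of the condition in the error message, not Apollo's actual internal check.

```r
# Toy data: 2 people, each with 1 RP choice (tipo_datos == 1)
# and 9 SP choices (tipo_datos == 2). All values are illustrative.
toy <- data.frame(
  id         = rep(c(1, 2), each = 10),
  tipo_datos = rep(c(1, rep(2, 9)), times = 2)
)

# Number of observations per person and data type, aligned row by row
n_obs <- ave(rep(1, nrow(toy)), toy$id, toy$tipo_datos, FUN = sum)

# Weight = 1 / n_obs, so each data type sums to 1 within each person
toy$fexp <- 1 / n_obs

# Total weight by data type: RP and SP end up balanced
total_by_type <- tapply(toy$fexp, toy$tipo_datos, sum)
print(total_by_type)

# Number of distinct weights per person: a value above 1 means the weights
# vary within the individual, which is what the error message complains about
n_distinct_w <- tapply(toy$fexp, toy$id, function(w) length(unique(w)))
print(n_distinct_w)
```

Any weighting scheme that treats the SP and RP rows of the same person differently will give more than one distinct weight per person, so the restriction applies whatever the exact weights are.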

I found a work-around, which is to specify the "person" ID variable at the person-datatype level, as if the RP and SP responses came from different people. With this ID Apollo estimates the model, and the parameter estimates and the log-likelihood match those I got with Stata. The only difference is that some of the robust t-statistics are slightly different, almost certainly because the individual is defined differently in each case.
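
The work-around amounts to the following, shown here on toy data. The data frame `toy`, its values and the helper `weights_constant_within()` are made up for illustration; `sys_respnum2`, `sys_respnum3`, `tipo_datos` and `fexp` are the names used in my script. It is a sketch that assumes `tipo_datos` is always below 100, so that the combined ID stays unique.

```r
# Toy data: 2 people, each with 1 RP row (tipo_datos == 1)
# and 9 SP rows (tipo_datos == 2). Values are illustrative only.
toy <- data.frame(
  sys_respnum2 = rep(c(1, 2), each = 10),
  tipo_datos   = rep(c(1, rep(2, 9)), times = 2)
)

# Illustrative weights that differ between RP and SP rows
toy$fexp <- ifelse(toy$tipo_datos == 1, 1, 1 / 9)

# Person-datatype ID: RP and SP rows of one person get different IDs
toy$sys_respnum3 <- toy$sys_respnum2 * 100 + toy$tipo_datos

# Hypothetical helper: TRUE if every ID has a single distinct weight
weights_constant_within <- function(w, id) {
  all(tapply(w, id, function(x) length(unique(x))) == 1)
}

ok_person          <- weights_constant_within(toy$fexp, toy$sys_respnum2)
ok_person_datatype <- weights_constant_within(toy$fexp, toy$sys_respnum3)

print(ok_person)           # FALSE: weights vary within the real person
print(ok_person_datatype)  # TRUE: weights are constant within the new ID
```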

Is there a way of relaxing the restriction in Apollo that the weights have to be the same for all choices made by each individual? I looked in the Apollo manual and also searched the forum and could not find any existing answer to this question.

It would be useful to be able to do this, as applications with SP and RP data will often have different numbers of SP and RP choices for each person. The results obtained with Stata suggest that it should be possible to do this without causing a problem for the model estimation.
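
For reference, the quantity I am trying to maximise can be sketched as an observation-weighted log-likelihood. The notation is my own and is not taken from the Apollo manual: $w_{nt}$ is the weight for choice $t$ of person $n$, $T_n$ is the number of choices made by that person, and $P_{nt}$ is the model probability of the chosen alternative.

```latex
LL(\beta) = \sum_{n=1}^{N} \sum_{t=1}^{T_n} w_{nt} \, \ln P_{nt}(\beta)
```

With the person-datatype ID the weight is constant within each pseudo-individual, so the sum contains exactly the same terms, which would be consistent with the identical estimates and log-likelihood. Robust standard errors, by contrast, are typically built from score contributions grouped by individual, so changing the definition of the individual could change them even when the log-likelihood itself does not change.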

Here is my R code in case it is useful, both with and without the error.

1) The version that produces the error message:

Code:

library(apollo)
library(tidyverse)  # provides read_csv(), %>%, the dplyr verbs and pivot_wider() used below

# Definition of core settings ---------------------------------------------

# clear memory
rm(list = ls())

# use Apollo to model the choices

apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  ="nlogit_prpd_40_03",
  modelDescr ="Datos PDPR",
  indivID    ="sys_respnum2",
  weights = "fexp"
)


# Data loading ------------------------------------------------------------


# read the data
database <- read_csv(file="datos_pdpr3.csv")

# filter the data
database <- database %>%
  filter(tcam<=40)

# sort the data
database <- database %>%
  arrange(sys_respnum2, it)

# check that there are no duplicate cases

checkuniqueid <- database %>%
  group_by(sys_respnum2, it, modo) %>%
  summarize(count = n()) %>%
  ungroup()

database <- database %>%
  left_join(checkuniqueid) %>%
  filter(count==1)

# check that every case has a choice

suma_eleccionpdpr <- database %>%
  group_by(sys_respnum2, it) %>%
  summarise(suma_eleccionpdpr = sum(eleccion)) %>%
  ungroup()

database <- database %>%
  left_join(suma_eleccionpdpr) %>%
  filter(suma_eleccionpdpr==1)

# define the new variables required
database <- database %>%
  mutate(motivo_noobligado = ifelse((motivodestino!="Trabajar" & motivodestino!="Estudiar") | is.na(motivodestino)==TRUE,1,0)) %>%
  mutate(tiempo_g2 = 2*tcam + 2*tesp + tabo) %>%
  mutate(tiempo_g2_noobligado = tiempo_g2*motivo_noobligado) %>%
  mutate(disponible=1)

# keep only the required variables
database <- database %>%
  select(sys_respnum2, tipo_datos, it, modo, modon, eleccion, intr, autoinercia, tcam, tesp, tabo, trans, tiempo_g2, tiempo_g2_noobligado, costopp, ingresoclp, lningresoclp, costoporlningreso, auto_disponible, motivodestino, frecuencia, diaviaje, feriado, horainicio_mpm, persgrup, rangoedad, genero, durviajemins, regionorigen, comuna_residencia, comunaorigen, comunadestino, autoshogar, motivo_noobligado, disponible, fexp)

# define the choice variable in a separate table
sys_respnum2_it_eleccion <- database %>%
  filter(eleccion==1) %>%
  mutate(eleccion = modon) %>%
  select(sys_respnum2, it, eleccion)

# keep only the variables required for the model
database <- database %>%
  select(sys_respnum2, tipo_datos, it, modon, intr, autoinercia, tiempo_g2, tiempo_g2_noobligado, trans, costoporlningreso, disponible, fexp)

# check means:
database %>%
  group_by() %>%
  summarise(autoinercia = mean(autoinercia),
            tiempo_g2 = mean(tiempo_g2),
            tiempo_g2_noobligado = mean(tiempo_g2_noobligado),
            trans = mean(trans),
            costoporlningreso = mean(costoporlningreso)) %>%
  ungroup()

# reshape from long format to wide format
database <- database %>%
  pivot_wider(names_from = "modon", values_from = c("autoinercia", "tiempo_g2", "tiempo_g2_noobligado", "trans", "costoporlningreso", "disponible"))

# add the choice variable
database <- database %>%
  left_join(sys_respnum2_it_eleccion)

rm(sys_respnum2_it_eleccion)

database <- database %>%
  filter(is.na(eleccion)==FALSE)

# set all empty cells to 0
database[is.na(database)] <- 0

# ensure that every modelled case has 2 or more available alternatives:
database <- database %>%
  mutate(nopciones = disponible_1 + disponible_2 + disponible_3 + disponible_4 + disponible_5 + disponible_6 + disponible_7 + disponible_8 + disponible_9 + disponible_10 + disponible_21 + disponible_22 + disponible_23 + disponible_24 + disponible_25 + disponible_26 + disponible_27) %>%
  filter(nopciones>1)

# check the values of eleccion:
database %>% group_by(eleccion) %>% summarise(check = n()) %>% ungroup()

# Parameter definition ----------------------------------------------------

### Vector of parameters, including any that are kept fixed 
### during estimation
apollo_beta <- c(b_pd_cabu = 0,
                 b_pd_tpbu = 0,
                 b_pd_tabu = 0,
                 b_pd_catr = 0, 
                 b_pd_tptr = 0,
                 b_pd_tatr = 0,
                 b_pd_auto = 0,
                 b_autoinercia = 0,
                 b_pr_cabuca = 0,
                 b_pr_cabutp = 0,
                 b_pr_cabuta = 0,
                 b_pr_tpbuca = 0,
                 b_pr_tpbutp = 0,
                 b_pr_tpbuta = 0,
                 b_pr_tabuca = 0,
                 b_pr_tabutp = 0,
                 b_pr_tabuta = 0,
                 b_pr_auto = 0, 
                 b_tiempo_g2 = 0,
                 b_tiempo_g2_noobligado = 0, 
                 b_trans = 0,
                 b_costoporlningreso = 0,
                 lambda_TPPD = 1,
                 lambda_APD = 1,
                 lambda_TPPR = 0.5,
                 lambda_APR = 1,
                 lambda_PD = 1,
                 lambda_PR = 1
              )

### Vector with names (in quotes) of parameters to be
###  kept fixed at their starting value in apollo_beta.
### Use apollo_fixed = c() for no fixed parameters.
apollo_fixed <- c("b_pd_tpbu", "b_pr_tpbutp", "lambda_APR", "lambda_TPPR", "lambda_APD", "lambda_PR")


# Input validation --------------------------------------------------------

apollo_inputs <- apollo_validateInputs()


# Likelihood definition ---------------------------------------------------

apollo_probabilities <- function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as
  ### in the nl_settings lists, order is irrelevant.
  V = list()
  V[['pr_cabuca']]    = b_pr_cabuca + b_tiempo_g2*tiempo_g2_1  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_1  + b_trans*trans_1  + b_costoporlningreso*costoporlningreso_1 
  V[['pr_cabutp']]    = b_pr_cabutp + b_tiempo_g2*tiempo_g2_2  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_2  + b_trans*trans_2  + b_costoporlningreso*costoporlningreso_2 
  V[['pr_cabuta']]    = b_pr_cabuta + b_tiempo_g2*tiempo_g2_3  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_3  + b_trans*trans_3  + b_costoporlningreso*costoporlningreso_3 
  V[['pr_tpbuca']]    = b_pr_tpbuca + b_tiempo_g2*tiempo_g2_4  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_4  + b_trans*trans_4  + b_costoporlningreso*costoporlningreso_4 
  V[['pr_tpbutp']]    = b_pr_tpbutp + b_tiempo_g2*tiempo_g2_5  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_5  + b_trans*trans_5  + b_costoporlningreso*costoporlningreso_5 
  V[['pr_tpbuta']]    = b_pr_tpbuta + b_tiempo_g2*tiempo_g2_6  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_6  + b_trans*trans_6  + b_costoporlningreso*costoporlningreso_6 
  V[['pr_tabuca']]    = b_pr_tabuca + b_tiempo_g2*tiempo_g2_7  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_7  + b_trans*trans_7  + b_costoporlningreso*costoporlningreso_7 
  V[['pr_tabutp']]    = b_pr_tabutp + b_tiempo_g2*tiempo_g2_8  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_8  + b_trans*trans_8  + b_costoporlningreso*costoporlningreso_8 
  V[['pr_tabuta']]    = b_pr_tabuta + b_tiempo_g2*tiempo_g2_9  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_9  + b_trans*trans_9  + b_costoporlningreso*costoporlningreso_9 
  V[['pr_auto']]      = b_pr_auto   + b_tiempo_g2*tiempo_g2_10 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_10 + b_trans*trans_10 + b_costoporlningreso*costoporlningreso_10
  V[['pd_cabu']]      = b_pd_cabu   + b_tiempo_g2*tiempo_g2_21 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_21 + b_trans*trans_21 + b_costoporlningreso*costoporlningreso_21
  V[['pd_tpbu']]      = b_pd_tpbu   + b_tiempo_g2*tiempo_g2_22 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_22 + b_trans*trans_22 + b_costoporlningreso*costoporlningreso_22
  V[['pd_tabu']]      = b_pd_tabu   + b_tiempo_g2*tiempo_g2_23 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_23 + b_trans*trans_23 + b_costoporlningreso*costoporlningreso_23
  V[['pd_catr']]      = b_pd_catr   + b_tiempo_g2*tiempo_g2_24 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_24 + b_trans*trans_24 + b_costoporlningreso*costoporlningreso_24
  V[['pd_tptr']]      = b_pd_tptr   + b_tiempo_g2*tiempo_g2_25 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_25 + b_trans*trans_25 + b_costoporlningreso*costoporlningreso_25
  V[['pd_tatr']]      = b_pd_tatr   + b_tiempo_g2*tiempo_g2_26 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_26 + b_trans*trans_26 + b_costoporlningreso*costoporlningreso_26
  V[['pd_auto']]      = b_pd_auto   + b_tiempo_g2*tiempo_g2_27 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_27 + b_trans*trans_27 + b_costoporlningreso*costoporlningreso_27 + b_autoinercia*autoinercia_27
  
  
  ### Specify nests
  nlNests      = list(root=1, PD=lambda_PD, PR=lambda_PR, APD=lambda_APD, TPPD=lambda_TPPD, APR=lambda_APR, TPPR=lambda_TPPR)
  
  ### Specify tree structure for NL model (PD and PR branches)
  nlStructure= list()
  nlStructure[["root"]]   = c("PD", "PR")
  nlStructure[["PD"]]   = c("APD", "TPPD")
  nlStructure[["PR"]]   = c("APR", "TPPR")
  nlStructure[["APD"]]     = c("pd_auto")  
  nlStructure[["TPPD"]]     = c("pd_cabu","pd_tpbu","pd_tabu", "pd_catr","pd_tptr","pd_tatr")
  nlStructure[["APR"]]     = c("pr_auto")
  nlStructure[["TPPR"]]     = c("pr_cabuca","pr_cabutp","pr_cabuta", "pr_tpbuca","pr_tpbutp","pr_tpbuta", "pr_tabuca","pr_tabutp","pr_tabuta")

  ### Define settings for NL model, SP data
  nl_settings_PD <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==2),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PD"]] = apollo_nl(nl_settings_PD, functionality)  
  
  ### Define settings for NL model, RP data  
  
  nl_settings_PR <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==1),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PR"]] = apollo_nl(nl_settings_PR, functionality)  
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Apply weights
  P = apollo_weighting(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  
  return(P)
  
}

# Model estimation and reporting ------------------------------------------

model <- apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, 
                        apollo_inputs,
                        list(writeIter=FALSE))

apollo_modelOutput(model)

apollo_saveOutput(model)

# Postprocessing of results -----------------------------------------------

predictions_base = apollo_prediction(model, 
                                     apollo_probabilities, 
                                     apollo_inputs)
  

2) The version with the modified person ID variable, which runs OK in Apollo but produces slightly different t-stats:

Code:

library(apollo)
library(tidyverse)  # provides read_csv(), %>%, the dplyr verbs and pivot_wider() used below

# Definition of core settings ---------------------------------------------

# clear memory
rm(list = ls())

# use Apollo to model the choices

apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  ="nlogit_prpd_40_03",
  modelDescr ="Datos PDPR",
  indivID    ="sys_respnum3",
  weights = "fexp"
)


# Data loading ------------------------------------------------------------


# read the data
database <- read_csv(file="datos_pdpr3.csv")

# filter the data
database <- database %>%
  filter(tcam<=40)

# build the person-datatype ID and sort the data
database <- database %>%
  mutate(sys_respnum3=sys_respnum2*100+tipo_datos) %>%
  arrange(sys_respnum3, it)

# check that there are no duplicate cases

checkuniqueid <- database %>%
  group_by(sys_respnum2, it, modo) %>%
  summarize(count = n()) %>%
  ungroup()

database <- database %>%
  left_join(checkuniqueid) %>%
  filter(count==1)

# check that every case has a choice

suma_eleccionpdpr <- database %>%
  group_by(sys_respnum2, it) %>%
  summarise(suma_eleccionpdpr = sum(eleccion)) %>%
  ungroup()

database <- database %>%
  left_join(suma_eleccionpdpr) %>%
  filter(suma_eleccionpdpr==1)

# define the new variables required
database <- database %>%
  mutate(motivo_noobligado = ifelse((motivodestino!="Trabajar" & motivodestino!="Estudiar") | is.na(motivodestino)==TRUE,1,0)) %>%
  mutate(tiempo_g2 = 2*tcam + 2*tesp + tabo) %>%
  mutate(tiempo_g2_noobligado = tiempo_g2*motivo_noobligado) %>%
  mutate(disponible=1)

# keep only the required variables
database <- database %>%
  select(sys_respnum2, sys_respnum3, tipo_datos, it, modo, modon, eleccion, intr, autoinercia, tcam, tesp, tabo, trans, tiempo_g2, tiempo_g2_noobligado, costopp, ingresoclp, lningresoclp, costoporlningreso, auto_disponible, motivodestino, frecuencia, diaviaje, feriado, horainicio_mpm, persgrup, rangoedad, genero, durviajemins, regionorigen, comuna_residencia, comunaorigen, comunadestino, autoshogar, motivo_noobligado, disponible, fexp)

# define the choice variable in a separate table
sys_respnum2_it_eleccion <- database %>%
  filter(eleccion==1) %>%
  mutate(eleccion = modon) %>%
  select(sys_respnum2, it, eleccion)

# keep only the variables required for the model
database <- database %>%
  select(sys_respnum2, sys_respnum3, tipo_datos, it, modon, intr, autoinercia, tiempo_g2, tiempo_g2_noobligado, trans, costoporlningreso, disponible, fexp)

# check means:
database %>%
  group_by() %>%
  summarise(autoinercia = mean(autoinercia),
            tiempo_g2 = mean(tiempo_g2),
            tiempo_g2_noobligado = mean(tiempo_g2_noobligado),
            trans = mean(trans),
            costoporlningreso = mean(costoporlningreso)) %>%
  ungroup()

# reshape from long format to wide format
database <- database %>%
  pivot_wider(names_from = "modon", values_from = c("autoinercia", "tiempo_g2", "tiempo_g2_noobligado", "trans", "costoporlningreso", "disponible"))

# add the choice variable
database <- database %>%
  left_join(sys_respnum2_it_eleccion)

rm(sys_respnum2_it_eleccion)

database <- database %>%
  filter(is.na(eleccion)==FALSE)

# set all empty cells to 0
database[is.na(database)] <- 0

# ensure that every modelled case has 2 or more available alternatives:
database <- database %>%
  mutate(nopciones = disponible_1 + disponible_2 + disponible_3 + disponible_4 + disponible_5 + disponible_6 + disponible_7 + disponible_8 + disponible_9 + disponible_10 + disponible_21 + disponible_22 + disponible_23 + disponible_24 + disponible_25 + disponible_26 + disponible_27) %>%
  filter(nopciones>1)

# check the values of eleccion:
database %>% group_by(eleccion) %>% summarise(check = n()) %>% ungroup()

# Parameter definition ----------------------------------------------------

### Vector of parameters, including any that are kept fixed 
### during estimation
apollo_beta <- c(b_pd_cabu = 0,
                 b_pd_tpbu = 0,
                 b_pd_tabu = 0,
                 b_pd_catr = 0, 
                 b_pd_tptr = 0,
                 b_pd_tatr = 0,
                 b_pd_auto = 0,
                 b_autoinercia = 0,
                 b_pr_cabuca = 0,
                 b_pr_cabutp = 0,
                 b_pr_cabuta = 0,
                 b_pr_tpbuca = 0,
                 b_pr_tpbutp = 0,
                 b_pr_tpbuta = 0,
                 b_pr_tabuca = 0,
                 b_pr_tabutp = 0,
                 b_pr_tabuta = 0,
                 b_pr_auto = 0, 
                 b_tiempo_g2 = 0,
                 b_tiempo_g2_noobligado = 0, 
                 b_trans = 0,
                 b_costoporlningreso = 0,
                 lambda_TPPD = 1,
                 lambda_APD = 1,
                 lambda_TPPR = 0.5,
                 lambda_APR = 1,
                 lambda_PD = 1,
                 lambda_PR = 1
              )

### Vector with names (in quotes) of parameters to be
###  kept fixed at their starting value in apollo_beta.
### Use apollo_fixed = c() for no fixed parameters.
apollo_fixed <- c("b_pd_tpbu", "b_pr_tpbutp", "lambda_APR", "lambda_TPPR", "lambda_APD", "lambda_PR")


# Input validation --------------------------------------------------------

apollo_inputs <- apollo_validateInputs()


# Likelihood definition ---------------------------------------------------

apollo_probabilities <- function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as
  ### in the nl_settings lists, order is irrelevant.
  V = list()
  V[['pr_cabuca']]    = b_pr_cabuca + b_tiempo_g2*tiempo_g2_1  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_1  + b_trans*trans_1  + b_costoporlningreso*costoporlningreso_1 
  V[['pr_cabutp']]    = b_pr_cabutp + b_tiempo_g2*tiempo_g2_2  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_2  + b_trans*trans_2  + b_costoporlningreso*costoporlningreso_2 
  V[['pr_cabuta']]    = b_pr_cabuta + b_tiempo_g2*tiempo_g2_3  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_3  + b_trans*trans_3  + b_costoporlningreso*costoporlningreso_3 
  V[['pr_tpbuca']]    = b_pr_tpbuca + b_tiempo_g2*tiempo_g2_4  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_4  + b_trans*trans_4  + b_costoporlningreso*costoporlningreso_4 
  V[['pr_tpbutp']]    = b_pr_tpbutp + b_tiempo_g2*tiempo_g2_5  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_5  + b_trans*trans_5  + b_costoporlningreso*costoporlningreso_5 
  V[['pr_tpbuta']]    = b_pr_tpbuta + b_tiempo_g2*tiempo_g2_6  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_6  + b_trans*trans_6  + b_costoporlningreso*costoporlningreso_6 
  V[['pr_tabuca']]    = b_pr_tabuca + b_tiempo_g2*tiempo_g2_7  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_7  + b_trans*trans_7  + b_costoporlningreso*costoporlningreso_7 
  V[['pr_tabutp']]    = b_pr_tabutp + b_tiempo_g2*tiempo_g2_8  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_8  + b_trans*trans_8  + b_costoporlningreso*costoporlningreso_8 
  V[['pr_tabuta']]    = b_pr_tabuta + b_tiempo_g2*tiempo_g2_9  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_9  + b_trans*trans_9  + b_costoporlningreso*costoporlningreso_9 
  V[['pr_auto']]      = b_pr_auto   + b_tiempo_g2*tiempo_g2_10 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_10 + b_trans*trans_10 + b_costoporlningreso*costoporlningreso_10
  V[['pd_cabu']]      = b_pd_cabu   + b_tiempo_g2*tiempo_g2_21 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_21 + b_trans*trans_21 + b_costoporlningreso*costoporlningreso_21
  V[['pd_tpbu']]      = b_pd_tpbu   + b_tiempo_g2*tiempo_g2_22 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_22 + b_trans*trans_22 + b_costoporlningreso*costoporlningreso_22
  V[['pd_tabu']]      = b_pd_tabu   + b_tiempo_g2*tiempo_g2_23 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_23 + b_trans*trans_23 + b_costoporlningreso*costoporlningreso_23
  V[['pd_catr']]      = b_pd_catr   + b_tiempo_g2*tiempo_g2_24 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_24 + b_trans*trans_24 + b_costoporlningreso*costoporlningreso_24
  V[['pd_tptr']]      = b_pd_tptr   + b_tiempo_g2*tiempo_g2_25 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_25 + b_trans*trans_25 + b_costoporlningreso*costoporlningreso_25
  V[['pd_tatr']]      = b_pd_tatr   + b_tiempo_g2*tiempo_g2_26 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_26 + b_trans*trans_26 + b_costoporlningreso*costoporlningreso_26
  V[['pd_auto']]      = b_pd_auto   + b_tiempo_g2*tiempo_g2_27 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_27 + b_trans*trans_27 + b_costoporlningreso*costoporlningreso_27 + b_autoinercia*autoinercia_27
  
  
  ### Specify nests
  nlNests      = list(root=1, PD=lambda_PD, PR=lambda_PR, APD=lambda_APD, TPPD=lambda_TPPD, APR=lambda_APR, TPPR=lambda_TPPR)
  
  ### Specify tree structure for NL model (PD and PR branches)
  nlStructure= list()
  nlStructure[["root"]]   = c("PD", "PR")
  nlStructure[["PD"]]   = c("APD", "TPPD")
  nlStructure[["PR"]]   = c("APR", "TPPR")
  nlStructure[["APD"]]     = c("pd_auto")  
  nlStructure[["TPPD"]]     = c("pd_cabu","pd_tpbu","pd_tabu", "pd_catr","pd_tptr","pd_tatr")
  nlStructure[["APR"]]     = c("pr_auto")
  nlStructure[["TPPR"]]     = c("pr_cabuca","pr_cabutp","pr_cabuta", "pr_tpbuca","pr_tpbutp","pr_tpbuta", "pr_tabuca","pr_tabutp","pr_tabuta")

  ### Define settings for NL model, SP data
  nl_settings_PD <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==2),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PD"]] = apollo_nl(nl_settings_PD, functionality)  
  
  ### Define settings for NL model, RP data  
  
  nl_settings_PR <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==1),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PR"]] = apollo_nl(nl_settings_PR, functionality)  
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Take product across observations for the same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Apply weights
  P = apollo_weighting(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  
  return(P)
  
}

# Model estimation and reporting ------------------------------------------

model <- apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, 
                        apollo_inputs,
                        list(writeIter=FALSE))

apollo_modelOutput(model)

apollo_saveOutput(model)

# Postprocessing of results -----------------------------------------------

predictions_base = apollo_prediction(model, 
                                     apollo_probabilities, 
                                     apollo_inputs)
  

I look forward to your reply.

Thanks very much.

Kind regards

Alex Mitrani

Re: When applying weights at the individual level, weights should be the same for all observations of each individual

Posted: 29 Apr 2022, 21:32
by dpalma
Hi Alex,

Sorry for the belated reply.

You should be able to use observation-level weights in Apollo v0.2.7. The trick is to call apollo_weighting before apollo_panelProd. Below is a modified version of example MNL_RP_SP in which the RP and SP data carry the same total weight for each individual, even though each individual has 14 SP responses and only 2 RP responses.

It should be straightforward to do something analogous in your code, but let us know if you run into further issues.
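
As a rough sketch of what analogous weights could look like for data with 1 RP and 9 SP choices per person: the toy data frame below is invented, the `id` column name is hypothetical, and the coding `tipo_datos == 1` for RP and `tipo_datos == 2` for SP is an assumption read off the `rows` settings in the code above. Each row gets 0.5 divided by the number of rows of the same data type for that person, so RP and SP each sum to 0.5 per person.

```r
# Toy data (invented): 2 people, each with 1 RP row (tipo_datos == 1)
# followed by 9 SP rows (tipo_datos == 2). "id" is a hypothetical column name.
database <- data.frame(
  id         = rep(1:2, each = 10),
  tipo_datos = rep(c(1, rep(2, 9)), times = 2)
)

# Number of rows per person and data type, repeated on every row
n_obs <- ave(rep(1, nrow(database)), database$id, database$tipo_datos, FUN = sum)

# Each data type receives half of the person's total weight
database$weights <- 0.5 / n_obs

# Check: total weight per person (rows) and data type (columns)
totals <- tapply(database$weights, list(database$id, database$tipo_datos), sum)
print(totals)
```

The resulting column would then be named in `apollo_control$weights`, as in the example below.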

Cheers
David

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Initialise
rm(list = ls())
library(apollo)
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  = "mnl_RP_SP_weights",
  modelDescr = "RP-SP model on mode choice data",
  indivID    = "ID", 
  weights    = "weights"
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

### Loading data from package
database = apollo_modeChoiceData
### for data dictionary, use ?apollo_modeChoiceData

# Create weights
# In a real use case, weights would be already in the database
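# Alternative (not used here): normalise over the whole sample instead of per individual
# database$weights <- database$RP/sum(database$RP)/2 + database$SP/sum(database$SP)/2
# Each individual has 2 RP and 14 SP rows, so each data type gets a total weight of 0.5
database$weights <- (database$RP/2 + database$SP/14)/2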

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta=c(asc_car     = 0,
              asc_bus     = 0,
              asc_air     = 0,
              asc_rail    = 0,
              b_tt        = 0,
              b_access    = 0,
              b_cost      = 0,
              b_no_frills = 0,
              b_wifi      = 0,
              b_food      = 0,
              mu_RP       = 1,
              mu_SP       = 1)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("asc_car","b_no_frills","mu_RP")

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities <- function(apollo_beta, apollo_inputs, 
                                 functionality="estimate"){
  
  ### Initialise
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  P = list()
  
  
  ### List of utilities (before applying scales)
  V = list()
  V[["car"]]  = asc_car  + b_tt*time_car                         + b_cost*cost_car
  V[["bus"]]  = asc_bus  + b_tt*time_bus  + b_access*access_bus  + b_cost*cost_bus 
  V[["air"]]  = asc_air  + b_tt*time_air  + b_access*access_air  + b_cost*cost_air   + b_no_frills*(service_air ==1) + b_wifi*(service_air ==2) + b_food*(service_air ==3)
  V[["rail"]] = asc_rail + b_tt*time_rail + b_access*access_rail + b_cost*cost_rail  + b_no_frills*(service_rail==1) + b_wifi*(service_rail==2) + b_food*(service_rail==3)
  
  ### Compute probabilities for the RP part of the data using MNL model
  mnl_settings_RP = list(
    alternatives  = c(car=1, bus=2, air=3, rail=4), 
    avail         = list(car=av_car, bus=av_bus, air=av_air, rail=av_rail), 
    choiceVar     = choice, 
    utilities     = list(car  = mu_RP*V[["car"]],
                         bus  = mu_RP*V[["bus"]],
                         air  = mu_RP*V[["air"]],
                         rail = mu_RP*V[["rail"]]),
    rows          = (RP==1)
  )
  P[["RP"]] = apollo_mnl(mnl_settings_RP, functionality)
  
  ### Compute probabilities for the SP part of the data using MNL model
  mnl_settings_SP = list(
    alternatives  = c(car=1, bus=2, air=3, rail=4), 
    avail         = list(car=av_car, bus=av_bus, air=av_air, rail=av_rail), 
    choiceVar     = choice, 
    utilities     = list(car  = mu_SP*V[["car"]],
                         bus  = mu_SP*V[["bus"]],
                         air  = mu_SP*V[["air"]],
                         rail = mu_SP*V[["rail"]]),
    rows          = (SP==1)
  )
  P[["SP"]] = apollo_mnl(mnl_settings_SP, functionality)
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Apply weights before taking the product of all observations of each individual
  P = apollo_weighting(P, apollo_inputs, functionality)
  
  ### Take product across observations for the same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION AND OUTPUT                                 ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, apollo_inputs)

apollo_modelOutput(model)

apollo_saveOutput(model)
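
To illustrate why the order of the two calls matters, here is a minimal numeric sketch in base R. It assumes that, in estimation, a weight acts as an exponent on the likelihood (equivalently, a multiplier on the log-likelihood); the probabilities and weights are made up, and none of this calls Apollo itself.

```r
# Made-up probabilities of the chosen alternative for one person:
# 1 RP observation followed by 9 SP observations
p <- c(0.6, rep(0.5, 9))

# Observation-level weights: RP and SP each sum to 0.5
w <- c(0.5, rep(0.5 / 9, 9))

# Weighting before the panel product: each observation is raised to its own
# weight, then the product is taken across the person's observations
L_obs_level <- prod(p^w)

# The same quantity as a weighted sum of log-probabilities
ll_obs_level <- sum(w * log(p))
print(all.equal(log(L_obs_level), ll_obs_level))

# Weighting after the panel product: only a single weight per person can be
# applied to the product, so RP and SP rows cannot be treated differently
w_person <- 0.5  # hypothetical person-level weight
L_person_level <- prod(p)^w_person
```

With the product taken first, only one exponent per person is available, which is consistent with the error message about weights having to be the same for all observations of each individual.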

Re: When applying weights at the individual level, weights should be the same for all observations of each individual

Posted: 09 Jun 2023, 00:44
by alemitrani
Hi David

No problem.

Thanks very much for your helpful reply.

Kind regards

Alex