When applying weights at the individual level, weights should be the same for all observations of each individual

Post by alemitrani »

Good afternoon

I am working with combined SP-RP data and trying to replicate results I obtained with Stata. I have replicated them almost exactly, but a small issue with the application of weights leaves some of the robust t-statistics slightly different.

I would like to weight the SP responses and RP responses differently for each person in the data, so that the SP data and RP data have approximately equal weight in the estimation (each person provides 9 SP choices but only 1 RP choice). Apollo does not allow this and returns the following error message:
Error in apollo_weighting(P, apollo_inputs, functionality) :
When applying weights at the individual level, weights should be the same for all observations of each individual.
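The condition named in the error message can be illustrated with a minimal sketch on made-up data. This is not Apollo's internal check; the toy values and the helper `n_weights` are hypothetical, and only the column names `sys_respnum2`, `tipo_datos` and `fexp` follow the code in this post.

```r
# Toy data (made up): per person, one RP row (tipo_datos == 1)
# and three SP rows (tipo_datos == 2), with different weights by data type.
toy <- data.frame(
  sys_respnum2 = c(1, 1, 1, 1, 2, 2, 2, 2),
  tipo_datos   = c(1, 2, 2, 2, 1, 2, 2, 2),
  fexp         = c(1, 1/3, 1/3, 1/3, 1, 1/3, 1/3, 1/3)
)

# Hypothetical helper: number of distinct weight values within each group
n_weights <- function(id, w) tapply(w, id, function(x) length(unique(x)))

# Grouping by person: two distinct weights per person, so the condition fails
by_person <- n_weights(toy$sys_respnum2, toy$fexp)

# Grouping by person-datatype: one weight per group, so the condition holds
by_person_type <- n_weights(toy$sys_respnum2 * 100 + toy$tipo_datos, toy$fexp)

print(by_person)
print(by_person_type)
```

Running the same kind of count on the real data before estimation shows which IDs carry more than one weight value.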

I found a work-around, which is to specify a "person" ID variable at the level of person-datatype, as if the RP and SP responses came from different people. With this ID Apollo estimates the model, and the parameter estimates and the log-likelihood are the same as those I got with Stata. The only difference is that some of the robust t-statistics are slightly different, almost certainly because the individual is defined differently in each case.
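A small sketch of why the definition of the individual could matter for robust standard errors. It assumes the robust covariance is a sandwich estimator whose middle term sums the scores within each individual before taking outer products; that is an assumption made here for illustration, not something confirmed for Apollo or Stata in this post. The scores are random numbers and the function `meat` is hypothetical.

```r
set.seed(1)

# Made-up observation-level scores for 2 parameters and 8 observations
scores <- matrix(rnorm(8 * 2), nrow = 8)
person <- c(1, 1, 1, 1, 2, 2, 2, 2)   # true person ID
tipo   <- c(1, 2, 2, 2, 1, 2, 2, 2)   # 1 = RP, 2 = SP

# Hypothetical helper: sum scores within each cluster, then sum the outer products
meat <- function(scores, cluster) crossprod(rowsum(scores, cluster))

B_person <- meat(scores, person)               # clusters = persons
B_split  <- meat(scores, person * 100 + tipo)  # clusters = person-datatype

# The two middle terms differ by the RP-SP cross products within each person
print(B_person - B_split)
```

Under this assumption the point estimates and log-likelihood are unaffected by the choice of ID, while the middle term of the sandwich loses the within-person RP-SP cross terms when each person is split in two.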

Is there a way of relaxing the restriction in Apollo that the weights have to be the same for all choices made by each individual? I looked in the Apollo manual and also searched the forum and could not find any existing answer to this question.

It would be useful to be able to do this, as applications with SP and RP data will often have different numbers of SP and RP choices for each person. The results obtained with Stata suggest that it should be possible to do this without causing a problem for the model estimation.
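For illustration only, one simple way to build observation-level weights that give each person's SP block and RP block the same total weight, even when the number of choices differs between people. How `fexp` was actually constructed is not shown in this post, so the scheme below (weight = 1 / number of rows in the person-datatype block) and the column `w` are assumptions.

```r
# Toy data (made up): person 1 has 1 RP and 9 SP rows, person 2 has 1 RP and 4 SP rows
toy <- data.frame(
  sys_respnum2 = c(rep(1, 10), rep(2, 5)),
  tipo_datos   = c(1, rep(2, 9), 1, rep(2, 4))   # 1 = RP, 2 = SP
)

# Number of rows in each person-datatype block, repeated on every row of the block
n_rows <- ave(rep(1, nrow(toy)), toy$sys_respnum2, toy$tipo_datos, FUN = sum)

# Each row gets 1 / block size, so every block sums to 1
toy$w <- 1 / n_rows

# Total weight per person and data type
totals <- tapply(toy$w, list(person = toy$sys_respnum2, type = toy$tipo_datos), sum)
print(totals)
```

These weights vary within a person, so they are exactly the kind of weights that trigger the error when the ID is defined at the person level.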

Here is my R code in case it is useful, both with and without the error.

1) The version that produces the error message:

Code:

library(apollo)
library(tidyverse)  # provides read_csv(), %>%, filter(), pivot_wider(), etc. used below

# Definition of core settings ---------------------------------------------

# clear memory
rm(list = ls())

# use Apollo to model the choices

apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  ="nlogit_prpd_40_03",
  modelDescr ="Datos PDPR",
  indivID    ="sys_respnum2",
  weights = "fexp"
)


# Data loading ------------------------------------------------------------


# read the data
database <- read_csv(file="datos_pdpr3.csv")

# filter the data
database <- database %>%
  filter(tcam<=40)

# sort the data
database <- database %>%
  arrange(sys_respnum2, it)

# check that there are no duplicate cases

checkuniqueid <- database %>%
  group_by(sys_respnum2, it, modo) %>%
  summarize(count = n()) %>%
  ungroup()

database <- database %>%
  left_join(checkuniqueid) %>%
  filter(count==1)

# check that every case has a choice

suma_eleccionpdpr <- database %>%
  group_by(sys_respnum2, it) %>%
  summarise(suma_eleccionpdpr = sum(eleccion)) %>%
  ungroup()

database <- database %>%
  left_join(suma_eleccionpdpr) %>%
  filter(suma_eleccionpdpr==1)

# define the new variables required
database <- database %>%
  mutate(motivo_noobligado = ifelse((motivodestino!="Trabajar" & motivodestino!="Estudiar") | is.na(motivodestino)==TRUE,1,0)) %>%
  mutate(tiempo_g2 = 2*tcam + 2*tesp + tabo) %>%
  mutate(tiempo_g2_noobligado = tiempo_g2*motivo_noobligado) %>%
  mutate(disponible=1)

# keep only the required variables
database <- database %>%
  select(sys_respnum2, tipo_datos, it, modo, modon, eleccion, intr, autoinercia, tcam, tesp, tabo, trans, tiempo_g2, tiempo_g2_noobligado, costopp, ingresoclp, lningresoclp, costoporlningreso, auto_disponible, motivodestino, frecuencia, diaviaje, feriado, horainicio_mpm, persgrup, rangoedad, genero, durviajemins, regionorigen, comuna_residencia, comunaorigen, comunadestino, autoshogar, motivo_noobligado, disponible, fexp)

# define the choice variable in a separate table
sys_respnum2_it_eleccion <- database %>%
  filter(eleccion==1) %>%
  mutate(eleccion = modon) %>%
  select(sys_respnum2, it, eleccion)

# keep only the variables required for the model
database <- database %>%
  select(sys_respnum2, tipo_datos, it, modon, intr, autoinercia, tiempo_g2, tiempo_g2_noobligado, trans, costoporlningreso, disponible, fexp)

# check means:
database %>%
  group_by() %>%
  summarise(autoinercia = mean(autoinercia),
            tiempo_g2 = mean(tiempo_g2),
            tiempo_g2_noobligado = mean(tiempo_g2_noobligado),
            trans = mean(trans),
            costoporlningreso = mean(costoporlningreso)) %>%
  ungroup()

# reshape from long format to wide format
database <- database %>%
  pivot_wider(names_from = "modon", values_from = c("autoinercia", "tiempo_g2", "tiempo_g2_noobligado", "trans", "costoporlningreso", "disponible"))

# add the choice variable
database <- database %>%
  left_join(sys_respnum2_it_eleccion)

rm(sys_respnum2_it_eleccion)

database <- database %>%
  filter(is.na(eleccion)==FALSE)

# set all empty cells to 0
database[is.na(database)] <- 0

# ensure that each modelled case has 2 or more available alternatives:
database <- database %>%
  mutate(nopciones = disponible_1 + disponible_2 + disponible_3 + disponible_4 + disponible_5 + disponible_6 + disponible_7 + disponible_8 + disponible_9 + disponible_10 + disponible_21 + disponible_22 + disponible_23 + disponible_24 + disponible_25 + disponible_26 + disponible_27) %>%
  filter(nopciones>1)

# check the values of eleccion:
database %>% group_by(eleccion) %>% summarise(check = n()) %>% ungroup()

# Parameter definition ----------------------------------------------------

### Vector of parameters, including any that are kept fixed 
### during estimation
apollo_beta <- c(b_pd_cabu = 0,
                 b_pd_tpbu = 0,
                 b_pd_tabu = 0,
                 b_pd_catr = 0, 
                 b_pd_tptr = 0,
                 b_pd_tatr = 0,
                 b_pd_auto = 0,
                 b_autoinercia = 0,
                 b_pr_cabuca = 0,
                 b_pr_cabutp = 0,
                 b_pr_cabuta = 0,
                 b_pr_tpbuca = 0,
                 b_pr_tpbutp = 0,
                 b_pr_tpbuta = 0,
                 b_pr_tabuca = 0,
                 b_pr_tabutp = 0,
                 b_pr_tabuta = 0,
                 b_pr_auto = 0, 
                 b_tiempo_g2 = 0,
                 b_tiempo_g2_noobligado = 0, 
                 b_trans = 0,
                 b_costoporlningreso = 0,
                 lambda_TPPD = 1,
                 lambda_APD = 1,
                 lambda_TPPR = 0.5,
                 lambda_APR = 1,
                 lambda_PD = 1,
                 lambda_PR = 1
              )

### Vector with names (in quotes) of parameters to be
###  kept fixed at their starting value in apollo_beta.
### Use apollo_fixed = c() for no fixed parameters.
apollo_fixed <- c("b_pd_tpbu", "b_pr_tpbutp", "lambda_APR", "lambda_TPPR", "lambda_APD", "lambda_PR")


# Input validation --------------------------------------------------------

apollo_inputs <- apollo_validateInputs()


# Likelihood definition ---------------------------------------------------

apollo_probabilities <- function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as
  ### in nl_settings, order is irrelevant.
  V = list()
  V[['pr_cabuca']]    = b_pr_cabuca + b_tiempo_g2*tiempo_g2_1  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_1  + b_trans*trans_1  + b_costoporlningreso*costoporlningreso_1 
  V[['pr_cabutp']]    = b_pr_cabutp + b_tiempo_g2*tiempo_g2_2  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_2  + b_trans*trans_2  + b_costoporlningreso*costoporlningreso_2 
  V[['pr_cabuta']]    = b_pr_cabuta + b_tiempo_g2*tiempo_g2_3  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_3  + b_trans*trans_3  + b_costoporlningreso*costoporlningreso_3 
  V[['pr_tpbuca']]    = b_pr_tpbuca + b_tiempo_g2*tiempo_g2_4  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_4  + b_trans*trans_4  + b_costoporlningreso*costoporlningreso_4 
  V[['pr_tpbutp']]    = b_pr_tpbutp + b_tiempo_g2*tiempo_g2_5  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_5  + b_trans*trans_5  + b_costoporlningreso*costoporlningreso_5 
  V[['pr_tpbuta']]    = b_pr_tpbuta + b_tiempo_g2*tiempo_g2_6  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_6  + b_trans*trans_6  + b_costoporlningreso*costoporlningreso_6 
  V[['pr_tabuca']]    = b_pr_tabuca + b_tiempo_g2*tiempo_g2_7  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_7  + b_trans*trans_7  + b_costoporlningreso*costoporlningreso_7 
  V[['pr_tabutp']]    = b_pr_tabutp + b_tiempo_g2*tiempo_g2_8  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_8  + b_trans*trans_8  + b_costoporlningreso*costoporlningreso_8 
  V[['pr_tabuta']]    = b_pr_tabuta + b_tiempo_g2*tiempo_g2_9  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_9  + b_trans*trans_9  + b_costoporlningreso*costoporlningreso_9 
  V[['pr_auto']]      = b_pr_auto   + b_tiempo_g2*tiempo_g2_10 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_10 + b_trans*trans_10 + b_costoporlningreso*costoporlningreso_10
  V[['pd_cabu']]      = b_pd_cabu   + b_tiempo_g2*tiempo_g2_21 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_21 + b_trans*trans_21 + b_costoporlningreso*costoporlningreso_21
  V[['pd_tpbu']]      = b_pd_tpbu   + b_tiempo_g2*tiempo_g2_22 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_22 + b_trans*trans_22 + b_costoporlningreso*costoporlningreso_22
  V[['pd_tabu']]      = b_pd_tabu   + b_tiempo_g2*tiempo_g2_23 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_23 + b_trans*trans_23 + b_costoporlningreso*costoporlningreso_23
  V[['pd_catr']]      = b_pd_catr   + b_tiempo_g2*tiempo_g2_24 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_24 + b_trans*trans_24 + b_costoporlningreso*costoporlningreso_24
  V[['pd_tptr']]      = b_pd_tptr   + b_tiempo_g2*tiempo_g2_25 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_25 + b_trans*trans_25 + b_costoporlningreso*costoporlningreso_25
  V[['pd_tatr']]      = b_pd_tatr   + b_tiempo_g2*tiempo_g2_26 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_26 + b_trans*trans_26 + b_costoporlningreso*costoporlningreso_26
  V[['pd_auto']]      = b_pd_auto   + b_tiempo_g2*tiempo_g2_27 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_27 + b_trans*trans_27 + b_costoporlningreso*costoporlningreso_27 + b_autoinercia*autoinercia_27
  
  
  ### Specify nests
  nlNests      = list(root=1, PD=lambda_PD, PR=lambda_PR, APD=lambda_APD, TPPD=lambda_TPPD, APR=lambda_APR, TPPR=lambda_TPPR)
  
  ### Specify tree structure for NL model, PD
  nlStructure= list()
  nlStructure[["root"]]   = c("PD", "PR")
  nlStructure[["PD"]]   = c("APD", "TPPD")
  nlStructure[["PR"]]   = c("APR", "TPPR")
  nlStructure[["APD"]]     = c("pd_auto")  
  nlStructure[["TPPD"]]     = c("pd_cabu","pd_tpbu","pd_tabu", "pd_catr","pd_tptr","pd_tatr")
  nlStructure[["APR"]]     = c("pr_auto")
  nlStructure[["TPPR"]]     = c("pr_cabuca","pr_cabutp","pr_cabuta", "pr_tpbuca","pr_tpbutp","pr_tpbuta", "pr_tabuca","pr_tabutp","pr_tabuta")

  ### Define settings for NL model, SP data
  nl_settings_PD <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==2),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PD"]] = apollo_nl(nl_settings_PD, functionality)  
  
  ### Define settings for NL model, RP data  
  
  nl_settings_PR <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==1),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PR"]] = apollo_nl(nl_settings_PR, functionality)  
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Apply weights
  P = apollo_weighting(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  
  return(P)
  
}

# Model estimation and reporting ------------------------------------------

model <- apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, 
                        apollo_inputs,
                        list(writeIter=FALSE))

apollo_modelOutput(model)

apollo_saveOutput(model)

# Postprocessing of results -----------------------------------------------

predictions_base = apollo_prediction(model, 
                                     apollo_probabilities, 
                                     apollo_inputs)
  
#

2) The version with the modified person ID variable, which runs without error in Apollo but produces slightly different t-statistics:

Code:

library(apollo)
library(tidyverse)  # provides read_csv(), %>%, filter(), pivot_wider(), etc. used below

# Definition of core settings ---------------------------------------------

# clear memory
rm(list = ls())

# use Apollo to model the choices

apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  ="nlogit_prpd_40_03",
  modelDescr ="Datos PDPR",
  indivID    ="sys_respnum3",
  weights = "fexp"
)


# Data loading ------------------------------------------------------------


# read the data
database <- read_csv(file="datos_pdpr3.csv")

# filter the data
database <- database %>%
  filter(tcam<=40)

# build the person-datatype ID (sys_respnum3) and sort the data
database <- database %>%
  mutate(sys_respnum3=sys_respnum2*100+tipo_datos) %>%
  arrange(sys_respnum3, it)

# check that there are no duplicate cases

checkuniqueid <- database %>%
  group_by(sys_respnum2, it, modo) %>%
  summarize(count = n()) %>%
  ungroup()

database <- database %>%
  left_join(checkuniqueid) %>%
  filter(count==1)

# check that every case has a choice

suma_eleccionpdpr <- database %>%
  group_by(sys_respnum2, it) %>%
  summarise(suma_eleccionpdpr = sum(eleccion)) %>%
  ungroup()

database <- database %>%
  left_join(suma_eleccionpdpr) %>%
  filter(suma_eleccionpdpr==1)

# define the new variables required
database <- database %>%
  mutate(motivo_noobligado = ifelse((motivodestino!="Trabajar" & motivodestino!="Estudiar") | is.na(motivodestino)==TRUE,1,0)) %>%
  mutate(tiempo_g2 = 2*tcam + 2*tesp + tabo) %>%
  mutate(tiempo_g2_noobligado = tiempo_g2*motivo_noobligado) %>%
  mutate(disponible=1)

# keep only the required variables
database <- database %>%
  select(sys_respnum2, sys_respnum3, tipo_datos, it, modo, modon, eleccion, intr, autoinercia, tcam, tesp, tabo, trans, tiempo_g2, tiempo_g2_noobligado, costopp, ingresoclp, lningresoclp, costoporlningreso, auto_disponible, motivodestino, frecuencia, diaviaje, feriado, horainicio_mpm, persgrup, rangoedad, genero, durviajemins, regionorigen, comuna_residencia, comunaorigen, comunadestino, autoshogar, motivo_noobligado, disponible, fexp)

# define the choice variable in a separate table
sys_respnum2_it_eleccion <- database %>%
  filter(eleccion==1) %>%
  mutate(eleccion = modon) %>%
  select(sys_respnum2, it, eleccion)

# keep only the variables required for the model
database <- database %>%
  select(sys_respnum2, sys_respnum3, tipo_datos, it, modon, intr, autoinercia, tiempo_g2, tiempo_g2_noobligado, trans, costoporlningreso, disponible, fexp)

# check means:
database %>%
  group_by() %>%
  summarise(autoinercia = mean(autoinercia),
            tiempo_g2 = mean(tiempo_g2),
            tiempo_g2_noobligado = mean(tiempo_g2_noobligado),
            trans = mean(trans),
            costoporlningreso = mean(costoporlningreso)) %>%
  ungroup()

# reshape from long format to wide format
database <- database %>%
  pivot_wider(names_from = "modon", values_from = c("autoinercia", "tiempo_g2", "tiempo_g2_noobligado", "trans", "costoporlningreso", "disponible"))

# add the choice variable
database <- database %>%
  left_join(sys_respnum2_it_eleccion)

rm(sys_respnum2_it_eleccion)

database <- database %>%
  filter(is.na(eleccion)==FALSE)

# set all empty cells to 0
database[is.na(database)] <- 0

# ensure that each modelled case has 2 or more available alternatives:
database <- database %>%
  mutate(nopciones = disponible_1 + disponible_2 + disponible_3 + disponible_4 + disponible_5 + disponible_6 + disponible_7 + disponible_8 + disponible_9 + disponible_10 + disponible_21 + disponible_22 + disponible_23 + disponible_24 + disponible_25 + disponible_26 + disponible_27) %>%
  filter(nopciones>1)

# check the values of eleccion:
database %>% group_by(eleccion) %>% summarise(check = n()) %>% ungroup()

# Parameter definition ----------------------------------------------------

### Vector of parameters, including any that are kept fixed 
### during estimation
apollo_beta <- c(b_pd_cabu = 0,
                 b_pd_tpbu = 0,
                 b_pd_tabu = 0,
                 b_pd_catr = 0, 
                 b_pd_tptr = 0,
                 b_pd_tatr = 0,
                 b_pd_auto = 0,
                 b_autoinercia = 0,
                 b_pr_cabuca = 0,
                 b_pr_cabutp = 0,
                 b_pr_cabuta = 0,
                 b_pr_tpbuca = 0,
                 b_pr_tpbutp = 0,
                 b_pr_tpbuta = 0,
                 b_pr_tabuca = 0,
                 b_pr_tabutp = 0,
                 b_pr_tabuta = 0,
                 b_pr_auto = 0, 
                 b_tiempo_g2 = 0,
                 b_tiempo_g2_noobligado = 0, 
                 b_trans = 0,
                 b_costoporlningreso = 0,
                 lambda_TPPD = 1,
                 lambda_APD = 1,
                 lambda_TPPR = 0.5,
                 lambda_APR = 1,
                 lambda_PD = 1,
                 lambda_PR = 1
              )

### Vector with names (in quotes) of parameters to be
###  kept fixed at their starting value in apollo_beta.
### Use apollo_fixed = c() for no fixed parameters.
apollo_fixed <- c("b_pd_tpbu", "b_pr_tpbutp", "lambda_APR", "lambda_TPPR", "lambda_APD", "lambda_PR")


# Input validation --------------------------------------------------------

apollo_inputs <- apollo_validateInputs()


# Likelihood definition ---------------------------------------------------

apollo_probabilities <- function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){

  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as
  ### in nl_settings, order is irrelevant.
  V = list()
  V[['pr_cabuca']]    = b_pr_cabuca + b_tiempo_g2*tiempo_g2_1  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_1  + b_trans*trans_1  + b_costoporlningreso*costoporlningreso_1 
  V[['pr_cabutp']]    = b_pr_cabutp + b_tiempo_g2*tiempo_g2_2  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_2  + b_trans*trans_2  + b_costoporlningreso*costoporlningreso_2 
  V[['pr_cabuta']]    = b_pr_cabuta + b_tiempo_g2*tiempo_g2_3  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_3  + b_trans*trans_3  + b_costoporlningreso*costoporlningreso_3 
  V[['pr_tpbuca']]    = b_pr_tpbuca + b_tiempo_g2*tiempo_g2_4  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_4  + b_trans*trans_4  + b_costoporlningreso*costoporlningreso_4 
  V[['pr_tpbutp']]    = b_pr_tpbutp + b_tiempo_g2*tiempo_g2_5  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_5  + b_trans*trans_5  + b_costoporlningreso*costoporlningreso_5 
  V[['pr_tpbuta']]    = b_pr_tpbuta + b_tiempo_g2*tiempo_g2_6  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_6  + b_trans*trans_6  + b_costoporlningreso*costoporlningreso_6 
  V[['pr_tabuca']]    = b_pr_tabuca + b_tiempo_g2*tiempo_g2_7  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_7  + b_trans*trans_7  + b_costoporlningreso*costoporlningreso_7 
  V[['pr_tabutp']]    = b_pr_tabutp + b_tiempo_g2*tiempo_g2_8  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_8  + b_trans*trans_8  + b_costoporlningreso*costoporlningreso_8 
  V[['pr_tabuta']]    = b_pr_tabuta + b_tiempo_g2*tiempo_g2_9  + b_tiempo_g2_noobligado*tiempo_g2_noobligado_9  + b_trans*trans_9  + b_costoporlningreso*costoporlningreso_9 
  V[['pr_auto']]      = b_pr_auto   + b_tiempo_g2*tiempo_g2_10 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_10 + b_trans*trans_10 + b_costoporlningreso*costoporlningreso_10
  V[['pd_cabu']]      = b_pd_cabu   + b_tiempo_g2*tiempo_g2_21 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_21 + b_trans*trans_21 + b_costoporlningreso*costoporlningreso_21
  V[['pd_tpbu']]      = b_pd_tpbu   + b_tiempo_g2*tiempo_g2_22 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_22 + b_trans*trans_22 + b_costoporlningreso*costoporlningreso_22
  V[['pd_tabu']]      = b_pd_tabu   + b_tiempo_g2*tiempo_g2_23 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_23 + b_trans*trans_23 + b_costoporlningreso*costoporlningreso_23
  V[['pd_catr']]      = b_pd_catr   + b_tiempo_g2*tiempo_g2_24 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_24 + b_trans*trans_24 + b_costoporlningreso*costoporlningreso_24
  V[['pd_tptr']]      = b_pd_tptr   + b_tiempo_g2*tiempo_g2_25 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_25 + b_trans*trans_25 + b_costoporlningreso*costoporlningreso_25
  V[['pd_tatr']]      = b_pd_tatr   + b_tiempo_g2*tiempo_g2_26 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_26 + b_trans*trans_26 + b_costoporlningreso*costoporlningreso_26
  V[['pd_auto']]      = b_pd_auto   + b_tiempo_g2*tiempo_g2_27 + b_tiempo_g2_noobligado*tiempo_g2_noobligado_27 + b_trans*trans_27 + b_costoporlningreso*costoporlningreso_27 + b_autoinercia*autoinercia_27
  
  
  ### Specify nests
  nlNests      = list(root=1, PD=lambda_PD, PR=lambda_PR, APD=lambda_APD, TPPD=lambda_TPPD, APR=lambda_APR, TPPR=lambda_TPPR)
  
  ### Specify tree structure for NL model, PD
  nlStructure= list()
  nlStructure[["root"]]   = c("PD", "PR")
  nlStructure[["PD"]]   = c("APD", "TPPD")
  nlStructure[["PR"]]   = c("APR", "TPPR")
  nlStructure[["APD"]]     = c("pd_auto")  
  nlStructure[["TPPD"]]     = c("pd_cabu","pd_tpbu","pd_tabu", "pd_catr","pd_tptr","pd_tatr")
  nlStructure[["APR"]]     = c("pr_auto")
  nlStructure[["TPPR"]]     = c("pr_cabuca","pr_cabutp","pr_cabuta", "pr_tpbuca","pr_tpbutp","pr_tpbuta", "pr_tabuca","pr_tabutp","pr_tabuta")

  ### Define settings for NL model, SP data
  nl_settings_PD <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==2),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PD"]] = apollo_nl(nl_settings_PD, functionality)  
  
  ### Define settings for NL model, RP data  
  
  nl_settings_PR <- list(
    alternatives  = c(pr_cabuca = 1,
                      pr_cabutp = 2, 
                      pr_cabuta = 3,
                      pr_tpbuca = 4,
                      pr_tpbutp = 5,
                      pr_tpbuta = 6,
                      pr_tabuca = 7,
                      pr_tabutp = 8,
                      pr_tabuta = 9, 
                      pr_auto = 10,
                      pd_cabu = 21,
                      pd_tpbu = 22,
                      pd_tabu = 23,
                      pd_catr = 24,
                      pd_tptr = 25,
                      pd_tatr = 26,
                      pd_auto = 27),
    avail         = list(pr_cabuca = disponible_1,
                         pr_cabutp = disponible_2, 
                         pr_cabuta = disponible_3,
                         pr_tpbuca = disponible_4,
                         pr_tpbutp = disponible_5,
                         pr_tpbuta = disponible_6,
                         pr_tabuca = disponible_7,
                         pr_tabutp = disponible_8,
                         pr_tabuta = disponible_9, 
                         pr_auto = disponible_10,
                         pd_cabu = disponible_21,
                         pd_tpbu = disponible_22,
                         pd_tabu = disponible_23,
                         pd_catr = disponible_24,
                         pd_tptr = disponible_25,
                         pd_tatr = disponible_26,
                         pd_auto = disponible_27),
    choiceVar    = eleccion,
    utilities    =     list(pr_cabuca  = V[["pr_cabuca"]],
                            pr_cabutp  = V[["pr_cabutp"]],
                            pr_cabuta  = V[["pr_cabuta"]],
                            pr_tpbuca  = V[["pr_tpbuca"]],
                            pr_tpbutp  = V[["pr_tpbutp"]],
                            pr_tpbuta  = V[["pr_tpbuta"]],
                            pr_tabuca  = V[["pr_tabuca"]],
                            pr_tabutp  = V[["pr_tabutp"]],
                            pr_tabuta  = V[["pr_tabuta"]],
                            pr_auto  = V[["pr_auto"]],
                            pd_cabu  = V[["pd_cabu"]],
                            pd_tpbu  = V[["pd_tpbu"]],
                            pd_tabu  = V[["pd_tabu"]],
                            pd_catr  = V[["pd_catr"]],
                            pd_tptr  = V[["pd_tptr"]],
                            pd_tatr  = V[["pd_tatr"]],
                            pd_auto  = V[["pd_auto"]]),
    rows          = (tipo_datos==1),
    nlNests      = nlNests,
    nlStructure  = nlStructure
  )
  
  ### Compute probabilities using NL model
  P[["PR"]] = apollo_nl(nl_settings_PR, functionality)  
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Apply weights
  P = apollo_weighting(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  
  return(P)
  
}

# Model estimation and reporting ------------------------------------------

model <- apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, 
                        apollo_inputs,
                        list(writeIter=FALSE))

apollo_modelOutput(model)

apollo_saveOutput(model)

# Postprocessing of results -----------------------------------------------

predictions_base = apollo_prediction(model, 
                                     apollo_probabilities, 
                                     apollo_inputs)
  
#

I look forward to your reply.

Thanks very much.

Kind regards

Alex Mitrani
dpalma
Posts: 190
Joined: 24 Apr 2020, 17:54

Re: When applying weights at the individual level, weights should be the same for all observations of each individual

Post by dpalma »

Hi Alex,

Sorry for the belated reply.

You should be able to use weights at the observation level in Apollo v0.2.7. The trick is to call apollo_weighting before apollo_panelProd. Below you will find a modified version of example MNL_RP_SP, where the RP and SP responses of each individual receive the same total weight, even though each individual has 14 SP and only 2 RP responses.

It should be straightforward to do something analogous in your code, but let us know if you run into further issues.
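
To see why the order of the two calls matters, here is a minimal sketch in plain R with no Apollo calls. It assumes that observation-level weighting amounts to raising each observation's probability to the power of its weight (equivalently, multiplying its log-likelihood contribution by the weight). The data frame obs, its columns and all numbers are hypothetical and only serve as an illustration.

```r
# Illustrative only: plain R, no Apollo functions. All names and numbers are hypothetical.
# Assumption: weighting an observation raises its probability to the power of its weight.

obs <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2),
  P      = c(0.6, 0.5, 0.8, 0.7, 0.4, 0.9),     # probability of the chosen alternative
  weight = c(0.5, 0.25, 0.25, 0.5, 0.25, 0.25)  # varies across rows of the same ID
)

# Step 1: apply the weights observation by observation
obs$P_weighted <- obs$P^obs$weight

# Step 2: take the product across the observations of each individual
L_indiv <- tapply(obs$P_weighted, obs$ID, prod)

# The same quantity on the log scale: a weighted sum of log-probabilities
LL_indiv <- tapply(obs$weight * log(obs$P), obs$ID, sum)

# Number of distinct weights per individual; more than one means a single
# individual-level weight cannot reproduce the observation-level weighting
n_distinct_w <- tapply(obs$weight, obs$ID, function(w) length(unique(w)))

print(L_indiv)
print(LL_indiv)
print(n_distinct_w)
```

If the product were taken first, only one likelihood value per individual would remain, so only one weight per individual could be applied. That is consistent with the message about weights having to be the same for all observations of each individual.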

Cheers
David

Code:

# ################################################################# #
#### LOAD LIBRARY AND DEFINE CORE SETTINGS                       ####
# ################################################################# #

### Initialise
rm(list = ls())
library(apollo)
apollo_initialise()

### Set core controls
apollo_control = list(
  modelName  = "mnl_RP_SP_weights",
  modelDescr = "RP-SP model on mode choice data",
  indivID    = "ID", 
  weights    = "weights"
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

### Loading data from package
database = apollo_modeChoiceData
### for data dictionary, use ?apollo_modeChoiceData

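# Create weights so that, for each individual, the RP rows and the SP rows
# carry the same total weight (2 RP and 14 SP observations per individual).
# In a real use case, weights would be already in the database
# Alternative with sample-level normalisation, not used here (the next
# assignment would overwrite it anyway):
# database$weights <- database$RP/sum(database$RP)/2 + database$SP/sum(database$SP)/2
database$weights <- (database$RP/2 + database$SP/14)/2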

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta=c(asc_car     = 0,
              asc_bus     = 0,
              asc_air     = 0,
              asc_rail    = 0,
              b_tt        = 0,
              b_access    = 0,
              b_cost      = 0,
              b_no_frills = 0,
              b_wifi      = 0,
              b_food      = 0,
              mu_RP       = 1,
              mu_SP       = 1)

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c("asc_car","b_no_frills","mu_RP")

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs = apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities <- function(apollo_beta, apollo_inputs, 
                                 functionality="estimate"){
  
  ### Initialise
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  P = list()
  
  
  ### List of utilities (before applying scales)
  V = list()
  V[["car"]]  = asc_car  + b_tt*time_car                         + b_cost*cost_car
  V[["bus"]]  = asc_bus  + b_tt*time_bus  + b_access*access_bus  + b_cost*cost_bus 
  V[["air"]]  = asc_air  + b_tt*time_air  + b_access*access_air  + b_cost*cost_air   + b_no_frills*(service_air ==1) + b_wifi*(service_air ==2) + b_food*(service_air ==3)
  V[["rail"]] = asc_rail + b_tt*time_rail + b_access*access_rail + b_cost*cost_rail  + b_no_frills*(service_rail==1) + b_wifi*(service_rail==2) + b_food*(service_rail==3)
  
  ### Compute probabilities for the RP part of the data using MNL model
  mnl_settings_RP = list(
    alternatives  = c(car=1, bus=2, air=3, rail=4), 
    avail         = list(car=av_car, bus=av_bus, air=av_air, rail=av_rail), 
    choiceVar     = choice, 
    utilities     = list(car  = mu_RP*V[["car"]],
                         bus  = mu_RP*V[["bus"]],
                         air  = mu_RP*V[["air"]],
                         rail = mu_RP*V[["rail"]]),
    rows          = (RP==1)
  )
  P[["RP"]] = apollo_mnl(mnl_settings_RP, functionality)
  
  ### Compute probabilities for the SP part of the data using MNL model
  mnl_settings_SP = list(
    alternatives  = c(car=1, bus=2, air=3, rail=4), 
    avail         = list(car=av_car, bus=av_bus, air=av_air, rail=av_rail), 
    choiceVar     = choice, 
    utilities     = list(car  = mu_SP*V[["car"]],
                         bus  = mu_SP*V[["bus"]],
                         air  = mu_SP*V[["air"]],
                         rail = mu_SP*V[["rail"]]),
    rows          = (SP==1)
  )
  P[["SP"]] = apollo_mnl(mnl_settings_SP, functionality)
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Apply weights before taking the product of all observations of each indiv
  P = apollo_weighting(P, apollo_inputs, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION AND OUTPUT                                 ####
# ################################################################# #

model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, apollo_inputs)

apollo_modelOutput(model)

apollo_saveOutput(model)
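
The weight formula used in the example above can be checked on a small toy data set. The sketch below is plain R and does not need Apollo. The data frame toy is hypothetical and only mirrors the RP and SP indicator columns, with 2 RP and 14 SP rows per individual.

```r
# Toy data, hypothetical: 3 individuals, each with 2 RP rows followed by 14 SP rows
n_id <- 3
toy <- data.frame(
  ID = rep(seq_len(n_id), each = 16),
  RP = rep(c(rep(1, 2), rep(0, 14)), times = n_id),
  SP = rep(c(rep(0, 2), rep(1, 14)), times = n_id)
)

# Same formula as in the example: each RP row gets 1/4, each SP row gets 1/28
toy$weights <- (toy$RP/2 + toy$SP/14)/2

# Total weight per individual, split by data type
w_RP <- tapply(toy$weights * toy$RP, toy$ID, sum)
w_SP <- tapply(toy$weights * toy$SP, toy$ID, sum)

print(cbind(w_RP, w_SP))
```

For every individual the RP total and the SP total both come out at 0.5, so the two data sources contribute equally despite the different number of rows.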
alemitrani
Posts: 2
Joined: 25 Mar 2022, 00:03
Location: Santiago, Chile

Re: When applying weights at the individual level, weights should be the same for all observations of each individual

Post by alemitrani »

Hi David

No problem.

Thanks very much for your helpful reply.

Kind regards

Alex