Joint Estimation for Dual Response Survey with Two Segments

Post by **bye1830** » 25 Apr 2024, 02:36

Hi Apollo Team,

Thank you for your support in organizing this forum.

I have conducted a choice-experiment survey using a dual response design. After referring to the Apollo manual, I concluded that I should use joint estimation. I have a couple of questions and would greatly appreciate your guidance.

For context, my experiment focuses on zero-emission truck choices, including Battery Electric Trucks (BETs) and Hydrogen Fuel Cell Electric Trucks (HFCETs). The dual response format that I used consists of a forced choice followed by an unforced choice. The survey respondents are segmented into two groups: fleet operators who exclusively use diesel trucks ("diesel fleets") and those who also operate natural gas trucks ("NG fleets"). For diesel fleets, the reference alternative is a diesel truck, while for NG fleets, the reference alternatives include both diesel and natural gas trucks.

In each choice task, I asked two questions: the first requires choosing between BET and HFCET, and the second asks if they would still choose the option selected in the first question if the reference alternative(s) were available. Thus, in the second question, diesel fleets choose between 1) a diesel truck and 2) the BET/HFCET selected previously, while NG fleets choose from 1) a diesel truck, 2) a natural gas truck, and 3) the BET/HFCET selected previously.

I have a total of 54 respondents for this choice experiment section, with 12 from NG fleets and 42 from diesel fleets. Each respondent received 6 choice tasks, resulting in a total of 324 choice tasks in my survey data. Each task consists of a forced choice followed by an unforced choice.

Regarding the joint estimation, which of the following approaches would you recommend? I have also included R scripts below.

CASE 1: Joint estimation of two datasets – forced choice data (324 observations) and unforced choice data (324 observations)

CASE 2: Joint estimation of three datasets – forced choice data (324 observations), unforced choice data for diesel fleets (252 observations), and unforced choice data for NG fleets (72 observations)
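The observation counts in both cases follow directly from the sample sizes stated above. A quick arithmetic check (the variable names are illustrative and do not come from the survey data):

```r
# Respondents per segment and tasks per respondent, as stated in the post
n_diesel <- 42
n_ng     <- 12
n_tasks  <- 6

obs_forced       <- (n_diesel + n_ng) * n_tasks  # forced choices, both segments
obs_unforced_dsl <- n_diesel * n_tasks           # unforced choices, diesel fleets
obs_unforced_ng  <- n_ng * n_tasks               # unforced choices, NG fleets

# CASE 1 pools the two unforced components; CASE 2 keeps them separate,
# so the two unforced counts must add up to the forced count
stopifnot(obs_unforced_dsl + obs_unforced_ng == obs_forced)
```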

CASE 1 - Joint estimation of two datasets

###############################################
### LOAD LIBRARY AND DEFINE CORE SETTINGS   ###
###############################################

rm(list=ls())
# install.packages("apollo")  # run once if the package is not yet installed
library(apollo)

#Initialize code
apollo_initialise()

#Set core controls
apollo_control = list(
  modelName = "Main_JOINT-INTRTN1-TWO-DATASETS",
  modelDescr = "Forced-unforced data joint model",
  indivID = "ID",
  outputDirectory = "output"
)

###############################################
### LOAD DATA AND APPLY ANY TRANSFORMATIONS ###
###############################################

#consider both forced and unforced choice data
database = read.csv("E:/Survey/Estimation/main_survey.csv", header=TRUE)

###############################################
### DEFINE MODEL PARAMETERS                 ###
###############################################

#Vector of parameters
apollo_beta = c(asc_bev = 0, asc_hfcev = 0, asc_ngv = 0, asc_dsl = 0,
                b_pcost = 0, b_ocost = 0, b_range = 0, b_offsite = 0, b_onsite_bev = 0, b_onsite_hfcev = 0,
                asc_bev_shift_adopter = 0, asc_hfcev_shift_adopter = 0,
                asc_ngv_shift_small_org = 0,
                b_ocost_shift_small_fleet = 0,
                mu_unforced = 1)

# Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta
apollo_fixed = c("asc_dsl")

###############################################
### GROUP AND VALIDATE INPUTS               ###
###############################################

apollo_inputs = apollo_validateInputs()

###############################################
### DEFINE MODEL AND LIKELIHOOD FUNCTION    ###
###############################################

apollo_probabilities = function (apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### Create coefficients using interactions with fleet characteristics
  asc_bev_value = asc_bev + asc_bev_shift_adopter * bev_adopter
  asc_hfcev_value = asc_hfcev + asc_hfcev_shift_adopter * hfcev_adopter
  asc_ngv_value = asc_ngv + asc_ngv_shift_small_org * small_org3
  b_pcost_value = b_pcost / relative_annual_revenue
  b_ocost_value = b_ocost + b_ocost_shift_small_fleet * small_fleet3
  b_offsite_value = b_offsite * small_fleet3
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  V[["bev"]] = asc_bev_value + b_pcost_value * bev_pcost + b_ocost_value * bev_ocost + b_range * bev_range + b_offsite_value * bev_offsite_binary + b_onsite_bev * bev_onsite
  V[["hfcev"]] = asc_hfcev_value + b_pcost_value * hfcev_pcost + b_ocost_value * hfcev_ocost + b_range * hfcev_range + b_offsite_value * hfcev_offsite_binary + b_onsite_hfcev * hfcev_onsite
  V[["dsl"]] = asc_dsl + b_pcost_value * dsl_pcost + b_ocost_value * dsl_ocost + b_range * dsl_range + b_offsite_value * dsl_offsite_binary
  V[["ngv"]] = asc_ngv_value + b_pcost_value * ngv_pcost + b_ocost_value * ngv_ocost + b_range * ngv_range + b_offsite_value * ngv_offsite_binary
  
  
  ### Compute probabilities for "forced" choice using MNL model
  mnl_settings_forced = list(
    alternatives = c(bev=1, hfcev=2),
    avail = list(bev=alt_electric, hfcev=alt_hydrogen),
    choiceVar = choice, 
    utilities = list(bev = V[["bev"]],
                     hfcev = V[["hfcev"]]),
    rows = (forced==1)
  )
  
  P[["choice_forced"]] = apollo_mnl(mnl_settings_forced, functionality)
  
  
  ### Compute probabilities for "unforced" choice using MNL model
  mnl_settings_unforced = list(
    alternatives = c(bev=1, hfcev=2, dsl=3, ngv=4),
    avail = list(bev=alt_electric, hfcev=alt_hydrogen, dsl=alt_diesel, ngv=alt_cng),
    choiceVar = choice, 
    utilities = list(bev = mu_unforced*V[["bev"]],
                     hfcev = mu_unforced*V[["hfcev"]],
                     dsl = mu_unforced*V[["dsl"]],
                     ngv = mu_unforced*V[["ngv"]]),
    rows = (forced==2)
  )
  
  P[["choice_unforced"]] = apollo_mnl(mnl_settings_unforced, functionality)
  
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

#Model estimation
model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

#Model outputs
# ------------------------------------------------------------------------------- #
# -------------------------- FORMATTED OUTPUT (TO SCREEN) ------------------------
# ------------------------------------------------------------------------------- #
apollo_modelOutput(model)
# ------------------------------------------------------------------------------- #
# ------------------ FORMATTED OUTPUT (TO FILE, using model name) ----------------
# ------------------------------------------------------------------------------- #
apollo_saveOutput(model)

CASE 2 - Joint estimation of three datasets

###############################################
### LOAD LIBRARY AND DEFINE CORE SETTINGS   ###
###############################################

rm(list=ls())
# install.packages("apollo")  # run once if the package is not yet installed
library(apollo)

#Initialize code
apollo_initialise()

#Set core controls
apollo_control = list(
  modelName = "Main_JOINT-INTRTN2-THREE-DATASETS",
  modelDescr = "Joint estimation of forced, unforced diesel, and unforced NG datasets",
  indivID = "ID",
  outputDirectory = "output"
)

###############################################
### LOAD DATA AND APPLY ANY TRANSFORMATIONS ###
###############################################

#consider both forced and unforced choice data
database = read.csv("E:/Survey/Estimation/main_survey.csv", header=TRUE)


###############################################
### DEFINE MODEL PARAMETERS                 ###
###############################################

#Vector of parameters
apollo_beta = c(asc_bev = 0, asc_hfcev = 0, asc_ngv = 0, asc_dsl = 0,
                b_pcost = 0, b_ocost = 0, b_range = 0, b_offsite = 0, b_onsite_bev = 0, b_onsite_hfcev = 0,
                asc_bev_shift_adopter = 0, asc_hfcev_shift_adopter = 0,
                asc_ngv_shift_small_org = 0,
                b_ocost_shift_small_fleet = 0,
                mu_unforced_dsl = 1, mu_unforced_ng = 1)

# Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta
apollo_fixed = c("asc_dsl")


###############################################
### GROUP AND VALIDATE INPUTS               ###
###############################################

apollo_inputs = apollo_validateInputs()


###############################################
### DEFINE MODEL AND LIKELIHOOD FUNCTION    ###
###############################################

apollo_probabilities = function (apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### Create coefficients using interactions with fleet characteristics
  asc_bev_value = asc_bev + asc_bev_shift_adopter * bev_adopter
  asc_hfcev_value = asc_hfcev + asc_hfcev_shift_adopter * hfcev_adopter
  asc_ngv_value = asc_ngv + asc_ngv_shift_small_org * small_org3
  b_pcost_value = b_pcost / relative_annual_revenue
  b_ocost_value = b_ocost + b_ocost_shift_small_fleet * small_fleet3
  b_offsite_value = b_offsite * small_fleet3
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  V = list()
  V[["bev"]] = asc_bev_value + b_pcost_value * bev_pcost + b_ocost_value * bev_ocost + b_range * bev_range + b_offsite_value * bev_offsite_binary + b_onsite_bev * bev_onsite
  V[["hfcev"]] = asc_hfcev_value + b_pcost_value * hfcev_pcost + b_ocost_value * hfcev_ocost + b_range * hfcev_range + b_offsite_value * hfcev_offsite_binary + b_onsite_hfcev * hfcev_onsite
  V[["dsl"]] = asc_dsl + b_pcost_value * dsl_pcost + b_ocost_value * dsl_ocost + b_range * dsl_range + b_offsite_value * dsl_offsite_binary
  V[["ngv"]] = asc_ngv_value + b_pcost_value * ngv_pcost + b_ocost_value * ngv_ocost + b_range * ngv_range + b_offsite_value * ngv_offsite_binary
  
  
  ### Compute probabilities for "forced" choice using MNL model
  mnl_settings_forced = list(
    alternatives = c(bev=1, hfcev=2),
    avail = list(bev=alt_electric, hfcev=alt_hydrogen),
    choiceVar = choice, 
    utilities = list(bev = V[["bev"]],
                     hfcev = V[["hfcev"]]),
    rows = (choice_set==2)
  )
  
  P[["choice_forced"]] = apollo_mnl(mnl_settings_forced, functionality)
  
  
  ### Compute probabilities for "unforced" choice for "diesel fleets" using MNL model
  mnl_settings_unforced_dsl = list(
    alternatives = c(bev=1, hfcev=2, dsl=3),
    avail = list(bev=alt_electric, hfcev=alt_hydrogen, dsl=alt_diesel),
    choiceVar = choice, 
    utilities = list(bev = mu_unforced_dsl*V[["bev"]],
                     hfcev = mu_unforced_dsl*V[["hfcev"]],
                     dsl = mu_unforced_dsl*V[["dsl"]]),
    rows = (choice_set==3)
  )
  
  P[["choice_unforced_dsl"]] = apollo_mnl(mnl_settings_unforced_dsl, functionality)
  
  
  ### Compute probabilities for "unforced" choice for "NG fleets" using MNL model
  mnl_settings_unforced_ng = list(
    alternatives = c(bev=1, hfcev=2, dsl=3, ngv=4),
    avail = list(bev=alt_electric, hfcev=alt_hydrogen, dsl=alt_diesel, ngv=alt_cng),
    choiceVar = choice, 
    utilities = list(bev = mu_unforced_ng*V[["bev"]],
                     hfcev = mu_unforced_ng*V[["hfcev"]],
                     dsl = mu_unforced_ng*V[["dsl"]],
                     ngv = mu_unforced_ng*V[["ngv"]]),
    rows = (choice_set==4)
  )
  
  P[["choice_unforced_ng"]] = apollo_mnl(mnl_settings_unforced_ng, functionality)
  
  
  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

#Model estimation
model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)

#Model outputs
# ------------------------------------------------------------------------------- #
# -------------------------- FORMATTED OUTPUT (TO SCREEN) ------------------------
# ------------------------------------------------------------------------------- #
apollo_modelOutput(model)
# ------------------------------------------------------------------------------- #
# ------------------ FORMATTED OUTPUT (TO FILE, using model name) ----------------
# ------------------------------------------------------------------------------- #
apollo_saveOutput(model)

In addition, could you please let me know if there are any erroneous parts in my R scripts above? I'd appreciate any suggestions for improvements.

Thank you very much!

Best regards,

YB

Post by **stephanehess** » 05 May 2024, 11:03

Hi

Looks correct to me. I would prefer specification 2, as it allows for further scale differences. In addition, however, did you test for differences between the samples in how they react to the individual attributes?

Stephane
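One way to probe such differences within a single pooled model is to add segment-specific shift terms to individual coefficients, in the same style as the `_shift_` parameters already used in the scripts above. A minimal sketch, assuming a hypothetical 0/1 column `ng_fleet` and a hypothetical parameter `b_range_shift_ng` (neither appears in the original scripts, and the numbers are placeholders, not estimates):

```r
# Hypothetical segment dummy: 1 for NG fleets, 0 for diesel fleets
ng_fleet <- c(0, 0, 1, 1)

# Placeholder parameter values, for illustration only
b_range          <- 0.25
b_range_shift_ng <- -0.05

# Segment-specific range coefficient, one value per observation
b_range_value <- b_range + b_range_shift_ng * ng_fleet
print(b_range_value)
```

Inside `apollo_probabilities`, `b_range_value` would take the place of `b_range` in the utilities, with `b_range_shift_ng` added to `apollo_beta`. A shift that differs significantly from zero would suggest that the two segments value range differently.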

Post by **bye1830** » 09 May 2024, 07:34

Hi Stephane,

Thank you for reviewing my script and providing suggestions. Regarding the test for differences between the two segments, could you educate me about the types of approaches I could use?

I've attempted one potential method. For diesel fleets and NG fleets, the forced choice questions are the same, while the unforced choice questions were presented with different options. Thus, I attempted to compare estimation results using the forced-choice data between the two segments and also conducted joint estimation combining these two segments.

Here are the results. Just FYI, I've modified the utility functions from my previous post after reflecting on your reply to my other post (i.e., treating 'annual_revenue' as a categorical variable).

Estimation results for diesel fleets (forced choice, 252 observations)

Parameter                   Estimate  Std.err.  t-ratio(0)  Rob.std.err.  Rob.t-ratio(0)
asc_bev                        1.147     1.145       1.002         1.371           0.836
asc_hfcev                      0.000        NA          NA            NA              NA
b_pcost                        0.127     0.773       0.164         0.636           0.199
b_ocost                       -0.840     0.582      -1.443         0.669          -1.254
b_range                        0.261     0.055       4.784         0.067           3.877
b_offsite                      0.259     0.194       1.337         0.206           1.262
b_onsite_bev                  -1.053     0.997      -1.056         1.197          -0.880
b_onsite_hfcev                 0.429     1.004       0.427         1.266           0.339
asc_bev_shift_adopter          1.530     0.590       2.596         0.552           2.771
b_pcost_AR_less_than_10M      -0.396     0.808      -0.489         0.669          -0.591
b_pcost_AR_between_10M_15M    -0.206     0.984      -0.209         0.697          -0.295
b_pcost_AR_between_15M_30M    -2.098     1.312      -1.598         0.907          -2.314
b_pcost_AR_NA                 -0.130     0.906      -0.143         0.709          -0.183

*The 'asc_hfcev_shift_adopter' parameter was excluded from the utility function as none of the diesel fleets operate HFCEVs in my sample.

--> The estimates for 'b_range', 'asc_bev_shift_adopter', and 'b_pcost_AR_between_15M_30M' are significant at the 1% or 5% level.

Estimation results for NG fleets (forced choice, 72 observations)

Parameter                   Estimate  Std.err.  t-ratio(0)  Rob.std.err.  Rob.t-ratio(0)
asc_bev                       -0.092     2.136      -0.043         3.474          -0.026
asc_hfcev                      0.000        NA          NA            NA              NA
b_pcost                       -0.498     0.672      -0.740         0.592          -0.841
b_ocost                       -1.249     1.112      -1.123         0.887          -1.408
b_range                        0.186     0.107       1.730         0.114           1.624
b_offsite                      0.862     0.551       1.564         0.631           1.367
b_onsite_bev                  -0.623     1.846      -0.338         3.035          -0.205
b_onsite_hfcev                 0.005     1.859       0.002         2.983           0.002
asc_bev_shift_adopter          2.003     0.886       2.261         0.482           4.159
asc_hfcev_shift_adopter        1.958     0.969       2.019         0.856           2.288
b_pcost_AR_less_than_10M      -0.142     0.910      -0.156         0.873          -0.162
b_pcost_AR_between_10M_15M     0.438     1.198       0.366         0.813           0.539
b_pcost_AR_between_15M_30M     0.010     1.360       0.008         0.643           0.016

*The 'b_pcost_AR_NA' parameter was excluded from the utility functions because none of the NG fleets chose the 'Decline to state' option in the annual revenue question.

--> The estimates for 'b_range', 'asc_bev_shift_adopter', and 'asc_hfcev_shift_adopter' are significant at the 1%, 5%, or 10% level. For the estimates that are significant in both the diesel and NG fleet models, the magnitudes differ (0.261 vs 0.186 for 'b_range', and 1.530 vs 2.003 for 'asc_bev_shift_adopter').

Joint estimation results for both fleets (forced choice, 324 observations)

Parameter                   Estimate  Std.err.  t-ratio(0)  Rob.std.err.  Rob.t-ratio(0)
asc_bev                        1.085     1.055       1.028         1.284           0.845
asc_hfcev                      0.000        NA          NA            NA              NA
b_pcost                       -0.181     0.558      -0.324         0.481          -0.375
b_ocost                       -0.921     0.534      -1.724         0.601          -1.533
b_range                        0.253     0.053       4.801         0.065           3.902
b_offsite                      0.320     0.186       1.724         0.201           1.592
b_onsite_bev                  -1.107     0.915      -1.210         1.116          -0.991
b_onsite_hfcev                 0.451     0.919       0.490         1.171           0.385
asc_bev_shift_adopter          1.784     0.539       3.309         0.484           3.689
asc_hfcev_shift_adopter        2.216     0.774       2.861         1.031           2.149
b_pcost_AR_less_than_10M      -0.134     0.602      -0.222         0.523          -0.256
b_pcost_AR_between_10M_15M     0.151     0.778       0.194         0.551           0.273
b_pcost_AR_between_15M_30M    -1.339     1.033      -1.296         0.835          -1.603
b_pcost_AR_NA                  0.182     0.734       0.248         0.577           0.316
mu_ngv_fleets                  0.785     0.326       2.409         0.305           2.576

--> The estimates for 'b_ocost', 'b_range', 'b_offsite', 'asc_bev_shift_adopter', and 'asc_hfcev_shift_adopter' are significant at the 1%, 5%, or 10% level. In particular, the scale parameter 'mu_ngv_fleets' is significant at the 5% level.
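A note on reading 'mu_ngv_fleets': the t-ratios reported in the output test against 0, whereas for a scale parameter the usual null hypothesis is 1 (equal scale in both segments). A minimal sketch of that check, using the estimate and standard errors from the joint table; the variable names are illustrative:

```r
# Estimate and standard errors for mu_ngv_fleets, copied from the joint model output
est    <- 0.785
se     <- 0.326
rob_se <- 0.305

# t-ratios against H0: mu = 1 (no scale difference between segments)
t_vs_1     <- (est - 1) / se
rob_t_vs_1 <- (est - 1) / rob_se

# Two-sided p-values under the asymptotic normal approximation
p_vs_1     <- 2 * pnorm(-abs(t_vs_1))
rob_p_vs_1 <- 2 * pnorm(-abs(rob_t_vs_1))

round(c(t = t_vs_1, rob_t = rob_t_vs_1, p = p_vs_1, rob_p = rob_p_vs_1), 3)
```

Both ratios are well below 1.96 in absolute value, so on these figures the scale for NG fleets is not significantly different from 1 at conventional levels. Apollo's `apollo_modelOutput` is understood to accept a `printT1` entry in `modelOutput_settings` that prints t-ratios against 1 directly; this is worth checking against the manual for the installed version.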

Given these results, would it be reasonable to say that the two segments respond differently to individual attributes? Are there any approaches to test for such differences?

In addition, I wonder whether I should rely on the classical or the robust t-ratios when determining the significance of each estimate. Do you have any recommendations, especially given that my sample is relatively small (324 observations for each of the forced and unforced choices)?

I'd greatly appreciate any wisdom and insights you could share. Thank you very much!

Best regards,

YB

Post by **stephanehess** » 22 May 2024, 11:53

Hi

Estimating the two separate models is the same as estimating a fully segmented model. So you can use an LR test to compare the sum of the log-likelihoods of the two separate models against the log-likelihood of your generic model without differences. The degrees of freedom would be the number of additional parameters needed for the two separate models.

Stephane
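The likelihood ratio test described above can be sketched in a few lines of base R. This assumes the final log-likelihoods are read off each model's Apollo output; the function name `lr_test_segmented` and the log-likelihood values in the usage line are hypothetical placeholders, not results from this thread. The parameter counts (12 free parameters in each segment model, 14 in the pooled model including 'mu_ngv_fleets') are counted from the output tables posted earlier.

```r
# Likelihood ratio test of a fully segmented model (sum of separate models)
# against a pooled (generic) model estimated on the combined data.
lr_test_segmented <- function(ll_pooled, ll_segments, k_pooled, k_segments) {
  ll_segmented <- sum(ll_segments)             # LL of the fully segmented model
  df   <- sum(k_segments) - k_pooled           # additional free parameters
  stat <- 2 * (ll_segmented - ll_pooled)       # LR statistic
  p    <- pchisq(stat, df = df, lower.tail = FALSE)
  list(statistic = stat, df = df, p_value = p)
}

# Usage with PLACEHOLDER log-likelihoods (replace with the values from your output)
res <- lr_test_segmented(ll_pooled   = -200,
                         ll_segments = c(diesel = -150, ng = -40),
                         k_pooled    = 14,
                         k_segments  = c(diesel = 12, ng = 12))
print(res)
```

If the pooled model is estimated without the scale parameter, `k_pooled` would be 13 and the test would have one more degree of freedom. A small p-value would indicate that the segmented specification fits significantly better than the pooled one.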
