Structuring a dataset with forced choice followed by unforced/opt-out

Post by **theycallmemylinh** » 12 Jun 2021, 00:55

Hello there,

I would like to know how to code a dataset that presents the participant with a forced choice (Program A or Program B), followed by an opportunity to opt out (Program A or not participate at all; Program B or not participate at all). The DCE_forced and opt-out image shows what the participant might see in the survey. Only one of the opt-out questions is shown to the participant, depending on whether they chose Program A or Program B in the forced choice.

[Image: DCE_forced and opt-out.png]

My full dataset is available on OSF: https://osf.io/jfz5h. I've also included a screenshot to give you a sense of how I've currently structured it.

For example, Program A is represented by goal1, form1, mag1 and dir1, and Program B by goal2, form2, mag2 and dir2. At present, if the participant opted out, I represent that with goal3, form3, mag3 and dir3 (all with the value "SQ"). Column D (choice_forced) records whether the participant selected Program A (=1) or Program B (=2); Column E (choice_tx) records whether the participant chose to participate in the selected program (=1) or opted out (=0); and Column F (choice_best) records the participant's first choice overall (Program A=1, Program B=2, opt-out=0).
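As a quick consistency check on this coding, choice_best should equal choice_forced when the participant joins (choice_tx = 1) and 0 when they opt out, i.e. choice_best = choice_forced × choice_tx. A minimal base R sketch, assuming the three columns are numeric with exactly the codes listed above; the function name `check_choice_coding` and the toy rows are hypothetical, not taken from the OSF data:

```r
# Hypothetical helper: checks that choice_best is consistent with
# choice_forced (1 = A, 2 = B) and choice_tx (1 = participate, 0 = opt out)
check_choice_coding <- function(d) {
  stopifnot(all(d$choice_forced %in% c(1, 2)),
            all(d$choice_tx %in% c(0, 1)))
  # choice_best should be the chosen program if participating, 0 otherwise
  all(d$choice_best == d$choice_forced * d$choice_tx)
}

# Toy rows (invented for illustration)
toy <- data.frame(choice_forced = c(1, 2, 2, 1),
                  choice_tx     = c(1, 1, 0, 0),
                  choice_best   = c(1, 2, 0, 0))
check_choice_coding(toy)  # TRUE
```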

[Image: DCE_dataset_screenshot.png]

I appreciate any guidance you can provide on how to code these opt-out data in the dataset.

Thanks!
My-Linh

Post by **stephanehess** » 12 Jun 2021, 13:58

Hi

This depends entirely on what you want to do in your model. Do you want to jointly model the choice of the preferred programme and then, for each programme, the decision on whether they would participate or not? This is not difficult to do, but it would require you to make assumptions about whether the preferences that drive the choice between the programmes are the same as those that determine acceptance of either programme. If you're happy with that, you could model this as having three dependent variables: the choice between the programmes, and then the participation in each, and all three would be binary models. If that is what you want, I can help you set it up.
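Under the shared-preferences assumption, the three binary models could be sketched as below. This is a hedged illustration rather than a prescribed specification: $V_{nA}$ and $V_{nB}$ are the utilities of the two programmes for person $n$, built from the same coefficients $\beta$ in all three components, and the utility of not participating, $V_{n,SQ}$, is normalised to 0 here as an assumption.

$$
P_n(\text{A chosen over B}) = \frac{\exp(V_{nA})}{\exp(V_{nA}) + \exp(V_{nB})}, \qquad
P_n(\text{participate in } j) = \frac{\exp(V_{nj})}{\exp(V_{nj}) + \exp(V_{n,SQ})}, \quad j \in \{A, B\},
$$

with $V_{nj} = \beta^{\top} x_{nj}$ and $V_{n,SQ} = 0$. Relaxing the shared-preferences assumption would mean estimating separate coefficients for the participation components.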

Stephane

Post by **theycallmemylinh** » 21 Jun 2021, 16:09

Hi Stephane!

Thanks for your response. Yes, the approach you suggested is the direction we would like to go in; I greatly appreciate your assistance.

My-Linh

Post by **stephanehess** » 29 Jun 2021, 22:00

Hi

The easiest way to prepare the data would be one row per choice card, with the choice between the programmes and the participation questions all in the same row. Then you would just have a model with three components, probably all MNL to start with.
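To make "one row per choice card" concrete, here is a hypothetical toy layout in base R. Each row holds the attributes of both programmes plus both dependent variables, and the follow-up choice is recoded so that opting out gets its own code (3 here, an arbitrary choice) instead of "SQ" attribute values. Column names follow the thread; the values are invented, and only two of the four attributes are shown for brevity.

```r
# Toy wide-format data: one row per choice card (values invented)
cards <- data.frame(
  ID = c(1, 1, 2), task = c(1, 2, 1),
  goal1 = c(30, 60, 90), form1 = c("cash", "voucher", "donate"),
  goal2 = c(60, 30, 30), form2 = c("donate", "cash", "voucher"),
  choice_forced = c(1, 2, 2),  # 1 = Program A, 2 = Program B
  choice_tx     = c(1, 0, 1)   # 1 = participate, 0 = opt out
)

# Follow-up dependent variable: the chosen program (1 or 2) if the
# participant joins, 3 (status quo) if they opt out
cards$choice_followup <- ifelse(cards$choice_tx == 1, cards$choice_forced, 3)
cards$choice_followup  # 1 3 2
```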

Stephane

Post by **theycallmemylinh** » 08 Jul 2021, 00:49

Hi Stephane,

I am not sure I have understood your comments. As the data are currently structured, I do have one row per choice task (Column C: task), with the choice between the two programs (Column D: choice_forced) and the choice between the selected program and opting out (Column E: choice_tx). At present, I have the unforced choice/opt-out coded in the data as "SQ" across the attribute levels (Columns O:R), which I presume is not appropriate [please see the attached screenshot]. I have been able to run an MNL model for the forced choice between the two programs, but it wasn't clear to me how to run an MNL model for the unforced choice (participation in the selected program), since I don't believe I've coded the data correctly to reflect the opt-out choice.

Should the data be structured in a different way such that the forced choice and unforced choice are in two separate datasets or two different rows?

Thank you for your assistance!
My-Linh

Post by **stephanehess** » 08 Jul 2021, 10:33

Hi

Please also show us your code.

Thanks

Post by **theycallmemylinh** » 10 Jul 2021, 01:03

Hi Stephane,

This is the code I used for an MNL model of the forced choice, using these data: https://osf.io/jfz5h


### Load libraries
library(here)
library(readr)
library(apollo)
library(dplyr)
### Initialise code
apollo_initialise()

### Set core controls
apollo_control<-list(
  modelName= "dce_model1",
  modelDescr="MNL model on SP data",
  indivID= "ID"
)

# ################################################################# #
#### 2. Data loading and apply any transformations               ####
# ################################################################# #
database<-read_rds(here("01_data","02_processed", "00_data_processed.rds")) 
 

# ####################################################### #
#### 3. Parameter definition                           ####
# ####################################################### #

### Vector of parameters, including any that are kept fixed 
### during estimation

apollo_beta = c(
  b_goal_30=0,
  b_goal_60=0,
  b_goal_90=0,
  b_form_cash=0,
  b_form_voucher=0,
  b_form_donate=0,
  b_mag_160=0,
  b_mag_300=0,
  b_mag_500=0,
  b_dir_pos=0,
  b_dir_neg=0
  )

### Vector with names (in quotes) of parameters to be
###  kept fixed at their starting value in apollo_beta.
### Use apollo_beta_fixed = c() for no fixed parameters.
apollo_fixed<-c("b_goal_30","b_form_cash", "b_mag_160", "b_dir_pos")

#apollo_fixed <- c()
# ####################################################### #
#### 4. Input validation                               ####
# ####################################################### #

apollo_inputs = apollo_validateInputs()
# Several observations per individual detected based on the value of ID. Setting panelData in apollo_control set
# to TRUE.
# All checks on apollo_control completed.
# All checks on database completed.

# ####################################################### #
#### 5. Define Model and Likelihood definition                          ####
# ####################################################### #

apollo_probabilities=function(apollo_beta, apollo_inputs, 
                              functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as
  ### in mnl_settings, order is irrelevant.
  V = list()
  V[['alt1']] = b_goal_30*(goal1==30) + b_goal_60*(goal1==60)+ b_goal_90*(goal1==90)+
    b_form_cash*(form1=="cash")+b_form_donate*(form1=="donate")+b_form_voucher*(form1=="voucher")+
    b_mag_160*(mag1==160)+b_mag_300*(mag1==300)+b_mag_500*(mag1==500)+
    b_dir_pos*(dir1=="pos")+b_dir_neg*(dir1=="neg")
  V[['alt2']] = b_goal_30*(goal2==30) + b_goal_60*(goal2==60)+ b_goal_90*(goal2==90)+
    b_form_cash*(form2=="cash")+b_form_donate*(form2=="donate")+b_form_voucher*(form2=="voucher")+
    b_mag_160*(mag2==160)+b_mag_300*(mag2==300)+b_mag_500*(mag2==500)+
    b_dir_pos*(dir2=="pos")+b_dir_neg*(dir2=="neg")
  
  # asc_1 + 
  # asc_2 +
  
  ### Define settings for MNL model component
  mnl_settings = list(
    alternatives  = c(alt1=1, alt2=2), 
    avail         = 1, 
    choiceVar     = choice_forced,
    # explanators  = database[,c("risk_score","loss_score","intrinsic","extrinsic", "pain_NRS", "function_NRS",
    #                            "IPAQ_cat", 
    #                            "gender", "age", "BMI_calc", "income")],
    V             = V
  )
  
  ### Compute probabilities using MNL model
  P[['model']] = apollo_mnl(mnl_settings, functionality)
  
  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}


# ####################################################### #
#### 6. Model estimation and reporting                 ####
# ####################################################### #

model = apollo_estimate(apollo_beta, apollo_fixed, 
                        apollo_probabilities, 
                        apollo_inputs)

# Testing likelihood function...
# WARNING: Availability not provided (or some elements are NA). Full availability assumed.
# 
# Overview of choices for MNL model component :
#                                    alt1    alt2
# Times available                  2294.00 2294.00
# Times chosen                     1067.00 1227.00
# Percentage chosen overall          46.51   53.49
# Percentage chosen when available   46.51   53.49
# 
# Pre-processing likelihood function...
# Preparing pre-processing report
# 
# Testing influence of parameters.......
# Starting main estimation
# Initial function value: -1590.08 
# Initial gradient value:
#   b_goal_60      b_goal_90 b_form_voucher  b_form_donate      b_mag_300      b_mag_500      b_dir_neg 
#     30.5         -186.5          138.0         -485.0           22.5          152.5            8.0 
# initial  value 1590.079632 
# iter   2 value 1268.441533
# iter   3 value 1202.052010
# iter   4 value 1200.157902
# iter   5 value 1195.722906
# iter   6 value 1191.905540
# iter   7 value 1157.219204
# iter   8 value 1156.673704
# iter   9 value 1139.146684
# iter  10 value 1132.782311
# iter  11 value 1099.779611
# iter  12 value 1099.463659
# iter  13 value 1099.453357
# iter  14 value 1099.451959
# iter  15 value 1099.451820
# iter  15 value 1099.451818
# iter  15 value 1099.451818
# final  value 1099.451818 
# converged
# Estimated parameters:
#   Estimate
# b_goal_30          0.00000
# b_goal_60         -0.19134
# b_goal_90         -0.86738
# b_form_cash        0.00000
# b_form_voucher    -0.51883
# b_form_donate     -1.96009
# b_mag_160          0.00000
# b_mag_300          0.63871
# b_mag_500          0.84113
# b_dir_pos          0.00000
# b_dir_neg         -0.06212
# 
# Computing covariance matrix using analytical gradient.
# 0%....25%....50%....75%....100%
# Negative definite Hessian with maximum eigenvalue: -77.850397
# Computing score matrix...
# Calculating LL(0) for applicable models...
# Calculating LL of each model component...

apollo_modelOutput(model, list (printPVal=TRUE))
# Model run using Apollo for R, version 0.2.5 on Windows by My-Linh 
# www.ApolloChoiceModelling.com
# 
# Model name                       : dce_model1
# Model description                : MNL model on SP data
# Model run at                     : 2021-06-11 16:53:46
# Estimation method                : bfgs
# Model diagnosis                  : successful convergence 
# Number of individuals            : 288
# Number of rows in database       : 2294
# Number of modelled outcomes      : 2294
# 
# Number of cores used             :  1 
# Model without mixing
# 
# LL(start)                        : -1590.08
# LL(0)                            : -1590.08
# LL(final)                        : -1099.452
# Rho-square (0)                   :  0.3086 
# Adj.Rho-square (0)               :  0.3042 
# AIC                              :  2212.9 
# BIC                              :  2253.07 
# 
# 
# Estimated parameters             :  7
# Time taken (hh:mm:ss)            :  00:00:2.07 
# pre-estimation              :  00:00:0.8 
# estimation                  :  00:00:0.71 
# post-estimation             :  00:00:0.56 
# Iterations                       :  17  
# Min abs eigenvalue of Hessian    :  77.8504 
# 
# Estimates:
#                   Estimate        s.e.   t.rat.(0)  p(1-sided)    Rob.s.e. Rob.t.rat.(0)  p(1-sided)
# b_goal_30          0.00000          NA          NA          NA          NA            NA          NA
# b_goal_60         -0.19134     0.07720      -2.478    0.006599     0.08397        -2.279     0.01134
# b_goal_90         -0.86738     0.07617     -11.388    0.000000     0.09436        -9.192     0.00000
# b_form_cash        0.00000          NA          NA          NA          NA            NA          NA
# b_form_voucher    -0.51883     0.07106      -7.301   1.428e-13     0.07595        -6.831   4.206e-12
# b_form_donate     -1.96009     0.08633     -22.705    0.000000     0.11333       -17.295     0.00000
# b_mag_160          0.00000          NA          NA          NA          NA            NA          NA
# b_mag_300          0.63871     0.08125       7.861   1.887e-15     0.07983         8.001   6.661e-16
# b_mag_500          0.84113     0.07464      11.269    0.000000     0.08854         9.500     0.00000
# b_dir_pos          0.00000          NA          NA          NA          NA            NA          NA
# b_dir_neg         -0.06212     0.05372      -1.157    0.123727     0.05476        -1.135     0.12828
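The fit statistics in the output above can be reproduced by hand, which also confirms what LL(0) means here: with two equally likely alternatives on each of the 2294 cards, LL(0) = 2294 × ln(0.5). A small base R check using the reported LL(final), the 7 estimated (non-fixed) parameters, and the usual rho-square, adjusted rho-square, AIC and BIC formulas, which match the reported values:

```r
n_obs  <- 2294               # number of modelled outcomes
k      <- 7                  # estimated (non-fixed) parameters
ll_0   <- n_obs * log(0.5)   # two alternatives with equal probabilities
ll_fin <- -1099.451818       # LL(final) from the output

rho2     <- 1 - ll_fin / ll_0
adj_rho2 <- 1 - (ll_fin - k) / ll_0
aic      <- -2 * ll_fin + 2 * k
bic      <- -2 * ll_fin + k * log(n_obs)

round(c(ll_0 = ll_0, rho2 = rho2, adj_rho2 = adj_rho2, aic = aic, bic = bic), 4)
```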

Post by **stephanehess** » 12 Jul 2021, 11:36

Thanks. Two more questions:

1. Do you want to model both follow-up questions, i.e. for each alternative? If so, what are the columns?
2. Do you want to allow for a scale difference in the utility for the follow-up question compared to the forced choice?
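To illustrate what a scale difference means, a hedged base R sketch: in a binary logit, multiplying the utilities of the follow-up question by a scale factor μ (called `mu` here, a hypothetical name) leaves the ranking of the options unchanged but makes choices more deterministic (μ > 1) or more random (μ < 1). The utility value is made up, and the status quo utility is assumed to be normalised to 0.

```r
# Binary logit probability of joining, with the SQ utility normalised to 0
p_join <- function(v_prog, mu = 1) plogis(mu * (v_prog - 0))

v <- 0.5                 # made-up utility of the chosen programme
p_join(v, mu = 1)        # about 0.62
p_join(v, mu = 2)        # about 0.73: larger scale, more deterministic
p_join(v, mu = 0.5)      # about 0.56: smaller scale, more random
```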

Stephane

Post by **theycallmemylinh** » 13 Jul 2021, 02:37

Hi Stephane,

(1) I don't think it makes sense to model both follow-up questions, since participants only saw one of the questions, based on which Program they selected (i.e. based on the question logic, they would only see Q5.11 if in the previous question they selected Program A, and only see Q5.12 if they selected Program B in the previous question).
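Because each participant sees only one follow-up question, the follow-up choice set is the programme chosen in the forced choice plus the status quo. A minimal base R sketch of the corresponding availability indicators, assuming choice_forced is coded 1/2 as in the dataset; the avail_* column names are hypothetical:

```r
d <- data.frame(choice_forced = c(1, 2, 2))

# Only the program picked in the forced choice appears in the follow-up;
# the status quo (opt out) is always available there
d$avail_A  <- as.numeric(d$choice_forced == 1)
d$avail_B  <- as.numeric(d$choice_forced == 2)
d$avail_SQ <- 1

# Every follow-up question has exactly two available options
rowSums(d[, c("avail_A", "avail_B", "avail_SQ")])  # all equal to 2
```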

[Image: DCE_forced and opt-out.png]

Since not all options (A, B and opt-out) were presented at one time (intentionally, to avoid participants opting out of every question and not providing any preference data), there aren't enough data for a ranking (e.g. a best-worst DCE). However, it might make more sense to model the preferred program (Program A or Program B) and the status quo together in one column, as opposed to treatment vs. no treatment. Does it make more sense to use the "choice_best" column to model the unforced choice? In the "choice_best" column, 0 = status quo, 1 = Program A and 2 = Program B, whereas in the "choice_tx" column, 0 = status quo and 1 = treatment (Program A or Program B).

[Image: DCE_data structure.png]

(2) I think it makes sense to allow for a scale difference for the follow-up question.

Thanks for helping me think this through carefully!
My-Linh

Post by **stephanehess** » 19 Jul 2021, 18:42

Sorry, I'm not sure I follow, but it seems straightforward to me to just model what is in the survey.

So you would have a utility for A and for B, and then you would model the preferred option out of A and B, followed by the choice between the chosen one and the status quo.

Something a bit like this:


  ### Create list of probabilities P
  P = list()
  
  ### List of utilities: these must use the same names as in mnl_settings, order is irrelevant
  ### (A and B use the same specification as alt1 and alt2 in the forced-choice model)
  V = list()
  V[['A']] = ...
  V[['B']] = ...
  V[['SQ']] = 0

  ### Forced choice: preferred option out of A and B (status quo not offered)
  mnl_settings = list(
    alternatives = c(A=1, B=2),
    avail        = list(A=1, B=1),
    choiceVar    = choice_forced,
    V            = V[c('A', 'B')]
  )
  P[['forced']] = apollo_mnl(mnl_settings, functionality)
  
  ### Follow-up: program chosen above vs status quo
  ### choice_best codes opt-out as 0, so recode it to 3 to match SQ=3
  ### mu_followup is a scale parameter that must also be in apollo_beta (e.g. starting at 1)
  mnl_settings = list(
    alternatives = c(A=1, B=2, SQ=3),
    avail        = list(A=(choice_forced==1), B=(choice_forced==2), SQ=1),
    choiceVar    = ifelse(choice_best==0, 3, choice_best),
    V            = lapply(V, "*", mu_followup)
  )
  P[['followup']] = apollo_mnl(mnl_settings, functionality)

  ### Combined model
  P = apollo_combineModels(P, apollo_inputs, functionality)

  ### Take product across observation for same individual
  P = apollo_panelProd(P, apollo_inputs, functionality)

  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
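For checking the combined model outside Apollo, the same two-stage likelihood can be written in a few lines of base R. This is a sketch only, assuming the utilities of A and B have already been computed for each card (the vectors `v_A` and `v_B` and the function name `two_stage_ll` are hypothetical), the status quo utility is normalised to 0, and the follow-up utilities are multiplied by a scale parameter `mu`. With only two alternatives in each stage, the MNL probability reduces to `plogis` of the utility difference.

```r
# Hypothetical helper: log-likelihood of the forced choice (A vs B) plus the
# follow-up (chosen program vs status quo), both binary logit
two_stage_ll <- function(v_A, v_B, choice_forced, choice_tx, mu = 1) {
  # Stage 1: probability of the observed forced choice
  p_A      <- plogis(v_A - v_B)
  p_stage1 <- ifelse(choice_forced == 1, p_A, 1 - p_A)

  # Stage 2: utility of the program actually chosen in stage 1, scaled by mu
  v_chosen <- ifelse(choice_forced == 1, v_A, v_B)
  p_join   <- plogis(mu * v_chosen)  # SQ utility fixed at 0
  p_stage2 <- ifelse(choice_tx == 1, p_join, 1 - p_join)

  # Combined model: product of the two probabilities on each card
  sum(log(p_stage1) + log(p_stage2))
}

# With all utilities at zero every probability is 0.5,
# so two cards give 4 * log(0.5)
two_stage_ll(v_A = c(0, 0), v_B = c(0, 0),
             choice_forced = c(1, 2), choice_tx = c(1, 0))
```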

ApolloChoiceModelling forum
