Integration of covariates into HB estimation

Post by **cybey** » 11 May 2020, 09:19

Hello, everyone,

I have a question on using covariates based on an old topic in the Google group. I would like to estimate a MIXL model with covariates using HB. When using HB, I define the values of the transformation in apollo_probabilities()…

Code:

b_Price_value = b_Price +
    eta_Income_Price * Income

… and get the conditionals for every coefficient in the model$estimate part. If I include the covariate as a fixed parameter, model$estimate contains only one value for it, since it is the same for the whole population. In contrast, with a normal distribution, I get one value per respondent.

1) In my example, I would like to get/plot the conditionals of b_Price_value and not of b_Price for every respondent. Is there an easy way to do this in Apollo? When estimating with MSL I can define this transformation directly in apollo_randCoeff(), which is unfortunately not possible with HB.
The only thing I can think of spontaneously is to do it "by hand": with a normal distribution for the covariate parameter, I could simply add up the conditionals from model$estimate to get b_Price_value. With fixed covariate parameters (“F”), on the other hand, I would have to multiply the covariate coefficients by the characteristics of the respondents (e.g. income) and add the result to the conditional of b_Price. Is that right?
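A minimal sketch of the "by hand" calculation for the fixed-covariate case, in base R. All numbers are invented, and the two data frames `cond` and `covars` are hypothetical stand-ins: how the per-respondent posterior means and the single fixed estimate are pulled out of the estimated model object is not shown here and would need checking against the Apollo manual.

```r
# Toy stand-ins (all values invented for illustration).
# Per-respondent posterior means of the random coefficient b_Price:
cond <- data.frame(ID = 1:4, b_Price = c(-0.9, -1.2, -0.7, -1.5))
# Respondent-level covariate, one row per respondent:
covars <- data.frame(ID = 1:4, Income = c(1, 3, 2, 4))
# Single posterior mean of the fixed ("F") shift parameter:
eta_Income_Price <- 0.1

# Merge by ID so that rows line up, then rebuild the composite coefficient
out <- merge(cond, covars, by = "ID")
out$b_Price_value <- out$b_Price + eta_Income_Price * out$Income
print(out)
```

The result only combines posterior means; it carries none of the uncertainty around either component.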

The problem with my data is that the estimated part-worth utilities (here: b_Price) are so strongly influenced by the covariates that they can hardly be interpreted meaningfully without taking the covariates into account.

2) Furthermore, I wonder how the covariates are included in apollo_prediction() when they are fixed. Do they only enter the conditionals indirectly via the "upper model" estimates?

I look forward to your answers.

Post by **stephanehess** » 11 May 2020, 21:27

Hi

Apollo relies on RSGHB for Bayesian estimation, and the posteriors are for each individual coefficient rather than an addition or other transformation involving multiple coefficients. With Normals, you could add up the posterior means, but would you really want to use a Normal for price anyway? With fixed coefficients, you could add the values.

You should also consider whether the conditionals are actually what you want to use in the post-estimation work and whether you should instead work with the upper level model.

In relation to what Apollo uses in prediction with models estimated with HB, these are the posterior means, as discussed in the manual, and with the caveat that users should be careful about using posteriors for this purpose.

Best wishes

Stephane

Post by **cybey** » 12 May 2020, 08:23

Maybe I should describe in more detail what I want to do. I wish to estimate a MIXL model in WTP space including covariates. Here are the distributional assumptions in hbDist:

Code:

apollo_HB = list(
  hbDist         = c(wtp_Anbieter2 = "N",
                     wtp_Anbieter3 = "N",
                     wtp_Strommix2 = "N",
                     wtp_Strommix3 = "N",
                     wtp_Strommix4 = "N",
                     wtp_Regioanteil2 = "N",
                     wtp_Regioanteil3 = "N",
                     b_Preis = "CN-",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Anbieter2
                     # ----------------------------------------------------------------- #
                     
                     ## Sociodemographics
                     
                     ## Current
                     wtp_CurrentSupplierKVU_Anbieter2 = "F",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Anbieter3
                     # ----------------------------------------------------------------- #
                     
                     ## Sociodemographics

                     ## Current
                     wtp_CurrentSupplierBEG_Anbieter3 = "F",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Strommix2
                     # ----------------------------------------------------------------- #
                     
                     ## Sociodemographics
                     
                     ## Current
                     wtp_CurrentMix_Strommix2 = "F",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Strommix3
                     # ----------------------------------------------------------------- #

                     ## Sociodemographics
                     
                     ## Current
                     wtp_CurrentMix_Strommix3 = "F",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Strommix4
                     # ----------------------------------------------------------------- #

                     ## Sociodemographics
                     
                     ## Current
                     wtp_CurrentMix_Strommix4 = "F",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Regioanteil2
                     # ----------------------------------------------------------------- #
                     
                     ## Sociodemographics
                     wtp_Gender_Regioanteil2 = "F",
                     wtp_Age_Regioanteil2 = "F",
                     wtp_Education_Regioanteil2 = "F",
                     wtp_Residence_Regioanteil2 = "F",
                     wtp_FederalState.Wind_Regioanteil2 = "F",
                     wtp_FederalState.PV_Regioanteil2 = "F",

                     ## Current
                     wtp_CurrentMix_Regioanteil2 = "F",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Regioanteil3
                     # ----------------------------------------------------------------- #
                     
                     ## Sociodemographics
                     wtp_Gender_Regioanteil3 = "F",
                     wtp_Age_Regioanteil3 = "F",
                     wtp_Education_Regioanteil3 = "F",
                     wtp_Residence_Regioanteil3 = "F",
                     wtp_FederalState.Wind_Regioanteil3 = "F",
                     wtp_FederalState.PV_Regioanteil3 = "F",
                     
                     ## Current
                     wtp_CurrentMix_Regioanteil3 = "F",
                     
                     
                     # ----------------------------------------------------------------- #
                     #---- Preis
                     # ----------------------------------------------------------------- #
                     
                     ## Sociodemographics
                     b_Gender_Preis = "F",
                     b_Age_Preis = "F",
                     b_Income_Preis = "F",
                     b_Residence_Preis = "F",
                     
                     ## Current
                     b_PriceMonthly_centered_Preis = "F"
  )
  # remaining apollo_HB settings not shown in the original post
)

Because of the covariates, I implement transformations of the variables in apollo_probabilities(). For example:

Code:

  # ----------------------------------------------------------------- #
  #---- Regioanteil2
  # ----------------------------------------------------------------- #
  
  wtp_Regioanteil2_value = wtp_Regioanteil2 +

    ## Sociodemographics
    wtp_Gender_Regioanteil2 * Gender +
    wtp_Age_Regioanteil2 * Age +
    wtp_Education_Regioanteil2 * Education +
    wtp_Residence_Regioanteil2 * Residence +
    wtp_FederalState.Wind_Regioanteil2 * FederalState.Wind +
    wtp_FederalState.PV_Regioanteil2 * FederalState.PV +

    ## Current
    wtp_CurrentMix_Regioanteil2 * CurrentMix


  # ----------------------------------------------------------------- #
  #---- Preis
  # ----------------------------------------------------------------- #
  
  b_Preis_value = b_Preis +
    
    ## Sociodemographics
    b_Gender_Preis * Gender +
    b_Age_Preis * Age +
    b_Income_Preis * Income +
    b_Residence_Preis * Residence +
    
    ## Current
    b_PriceMonthly_centered_Preis * PriceMonthly_centered

The utilities of the alternatives are then:

Code:

  V = list()
  V[['alt1']] = b_Preis_value * ( wtp_Anbieter2_value * Anbieter2.1 + wtp_Anbieter3_value * Anbieter3.1 +
                                    wtp_Strommix2_value * Strommix2.1 + wtp_Strommix3_value * Strommix3.1 + wtp_Strommix4_value * Strommix4.1 +
                                    wtp_Regioanteil2_value * Regioanteil2.1 + wtp_Regioanteil3_value * Regioanteil3.1 +
                                    Preis.1)
  
  V[['alt2']] = b_Preis_value * ( wtp_Anbieter2_value * Anbieter2.2 + wtp_Anbieter3_value * Anbieter3.2 +
                                    wtp_Strommix2_value * Strommix2.2 + wtp_Strommix3_value * Strommix3.2 + wtp_Strommix4_value * Strommix4.2 +
                                    wtp_Regioanteil2_value * Regioanteil2.2 + wtp_Regioanteil3_value * Regioanteil3.2 +
                                    Preis.2)
  
  V[['alt3']] = b_Preis_value * ( wtp_Anbieter2_value * Anbieter2.3 + wtp_Anbieter3_value * Anbieter3.3 + 
                                    wtp_Strommix2_value * Strommix2.3 + wtp_Strommix3_value * Strommix3.3 + wtp_Strommix4_value * Strommix4.3 +
                                    wtp_Regioanteil2_value * Regioanteil2.3 + wtp_Regioanteil3_value * Regioanteil3.3 +
                                    Preis.3)

I would like to check whether, on average, the covariates have a significant influence on the WTP and/or the price coefficient. For this reason, I would like to include the covariates with a fixed distribution. A normal distribution (“N” instead of “F” in the code above) for the covariates leads to a (considerable) improvement in the model fit, but the fit is bad for the holdouts, which could indicate overfitting?
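One way to look at the influence of a fixed covariate parameter after HB estimation is to summarise its retained posterior draws. The sketch below uses simulated draws as a stand-in (the mean of 0.3 and standard deviation of 0.1 are invented); with a real model, the vector `draws` would be replaced by the post-burn-in chain of the parameter of interest, for example wtp_Age_Regioanteil2. The helper `summarise_draws` is a hypothetical name, not an Apollo or RSGHB function.

```r
set.seed(1)
# Stand-in for retained (post burn-in) draws of one fixed parameter;
# real draws would come from the estimation output.
draws <- rnorm(5000, mean = 0.3, sd = 0.1)

# Posterior mean, posterior sd, 95% credible interval, share of draws > 0
summarise_draws <- function(x) {
  c(mean           = mean(x),
    sd             = sd(x),
    lower          = unname(quantile(x, 0.025)),
    upper          = unname(quantile(x, 0.975)),
    share_positive = mean(x > 0))
}

res <- summarise_draws(draws)
print(round(res, 3))
```

A credible interval that excludes zero is the Bayesian counterpart of the "significant on average" statement, though it is not a classical significance test.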

The problem with my data is that the estimated parameters, especially wtp_Regioanteil2 and wtp_Regioanteil3, are so strongly influenced by the covariates that in this case they can hardly be interpreted meaningfully alone. For example, without covariates wtp_Regioanteil3 = -0.8, indicating a positive willingness-to-pay, but with covariates wtp_Regioanteil3 = 0.2. This means that not only the absolute value of the parameters changes, but sometimes even the sign. In my understanding, it therefore makes no sense to interpret wtp_Regioanteil3, but only the transformation wtp_Regioanteil3_value? If I have understood your answer correctly, then apollo_prediction() does just that, since it also uses the conditionals per respondent?
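The sign change in the base parameter is what one would expect when covariates are not centered: the base parameter is then the value for a respondent whose covariates are all zero, not for an average respondent. A small base R illustration with a single covariate, using invented numbers chosen only to mirror the 0.2 versus -0.8 pattern described above (they are not estimates from this model):

```r
# Invented numbers, for illustration only
wtp_base <- 0.2            # base parameter: value when Age = 0
wtp_age  <- -0.025         # shift per year of age
Age      <- c(25, 40, 55)  # covariate values in a toy sample

# Composite WTP coefficient per respondent
wtp_value <- wtp_base + wtp_age * Age

# Same specification with Age centered on its sample mean: the composite
# values are identical, but the base parameter now refers to the
# average respondent
Age_c       <- Age - mean(Age)
wtp_base_c  <- wtp_base + wtp_age * mean(Age)
wtp_value_c <- wtp_base_c + wtp_age * Age_c

print(wtp_base_c)
print(cbind(wtp_value, wtp_value_c))
```

Under this reading, only the composite value is comparable across specifications; the base parameter on its own depends on how the covariates are coded.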

So can I simply add the fixed covariate estimates to the respective parameter for each respondent? And does the same hold not only for the conditionals per respondent, but also for the upper-level model estimates?

Post by **stephanehess** » 12 May 2020, 13:39

Hi

apollo_prediction uses only the posterior means, so I would be very careful in using it in the situation where some of your marginal utility parameters are actually given by sums of multiple model parameters. Even if that wasn't the case, you should be very careful with using posterior means as these have error measures around them too.

I would encourage you to instead use the upper level model, i.e. as in classical estimation, where you can just add the components up, but you should recognise the full distribution where appropriate.
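As a rough sketch of what working with the upper level model could look like, the base R code below simulates the composite WTP coefficient for one covariate profile by drawing the random part from its Normal and adding the fixed shift. Every number is invented, the variable names are illustrative, and the treatment of "CN-" as a Normal censored from above at zero is an assumption that should be checked against the RSGHB documentation. Uncertainty in the upper level estimates themselves and any correlation between coefficients are ignored here.

```r
set.seed(42)
n_draws <- 10000

# Invented upper-level estimates for the random part of one WTP coefficient
mu_wtp <- 0.2
sd_wtp <- 0.5
# Invented fixed shift and the covariate profile to evaluate
wtp_CurrentMix <- -0.6
CurrentMix     <- 1

# Composite coefficient: random base plus fixed covariate shift
wtp_value <- rnorm(n_draws, mu_wtp, sd_wtp) + wtp_CurrentMix * CurrentMix

# Price coefficient, ASSUMING "CN-" means a Normal censored from above at 0
b_Preis <- pmin(0, rnorm(n_draws, mean = -1, sd = 0.5))

# Summaries of the full distribution rather than a single point value
print(c(mean = mean(wtp_value), sd = sd(wtp_value),
        quantile(wtp_value, c(0.05, 0.95))))
print(mean(b_Preis == 0))  # share of draws censored at zero
```

Reporting quantiles alongside the mean is one way of recognising the full distribution rather than a single summed value.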

Best wishes

Stephane

ApolloChoiceModelling forum
