Handling 'Not Available' Data

bye1830
Posts: 3
Joined: 24 Apr 2024, 06:40

Post by bye1830 »

Hi Apollo Team,

I'm grateful for your support in managing this forum.

I've posted previously about my survey, but I am uploading this as I have a different question related to data processing.

I conducted a choice experiment-based survey regarding zero-emission truck choices, including battery electric trucks and hydrogen fuel cell electric trucks. A total of 54 freight companies participated. One of the important explanatory variables for specifying models is the annual revenue of the participating companies. In my survey questionnaire, this variable was presented with five options, and respondents were asked to select one: 1) <$10M, 2) $10-15M, 3) $15-30M, 4) >$30M, and 5) Decline to state (i.e., N/A). The issue is that 16% of respondents chose the "decline to state" option. How would you recommend handling these 'Not Available' data entries?

I have considered the following options, but would like to know which one might be recommended, what cautions should be exercised for each, or if there is a better way to handle this situation:

1. Treating the Annual Revenue variable as a categorical variable.
2. Assuming 'N/A' corresponds to an average annual revenue (e.g., $10-15M in my survey).
3. Excluding the observations with 'N/A' for the Annual Revenue variable (i.e., using 84% of the total observations).
4. Excluding the Annual Revenue variable from model specification.

Please let me know if I need to provide any additional information.

I'd greatly appreciate any guidance and insights you could share. Thank you very much!

Best regards,

YB
stephanehess
Site Admin
Posts: 1046
Joined: 24 Apr 2020, 16:29

Re: Handling 'Not Available' Data

Post by stephanehess »

Hi

I would recommend option 1 in your case. Even when I treat income as a continuous variable in my models, I do not exclude people with missing data, nor do I assign them to another category. Rather, I estimate a separate parameter for them. This is in line with your idea of treating the variable as categorical, where this happens anyway.
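
For the continuous case mentioned here, one possible formulation can be sketched in plain R. This is only an illustration under stated assumptions: the income values, the linear interaction, and the names `income`, `b_cost`, `b_cost_inc` and `b_cost_missing` are all hypothetical and do not come from the survey discussed in this thread. Other functional forms for the income effect are equally possible.

```r
# Hypothetical income values; NA marks respondents who declined to state.
income <- c(20, 45, NA, 80, NA)

# Indicator for missing income, and a filled copy so the arithmetic below
# never touches NA (the filled value 0 switches the income term off).
income_missing <- is.na(income)
income_filled  <- ifelse(income_missing, 0, income)

# Hypothetical parameter values, chosen only to make the arithmetic visible.
b_cost         <- -0.5    # base cost sensitivity
b_cost_inc     <-  0.002  # change in cost sensitivity per unit of income
b_cost_missing <- -0.1    # separate shift for respondents with missing income

# Respondents with reported income get the continuous interaction;
# respondents with missing income get their own shift instead.
b_cost_value <- b_cost +
  b_cost_inc * income_filled +
  b_cost_missing * income_missing

print(b_cost_value)
```

Nobody is dropped and nobody is assigned an assumed income: the respondents with missing values simply contribute to the estimate of their own parameter.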

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
bye1830
Posts: 3
Joined: 24 Apr 2024, 06:40

Re: Handling 'Not Available' Data

Post by bye1830 »

Hi Stephane,

Thank you for sharing your insights. I believe I've understood your suggestion, and I've applied the approach you recommended. Specifically, I've incorporated this categorical variable into an interaction term with vehicle purchase costs. Could you please review what I've done and correct any errors?

The annual revenue (AR) variable in my dataset takes on one of the following values:

*annual_revenue == 1 (for AR <$10M)
*annual_revenue == 2 (for AR between $10M-15M)
*annual_revenue == 3 (for AR between $15M-30M)
*annual_revenue == 4 (for AR >$30M)
*annual_revenue == 5 (for the case of declining to state)

I've treated 'AR > $30M' as the reference category and applied shift terms, as shown below:

b_pcost_value = b_pcost +
  b_pcost_AR_less_than_10M   * (annual_revenue==1) +
  b_pcost_AR_between_10M_15M * (annual_revenue==2) +
  b_pcost_AR_between_15M_30M * (annual_revenue==3) +
  b_pcost_AR_NA              * (annual_revenue==5)

An example of the utility function for one alternative (battery electric vehicle) is shown below:

V[["bev"]] = asc_bev_value +
  b_pcost_value   * bev_pcost +
  b_ocost_value   * bev_ocost +
  b_range         * bev_range +
  b_offsite_value * bev_offsite_binary +
  b_onsite_bev    * bev_onsite
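
The shift-term coding can be checked outside Apollo with a few lines of plain R. The sketch below uses one hypothetical respondent per revenue category and made-up parameter values (they are not estimates from this model). It only shows that each comparison such as `annual_revenue==1` acts as a 0/1 indicator, so each respondent receives the base coefficient plus at most one shift.

```r
# One hypothetical respondent per category (1 to 4 = revenue bands, 5 = declined).
annual_revenue <- c(1, 2, 3, 4, 5)

# Made-up parameter values, for illustration only.
b_pcost                    <- -1.0  # reference category: AR > $30M
b_pcost_AR_less_than_10M   <- -0.3
b_pcost_AR_between_10M_15M <- -0.2
b_pcost_AR_between_15M_30M <- -0.1
b_pcost_AR_NA              <- -0.5

# Logical comparisons are coerced to 0/1 when multiplied by a number,
# so only the shift for the respondent's own category is switched on.
b_pcost_value <- b_pcost +
  b_pcost_AR_less_than_10M   * (annual_revenue == 1) +
  b_pcost_AR_between_10M_15M * (annual_revenue == 2) +
  b_pcost_AR_between_15M_30M * (annual_revenue == 3) +
  b_pcost_AR_NA              * (annual_revenue == 5)

print(data.frame(annual_revenue, b_pcost_value))
```

Category 4 has no shift term, so its respondents keep `b_pcost` unchanged, which is what makes it the reference group.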

With these settings, I've obtained the following estimation results: Only "b_pcost_AR_between_15M_30M" is significant at the 5% level.

                            Estimate  Std.err.  t-ratio(0)  Rob.std.err.  Rob.t-ratio(0)
b_pcost                        0.007     0.323       0.023         0.489           0.015
...
b_pcost_AR_less_than_10M      -0.268     0.363      -0.739         0.553          -0.485
b_pcost_AR_between_10M_15M    -0.123     0.483      -0.255         0.670          -0.184
b_pcost_AR_between_15M_30M    -1.868     0.792      -2.360         0.744          -2.511
b_pcost_AR_NA                 -0.673     0.478      -1.408         0.621          -1.083

I'm curious whether there are alternative approaches for formulating the interaction term between this categorical variable and vehicle purchase costs. I'm also wondering whether the results are still meaningful when so few of the estimates are significant.

I'd greatly appreciate any suggestions or insights you could provide. Thank you very much!

Best regards,

YB
stephanehess
Site Admin
Posts: 1046
Joined: 24 Apr 2020, 16:29

Re: Handling 'Not Available' Data

Post by stephanehess »

Hi

Your specification is correct, but your results are worrying: the effect in the reference group is positive, and the income effect is not monotonic.
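
Both concerns can be seen by adding each reported shift to the reference coefficient. The sketch below is plain R and uses only the point estimates posted earlier in this thread; the vector names are made up for illustration. It ignores the standard errors, so it says nothing about whether the differences between groups are statistically significant.

```r
# Point estimates as reported in the thread.
b_pcost <- 0.007  # reference group: AR > $30M
shifts <- c(
  less_than_10M   = -0.268,
  between_10M_15M = -0.123,
  between_15M_30M = -1.868,
  more_than_30M   =  0,      # reference group has no shift
  not_stated      = -0.673
)

# Implied purchase cost coefficient for each group.
implied <- b_pcost + shifts
print(round(implied, 3))

# Check monotonicity across the four ordered revenue bands only;
# the "not stated" group has no place in the ordering.
ordered <- implied[c("less_than_10M", "between_10M_15M",
                     "between_15M_30M", "more_than_30M")]
is_monotonic <- all(diff(ordered) >= 0) || all(diff(ordered) <= 0)
print(is_monotonic)
```

The implied coefficients are -0.261, -0.116, -1.861 and 0.007 for the four revenue bands, and -0.666 for those who declined to state. The sequence rises, falls sharply, then rises again, and the reference group ends up with a (very small) positive cost coefficient.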

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk