prediction at individual level
zx9203
Posts: 24
Joined: 13 Jun 2020, 09:52

Post by zx9203 »

Dear all,

Following example 3 and the manual, it seems that the prediction function only gives the predicted demand for each alternative at the aggregate level. Since my alternatives are unlabelled, I'm more interested in individual-level than aggregate-level fit.

Can anyone help me identify individuals who are badly predicted, so that I can remove them from the dataset? I tried calculating utilities and probabilities using the estimates. This lets me identify badly predicted choices at the observation level and then remove the individuals with the most mismatches. But I wonder if there is a better way.

Thank you in advance!

Best,
Xian
stephanehess
Site Admin
Posts: 1042
Joined: 24 Apr 2020, 16:29

Re: prediction at individual level

Post by stephanehess »

Hi Xian

The function apollo_prediction returns the predictions at the level of individual observations, so it should be exactly what you want. The output returned by apollo_prediction contains a column called chosen, which reports the predicted probability of the alternative that was actually chosen.

After estimation, you can also make the following call, which gives you the likelihoods at the individual level (rather than the observation level).

Lind=apollo_probabilities(model$estimate, apollo_inputs, functionality="estimate")
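
To illustrate how the two levels relate, here is a minimal sketch in base R. It does not call Apollo: the data frame pred is an invented stand-in for observation-level predictions, and the column name ID is an assumption (only the chosen column is mentioned above). It also assumes that the individual-level likelihood is simply the product of that person's chosen probabilities, which would not hold for models with random coefficients, where the product is averaged over draws.

```r
# Hypothetical stand-in for observation-level predictions (invented numbers);
# 'chosen' holds the predicted probability of the alternative actually chosen.
pred <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2),
  chosen = c(0.60, 0.50, 0.70, 0.20, 0.30, 0.25)
)

# Individual-level likelihood: product of the chosen probabilities per person
# (assumes no random coefficients, so the observations simply multiply).
Lind_sketch <- tapply(pred$chosen, pred$ID, prod)

# Individuals ordered from worst to best explained by the model
sort(Lind_sketch)
```

In this toy example, individual 2 has the lower likelihood, so that person's choices are the ones the model struggles most to explain.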

Best wishes

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
zx9203
Posts: 24
Joined: 13 Jun 2020, 09:52

Re: prediction at individual level

Post by zx9203 »

Dear Stephane,

Thank you so much! This works perfectly!

Best,
Xian
cybey
Posts: 60
Joined: 26 Apr 2020, 19:38

Re: prediction at individual level

Post by cybey »

Hi Xian,

Maybe this post by Bryan Orme is also interesting for you?

Poor internal or external validity is not necessarily an indicator of poor response behaviour, as the respondents may have little interest in the topic or may have been overburdened by the tasks. Consequently, they could simply choose anything, with the result that your model does not give a good prediction.

Best,
Nico
stephanehess
Site Admin
Posts: 1042
Joined: 24 Apr 2020, 16:29

Re: prediction at individual level

Post by stephanehess »

Nico

thanks for bringing this post to our attention. However, I (and many fellow choice modellers, I believe) would fundamentally disagree with the suggestion to "clean from 15% to 30% of "bad" respondents from stated discrete choice".

Outliers in data are often the most useful respondents in a dataset as they tell you there are people whose choices the model is struggling to explain. This is often not the fault of the respondent, but the fault of the model. So outliers are a great opportunity for improving a model.

There is an excellent discussion of this topic in the Ben-Akiva and Lerman book.

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
cybey
Posts: 60
Joined: 26 Apr 2020, 19:38

Re: prediction at individual level

Post by cybey »

Hi, Stephane,

I also find this 15-30 percent very high. In principle, however, I like the idea of combining several indicators of (potentially) poor response behaviour. One could set the speeding indicator very conservatively, e.g. faster than 33% of the median time, and cross-check these "candidate" respondents against another indicator, e.g. RLH. However, and this is probably your point, respondents may simply find the choice experiment uninteresting (e.g. a product in which they have no interest), which suggests that their answers are still valid. On the other hand, I found with two data sets that half of the respondents identified in this way were also conspicuous on other indicators. For example, these respondents showed no variance in their answers to multi-item scale questions (Likert scale 1-7). The respondents identified in this way accounted for only 5% of the first and <10% of the second data set.
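
As a rough sketch of how the two indicators could be combined in base R: the data below are invented, the column names (ID, chosen, time) are assumptions, and the RLH cut-off of 1/J (no better than random choice among J = 3 alternatives) is only one possible threshold. RLH is computed as the geometric mean of the probabilities of the chosen alternatives. The result flags respondents for closer inspection; it does not remove anyone.

```r
# Invented observation-level data: 'chosen' is the predicted probability
# of the alternative actually chosen in each task.
obs <- data.frame(
  ID     = rep(1:4, each = 3),
  chosen = c(0.70, 0.60, 0.80,   # respondent 1
             0.30, 0.35, 0.30,   # respondent 2
             0.50, 0.40, 0.60,   # respondent 3
             0.25, 0.30, 0.20)   # respondent 4
)
# Invented total completion time per respondent, in seconds
resp <- data.frame(ID = 1:4, time = c(300, 80, 280, 320))

# RLH: geometric mean of the chosen probabilities for each respondent
resp$RLH <- as.numeric(tapply(obs$chosen, obs$ID,
                              function(p) exp(mean(log(p)))))

# Conservative speeding indicator: faster than 33% of the median time
resp$fast <- resp$time < 0.33 * median(resp$time)

# Low-fit indicator (assumed threshold): RLH no better than random choice
J <- 3
resp$lowfit <- resp$RLH <= 1 / J

# Candidates are respondents flagged by both indicators
resp$candidate <- resp$fast & resp$lowfit
resp
```

With these made-up numbers, respondent 4 has a low RLH but a normal completion time, so only respondent 2 is flagged by both indicators.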

Best wishes
Nico
stephanehess
Site Admin
Posts: 1042
Joined: 24 Apr 2020, 16:29

Re: prediction at individual level

Post by stephanehess »

Nico

time to completion is another tricky point. I know some analysts routinely remove "fast" respondents from the data. But it's much better to try to let the data speak and understand the differences in behaviour across people, e.g. including response time as an indicator (not as an explanatory variable) of response quality. Maybe the people who respond more quickly find the experiments easy but still make meaningful choices. Maybe those respondents who take a longer time are not really concentrating more on your choice tasks but are watching TV at the same time, etc.

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
stephanehess
Site Admin
Posts: 1042
Joined: 24 Apr 2020, 16:29

Re: prediction at individual level

Post by stephanehess »

Xian

apollo_prediction returns probabilities at the observation level, not just the aggregate level. There is also a final column called chosen, which you can use for your purpose. Alternatively, you can use apollo_fitsTest, or my earlier suggestion of Lind=apollo_probabilities(model$estimate, apollo_inputs, functionality="estimate").

But either way, as in my reply to Nico, it's not a good idea to "find out individuals who are badly predicted, so I can remove them from the dataset". Instead, use them to improve your model.

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
zx9203
Posts: 24
Joined: 13 Jun 2020, 09:52

Re: prediction at individual level

Post by zx9203 »

Dear Stephane and Nico,

Thank you for raising the discussion about removing "bad" respondents. My concern is that I would like to compare the likelihood between different subsamples, but they originally contained different numbers of respondents. That is why I wanted to trim the outliers and obtain balanced datasets. Is there any other way to compare the likelihood across samples with different numbers of observations? I'm also confused about how to improve the model while keeping the outliers.

Thanks a lot!!!

Best,
Xian
dpalma
Posts: 190
Joined: 24 Apr 2020, 17:54

Re: prediction at individual level

Post by dpalma »

Hi Xian,

If you have a different number of individuals in each sample, you could calculate the average likelihood per individual (or per observation) in each sample and compare that value. That way you would be controlling for the different number of individuals (or observations).

You can obtain the likelihoods at the observation level using apollo_prediction, and at the individual level by using apollo_probabilities(model$estimate, apollo_inputs, functionality="estimate")

If you use apollo_prediction, and you have a choice model, remember to use the probability reported under the "chosen" column in the output.
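
A minimal base R sketch of that comparison follows. The likelihood vectors are invented stand-ins for what the apollo_probabilities call above would give for each sample, the helper avg_LL is hypothetical, and it assumes every individual answered the same number of choice tasks. Note that it averages the log-likelihood, which is one way of reading "average likelihood"; that choice is an assumption of this sketch.

```r
# Invented individual-level likelihoods for two subsamples of different size
Lind_A <- c(0.020, 0.150, 0.080, 0.300)                # 4 individuals
Lind_B <- c(0.050, 0.100, 0.010, 0.200, 0.120, 0.060)  # 6 individuals
obs_per_ind <- 3  # assumed: every individual answered 3 choice tasks

# Hypothetical helper: average log-likelihood per individual and per observation
avg_LL <- function(Lind, n_obs) {
  LL <- sum(log(Lind))  # total log-likelihood of the sample
  c(per_individual  = LL / length(Lind),
    per_observation = LL / (length(Lind) * n_obs))
}

# One row per sample; values are comparable despite the different sample sizes
rbind(A = avg_LL(Lind_A, obs_per_ind),
      B = avg_LL(Lind_B, obs_per_ind))
```

If individuals answered different numbers of tasks, the per-observation figure would need the actual observation count for each sample instead of the fixed obs_per_ind.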

Best
David