Important: Read this before posting to this forum

  1. This forum is for questions related to the use of Apollo. We will answer some general choice modelling questions too, where appropriate, and time permitting. We cannot answer questions about how to estimate choice models with other software packages.
  2. There is a very detailed manual for Apollo available at http://www.ApolloChoiceModelling.com/manual.html. This contains detailed descriptions of the various Apollo functions, and numerous examples are available at http://www.ApolloChoiceModelling.com/examples.html. In addition, help files are available for all functions, using e.g. ?apollo_mnl
  3. Before asking a question on the forum, users are kindly requested to follow these steps:
    1. Check that the same issue has not already been addressed in the forum; there is a search tool.
    2. Ensure that the correct syntax has been used. For any function, detailed instructions are available directly in Apollo, e.g. by using ?apollo_mnl for apollo_mnl
    3. Check the frequently asked questions section on the Apollo website, which discusses some common issues/failures. Please see http://www.apollochoicemodelling.com/faq.html
    4. Make sure that R is using the latest official release of Apollo.
  4. If the above steps do not resolve the issue, then users should follow these steps when posting a question:
    1. Provide full details on the issue, including the entire code and output and any error messages.
    2. Posts will not immediately appear on the forum, but will be checked by a moderator first. This may take a day or two at busy times. There is no need to submit the post multiple times.

Non-overlapping choice sets

Ask questions about data format and processing of data, including the use of pre-estimation functions in Apollo. If your question relates to a specific error you are getting, please provide some of the output.
BTHopkins
Posts: 4
Joined: 25 Jul 2023, 21:03

Non-overlapping choice sets

Post by BTHopkins »

Hi Stephanie,

I am just getting started on my journey with Apollo.

I am working with a dataset where individuals' choice sets depend on where they live and the year. For people in different areas / years, the choice sets are non-overlapping. Some people can choose between a handful of products, and others between several hundred. That means there are tens of thousands of unique alternatives in the dataset. When I put the data in wide format, the dataset is much too large to use.

Since the choice sets are non-overlapping, I thought that I might be able to reuse choice IDs to reduce the size of the dataset. In other words, choice #1 for area / year A is not the same product as for area / year B. I don't see why this would cause an issue for calculating probabilities, but I wasn't sure if it would cause an issue in Apollo.

Does that make sense? Thanks for your help!
dpalma
Posts: 190
Joined: 24 Apr 2020, 17:54

Re: Non-overlapping choice sets

Post by dpalma »

Hi,

Yes, you can do that without issues. That way of coding the alternatives would be similar to how unlabelled data from a stated choice experiment is recorded.
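The reason this is safe in a standard multinomial logit with availabilities is that each person's probabilities are normalised only over the alternatives in their own choice set, so what slot j means for one person has no bearing on another. In the notation below (introduced here for illustration, not Apollo's), K is the largest choice set size in the sample and av_{nk} is the availability of slot k for person n:

```latex
P_{nj} = \frac{\text{av}_{nj}\,\exp(V_{nj})}{\sum_{k=1}^{K} \text{av}_{nk}\,\exp(V_{nk})}
```

This assumes V_{nj} is built from the attributes of whichever product occupies slot j for person n, rather than from slot-specific constants, which would mean different things for people in different areas / years.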

For example, let us imagine you are modelling ice cream choice. Ice cream is described by both flavour and price. The flavours can be vanilla, chocolate, lemon and pineapple. However, not all flavours are available to every individual, because of where they live. You have two ways of coding this.

The first way of coding the data is the labelled form, which would look like the table below, where cost_j and av_j are the cost and availability of alternative j. The problem with this approach is that if you have many flavours (alternatives), you will end up with a lot of columns.


id cost_vani cost_choc cost_lemo cost_pine av_vani av_choc av_lemo av_pine
 1         6         9        NA        NA       1       1       0       0
 2         7         8         7        NA       1       1       1       0
 3         9         8        NA         7       1       1       0       1
…
The second approach is the "unlabelled" form, which would look like the table below. Here each alternative has an additional attribute, "flavour". The alternative is no longer defined by its flavour; instead, each alternative is just a generic container, and the flavour becomes one of its attributes. Note that you will have to define as many alternatives as the largest number of alternatives available to any individual in your sample.


id flav_1 flav_2 flav_3 cost_1 cost_2 cost_3 av_1 av_2 av_3
 1   vani   choc     NA      6      9     NA    1    1    0
 2   vani   choc   lemo      7      8      7    1    1    1
 3   vani   choc   pine      9      8      7    1    1    1
 ...
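To make the conversion between the two layouts concrete, here is a minimal Python sketch (not Apollo's R syntax) that packs each person's available flavours from the labelled table into generic slots, padding with NA (None here) and av = 0 up to the largest choice set size. The dictionary-based row representation and the function name to_unlabelled are assumptions for illustration only.

```python
# Labelled layout from the first table: one cost and one availability entry
# per flavour, with None standing in for NA.
FLAVOURS = ["vani", "choc", "lemo", "pine"]
labelled_rows = [
    {"id": 1, "cost": {"vani": 6, "choc": 9, "lemo": None, "pine": None},
     "av": {"vani": 1, "choc": 1, "lemo": 0, "pine": 0}},
    {"id": 2, "cost": {"vani": 7, "choc": 8, "lemo": 7, "pine": None},
     "av": {"vani": 1, "choc": 1, "lemo": 1, "pine": 0}},
    {"id": 3, "cost": {"vani": 9, "choc": 8, "lemo": None, "pine": 7},
     "av": {"vani": 1, "choc": 1, "lemo": 0, "pine": 1}},
]


def to_unlabelled(row, n_slots):
    """Pack one person's available flavours into generic slots 1..n_slots."""
    available = [f for f in FLAVOURS if row["av"][f] == 1]
    if len(available) > n_slots:
        raise ValueError("n_slots is smaller than this person's choice set")
    pad = [None] * (n_slots - len(available))
    return {
        "id": row["id"],
        "flav": available + pad,                              # flav_1..flav_K
        "cost": [row["cost"][f] for f in available] + pad,    # cost_1..cost_K
        "av": [1] * len(available) + [0] * len(pad),          # av_1..av_K
    }


# The number of slots is the largest choice set size in the sample.
n_slots = max(sum(row["av"].values()) for row in labelled_rows)
unlabelled_rows = [to_unlabelled(row, n_slots) for row in labelled_rows]

for r in unlabelled_rows:
    print(r["id"], r["flav"], r["cost"], r["av"])
```

The printed rows reproduce the unlabelled table above; with several hundred products spread over non-overlapping sets, the column count then grows with the largest choice set rather than with the total number of products.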
Best wishes
David
cheriedavy
Posts: 1
Joined: 27 Aug 2023, 10:58

Re: Non-overlapping choice sets

Post by cheriedavy »

Yes, your approach of reusing choice IDs for non-overlapping choice sets based on area and year makes sense. This should help reduce the size of the dataset while still maintaining the distinction between different products in different contexts. As long as the choice IDs remain unique within their respective area and year combinations, it should not cause issues for calculating probabilities or using Apollo. It's a valid strategy to manage your dataset efficiently. Good luck with your work!
BTHopkins
Posts: 4
Joined: 25 Jul 2023, 21:03

Re: Non-overlapping choice sets

Post by BTHopkins »

Thank you both for the responses! I didn't notice until now that you had replied.

Before you responded, I also tested a slimmed-down version of the model I was running using the two ways of coding, and confirmed that they yield equivalent results.
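The same kind of equivalence check can be illustrated outside Apollo with a small Python sketch using the ice cream example. The flavour constants and cost coefficient are made-up values, and the MNL is computed by hand rather than with apollo_mnl; the point is only that using the flavour attribute to pick the constant in the unlabelled coding reproduces the labelled probabilities.

```python
import math

# Hypothetical parameters: flavour constants (vanilla as reference) and a
# cost coefficient. The values are made up for illustration.
ASC = {"vani": 0.0, "choc": 0.5, "lemo": -0.2, "pine": 0.1}
B_COST = -0.3

# Labelled coding: costs of the available flavours only (absent = unavailable).
labelled = {
    1: {"vani": 6, "choc": 9},
    2: {"vani": 7, "choc": 8, "lemo": 7},
    3: {"vani": 9, "choc": 8, "pine": 7},
}

# Unlabelled coding: (flav_k, cost_k, av_k) for generic slots k = 1..3.
unlabelled = {
    1: (["vani", "choc", None], [6, 9, None], [1, 1, 0]),
    2: (["vani", "choc", "lemo"], [7, 8, 7], [1, 1, 1]),
    3: (["vani", "choc", "pine"], [9, 8, 7], [1, 1, 1]),
}


def utility(flav, cost):
    # Same utility in both codings; the flavour picks the constant.
    return ASC[flav] + B_COST * cost


def probs_labelled(costs):
    expv = {f: math.exp(utility(f, c)) for f, c in costs.items()}
    denom = sum(expv.values())
    return {f: e / denom for f, e in expv.items()}


def probs_unlabelled(flavs, costs, avs):
    # Unavailable slots contribute nothing to the denominator.
    expv = [math.exp(utility(f, c)) if a else 0.0
            for f, c, a in zip(flavs, costs, avs)]
    denom = sum(expv)
    return [e / denom for e in expv]


max_diff = 0.0
for pid, costs in labelled.items():
    p_lab = probs_labelled(costs)
    flavs, slot_costs, avs = unlabelled[pid]
    p_unl = probs_unlabelled(flavs, slot_costs, avs)
    for f, p, a in zip(flavs, p_unl, avs):
        if a:
            max_diff = max(max_diff, abs(p - p_lab[f]))

print("largest difference between codings:", max_diff)
```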

Coding labels as attributes seems useful for saving space in general. In my context, for example, I include a shifter for the company that sells each alternative. With J companies and K alternatives, dummy coding requires J × K dummy variables. But if I instead have a variable for each alternative containing the name of the company, I can use its value in the utility function to pick out the correct shifter. That only requires K variables.
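The company-shifter idea can be sketched in Python as follows (again not Apollo code). The company names, shifter values and function names are hypothetical; the sketch only shows that looking a shifter up by the company stored in each slot gives the same utility contribution as summing over J × K dummies, while storing K columns instead of J × K.

```python
# Hypothetical company names and shifter values, for illustration only;
# the first company acts as the reference with its shifter fixed at zero.
SHIFTER = {"CompanyA": 0.0, "CompanyB": 0.4, "CompanyC": -0.1}


def dummy_columns(company_by_slot):
    """Dummy coding: one 0/1 column per (company, slot) pair, J x K columns."""
    return {
        (company, slot): int(company_by_slot[slot] == company)
        for company in SHIFTER
        for slot in range(len(company_by_slot))
    }


def shifter_from_dummies(dummies, slot):
    # Sum of shifter times dummy over all J companies for this slot.
    return sum(SHIFTER[c] * dummies[(c, slot)] for c in SHIFTER)


def shifter_from_attribute(company_by_slot, slot):
    # Attribute coding: look the shifter up by the company stored in the slot.
    return SHIFTER[company_by_slot[slot]]


# One person's choice set with K = 3 slots.
company_by_slot = ["CompanyB", "CompanyA", "CompanyC"]
dummies = dummy_columns(company_by_slot)

for slot in range(len(company_by_slot)):
    diff = shifter_from_dummies(dummies, slot) - shifter_from_attribute(company_by_slot, slot)
    assert abs(diff) < 1e-12

print(len(dummies), "dummy columns vs", len(company_by_slot), "company columns")
```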