Handling categorical attributes in a model with large choice sets and varying alternatives
Posted: 07 Oct 2024, 04:55
Hello Apollo team,
Thank you so much for this amazing package and for maintaining this forum.
I'm new to choice modeling and to this package, and I couldn't find an answer to my question after reading the manual and examples (maybe I missed it?). My question is about handling categorical attributes in an MNL model with large choice sets where the available alternatives vary across travelers.
Here is a simple example: I have RP data and want to understand travelers' choices among 100 different destinations (say, loc_1, ..., loc_100) using an MNL model. I have two attributes: (1) DIST, the distance in km from each traveler to each alternative, and (2) HOTEL (say, basic, good, luxury), the hotel type at each destination, assuming for simplicity that only one hotel type is available at each destination. Each traveler's choice set is constructed so that only destinations less than 100 km away from that traveler are considered. Here is the issue:
For certain travelers, all the destinations less than 100 km away have the same hotel type, which means I cannot possibly estimate the effect of HOTEL on utility for these travelers. Only a small percentage of travelers (~10%) are in this situation. I have considered the following options, but would like to know which one might be recommended in Apollo, what cautions should be exercised with each, or whether there is a better way to handle this situation:
1. leave it as it is.
2. remove travelers who have this issue.
3. group the HOTEL levels good and luxury together as "better", which will give me a smaller percentage of travelers in this situation. Then I can either leave it as it is or remove the smaller percentage of travelers who have this issue.
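To make the situation concrete, here is a minimal Python sketch, assuming dummy coding with basic as the reference level. All data, coefficient values, and names (`choice_set`, `no_variation`, `mnl_probs`, `MERGE`) are hypothetical and are not Apollo functions. It builds each traveler's choice set with the 100 km rule, flags travelers whose available destinations all share one HOTEL level (optionally after recoding good and luxury as "better", as in option 3), and checks that changing the HOTEL coefficients leaves a flagged traveler's MNL probabilities unchanged. For such a traveler, the HOTEL term adds the same constant to the utility of every available alternative, so it cancels out of that traveler's choice probabilities.

```python
import math

# Hypothetical toy data: distance (km) from each traveler to some destinations,
# and the single hotel type at each destination.
DIST_KM = {
    "t1": {"loc_1": 12.0, "loc_2": 45.0, "loc_3": 130.0, "loc_5": 70.0},
    "t2": {"loc_2": 30.0, "loc_6": 55.0, "loc_7": 140.0},
    "t3": {"loc_2": 20.0, "loc_4": 95.0, "loc_6": 150.0},
}
HOTEL = {"loc_1": "basic", "loc_2": "good", "loc_3": "luxury", "loc_4": "good",
         "loc_5": "luxury", "loc_6": "basic", "loc_7": "luxury"}

# Hypothetical coefficients; "basic" is the reference level (dummy coding).
B_DIST = -0.02
B_HOTEL_A = {"basic": 0.0, "good": 0.4, "luxury": 0.9}
B_HOTEL_B = {"basic": 0.0, "good": -1.0, "luxury": 2.0}

# Option 3: merge "good" and "luxury" into "better".
MERGE = {"basic": "basic", "good": "better", "luxury": "better"}


def choice_set(dists, max_km=100.0):
    """Destinations strictly closer than max_km are available."""
    return [a for a, d in dists.items() if d < max_km]


def no_variation(recode=None):
    """Ids of travelers whose available destinations share exactly one HOTEL level."""
    flagged = []
    for tid, dists in DIST_KM.items():
        levels = {HOTEL[a] for a in choice_set(dists)}
        if recode is not None:
            levels = {recode[h] for h in levels}
        if len(levels) == 1:  # empty choice sets are a separate problem, not flagged
            flagged.append(tid)
    return flagged


def mnl_probs(tid, b_hotel):
    """MNL probabilities over one traveler's available destinations."""
    alts = choice_set(DIST_KM[tid])
    v = {a: B_DIST * DIST_KM[tid][a] + b_hotel[HOTEL[a]] for a in alts}
    m = max(v.values())  # subtract the max utility for numerical stability
    expv = {a: math.exp(x - m) for a, x in v.items()}
    s = sum(expv.values())
    return {a: e / s for a, e in expv.items()}


if __name__ == "__main__":
    print("no variation (3 levels):", no_variation())
    print("no variation (merged):  ", no_variation(MERGE))
    # t3 only sees "good" hotels, so both coefficient sets give the same probabilities.
    print(mnl_probs("t3", B_HOTEL_A), mnl_probs("t3", B_HOTEL_B))
    # t1 sees all three levels, so its probabilities depend on the HOTEL coefficients.
    print(mnl_probs("t1", B_HOTEL_A), mnl_probs("t1", B_HOTEL_B))
```

The same `no_variation` check can be rerun on the real data under both codings to compare the number of affected travelers before choosing between the options.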
Please let me know if I need to provide additional information. Thank you for your help!