Dummy coding of categorical variables in MNL_SP example.
Posted: 06 Jul 2023, 16:21
Dear Prof. Hess,
This might be a simple question, but it might also be useful to others starting out in Apollo.
In the Apollo MNL_SP model example, there is a categorical attribute for service (service_rail), which has 4 levels: 1 for no-frills, 2 for wifi, 3 for food, 0 if not used. Also in this example, "b_no_frills" is kept at its starting value in apollo_fixed. In the utility functions, these levels are listed as: ( service_rail == 1 ), ( service_rail == 2 ) and ( service_rail == 3 ).
I have a couple of questions relating to this example and the dummy coding of categorical variables in Apollo.
Is my understanding of coding categorical variables (with more than two categories) in Apollo correct:
1) Being able to write utility functions in this way for categorical attributes (e.g., service_rail == 2) removes the need to create multiple dummy variables in the data (csv file). It is my understanding that in most other analyses (outside of Apollo), we would usually create k-1 dummy (binary) variables (where k is the number of levels, here 4) and select one level as the reference category by coding it as 0,0,0,0, against which the others are compared? The MNL_SP csv file, for example, contains no such dummy variables, which makes me think that this is the case. This was the only case I could find in the Apollo examples where a categorical variable with more than two levels (i.e., not binary) was used in any of the csv files, so I wanted to be sure my understanding was correct. I think this is due to the ease of interpretation of binary categorical variables versus those with more than two levels. As a follow-on regarding interpretation:
2) Also, listing b_no_frills in apollo_fixed performs (effectively) the same function as selecting a reference category by using a 0,0,0,0 dummy if coded that way. Including this level (service_rail == 1) in the utility functions is therefore optional, as no beta will be estimated for it, but having a 0.000 in the output simplifies interpretation?
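To make both points concrete, here is a minimal sketch in base R of what I think is happening. It does not use Apollo itself: the data values, the coefficient values and the names d_no_frills, d_wifi, d_food, v_dummies, v_inline and v_no_ref are all made up for illustration, and I am assuming that the starting value of b_no_frills is 0.

```r
# Toy values only (hypothetical, not taken from the MNL_SP csv file)
service_rail <- c(0, 1, 2, 3, 2, 1)

# Hypothetical coefficients; b_no_frills is assumed fixed at a starting value of 0
b_no_frills <- 0
b_wifi      <- 1.0
b_food      <- 0.4

# Option A: explicit 0/1 dummy columns, as one might otherwise add to the data
d_no_frills <- as.numeric(service_rail == 1)
d_wifi      <- as.numeric(service_rail == 2)
d_food      <- as.numeric(service_rail == 3)
v_dummies   <- b_no_frills * d_no_frills + b_wifi * d_wifi + b_food * d_food

# Option B: inline comparisons; R treats TRUE/FALSE as 1/0 in arithmetic
v_inline <- b_no_frills * (service_rail == 1) +
            b_wifi      * (service_rail == 2) +
            b_food      * (service_rail == 3)

# Option C: drop the reference level entirely (its coefficient is 0 anyway)
v_no_ref <- b_wifi * (service_rail == 2) +
            b_food * (service_rail == 3)

# Both comparisons check that the three codings give the same utility contribution
print(all.equal(v_dummies, v_inline))
print(all.equal(v_inline, v_no_ref))
```

If this sketch is right, the three versions only differ in how readable the code and the output are, not in the model itself.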
Thanks in advance,
Robin