Dear Stephane & David,
I have some questions on producing hypothetical scenarios post-estimation as an extension of predictions.
I have looked through the manual, examples and forum - the only thing I could find was a similar question on the forum which was never answered (here: viewtopic.php?p=2150&hilit=scenario#p2150). Apologies in advance if I have missed anything simple.
Background: I have an unlabelled experiment with 2 choices: Drug A and Drug B. Each has attributes (including WAIT TIME, a numerical variable), and there are also important scenario attributes (e.g. QoL (categorical) and Life Expectancy (numerical)) which are added as interactions (because they are constant across alternatives). I have used Apollo's prediction functionality to calculate the impact on choice share (A/B) of increasing WAIT time with no issue.
I would now like to create some hypothetical (policy-relevant) scenarios and look at the impact on choice share of changing certain attributes - similar to a sub-group analysis, but instead of using actual data, using the model to predict pre- and post-change when different attributes are altered. As an example, in Scenario A individuals might have QoL = "X" and Life Expectancy = "X", and the Drug A attributes might also be set at certain levels; I would then like to show the impact of increasing WAIT time versus when the drug attributes are set at different levels, etc.
The purpose of doing this would be to create "policy-relevant" scenarios that would be easier for people to interpret. It might also serve as a validity check by looking at the model base prediction to see if it is as we would expect when certain attributes are fixed.
Is there a way to specify multiple attributes in predictions_base and then change multiple attributes in predictions_new? For example, when changing the database to show an increase in DrugA_WAIT of 1 year, could I also specify that QOL = 1 and LIFE = 2?
E.g. what I currently have:
database$DrugA_WAIT=1+database$DrugA_WAIT
I would then add something like database$QOL==1 to select only the cases where QOL = 1, etc.?
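To make the idea concrete, something like the following sketch is what I have in mind (illustrative only - the variable names are from my model, and database_backup is just a working name I have made up for restoring the data, not an Apollo function):

```r
### Keep a copy of the original data before any scenario changes
database_backup = database

### Scenario A: fix the interaction attributes and shift WAIT
database$QOL        = 1                        # fix QoL at level 1
database$LIFE       = 2                        # fix Life Expectancy at 2
database$DrugA_WAIT = database$DrugA_WAIT + 1  # add one year of waiting

### Revalidate inputs and predict under the scenario
apollo_inputs   = apollo_validateInputs()
predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)

### Restore the original data afterwards
database      = database_backup
apollo_inputs = apollo_validateInputs()
```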
Thanks in advance for your advice,
Rob
post-estimation scenario analysis
Re: post-estimation scenario analysis
As an extension of my question above, I have the following code which might better illustrate what I am trying to do and an example scenario.
Where I have added "[Q1]" to the code, I am trying to set up a scenario for a model prediction where one of my categorical variables is fixed at level 3. I then double the WAIT time in the way you suggest. Is this method valid/agreeable with you, or should I be combining the changes rather than applying one after the other?
Where I have added "[Q2]" to the code, I am trying to show the mean impact on choice share for the above scenario when LIFE = 3 and QOL = 3, using a logical condition (with "&").
The purpose of doing this is to use the model to specify unique cases where I can see whether it responds as expected to changes (e.g. a change in WAIT) - i.e., more a validity check than a forecasting exercise.
Finally, in a previous post (viewtopic.php?t=456) you suggested the following code database$DumVar1_Alt1==2 = database$DumVar1_Alt1==1. In my [Q1] above I think I am trying to achieve similar but changing all to 1 rather than just 2.
Thanks in advance for any advice which is much appreciated!
Rob
Code: Select all
# ----------------------------------------------------------------- #
#---- PREDICTIONS [3] ----
# ----------------------------------------------------------------- #
### Use the estimated model to make predictions
predictions_base = apollo_prediction(model, apollo_probabilities, apollo_inputs, prediction_settings=list(runs=30))
### Now imagine a scenario where WAIT for Drug A doubles, AND DrugA_Benefit (a categorical variable) = 3
[Q1]
database$DrugA_Benefit <- 3
database$DrugA_WAIT = 2*database$DrugA_WAIT
### Rerun predictions with the new data
apollo_inputs = apollo_validateInputs()
predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)
### Return to original data
### Q: not sure how I would return DrugA_Benefit back to its original values?
database$DrugA_WAIT = 1/2*database$DrugA_WAIT
apollo_inputs = apollo_validateInputs()
### work with predictions at estimates
predictions_base=predictions_base[["at_estimates"]]
### Compute change in probabilities
change=(predictions_new-predictions_base)/predictions_base
### Not interested in chosen alternative now, so drop last column
change=change[,-ncol(change)]
### First two columns (change in ID and task) also not needed
change=change[,-c(1,2)]
### And person 9, who has all 4 modes available
# change[database$ID==9,]
### Summary of changes (possible presence of NAs for unavailable alternatives)
summary(change)
### Look at mean changes for subsets of the data, ignoring NAs
colMeans(change,na.rm=TRUE)
### Look at mean changes for an increase in WAIT when LIFE = 1, ignoring NAs
colMeans(subset(change,database$LIFE==1),na.rm=TRUE)
### Look at mean changes for an increase in WAIT when LIFE = 2, ignoring NAs
colMeans(subset(change,database$LIFE==2),na.rm=TRUE)
### Look at mean changes for an increase in WAIT when LIFE = 3 AND QOL = 3, ignoring NAs
[Q2]
colMeans(subset(change,database$LIFE==3 & database$QOL==3 ),na.rm=TRUE)
Thanks in advance for any advice which is much appreciated!
Rob
- Site Admin
- Posts: 1046
- Joined: 24 Apr 2020, 16:29
Re: post-estimation scenario analysis
Hi
first, you mention:
"in a previous post (viewtopic.php?t=456) you suggested the following code database$DumVar1_Alt1==2 = database$DumVar1_Alt1==1. In my [Q1] above I think I am trying to achieve similar but changing all to 1 rather than just 2."
Have another look at that post. I don't actually suggest that code but say that it would be wrong.
What you are doing below would seem to work, but when you go back to the original data, you indeed also need to return the benefits to the original. The easiest way would be to keep a backup of the original data before changes, e.g. database_backup=database, and then return to that later.
Stephane
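As a sketch of that round trip (illustrative; the scenario values are just examples, and database_backup is simply a name for the copied data frame):

```r
### Take a backup before any scenario changes
database_backup = database

### Apply the scenario changes (example: fix the dummy and double WAIT)
database$DrugA_Benefit = 3
database$DrugA_WAIT    = 2 * database$DrugA_WAIT

### Revalidate inputs and predict under the scenario
apollo_inputs   = apollo_validateInputs()
predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)

### Restore everything in one step - no need to undo each change by hand
database      = database_backup
apollo_inputs = apollo_validateInputs()
```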
Re: post-estimation scenario analysis
Dear Stephane,
Thank you for your reply - the database backup works. However, I am having some issues when working with predictions at estimates.
In the Apollo manual (Fig. 9.9), when running predictions, the mean change calculated after you change the cost of rail by 1% corresponds with the relative change in average prob. before/after. However this is not the case in my code.
In the code below, the prob. of choosing Drug A before any changes to the database is 0.5384, and the average prob. after I change the WAIT variable by 1 additional year is 0.4312. However, when I calculate the difference using the code given in the Apollo manual, it gives a mean of -0.22, which does not reflect this relative difference before/after ((0.4312 - 0.5384)/0.5384 = -0.199). I cannot then accurately look at mean changes at estimates, and cannot work out where -0.22 is coming from.
Similarly the prob of choosing Drug B goes from 0.4625 to 0.5688 (relative change of 0.229) but in the output the change is given as 0.29?
Even if I change the code so it is the absolute change rather than relative change i.e., I use: change=(predictions_new-predictions_base) instead of change=(predictions_new-predictions_base)/predictions_base, it still doesn't calculate the correct difference?
Any advice would be much appreciated, as I've run it several times and cannot get round it, but I need to work with predictions at estimates.
Thank you in advance,
Rob
Code: Select all
# ----------------------------------------------------------------- #
> #---- PREDICTIONS [3] ----
> # ----------------------------------------------------------------- #
>
> ### Use the estimated model to make predictions
> predictions_base = apollo_prediction(model, apollo_probabilities, apollo_inputs, prediction_settings=list(runs=30))
Running predictions from model using parameter estimates...
Running predictions across draws from the asymptotic distribution for maximum likelihood estimates.
Predicting for set of draws 1/30...
Predicting for set of draws 2/30...
[...progress lines for draws 3/30 through 29/30 omitted...]
Predicting for set of draws 30/30...
Aggregated prediction
at MLE Sampled mean Sampled std.dev. Quantile 0.025 Quantile 0.975
DrugA 5621 5611 59.01 5527 5722
DrugB 4819 4829 59.01 4718 4913
Average prediction
at MLE Sampled mean Sampled std.dev. Quantile 0.025 Quantile 0.975
DrugA 0.5384 0.5375 0.005652 0.5294 0.5481
DrugB 0.4616 0.4625 0.005652 0.4519 0.4706
The output from apollo_prediction is a list with two elements: a data.frame containing the predictions at the estimated values, and an array with predictions for
different values of the parameters drawn from their asymptotic distribution.
> summary(predictions_base)
Length Class Mode
at_estimates 5 data.frame list
draws 939600 -none- numeric
>
> ###Create a database backup to return to original data once predictions are run
> database_backup=database
> ### Imagine adding 1 year to WAIT
> database$DrugA_WAIT = database$DrugA_WAIT + 1
> ### Rerun predictions with the new data
> apollo_inputs = apollo_validateInputs()
Several observations per individual detected based on the value of RID. Setting panelData in apollo_control set to TRUE.
All checks on apollo_control completed.
All checks on database completed.
> predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)
Running predictions from model using parameter estimates...
Prediction at model estimates
DrugA DrugB
Aggregate 4501.95 5938.05
Average 0.43 0.57
The output from apollo_prediction is a matrix containing the predictions at the estimated values.
> summary(predictions_new)
ID Observation DrugA DrugB chosen
Min. : 9 Min. : 1.00 Min. :0.1055 Min. :0.1796 Min. :0.1055
1st Qu.: 1016 1st Qu.: 3.75 1st Qu.:0.2703 1st Qu.:0.4345 1st Qu.:0.4245
Median : 1942 Median : 6.50 Median :0.4011 Median :0.5989 Median :0.5802
Mean : 56227 Mean : 6.50 Mean :0.4312 Mean :0.5688 Mean :0.5686
3rd Qu.: 2688 3rd Qu.: 9.25 3rd Qu.:0.5655 3rd Qu.:0.7297 3rd Qu.:0.7518
Max. :1000335 Max. :12.00 Max. :0.8204 Max. :0.8945 Max. :0.8945
> ### Return to original database (undo all changes) using the database_backup
> database=database_backup
> ### validate inputs
> apollo_inputs = apollo_validateInputs()
Several observations per individual detected based on the value of RID. Setting panelData in apollo_control set to TRUE.
All checks on apollo_control completed.
All checks on database completed.
>
> ### work with predictions at estimates
> predictions_base=predictions_base[["at_estimates"]]
>
> ### Compute change in probabilities
> change=(predictions_new-predictions_base)/predictions_base
> #view without last bit:
> summary(change)
ID Observation DrugA DrugB chosen
Min. :0 Min. :0 Min. :-0.52091 Min. :0.03785 Min. :-0.5209052
1st Qu.:0 1st Qu.:0 1st Qu.:-0.29001 1st Qu.:0.16212 1st Qu.:-0.1974178
Median :0 Median :0 Median :-0.21251 Median :0.24980 Median :-0.0917861
Mean :0 Mean :0 Mean :-0.22367 Mean :0.30530 Mean :-0.0006838
3rd Qu.:0 3rd Qu.:0 3rd Qu.:-0.15473 3rd Qu.:0.35399 3rd Qu.: 0.2091958
Max. :0 Max. :0 Max. :-0.04592 Max. :1.14231 Max. : 1.1423098
> ### Not interested in chosen alternative now, so drop last column
> change=change[,-ncol(change)]
>
> ### First two columns (change in ID and task) also not needed
> change=change[,-c(1,2)]
>
> ### And person 9, who has all 4 modes available
> # change[database$ID==9,]
>
> ### Summary of changes (possible presence of NAs for unavailable alternatives)
> summary(change)
DrugA DrugB
Min. :-0.52091 Min. :0.03785
1st Qu.:-0.29001 1st Qu.:0.16212
Median :-0.21251 Median :0.24980
Mean :-0.22367 Mean :0.30530
3rd Qu.:-0.15473 3rd Qu.:0.35399
Max. :-0.04592 Max. :1.14231
>
> ### Look at mean changes for subsets of the data, ignoring NAs
> colMeans(change,na.rm=TRUE)
DrugA DrugB
-0.2236692 0.3053020
>
> ### Look at mean changes if QOL = 4, ignoring NAs
> colMeans(subset(change,database$QOL==4),na.rm=TRUE)
DrugA DrugB
-0.2371106 0.2999949
>
Re: post-estimation scenario analysis
Hi
the answer to your question is that you are comparing a mean-of-ratios approach to a ratio-of-means approach. The change in the code below is calculated at the observation level, and averaging that across observations is not going to be the same as the change in the average probabilities.
Stephane
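A tiny numeric illustration of this point, with made-up probabilities for three observations (nothing from the actual model):

```r
### Made-up base and scenario probabilities for three observations
p_base = c(0.9, 0.5, 0.2)
p_new  = c(0.8, 0.4, 0.1)

### Mean of observation-level relative changes (what the manual's code computes)
mean((p_new - p_base) / p_base)
# (-0.1111 - 0.2000 - 0.5000)/3 = -0.2704

### Relative change in the average probability (what the summary output suggests)
(mean(p_new) - mean(p_base)) / mean(p_base)
# (0.4333 - 0.5333)/0.5333 = -0.1875
```

The two quantities differ because the mean of per-observation ratios weights each observation's relative change equally, regardless of how large its base probability is.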