Important: Read this before posting to this forum

  1. This forum is for questions related to the use of Apollo. We will answer some general choice modelling questions too, where appropriate, and time permitting. We cannot answer questions about how to estimate choice models with other software packages.
  2. There is a very detailed manual for Apollo available at http://www.ApolloChoiceModelling.com/manual.html. This contains detailed descriptions of the various Apollo functions, and numerous examples are available at http://www.ApolloChoiceModelling.com/examples.html. In addition, help files are available for all functions, using e.g. ?apollo_mnl
  3. Before asking a question on the forum, users are kindly requested to follow these steps:
    1. Check that the same issue has not already been addressed in the forum - there is a search tool.
    2. Ensure that the correct syntax has been used. For any function, detailed instructions are available directly in Apollo, e.g. by using ?apollo_mnl for apollo_mnl
    3. Check the frequently asked questions section on the Apollo website, which discusses some common issues/failures. Please see http://www.apollochoicemodelling.com/faq.html
    4. Make sure that R is using the latest official release of Apollo.
  4. If the above steps do not resolve the issue, then users should follow these steps when posting a question:
    1. provide full details on the issue, including the entire code and output, including any error messages
    2. posts will not immediately appear on the forum, but will be checked by a moderator first. This may take a day or two at busy times. There is no need to submit the post multiple times.

post-estimation scenario analysis

Ask questions about post-estimation functions (e.g. prediction, conditionals, etc) or other processing of results.
Bob123
Posts: 9
Joined: 20 Jun 2023, 13:39

post-estimation scenario analysis

Post by Bob123 »

Dear Stephane & David,

I have some questions on producing hypothetical scenarios post-estimation as an extension of predictions.

I have looked through the manual, the examples and the forum - the only thing I could find was a similar previous question which was never answered (here: viewtopic.php?p=2150&hilit=scenario#p2150). Apologies in advance if I have missed anything simple.

Background: I have an unlabelled experiment with 2 choices: Drug A and Drug B. Each has attributes (including WAIT TIME, a numerical variable), and there are also important scenario attributes (e.g. QoL (categorical) and Life Expectancy (numerical)), which enter as interactions (because they are constant across alternatives). I have used apollo_prediction to calculate the impact on choice share (A/B) of increasing wait time with no issue.

I would now like to create some hypothetical (policy-relevant) scenarios and look at the impact on choice share of changing certain attributes - similar to a sub-group analysis, but instead of using the actual data, using the model to predict pre- and post-change when different attributes are altered. As an example, in Scenario A individuals might have QoL level "X" and Life Expectancy "X", with the Drug A attributes also fixed at certain levels; I would then like to show the impact of increasing wait time, versus when the drug attributes are at different levels, etc.

The purpose of doing this would be to create "policy-relevant" scenarios that are easier for people to interpret. It might also serve as a validity check: looking at the model's base prediction to see whether it behaves as we would expect when certain attributes are fixed.

Is there a way to specify multiple attributes in predictions_base and then change multiple attributes in predictions_new? For example, when changing the database to show an increase in DrugA_WAIT of 1 year, could I also specify that QoL = 1 and Life = 2?

E.g. what I currently have:
database$DrugA_WAIT=1+database$DrugA_WAIT

I would then add something like database$QOL==1 to select only the cases where QOL = 1, etc.?
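In sketch form, I imagine something like the following (the levels here are purely illustrative):

```r
### Scenario A: fix the scenario attributes for everyone...
database$QOL  = 1
database$LIFE = 2
### ...and increase Drug A wait time by 1 year
database$DrugA_WAIT = 1 + database$DrugA_WAIT
```

Or would it be better to leave the data as observed and instead subset the computed changes afterwards, e.g. colMeans(subset(change, database$QOL==1), na.rm=TRUE)?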

Thanks in advance for your advice,

Rob
Bob123
Posts: 9
Joined: 20 Jun 2023, 13:39

Re: post-estimation scenario analysis

Post by Bob123 »

As an extension of my question above, I have the following code, which might better illustrate what I am trying to do, along with an example scenario.

Where I have added "[Q1]" to the code, I am trying to set up a scenario for a model prediction in which one of my categorical variables is fixed at level 3. I then double the WAIT time in the way you suggest. Is this method valid/agreeable to you, or should I be combining the changes rather than applying them one after the other?

Where I have added "[Q2]" to the code, I am trying to show the mean impact on choice share for the above scenario when LIFE = 3 and QOL = 3, using logical subsetting (with "&").

The purpose of doing this is to use the model to specify unique cases where I can check that it responds as expected to changes (e.g. a change in WAIT), i.e. as a validity check rather than for forecasting purposes.

Code: Select all

      # ----------------------------------------------------------------- #
      #---- PREDICTIONS [3]                                               ----
      # ----------------------------------------------------------------- #
      
      ### Use the estimated model to make predictions
      predictions_base = apollo_prediction(model, apollo_probabilities, apollo_inputs, prediction_settings=list(runs=30))
      
      ### Now imagine a scenario where WAIT for Drug A increases by double, AND DrugA_Benefit (a categorical variable) = 3
[Q1] 
      database$DrugA_Benefit <- 3
      database$DrugA_WAIT = 2*database$DrugA_WAIT
      
      ### Rerun predictions with the new data
      apollo_inputs = apollo_validateInputs()
      predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)
      
      ### Return to original data
      
      ### not sure how I would return DrugA_Benefit back to normal?
      database$DrugA_WAIT = 1/2*database$DrugA_WAIT
      apollo_inputs = apollo_validateInputs()
      
      ### work with predictions at estimates
      predictions_base=predictions_base[["at_estimates"]]
      
      ### Compute change in probabilities
      change=(predictions_new-predictions_base)/predictions_base
      
      ### Not interested in chosen alternative now, so drop last column
      change=change[,-ncol(change)]
      
      ### First two columns (change in ID and task) also not needed
      change=change[,-c(1,2)]
      
      ### And person 9, who has all 4 modes available
      # change[database$ID==9,]
      
      ### Summary of changes (possible presence of NAs for unavailable alternatives)
      summary(change)
      
      ### Look at mean changes for subsets of the data, ignoring NAs
      colMeans(change,na.rm=TRUE)
      
      ### Look at mean changes for an increase in WAIT when LIFE = 1, ignoring NAs
      colMeans(subset(change,database$LIFE==1),na.rm=TRUE)
      
      ### Look at mean changes for an increase in WAIT when LIFE = 2, ignoring NAs
      colMeans(subset(change,database$LIFE==2),na.rm=TRUE)
      
      ### Look at mean changes for an increase in WAIT when LIFE = 3 AND QOL = 3, ignoring NAs
[Q2]  
      colMeans(subset(change,database$LIFE==3 & database$QOL==3 ),na.rm=TRUE)
Finally, in a previous post (viewtopic.php?t=456) you suggested the following code: database$DumVar1_Alt1==2 = database$DumVar1_Alt1==1. In my [Q1] above, I think I am trying to achieve something similar, but changing all values to 1 rather than just those equal to 2.

Thanks in advance for any advice which is much appreciated!

Rob
stephanehess
Site Admin
Posts: 998
Joined: 24 Apr 2020, 16:29

Re: post-estimation scenario analysis

Post by stephanehess »

Hi

first, you mention:
in a previous post (viewtopic.php?t=456) you suggested the following code database$DumVar1_Alt1==2 = database$DumVar1_Alt1==1. In my [Q1] above I think I am trying to achieve similar but changing all to 1 rather than just 2.
Have another look at that post. I don't actually suggest that code but say that it would be wrong.
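For reference, the quoted line is not valid R in any case: the left-hand side of an assignment cannot be a comparison. A valid recode would instead be along these lines (using your variable name):

```r
### recode level 2 of the dummy to level 1
database$DumVar1_Alt1[database$DumVar1_Alt1==2] = 1

### or, to set every row to a single level for a scenario
database$DumVar1_Alt1 = 1
```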

What you are doing below seems like it would work, but when you go back to the original data, you indeed also need to return the benefits variable to its original values. The easiest way is to keep a backup of the original data before making any changes, e.g. database_backup=database, and then restore from that later.
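In sketch form (using the variable names from your code), the backup-and-restore pattern would be:

```r
### Keep a copy of the original data before building the scenario
database_backup = database

### Scenario: fix the categorical benefit at level 3 and double WAIT
database$DrugA_Benefit = 3
database$DrugA_WAIT    = 2*database$DrugA_WAIT

### Rerun predictions on the changed data
apollo_inputs   = apollo_validateInputs()
predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)

### Restore the original data (this undoes both changes at once)
database      = database_backup
apollo_inputs = apollo_validateInputs()
```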

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk
Bob123
Posts: 9
Joined: 20 Jun 2023, 13:39

Re: post-estimation scenario analysis

Post by Bob123 »

Dear Stephane,

Thank you for your reply - the database backup works. However, I am having some issues working with the predictions at estimates.

In the Apollo manual (Fig. 9.9), when running predictions, the mean change calculated after changing the cost of rail by 1% corresponds to the relative change in the average probability before/after. However, this is not the case in my code.

In the code below, the probability of choosing Drug A before any changes to the database is 0.5384, and the average probability after I increase WAIT by 1 additional year is 0.4312. However, when I calculate the difference using the code given in the Apollo manual, it gives a mean of -0.22, which does not reflect this relative difference before/after ((0.4312 - 0.5384)/0.5384 = -0.199). I cannot then accurately look at mean changes at estimates, and cannot work out where -0.22 is coming from.

Similarly, the probability of choosing Drug B goes from 0.4625 to 0.5688 (a relative change of 0.230), but in the output the mean change is given as 0.31?

Even if I change the code to compute the absolute change rather than the relative change, i.e. I use change=(predictions_new-predictions_base) instead of change=(predictions_new-predictions_base)/predictions_base, it still doesn't give the expected difference.

Any advice would be much appreciated, as I've run it several times and cannot get around it, but I need to work with predictions at estimates.

Thank you in advance,

Rob

Code: Select all

 # ----------------------------------------------------------------- #
> #---- PREDICTIONS [3]                                               ----
> # ----------------------------------------------------------------- #
> 
> ### Use the estimated model to make predictions
> predictions_base = apollo_prediction(model, apollo_probabilities, apollo_inputs, prediction_settings=list(runs=30))
Running predictions from model using parameter estimates...
Running predictions across draws from the asymptotic distribution for maximum likelihood estimates.
Predicting for set of draws 1/30...
...
Predicting for set of draws 30/30...

Aggregated prediction
      at MLE Sampled mean Sampled std.dev. Quantile 0.025 Quantile 0.975
DrugA   5621         5611            59.01           5527           5722
DrugB   4819         4829            59.01           4718           4913

Average prediction
      at MLE Sampled mean Sampled std.dev. Quantile 0.025 Quantile 0.975
DrugA 0.5384       0.5375         0.005652         0.5294         0.5481
DrugB 0.4616       0.4625         0.005652         0.4519         0.4706

The output from apollo_prediction is a list with two elements: a data.frame containing the predictions at the estimated values, and an array with predictions for
  different values of the parameters drawn from their asymptotic distribution.
> summary(predictions_base)
             Length Class      Mode   
at_estimates      5 data.frame list   
draws        939600 -none-     numeric
> 
> ###Create a database backup to return to original data once predictions are run
> database_backup=database
> ### Imagine adding 1 year to WAIT
> database$DrugA_WAIT = database$DrugA_WAIT + 1
> ### Rerun predictions with the new data
> apollo_inputs = apollo_validateInputs()
Several observations per individual detected based on the value of RID. Setting panelData in apollo_control set to TRUE.
All checks on apollo_control completed.
All checks on database completed.
> predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)
Running predictions from model using parameter estimates...
Prediction at model estimates
            DrugA   DrugB
Aggregate 4501.95 5938.05
Average      0.43    0.57

The output from apollo_prediction is a matrix containing the predictions at the estimated values.
> summary(predictions_new)
       ID           Observation        DrugA            DrugB            chosen      
 Min.   :      9   Min.   : 1.00   Min.   :0.1055   Min.   :0.1796   Min.   :0.1055  
 1st Qu.:   1016   1st Qu.: 3.75   1st Qu.:0.2703   1st Qu.:0.4345   1st Qu.:0.4245  
 Median :   1942   Median : 6.50   Median :0.4011   Median :0.5989   Median :0.5802  
 Mean   :  56227   Mean   : 6.50   Mean   :0.4312   Mean   :0.5688   Mean   :0.5686  
 3rd Qu.:   2688   3rd Qu.: 9.25   3rd Qu.:0.5655   3rd Qu.:0.7297   3rd Qu.:0.7518  
 Max.   :1000335   Max.   :12.00   Max.   :0.8204   Max.   :0.8945   Max.   :0.8945  
> ### Return to original database (undo all changes) using the database_backup
> database=database_backup
> ### validate inputs
> apollo_inputs = apollo_validateInputs()
Several observations per individual detected based on the value of RID. Setting panelData in apollo_control set to TRUE.
All checks on apollo_control completed.
All checks on database completed.
> 
> ### work with predictions at estimates
> predictions_base=predictions_base[["at_estimates"]]
> 
> ### Compute change in probabilities
> change=(predictions_new-predictions_base)/predictions_base
> #view without last bit:
> summary(change)
       ID     Observation     DrugA              DrugB             chosen          
 Min.   :0   Min.   :0    Min.   :-0.52091   Min.   :0.03785   Min.   :-0.5209052  
 1st Qu.:0   1st Qu.:0    1st Qu.:-0.29001   1st Qu.:0.16212   1st Qu.:-0.1974178  
 Median :0   Median :0    Median :-0.21251   Median :0.24980   Median :-0.0917861  
 Mean   :0   Mean   :0    Mean   :-0.22367   Mean   :0.30530   Mean   :-0.0006838  
 3rd Qu.:0   3rd Qu.:0    3rd Qu.:-0.15473   3rd Qu.:0.35399   3rd Qu.: 0.2091958  
 Max.   :0   Max.   :0    Max.   :-0.04592   Max.   :1.14231   Max.   : 1.1423098  
> ### Not interested in chosen alternative now, so drop last column
> change=change[,-ncol(change)]
> 
> ### First two columns (change in ID and task) also not needed
> change=change[,-c(1,2)]
> 
> ### And person 9, who has all 4 modes available
> # change[database$ID==9,]
> 
> ### Summary of changes (possible presence of NAs for unavailable alternatives)
> summary(change)
     DrugA              DrugB        
 Min.   :-0.52091   Min.   :0.03785  
 1st Qu.:-0.29001   1st Qu.:0.16212  
 Median :-0.21251   Median :0.24980  
 Mean   :-0.22367   Mean   :0.30530  
 3rd Qu.:-0.15473   3rd Qu.:0.35399  
 Max.   :-0.04592   Max.   :1.14231  
> 
> ### Look at mean changes for subsets of the data, ignoring NAs
> colMeans(change,na.rm=TRUE)
     DrugA      DrugB 
-0.2236692  0.3053020 
> 
> ### Look at mean changes if QOL = 4, ignoring NAs
> colMeans(subset(change,database$QOL==4),na.rm=TRUE)
     DrugA      DrugB 
-0.2371106  0.2999949 
> 
stephanehess
Site Admin
Posts: 998
Joined: 24 Apr 2020, 16:29

Re: post-estimation scenario analysis

Post by stephanehess »

Hi

The answer to your question is that you are comparing a mean-of-ratios approach with a ratio of means. The change in your code is calculated at the observation level, and averaging that across observations is not, in general, the same as the change in the average probabilities.
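As a small numeric illustration (made-up probabilities, not from your model): take two observations with base probabilities 0.9 and 0.1 for Drug A, moving to 0.8 and 0.2 after the change.

```r
p_base = c(0.9, 0.1)   # base probabilities for two observations
p_new  = c(0.8, 0.2)   # probabilities after the scenario change

### mean of observation-level relative changes (what your change= line computes)
mean((p_new - p_base)/p_base)               # (-0.111 + 1)/2 = 0.444

### relative change in the average probability (what the summary table reflects)
(mean(p_new) - mean(p_base))/mean(p_base)   # (0.5 - 0.5)/0.5 = 0
```

The two numbers can differ substantially, which is exactly what you are seeing.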

Stephane
--------------------------------
Stephane Hess
www.stephanehess.me.uk