Prediction on new data
Posted: 14 Sep 2020, 16:19
by dkopcino
Hi,
Let's assume that we have a model built on data that did not include all possible combinations of attribute levels, e.g. if the data come from a fractional factorial design. Does Apollo have a prediction function to predict the results for new data that was not included in the estimation/model building phase? Or does this have to be done manually, by extracting the coefficients etc.?
Thanks!
Re: Prediction on new data
Posted: 14 Sep 2020, 20:14
by stephanehess
Hi
model estimation can produce parameters only for attributes included in the data, and only those parameters will be available in prediction. If you include attributes in prediction that were not included in estimation, they will have no impact on the predictions unless you make assumptions about the associated parameters. This applies to main effects as well as interactions. Of course, you can include combinations of attribute levels in prediction that did not exist in the estimation data; the model will not capture a specific value for those interactions, but the main effects will still matter. Any software that claims to estimate such interactions anyway is making stuff up or using priors.
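To see why an unseen combination can only be handled through main effects, here is a small base R sketch (not Apollo code) for a hypothetical design with two attributes, A (levels A1, A2) and B (levels B1, B2), where only three of the four combinations are observed. Dummy coding the observed rows shows that the interaction column duplicates a main-effect column, so the data contain no separate information about the interaction.

```r
# Hypothetical fractional design: A1 and B1 are base levels, and only
# (A1,B1), (A1,B2) and (A2,B2) appear in the estimation data
design <- data.frame(A = c("A1", "A1", "A2"),
                     B = c("B1", "B2", "B2"))

# Dummy-coded main effects plus their interaction
X <- cbind(A2 = as.numeric(design$A == "A2"),
           B2 = as.numeric(design$B == "B2"))
X <- cbind(X, A2B2 = X[, "A2"] * X[, "B2"])
print(X)

# In these rows the A2 column and the A2B2 column are identical, so the
# data cannot separate the A2 main effect from the A2 x B2 interaction
rank_X <- qr(X)$rank
cat("columns:", ncol(X), "rank:", rank_X, "\n")
```

The rank (2) is lower than the number of columns (3), which is the identification problem described above: a prediction for (A2, B1) can use the A2 and B2 main effects, but any value for the interaction would have to come from an assumption.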
Hope this helps
Stephane
Re: Prediction on new data
Posted: 14 Sep 2020, 21:55
by dkopcino
You say that I can include new attribute levels in prediction that were not previously used in the model estimation process. That is exactly what I'm trying to do, but how do I do that? In the examples, the only change is to the already existing data (e.g. increasing the rail price by 1%). Isn't there an option to simply give the routine new data (properly formatted, of course)?
Thx,
Danijel
Re: Prediction on new data
Posted: 14 Sep 2020, 22:25
by stephanehess
Hi Danijel
you can use different data in forecasting. The process is just the same as when you adjust an existing attribute.
However, a couple of caveats:
- if the attribute for which you introduce new levels is categorical, then you'll have the issue I mentioned above, i.e. you won't have parameters associated with those levels in the utilities
- if the attribute for which you introduce new levels is continuous, then you need to ask yourself whether the new levels are so different from the ones in the estimation data that the model becomes invalid.
For that second point, just think of a situation where you have estimated a model with cost levels ranging from £1 to £5, but then include a level of £10 in the prediction. Whatever assumption about linearity or non-linearity is correct for the original data might not apply to the new data.
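Both caveats can be illustrated with a minimal base R sketch. The coefficient values, the new level "A3" and the £1 to £5 range are hypothetical and chosen only for illustration; the helper functions are not part of Apollo. A categorical level without an estimated parameter simply contributes nothing, and a continuous value outside the estimation range is extrapolated with whatever functional form was assumed.

```r
# Hypothetical estimates (illustrative values, not results from any model)
beta_A     <- c(A2 = 0.4)  # A1 is the base level, fixed to 0
beta_cost  <- -0.2         # linear cost coefficient
cost_range <- c(1, 5)      # cost levels seen in estimation: GBP 1 to 5

utility <- function(A, cost) {
  # A level with no estimated parameter (e.g. a new level "A3") gets 0,
  # i.e. it is silently treated like the base level A1
  v_A <- if (A %in% names(beta_A)) beta_A[[A]] else 0
  v_A + beta_cost * cost
}

# Flag continuous values that lie outside the estimation range
out_of_range <- function(cost, lims = cost_range) {
  cost < lims[1] | cost > lims[2]
}

u_A3  <- utility("A3", 3)    # same value as utility("A1", 3)
u_10  <- utility("A2", 10)   # linear extrapolation to GBP 10
flags <- out_of_range(c(2, 5, 10))
print(flags)                 # FALSE FALSE TRUE
```

Flagging out-of-range values does not make the prediction valid; it only makes the extrapolation visible so that the analyst can judge whether the linear assumption is still plausible.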
Stephane
Re: Prediction on new data
Posted: 15 Sep 2020, 11:19
by dkopcino
Hi,
Sorry, perhaps I wasn't clear. Let's say that I build a model using a fractional factorial design, e.g. for attributes A (with levels A1 and A2) and B (with levels B1 and B2) I use only the combinations (A1, B1), (A1, B2) and (A2, B2) for building the model. These combinations are added to the database element in the apollo_inputs list, the model is built, and now I want to predict (calculate the utility) for (A2, B1), which is not in the database, and compare it with some other combination. How do I do that?
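For reference, the "manual" route from the first post can be sketched in base R for a main-effects MNL, outside Apollo. The coefficient values below are hypothetical placeholders for estimates that would be taken from the fitted model; `code_profile` is an illustrative helper, not an Apollo function.

```r
# Hypothetical coefficients from a main-effects model (illustrative values);
# A1 and B1 are the base levels
beta <- c(A2 = 0.4, B2 = -0.3)

# Dummy-code a profile so that it lines up with the coefficient vector
code_profile <- function(A, B) {
  c(A2 = as.numeric(A == "A2"), B2 = as.numeric(B == "B2"))
}

profiles <- list(unseen = code_profile("A2", "B1"),
                 other  = code_profile("A1", "B2"))

# Utility of each profile: the sum of the main effects it contains
v <- sapply(profiles, function(x) sum(beta[names(x)] * x))

# Multinomial logit probabilities when the two profiles are compared
p <- exp(v) / sum(exp(v))
print(round(p, 3))  # unseen 0.668, other 0.332
```

This works for (A2, B1) only because the utility is additive in main effects; it says nothing about an A2 x B2 interaction, which the design cannot identify.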
The code examples suggest the following flow:
predictions_base = apollo_prediction(model, apollo_probabilities, apollo_inputs)
database$cost_rail = 1.01*database$cost_rail
predictions_new = apollo_prediction(model, apollo_probabilities, apollo_inputs)
database$cost_rail = 1/1.01*database$cost_rail
First, changing the global database variable does not change the database element in the apollo_inputs list, so I think this doesn't do the job; please correct me if I'm wrong.
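The behaviour described in this point follows from R's copy semantics: a list element holds its own copy of a data frame, so modifying the global object afterwards leaves the stored copy untouched. A minimal base R illustration, where `apollo_inputs` is a plain stand-in list rather than the object returned by apollo_validateInputs():

```r
# Stand-in objects: apollo_inputs here is a plain list, not Apollo's object
database      <- data.frame(cost_rail = c(10, 20))
apollo_inputs <- list(database = database)

# Modify the global database, as in the example code above
database$cost_rail <- 1.01 * database$cost_rail

print(database$cost_rail)               # 10.1 20.2
print(apollo_inputs$database$cost_rail) # 10 20 (the stored copy is unchanged)
```

Whether this matters in practice depends on whether the prediction function rebuilds apollo_inputs from the global objects, which is what the reply further down in the thread addresses.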
Second, I can only make changes to the existing data that was already used for building the model, since the examples modify the existing database variable.
The overall question is then: how do I add/pass new data (e.g. (A2, B1) from my example above) to the apollo_prediction routine?
Thx,
Danijel
Re: Prediction on new data
Posted: 15 Sep 2020, 13:11
by stephanehess
Hi
this is not difficult. You can just load a new database into memory, and that will then be used by apollo_prediction, as long as it is called database.
In relation to your first point, apollo_prediction regenerates apollo_inputs, so it updates the database inside it with the one that is in the global database variable.
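In relation to your second point, you can use a completely different database, with more or fewer rows, etc. You just need to make sure that the database is compatible with your model, i.e. that it contains the variables your apollo_probabilities function uses, coded in the same way as in the estimation data.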
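A sketch of how the (A2, B1) scenario could be prepared, assuming a hypothetical wide-format layout (columns `A_1`, `B_1`, `A_2`, `B_2`, `choice`) with levels coded 1 and 2. The compatibility check is an illustrative helper, not an Apollo function, and the Apollo calls are left as comments because they need an estimated `model` and an `apollo_probabilities` function. Re-running apollo_validateInputs() before predicting is the conservative option, even if apollo_prediction regenerates apollo_inputs itself.

```r
# Hypothetical estimation data in wide format: two alternatives per task,
# levels coded 1 (A1/B1) or 2 (A2/B2); only the profiles (A1,B1), (A1,B2)
# and (A2,B2) are used
database <- data.frame(ID = c(1, 1, 1),
                       A_1 = c(1, 1, 1), B_1 = c(1, 1, 2),
                       A_2 = c(1, 2, 2), B_2 = c(2, 2, 2),
                       choice = c(1, 2, 1))
estimation_db <- database

# New scenario: (A2,B1) as alternative 1 against (A1,B2) as alternative 2.
# The choice value is only a placeholder, kept in case apollo_probabilities
# refers to the column.
new_db <- data.frame(ID = 1, A_1 = 2, B_1 = 1, A_2 = 1, B_2 = 2, choice = 1)

# Compatibility check: same columns, same column types, known level codes
is_compatible <- function(new, old) {
  identical(names(new), names(old)) &&
    identical(sapply(new, class), sapply(old, class)) &&
    all(unlist(new[c("A_1", "B_1", "A_2", "B_2")]) %in% c(1, 2))
}
stopifnot(is_compatible(new_db, estimation_db))

# The new data must be stored in the global object called database
database <- new_db
# With Apollo loaded and the model estimated, prediction would then be:
# apollo_inputs   <- apollo_validateInputs()
# predictions_new <- apollo_prediction(model, apollo_probabilities, apollo_inputs)
```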
Best wishes
Stephane
Re: Prediction on new data
Posted: 15 Sep 2020, 14:05
by stephanehess
I should have added that in the next official release, Apollo will run a check to see whether the user has changed global variables (such as database) without those changes being reflected in apollo_inputs. A message will then be printed asking the user to run apollo_validateInputs again.
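Until such a check is available, a user-side version of the idea can be sketched in base R. `inputs_are_current` is a hypothetical helper and `apollo_inputs` is a plain stand-in list; Apollo's own check may be implemented differently, and a real apollo_inputs object may store a processed copy of the data, so a plain identical() comparison is only an approximation.

```r
# Hypothetical helper, not an Apollo function: compare the global database
# with the copy held in apollo_inputs and ask for re-validation if they differ
inputs_are_current <- function(apollo_inputs, db) {
  if (!identical(apollo_inputs$database, db)) {
    message("database has changed since apollo_inputs was created; ",
            "please run apollo_validateInputs() again")
    return(FALSE)
  }
  TRUE
}

# Stand-in objects (apollo_inputs is a plain list here)
database      <- data.frame(cost_rail = c(10, 20))
apollo_inputs <- list(database = database)

before <- inputs_are_current(apollo_inputs, database)  # TRUE
database$cost_rail <- 1.01 * database$cost_rail
after  <- inputs_are_current(apollo_inputs, database)  # FALSE, prints the message
```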