Cross validation and Prediction
Posted: 07 May 2025, 14:02
Hi,
I am working on comparing the predictive performance of mixed logit models estimated with and without calibration weights using the Apollo package. The weights are applied at the task level (i.e., one weight per row of the dataset).
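For context, the weights enter the model roughly as follows (simplified; the model name, ID column and weight column are placeholders from my own data, so this is only a sketch of my setup, not the exact code):

apollo_control = list(
  modelName = "MMNL_weighted",   # placeholder model name
  indivID   = "ID",              # placeholder respondent ID column
  weights   = "task_weight",     # placeholder: column with one weight per choice task (row)
  mixing    = TRUE,
  nCores    = 3
)

### ... and inside apollo_probabilities(), after the model probabilities are computed:
### P = apollo_weighting(P, apollo_inputs, functionality)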
I am trying to compare their predictive performance using indicators such as accuracy, F1-score, and precision. However, I have not found a straightforward way to extract these indicators from the current outputs. The apollo_outOfSample function reports the difference in log-likelihood between the estimation and validation samples, but it does not seem to report other performance indicators, or I may be missing how to obtain them.
I have also attempted to use the apollo_prediction function. While it returns predicted probabilities, I noticed that:
1) The sum of average predictions at MLE does not equal 1.
2) The aggregated predictions across alternatives do not sum to the total number of observations.
Here is a sample of the output from apollo_prediction:
Aggregated prediction:
         at MLE   Sampled mean   Sampled std.dev.   Quantile 0.025   Quantile 0.975
alt1      130.7          129.9              5.039            120.7            138.7
alt2      130.8          130.0              4.899            121.5            138.6
alt3      204.3          205.9              9.935            188.5            223.6

Average prediction:
          at MLE   Sampled mean   Sampled std.dev.   Quantile 0.025   Quantile 0.975
alt1     0.06985        0.06942           0.002693          0.06453          0.07413
alt2     0.06990        0.06946           0.002618          0.06493          0.07409
alt3     0.10920        0.11007           0.005310          0.10073          0.11950
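For reference, below is a rough sketch of how I have tried to compute these indicators by hand from the per-observation probabilities returned by apollo_prediction. The object and column names (pred, alt1 to alt3, database$choice) are placeholders for my own data, and I am not sure this is the intended approach:

### Assumed inputs: 'pred' is a data.frame of per-observation predicted probabilities
### with one column per alternative (alt1, alt2, alt3), and 'database$choice' holds
### the observed choice coded 1, 2, 3.

alts <- c("alt1", "alt2", "alt3")

## Predicted alternative = the one with the highest probability in each row
pred_class <- max.col(as.matrix(pred[, alts]))

## Observed alternative
obs_class <- database$choice

## Confusion matrix (rows = observed, columns = predicted)
conf <- table(factor(obs_class, levels = 1:3),
              factor(pred_class, levels = 1:3))

## Overall accuracy
accuracy <- sum(diag(conf)) / sum(conf)

## Per-alternative precision, recall and F1, then macro-averaged
precision <- diag(conf) / colSums(conf)
recall    <- diag(conf) / rowSums(conf)
f1        <- 2 * precision * recall / (precision + recall)
macro_f1  <- mean(f1, na.rm = TRUE)

round(c(accuracy = accuracy, macro_F1 = macro_f1), 3)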
Could you kindly advise:
1) How I might obtain accuracy, F1-score, or precision for prediction evaluation in Apollo?
2) Whether these discrepancies in the prediction summaries are expected?
3) What would be the recommended way to compare predictive performance between models with and without weights?
Thank you very much for your time and assistance.
Best regards,
Kiki