Out of sample results

Post by **JuliavB** » 12 Nov 2022, 13:08

Dear Stephane,

I've run the out-of-sample test function in Apollo, which produced the following output:

| | LL per obs in estimation sample | LL per obs in validation sample | % difference |
|---|---|---|---|
| 1 | -1.0348 | -1.0144 | 1.97 |
| 2 | -1.0338 | -1.0256 | 0.79 |
| 3 | -1.0236 | -1.1150 | -8.94 |
| 4 | -1.0296 | -1.0592 | -2.88 |
| 5 | -1.0289 | -1.0652 | -3.53 |
| 6 | -1.0386 | -0.9809 | 5.56 |
| 7 | -1.0367 | -0.9970 | 3.83 |
| 8 | -1.0339 | -1.0222 | 1.13 |
| 9 | -1.0378 | -0.9923 | 4.38 |
| 10 | -1.0312 | -1.0448 | -1.32 |
| Average | -1.0329 | -1.0317 | 0.10 |
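
For readers who want to check how the last column relates to the first two, here is a minimal Python sketch that recomputes it from the table. The formula in `pct_difference` is an assumption inferred from the numbers (it is not taken from the Apollo manual), and the function and variable names are hypothetical.

```python
# Rows copied from the table above:
# (LL per obs in estimation sample, LL per obs in validation sample, reported % difference)
rows = [
    (-1.0348, -1.0144, 1.97),
    (-1.0338, -1.0256, 0.79),
    (-1.0236, -1.1150, -8.94),
    (-1.0296, -1.0592, -2.88),
    (-1.0289, -1.0652, -3.53),
    (-1.0386, -0.9809, 5.56),
    (-1.0367, -0.9970, 3.83),
    (-1.0339, -1.0222, 1.13),
    (-1.0378, -0.9923, 4.38),
    (-1.0312, -1.0448, -1.32),
]


def pct_difference(ll_est, ll_val):
    """Assumed formula: change from estimation to validation LL per observation,
    relative to the absolute estimation LL, in percent."""
    return 100.0 * (ll_val - ll_est) / abs(ll_est)


# Recompute the % difference for every run and average the per-run values.
computed = [pct_difference(ll_est, ll_val) for ll_est, ll_val, _ in rows]
average = sum(computed) / len(computed)

for run, ((ll_est, ll_val, reported), value) in enumerate(zip(rows, computed), start=1):
    print(f"{run:>2}  {ll_est:8.4f}  {ll_val:8.4f}  computed {value:6.2f}  reported {reported:6.2f}")
print(f"Average of the per-run % differences: {average:.2f}")
```

Under this assumed formula, a negative value means the model fits the validation sample worse than the estimation sample. The recomputed values agree with the reported ones to within rounding of the four-decimal inputs, and the reported average of 0.10 appears to be the mean of the ten per-run percentages rather than the percentage difference between the two average LL values.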

Unfortunately, I could not find any specific, quantifiable guideline in the manual for interpreting the out-of-sample output.
Could you tell me how these results should be interpreted? And is it okay to keep the default validation sample of 10% with a total sample size of N=330?

Thanks in advance for your support.
Best,
J.

Post by **stephanehess** » 25 Nov 2022, 13:25

Hi

There is no hard rule on what size of difference is acceptable, but your differences look pretty small, suggesting no specific risk of overfitting.

Stephane

Post by **kkavta** » 09 Feb 2024, 13:00

Dear Prof. Stephane,

I hope this email finds you well.

I have some follow-up questions on the same topic. How should we interpret the average difference in log-likelihood (LL) between the estimation sample and the validation sample? Does a smaller percentage difference imply that the estimated model fits the validation sample better than in situations with a larger difference?

In my case, I'm obtaining an average difference of -2.7% for the MNL model and -4.7% for the MMNL model. Is this acceptable?

Also, is there any way to calculate the "% correct predicted" metric for validation in Apollo?

Thank you for your assistance.

Best regards,
K.

Post by **stephanehess** » 06 May 2024, 08:03

Hi

Apologies for the slow reply.

So what you're finding is that your MMNL model overfits the estimation data a bit more than the MNL model, but these differences are small.

No, Apollo does not compute the % correct predicted metric, as this is a misleading metric that is contrary to probabilistic choice models. It should never be used. See Kenneth Train's book.
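
To illustrate the point about "% correct predicted", here is a small Python sketch with made-up numbers (the choices, the two sets of probabilities, and the function names are all hypothetical and not Apollo output). It scores two sets of predicted probabilities against the same choices, once by hit rate (counting a "hit" when the highest-probability alternative was chosen) and once by log-likelihood per observation.

```python
import math


def hit_rate(probs, chosen):
    """Share of observations where the highest-probability alternative was the chosen one."""
    hits = 0
    for p, c in zip(probs, chosen):
        predicted = max(range(len(p)), key=lambda j: p[j])
        hits += int(predicted == c)
    return hits / len(chosen)


def ll_per_obs(probs, chosen):
    """Average log of the probability assigned to the chosen alternative."""
    return sum(math.log(p[c]) for p, c in zip(probs, chosen)) / len(chosen)


# Hypothetical data: 10 binary choices, alternative 0 chosen 6 times, alternative 1 chosen 4 times.
chosen = [0] * 6 + [1] * 4

# Hypothetical model A: probabilities equal to the observed shares.
model_a = [[0.6, 0.4]] * 10
# Hypothetical model B: badly overconfident in alternative 0.
model_b = [[0.99, 0.01]] * 10

for name, probs in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: hit rate {hit_rate(probs, chosen):.2f}, "
          f"LL per obs {ll_per_obs(probs, chosen):.4f}")
```

Both hypothetical models get the same hit rate, because both rank alternative 0 first for every observation, yet their log-likelihoods differ sharply. The hit rate discards the probabilities themselves, which is the information a probabilistic choice model is meant to provide.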

Stephane
