
What distributions for the coefficients? Mixed Logit

Posted: 14 Oct 2024, 16:33
by dce.farmers
Hi,

I am estimating a mixed logit model and have some questions about which distributions I should use (or test) for the parameters, especially for my interaction terms. The context is the sharing of material and labour among farmers for the adoption of a new technology.

Here are my attributes and levels (4 categorical attributes and 1 continuous attribute):
- Attribute a (technical support before the replantation): 1. None; 2. Personalized; 3. Collective
- Attribute b (replantation work): 1. Individual approach; 2. Sharing of material and/or labour through an independent group; 3. Sharing of material and/or labour through a cooperative
- Attribute c (investment cost): 45 000, 50 000, 55 000 or 60 000 euros/ha
- Attribute d (crop protection practices for resistant varieties): 1. Individual approach; 2. Sharing of material and/or labour through an independent group; 3. Sharing of material and/or labour through a cooperative
- Attribute e (technical support after the replantation): 1. Personalized; 2. Collective

In addition, I have two interactions in my model:
- attribute b levels 2 and 3 interacted with an attitudinal variable based on the statement "collaborating with other farmers can lead to economic efficiency" (recoded as 1 = totally agree or agree; 0 = do not agree, fully disagree, or neither agree nor disagree)
- attribute e level 2 interacted with an attitudinal variable based on the statement "sharing knowledge and experience between farmers is important" (recoded the same way: 1 = totally agree or agree; 0 = do not agree, fully disagree, or neither agree nor disagree)
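For readers less familiar with this kind of interaction: in the usual specification (an assumption here, since the utility functions are not posted), a 0/1 attitudinal dummy shifts the coefficient of the level it multiplies, so respondents who agree get β_level + β_interaction. The sketch below evaluates this at the posted means of mean_b_tvxindgp and mean_b_tvxindgpCOL6, assuming from the variable names that COL6 is the "economic efficiency" dummy; the helper name is hypothetical.

```python
def level_coefficient(beta_level: float, beta_interaction: float, agrees: int) -> float:
    """Effective coefficient of a dummy-coded attribute level interacted with a
    0/1 attitudinal variable (1 = totally agree or agree). Hypothetical helper."""
    if agrees not in (0, 1):
        raise ValueError("the attitudinal variable must be coded 0 or 1")
    return beta_level + beta_interaction * agrees


# Posted means for attribute b, level 2 (independent group) and its COL6 interaction
MEAN_TVXINDGP = 0.11497
MEAN_TVXINDGP_COL6 = -0.16780

for agrees in (0, 1):
    coef = level_coefficient(MEAN_TVXINDGP, MEAN_TVXINDGP_COL6, agrees)
    print(f"attitude = {agrees}: coefficient at the means = {coef:+.5f}")
```

If both terms are specified as independent normals, as the posted sd_ parameters suggest, the coefficient for respondents who agree also has a larger variance (the sum of the two variances), not just a different mean.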

I have also added a few socio-economic characteristics interacted with the status quo (SQ * intention to replant, SQ * turnover, SQ * main factor affecting their yield, SQ * farmer status, SQ * knowledge of a specific variety, etc.).

So far, I have assumed the following distributions: a normal distribution for all coefficients except the cost coefficient, which is non-random (fixed). I tested three different distributions for the cost coefficient (normal, negative lognormal and fixed), and the model with a fixed cost coefficient gave the best fit. Should I test other distributions?
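Because a lognormal and a fixed cost coefficient are not nested, "best fit" is usually judged on penalised log-likelihood rather than LL(final) alone. A minimal sketch using the standard formulas AIC = 2k − 2LL and BIC = k·ln(N) − 2LL, assuming N is the number of modelled outcomes (720 here); with the posted LL(final) and 35 parameters it reproduces the AIC and BIC in the output below, up to rounding of LL(final). The other two cost specifications would need their own LL and k, which are not posted.

```python
import math


def information_criteria(ll: float, k: int, n: int) -> tuple[float, float]:
    """Return (AIC, BIC) for a model with final log-likelihood ll,
    k estimated parameters and n modelled outcomes."""
    aic = 2 * k - 2 * ll
    bic = k * math.log(n) - 2 * ll
    return aic, bic


# Figures reported in the output for the fixed-cost model
aic, bic = information_criteria(ll=-503.75, k=35, n=720)
print(f"AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Lower is better for both; BIC penalises the extra standard-deviation parameters of a random cost coefficient more heavily than AIC does.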

Below are my results for the model described above, in case they are useful.

Thank you!


Model name                                  : MMNL_uncorrelated
Model description                           : Mixed logit model WITH sobol draws log normal cost with socio-demographics
Model run at                                : 2024-10-11 12:17:15.075789
Estimation method                           : bgw
Model diagnosis                             : Relative function convergence
Optimisation diagnosis                      : Maximum found
     hessian properties                     : Negative definite
     maximum eigenvalue                     : -0.080965
     reciprocal of condition number         : 5.16905e-14
Number of individuals                       : 120
Number of rows in database                  : 720
Number of modelled outcomes                 : 720

Number of cores used                        :  11 
Number of inter-individual draws            : 1000 (sobol)

LL(start)                                   : -580.37
LL at equal shares, LL(0)                   : -791
LL at observed shares, LL(C)                : -656.96
LL(final)                                   : -503.75
Rho-squared vs equal shares                  :  0.3631 
Adj.Rho-squared vs equal shares              :  0.3189 
Rho-squared vs observed shares               :  0.2332 
Adj.Rho-squared vs observed shares           :  0.183 
AIC                                         :  1077.5 
BIC                                         :  1237.78 

Estimated parameters                        : 35
Time taken (hh:mm:ss)                       :  00:02:38.57 
     pre-estimation                         :  00:00:24.77 
     estimation                             :  00:00:43.92 
          initial estimation                :  00:00:42.56 
          estimation after rescaling        :  00:00:1.36 
     post-estimation                        :  00:01:29.88 
Iterations                                  :  62  
     initial estimation                     :  61 
     estimation after rescaling             :  1 

Unconstrained optimisation.

Estimates:
                       Estimate        s.e.   t.rat.(0)  p(1-sided)    Rob.s.e.
b_asc_alt3             -8.98671      3.4068    -2.63785    0.004172      2.9231
sd_asc_alt3             1.10843      0.6005     1.84590    0.032453      0.3785
b_asc_alt3Ren           0.92108      1.0145     0.90787    0.181973      0.8314
sd_asc_alt3Ren          0.08045      0.5225     0.15398    0.438814      0.4085
b_asc_alt3SV           -1.92175      1.4182    -1.35506    0.087699      1.1513
sd_asc_alt3SV          -2.26705      0.6328    -3.58278  1.6998e-04      0.5839
b_asc_alt3KNOW         -1.55372      1.0075    -1.54223    0.061509      0.8530
sd_asc_alt3KNOW         0.80504      0.2868     2.80653    0.002504      0.2019
b_asc_alt3MIL           0.28799      0.8564     0.33627    0.368335      0.6624
sd_asc_alt3MIL          0.55878      0.3949     1.41516    0.078511      0.2265
b_asc_alt3CA         -1.604e-06   1.915e-06    -0.83759    0.201129   1.034e-06
sd_asc_alt3CA         1.845e-07   9.955e-07     0.18529    0.426502   3.418e-07
b_asc_alt3Trad          0.42534      1.0166     0.41839    0.337830      0.8810
sd_asc_alt3Trad        -0.77116      0.8518    -0.90538    0.182632      0.4703
mean_b_suppnone         0.00000          NA          NA          NA          NA
sd_b_suppnone           0.00000          NA          NA          NA          NA
mean_b_suppindiv        0.82810      0.2804     2.95301    0.001573      0.3260
sd_b_suppindiv          0.98424      0.4288     2.29541    0.010855      0.4255
mean_b_suppcoll         1.02037      0.3107     3.28446  5.1090e-04      0.3293
sd_b_suppcoll           1.28049      0.3757     3.40862  3.2646e-04      0.3599
mean_b_tvxindiv         0.00000          NA          NA          NA          NA
sd_b_tvxindiv           0.00000          NA          NA          NA          NA
mean_b_tvxindgp         0.11497      0.5066     0.22695    0.410233      0.6437
sd_b_tvxindgp           1.20193      0.4725     2.54371    0.005484      0.5289
mean_b_tvxindgpCOL6    -0.16780      0.5632    -0.29797    0.382864      0.6974
sd_b_tvxindgpCOL6      -0.47913      0.7361    -0.65093    0.257546      0.4908
mean_b_tvxcoop          0.02373      0.4929     0.04814    0.480802      0.5764
sd_b_tvxcoop            0.60884      0.6676     0.91199    0.180886      0.5655
mean_b_tvxcoopCOL6     -0.26907      0.5352    -0.50276    0.307568      0.6380
sd_b_tvxcoopCOL6        0.82921      0.6541     1.26780    0.102435      0.5829
mean_b_pulvindiv        0.00000          NA          NA          NA          NA
sd_b_pulvindiv          0.00000          NA          NA          NA          NA
mean_b_pulvindgp       -0.95271      0.2640    -3.60825  1.5413e-04      0.3071
sd_b_pulvindgp          1.41891      0.4427     3.20526  6.7469e-04      0.5807
mean_b_pulvcoop        -1.44252      0.3357    -4.29709   8.653e-06      0.4311
sd_b_pulvcoop           1.10164      0.4289     2.56838    0.005109      0.5371
mean_b_techindiv        0.00000          NA          NA          NA          NA
sd_b_techindiv          0.00000          NA          NA          NA          NA
mean_b_techcoll        -0.49574      0.2136    -2.32115    0.010139      0.2349
sd_b_techcoll           1.21609      0.3515     3.46007  2.7002e-04      0.3812
mean_b_techcollCOL3     0.54906      0.8096     0.67814    0.248840      0.7262
sd_b_techcollCOL3      -1.77534      0.9890    -1.79500    0.036327      0.8150
c_b_cost            -1.5288e-04   2.978e-05    -5.13354   1.422e-07   3.294e-05
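A quick descriptive check on the normal assumption is the share of respondents each estimated normal places above zero, Φ(μ/|σ|). The sketch below applies it to a few of the estimates above; it is not a test, just a way to see how much mass sits on the "other" side of zero. The absolute value of σ is used because its sign is not identified when β = μ + σξ with symmetric draws, which is why some sd_ estimates (e.g. sd_asc_alt3SV, sd_b_techcollCOL3) come out negative.

```python
import math


def share_positive(mean: float, sd: float) -> float:
    """P(beta > 0) for beta ~ Normal(mean, sd^2).
    The sign of an estimated sd is not identified, so its absolute value is used."""
    sd = abs(sd)
    if sd == 0:
        return 1.0 if mean > 0 else 0.0
    z = mean / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


# (mean, sd) pairs copied from the estimates above
estimates = {
    "b_suppindiv": (0.82810, 0.98424),
    "b_suppcoll": (1.02037, 1.28049),
    "b_pulvindgp": (-0.95271, 1.41891),
    "b_pulvcoop": (-1.44252, 1.10164),
    "b_techcoll": (-0.49574, 1.21609),
}

for name, (mean, sd) in estimates.items():
    print(f"{name:12s} share with positive coefficient: {share_positive(mean, sd):.3f}")
```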


Re: What distributions for the coefficients? Mixed Logit

Posted: 21 Oct 2024, 17:09
by dpalma
Hi,

I'm afraid there is no definitive rule on what distributions to use for coefficients. It will depend on the nature of the explanatory variables, and on how much information your data contains.

In general, best practice is to begin with simple MNL models. Personally, I begin by estimating the simplest reasonable model: an MNL with only the direct effects of the alternatives' attributes, without any interactions with sociodemographics. Then I introduce relevant interactions among parameters (if any). Then I start testing interactions with sociodemographics.
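The MNL starting point boils down to logit choice probabilities with fixed coefficients. A minimal sketch, assuming three alternatives per choice task (consistent with LL(0) = -791 for 720 outcomes in the output above); since the utility functions are not posted, it only evaluates utilities passed in. With all utilities at zero it recovers the "LL at equal shares".

```python
import math


def mnl_probabilities(utilities: list[float]) -> list[float]:
    """Multinomial logit choice probabilities (softmax of the utilities)."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def mnl_loglik(utilities_per_task: list[list[float]], chosen: list[int]) -> float:
    """Sum of log probabilities of the chosen alternatives over all choice tasks."""
    return sum(
        math.log(mnl_probabilities(v)[c])
        for v, c in zip(utilities_per_task, chosen)
    )


# With every coefficient at zero each of the 3 alternatives has probability 1/3,
# so 720 observed choices give 720 * ln(1/3), the "LL at equal shares".
zero_utilities = [[0.0, 0.0, 0.0]] * 720
choices = [0] * 720  # which alternative was chosen does not matter here
print(round(mnl_loglik(zero_utilities, choices), 2))
```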

Only after finding an MNL model I feel comfortable with do I move to an MMNL (mixed logit) model. Some basic ideas on what distribution to use for a coefficient:
  • If there is no prior info on the sign of the parameter, then I use a Normal distribution.
  • If I know a priori the sign of the coefficient (e.g. cost coefficient) I use a negative or positive lognormal as appropriate.
  • If the lognormal does not converge, or has a tail that is too big, or is too concentrated around zero, I try other distributions, such as log-uniform, or triangular.
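These distributions are usually built by transforming standard draws. The sketch below does this with pseudo-random draws from Python's random module (the posted model uses Sobol draws); all parameter values are arbitrary illustrations, not estimates. For the lognormal, mu and sigma are the mean and s.d. of the underlying normal, and the mean exp(mu + sigma²/2) moves away from the median exp(mu) as sigma grows, which is one way to see a "tail that is too big".

```python
import math
import random
import statistics


def normal_coef(mu, sigma, rng):
    # beta = mu + sigma * xi with xi ~ N(0, 1): either sign is possible
    return mu + sigma * rng.gauss(0.0, 1.0)


def neg_lognormal_coef(mu, sigma, rng):
    # beta = -exp(mu + sigma * xi): always negative, e.g. for a cost coefficient
    return -math.exp(mu + sigma * rng.gauss(0.0, 1.0))


def loguniform_coef(a, b, rng):
    # beta = exp(a + (b - a) * u) with u ~ U(0, 1): positive, bounded in [exp(a), exp(b)]
    return math.exp(a + (b - a) * rng.random())


def triangular_coef(centre, spread, rng):
    # (u1 + u2 - 1) is triangular on [-1, 1], so beta is symmetric around
    # centre and bounded in [centre - spread, centre + spread]
    return centre + spread * (rng.random() + rng.random() - 1.0)


rng = random.Random(42)  # pseudo-random draws; the posted model used Sobol draws
R = 100_000

normal = [normal_coef(0.5, 1.0, rng) for _ in range(R)]
neg_ln = [neg_lognormal_coef(-1.0, 1.0, rng) for _ in range(R)]
log_unif = [loguniform_coef(-2.0, 0.0, rng) for _ in range(R)]
tri = [triangular_coef(0.5, 1.0, rng) for _ in range(R)]

print("normal, share positive:", sum(b > 0 for b in normal) / R)
print("neg. lognormal mean:", statistics.mean(neg_ln), "median:", statistics.median(neg_ln))
print("log-uniform range:", min(log_unif), max(log_unif))
print("triangular range:", min(tri), max(tri))
```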
In general, systematic taste variation (i.e. deterministic heterogeneity, like the one you capture when interacting attributes with sociodemographics) tends to be much more informative than random heterogeneity. So in most cases I would worry more about capturing deterministic heterogeneity than about finding the optimal random distribution (unless you are interested in things such as the s.d. of the coefficient).
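The two kinds of heterogeneity enter a panel mixed logit coefficient differently: a sociodemographic shift δ·z_n moves the mean for identifiable groups, while σ·ξ_n spreads tastes within groups, and the likelihood of each individual's sequence of choices is averaged over draws held fixed across that individual's tasks. The sketch below is hypothetical throughout (one dummy attribute, made-up choices, pseudo-random normal draws rather than Sobol) and only illustrates the structure, not the posted model.

```python
import math
import random


def logit_prob(utilities, chosen):
    """MNL probability of the chosen alternative."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in utilities]
    return exps[chosen] / sum(exps)


def simulated_individual_likelihood(tasks, chosen, z, mu, delta, sigma, draws):
    """Simulated panel mixed logit likelihood for one individual.

    tasks:  one list of attribute values (one value per alternative) per choice task
    chosen: index of the chosen alternative in each task
    z:      0/1 characteristic of the individual (deterministic heterogeneity)
    beta_r = mu + delta * z + sigma * xi_r (random heterogeneity through xi_r ~ N(0, 1));
    each draw is kept fixed across all tasks of the individual.
    """
    total = 0.0
    for xi in draws:
        beta = mu + delta * z + sigma * xi
        prob_sequence = 1.0
        for x, c in zip(tasks, chosen):
            prob_sequence *= logit_prob([beta * xj for xj in x], c)
        total += prob_sequence
    return total / len(draws)


# Hypothetical individual: 6 tasks, 3 alternatives, one dummy attribute,
# always choosing the alternative where the attribute equals 1.
tasks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]] * 2
chosen = [0, 1, 2, 0, 1, 2]

rng = random.Random(1)
draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # pseudo-random, not Sobol

for z in (0, 1):
    lik = simulated_individual_likelihood(tasks, chosen, z, mu=0.2, delta=0.8, sigma=1.0, draws=draws)
    print(f"z = {z}: simulated likelihood = {lik:.4f}")
```

Setting sigma to 0 collapses this to an MNL with a group-specific coefficient, which is the deterministic-only model David suggests getting right first.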

I hope this is useful.

Best wishes
David