
What distributions for the coefficients? Mixed Logit


Post by dce.farmers »

Hi,

I am estimating a Mixed Logit model and would like advice on which distributions I should use or test for the parameters, especially for my interaction terms. The context is the sharing of material and labour among farmers adopting a new technology.

Here are my attributes and levels (4 categorical attributes and 1 continuous attribute):
- Attribute a: Technical support before the replantation: 1. None 2. Personalized 3. Collective
- Attribute b: Replantation work: 1. Individual approach 2. Sharing of material and/or labour through an independent group 3. Sharing of material and/or labour through a cooperative
- Attribute c: Investment cost: 45 000 euros/ha, 50 000 euros/ha, 55 000 euros/ha, 60 000 euros/ha
- Attribute d: Crop protection practices for resistant varieties: 1. Individual approach 2. Sharing of material and/or labour through an independent group 3. Sharing of material and/or labour through a cooperative
- Attribute e: Technical support after the replantation: 1. Personalized 2. Collective

In addition, I have two interactions in my model:
- attribute b, levels 2 and 3, interacted with an attitudinal variable worded as "collaborating with other farmers can lead to economic efficiency" (scale recoded as 1 = totally agree or agree, 0 = do not agree, fully disagree, or neither agree nor disagree)
- attribute e, level 2, interacted with an attitudinal variable worded as "sharing knowledge and experience between farmers is important" (scale recoded as 1 = totally agree or agree, 0 = do not agree, fully disagree, or neither agree nor disagree)

On top of that, I have added a few socio-economic characteristics interacted with the status quo (SQ * intention to replant, SQ * turnover, SQ * first factor impacting their yield, SQ * status of farmers, SQ * knowledge of a specific variety, etc.).

Here is what I have assumed so far for the distributions of the coefficients of my different levels and attributes:
A normal distribution for all coefficients except the coefficient of my cost attribute, which is non-random (constant). I tested 3 different specifications for the cost coefficient (normal, negative lognormal and constant), and the model with a constant cost coefficient turned out to have the best fit. Should I test other distributions?

Below are the results for the model described above, just in case.

Thank you!


Model name                                  : MMNL_uncorrelated
Model description                           : Mixed logit model WITH sobol draws log normal cost with socio-demographics
Model run at                                : 2024-10-11 12:17:15.075789
Estimation method                           : bgw
Model diagnosis                             : Relative function convergence
Optimisation diagnosis                      : Maximum found
     hessian properties                     : Negative definite
     maximum eigenvalue                     : -0.080965
     reciprocal of condition number         : 5.16905e-14
Number of individuals                       : 120
Number of rows in database                  : 720
Number of modelled outcomes                 : 720

Number of cores used                        :  11 
Number of inter-individual draws            : 1000 (sobol)

LL(start)                                   : -580.37
LL at equal shares, LL(0)                   : -791
LL at observed shares, LL(C)                : -656.96
LL(final)                                   : -503.75
Rho-squared vs equal shares                  :  0.3631 
Adj.Rho-squared vs equal shares              :  0.3189 
Rho-squared vs observed shares               :  0.2332 
Adj.Rho-squared vs observed shares           :  0.183 
AIC                                         :  1077.5 
BIC                                         :  1237.78 

Estimated parameters                        : 35
Time taken (hh:mm:ss)                       :  00:02:38.57 
     pre-estimation                         :  00:00:24.77 
     estimation                             :  00:00:43.92 
          initial estimation                :  00:00:42.56 
          estimation after rescaling        :  00:00:1.36 
     post-estimation                        :  00:01:29.88 
Iterations                                  :  62  
     initial estimation                     :  61 
     estimation after rescaling             :  1 

Unconstrained optimisation.

Estimates:
                       Estimate        s.e.   t.rat.(0)  p(1-sided)    Rob.s.e.
b_asc_alt3             -8.98671      3.4068    -2.63785    0.004172      2.9231
sd_asc_alt3             1.10843      0.6005     1.84590    0.032453      0.3785
b_asc_alt3Ren           0.92108      1.0145     0.90787    0.181973      0.8314
sd_asc_alt3Ren          0.08045      0.5225     0.15398    0.438814      0.4085
b_asc_alt3SV           -1.92175      1.4182    -1.35506    0.087699      1.1513
sd_asc_alt3SV          -2.26705      0.6328    -3.58278  1.6998e-04      0.5839
b_asc_alt3KNOW         -1.55372      1.0075    -1.54223    0.061509      0.8530
sd_asc_alt3KNOW         0.80504      0.2868     2.80653    0.002504      0.2019
b_asc_alt3MIL           0.28799      0.8564     0.33627    0.368335      0.6624
sd_asc_alt3MIL          0.55878      0.3949     1.41516    0.078511      0.2265
b_asc_alt3CA         -1.604e-06   1.915e-06    -0.83759    0.201129   1.034e-06
sd_asc_alt3CA         1.845e-07   9.955e-07     0.18529    0.426502   3.418e-07
b_asc_alt3Trad          0.42534      1.0166     0.41839    0.337830      0.8810
sd_asc_alt3Trad        -0.77116      0.8518    -0.90538    0.182632      0.4703
mean_b_suppnone         0.00000          NA          NA          NA          NA
sd_b_suppnone           0.00000          NA          NA          NA          NA
mean_b_suppindiv        0.82810      0.2804     2.95301    0.001573      0.3260
sd_b_suppindiv          0.98424      0.4288     2.29541    0.010855      0.4255
mean_b_suppcoll         1.02037      0.3107     3.28446  5.1090e-04      0.3293
sd_b_suppcoll           1.28049      0.3757     3.40862  3.2646e-04      0.3599
mean_b_tvxindiv         0.00000          NA          NA          NA          NA
sd_b_tvxindiv           0.00000          NA          NA          NA          NA
mean_b_tvxindgp         0.11497      0.5066     0.22695    0.410233      0.6437
sd_b_tvxindgp           1.20193      0.4725     2.54371    0.005484      0.5289
mean_b_tvxindgpCOL6    -0.16780      0.5632    -0.29797    0.382864      0.6974
sd_b_tvxindgpCOL6      -0.47913      0.7361    -0.65093    0.257546      0.4908
mean_b_tvxcoop          0.02373      0.4929     0.04814    0.480802      0.5764
sd_b_tvxcoop            0.60884      0.6676     0.91199    0.180886      0.5655
mean_b_tvxcoopCOL6     -0.26907      0.5352    -0.50276    0.307568      0.6380
sd_b_tvxcoopCOL6        0.82921      0.6541     1.26780    0.102435      0.5829
mean_b_pulvindiv        0.00000          NA          NA          NA          NA
sd_b_pulvindiv          0.00000          NA          NA          NA          NA
mean_b_pulvindgp       -0.95271      0.2640    -3.60825  1.5413e-04      0.3071
sd_b_pulvindgp          1.41891      0.4427     3.20526  6.7469e-04      0.5807
mean_b_pulvcoop        -1.44252      0.3357    -4.29709   8.653e-06      0.4311
sd_b_pulvcoop           1.10164      0.4289     2.56838    0.005109      0.5371
mean_b_techindiv        0.00000          NA          NA          NA          NA
sd_b_techindiv          0.00000          NA          NA          NA          NA
mean_b_techcoll        -0.49574      0.2136    -2.32115    0.010139      0.2349
sd_b_techcoll           1.21609      0.3515     3.46007  2.7002e-04      0.3812
mean_b_techcollCOL3     0.54906      0.8096     0.67814    0.248840      0.7262
sd_b_techcollCOL3      -1.77534      0.9890    -1.79500    0.036327      0.8150
c_b_cost            -1.5288e-04   2.978e-05    -5.13354   1.422e-07   3.294e-05
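
For readers comparing specifications by fit, the summary statistics in the output above can be reproduced from the reported log-likelihoods. The sketch below assumes the usual definitions (AIC = 2k - 2LL, BIC = k ln(N) - 2LL with N taken as the number of modelled outcomes, and rho-squared relative to LL(0)); these are assumptions, although they do match the printed values up to rounding. The check of LL(0) additionally assumes three alternatives per choice task.

```python
import math

# Values copied from the estimation output above.
ll_final = -503.75   # LL(final)
ll_zero = -791.0     # LL at equal shares, LL(0)
k = 35               # estimated parameters
n_obs = 720          # modelled outcomes

aic = 2 * k - 2 * ll_final
bic = k * math.log(n_obs) - 2 * ll_final
rho2 = 1 - ll_final / ll_zero
adj_rho2 = 1 - (ll_final - k) / ll_zero

# LL(0) under equal shares, assuming 3 alternatives in every choice task.
ll_zero_check = n_obs * math.log(1 / 3)

print(f"AIC       = {aic:.2f}")
print(f"BIC       = {bic:.2f}")
print(f"rho2      = {rho2:.4f}")
print(f"adj. rho2 = {adj_rho2:.4f}")
print(f"LL(0)     = {ll_zero_check:.2f}")
```

Because LL(final) is printed to two decimals only, the recomputed BIC can differ from the reported one in the last digit.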


Re: What distributions for the coefficients? Mixed Logit

Post by dpalma »

Hi,

I'm afraid there is no definitive rule on what distributions to use for coefficients. It will depend on the nature of the explanatory variables, and on how much information your data contains.

In general, the best practice is to always begin with simple MNL models. Personally, I begin by estimating the simplest reasonable model: an MNL with only the direct effects of the alternatives' attributes, without any interaction with sociodemographics. Then I introduce relevant interactions among parameters (if any). Then I start testing interactions with sociodemographics.

Only after finding an MNL model I feel comfortable with do I move to an MMNL (mixed) model. Some basic ideas on what distribution to use for a coefficient:
  • If there is no prior info on the sign of the parameter, then I use a Normal distribution.
  • If I know a priori the sign of the coefficient (e.g. cost coefficient) I use a negative or positive lognormal as appropriate.
  • If the lognormal does not converge, or has a tail that is too big, or is too concentrated around zero, I try other distributions, such as log-uniform, or triangular.
In general, systematic taste variation (i.e. deterministic heterogeneity, like the one you capture when interacting attributes and socio-demographics) tends to be much more informative than random heterogeneity. So in most cases I would worry more about capturing deterministic heterogeneity than about finding the optimal random distribution (unless you are interested in things such as the s.d. of the coefficient).
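
To make the distribution options in the list above concrete, here is a rough Python sketch (not Apollo code, which is written in R) of how each type of random coefficient can be built from standard normal or uniform draws. All parameter values and names are hypothetical and chosen only to show the support of each distribution.

```python
import math
import random

# Hypothetical parameter values, chosen only for illustration.
MU, SIGMA = 0.5, 1.0   # mean and s.d. of the (underlying) normal
A, B = -2.0, 1.5       # offset and spread for the uniform-based draws


def simulate(n_draws=1000, seed=1):
    """Build four kinds of random coefficient from the same base draws."""
    rng = random.Random(seed)
    draws = {"normal": [], "neg_lognormal": [],
             "neg_loguniform": [], "triangular": []}
    for _ in range(n_draws):
        xi = rng.gauss(0.0, 1.0)             # standard normal draw
        u1, u2 = rng.random(), rng.random()  # uniform(0, 1) draws
        # Normal: the sign is not restricted.
        draws["normal"].append(MU + SIGMA * xi)
        # Negative lognormal: always negative, with a long left tail.
        draws["neg_lognormal"].append(-math.exp(MU + SIGMA * xi))
        # Negative log-uniform: always negative, bounded on both sides.
        draws["neg_loguniform"].append(-math.exp(A + B * u1))
        # Symmetric triangular on [A, A + 2B]: sum of two uniforms.
        draws["triangular"].append(A + B * (u1 + u2))
    return draws


draws = simulate()
for name, values in draws.items():
    share_neg = sum(v < 0 for v in values) / len(values)
    print(f"{name:15s} min={min(values):9.3f} "
          f"max={max(values):9.3f} share<0={share_neg:.2f}")
```

The printed minimum, maximum and share of negative draws show why a sign-constrained distribution suits a cost coefficient, and why a bounded alternative can help when the lognormal tail becomes too long.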

I hope this is useful.

Best wishes
David