How to define the start value of mu and sigma when adding random parameters to the mixed logit model?

susiezhao · Post by **susiezhao** » 31 Aug 2023, 07:59

Dear professor,

I wanted to use a mixed logit model, so I added a couple of random parameters.
But the problem is that I don't know how to define the starting values for mu and sigma.
I see that the examples give starting values of -3 for mu and 0 or -0.01 for sigma.
I don't know what the basis for such starting values is.
I used the parameter values previously estimated by a simple MNL model as the starting values of mu for the random parameters of the mixed logit model, and then took 0.0000000001 or -0.01 for sigma.
But I still get a lot of NAs and I don't know why.
I also added error components and panel effects to the mixed logit model and restricted the estimated standard deviation of the error components to be greater than zero.
I don't know if that is related to this.
I'm wondering whether the starting value of -3 for mu is related to the specific study. Why is it chosen this way?
I used effect coding for all the data, so I don't know if I should take -3 or not.
Please help me with this problem.

dpalma · Post by **dpalma** » 31 Aug 2023, 13:42

Hi,

Usually, I recommend first estimating a simple MNL model and using those estimated parameters as starting values for the mu (mean) of the mixed logit, while starting their sigma (s.d.) at zero. This is assuming the random coefficients in your mixed model follow a normal distribution.

If the random coefficients in your mixed model follow a log-normal distribution, then I recommend using as starting value for mu the log of the magnitude of the MNL estimate, so mu = log( abs(b) ), where b is the MNL estimate for the coefficient, and zero for the s.d. (i.e. sigma=0).

However, I always recommend introducing the random coefficients one at a time. So let's imagine you first estimated a simple MNL model with the following deterministic utility for alternative i:
V_i = asc_i + b1*x1_i + b2*x2_i + b3*x3_i
Where x1_i is the value of explanatory variable x1 for alternative i.

If the simple MNL model gives reasonable results, then, I would estimate a new mixed model where only b1 is random. If that works well, then I would estimate a new mixed model where b1 and b2 are random. And so on.

In other words, I strongly recommend adding complexity to your model in a step-by-step way. Do not go from a simple MNL model to a mixed model where all coefficients are random, because if something fails you will not know what the source of the failure is. Instead, if you add complexity one step at a time, and something fails, you will know exactly where the problem comes from.

Cheers
David

susiezhao · Post by **susiezhao** » 01 Sep 2023, 04:23

Dear David,

Thank you very much for your reply.
I still have some questions. Please help me.
I don't know which form of the distribution is the best.
For example,
randcoeff[["ran_cost_bike1"]] = mu_cost_bike1 + sigma_cost_bike1 * draws_cost_bike1
randcoeff[["ran_cost_bike1"]] = exp(mu_cost_bike1 + sigma_cost_bike1 * draws_cost_bike1)
randcoeff[["ran_cost_bike1"]] = -exp(mu_cost_bike1 + sigma_cost_bike1 * draws_cost_bike1)
I've seen people say that for the cost, it's better to use a negative log-uniform distribution.
Because people always favor cheaper costs, a negative coefficient needs to be guaranteed all the time.
But I used effect coding and the coefficients may be positive or negative at different levels.
So I'm not sure which distribution is more appropriate. Maybe I need to try all of these three.
But I don't know how to set the starting values for these three different distributions.
Now I only know how to set the starting values for normal distribution and log-normal distribution.
How about log-uniform distribution?
And is it still necessary to use ln() or exp() to calculate the mu and sigma obtained from these three distributions?
Can the LL results of the R code obtained from these different distributions be used directly in the paper?
Or do I need to recalculate them myself?
I would appreciate hearing from you soon.
Thank you very much.

Kind regards,

dpalma · Post by **dpalma** » 01 Sep 2023, 13:07

Hi,

Let's consider five types of distributions and their recommended starting values.

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation
apollo_beta = c(m1 = 0, s1 = 0, # normal starting values
                m2 =-5, s2 = 0, # positive lognormal starting values
                m3 =-5, s3 = 0, # negative lognormal starting values
                a4 =-5, c4 = 0, # positive loguniform starting values
                a5 =-5, c5 = 0) # negative loguniform starting values

### Vector with names (in quotes) of parameters to be kept fixed at their 
  # starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c()

# ################################################################# #
#### DEFINE RANDOM COMPONENTS                                    ####
# ################################################################# #

### Set parameters for generating draws
apollo_draws = list(
  interDrawsType = "mlhs",
  interNDraws    = 500,
  interNormDraws = c("eta1","eta2","eta3"),
  interUnifDraws = c("eta4", "eta5")
)

### Create random parameters
apollo_randCoeff = function(apollo_beta, apollo_inputs){
  randcoeff = list()

  randcoeff[["b1"]] =       m1 + s1*eta1   # normal
  randcoeff[["b2"]] =  exp( m2 + s2*eta2 ) # positive lognormal
  randcoeff[["b3"]] = -exp( m3 + s3*eta3 ) # negative lognormal
  randcoeff[["b4"]] =  exp( a4 + c4*eta4 ) # positive loguniform
  randcoeff[["b5"]] = -exp( a5 + c5*eta5 ) # negative loguniform

  return(randcoeff)
}

The idea of the starting values I use in the code above is for the starting coefficients to be close to zero. They might not work for every model, but they are the safest bet in most cases.
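As a quick check of that idea, the following base-R sketch evaluates each coefficient at the starting values from the code above. It is only an illustration and does not use Apollo. Since every "s" and "c" parameter starts at zero, the draws drop out, so the value of `eta` used here is arbitrary.

```r
# Starting values copied from apollo_beta above
start <- c(m1 = 0, s1 = 0, m2 = -5, s2 = 0, m3 = -5, s3 = 0,
           a4 = -5, c4 = 0, a5 = -5, c5 = 0)

# With s = 0 and c = 0 the draw has no effect, so any value of eta works
eta <- 0.3

implied <- with(as.list(start), c(
  b1 =       m1 + s1 * eta,    # normal
  b2 =  exp( m2 + s2 * eta ),  # positive lognormal
  b3 = -exp( m3 + s3 * eta ),  # negative lognormal
  b4 =  exp( a4 + c4 * eta ),  # positive loguniform
  b5 = -exp( a5 + c5 * eta )   # negative loguniform
))

print(round(implied, 4))
```

Because exp(-5) is roughly 0.0067, every coefficient starts at zero or very close to it, with the sign already fixed by the chosen distribution.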

If the starting values in the code above do not work for you, then I would recommend the following (where the parameters ending in _mnl are the values of the equivalent estimated coefficients in a simple MNL model with no mixing):

set all "s" and "c" parameters to a starting value of 0
m1=b1_mnl
m2=log(b2_mnl)
m3=log(abs(b3_mnl))
a4=log(b4_mnl)
a5=log(abs(b5_mnl))
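
The rules above can be sketched in base R as follows. The MNL estimates in `mnl` are made-up numbers for illustration only; in practice they would come from your own estimated MNL model.

```r
# Hypothetical MNL estimates, made up for illustration only
mnl <- c(b1_mnl = -0.40, b2_mnl = 0.85, b3_mnl = -0.12,
         b4_mnl = 0.30, b5_mnl = -0.05)

# Lognormal and loguniform coefficients are exp() of the parameter, so the
# starting value goes on the log scale; all spread parameters start at 0
start <- c(
  m1 = mnl[["b1_mnl"]],           s1 = 0,
  m2 = log(mnl[["b2_mnl"]]),      s2 = 0,
  m3 = log(abs(mnl[["b3_mnl"]])), s3 = 0,
  a4 = log(mnl[["b4_mnl"]]),      c4 = 0,
  a5 = log(abs(mnl[["b5_mnl"]])), c5 = 0
)

# Sanity check: log() gives NaN for a negative input and -Inf for zero
stopifnot(all(is.finite(start)))

# At these starting values the mixed model reproduces the MNL coefficients
stopifnot(isTRUE(all.equal( exp(start[["m2"]]), mnl[["b2_mnl"]])),
          isTRUE(all.equal(-exp(start[["m3"]]), mnl[["b3_mnl"]])))

print(round(start, 3))
```

If the first check fails, the MNL estimate probably has the opposite sign to the one the chosen distribution imposes (for example, a negative estimate used with a positive lognormal), which is a sign that the distribution should be reconsidered.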

The decision of what distribution to use is determined by your prior assumptions about the sign of the coefficients.

If you want to allow for both positive and negative values of the coefficient, and there is no bound on the magnitude of the coefficient, then use the normal distribution. The coefficient will be able to take any value.
If you want to restrict the coefficient to only positive values, and there is no restriction on how big its magnitude can be, then use the positive lognormal distribution. The coefficient will be able to take any value bigger than zero.
If you want to restrict the coefficient to only negative values, and there is no restriction on how big its magnitude can be, then use the negative lognormal distribution. The coefficient will be able to take any value smaller than zero.
If you want to restrict the coefficient to only positive values, and you don't want the magnitude of the coefficient to be too large, then use the positive loguniform distribution. Using the parametrisation in the code above, the coefficient will be able to take values between exp(a4) and exp(a4 + c4).
If you want to restrict the coefficient to only negative values, and you don't want the magnitude of the coefficient to be too large, then use the negative loguniform distribution. Using the parametrisation in the code above, the coefficient will be able to take values between -exp(a5 + c5) and -exp(a5).
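
To see these ranges in practice, here is a minimal base-R simulation of the five distributions. It assumes standard normal draws for the normal and lognormal cases and uniform(0, 1) draws for the loguniform cases, as in the apollo_draws settings above. The parameter values (`m`, `s`, `a`, `cc`) are hypothetical and chosen only to make the shapes visible.

```r
set.seed(1)
n <- 10000
eta_norm <- rnorm(n)   # standard normal draws (eta1 to eta3 in the code above)
eta_unif <- runif(n)   # uniform(0, 1) draws (eta4 and eta5 in the code above)

# Hypothetical parameter values, chosen only for illustration
m <- -1; s  <- 0.5     # location and spread for the normal-based cases
a <- -1; cc <- 2       # lower log-bound and log-range for the loguniform cases

b <- list(
  normal        =       m + s  * eta_norm,
  pos_lognormal =  exp( m + s  * eta_norm ),
  neg_lognormal = -exp( m + s  * eta_norm ),
  pos_loguni    =  exp( a + cc * eta_unif ),
  neg_loguni    = -exp( a + cc * eta_unif )
)

# Minimum and maximum of each simulated coefficient
ranges <- t(sapply(b, range))
colnames(ranges) <- c("min", "max")
print(round(ranges, 3))

# Theoretical bounds of the positive loguniform, for comparison
print(round(c(lower = exp(a), upper = exp(a + cc)), 3))
```

Only the normal coefficient changes sign, the lognormal ones have a long tail in one direction, and the loguniform ones stay within the bounds given above.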

If you use a positive lognormal distribution, you should not report the m2 and s2 parameters, but instead the actual mean and standard deviation of the distribution. You can calculate them after estimation using the apollo_deltaMethod function, as follows:

apollo_deltaMethod(model, list(operation="lognormal", 
                               parName1="m2", 
                               parName2="s2"))

For a negative lognormal, you would do it similarly to the above code, but replacing "m2" with "m3" and "s2" with "s3", and changing the sign of the mean reported by the function so that it is negative.
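
For reference, the point values of that mean and standard deviation follow from the standard lognormal formulas, which the base-R sketch below computes and cross-checks by simulation. The function name `lognormal_moments` and the estimates used are hypothetical. Unlike apollo_deltaMethod, this sketch gives no standard errors, so it is a sanity check rather than a replacement.

```r
# Mean and s.d. of exp(m + s * eta), with eta standard normal
lognormal_moments <- function(m, s) {
  mean_b <- exp(m + s^2 / 2)
  sd_b   <- mean_b * sqrt(exp(s^2) - 1)
  c(mean = mean_b, sd = sd_b)
}

# Hypothetical estimates, made up for illustration only
m3 <- -2.0
s3 <-  0.6

pos <- lognormal_moments(m3, s3)
# Negative lognormal: the mean changes sign, the s.d. stays positive
neg <- c(mean = -pos[["mean"]], sd = pos[["sd"]])
print(round(neg, 4))

# Cross-check against simulated draws of -exp(m3 + s3 * eta)
set.seed(1)
sim <- -exp(m3 + s3 * rnorm(1e6))
print(round(c(mean = mean(sim), sd = sd(sim)), 4))
```

Note that the formulas depend on s only through s^2, so the sign of the estimated s parameter does not matter.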

Similarly, if you use a loguniform distribution you should not report the a4, c4, a5, c5 parameters, but instead their actual lower and upper bounds, which you can calculate using the following code:

apollo_deltaMethod(model, list(expression=c("exp(a4)", "exp(a4 + c4)")))
apollo_deltaMethod(model, list(expression=c("-exp(a5)", "-exp(a5 + c5)")))

Finally, about your price coefficient, I would say that in most cases the price coefficient should be negative, so a negative lognormal or a negative loguniform coefficient should be used for it. If you are finding some positive price coefficients in your model, I would recommend trying different utility specifications, such as interacting price with income, or including any additional attributes of the alternative you might be missing.

Best wishes
David

susiezhao · Post by **susiezhao** » 01 Sep 2023, 13:40

Dear David,

I really appreciate such detailed replies. They gave me a comprehensive understanding of the various distributions and of how to set the starting values. Since I used effect coding for the different levels of cost, the coefficients for the lower cost levels should be positive and those for the higher cost levels should be negative. Given that I have both positive and negative coefficients, the normal distribution seems the most appropriate.

I first estimated a simple MNL model including socio-demographics, and used its estimated coefficients as the starting values of mu in a mixed model with random parameters, with the starting value of sigma set to 0. But my Rho-squared didn't change much. I would have expected Rho-squared to increase significantly after adding random coefficients, but no matter how I change the random coefficients, it doesn't change much. Please help me.

Kind regards,
Zhao

dpalma · Post by **dpalma** » 01 Sep 2023, 15:23

Hi Zhao,

I am not very familiar with effects coding (I myself always use dummy coding), so I am not sure what to recommend about it. But if you are including price as a continuous variable in your utility, then its coefficient almost certainly should be negative.

About the rho squared, I am afraid there isn't much I can say. Rho^2 is in itself not a very good indicator of fit. You can read a short discussion of it in section 3.8.1 of Train's Discrete Choice Methods with Simulation (available online). In short, rho^2 is not like linear regression's R^2, and is not comparable across models estimated on different datasets. So I wouldn't worry much about its value.
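
For context, rho^2 (the likelihood ratio index) is usually defined as 1 - LL(model)/LL(0), where LL(0) is the log-likelihood with all parameters at zero. The base-R sketch below uses made-up log-likelihood values and a hypothetical dataset of 1000 choices among 3 alternatives, purely to show the arithmetic.

```r
# Likelihood ratio index: 1 - LL(model) / LL(0), where LL(0) is the
# log-likelihood with all parameters at zero (equal choice probabilities)
rho2 <- function(ll_model, ll_zero) 1 - ll_model / ll_zero

# Hypothetical dataset: 1000 choices among 3 alternatives
ll_zero <- 1000 * log(1 / 3)

# Hypothetical final log-likelihoods, made up for illustration only
ll_mnl  <- -950   # simple MNL
ll_mmnl <- -930   # mixed logit

print(round(c(mnl  = rho2(ll_mnl,  ll_zero),
              mmnl = rho2(ll_mmnl, ll_zero)), 3))
```

In this made-up case, an improvement of 20 log-likelihood units moves rho^2 by less than 0.02, so a small change in rho^2 does not by itself mean the random coefficients add nothing.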

Cheers
David

susiezhao · Post by **susiezhao** » 02 Sep 2023, 03:48

I will consider your suggestion.
Thank you very much for your reply.

georkapo · Post by **georkapo** » 27 Sep 2023, 06:13

Hello David, Hello Zhao,

I am running a mixed logit model in WTP space with a negative lognormal for the price, while for the rest of the random coefficients I use the normal distribution. However, the starting value for price (I use the values from the MNL model) doesn't work, since the model produces NA. I followed your suggestion and used mu_log_price=log(abs(price from the MNL)), but it still produces NA. Then I put an arbitrary starting value, e.g. -8, and it works, but the values I get are really different from those of the mixed logit model in preference space. Do you happen to know why this happens? Any advice is more than welcome.

Thank you in advance.

Best regards,
Georgios

stephanehess · Post by **stephanehess** » 28 Sep 2023, 13:57

Georgios

It's difficult to reply to this without seeing your results for both models. But you shouldn't expect them to be the same, as they are using different distributions.

Stephane
