MDCEV problems
Posted: 06 May 2025, 11:23
Dear Professor,
I am running into difficulties trying to estimate an MDCEV model of energy, housing, and other expenditures.
The data are privacy-sensitive records for a random subsample of 10,000 out of 7 million households.
The focus is on the amount of money spent, since virtually all households spend some money on energy and housing.
We are therefore mainly interested in the gamma parameters.
The outside "other" category is defined as the budget minus the energy and housing expenditures.
Households typically spend roughly 0.1-50% of their budget on energy versus roughly 1-95% on housing (these are not the real figures, only orders of magnitude, due to strict privacy regulations).
Households with smaller budgets naturally tend to spend a higher share on energy and housing.
All exogenous variables are sociodemographic (currently: whether the household members are retired, the local address density, and the age of the building).
Sigma is fixed to 1.
Alpha values are fixed to zero for now.
I am using Apollo version 0.3.5.
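For reference, and assuming that "alpha fixed to zero" means the α → 0 limit of the gamma profile, unit prices, and an outside good x_1 without a translation parameter (these are assumptions, since the exact Apollo settings are not shown), the specification above corresponds to the following direct utility, with e = energy, h = housing, E the budget, z the sociodemographic variables, and ε iid Gumbel with scale σ = 1:

```latex
\max_{x_1, x_e, x_h}\; U(\mathbf{x}) = \psi_1 \ln x_1
  + \sum_{k \in \{e,\,h\}} \gamma_k \psi_k \ln\!\left(\frac{x_k}{\gamma_k} + 1\right)
\quad \text{s.t.} \quad x_1 + x_e + x_h = E,
\qquad \psi_1 = e^{\varepsilon_1}, \quad \psi_k = \exp\!\left(\beta_k' z + \varepsilon_k\right)
```

Because x_k / γ_k has to be dimensionless, γ_k is measured in the same units as x_k under this form.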
However, I keep running into the following issues:
1. I cannot properly estimate the beta parameters due to the lack of corner cases. I tried creating artificial zeros by specifying a minimum consumption level, but that (naturally) results in very low likelihoods for those entries. I also tried fixing beta_energy and beta_housing to 10 or 100, but this does not seem to work properly either.
2. The estimated gamma values are extremely low, in the E-4 to E-8 range.
3. I frequently get zero likelihoods at the starting values. When I carefully adjust these starting values, I instead get "False Convergence"/"Unconstrained Optimization".
I tried apollo_searchStart, but it did not yield any usable candidates, because the likelihoods tend to be zero at the starting values.
I also computed the probabilities using apollo_probabilities(functionality = "output"), but this has not yielded much insight, other than showing that the likelihoods for observations with imposed zeros are extremely low and that households with very large budgets cannot be modeled properly either.
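To see which term drives a near-zero likelihood for an individual household, the MDCEV density can be written out in log space. The sketch below is a hypothetical helper (`mdcev_loglik` is not an Apollo function) and assumes the α → 0 gamma profile, unit prices, an outside good at index 0 with utility ln x_1 and no translation parameter, and a single constant β_k per inside good instead of covariates.

```python
import math


def mdcev_loglik(x, gamma, beta, sigma=1.0):
    """Log-density of one household's consumption vector under an MDCEV model.

    Assumptions (hypothetical sketch, not Apollo's implementation):
    - gamma profile with alpha -> 0 for every good, all prices equal to 1;
    - x[0] is the outside good: utility ln(x_1), no gamma, psi fixed to 1,
      and it must be strictly positive;
    - x[1:] are the inside goods, with psi_k = exp(beta[k - 1]).
    """
    g = [0.0] + list(gamma)  # the outside good has no translation parameter
    # Systematic utilities: -ln(x_1) for the outside good,
    # beta_k - ln(x_k / gamma_k + 1) for each inside good.
    V = [-math.log(x[0])] + [
        b - math.log(xk / gk + 1.0) for xk, gk, b in zip(x[1:], gamma, beta)
    ]
    Vs = [v / sigma for v in V]
    consumed = [i for i, xi in enumerate(x) if xi > 0]
    M = len(consumed)
    # Jacobian terms c_i = (1 - alpha_i) / (x_i + gamma_i) with alpha_i = 0.
    c = [1.0 / (x[i] + g[i]) for i in consumed]
    # Log-sum-exp over all goods, shifted by the maximum to avoid under/overflow.
    vmax = max(Vs)
    log_denom = vmax + math.log(sum(math.exp(v - vmax) for v in Vs))
    return (
        -(M - 1) * math.log(sigma)
        + sum(math.log(ci) for ci in c)
        + math.log(sum(1.0 / ci for ci in c))
        + sum(Vs[i] for i in consumed)
        - M * log_denom
        + math.lgamma(M)  # ln((M - 1)!)
    )


# Hypothetical household (euros): outside good, energy, housing.
x = [20000.0, 1500.0, 8500.0]
ll_moderate = mdcev_loglik(x, gamma=[1000.0, 5000.0], beta=[-9.0, -9.0])
ll_tiny = mdcev_loglik(x, gamma=[1e-6, 1e-6], beta=[-9.0, -9.0])
```

Printing the individual terms inside the return expression for a few problematic households shows whether the utilities V_k or the Jacobian part is collapsing for a given set of starting values.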
Data cleaning is limited to removing households with very high/low budgets or energy/housing expenditures.
The reason is that the datasets were prepared by a third party (Statistics Netherlands), with pseudonymized household identifiers and little background information.
I cannot share or copy the R file because we work in a remote environment (they want to make sure that we do not accidentally leak household data in our code before we are allowed to download it).
I realize that I am not exactly making this easy, but I was hoping you might have some general thoughts or recommendations.
Maybe MDCEV is not suitable here after all, since there are hardly any discrete choices of zero consumption?
Finally, I am confused about how to interpret the model output, assuming it would work.
I thought the gamma parameters would be in input units (i.e., euros), but they do not seem to depend on the scaling of the inputs: they are similarly small whether I model euros or thousands of euros.
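One way to check what units gamma should have is to solve the model without the error term. The sketch below assumes the α → 0 gamma profile, unit prices, an outside good with utility ln x_1, and ψ_k = exp(β_k); it only handles interior solutions where every good is consumed, and `interior_demand` is a hypothetical helper, not part of Apollo. Equating marginal utilities, ψ_k / (x_k / γ_k + 1) = 1 / x_1, gives x_k = γ_k (ψ_k x_1 - 1), and the budget constraint then pins down x_1.

```python
import math


def interior_demand(budget, gamma, beta):
    """Error-free MDCEV demand (alpha -> 0, unit prices) for an interior solution.

    Hypothetical sketch: outside good utility ln(x_1); inside good k has utility
    gamma_k * psi_k * ln(x_k / gamma_k + 1) with psi_k = exp(beta_k).
    """
    psi = [math.exp(b) for b in beta]
    # KKT: psi_k / (x_k / gamma_k + 1) = 1 / x_1  =>  x_k = gamma_k * (psi_k * x_1 - 1).
    # Substituting into x_1 + sum(x_k) = budget and solving for x_1:
    x1 = (budget + sum(gamma)) / (1.0 + sum(g * p for g, p in zip(gamma, psi)))
    inside = [g * (p * x1 - 1.0) for g, p in zip(gamma, psi)]
    if any(xk <= 0.0 for xk in inside):
        raise ValueError("corner solution: this sketch only covers interior demand")
    return [x1] + inside


# Hypothetical parameters in euros (budget 30000, gammas 1000 and 5000).
beta_eur = [math.log(2.5 / 20000.0), math.log(2.7 / 20000.0)]
demand_eur = interior_demand(30000.0, [1000.0, 5000.0], beta_eur)  # approx. [20000, 1500, 8500]

# Same model in thousands of euros: gammas divided by 1000, betas shifted by ln(1000).
beta_keur = [b + math.log(1000.0) for b in beta_eur]
demand_keur = interior_demand(30.0, [1.0, 5.0], beta_keur)  # approx. [20, 1.5, 8.5]
```

Under these assumptions, the second call reproduces the first demands divided by 1000: γ_k carries the units of x_k, and the change of units is absorbed by the constants through ψ_k. The KKT condition also shows that when x_k is much larger than γ_k, x_k ≈ γ_k ψ_k x_1, so interior consumption mainly informs the product γ_k ψ_k.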
Thank you so much,
Yours sincerely,
Chris ten Dam