
Multiple Imputation and ICLV Model Estimation

Posted: 16 Apr 2021, 02:06
by hossain
Dear All,

I hope this message finds you well. We are working on a dataset with some missing values. We want to use multiple imputation to estimate the missing values and produce several completed datasets. Is it possible, in Apollo, to use those multiply imputed datasets to estimate an ICLV model where the model coefficients are pooled estimates across the datasets? Thank you!

Re: Multiple Imputation and ICLV Model Estimation

Posted: 21 Apr 2021, 14:47
by dpalma
Hi,

I believe there are two ways to approach this problem: sequential and simultaneous.

Sequential approach
1) Estimate an auxiliary model to impute the missing data
2) Predict the value of the missing data using the auxiliary model. Save this new dataset (without any missing data) as, for example, db1.csv.
3) Repeat step (2) either with a different auxiliary model, or with the same model but adding noise to its predictions. An example of adding noise: imagine your auxiliary model is a linear regression z = b1*x + e, where z is the variable with missing data, x is another explanatory variable, and e is the error term. You could generate multiple predictions by simulating the value of e, drawing from its distribution. After repeating this step M times you will end up with multiple datasets: db1, db2, db3, ..., dbM (M is the number of datasets you generated).
4) Stack all your datasets together (one on top of the other). You'll end up with a new, larger dataset with M*N rows (N is the number of observations in the original dataset) and the same number of variables (columns) as the original dataset.
5) Estimate your model on the big dataset.
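The sequential steps above can be sketched in base R as follows. This is a minimal illustration under assumptions not in the post: a toy dataset with one fully observed variable x and one variable z with missing values, the linear auxiliary model from step 3 (with an intercept added), normally distributed errors, and hypothetical column names imp and id_stacked. It only simulates e, as described in step 3; fuller multiple-imputation procedures, such as those in MICE, also reflect uncertainty in the auxiliary model's coefficients.

```r
# Hypothetical toy data: x is fully observed, z has missing values
set.seed(123)
N <- 200
x <- rnorm(N)
z <- 1 + 2 * x + rnorm(N, sd = 0.5)
z[sample(N, 40)] <- NA                      # make 40 values of z missing
db <- data.frame(id = seq_len(N), x = x, z = z)

# Step 1: auxiliary model z = b0 + b1*x + e, fitted on the complete cases
aux <- lm(z ~ x, data = db)                 # lm() drops rows with missing z by default
sigma_e <- summary(aux)$sigma               # estimated standard deviation of e

# Steps 2-3: M completed datasets, each with a fresh draw of e for the missing z
M <- 5
miss <- is.na(db$z)
imputed <- lapply(seq_len(M), function(m) {
  d <- db
  mu <- predict(aux, newdata = d[miss, , drop = FALSE])
  d$z[miss] <- mu + rnorm(sum(miss), mean = 0, sd = sigma_e)
  d$imp <- m                                # which imputation this copy came from
  d$id_stacked <- (m - 1) * N + d$id        # hypothetical: unique ID per copy
  d
})

# Step 4: stack the M datasets on top of each other (M*N rows)
stacked <- do.call(rbind, imputed)
```

The id_stacked column gives each imputed copy of a respondent its own ID, on the assumption that the copies should be treated as separate pseudo-individuals rather than as repeated observations of the same person.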

Simultaneous approach
You could estimate the auxiliary and main models together using simulated full information maximum likelihood. While preferable from a statistical point of view, this approach may be harder to implement and estimate: the exact implementation will depend on what kind of models you are using, and estimation will be more computationally intensive and more prone to converging to local optima.
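A sketch of the simulated likelihood behind the simultaneous approach, assuming the data are missing at random, a single covariate z with missing values, and the linear auxiliary model z = b1*x + e with normal errors from the sequential example. The symbols θ, σ, ξ and R are notation introduced here for illustration. P(y_n | x_n, z; β) stands for the full ICLV likelihood contribution of individual n given z (which already integrates over the latent variables); β and θ are estimated jointly by maximising LL.

```latex
% z_n: covariate with missing values; x_n: observed covariates; y_n: observed outcomes
% f(z | x; \theta): auxiliary model z = b_1 x + e, e ~ N(0, \sigma^2), \theta = (b_1, \sigma)
% \xi_{nr}: standard normal draws, r = 1, ..., R
\begin{align}
L_n(\beta,\theta) &= P(y_n \mid x_n, z_n; \beta)\, f(z_n \mid x_n; \theta)
  && \text{if } z_n \text{ is observed},\\
L_n(\beta,\theta) &= \int P(y_n \mid x_n, z; \beta)\, f(z \mid x_n; \theta)\, dz
  \approx \frac{1}{R}\sum_{r=1}^{R} P\bigl(y_n \mid x_n,\, b_1 x_n + \sigma \xi_{nr}; \beta\bigr)
  && \text{if } z_n \text{ is missing},\\
\mathrm{LL}(\beta,\theta) &= \sum_{n=1}^{N} \ln L_n(\beta,\theta).
\end{align}
```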

Best
David

Re: Multiple Imputation and ICLV Model Estimation

Posted: 21 Apr 2021, 22:04
by hossain
Hi Dr. Palma,

Thank you for your kind and informative reply. I really appreciate it.

Regards

Hossain

Re: Multiple Imputation and ICLV Model Estimation

Posted: 22 Apr 2021, 22:50
by hossain
Dear Dr. Palma,

Thank you for your previous reply. I have a further question about the sequential approach you mentioned. For example, suppose I have a dataset of 500 observations with many missing values. I have performed multiple imputation using the MICE package and, as an example, produced 20 datasets. If I stack them together, I get a dataset of 10,000 observations. I am concerned that running the ICLV model on this inflated dataset will not have the same statistical properties as running it on the original data. So, after producing the 20 datasets, I want to run the ICLV model 20 times.
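A small base-R illustration of this concern, using a hypothetical binary logit on simulated data rather than an ICLV model: stacking M identical copies of the data leaves the point estimates unchanged but multiplies the log-likelihood by M, so the reported standard errors shrink by a factor of sqrt(M) even though no new information has been added. With M different imputed copies the effect is approximate rather than exact.

```r
# Hypothetical binary-choice data with N = 500, as in the example above
set.seed(1)
N <- 500
x <- rnorm(N)
y <- rbinom(N, 1, plogis(-0.5 + 1 * x))
db <- data.frame(y = y, x = x)

# Binary logit on the original data
fit1 <- glm(y ~ x, family = binomial, data = db)

# Same model on M = 20 stacked identical copies, isolating the pure "inflation" effect
M <- 20
stacked <- db[rep(seq_len(N), times = M), ]
fitM <- glm(y ~ x, family = binomial, data = stacked)

se1 <- sqrt(diag(vcov(fit1)))
seM <- sqrt(diag(vcov(fitM)))
print(cbind(coef_original = coef(fit1), coef_stacked = coef(fitM)))
print(se1 / seM)   # close to sqrt(20), about 4.47, for both coefficients
```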

Is it possible to run a loop, or is there a function, that estimates the ICLV model 20 times and then pools the 20 sets of estimates? Is it possible to pool the estimates within Apollo? I appreciate your time on this. Thank you!

Regards
Hossain

Re: Multiple Imputation and ICLV Model Estimation

Posted: 23 Apr 2021, 08:22
by stephanehess
Hossain

You could easily set this up as a loop. Are your 20 datasets in separate data.frames?

Stephane

Re: Multiple Imputation and ICLV Model Estimation

Posted: 23 Apr 2021, 08:52
by hossain
Dear Dr. Hess,

The MICE package returns the data in a list format; however, I can convert them into separate data frames. I do not know how to set up the code in Apollo so that it gives me pooled estimates from the 20 datasets. For example, here is a portion of my code:
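A minimal sketch of getting the completed datasets out of mice as a list of data frames and saving it to a file that an estimation loop can read back with readRDS(). Here nhanes, a small example dataset that ships with mice, stands in for your own data, and the number of imputations and the file name are illustrative assumptions.

```r
library(mice)

# nhanes: small example dataset shipped with mice (25 rows, some missing values);
# replace it with your own data frame
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# action = "all" returns all m completed datasets as a list of data frames
data <- mice::complete(imp, action = "all")

# Save the list so that a separate estimation script can load it with readRDS()
saveRDS(data, file = "overall_list.rds")
```

An equivalent way to build the list is lapply(seq_len(imp$m), function(i) mice::complete(imp, i)), which returns a plain list of data frames.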

database = read.csv("Use.csv",header=TRUE)


### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c()


### Set parameters for generating draws
apollo_draws = list(
interDrawsType="halton",
interNDraws=100,
interUnifDraws=c(),
interNormDraws=c("nu_n1","nu_n2","nu_n3"),

intraDrawsType='',
intraNDraws=0,
intraUnifDraws=c(),
intraNormDraws=c()
)

### Create random parameters
apollo_randCoeff=function(apollo_beta, apollo_inputs){
randcoeff = list()

randcoeff[["LV1"]] = nu_n1 + a_woman1 * Woman
randcoeff[["LV2"]] = nu_n2 + a_income_L2*income_low
randcoeff[["LV3"]] = nu_n3 + a_education3*education_college

return(randcoeff)
}

###-----------------
V = list()
V[['no']] = 0
V[['yes']] = asc_yes + b_student*student+ b_age* age + b_Woman* Woman+b_education*education_college+
b_income_L*income_low+ b_income_M*income_middle+b_employed*employ_fulltime+
gamma1*LV1+gamma2*LV2+gamma3*LV3

### Define settings for MNL model component
mnl_settings = list(
alternatives = c(no=0, yes=1),
avail = 1,
choiceVar = Use,
V = V
)

### Compute probabilities for MNL model component
P[["JUMP_use"]] = apollo_mnl(mnl_settings, functionality)

### Likelihood of the whole model
P = apollo_combineModels(P, apollo_inputs, functionality)

### Take product across observation for same individual
#P = apollo_panelProd(P, apollo_inputs, functionality)

### Average across inter-individual draws
P = apollo_avgInterDraws(P, apollo_inputs, functionality)

### Prepare and return outputs of function
P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}


I appreciate your time on this. Thank you!

Regards
Hossain

Re: Multiple Imputation and ICLV Model Estimation

Posted: 25 Apr 2021, 09:05
by stephanehess
Hi Hossain

Let's assume you have a list called data, which contains the different versions of the dataset. Then you could do something like this:

The first part comes before the loop, with all the settings you want in apollo_control for your model:

rm(list = ls())
library(apollo)
apollo_initialise()
apollo_control = list(
  ...
)

Then we load your list:

data = readRDS("overall_list.rds")

Then initialise a new list:

models=list()

Then loop over your datasets:

for(s in seq_along(data)){

database = data[[s]]

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_fixed = c() if none
apollo_fixed = c()


### Set parameters for generating draws
apollo_draws = list(

...

P = apollo_prepareProb(P, apollo_inputs, functionality)
return(P)
}

models[[s]]=apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)
}

You now have a new list called models, where each element contains the outputs for one of your datasets.

Stephane

Re: Multiple Imputation and ICLV Model Estimation

Posted: 25 Apr 2021, 10:08
by hossain
Hi Dr. Hess,

Thank you for your kind and detailed reply. I appreciate it.

I just need one more clarification. Given that I have m datasets, I will get m sets of outputs (for example, m estimates of the coefficient for each predictor).

I was wondering whether it is possible to pool the estimates, so that for m datasets I get a single set of estimates. One feature of the MICE package is that, after imputing multiple datasets, I can get a single regression output table. The documentation states that this is done using Rubin's rules ("The pool() function combines the estimates from m repeated complete data analyses"). However, I cannot run the ICLV model in MICE.
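For reference, Rubin's rules for a single scalar parameter, as usually stated: the pooled estimate is the average of the m estimates, and its variance combines the average within-dataset variance with the between-dataset variance of the estimates. Here the estimate from imputed dataset j has variance U_j (its squared standard error), and ν is Rubin's original degrees of freedom for t-based inference.

```latex
% \hat\beta_j, U_j = SE_j^2: estimate and its variance from imputed dataset j = 1, ..., m
\begin{align}
\bar\beta &= \frac{1}{m}\sum_{j=1}^{m} \hat\beta_j, &
\bar U &= \frac{1}{m}\sum_{j=1}^{m} U_j, &
B &= \frac{1}{m-1}\sum_{j=1}^{m} \bigl(\hat\beta_j - \bar\beta\bigr)^2,\\
T &= \bar U + \Bigl(1 + \frac{1}{m}\Bigr) B, &
\mathrm{SE}(\bar\beta) &= \sqrt{T}, &
\nu &= (m-1)\Bigl(1 + \frac{\bar U}{(1 + 1/m)\,B}\Bigr)^{2}.
\end{align}
```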

Is it possible to do this pooling, or a similar operation, in Apollo for an ICLV model estimated on multiple datasets? I appreciate your time on this. Thank you!

Regards

Hossain

Re: Multiple Imputation and ICLV Model Estimation

Posted: 26 Apr 2021, 17:08
by dpalma
Hi Hossain,

Sadly, as far as I know, models estimated using Apollo cannot be used as inputs to the pool function you mentioned. Therefore, you will have to apply Rubin's rules manually. The method was proposed in:
  • Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons.
It is also described in section 2 of:
Sorry for not providing more detailed instructions, but I am not familiar with the method.
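Rubin's rules are simple enough to apply by hand in R. Below is a minimal sketch; pool_rubin is a hypothetical helper written for this example, not an Apollo or mice function. It assumes that, for each imputed dataset, you have a named vector of estimates and a named vector of standard errors. Pooling is done parameter by parameter, so covariances between parameters are not pooled, and parameters with missing standard errors (for example, fixed ones) are dropped.

```r
# Pool estimates from m imputed datasets with Rubin's rules, one parameter at a time.
# est_list: list of named numeric vectors of estimates (one vector per dataset)
# se_list:  list of named numeric vectors of standard errors (one vector per dataset)
pool_rubin <- function(est_list, se_list) {
  m <- length(est_list)
  stopifnot(m >= 2, length(se_list) == m)
  pars <- names(se_list[[1]])
  # parameters x datasets matrices, aligned by parameter name
  est <- do.call(cbind, lapply(est_list, function(e) e[pars]))
  se  <- do.call(cbind, lapply(se_list,  function(s) s[pars]))
  # drop parameters without a standard error (e.g. fixed parameters reported as NA)
  keep <- rowSums(is.na(se)) == 0
  est <- est[keep, , drop = FALSE]
  se  <- se[keep, , drop = FALSE]

  qbar  <- rowMeans(est)                # pooled point estimate
  ubar  <- rowMeans(se^2)               # within-imputation variance
  b     <- apply(est, 1, var)           # between-imputation variance
  total <- ubar + (1 + 1 / m) * b       # total variance
  r     <- (1 + 1 / m) * b / ubar       # relative increase in variance due to missingness
  nu    <- (m - 1) * (1 + 1 / r)^2      # Rubin's (1987) degrees of freedom

  data.frame(estimate = qbar, se = sqrt(total), t_ratio = qbar / sqrt(total),
             within = ubar, between = b, df = nu, row.names = pars[keep])
}

# Made-up numbers for three imputed datasets (hypothetical, for illustration only)
est_list <- list(c(b1 = 1, b2 = 0.5), c(b1 = 2, b2 = 0.5), c(b1 = 3, b2 = 0.5))
se_list  <- list(c(b1 = 0.5, b2 = 0.5), c(b1 = 0.5, b2 = 0.5), c(b1 = 0.5, b2 = 0.5))
pooled <- pool_rubin(est_list, se_list)
print(pooled)   # b1: estimate 2, se about 1.258; b2: se 0.5, df Inf (no between variance)

# With the 'models' list from the loop above (field names assumed; check names(models[[1]])):
# pooled <- pool_rubin(lapply(models, function(x) x$estimate),
#                      lapply(models, function(x) x$robse))
```

The commented last lines assume each Apollo model object stores its estimates in estimate and its robust standard errors in robse; check these against your own model objects, and decide whether robust or classical standard errors are the appropriate input.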

Cheers
David