Page 1 of 1

apollo_validateInputs() and apollo_insertScaling() handle symbols differently?

Posted: 21 Oct 2021, 19:45
by sensoryscientist
My specific current problem is: If I want to automate/streamline my code a little, making the names of model coefficients inside for loops or apply calls, apollo_validateInputs() will cause an error if the constants are specified as symbols (as output by as.name()), but apollo_insertScaling() will cause an error if the constants are specified as objects (as output by get()). Are there alternatives I'm missing? Has anyone else come up with a more elegant solution for this? Is this a bug, or am I missing something?

More broadly speaking, what I'm trying to do is use a combination of apply functions and for loops to create a code I can use to run a latent class model of an arbitrary number of classes on the same dataset, with the same utility functions but varying beta & delta constants, by only changing one variable (below, kgroups) at the top of the file to specify how many classes I want to fit the model to.

My initial instinct was to use as.name(paste0(...)) calls to create the symbols for the coefficients that apollo will fit. apollo_validateInputs() kept erroring out when it was trying to find the object delta_c, since apollo_attach() is only called inside of apollo_probabilities() and won't actually run before I create apollo_inputs, which is the output of apollo_validateInputs().

By switching to the syntax used in the FAQ, which uses get(paste0(...)), I'm able to get apollo_validateInputs() to run but then apollo_estimate() makes calls to apollo_insertScaling(), which gives me the error "Error in apollo_insertScaling(e[], sca) : Argument "e" must be a function, a call, a symbol, or a value". This does make sense, since get() will return an object, not a symbol, but trying to use as.name() or as.symbol() instead causes an error with apollo_validateInputs().

The data is resulting from a choice experiment: 500 panelists each saw 10 choice questions, which had 2 alternatives with synthetic descriptions (containing 5 of 10 possible flavor terms) and one of 5 possible prices (A and B). There was also a no preference option (C). So the data has Participant (numeric ID) and Choice (all values A, B, or C) columns, plus binary columns for flavor terms (banana_A, banana_B, banana_C, oak_A...) and 3 columns for price (Price_A, Price_B, Price_C).

Let me know if some synthetic data would be helpful.

Code: Select all

### Load Apollo library
library(apollo)

### Initialise code
apollo_initialise()

### Set core controls
apollo_control <- list(
  modelName  ="Apollo_LC_Whiskey",
  modelDescr ="Simple LC model on Whiskey",
  indivID    ="Participant",
  nCores     = 3,
  noDiagnostics = TRUE
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #

database <- read.csv("whiskey_choice_tidy_responses_ABC_binary.csv",header=TRUE)

database$Choice <- ifelse(database$Choice == "A", 1, ifelse(database$Choice == "B", 2, 3))

# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation

starting_coeffecients <- c(b_chocolate     = 0.7179,
                           b_caramel       = 0.8968,
                           b_tobacco       = 0.3757,
                           b_leather       = 0.2507,
                           b_sultanas      = 0.4376,
                           b_oak           = 0.6574,
                           b_banana        = 0.5521,
                           b_corn          = 0.4352,
                           b_grass         = 0.4763,
                           b_mint          = 0.3535,
                           b_price         = -0.0141)

kgroups <- 2

### Results from Dr. Neill's Matlab running
#overall intercept set = 0, and we should also fix one class to be the reference
apollo_beta <- c(intercept = 0)

for (k in 1:kgroups) {
  for (b in 1:length(starting_coeffecients)) {
    apollo_beta[paste0(names(starting_coeffecients[b]), "_c", k)] <- starting_coeffecients[b]
  }
  apollo_beta[paste0("delta_c", k)] <- 0 #class-specific intercept/offset
}

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c("intercept","delta_c1")

# ################################################################# #
#### DEFINE LATENT CLASS COMPONENTS                              ####
# ################################################################# #

apollo_lcPars <- function(apollo_beta, apollo_inputs){
  lcpars <- sapply(setNames(nm = names(starting_coeffecients)),
                   function(t) {
                     lapply(1:kgroups, function(k) {
                       get(paste0(t, "_c", k))
                     })
                   })
  
  V <- lapply(setNames(1:kgroups, paste0("class_", 1:kgroups)), function(k) {
    get(paste0("delta_c", k))
  })
  
  mnl_settings <- list(
    alternatives = setNames(1:kgroups, paste0("class_", 1:kgroups)), 
    avail        = 1,
    choiceVar    = NA,
    V            = V
  )
  
  lcpars[["pi_values"]] = apollo_mnl(mnl_settings, functionality="raw")
  
  lcpars[["pi_values"]] = apollo_firstRow(lcpars[["pi_values"]], apollo_inputs)
  
  return(lcpars)
}

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #

apollo_inputs <- apollo_validateInputs()

# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### Define settings for MNL model component that are generic across classes
  mnl_settings = list(
    alternatives = c(A=1, B=2, C=3),
    avail        = list(A=1, B=1, C=1),
    choiceVar    = Choice
  )
  
  ### Loop over classes
  for (k in 1:kgroups){
    
    ### Compute class-specific utilities
    V=list()
    V[['A']]  <- intercept +
      b_chocolate[[k]] * chocolate_A +
      b_tobacco[[k]] * tobacco_A +
      b_caramel[[k]] * caramel_A +
      b_sultanas[[k]] * sultanas_A +
      b_grass[[k]] * grass_A +
      b_mint[[k]] * mint_A +
      b_oak[[k]] * oak_A +
      b_leather[[k]] * leather_A +
      b_corn[[k]] * corn_A +
      b_banana[[k]] * banana_A +
      b_price[[k]] * Price_A
    V[['B']]  <- intercept +
      b_chocolate[[k]] * chocolate_B +
      b_tobacco[[k]] * tobacco_B +
      b_caramel[[k]] * caramel_B +
      b_sultanas[[k]] * sultanas_B +
      b_grass[[k]] * grass_B +
      b_mint[[k]] * mint_B +
      b_oak[[k]] * oak_B +
      b_leather[[k]] * leather_B +
      b_corn[[k]] * corn_B +
      b_banana[[k]] * banana_B +
      b_price[[k]] * Price_B
    V[['C']]  <- intercept
    
    mnl_settings$V <- V
    mnl_settings$componentName <- paste0("Class_", k)
    
    ### Compute within-class choice probabilities using MNL model
    P[[paste0("Class_",k)]] <- apollo_mnl(mnl_settings, functionality)
    
    ### Take product across observation for same individual
    P[[paste0("Class_",k)]] <- apollo_panelProd(P[[paste0("Class_",k)]], apollo_inputs, functionality)
    
  }
  
  ### Compute latent class model probabilities
  lc_settings   = list(inClassProb = P, classProb=pi_values)
  P[["model"]] = apollo_lc(lc_settings, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}

# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

### Optional starting values search
# apollo_beta=apollo_searchStart(apollo_beta, apollo_fixed,apollo_probabilities, apollo_inputs)

model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs)
Other than apollo_validateInputs(), the only output is the error and traceback:

Code: Select all

Error in apollo_insertScaling(e[[i]], sca) : 
  Argument "e" must be a function, a call, a symbol, or a value
> traceback()
7: stop("Argument \"e\" must be a function, a call, a symbol, or a value")
6: apollo_insertScaling(e[[i]], sca)
5: apollo_insertScaling(e[[i]], sca)
4: apollo_insertScaling(e[[3]], sca)
3: apollo_insertScaling(e[[i]], sca)
2: apollo_insertScaling(apollo_inputs$apollo_lcPars, apollo_inputs$apollo_scaling)
1: apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, 
       apollo_inputs)

Re: apollo_validateInputs() and apollo_insertScaling() handle symbols differently?

Posted: 05 Nov 2021, 15:17
by stephanehess
Hi

apologies for the slow reply. Some synthetic data would be helpful indeed to help us diagnose this

Stephane

Re: apollo_validateInputs() and apollo_insertScaling() handle symbols differently?

Posted: 17 Nov 2021, 19:15
by sensoryscientist
We've since realized that there are some issues in the data we were trying to run this with, so we're probably going to redesign and recollect the discrete choice data. But it would be nice to have this figured out for this and future experiments! So here's some data you should be able to use to reproduce the error.

Code: Select all

database <- data.frame(Participant = rep(1:100, each = 5*9),
                       Chocolate_A = rep(c(1,1,0,0,0), 100*9),
                       Chocolate_B = rep(c(1,0,0,0,1), 100*9),
                       Chocolate_C = rep(0, 500*9),
                       Caramel_A = rep(c(0,1,1,0,0), 100*9),
                       Caramel_B = rep(c(0,1,0,1,0), 100*9),
                       Caramel_C = rep(0, 500*9),
                       Sultanas_A = rep(c(0,0,1,1,0), 100*9),
                       Sultanas_B = rep(c(0,0,1,0,1), 100*9),
                       Sultanas_C = rep(0, 500*9),
                       Oak_A = rep(c(0,0,0,1,1), 100*9),
                       Oak_B = rep(c(1,0,0,1,0), 100*9),
                       Oak_C = rep(0, 500),
                       Tobacco_A = rep(c(1,0,1,0,0), 100*9),
                       Tobacco_B = rep(c(0,0,1,1,0), 100*9),
                       Tobacco_C = rep(0, 500*9),
                       Mint_A = rep(c(1,0,0,1,0), 100*9),
                       Mint_B = rep(c(1,1,0,0,0), 100*9),
                       Mint_C = rep(0, 500*9),
                       Corn_A = rep(c(1,0,0,0,1), 100*9),
                       Corn_B = rep(c(0,0,0,1,1), 100*9),
                       Corn_C = rep(0, 500),
                       Leather_A = rep(c(0,1,0,1,0), 100*9),
                       Leather_B = rep(c(0,1,0,0,1), 100*9),
                       Leather_C = rep(0, 500*9),
                       Banana_A = rep(c(0,1,0,0,1), 100*9),
                       Banana_B = rep(c(0,1,1,0,0), 100*9),
                       Banana_C = rep(0, 500*9),
                       Grass_A = rep(c(0,0,1,0,1), 100*9),
                       Grass_B = rep(c(1,0,1,0,0), 100*9),
                       Grass_C = rep(0, 500*9),
                       Price_A = rep(c(35,35,35,50,50,50,80,80,80), each = 5),
                       Price_B = rep(c(35,50,80,35,50,80,35,50,80), each = 5),
                       Price_C = rep(0, 500*9))

WTP_c1 <- list(Chocolate = 40,
           Caramel = 20,
           Leather = 16,
           Corn = 10,
           Sultanas = 0,
           Banana = -5,
           Grass = -25,
           Oak = 24,
           Mint = -3,
           Tobacco = -10)

WTP_c2 <- list(Chocolate = 0,
           Caramel = 0,
           Leather = 50,
           Corn = -40,
           Sultanas = 120,
           Banana = -10,
           Grass = -15,
           Oak = 30,
           Mint = -40,
           Tobacco = 0)

set.seed(500)
database$WTP_A <- ifelse(database$Participant > 300,
                         WTP_c1$Chocolate * database$Chocolate_A +
                           WTP_c1$Caramel * database$Caramel_A +
                           WTP_c1$Leather * database$Leather_A +
                           WTP_c1$Corn * database$Corn_A +
                           WTP_c1$Sultanas * database$Sultanas_A +
                           WTP_c1$Banana * database$Banana_A +
                           WTP_c1$Grass * database$Grass_A +
                           WTP_c1$Oak * database$Oak_A +
                           WTP_c1$Mint * database$Mint_A +
                           WTP_c1$Tobacco * database$Tobacco_A +
                           rnorm(nrow(database), 0, 15),
                         WTP_c2$Chocolate * database$Chocolate_A +
                           WTP_c2$Caramel * database$Caramel_A +
                           WTP_c2$Leather * database$Leather_A +
                           WTP_c2$Corn * database$Corn_A +
                           WTP_c2$Sultanas * database$Sultanas_A +
                           WTP_c2$Banana * database$Banana_A +
                           WTP_c2$Grass * database$Grass_A +
                           WTP_c2$Oak * database$Oak_A +
                           WTP_c2$Mint * database$Mint_A +
                           WTP_c2$Tobacco * database$Tobacco_A +
                           rnorm(nrow(database), 0, 10)
                         )
database$Util_A <- database$WTP_A - database$Price_A

database$WTP_B <- ifelse(database$Participant > 300,
                         WTP_c1$Chocolate * database$Chocolate_B +
                           WTP_c1$Caramel * database$Caramel_B +
                           WTP_c1$Leather * database$Leather_B +
                           WTP_c1$Corn * database$Corn_B +
                           WTP_c1$Sultanas * database$Sultanas_B +
                           WTP_c1$Banana * database$Banana_B +
                           WTP_c1$Grass * database$Grass_B +
                           WTP_c1$Oak * database$Oak_B +
                           WTP_c1$Mint * database$Mint_B +
                           WTP_c1$Tobacco * database$Tobacco_B +
                           rnorm(nrow(database), 0, 15),
                         WTP_c2$Chocolate * database$Chocolate_B +
                           WTP_c2$Caramel * database$Caramel_B +
                           WTP_c2$Leather * database$Leather_B +
                           WTP_c2$Corn * database$Corn_B +
                           WTP_c2$Sultanas * database$Sultanas_B +
                           WTP_c2$Banana * database$Banana_B +
                           WTP_c2$Grass * database$Grass_B +
                           WTP_c2$Oak * database$Oak_B +
                           WTP_c2$Mint * database$Mint_B +
                           WTP_c2$Tobacco * database$Tobacco_B +
                           rnorm(nrow(database), 0, 10)
                         )
database$Util_B <- database$WTP_B - database$Price_B

database$Choice <- ifelse(database$Util_A > database$Util_B,
                          ifelse(database$Util_A > 0, "A", "C"),
                          ifelse(database$Util_B > 0, "B", "C"))

Re: apollo_validateInputs() and apollo_insertScaling() handle symbols differently?

Posted: 30 Nov 2021, 18:33
by dpalma
Hi,

Your code contained some peculiarities that made it incompatible with Apollo. In particular, the way you were populating the list lcpars with the different coefficients for each class inside apollo_lcPars prevented Apollo from working as expected. However, we liked the idea of automatically populating lcPars, so we introduced some changes in a new (still in development) version of Apollo.

Additionally, your code contained some bugs, so below you will find an updated version of your code. The main changes include:
  • Your data generation process was missing randomness, so I rewrote it including randomness. I also changed it no preference space, so that you can more easily check if the estimation recovers the parameters.
  • The way you were populating lcpars using “sapply” was not working. I replaced it with two nested loops. In general, is preferable to work with loops inside apollo_lcPars, apollo_randCoeff, and apollo_probabilities, as Apollo can read and expand loops, but it tends to get confused with the “apply” function family.
  • If you are using variables that do not live inside the database, you should copy them into apollo_inputs (see what I did to “starting_coefficients” and “kgroups”), otherwise you won’t be able to access them from inside apollo_probabilities, apollo_lcPars nor apollo_randCoeff when using multiple cores during estimation.
You will only be able to run this script using the new version 0.2.7 of Apollo. This version is still under development, so it will change before being published officially in CRAN. You can download it from one of the links below, depending on your OS: See the updated code below.

Best
David

Code: Select all

### Initialise
rm(list = ls())
library(apollo)
apollo_initialise()

### Set core controls
apollo_control <- list(
  modelName  ="whiskey",
  modelDescr ="Simple LC model on Whiskey",
  indivID    ="id",
  nCores     = 1,
  noDiagnostics = TRUE, 
  debug         = TRUE, 
  analyticGrad  = TRUE
)

# ################################################################# #
#### LOAD DATA AND APPLY ANY TRANSFORMATIONS                     ####
# ################################################################# #
set.seed(1)
N <- 100 # number of indivs
M <- 45  # number of obs per indiv
K <- 10  # number of attributes, excluding price
id <- rep(1:N, each=M) # indivs id
XA <- cbind(matrix(runif(N*M*K)>0.5, nrow=N*M, ncol=K), pric=sample(c(35, 50, 80), size=N*M, replace=TRUE))
XB <- cbind(matrix(runif(N*M*K)>0.5, nrow=N*M, ncol=K), pric=sample(c(35, 50, 80), size=N*M, replace=TRUE))
colnames(XA) <- c("choc", "cara", "sult", "oak" , "toba", "mint", "corn", "leat", "bana", "gras", "pric")
colnames(XB) <- paste0(colnames(XA), "_B")
colnames(XA) <- paste0(colnames(XA), "_A")
b1 <- c(choc=1.5, cara=.9, sult=.9, oak=.6, toba=0, mint=-.6, corn=-.9, leat=.9, bana=0, gras=-.6, pric=-0.03)
b2 <- c(choc=0, cara=0, sult=1.8, oak=-1.2, toba=1.8, mint=-.3, corn=-.6, leat=1.2, bana=-1.2, gras=0, pric=-0.06)
c1 <- id <= floor(N/2)
U1 <- cbind(A = XA%*%b1 - log(-log(runif(N*M))), B = XB%*%b1 - log(-log(runif(N*M))), C = 0 - log(-log(runif(N*M))))
U2 <- cbind(A = XA%*%b2 - log(-log(runif(N*M))), B = XB%*%b2 - log(-log(runif(N*M))), C = 0 - log(-log(runif(N*M))))
cho <- ifelse(c1, apply(U1, MARGIN=1, which.max), apply(U2, MARGIN=1, which.max))
database <- data.frame(id, XA, XB, cho)
rm(N,M,K,id,XA,XB,b1,b2,c1,U1,U2,cho)


# ################################################################# #
#### DEFINE MODEL PARAMETERS                                     ####
# ################################################################# #

### Vector of parameters, including any that are kept fixed in estimation

starting_coefficients <- c(b_chocolate = 0.1,
                           b_caramel   = 0.1,
                           b_sultanas  = 0.1,
                           b_oak       = 0.1,
                           b_tobacco   = 0.1,
                           b_mint      = 0.1,
                           b_corn      = 0.1,
                           b_leather   = 0.1,
                           b_banana    = 0.1,
                           b_grass     = 0.1,
                           b_price     =-0.1)
kgroups <- 2

#overall intercept set = 0, and we should also fix one class to be the reference
apollo_beta <- c(intercept = 0)

for (k in 1:kgroups) {
  for (b in 1:length(starting_coefficients)) {
    apollo_beta[paste0(names(starting_coefficients[b]), "_c", k)] <- starting_coefficients[b]
  }
  apollo_beta[paste0("delta_c", k)] <- 0 #class-specific intercept/offset
}

### Vector with names (in quotes) of parameters to be kept fixed at their starting value in apollo_beta, use apollo_beta_fixed = c() if none
apollo_fixed = c("intercept", "delta_c1")

# ################################################################# #
#### DEFINE LATENT CLASS COMPONENTS                              ####
# ################################################################# #

apollo_lcPars <- function(apollo_beta, apollo_inputs){
  ### List of parameters for each class
  lcpars <- setNames(vector(mode="list", length=length(apollo_inputs$starting_coefficients)), 
                     names(apollo_inputs$starting_coefficients))
  for(i in names(lcpars)){
    lcpars[[i]] <- list()
    for(j in 1:apollo_inputs$kgroups) lcpars[[i]][[j]] <- get(paste0(i, "_c", j))
  }
  ### Class allocation probabilities
  classAlloc_settings = list(
    V = list(class_1 = delta_c1,
             class_2 = delta_c2),
    avail = list(class_1=1, class_2=1)
  )
  lcpars[["pi_values"]] = apollo_classAlloc(classAlloc_settings)
  
  return(lcpars)
}

# ################################################################# #
#### GROUP AND VALIDATE INPUTS                                   ####
# ################################################################# #
apollo_inputs <- list()
apollo_inputs$starting_coefficients <- starting_coefficients
apollo_inputs$kgroups               <- kgroups
apollo_inputs <- apollo_validateInputs(recycle=TRUE)


# ################################################################# #
#### DEFINE MODEL AND LIKELIHOOD FUNCTION                        ####
# ################################################################# #

apollo_probabilities=function(apollo_beta, apollo_inputs, functionality="estimate"){
  
  ### Attach inputs and detach after function exit
  apollo_attach(apollo_beta, apollo_inputs)
  on.exit(apollo_detach(apollo_beta, apollo_inputs))
  
  ### Create list of probabilities P
  P = list()
  
  ### Define settings for MNL model component that are generic across classes
  mnl_settings = list(
    alternatives = c(A=1, B=2, C=3),
    avail        = list(A=1, B=1, C=1),
    choiceVar    = cho
  )
  
  ### Loop over classes
  for (k in 1:apollo_inputs$kgroups){
    
    ### Compute class-specific utilities
    W=list()
    W[['A']]  <- intercept +
      b_chocolate[[k]] * choc_A +
      b_caramel[[k]]   * cara_A +
      b_sultanas[[k]]  * sult_A +
      b_oak[[k]]       * oak_A  +
      b_tobacco[[k]]   * toba_A +
      b_mint[[k]]      * mint_A +
      b_corn[[k]]      * corn_A +
      b_leather[[k]]   * leat_A +
      b_banana[[k]]    * bana_A +
      b_grass[[k]]     * gras_A +
      b_price[[k]]     * pric_A
    W[['B']]  <- intercept +
      b_chocolate[[k]] * choc_B +
      b_caramel[[k]]   * cara_B +
      b_sultanas[[k]]  * sult_B +
      b_oak[[k]]       * oak_B  +
      b_tobacco[[k]]   * toba_B +
      b_mint[[k]]      * mint_B +
      b_corn[[k]]      * corn_B +
      b_leather[[k]]   * leat_B +
      b_banana[[k]]    * bana_B +
      b_grass[[k]]     * gras_B +
      b_price[[k]]     * pric_B
    W[['C']]  <- 0
    
    mnl_settings$utilities <- W
    
    ### Compute within-class choice probabilities using MNL model
    P[[paste0("Class_",k)]] <- apollo_mnl(mnl_settings, functionality)
    
    ### Take product across observation for same individual
    P[[paste0("Class_",k)]] <- apollo_panelProd(P[[paste0("Class_",k)]], apollo_inputs, functionality)
  }
  
  ### Compute latent class model probabilities
  lc_settings  = list(inClassProb = P, classProb=pi_values)
  P[["model"]] = apollo_lc(lc_settings, apollo_inputs, functionality)
  
  ### Prepare and return outputs of function
  P = apollo_prepareProb(P, apollo_inputs, functionality)
  return(P)
}


# ################################################################# #
#### MODEL ESTIMATION                                            ####
# ################################################################# #

### Optional starting values search
estimate_settings = list(writeIter=FALSE)
model = apollo_estimate(apollo_beta, apollo_fixed, apollo_probabilities, apollo_inputs, estimate_settings)

apollo_modelOutput(model)