Dimension Mismatch problem when predicting from the regression with pooled variables

Jessica Koh
Hi all,

I am having a problem predicting from a regression with pooled variables. By pooled variables I mean the ones created with the pool() function. I pooled the variable by group (36 groups in total), so including the pooled variable in the regression automatically fits indicators for the 36 groups (some will be dropped due to collinearity). My current code looks something like this:

# Create pooled data array from group_index column 
sampledata[:group_pooled] = pool(sampledata[:group_index])

# Run regression 
IPW_treat_fml = Formula(:attr_treat, :group_pooled)
IPW_treat_reg = glm(IPW_treat_fml, sampledata, Normal(), IdentityLink())

# Predict
predict(IPW_treat_reg, sampledata)


However, predict(IPW_treat_reg, sampledata) does not work and fails with the error DimensionMismatch("second dimension of A, 36, does not match length of x, 35"). If I call predict(IPW_treat_reg) instead, the code works, but it drops all the NA results. I need to pass sampledata to predict in order to see the NA predictions as well.
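For illustration, an error of this shape is what a matrix-vector product raises when the matrix has 36 columns and the vector has 35 entries. The sketch below is plain Julia (current syntax, no packages) and only reproduces the shape of the failure. Reading A as the design matrix and x as the coefficient vector is an assumption, the row count is made up, and this is not the actual GLM code path.

```julia
# Hypothetical stand-ins: a design matrix with 36 columns and a
# coefficient vector with only 35 entries (sizes taken from the error text).
X = ones(4, 36)
beta = ones(35)

# Capture the exception instead of letting it abort the script.
err = try
    X * beta
    nothing
catch e
    e
end

println(typeof(err))
```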

Any help will be greatly appreciated! 





--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: Dimension Mismatch problem when predicting from the regression with pooled variables

Jessica Koh
Okay, so I came up with a temporary solution for this.

I think predict(IPW_treat_reg, sampledata) does not work in this case because, as I said, some of the pooled indicators are dropped due to collinearity. predict(IPW_treat_reg) works, although it only returns predictions for the rows where the dependent variable is not NA. I don't need predictions for the rows where the dependent variable is NA, so I decided to do the following.

sampledata[:predict] = 0.0   # Make sure that it is a float!

preds = predict(IPW_treat_reg)   # Predictions for the non-NA rows only; computed once, outside the loop
p_index = 1   # Index into preds. Going to be increased in the loop.

for i in 1:length(sampledata[:attr_treat])
    if !isna(sampledata[i, :attr_treat])
        # Non-NA row: take the next prediction in order
        sampledata[i, :predict] = preds[p_index]
        p_index = p_index + 1
    else
        sampledata[i, :predict] = NA
    end
end

The code above creates a new column called "predict" in sampledata that holds NA where the dependent variable is NA and the predicted value everywhere else.
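The same alignment can probably be done without an explicit loop by using a logical mask. Below is a minimal sketch in plain Julia using current syntax (missing rather than NA). The vectors y and fitted_values are made-up stand-ins for sampledata[:attr_treat] and predict(IPW_treat_reg), and it assumes, as the loop does, that rows are dropped only because the dependent variable is missing. If a predictor can also be missing, the mask would need to account for that too.

```julia
# Hypothetical stand-ins for the dependent variable and the model's
# predictions (one prediction per non-missing row, in row order).
y = [1.0, missing, 0.0, missing, 1.0]
fitted_values = [0.9, 0.1, 0.8]

# true where the dependent variable is observed
observed = .!ismissing.(y)

# Guard: the mask must select exactly as many rows as there are predictions.
@assert count(observed) == length(fitted_values)

# Start with an all-missing column, then fill the observed rows in order.
pred = Vector{Union{Missing, Float64}}(missing, length(y))
pred[observed] = fitted_values

println(pred)
```

The count check is there so that a mismatch between the mask and the predictions fails loudly instead of silently misaligning rows.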

Let me know if there is an easier way to do this!

On Tuesday, May 31, 2016 at 11:19:02 AM UTC-5, Jessica Koh wrote: