**Question About Selection of Correlated Predictor Variables and Model Selection:** How much correlation among independent variables is too much in a GLMM? If I have correlation in the variables, does it affect the interpretation or model selection?

**Answer from a Statistician Friend:** A correlation of 0.8 and above is high; often one variable can be replaced by the other, and both are not necessary in the model. Below 0.7, both variables are typically needed for a good model fit.
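As a rough illustration of that rule of thumb, here is a minimal sketch that computes pairwise Pearson correlations and flags pairs at or above 0.8. It is written in plain Python rather than R so that it runs without any packages; the predictor names and values are made up, and the `flag_correlated` helper is hypothetical, not part of any library.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_correlated(predictors, threshold=0.8):
    """Return (name1, name2, r) for every pair with |r| >= threshold."""
    flagged = []
    for (n1, x1), (n2, x2) in combinations(predictors.items(), 2):
        r = pearson(x1, x2)
        if abs(r) >= threshold:
            flagged.append((n1, n2, round(r, 3)))
    return flagged

# Hypothetical predictors, invented for illustration only.
predictors = {
    "canopy_cover": [10, 20, 30, 40, 50, 60],
    "leaf_litter": [12, 19, 33, 41, 48, 63],  # tracks canopy_cover closely
    "soil_ph": [5.1, 6.3, 4.8, 6.0, 5.5, 5.2],
}

flagged = flag_correlated(predictors)
print(flagged)
```

Only the canopy/litter pair should be flagged here; the threshold is a parameter because, as the answer says, there is no hard cutoff.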

I usually use stepAIC (from the MASS package in R) for model selection.

The difficulty comes in interpreting the regression coefficients: with correlation in the predictor variables, the variable that appears first in the model statement usually gets the larger absolute value, whereas the other variable has a smaller (in absolute value) coefficient. Remember the interpretation of regression coefficients: the change in the response per unit increase GIVEN ALL THE OTHER VARIABLES IN THE MODEL.
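The "given all the other variables" part can be seen in a small sketch. It assumes ordinary least squares with made-up data in which the response is exactly `x1 + x2` and the two predictors are correlated (r of about 0.83). The helper names are hypothetical, and this is plain Python rather than R.

```python
def cross(u, v):
    """Sum of cross-products of u and v about their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def slope_one(x, y):
    """Least-squares slope of y on a single predictor x."""
    return cross(x, y) / cross(x, x)

def slopes_two(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 fitted together."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Made-up data: the response is built as exactly x1 + x2.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]  # correlated with x1
y = [a + b for a, b in zip(x1, x2)]

alone = slope_one(x1, y)        # x1 also absorbs part of x2's effect
b1, b2 = slopes_two(x1, x2, y)  # each slope is conditional on the other
print(alone, b1, b2)
```

Fitted alone, `x1` gets a slope well above 1 because it stands in for the correlated `x2`; fitted together, each slope comes back as 1. The coefficient of a variable only means something relative to what else is in the model.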

If you want coefficients that represent "additive" contributions to the variation in the response (regardless of the order in which predictors appear in the model statement), and if you have considerable multicollinearity, you might want to consider doing a principal component regression with all or perhaps with only a subgroup of similar predictor variables.
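A minimal sketch of the principal component idea, restricted to two predictors so that it needs no linear algebra library: for two standardized variables, the principal components are proportional to their sum and their difference. The data are made up, the helper names are hypothetical, and for more than two predictors you would use a proper eigendecomposition (for example `prcomp` in R) instead of this shortcut.

```python
import math

def standardize(x):
    """Center x and scale it to unit sample standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / sd for a in x]

def cross(u, v):
    """Sum of cross-products of u and v about their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Made-up correlated predictors and response.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3, 3, 7, 7, 11, 11]

z1, z2 = standardize(x1), standardize(x2)

# Principal components of two standardized variables (assumes r != 0).
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]

# The components are uncorrelated with each other.
pc_cross = cross(pc1, pc2)

# Slope of y on each component fitted alone.
alone1 = cross(pc1, y) / cross(pc1, pc1)
alone2 = cross(pc2, y) / cross(pc2, pc2)

# Slopes of y on both components fitted together.
s11, s22, s12 = cross(pc1, pc1), cross(pc2, pc2), pc_cross
s1y, s2y = cross(pc1, y), cross(pc2, y)
det = s11 * s22 - s12 ** 2
joint1 = (s22 * s1y - s12 * s2y) / det
joint2 = (s11 * s2y - s12 * s1y) / det

print(pc_cross, alone1, joint1, alone2, joint2)
```

Because the components are uncorrelated, each slope is the same whether the other component is in the model or not, which is the "additive" property described above. The cost is interpretability: the slopes now refer to combinations of the original predictors rather than to the predictors themselves.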

As with most issues in statistics, there is not a clear-cut hard-fact simple answer. Life would be simpler if there were....

**Question of GLMM Bayesian Approach:** Hey Dan - I'm using GLMM b/c I have a repeated-measures design, count data response (negative binomial distribution), etc. I'm finding admb in R is doing the job - and I read the article you mentioned a few months back, when I started considering GLMMs...

I have never worked with Bayesian stats and wouldn't even know where to begin. Do you have any recommendations for overview reading, and can I analyze a repeated-measures design (i.e., is there a way to cope with random factors)?

**My Response:** My data sounds very similar to yours. I usually use lmer in the lme4 package. Right now I am essentially copying the code in Bolker et al. 2009 from the online supplements to the TREE paper previously mentioned. I have never seen the admb package and will have to check it out. I've tried glmmPQL and glmmML, but there are more examples for lmer and its S-PLUS predecessor. I am annoyed that in Zuur et al., "Mixed Effects Models and Extensions in Ecology with R," they don't spend much time on model assumptions or model comparison. I feel like they show users how to do the analysis but not how to evaluate it. Pinheiro and Bates do a better job in "Mixed-Effects Models in S and S-PLUS," but they focus on linear and non-linear mixed models and less on GLMMs. Plus, the code is similar to R but differs enough that it can be challenging to use at times. The "SAS for Mixed Models" book is good, but SAS isn't free and the code isn't as transparent to me. Plus, it doesn't have good graphics, so I prefer R.

Anyway, Bayesian stats have their own can of worms, but I find them more intuitively appealing and I like the transparency of the code when using WinBUGS (no Mac version) called from R. There are two very good, practical books to get started. McCarthy presents a good overview and introduction to Bayesian stats in "Bayesian Methods for Ecology," but the examples don't get very advanced. Personally, I recommend getting that from the library and reading the first few chapters. I would then buy Marc Kéry's excellent book, "Introduction to WinBUGS for Ecologists." It is very well written and has a wider range of examples that relate to many animal ecology studies. Clark and Gelfand have a decent modeling book with examples of Bayesian analysis in R, but it's more ecosystem/environmentally oriented than animal ecology.

Bayesian analysis treats all factors sort of like random variables from population distributions. Therefore there is no need for an explicit random vs. fixed delineation. You get estimates and credibility intervals for all variables. You can essentially write the same GLMM model and then analyze it in a Bayesian framework. The big difference is in the philosophy behind frequentist vs. Bayesian statistics. Bayesians use prior information (even noninformative priors contain information on the underlying distributions). Some scientists are opposed to this, but for various reasons that I won't go into now, I like it. Some people do want a sensitivity analysis to go along with a Bayesian analysis to determine the influence of the priors. I might go as far as to say that for GLMM-type data Bayesian statistics are more sound (robust?) than frequentist methods, but they differ significantly from a philosophical standpoint.
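To give a feel for what a prior sensitivity check looks like, here is a deliberately simplified sketch: a single Poisson rate with a conjugate Gamma prior, not a full GLMM and not the negative binomial model discussed above. The counts and both priors are made up, the function names are hypothetical, and the intervals come from simple Monte Carlo draws using only Python's standard library.

```python
import random

def gamma_poisson_posterior(counts, shape, rate):
    """Conjugate update: Gamma(shape, rate) prior with Poisson counts."""
    return shape + sum(counts), rate + len(counts)

def summarize(shape, rate, draws=20000, seed=1):
    """Posterior mean and an approximate 95% credible interval."""
    rng = random.Random(seed)
    # gammavariate takes (shape, scale), and scale = 1 / rate.
    sample = sorted(rng.gammavariate(shape, 1.0 / rate) for _ in range(draws))
    lower = sample[int(0.025 * draws)]
    upper = sample[int(0.975 * draws)]
    return shape / rate, lower, upper

# Made-up counts for illustration only.
counts = [3, 0, 5, 2, 4, 1, 6, 3]

# Two hypothetical priors: one vague, one informative (prior mean of 10).
vague = gamma_poisson_posterior(counts, shape=0.01, rate=0.01)
informative = gamma_poisson_posterior(counts, shape=20.0, rate=2.0)

for label, (shape, rate) in [("vague", vague), ("informative", informative)]:
    mean, lower, upper = summarize(shape, rate)
    print(label, round(mean, 2), round(lower, 2), round(upper, 2))
```

With only eight observations, the informative prior pulls the posterior mean noticeably away from the sample mean of 3, while the vague prior leaves it almost unchanged. Comparing results under a few plausible priors like this is the basic idea of a sensitivity analysis.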

Anyway, I hope that helps.

I'm looking at GUDs (giving up density) with habitat type, repeated measures, and overdispersion. I was handed the data and asked to have the analysis done at the end of the day and your post actually gives me some good insight. I would go Bayes for every analysis if I could articulate the code. Thanks for your post.

Hi Jeff. Thanks, I'm glad the post could be helpful to some extent. Now that I've been doing GLMM and Hierarchical Modeling for a few years, I should write a more informative post. I have been using a Bayesian Framework for a lot of analyses recently, mostly coded in JAGS. Pros and cons but at least the code is intuitive.

This is an old post and I have now moved my blog to: https://danieljhocking.wordpress.com/quantitative-ecology-blog/

Best of luck with your analyses