Fitting mixed-effects models using the MixedModels package for Julia

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Fitting mixed-effects models using the MixedModels package for Julia

Douglas Bates
I sent this message to the [hidden email] mailing list today inviting follow-ups to take place on this group.

As many readers of this list are aware, most of my development effort for the last few years has been in the Julia language, in particular the MixedModels package for Julia.  There are several aspects of the Julia language that allow for writing faster code than in R, especially for iterative fitting of models to large data sets.  The downside for the user of switching to another language is, well, switching to another language.

Some users have taken the plunge and used Julia because the model fits in R using lme4 were taking a long time, as in many hours.  They have seen one to two orders of magnitude differences in speed which, when things are taking that long, is worth the pain of switching.  If the fit in R is only taking a few seconds then it is not worthwhile learning a new language just to make that faster.

I think that performing an lmer-like fit in Julia is now sufficiently straightforward that it will be worthwhile for others to try doing so.  We have developed a Julia package called RCall which allows a Julia user to run an R process from within Julia.  In particular, RCall makes it easy to create a copy of an R data.frame as a Julia DataFrame object and use that to fit a linear mixed model.

The steps are reasonably straightforward.  I will illustrate with data on ratings of  about 3700 movies by about 6000 users.  Of course, not every user rates every movie.  There are about 1,000,000 ratings in the data set.

These data are available as the MovieLens 1M Dataset at http://grouplens.org/datasets/movielens/  and as the ml1m data set in the Timings  R package available at https://github.com/Stat990-033/Timings. That package is not on CRAN because the data sets are too large.  You must install it in R with

install.packages("devtools")
devtools::Install_github("Stat990-033/Timings")

Downloads of Julia itself are at http://julialang.org/downloads/  After installing Julia start it and add the packages

Pkg.add("RCall")
Pkg.add("MixedModels")

The actual model fit is performed as

julia> using DataFrames, MixedModels, RCall

julia> ml1m = rcopy("Timings::ml1m");

julia> @time m1 = fit!(lmm(Y ~ 1 + (1|G) + (1|H), ml1m))
 24.956579 seconds (39.95 M allocations: 1.398 GB, 1.06% gc time)
Linear mixed model fit by maximum likelihood
 logLik: -1331986.005811, deviance: 2663972.011622, AIC: 2663980.011622, BIC: 2664027.274500

Variance components:
           Variance   Std.Dev.  
 G        0.12985210 0.36034996
 H        0.36879694 0.60728654
 Residual 0.81390867 0.90216887
 Number of obs: 1000209; levels of grouping factors: 6040, 3706

  Fixed-effects parameters:
             Estimate Std.Error z value
(Intercept)   3.33902 0.0114624 291.302


The "using" directive is similar to the "library" or "require" functions in R. The named Julia packages are, in R terminology, loaded and attached.

The "Timings::ml1m" expression is an R expression.  It accesses the ml1m object in the Timings package, loading the package first, if necessary.  The call to the Julia function lmm is similar to lmer but only creates the model.  The call to fit! is what causes the model to be fit.

As you can see, this fit takes about 25 seconds.  A similar fit using lmer takes about 40 minutes on the same machine.

I would be happy to answer questions about the MixedModels package but I don't think this forum would be appropriate.  It is a forum for questions about fitting mixed-effects models with R.  For the time being I would suggest asking questions on the Google group called julia-stats
https://groups.google.com/forum/#!forum/julia-stats to which I am sending a copy of this message.

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Fitting mixed-effects models using the MixedModels package for Julia

Douglas Bates
I quoted an incorrect R time.  When I reran the model using the lme4 package for R it converged in 162 seconds elapsed on this machine.  The gain in speed is a factor of 6 not "one or two orders of magnitude"

On Tuesday, February 23, 2016 at 11:45:41 AM UTC-6, Douglas Bates wrote:
I sent this message to the R-SIG-Mixed-Models@R-project.org mailing list today inviting follow-ups to take place on this group.

As many readers of this list are aware, most of my development effort for the last few years has been in the Julia language, in particular the MixedModels package for Julia.  There are several aspects of the Julia language that allow for writing faster code than in R, especially for iterative fitting of models to large data sets.  The downside for the user of switching to another language is, well, switching to another language.

Some users have taken the plunge and used Julia because the model fits in R using lme4 were taking a long time, as in many hours.  They have seen one to two orders of magnitude differences in speed which, when things are taking that long, is worth the pain of switching.  If the fit in R is only taking a few seconds then it is not worthwhile learning a new language just to make that faster.

I think that performing an lmer-like fit in Julia is now sufficiently straightforward that it will be worthwhile for others to try doing so.  We have developed a Julia package called RCall which allows a Julia user to run an R process from within Julia.  In particular, RCall makes it easy to create a copy of an R data.frame as a Julia DataFrame object and use that to fit a linear mixed model.

The steps are reasonably straightforward.  I will illustrate with data on ratings of  about 3700 movies by about 6000 users.  Of course, not every user rates every movie.  There are about 1,000,000 ratings in the data set.

These data are available as the MovieLens 1M Dataset at <a href="http://grouplens.org/datasets/movielens/" style="color:rgb(126,87,194)" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fgrouplens.org%2Fdatasets%2Fmovielens%2F\46sa\75D\46sntz\0751\46usg\75AFQjCNGpkNfWzNin5rIgWjPZzYVM0J7WUQ&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fgrouplens.org%2Fdatasets%2Fmovielens%2F\46sa\75D\46sntz\0751\46usg\75AFQjCNGpkNfWzNin5rIgWjPZzYVM0J7WUQ&#39;;return true;">http://grouplens.org/datasets/movielens/  and as the ml1m data set in the Timings  R package available at <a href="https://github.com/Stat990-033/Timings" style="color:rgb(126,87,194)" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2FStat990-033%2FTimings\46sa\75D\46sntz\0751\46usg\75AFQjCNEh-5PJ-2KGCMPAa9IYW83h_SRchw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2FStat990-033%2FTimings\46sa\75D\46sntz\0751\46usg\75AFQjCNEh-5PJ-2KGCMPAa9IYW83h_SRchw&#39;;return true;">https://github.com/Stat990-033/Timings. That package is not on CRAN because the data sets are too large.  You must install it in R with

install.packages("devtools")
devtools::Install_github("Stat990-033/Timings")

Downloads of Julia itself are at <a href="http://julialang.org/downloads/" style="color:rgb(126,87,194)" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fjulialang.org%2Fdownloads%2F\46sa\75D\46sntz\0751\46usg\75AFQjCNEtjraduQnGnON9KJCGfvyrI523NQ&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fjulialang.org%2Fdownloads%2F\46sa\75D\46sntz\0751\46usg\75AFQjCNEtjraduQnGnON9KJCGfvyrI523NQ&#39;;return true;">http://julialang.org/downloads/  After installing Julia start it and add the packages

Pkg.add("RCall")
Pkg.add("MixedModels")

The actual model fit is performed as

julia> using DataFrames, MixedModels, RCall

julia> ml1m = rcopy("Timings::ml1m");

julia> @time m1 = fit!(lmm(Y ~ 1 + (1|G) + (1|H), ml1m))
 24.956579 seconds (39.95 M allocations: 1.398 GB, 1.06% gc time)
Linear mixed model fit by maximum likelihood
 logLik: -1331986.005811, deviance: 2663972.011622, AIC: 2663980.011622, BIC: 2664027.274500

Variance components:
           Variance   Std.Dev.  
 G        0.12985210 0.36034996
 H        0.36879694 0.60728654
 Residual 0.81390867 0.90216887
 Number of obs: 1000209; levels of grouping factors: 6040, 3706

  Fixed-effects parameters:
             Estimate Std.Error z value
(Intercept)   3.33902 0.0114624 291.302


The "using" directive is similar to the "library" or "require" functions in R. The named Julia packages are, in R terminology, loaded and attached.

The "Timings::ml1m" expression is an R expression.  It accesses the ml1m object in the Timings package, loading the package first, if necessary.  The call to the Julia function lmm is similar to lmer but only creates the model.  The call to fit! is what causes the model to be fit.

As you can see, this fit takes about 25 seconds.  A similar fit using lmer takes about 40 minutes on the same machine.

I would be happy to answer questions about the MixedModels package but I don't think this forum would be appropriate.  It is a forum for questions about fitting mixed-effects models with R.  For the time being I would suggest asking questions on the Google group called julia-stats
<a href="https://groups.google.com/forum/#!forum/julia-stats" style="color:rgb(126,87,194)" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://groups.google.com/forum/#!forum/julia-stats&#39;;return true;" onclick="this.href=&#39;https://groups.google.com/forum/#!forum/julia-stats&#39;;return true;">https://groups.google.com/forum/#!forum/julia-stats to which I am sending a copy of this message.

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: Fitting mixed-effects models using the MixedModels package for Julia

Michael Borregaard
This is great! Maybe worth just noting that the R command is devtools::install_github("Stat990-033/Timings") #'i' in install should not be in caps


On Tue, Feb 23, 2016 at 7:00 PM, Douglas Bates <[hidden email]> wrote:
I quoted an incorrect R time.  When I reran the model using the lme4 package for R it converged in 162 seconds elapsed on this machine.  The gain in speed is a factor of 6 not "one or two orders of magnitude"


On Tuesday, February 23, 2016 at 11:45:41 AM UTC-6, Douglas Bates wrote:
I sent this message to the [hidden email] mailing list today inviting follow-ups to take place on this group.

As many readers of this list are aware, most of my development effort for the last few years has been in the Julia language, in particular the MixedModels package for Julia.  There are several aspects of the Julia language that allow for writing faster code than in R, especially for iterative fitting of models to large data sets.  The downside for the user of switching to another language is, well, switching to another language.

Some users have taken the plunge and used Julia because the model fits in R using lme4 were taking a long time, as in many hours.  They have seen one to two orders of magnitude differences in speed which, when things are taking that long, is worth the pain of switching.  If the fit in R is only taking a few seconds then it is not worthwhile learning a new language just to make that faster.

I think that performing an lmer-like fit in Julia is now sufficiently straightforward that it will be worthwhile for others to try doing so.  We have developed a Julia package called RCall which allows a Julia user to run an R process from within Julia.  In particular, RCall makes it easy to create a copy of an R data.frame as a Julia DataFrame object and use that to fit a linear mixed model.

The steps are reasonably straightforward.  I will illustrate with data on ratings of  about 3700 movies by about 6000 users.  Of course, not every user rates every movie.  There are about 1,000,000 ratings in the data set.

These data are available as the MovieLens 1M Dataset at http://grouplens.org/datasets/movielens/  and as the ml1m data set in the Timings  R package available at https://github.com/Stat990-033/Timings. That package is not on CRAN because the data sets are too large.  You must install it in R with

install.packages("devtools")
devtools::Install_github("Stat990-033/Timings")

Downloads of Julia itself are at http://julialang.org/downloads/  After installing Julia start it and add the packages

Pkg.add("RCall")
Pkg.add("MixedModels")

The actual model fit is performed as

julia> using DataFrames, MixedModels, RCall

julia> ml1m = rcopy("Timings::ml1m");

julia> @time m1 = fit!(lmm(Y ~ 1 + (1|G) + (1|H), ml1m))
 24.956579 seconds (39.95 M allocations: 1.398 GB, 1.06% gc time)
Linear mixed model fit by maximum likelihood
 logLik: -1331986.005811, deviance: 2663972.011622, AIC: 2663980.011622, BIC: 2664027.274500

Variance components:
           Variance   Std.Dev.  
 G        0.12985210 0.36034996
 H        0.36879694 0.60728654
 Residual 0.81390867 0.90216887
 Number of obs: 1000209; levels of grouping factors: 6040, 3706

  Fixed-effects parameters:
             Estimate Std.Error z value
(Intercept)   3.33902 0.0114624 291.302


The "using" directive is similar to the "library" or "require" functions in R. The named Julia packages are, in R terminology, loaded and attached.

The "Timings::ml1m" expression is an R expression.  It accesses the ml1m object in the Timings package, loading the package first, if necessary.  The call to the Julia function lmm is similar to lmer but only creates the model.  The call to fit! is what causes the model to be fit.

As you can see, this fit takes about 25 seconds.  A similar fit using lmer takes about 40 minutes on the same machine.

I would be happy to answer questions about the MixedModels package but I don't think this forum would be appropriate.  It is a forum for questions about fitting mixed-effects models with R.  For the time being I would suggest asking questions on the Google group called julia-stats
https://groups.google.com/forum/#!forum/julia-stats to which I am sending a copy of this message.

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.