Hello,
I'm just starting with Julia today and I've coded a simple algorithm to demean the columns of a DataFrame with respect to multiple high-dimensional fixed effects (where groups are potentially defined by multiple columns). The algorithm simply returns a new DataFrame with the partialled-out columns, which one may then use in a simple OLS. This corresponds to a very basic version of the felm function in the R package lfe.

I've tried to minimize copies (using subdataframes) and to return residuals aligned with the original DataFrame (in case of NAs). Since this is my first experience with Julia, I'd welcome any kind of feedback.

A couple of beginner questions:

- Is `copy` the best way to keep the previous result in [this iteration loop](https://github.com/matthieugomez/FixedEffects.jl/blob/master/src/fixedeffects.jl#L60)?
- What's the best way to add a subset argument to my function? I'd like this argument to let the user estimate the model and return the residuals on a subset of the DataFrame only.

You received this message because you are subscribed to the Google Groups "julia-stats" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. For more options, visit https://groups.google.com/d/optout.
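For readers following along, here is a minimal sketch of the kind of iteration loop the `copy` question is about. It is not the code in the linked repository: the function names (`demean_by!`, `demean_sketch`), the integer group ids, and the tolerance are all assumptions made for illustration, and it is written for current Julia (where `copyto!` replaces the 2015-era `copy!`). It demeans by alternating over the fixed effects and keeps the previous iterate in a preallocated buffer instead of calling `copy` on every pass.

```julia
# Subtract the group means of `x` in place.
# `groups[i]` is the integer group id (1..ngroups) of row i.
function demean_by!(x::Vector{Float64}, groups::Vector{Int}, ngroups::Int)
    sums = zeros(ngroups)
    counts = zeros(Int, ngroups)
    for i in eachindex(x)
        sums[groups[i]] += x[i]
        counts[groups[i]] += 1
    end
    for i in eachindex(x)
        x[i] -= sums[groups[i]] / counts[groups[i]]
    end
    return x
end

# Hypothetical sketch: demean `x` with respect to several fixed effects by
# sweeping over them repeatedly until the result stops changing.
function demean_sketch(x::Vector{Float64}, factors::Vector{Vector{Int}};
                       tol::Float64 = 1e-8, maxiter::Int = 1000)
    res = copy(x)        # one allocation, so the caller's vector is untouched
    old = similar(res)   # buffer holding the previous iterate
    for iter in 1:maxiter
        copyto!(old, res)   # overwrite the buffer instead of `old = copy(res)`
        for g in factors
            demean_by!(res, g, maximum(g))
        end
        # largest absolute change since the previous sweep
        maximum(abs(r - o) for (r, o) in zip(res, old)) < tol && break
    end
    return res
end
```

With a single fixed effect the loop settles after the first sweep; with several, the number of sweeps depends on how the groups overlap. Reusing `old` only saves one allocation per sweep, so `copy` is not wrong, just slightly more wasteful on large columns.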
Just an FYI: when you export the demean function, it enters the namespace as soon as the user writes `using`. This means that you can just write demean(...); there is no need to write Package.Function(args...).

I take it you are an econometrics student of some sort; feel free to hit me up if you have any future projects you need help with. Stuff like this is nice to have if we want Julia to enter exercise classes at universities. Good to have you on board!

Patrick

On Wednesday, June 10, 2015 at 12:35:53 AM UTC+2, Matthieu wrote:
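A small illustration of the export point above, using a made-up module (`DemoFixedEffects`, `demean_col` and `scale_col` are hypothetical names, not part of the package under discussion): exported names are available unqualified after `using`, while everything else still needs the module prefix.

```julia
module DemoFixedEffects   # hypothetical module, for illustration only

export demean_col         # exported: visible unqualified after `using`

# Subtract the mean from a vector.
demean_col(x) = x .- sum(x) / length(x)

# Not exported: callers must write DemoFixedEffects.scale_col(...).
scale_col(x) = x ./ maximum(abs, x)

end # module

using .DemoFixedEffects   # the leading dot refers to a module defined locally

centered = demean_col([1.0, 2.0, 3.0])               # no prefix needed
scaled = DemoFixedEffects.scale_col([1.0, -2.0])     # prefix required
```

For an installed package the line would be `using PackageName` without the leading dot.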
In reply to this post by Matthieu
Thanks.
The current version of the package now estimates models with instrumental variables (2SLS), high-dimensional fixed effects, and White / clustered standard errors. This covers a large part of the models used in applied economics research. Moreover, this function seems faster than the corresponding Stata and R functions (areg and lfe, respectively), in particular for models with one high-dimensional fixed effect.

Two more points make this function differ from the lm function in GLM:

1. The regression result object is very light (basically the initial formula, a vector of coefficients, and a covariance matrix). In contrast, since the output of GLM contains the original DataFrame, the converted matrix of regressors, the model response, etc., it can actually take much more space than the initial DataFrame. I have chosen to return a light object because it allows estimating multiple models without requiring more RAM at every step. Methods such as predict and residual can still be defined as long as the user provides a DataFrame.

2. The function has an argument that changes the way errors are computed. In R, correct errors are generally estimated in a second step, through a different package like vcov, multiwayvcov. This strikes me as inefficient and counterintuitive.

   I've defined an abstract type AbstractVcov. Any user can define a new type (a child of this abstract type), as long as they define a method, vcov, that acts on a regressor matrix (X), a hat matrix (X'X in the simple case), and a vector of residuals. This seems enough to define a wide range of standard errors. I've only defined 3 types (simple, white, clustered).

For instance, to estimate a model with White robust standard errors:

    reg(formula, df, VceWhite())

To estimate a model with clustered standard errors:

    reg(formula, df, VceCluster(:clustervar))
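To make the AbstractVcov design described above concrete, here is a rough sketch of how dispatch on a covariance type could look. Only the name `AbstractVcov` and the (X, X'X, residuals) inputs come from the post; `VcovSimpleSketch`, `VcovWhiteSketch`, `vcov_sketch` and `reg_sketch` are hypothetical stand-ins and are not the package's actual API. The syntax is current Julia (`abstract type ... end`; in 2015 this was written `abstract AbstractVcov`), and the White estimator shown is the plain HC0 form without any small-sample correction.

```julia
using LinearAlgebra

abstract type AbstractVcov end                 # name taken from the post

struct VcovSimpleSketch <: AbstractVcov end    # hypothetical concrete types
struct VcovWhiteSketch <: AbstractVcov end

# Each estimator is one method acting on X, X'X and the residuals.
function vcov_sketch(::VcovSimpleSketch, X::AbstractMatrix, XtX::AbstractMatrix,
                     r::AbstractVector)
    n, k = size(X)
    return inv(XtX) * (sum(abs2, r) / (n - k))   # s^2 (X'X)^-1
end

function vcov_sketch(::VcovWhiteSketch, X::AbstractMatrix, XtX::AbstractMatrix,
                     r::AbstractVector)
    bread = inv(XtX)
    meat = X' * (X .* r .^ 2)                    # X' diag(r^2) X
    return bread * meat * bread                  # HC0 sandwich
end

# Hypothetical fitting function returning a light result:
# coefficients and covariance matrix only, no copy of the data.
function reg_sketch(X::AbstractMatrix, y::AbstractVector,
                    v::AbstractVcov = VcovSimpleSketch())
    XtX = X' * X
    beta = XtX \ (X' * y)
    r = y - X * beta
    return (coef = beta, vcov = vcov_sketch(v, X, XtX, r))
end
```

Under this layout, a user adding a new kind of standard error would define one new subtype and one new `vcov_sketch` method, without touching `reg_sketch`. A clustered variant would carry the cluster variable as a field of its type, which is presumably why `VceCluster(:clustervar)` takes an argument.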
On Wednesday, June 24, 2015 at 09:25 -0700, Matthieu wrote:
> Thanks.
>
> The current version of the package now estimates models with
> instrumental variables (2SLS), high dimensional fixed effects, and
> white / clustered standard errors. This allows to estimate a large
> part of models used in applied economics research. Moreover, this
> function seems faster than Stata and R corresponding functions
> (respectively areg / lfe), in particular for models with one high
> dimensional fixed effect.

I'm not very familiar with these models, but that looks really nice. Have you considered using the fit() function with a model type to be more similar to GLM.jl?

> Two more points make this function differ from the lm function in
> GLM:
>
> 1. The regression result object is very light (basically the initial
> formula, a vector of coefficients, and a covariance matrix). In
> contrast, since the output of GLM contains the original dataframe,
> the converted matrix of regressors, the model response etc, the
> output from GLM can actually take much more space than the initial
> DataFrame.
> I have chosen to return a light object because it allows to estimate
> multiple models without requiring more RAM at every step. Methods
> such as predict and residual can be defined as long as the user
> provides a DataFrame

[...] wouldn't make any sense to try saving all of the data with the model. We could imagine adding an argument to keep a copy of the data, if it turns out that's needed. I think the only case where having the data in the model object is useful is when calling predict(). Maybe it would be possible to save just the name of the data frame, and use it if it's in scope?

> 2. The function has an argument that allows to change the way errors
> are computed. In R, correct errors are generally estimated in a
> second step, through a different package like vcov, multiwayvcov.
> This strikes me as inefficient and counterintuitive.
>
> I've defined an abstract type AbstractVcov. Any user can define a new
> type (child of this abstract type), as long as he/she defines a
> method, vcov, that acts on a regressor matrix (X), a hat matrix (X'X
> in the simple case), and a vector of residuals. This seems enough to
> define a wide range of standard errors.
>
> I've only defined 3 types (simple, white, clustered).
> For instance, to estimate a model with white robust standard errors
> reg(formula, df, VceWhite())
>
> To estimate a model with clustered standard errors
> reg(formula, df, VceCluster(:clustervar))

[...] https://github.com/JuliaStats/GLM.jl/issues/42

Do you have any ideas about how to handle bootstrap in the same framework?

Regards
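As a rough picture of what "fit() with a model type" means, the sketch below passes the model type as the first argument and lets dispatch pick the estimator. It deliberately does not use GLM.jl or StatsBase: `fit_sketch` and `OLSSketch` are hypothetical names, and the real `fit` signatures differ from package to package.

```julia
using LinearAlgebra

abstract type RegressionSketch end       # hypothetical supertype

struct OLSSketch <: RegressionSketch
    coef::Vector{Float64}                # light result: coefficients only
end

# The model type is the first argument; dispatch on it selects the estimator.
function fit_sketch(::Type{OLSSketch}, X::AbstractMatrix, y::AbstractVector)
    return OLSSketch(X \ y)              # least-squares solution
end

X_demo = [1.0 0.0; 1.0 1.0; 1.0 2.0; 1.0 3.0]
y_demo = [1.0, 3.0, 2.0, 5.0]
ols_demo = fit_sketch(OLSSketch, X_demo, y_demo)
```

A fixed-effects or IV estimator would then be another subtype with its own `fit_sketch` method, so all models share one entry point.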
Thanks! I'm glad you also think standard errors should be an argument of the fit function! I have considered using the fit function, but I don't really understand what the first argument is supposed to be: the syntax is very different between, say, GLM, MixedModels, and NLreg (https://github.com/JuliaStats/StatsBase.jl/issues/116).

On Wed, Jun 24, 2015 at 1:19 PM, Milan Bouchet-Valat <[hidden email]> wrote:
In reply to this post by Matthieu
I'll have a look at the updates later. Would you be against having the standard errors as a keyword argument instead, with some default (sandwich, or whatever)? The "standard" way (or at least how a lot of people seem to be doing it) is to have

    reg(formula, df; se = :sandwich)

so you would run reg(y ~ x + z, df) for the default, and reg(y ~ x + z, df; se = :my_custom_se) for some other standard-error method. You would have to do the clustering a bit differently, but I think you get the idea. See Optim.jl or QuantileRegression.jl for what I mean (they have "method" keywords).

On Wednesday, June 24, 2015 at 6:25:42 PM UTC+2, Matthieu wrote:
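For comparison with the type-based approach, here is a minimal sketch of the keyword variant suggested above. The function name `reg_kw_sketch` and the accepted symbols (`:simple`, `:white`) are assumptions for illustration; it works on a plain matrix rather than a formula and DataFrame to stay self-contained.

```julia
using LinearAlgebra

# Hypothetical sketch: the standard-error method is chosen by a keyword
# with a default, instead of by a positional argument of some vcov type.
function reg_kw_sketch(X::AbstractMatrix, y::AbstractVector; se::Symbol = :simple)
    XtX = X' * X
    beta = XtX \ (X' * y)
    r = y - X * beta
    bread = inv(XtX)
    V = if se === :simple
        bread * (sum(abs2, r) / (size(X, 1) - size(X, 2)))
    elseif se === :white
        bread * (X' * (X .* r .^ 2)) * bread     # HC0 sandwich
    else
        throw(ArgumentError("unknown se option: $se"))
    end
    return (coef = beta, vcov = V)
end
```

One trade-off visible even in this small version: a bare Symbol cannot carry extra information such as the cluster variable, and adding a new method means editing the `if` chain, whereas a type-based argument can hold fields and be extended by adding a method.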
Ah, now I see what you did (I looked in the repo): VceWhite() is a constructor. I thought it was the vcov function for White standard errors :)

On Friday, June 26, 2015 at 11:55:59 AM UTC+2, Patrick Kofod Mogensen wrote: