Calling glm with a formula when variable names are only known at run-time

colintbowers
Hi all,

I'm trying to call glm to perform OLS on some data in a DataFrame. However, the variable names (i.e. column names) in the DataFrame are only known at run-time, so I'm not sure how to construct the formula input to the glm function. For example, my function might determine at run-time that it wants to regress column 1 of the DataFrame on columns 3 and 4, without knowing what those columns are called until then. Obviously my function could construct an ASCIIString representation of the appropriate formula from the column headers, e.g. "column1Name ~ column3Name + column4Name", but the glm function will not accept an ASCIIString as the formula argument.

I'm sure there is a simple way around this, but I couldn't work it out from the GLM docs. I can't even work out what type the formula input to glm is. If someone could just point me to the relevant constructor for that type, I imagine it shouldn't be too hard to come up with a routine that converts an ASCIIString representation of the formula to the appropriate type.

Any help would be greatly appreciated.

Cheers,

Colin

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: Calling glm with a formula when variable names are only known at run-time

Douglas Bates
On Sunday, May 29, 2016 at 1:07:42 AM UTC-5, [hidden email] wrote:

Constructing a formula on the fly requires you to learn a bit about the structure of the formula itself.

julia> ff = foo ~ bar + baz
Formula: foo ~ bar + baz 

julia> fieldnames(ff)
2-element Array{Symbol,1}:
 :lhs
 :rhs

julia> typeof(ff.lhs)
Symbol

julia> ff.lhs
:foo

Suppose instead that you want to have "fab" on the left-hand side.  You can simply reassign the .lhs field to the corresponding symbol:

julia> ff.lhs = symbol("fab")
:fab

julia> ff
Formula: fab ~ bar + baz

The right-hand side is a bit more complicated in that it is an expression.

julia> ff.rhs
:(bar + baz)

julia> typeof(ff.rhs)
Expr

julia> fieldnames(ff.rhs)
3-element Array{Symbol,1}:
 :head
 :args
 :typ

julia> ff.rhs.args
3-element Array{Any,1}:
 :+
 :bar
 :baz

Now it happens that the + function can take an arbitrary number of arguments, so I can change the formula to "fab ~ 1 + baz + box" with

julia> ff.rhs.args = Any[:+, 1, :baz, :box]
4-element Array{Any,1}:
  :+
 1
  :baz
  :box

julia> ff
Formula: fab ~ 1 + baz + box 
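
For the run-time case in the original question, the same two pieces can be assembled from column names directly. What follows is only a minimal sketch in Base Julia on a current release: `rhs_expr` and the column names are hypothetical, and no DataFrames or GLM function is called. How the result is handed to a Formula depends on the package version in use, so that step is left as a comment.

```julia
# Build the left- and right-hand sides of a formula from column names
# that are only known at run time. Base Julia only.

# Hypothetical helper: combine predictor names into a single `+` call,
# mirroring the `ff.rhs.args` layout shown in the transcript above.
function rhs_expr(preds::Vector{Symbol})
    isempty(preds) && throw(ArgumentError("need at least one predictor"))
    length(preds) == 1 && return preds[1]   # a lone predictor is just a Symbol
    return Expr(:call, :+, preds...)        # head :call, args [:+, preds...]
end

# Stand-in for the column names of a DataFrame (assumed, not read from real data).
cols = [:column1Name, :column2Name, :column3Name, :column4Name]

lhs = cols[1]                 # regress column 1 ...
rhs = rhs_expr(cols[[3, 4]])  # ... on columns 3 and 4

# With a formula object like `ff` above, these would presumably be assigned
# to its .lhs and .rhs fields, as done by hand in the transcript.
println(lhs, " ~ ", rhs)
```

Column names that arrive as strings can be converted first, for example with `map(Symbol, strs)`.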

Does this help?

Re: Calling glm with a formula when variable names are only known at run-time

colintbowers
That is extremely helpful, thank you! And I can see from this that I don't need to bother messing around with an ASCIIString formula - I can just create the Formula type directly myself.

Out of curiosity, is anyone working on a general routine to convert an ASCIIString formula to a Formula type? Your example makes it clear that it is fairly trivial for a linear additive formula, but I can see how it could get complicated very quickly for more general formula types...
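
For the simple case, one possible starting point is to let Julia's own parser do the work. This is a sketch, not an existing GLM or DataFrames routine. It assumes a current Julia release, where `Meta.parse` is available and `a ~ b` parses as an ordinary call to `~`; releases from the time of this thread may parse it differently. `split_formula` is a hypothetical name.

```julia
# Split a formula given as a string into its left- and right-hand sides.
# The string is only parsed, never evaluated.
function split_formula(str::AbstractString)
    ex = Meta.parse(str)
    # Expect exactly: Expr(:call, :~, lhs, rhs)
    if !(ex isa Expr && ex.head == :call && length(ex.args) == 3 && ex.args[1] == :~)
        throw(ArgumentError("expected a formula of the form `lhs ~ rhs`"))
    end
    return ex.args[2], ex.args[3]
end

lhs, rhs = split_formula("column1Name ~ column3Name + column4Name")
println(lhs)   # the response, as a Symbol
println(rhs)   # the predictors, as a `+` call expression
```

Because the parser handles nesting and operator precedence, the right-hand side comes back as a regular expression tree even for more involved formulas; whether a given formula type accepts every such tree is a separate question.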

Cheers, and thanks again.

Colin


On Monday, 30 May 2016 01:12:24 UTC+10, Douglas Bates wrote: