Is there a way to use values in a DataFrame directly in computation?


Min-Woong Sohn
I am using DataFrames from the master branch (with NullableArrays as the default) and was wondering how the following should be done:

df = DataFrame()
df[:A] = NullableArray([1,2,3])

The following are not allowed or return wrong values:

df[1,:A] == 1   # false
df[1,:A] > 1     # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
df[3,:A] + 1     # MethodError: no method matching +(::Nullable{Int64}, ::Int64)

How should I get around these issues? Does anybody know if there is a plan to support these kinds of computations directly?

Re: Is there a way to use values in a DataFrame directly in computation?

Milan Bouchet-Valat
On Monday, 3 October 2016 at 08:21 -0700, Min-Woong Sohn wrote:

> I am using DataFrames from master branch (with NullableArrays as the default) and was wondering how the following should be done:
>
> df = DataFrame()
> df[:A] = NullableArray([1,2,3])
>
> The following are not allowed or return wrong values:
>
> df[1,:A] == 1   # false
> df[1,:A] > 1     # MethodError: no method matching isless(::Int64, ::Nullable{Int64})
> df[3,:A] + 1     # MethodError: no method matching +(::Nullable{Int64}, ::Int64)
>
> How should I get around these issues? Does anybody know if there is a
> plan to support these kinds of computations directly?
These operations currently work (after loading NullableArrays) if you
rewrite 1 as Nullable(1), e.g. df[1, :A] == Nullable(1). But the first
two return a Nullable{Bool}, so you need to call get() on the result
if you want to use it, e.g. in an if. As an alternative, you can use
isequal(), which returns a plain Bool.
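
As a minimal sketch of these workarounds, assuming Julia 0.5 with NullableArrays.jl installed (the setup used in this thread), and with a bare Nullable(1) standing in for df[1, :A]:

```julia
# Assumption: Julia 0.5 with NullableArrays.jl, which supplies the lifted
# comparison operators mentioned above. x stands in for df[1, :A].
using NullableArrays

x = Nullable(1)

# Comparing two Nullables gives a Nullable{Bool}, not a Bool.
eq = x == Nullable(1)

# Unwrap with get() before branching; get() throws if the value is null.
if get(eq)
    println("x equals 1")
end

# isequal() returns a plain Bool, so it can go straight into an `if`.
same = isequal(x, Nullable(1))

# For arithmetic, unwrap first and then compute on the ordinary Int.
y = get(x) + 1
```

Since get() raises a NullException on a missing value, real code would probably check isnull(x) first or supply a default, as in get(x, 0).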

There are ongoing discussions about whether mixing Nullable and scalars
should be allowed, as well as whether these operations should be moved
into Julia Base. See in particular
https://github.com/JuliaStats/NullableArrays.jl/pull/85
https://github.com/JuliaLang/julia/pull/16988

Anyway, the best approach to working with data frames is probably to use
frameworks like AbstractQuery.jl and Query.jl, which are not yet
completely ready to handle Nullable, but should make this easier.


Regards

Re: Is there a way to use values in a DataFrame directly in computation?

Alex Mellnik
In reply to this post by Min-Woong Sohn
This is why, IMHO, Nullables are a mess at the moment.  You have to either get(df[1,:A]) or otherwise extract the actual value, since very few things handle Nullables out of the box.  



Re: Is there a way to use values in a DataFrame directly in computation?

Min-Woong Sohn
Thank you. I fear that Nullables will make DataFrames very difficult to use and turn many people away from Julia.




Re: Is there a way to use values in a DataFrame directly in computation?

John Myles White
I think the core problem is that the current API plus Nullables is very cumbersome, but the switch to Nullables will hopefully occur nearly simultaneously with the introduction of new APIs that make Nullables much easier to deal with. David Gold spent the summer working on one approach that is, I think, much better than the current API; David Anthoff also has another approach that is substantially more powerful than the current API. The time between 0.5 and 0.6 may be a little chaotic in this regard, but I think the eventual results will be unequivocally worth the wait.

 -- John


Re: Is there a way to use values in a DataFrame directly in computation?

Michael Borregaard

This is good news, and I am holding my breath for this to be successful! As someone from a data-rich science (Ecology), I find that a really good way of interacting directly with data is make-or-break for whether I will be able to persuade my colleagues to make the shift to Julia.

RE: Is there a way to use values in a DataFrame directly in computation?

David Anthoff
In reply to this post by John Myles White

Query.jl does not aim to make working with Nullables easier. The package provides querying capabilities and is specifically designed to simply pick up whatever support for Nullables there is in Julia Base. Right now, as a temporary measure, Query.jl defines lots of methods for functions like the arithmetic operators (+ etc.) on Nullables. Without those definitions the package would be close to unusable (in the same way that DataFrames is close to unusable right now). But I really hope to move these methods out of Query.jl. They don’t belong in that package; they should be in Base (the approach here is what David Gold called the “method extension lifting approach”).

I feel strongly that “pushing” the problem of how to deal with Nullables into querying packages is not the right strategy. Instead, I would much prefer to see better support for Nullables generally, which packages like Query.jl can then pick up. There are too many situations where using a query package is overkill but where you will still encounter Nullables (especially now that DataFrames is based on NullableArrays). If we ask folks to use query packages in all of these cases, we will have created a conceptually clean but completely impractical system, IMHO. For example, I think the examples from the original email simply need to work before the new DataFrames is tagged. Those kinds of operations are so common that I would find it completely impractical to ask folks to use something like Query.jl in such a situation.

I think the path to this is pretty simple: all we have to do is add methods for the common arithmetic operators that work on Nullable types. That is the approach that C# took, and it works really well. Maybe add some methods for Strings too. I think once that is covered, most of the common use cases are dealt with and the system would work well in practice. Those new methods could be added to the Julia master branch now and then be backported to Julia 0.5. Once that is done, DataFrames could be merged.
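
A minimal sketch of what one such lifted method could look like, assuming Julia 0.5, where Nullable lives in Base. It covers only Nullable{Int} plus Int and is purely illustrative; it is not the definition that Base, NullableArrays or Query.jl actually uses:

```julia
# Assumption: Julia 0.5, where Nullable, isnull and get are part of Base.
import Base: +

# Lifted addition: a null input gives a null result; otherwise the
# wrapped value is added and the sum is wrapped again.
+(x::Nullable{Int}, y::Int) = isnull(x) ? Nullable{Int}() : Nullable(get(x) + y)

a = Nullable(3)       # a present value
b = Nullable{Int}()   # a missing value

present = a + 1       # Nullable{Int} holding the sum
missing_sum = b + 1   # still null, no exception raised
```

Defining methods on Base types from user code like this is done here only to show the idea; the argument in this thread is that such methods belong in Base itself, so that every package sees the same behavior.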

 

Best,

David