reshaping dataframe to make rows columns?

Reuben
Seems like this should be easy, and probably is, but I cannot figure out how to do it with the melt / stack functions. The reason I want to: with the data I have, I search for the row where the string in column 1 == myparameter, then get the values for columns 2-6, clumsily extract them into an array (because a DataFrame row is not a DataArray), and then apply the mean function to that array. It would be a lot simpler to have column names based on my parameter values. Then I could say "mean(df[:parameter][2:6])" and be done.

I suspect I am missing how to use DataFrames to make this easy; can someone point me in the right direction?

-Reuben
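For the original goal, reshaping may not be needed at all: a single row can be averaged directly by indexing its cells. A minimal sketch, assuming a recent DataFrames.jl (1.x, whose API differs from the 2016 one used in this thread) and a hypothetical table whose first column `param` holds the parameter names and whose columns 2-6 hold the values (all names and numbers are made up for illustration):

```julia
using DataFrames, Statistics

# Hypothetical data: a name column followed by five value columns
df = DataFrame(param = ["alpha", "beta"],
               v1 = [1.0, 10.0], v2 = [2.0, 20.0], v3 = [3.0, 30.0],
               v4 = [4.0, 40.0], v5 = [5.0, 50.0])

# Find the row whose first column matches the wanted parameter
# (findfirst returns `nothing` if there is no match)
i = findfirst(==("beta"), df.param)

# Average columns 2:6 of that row directly; no manual array copy needed
rowmean = mean(df[i, c] for c in 2:6)

# The same idea applied to every row at once
allmeans = [mean(r[c] for c in 2:6) for r in eachrow(df)]
```

Here `rowmean` is the mean for "beta" and `allmeans` holds one mean per row, in row order.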

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: reshaping dataframe to make rows columns?

Andreas Noack
Would the by function help here? For example, something like:

julia> df = DataFrame(ct = vcat(fill("A", 3), fill("B", 5)), x = randn(8))
8×2 DataFrames.DataFrame
│ Row │ ct  │ x         │
├─────┼─────┼───────────┤
│ 1   │ "A" │ -1.17715  │
│ 2   │ "A" │ 0.781145  │
│ 3   │ "A" │ 0.74948   │
│ 4   │ "B" │ -1.88212  │
│ 5   │ "B" │ 1.30658   │
│ 6   │ "B" │ -0.578074 │
│ 7   │ "B" │ -0.710504 │
│ 8   │ "B" │ -1.98858  │

julia> by(df, :ct, d -> mean(d[:x]))
2×2 DataFrames.DataFrame
│ Row │ ct  │ x1        │
├─────┼─────┼───────────┤
│ 1   │ "A" │ 0.117825  │
│ 2   │ "B" │ -0.770539 │
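The `by` function shown above no longer exists in current DataFrames.jl releases. Assuming a 1.x release, the same split-apply-combine step would look roughly like this; fixed values replace `randn` so the result is reproducible, and `sort=true` is passed to `groupby` so the group order is deterministic:

```julia
using DataFrames, Statistics

# Same shape as the example above, with fixed values instead of randn(8)
df = DataFrame(ct = vcat(fill("A", 3), fill("B", 5)),
               x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

# Group by :ct, then reduce :x with mean into a column named :x_mean
res = combine(groupby(df, :ct; sort = true), :x => mean => :x_mean)
```

`res` has one row per group, with columns `ct` and `x_mean`.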

In reply to Reuben's post of Sun, Sep 4, 2016 at 3:49 PM.

Re: reshaping dataframe to make rows columns?

Reuben
In reply to this post by Reuben
That isn't exactly what I'm looking for. What I'm trying to do is transform something like the first table here into the second:

│ Row │ ct  │ x         │
├─────┼─────┼───────────┤
│ 1   │ "A" │ -1.17715  │
│ 2   │ "B" │ 0.781145  │
│ 3   │ "C" │ 0.74948   │
│ 4   │ "D" │ -1.88212  │
│ 5   │ "E" │ 1.30658   │

│ Row │ A        │ B        │ C       │ D        │ E       │
├─────┼──────────┼──────────┼─────────┼──────────┼─────────┤
│ 1   │ -1.17715 │ 0.781145 │ 0.74948 │ -1.88212 │ 1.30658 │
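This particular reshape is a plain transpose. Assuming a recent DataFrames.jl (1.x), `permutedims` can do it when given the column whose values should become the new column names. A sketch using the values from the first table:

```julia
using DataFrames

# The long table from the question
long = DataFrame(ct = ["A", "B", "C", "D", "E"],
                 x = [-1.17715, 0.781145, 0.74948, -1.88212, 1.30658])

# Values of :ct become column names; the new first column (also named "ct")
# holds the old column name, i.e. "x"
wide = permutedims(long, :ct)
```

If the leading `ct` column is unwanted, `select(wide, Not(:ct))` drops it.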



Re: reshaping dataframe to make rows columns?

Michael Borregaard
I wonder if the functions are broken. This:
using DataFrames, RDatasets
iris = dataset("datasets", "iris")
iris[:id] = 1:size(iris, 1)
longdf = melt(iris, :id)
widedf = unstack(longdf, :id, :variable, :value)

from http://dataframesjl.readthedocs.io/en/latest/reshaping_and_pivoting.html fails with
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Float64
This may have arisen from a call to the constructor Float64(...),
since type constructors fall back to convert methods.
 in setindex!(::DataArrays.DataArray{Float64,1}, ::String, ::Int64) at /Users/michael/.julia/v0.5/DataArrays/src/indexing.jl:217
 in unstack(::DataFrames.DataFrame, ::Int64, ::Int64, ::Int64) at /Users/michael/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:183
 in unstack(::DataFrames.DataFrame, ::Symbol, ::Symbol, ::Symbol) at /Users/michael/.julia/v0.5/DataFrames/src/abstractdataframe/reshape.jl:188

I have opened an issue at the repo.
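The traceback shows a String being written into a Float64 column. One plausible reading (a guess, not confirmed in this thread) is that `melt(iris, :id)` stacks every non-id column, including the string column `Species`. Assuming current DataFrames.jl (1.x), where this round trip is written with `stack`/`unstack`, here is a sketch that stacks only the numeric columns. It uses a hypothetical three-row stand-in for iris (made-up values) so RDatasets is not needed:

```julia
using DataFrames

# Hypothetical stand-in for iris: numeric columns plus a string column
iris = DataFrame(SepalLength = [5.1, 4.9, 4.7],
                 SepalWidth  = [3.5, 3.0, 3.2],
                 Species     = ["setosa", "setosa", "versicolor"])
iris.id = 1:nrow(iris)

# Stack only the numeric columns, keyed by :id, so :value stays Float64
longdf = stack(iris, [:SepalLength, :SepalWidth], :id)

# Spread :variable back out into columns, one row per :id
widedf = unstack(longdf, :id, :variable, :value)
```

Restricting the measure columns keeps the stacked `value` column homogeneous, which avoids mixing strings and floats in one column.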




Re: reshaping dataframe to make rows columns?

Michael Borregaard
In reply to this post by Reuben
The functions in DataFrames appear to be working now.

To do what you want with unstack(), you need to define an id variable; in this case, where you want only one row in the result, it should be a vector of identical values. Example:

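using DataFrames
# A constant id column gives every row the same key, so unstack yields a single row
df = DataFrame(ct = ["a", "b", "c", "d"], x = randn(4), id = ones(Int, 4))
# unstack(df, rowkey, colkey, value): the values of :ct become the new column names
unstack(df, :id, :ct, :x)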
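To tie this back to the original goal: once the parameter values are column names, the mean across them is short. A sketch assuming DataFrames.jl 1.x (where `unstack(df, rowkey, colkey, value)` is still available), with fixed values in place of `randn(4)` so the result is reproducible:

```julia
using DataFrames, Statistics

# Fixed values instead of randn(4)
df = DataFrame(ct = ["a", "b", "c", "d"], x = [1.0, 2.0, 3.0, 4.0],
               id = ones(Int, 4))
wide = unstack(df, :id, :ct, :x)   # one row; columns id, a, b, c, d

# Mean over the value columns of that row (everything except :id)
m = mean(wide[1, c] for c in names(wide, Not(:id)))
```

Note that `unstack` allows missing values by default, so the new columns may have an element type like `Union{Missing, Float64}` even when nothing is missing.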
