joining DataFrames with identical columns

Westley Hennigh
This might be a silly question, but I've had a little trouble figuring out the DataFrames `join` function.

Suppose that you've got a couple of small DataFrames of the form:


julia> df1 = DataFrame(IDs = [1,2], Vals = ["1", "2"])
2x2 DataFrame
|-----|-----|------|
| Row | IDs | Vals |
| 1   | 1   | "1"  |
| 2   | 2   | "2"  |


julia> df2 = DataFrame(IDs = [3,4], Vals = ["3", "4"])
2x2 DataFrame
|-----|-----|------|
| Row | IDs | Vals |
| 1   | 3   | "3"  |
| 2   | 4   | "4"  |


and you'd like to have one DataFrame:

4x2 DataFrame
|-----|-----|------|
| Row | IDs | Vals |
| 1   | 1   | "1"  |
| 2   | 2   | "2"  |
| 3   | 3   | "3"  |
| 4   | 4   | "4"  |

Is there an easy way to do that?

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: joining DataFrames with identical columns

John Myles White
Unfortunately, multikey joins don't really work yet. In this example, you only have join keys and don't have any non-join keys, so there's no way to imitate the result without multikey joins.

I've been meaning to clean up `join` for some time, but haven't had time to get to it yet.

 -- John


Re: joining DataFrames with identical columns

Westley Hennigh
Ah, no worries, we'll just join our things manually and add them to one big DataFrame. Maybe if we have time we can contribute!

Thanks,
=Westley


Re: joining DataFrames with identical columns

Sean Garborg
Wes -- I bet you want `vcat` or `append!` (`vcat` does a lot of copying at the moment, but is more forgiving with column order and type inconsistencies):
```julia

julia> vcat(df1, df2)
4x2 DataFrame
|-------|-----|------|
| Row # | IDs | Vals |
| 1     | 1   | "1"  |
| 2     | 2   | "2"  |
| 3     | 3   | "3"  |
| 4     | 4   | "4"  |

julia> append!(df1, df2)
4x2 DataFrame
|-------|-----|------|
| Row # | IDs | Vals |
| 1     | 1   | "1"  |
| 2     | 2   | "2"  |
| 3     | 3   | "3"  |
| 4     | 4   | "4"  |
```
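To make the difference between the two calls above concrete: `vcat` returns a new DataFrame and leaves its inputs alone, while `append!` grows its first argument in place. A minimal sketch, assuming a recent DataFrames.jl release (the names `stacked` and `grown` are only illustrative):

```julia
using DataFrames

df1 = DataFrame(IDs = [1, 2], Vals = ["1", "2"])
df2 = DataFrame(IDs = [3, 4], Vals = ["3", "4"])

# vcat builds a new DataFrame; df1 and df2 are not modified.
stacked = vcat(df1, df2)

# append! mutates its first argument, so copy df1 first to keep the original.
grown = copy(df1)
append!(grown, df2)
```

If the original `df1` is no longer needed, calling `append!(df1, df2)` directly avoids the extra copy.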

John -- I think multicolumn joins have worked fine for me -- what's wrong with them? Joins fail if one of the DataFrames doesn't have any non-key columns (I'm opening an issue), so using `join` would fail here, but that's not exclusive to multi-key joins.




Re: joining DataFrames with identical columns

John Myles White
Do we actually generate joins correctly where `on` is a vector of symbols? I thought that was never fully implemented, but maybe I didn't notice that the code was already present.

 — John


Re: joining DataFrames with identical columns

Sean Garborg
In reply to this post by Westley Hennigh
We should add some tests, but it seems fine:

```julia
julia> df1 = DataFrame(A=[1, 2], B=[3, 4], C=[5, 6])
2x3 DataFrame
|-------|---|---|---|
| Row # | A | B | C |
| 1     | 1 | 3 | 5 |
| 2     | 2 | 4 | 6 |

julia> df2 = DataFrame(A=[2, 1], B=[4, 5], D=[7, 8])
2x3 DataFrame
|-------|---|---|---|
| Row # | A | B | D |
| 1     | 2 | 4 | 7 |
| 2     | 1 | 5 | 8 |

julia> join(df1, df2, on = [:A, :B])
1x4 DataFrame
|-------|---|---|---|---|
| Row # | A | B | C | D |
| 1     | 2 | 4 | 6 | 7 |

julia> join(df1, df2, on = [:A, :B], kind = :outer)
3x4 DataFrame
|-------|---|---|----|----|
| Row # | A | B | C  | D  |
| 1     | 2 | 4 | 6  | 7  |
| 2     | 1 | 3 | 5  | NA |
| 3     | 1 | 5 | NA | 8  |
```
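In current DataFrames.jl releases the `join(...; kind = ...)` form shown above has been replaced by dedicated functions such as `innerjoin` and `outerjoin`, and absent values are represented by `missing` rather than `NA`. A sketch of the same multi-key joins under that assumption (the names `inner` and `outer` are illustrative, and the row order of the outer join is not relied on):

```julia
using DataFrames

df1 = DataFrame(A = [1, 2], B = [3, 4], C = [5, 6])
df2 = DataFrame(A = [2, 1], B = [4, 5], D = [7, 8])

# Inner join on both key columns: only (A, B) = (2, 4) appears in both frames.
inner = innerjoin(df1, df2, on = [:A, :B])

# Outer join keeps unmatched rows from either side and fills the gaps with `missing`.
outer = outerjoin(df1, df2, on = [:A, :B])
```

Passing a vector of symbols to `on` is what makes these multi-key joins: a row matches only when every listed column agrees.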


Re: joining DataFrames with identical columns

Sean Garborg
OT -- Wes, how did you get code syntax highlighting in your post?


Re: joining DataFrames with identical columns

Westley Hennigh
Hey! Sorry for not seeing this before, thanks so much for the info!

I don't know what the special symbols for code blocks are, but on the toolbar in the Google Groups interface there's a `{}` button, and pushing that will give you highlighting. It's a little finicky and can be hard to get out of, so I've found it's best to always have at least one newline past the end of where you're creating a block.


Re: joining DataFrames with identical columns

Sean Garborg

"Thanks!"


