Question: Forcing readtable to create string type on import

LeAnthony Mathews
Using Julia v0.5.0.
I have two different 10,000-line CSV files that I am reading into two different DataFrame variables using the readtable function.
Each table has in common a ten-digit account_number that I would like to use as an index to join them into one master file.

Here are example account numbers from the original CSV for file1:
8018884596
8018893530
8018909633

When I readtable this CSV into file1 and then check typeof(file1[:account_number]), I get DataArrays.DataArray{Int32,1}, and the values come out as:
 -571049996
 -571041062
 -571024959

When I check typeof(file2[:account_number]), I get:
DataArrays.DataArray{String,1}


Question:  
My CSV files give no indication of whether account_number should be Int32 or String. How do I force readtable to make both account_number columns type String?

I would like this join command to work:
new_account_join = join(file1, file2, on = :account_number, kind = :left)

But I am getting this error:
ERROR: TypeError: typeassert: expected Union{Array{Symbol,1},Symbol}, got Array{Array{Symbol,1},1}
 in (::Base.#kw##join)(::Array{Any,1}, ::Base.#join, ::DataFrames.DataFrame, ::DataFrames.DataFrame) at .\<missing>:0


Any help would be appreciated.  
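
The negative values in the question are consistent with the ten-digit numbers being wrapped into 32 bits: 8018884596 is larger than typemax(Int32), and keeping only its low 32 bits gives -571049996. A minimal sketch in plain Julia (no DataFrames needed; the variable names are illustrative) that reproduces the three values:

```julia
# Account numbers as they appear in the CSV. Literals this large are Int64,
# even on a 32-bit Julia build.
account_numbers = [8018884596, 8018893530, 8018909633]

# `x % Int32` keeps only the low 32 bits, i.e. the value a 32-bit integer wraps to.
wrapped = [n % Int32 for n in account_numbers]

println(typemax(Int32))   # largest value an Int32 can hold
println(wrapped)          # the wrapped values, matching the readtable output above
```

If that is what happened during parsing, the original digits are already gone once the column is Int32, so the column needs to be read as String (or as a 64-bit integer) in the first place.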


Re: Question: Forcing readtable to create string type on import

Jacob Quinn
You could use CSV.jl: http://juliadata.github.io/CSV.jl/stable/

In this case, you'd do:

df1 = CSV.read(file1; types=Dict(1=>String)) # assuming your account number is column # 1
df2 = CSV.read(file2; types=Dict(1=>String))

-Jacob
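
For readers on current package releases, the same idea can be sketched as follows. This assumes recent CSV.jl and DataFrames.jl versions, where `CSV.read` takes the sink type (`DataFrame`) as its second argument and the left join is spelled `leftjoin`; the 2016 versions discussed in this thread use different calls. The in-memory CSV text and the `site` and `balance` columns are made up for illustration.

```julia
using CSV, DataFrames

# Two small in-memory CSV "files" standing in for file1 and file2 (made-up data).
csv1 = """
account_number,site
8018884596,A
8018893530,B
"""
csv2 = """
account_number,balance
8018884596,10.5
8018909633,20.0
"""

# Force column 1 to be read as String so the ten digits are kept as text.
df1 = CSV.read(IOBuffer(csv1), DataFrame; types=Dict(1 => String))
df2 = CSV.read(IOBuffer(csv2), DataFrame; types=Dict(1 => String))

# Left join on the shared key: every row of df1 is kept, matched where possible.
joined = leftjoin(df1, df2, on = :account_number)
println(joined)
```

With both key columns read as strings, the join compares text with text, and an account that is missing from the second table simply gets a missing `balance`.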


Re: Question: Forcing readtable to create string type on import

LeAnthony Mathews
Great, that worked for forcing the column into a string type.
Thanks

Re: Question: Forcing readtable to create string type on import

LeAnthony Mathews
Spoke too soon.
Again, I simply want the CSV column that is read in to be a String, not an Int32.

I am still having issues casting the CSV file back into a DataFrame.
It's hard to understand why Julia attempts to determine the column types when I use readtable and I have no control over this.

Why can I not say:
df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column # 1

Reading the docs (Advanced Options for Reading CSV Files), readtable accepts the following optional keyword arguments:

eltypes::Vector{DataType} – Specify the types of all columns. Defaults to [].

So I tried:
df1 = readtable(file1, Int32::Vector(String))

and I get:
ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}

Is this even an option? Or how about converting df1_CSV to a DataFrame, since CSV.read seems to give more granular control?
df1_dataframe = convert(DataFrame, df1_CSV)


Re: Question: Forcing readtable to create string type on import

Michael Borregaard
The result of CSV.read should be a DataFrame by default. What return type do you get?
Re: Question: Forcing readtable to create string type on import

LeAnthony Mathews
Sure, so I need col #1 in my CSV to be a string in my data frame.

So, as a test, I tried to load the file three different ways:

df1_CSV = CSV.read("$df1_path"; types=Dict(1=>String))  # forcing the column to stay a string
df1_readtable = readtable("$df1_path")  # do not know how to force the column to stay a string
df1_convertDF = convert(DataFrame, df1_CSV)

Here is the output. If they are all DataFrames, then showcols should work on all three:

julia> names(df1_CSV)
3-element Array{Symbol,1}:
 :account_number
 Symbol("Discharge Date")
 :site

julia> names(df1_readtable)
3-element Array{Symbol,1}:
 :account_number
 :Discharge_Date
 :site

julia> names(df1_convertDF)
3-element Array{Symbol,1}:
 :account_number
 Symbol("Discharge Date")
 :site


julia> eltypes(df1_CSV)
3-element Array{Type,1}:
 Nullable{String}
 Nullable{WeakRefString{UInt8}}
 Nullable{WeakRefString{UInt8}}

julia> eltypes(df1_readtable)
3-element Array{Type,1}:
 Int32   #Do not know how to force the column to stay a string
 String
 String

julia> eltypes(df1_convertDF)
3-element Array{Type,1}:
 Nullable{String}
 Nullable{WeakRefString{UInt8}}
 Nullable{WeakRefString{UInt8}}

julia> showcols(df1_convertDF)
1565x3 DataFrames.DataFrame
ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
Closest candidates are:
  countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
  countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
  countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
 in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
 in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
 in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581

julia> showcols(df1_readtable)
1565x3 DataFrames.DataFrame
│ Col # │ Name           │ Eltype │ Missing │
├───────┼────────────────┼────────┼─────────┤
│ 1     │ account_number │ Int32  │ 0       │
│ 2     │ Discharge_Date │ String │ 0       │
│ 3     │ site           │ String │ 0       │

julia> showcols(df1_CSV)
1565x3 DataFrames.DataFrame
ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
Closest candidates are:
  countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
  countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
  countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
 in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
 in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
 in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581



Re: Question: Forcing readtable to create string type on import

Michael Borregaard
DataFrames is currently undergoing a very major change. It looks like CSV creates the new type of DataFrame. I hope someone can help you with using that. As a workaround, on the normal DataFrames version, I have generally just replaced the column with its string representation:
```
df[:account_number] = ["$account_number" for account_number in df[:account_number]]
```
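
One caveat with converting after the fact: if the column was already read as Int32 and the ten-digit values wrapped around (as the negative numbers in the first post suggest), turning it into strings only stringifies the wrapped values. A small sketch in plain Julia, with illustrative variable names, showing the difference:

```julia
# What the CSV actually contains, kept as text.
original = ["8018884596", "8018893530", "8018909633"]

# What an Int32 column holds after the values have wrapped around.
wrapped = Int32[-571049996, -571041062, -571024959]

# Converting the Int32 column to strings, in the style of the comprehension above.
as_strings = ["$n" for n in wrapped]

println(as_strings)              # strings of the wrapped values, not the account numbers
println(as_strings == original)  # false: the original digits are not recovered
```

So this workaround presumably only helps when the integers were parsed without overflow; otherwise the column has to be forced to String at read time.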

Re: Question: Forcing readtable to create string type on import

LeAnthony Mathews
Thanks Michael,
  I have been thinking about this all day. Yes, basically I am going to have to create a macro, CSVreadtable, that mimics the readtable command but uses CSV.read in its expansion. The macro will manually construct a DataFrame like the one readtable returns, but use the column types I specify or inherit them from the original readtable command. The macro can accept the current CSV.read parameters.

So this would work:
df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))

so that:
eltypes(df1_CSVreadtable)
3-element Array{Type,1}:
 Int32   
 String
 String


  Anyway, I was looking for a quick fix, but at least I will learn some Julia.



Re: Question: Forcing readtable to create string type on import

Milan Bouchet-Valat
If you don't have missing values and just want a Vector{String}, you
can pass nullable=false to CSV.read().


Regards

Re: Question: Forcing readtable to create string type on import

Jacob Quinn
LeAnthony,

I'm wondering if you're on an old version of DataFrames? There haven't been any issues "show"-ing DataFrames with NullableArray columns for quite some time. You can check (and post back here) your current package versions by doing:

Pkg.installed()

You can also ensure you're on the latest valid release by doing:

Pkg.update()


-Jacob

On Thu, Nov 3, 2016 at 3:15 PM, Milan Bouchet-Valat <[hidden email]> wrote:
Le jeudi 03 novembre 2016 à 13:35 -0700, LeAnthony Mathews a écrit :
> Thanks Michael,
>   I been thinking about this all day.  Yes, basically I am going to
> have to create a macro CSVreadtable that mimics the readtable
> command, but in the expantion uses CSV.read.  The macro will manually
> constructs a similar readtable sized dataframe array, but use the
> column types I specify or inherit from the original readtable
> command.  The macro can use the current CSV.read parameters.
>
> So this would work.
> df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))  
>
> so a:
> eltypes(df1_CSVreadtable)
> 3-element Array{Type,1}:
>  Int32   
>  String
>  String
>
>
>   Anyway, I was looking for a quick fix, but it least I will learn
> some Julia.
If you don't have missing values and just want a Vector{String}, you
can pass nullable=false to CSV.read().


Regards

>
>
> > DataFrames is currently undergoing a very major change. Looks like
> > CSV creates the new type of DataFrames. I hope someone can help you
> > with using that. As a workaround, on the normal DataFrames version,
> > I have generally just replaced with a string representation:
> > ```
> > df[:account_numbers] = ["$account_number" for account_number in
> > df[:account_numbers]]
> >
> > On Thu, Nov 3, 2016 at 3:05 PM, LeAnthony Mathews <[hidden email]
> > om> wrote:
> > > Sure, so I need col #1 in my CSV to be a string in my data frame.
> > >   
> > >
> > > So as a test  I tried to load the file 3 different ways:
> > >
> > > df1_CSV = CSV.read("$df1_path"; types=Dict(1=>String))  #forcing
> > > the column to stay a string
> > > df1_readtable = readtable("$df1_path")  #Do not know how to force
> > > the column to stay a string
> > > df1_convertDF = convert(DataFrame, df1_CSV)
> > >
> > > Here is the output:  If they are all dataframes then showcols
> > > should work an all three df1:
> > >
> > > julia> names(df1_CSV)
> > > 3-element Array{Symbol,1}:
> > >  :account_number
> > >  Symbol("Discharge Date")
> > >  :site
> > >
> > > julia> names(df1_readtable)
> > > 3-element Array{Symbol,1}:
> > >  :account_number
> > >  :Discharge_Date
> > >  :site
> > >
> > > julia> names(df1_convertDF)
> > > 3-element Array{Symbol,1}:
> > >  :account_number
> > >  Symbol("Discharge Date")
> > >  :site
> > >
> > >
> > > julia> eltypes(df1_CSV)
> > > 3-element Array{Type,1}:
> > >  Nullable{String}
> > >  Nullable{WeakRefString{UInt8}}
> > >  Nullable{WeakRefString{UInt8}}
> > >
> > > julia> eltypes(df1_readtable)
> > > 3-element Array{Type,1}:
> > >  Int32   #Do not know how to force the column to stay a string
> > >  String
> > >  String
> > >
> > > julia> eltypes(df1_convertDF)
> > > 3-element Array{Type,1}:
> > >  Nullable{String}
> > >  Nullable{WeakRefString{UInt8}}
> > >  Nullable{WeakRefString{UInt8}}
> > >
> > > julia> showcols(df1_convertDF)
> > > 1565x3 DataFrames.DataFrame
> > > ERROR: MethodError: no method matching
> > > countna(::NullableArrays.NullableArray{St
> > > ring,1})
> > > Closest candidates are:
> > >   countna(::Array{T,N}) at
> > > C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\ut
> > > ils.jl:115
> > >   countna(::DataArrays.DataArray{T,N}) at
> > > C:\Users\lmathews\.julia\v0.5\DataFram
> > > es\src\other\utils.jl:128
> > >   countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at
> > > C:\Users\lmathews\.ju
> > > lia\v0.5\DataFrames\src\other\utils.jl:143
> > >  in colmissing(::DataFrames.DataFrame) at
> > > C:\Users\lmathews\.julia\v0.5\DataFram
> > > es\src\abstractdataframe\abstractdataframe.jl:657
> > >  in showcols(::Base.TTY, ::DataFrames.DataFrame) at
> > > C:\Users\lmathews\.julia\v0.
> > > 5\DataFrames\src\abstractdataframe\show.jl:574
> > >  in showcols(::DataFrames.DataFrame) at
> > > C:\Users\lmathews\.julia\v0.5\DataFrames
> > > \src\abstractdataframe\show.jl:581
> > >
> > > julia> showcols(df1_readtable)
> > > 1565x3 DataFrames.DataFrame
> > > │ Col # │ Name           │ Eltype │ Missing │
> > > ├───────┼────────────────┼────────┼─────────┤
> > > │ 1     │ account_number │ Int32  │ 0       │
> > > │ 2     │ Discharge_Date │ String │ 0       │
> > > │ 3     │ site           │ String │ 0       │
> > >
> > > julia> showcols(df1_CSV)
> > > 1565x3 DataFrames.DataFrame
> > > ERROR: MethodError: no method matching countna(::NullableArrays.NullableArray{String,1})
> > > Closest candidates are:
> > >   countna(::Array{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:115
> > >   countna(::DataArrays.DataArray{T,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:128
> > >   countna(::DataArrays.PooledDataArray{T,R<:Integer,N}) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\other\utils.jl:143
> > >  in colmissing(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\abstractdataframe.jl:657
> > >  in showcols(::Base.TTY, ::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:574
> > >  in showcols(::DataFrames.DataFrame) at C:\Users\lmathews\.julia\v0.5\DataFrames\src\abstractdataframe\show.jl:581
> > >
> > >
> > >
> > > > The result of CSV should be a DataFrame by default.  What
> > > > return type do you get?
> > > >
> > >
> >
> >


Re: Question: Forcing readtable to create string type on import

Ralph Smith
In reply to this post by LeAnthony Mathews
Unless I misunderstand,

df1 = readtable(file1,eltypes=[String,String,String])


seems to be what you want.

If you're new to Julia, the fact that a "vector of types" really means exactly that may be surprising. 

Let us hope that the new versions of DataFrames include a parser that doesn't treat most 10-digit numbers as Int32 on systems like yours.
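
For anyone wondering where the negative account numbers in the original question come from: they are consistent with ten-digit values being truncated to 32 bits. The following is only a minimal sketch in Base Julia, not a description of what readtable does internally; the variable names are illustrative and do not come from the thread.

```julia
# Account numbers as they appear in the original CSV (taken from the question).
accounts = [8018884596, 8018893530, 8018909633]

# Int32 tops out at 2147483647, so none of these values fit.
@assert all(a -> a > typemax(Int32), accounts)

# `x % Int32` keeps only the low 32 bits (two's-complement wraparound).
wrapped = [a % Int32 for a in accounts]

# These are exactly the values reported in the question.
@assert wrapped == [-571049996, -571041062, -571024959]
println(wrapped)
```

Because the wraparound discards the high bits, the original digits cannot be recovered from the Int32 column, which is why the column has to be read as a String (or a wider integer) in the first place.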

On Wednesday, November 2, 2016 at 4:15:20 PM UTC-4, LeAnthony Mathews wrote:
Spoke too soon.
Again, I simply want the CSV column to be read in as a String, not as an Int32.

Still having issues casting the CSV file back into a DataFrame.
It's hard to understand why the Julia system is attempting to determine the type of the columns when I use readtable and I have no control over this.

Why can I not say:
df1 = readtable(file1; types=Dict(1=>String)) # assuming your account number is column # 1

Reading the Julia spec-Advanced Options for Reading CSV Files
readtable accepts the following optional keyword arguments:

eltypes::Vector{DataType} – Specify the types of all columns. Defaults to [].


df1 = readtable(file1, Int32::Vector(String))

I get 
ERROR: TypeError: typeassert: expected Array{String,1}, got Type{Int32}

Is this even an option?  Or how about converting df1_CSV to df1_dataframe, since CSV.read seems to give more granular control?
df1_dataframe = convert(DataFrame, df1_CSV)


On Tuesday, November 1, 2016 at 7:28:36 PM UTC-4, LeAnthony Mathews wrote:
Great, that worked for forcing the column into a string type.
Thanks



Re: Question: Forcing readtable to create string type on import

LeAnthony Mathews
In reply to this post by Jacob Quinn
Hello Jacob, see below:

julia> Pkg.installed()
Dict{String,VersionNumber} with 25 entries:
  "DataFrames"        => v"0.8.4"
  "DataStreams"       => v"0.1.2"
  "Calculus"          => v"0.1.15"
  "Reexport"          => v"0.0.3"
  "BinDeps"           => v"0.4.5"
  "Rmath"             => v"0.1.4"
  "Dates"             => v"0.4.4"
  "NullableArrays"    => v"0.0.10"
  "URIParser"         => v"0.1.6"
  "GZip"              => v"0.2.20"
  "CSV"               => v"0.1.1"
  "RDatasets"         => v"0.2.0"
  "SortingAlgorithms" => v"0.1.0"
  "Compat"            => v"0.9.3"
  "FileIO"            => v"0.2.0"
  "Distributions"     => v"0.11.0"
  "DataArrays"        => v"0.3.9"
  "PDMats"            => v"0.5.0"
  "SHA"               => v"0.2.1"
  "StatsBase"         => v"0.11.1"
  "XGBoost"           => v"0.2.0"
  "RData"             => v"0.0.4"
  "WeakRefStrings"    => v"0.2.0"
  "StatsFuns"         => v"0.3.1"
  "CategoricalArrays" => v"0.1.0"

On Thursday, November 3, 2016 at 5:19:04 PM UTC-4, Jacob Quinn wrote:
LeAnthony,

I'm wondering if you're on an old version of DataFrames? There haven't been any issues "show"-ing DataFrames with NullableArray columns for quite some time. You can check (and post back here) your current package versions by doing:

Pkg.installed()

You can also ensure you're on the latest valid release by doing:

Pkg.update()


-Jacob

On Thu, Nov 3, 2016 at 3:15 PM, Milan Bouchet-Valat <nali...@...> wrote:
On Thursday, 3 November 2016 at 13:35 -0700, LeAnthony Mathews wrote:
> Thanks Michael,
>   I have been thinking about this all day.  Yes, basically I am going to
> have to create a macro CSVreadtable that mimics the readtable
> command, but in the expansion uses CSV.read.  The macro will manually
> construct a similar readtable-sized dataframe array, but use the
> column types I specify or inherit from the original readtable
> command.  The macro can use the current CSV.read parameters.
>
> So this would work.
> df1_CSVreadtable = CSVreadtable("$df1_path"; types=Dict(1=>String))  
>
> so a:
> eltypes(df1_CSVreadtable)
> 3-element Array{Type,1}:
>  Int32   
>  String
>  String
>
>
>   Anyway, I was looking for a quick fix, but at least I will learn
> some Julia.
If you don't have missing values and just want a Vector{String}, you
can pass nullable=false to CSV.read().


Regards

>
>
> > DataFrames is currently undergoing a very major change. Looks like
> > CSV creates the new type of DataFrames. I hope someone can help you
> > with using that. As a workaround, on the normal DataFrames version,
> > I have generally just replaced with a string representation:
> > ```
> > df[:account_numbers] = ["$account_number" for account_number in
> >                         df[:account_numbers]]
> > ```
> >
> > On Thu, Nov 3, 2016 at 3:05 PM, LeAnthony Mathews <[hidden email]> wrote:
> > > Sure, so I need col #1 in my CSV to be a string in my data frame.
> > >   
> > >
> > > So as a test  I tried to load the file 3 different ways:
> > >
> > > df1_CSV = CSV.read("$df1_path"; types=Dict(1=>String))  # forcing the column to stay a string
> > > df1_readtable = readtable("$df1_path")  # do not know how to force the column to stay a string
> > > df1_convertDF = convert(DataFrame, df1_CSV)
> > >
> > > Here is the output. If they are all DataFrames, then showcols
> > > should work on all three:
> > >
> > > julia> names(df1_CSV)
> > > 3-element Array{Symbol,1}:
> > >  :account_number
> > >  Symbol("Discharge Date")
> > >  :site
> > >
> > > julia> names(df1_readtable)
> > > 3-element Array{Symbol,1}:
> > >  :account_number
> > >  :Discharge_Date
> > >  :site
> > >
> > > julia> names(df1_convertDF)
> > > 3-element Array{Symbol,1}:
> > >  :account_number
> > >  Symbol("Discharge Date")
> > >  :site
> > >
> > >


Re: Question: Forcing readtable to create string type on import

LeAnthony Mathews
In reply to this post by Ralph Smith
Hello Ralph, this worked.

I added the eltypes option to force readtable to read the first column as a String rather than as a destructive Int32.
df1_readtable_old = readtable("$df1_path")
df1_readtable_new = readtable("$df1_path", eltypes=[String,String,String])

julia> eltypes(df1_readtable_old)
3-element Array{Type,1}:
 Int32
 String
 String

julia> eltypes(df1_readtable_new)
3-element Array{Type,1}:
 String
 String
 String

Thanks everyone for the support.
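
A short note on why the type has to be forced at read time rather than fixed afterwards. This is a minimal sketch in Base Julia, assuming the Int32 column holds the wrapped values shown in the original question; the variable names are illustrative only.

```julia
# Values as they were reported in the question (already wrapped to Int32).
wrapped = Int32[-571049996, -571041062, -571024959]

# Converting to strings after the read only preserves the damage.
after_the_fact = [string(x) for x in wrapped]

# Keeping the raw CSV text as a String preserves the original digits.
raw_fields = ["8018884596", "8018893530", "8018909633"]

@assert after_the_fact[1] == "-571049996"
@assert after_the_fact != raw_fields
```

A join on account_number would therefore find no matching keys between a column converted after the fact and a column read directly as String.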
