DataFrames: the most efficient way to read strings with commas as floats

Alexander Flyax
I have a csv that has a lot of numbers with commas in string format. E.g., "123,456". I want to read them in as floats. What is the most efficient way to do this? E.g., in Python I can do:

income_df = pd.read_csv("income_2013_dollars.csv", sep='\t', thousands=',')

Is there an automatic equivalent in `DataFrames` of the `thousands` argument? If not, what's the most "julian" way of doing that? Thanks...

--
You received this message because you are subscribed to the Google Groups "julia-stats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: DataFrames: the most efficient way to read strings with commas as floats

Milan Bouchet-Valat
On Sunday, 22 February 2015 at 16:37 -0800, Alexander Flyax wrote:

> I have a csv that has a lot of numbers with commas in string format.
> E.g., "123,456". I want to read them in as floats. What is the most
> efficient way to do this? E.g., in Python I can do:
>
>
> income_df = pd.read_csv("income_2013_dollars.csv", sep='\t', thousands=',')
>
>
> Is there an automatic equivalent in `DataFrames` of the `thousands`
> argument? If not, what's the most "julian" way of doing that?
> Thanks...
You can simply use
readtable("income_2013_dollars.csv", separator='\t', decimal=',')

See:
http://dataframesjl.readthedocs.org/en/latest/io.html

(BTW, tab-delimited fields go against the pseudo-standard for a .csv
file... Else, readtable would have even guessed the arguments for you
based on the file extension.)

Regards
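
A note on terminology, since the two are easy to mix up: the `thousands=','` argument in the pandas call treats the comma as a digit-grouping mark, while a `decimal=','` argument treats it as the decimal point. The two readings give different numbers for the same string. A minimal Python sketch of the difference, using only the standard library (`parse_number` is a hypothetical helper written for illustration, not part of pandas or DataFrames):

```python
def parse_number(text: str, thousands: str = "", decimal: str = ".") -> float:
    """Parse a numeric string given explicit thousands and decimal marks."""
    if thousands:
        # Drop digit-grouping marks entirely.
        text = text.replace(thousands, "")
    if decimal != ".":
        # Normalise the decimal mark to the "." that float() expects.
        text = text.replace(decimal, ".")
    return float(text)


# Same input, two interpretations of the comma.
as_thousands = parse_number("123,456", thousands=",")  # comma groups digits
as_decimal = parse_number("123,456", decimal=",")      # comma marks the fraction
```

Here `as_thousands` is 123456.0 and `as_decimal` is 123.456, so it is worth checking which of the two the data actually uses before choosing a reader option.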

Re: DataFrames: the most efficient way to read strings with commas as floats

Eureka Zhu
How about using something like sed to preprocess the csv file?
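
A preprocessing pass of that kind could look roughly like the following, sketched in Python rather than sed. It assumes the file is tab-separated, as in the original question, so that a comma never acts as a field delimiter; the names `strip_thousands` and `preprocess` are made up for this example.

```python
import re

# A comma that sits between digits and is followed by exactly three digits,
# i.e. a thousands separator such as the one in "123,456".
THOUSANDS_COMMA = re.compile(r"(?<=\d),(?=\d{3}(?!\d))")


def strip_thousands(line: str) -> str:
    """Remove thousands-separator commas from one line of tab-separated text."""
    return THOUSANDS_COMMA.sub("", line)


def preprocess(lines):
    """Yield cleaned lines; only safe when the comma is not the field delimiter."""
    for line in lines:
        yield strip_thousands(line)
```

For example, `strip_thousands("NY\t123,456\t7,890")` returns `"NY\t123456\t7890"`, after which the file can be read with an ordinary tab-separated reader. Commas that are not followed by exactly three digits are left alone.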


Re: DataFrames: the most efficient way to read strings with commas as floats

Pritam Prasad
In reply to this post by Milan Bouchet-Valat
Hi,
I tried reading with decimal=',' but it's unsupported, even though it's in the readtable manual. Can you please help me here:

Code:
data = readtable("dataFile.csv",separator=';',decimal=',')


On Monday, February 23, 2015 at 7:19:18 PM UTC+5:30, Milan Bouchet-Valat wrote:
You can simply use
readtable("income_2013_dollars.csv", separator='\t', decimal=',')

Re: DataFrames: the most efficient way to read strings with commas as floats

Benjamin Deonovic
Don't use CSV; it is such an insane file type. The fact that any programming language can figure out how to parse it is a miracle. Consider:

a,b,1,000,2,000,5,6,345

Is that 9 columns, with the 4th and 6th being 000? Is it 7 columns, with the last column 345? Is it 6 columns, with the last column 6,345?
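
To make the ambiguity concrete, here is a small sketch using Python's standard `csv` module (`parse_row` is a hypothetical helper for this example). With no quoting, a comma-delimited reader can only take the first interpretation. The second interpretation needs the grouped numbers to be quoted in the file, which is an assumption about how the data was written.

```python
import csv
import io


def parse_row(line: str) -> list:
    """Parse a single comma-delimited line with the standard csv reader."""
    return next(csv.reader(io.StringIO(line)))


# Unquoted: every comma is a delimiter, so "1,000" falls apart into "1" and "000".
unquoted = parse_row("a,b,1,000,2,000,5,6,345")

# Quoted: commas inside double quotes stay part of the field.
quoted = parse_row('a,b,"1,000","2,000",5,6,345')
```

The unquoted line yields 9 fields, while the quoted one yields 7 with `"1,000"` and `"2,000"` intact.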

If for some reason you have to use CSV because you can't find any way to save your data in another format, just give up on life and become a sailor or something.

On Monday, November 16, 2015 at 4:12:04 AM UTC-6, Pritam Prasad wrote:
Hi,
I tried reading with decimal=',' but it's unsupported, even though it's in the readtable manual. Can you please help me here:

Code:
data = readtable("dataFile.csv",separator=';',decimal=',')

