ANN: Query.jl

David Anthoff

Hi all,


I just tagged Query.jl v0.1.0, and with that my little summer project should be ready for wider consumption.


Query.jl hopes to be the equivalent of LINQ or dplyr for julia, eventually. It provides a unified way to query many different data sources, the most prominent being DataFrames.


You can find more information and documentation at https://github.com/davidanthoff/Query.jl.


You should consider the package in beta at the moment: it is more or less feature complete and functional for a first release, but it hasn't been tested widely. Please do take it for a spin and report back any bugs, usability issues, etc.


I'm also keenly interested in collaborators. This is an ambitious project, and any help would be greatly appreciated. PRs are welcome!


Finally, the package builds on a lot of previous work. I just want to highlight a few: C# LINQ (the basis for the whole package design), the NamedTuples package (couldn't have built Query without it) and the DataStreams ecosystem that enabled rapid integration of a large number of sources and sinks. And of course julia itself. It is quite amazing how simple it was to take a really complex design like LINQ and port it to julia. Not only did I never bump into any language limitation in the process, but julia actually enabled a couple of really neat features in Query.jl that would not have been possible in C#.


Here are some other highlights of the package:

- Query.jl is an almost complete implementation of the query expression section of the C# specification, with some additional julia specific features added in.
- The package supports a large number of data sources: DataFrames, TypedTables, normal arrays, any DataStream source (this includes CSV, Feather, SQLite), NDSparseData structures and any type that can be iterated.
- The results of a query can be materialized into a range of different data structures: iterators, DataFrames, arrays or any DataStream sink (this includes CSV and Feather files).
- One can mix and match almost all sources and sinks within one query. For example, one can easily join a DataFrame with a CSV file and write the results into a Feather file, all within one query.
- The type instability problems that one can run into with DataFrames do not affect Query.jl, i.e. queries against DataFrames are completely type stable.
- There are three different APIs that package authors can use to make their data sources queryable with this package. The simplest API only requires a data source to provide an iterator. The second API hands the data source a complete graph representation of the query, which the source can then, e.g., rewrite as a SQL statement to execute the query. The third API allows a data source to provide its own data structures for representing a query graph.
- The package is completely documented.
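
To illustrate why the simplest API (just provide an iterator) is enough to make a source queryable, here is a minimal sketch in plain Julia 1.x. It does not use Query.jl itself: `RowStore` and `names_older_than` are hypothetical names, and the filter-then-collect pipeline only mimics what a where/select query does when all it can rely on is iteration.

```julia
# Hypothetical data source: a simple row store that only knows how to iterate.
struct RowStore
    names::Vector{String}
    ages::Vector{Float64}
end

# Implementing the Julia 1.x iteration protocol is all a generic consumer needs.
# Each element is a named tuple, so downstream code can access fields by name.
Base.iterate(s::RowStore, i::Int=1) =
    i > length(s.names) ? nothing : ((name=s.names[i], age=s.ages[i]), i + 1)
Base.length(s::RowStore) = length(s.names)
Base.eltype(::Type{RowStore}) = NamedTuple{(:name, :age), Tuple{String, Float64}}

# A LINQ-style pipeline written only against the iteration protocol:
# filter the rows (like `where`), project a field (like `select`),
# then materialize the result into an array (like collecting into a sink).
function names_older_than(source, cutoff)
    rows = Iterators.filter(r -> r.age > cutoff, source)
    return collect(r.name for r in rows)
end

store = RowStore(["John", "Sally", "Kirk"], [23.0, 42.0, 59.0])
names_older_than(store, 40)   # returns ["Sally", "Kirk"]
```

Because `names_older_than` never looks inside `RowStore`, the same function works unchanged on any other iterable of named tuples, which is the property the iterator-based source API relies on.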

Have fun and please report back any issues you run into.

Best,
David


--
You received this message because you are subscribed to the Google Groups "julia-stats" group.