Addition of readonly Dict type to Julia

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Addition of readonly Dict type to Julia

Scott Jones
Answering a question on StackOverflow (http://stackoverflow.com/questions/36385653/return-const-dictionary),
I'd brought up the idea of using a perfect hash (pointing out gperf for generating a perfect hash).
Stefan added some very useful information about a newer, non-GPL (it's LGPL) library, and also the possibility of directly using the CHD algorithm (http://cmph.sourceforge.net/papers/esa09.pdf) to write a pure Julia version under the MIT license
(which I think might make for a good Julia GSoC 2017 project, if it hasn't been done before then - what do people think?)

I already know of two places in Base  (in the REPL code, for Emoji and LaTex shortcuts) where an optimized readonly dictionary using a perfect hash could save significant space (by storing
all the keys in one long string and using short offsets to each key, as well as removing the unnecessary \ from the LaTex keys, and the initial \: and trailing : from the Emoji keys).
If the perfect hash generation were done at build time, that would also help speed up Julia startup time (something that has become a bit of an issue lately, esp. on ARM platforms).

Stefan's bam (https://github.com/StefanKarpinski/bam) is a good example of the advantages of taking such an approach.

I think this would be good to add either as a separate package, or possibly added to DataStructures.jl.




Reply | Threaded
Open this post in threaded view
|

Re: Addition of readonly Dict type to Julia

Erik Schnetter
I learned recently that Base has a type `ImmutableDict` (that is not
exported) (yet?). This is optimized for small dictionaries, as it uses
a linear search in a linked list for lookup, so it's probably not
useful for the large number of latex symbols.

-erik


On Mon, Apr 4, 2016 at 7:48 AM, Scott Jones <[hidden email]> wrote:

> Answering a question on StackOverflow
> (http://stackoverflow.com/questions/36385653/return-const-dictionary),
> I'd brought up the idea of using a perfect hash (pointing out gperf for
> generating a perfect hash).
> Stefan added some very useful information about a newer, non-GPL (it's LGPL)
> library, and also the possibility of directly using the CHD algorithm
> (http://cmph.sourceforge.net/papers/esa09.pdf) to write a pure Julia version
> under the MIT license
> (which I think might make for a good Julia GSoC 2017 project, if it hasn't
> been done before then - what do people think?)
>
> I already know of two places in Base  (in the REPL code, for Emoji and LaTex
> shortcuts) where an optimized readonly dictionary using a perfect hash could
> save significant space (by storing
> all the keys in one long string and using short offsets to each key, as well
> as removing the unnecessary \ from the LaTex keys, and the initial \: and
> trailing : from the Emoji keys).
> If the perfect hash generation were done at build time, that would also help
> speed up Julia startup time (something that has become a bit of an issue
> lately, esp. on ARM platforms).
>
> Stefan's bam (https://github.com/StefanKarpinski/bam) is a good example of
> the advantages of taking such an approach.
>
> I think this would be good to add either as a separate package, or possibly
> added to DataStructures.jl.
>
>
>
>



--
Erik Schnetter <[hidden email]>
http://www.perimeterinstitute.ca/personal/eschnetter/
Reply | Threaded
Open this post in threaded view
|

Re: Addition of readonly Dict type to Julia

Scott Jones
Yes, I mentioned that in my answer on StackOverflow, but that's not really even immutable, in the sense that you can still modify it after it is created.
Items can't be deleted, but can be hidden by adding a new element with the same key.  That's why I'd use the techniques I described above to make looking up latex symbols & emoji more efficient (both space and time).

On Tuesday, April 5, 2016 at 8:18:32 AM UTC-4, Erik Schnetter wrote:
I learned recently that Base has a type `ImmutableDict` (that is not
exported) (yet?). This is optimized for small dictionaries, as it uses
a linear search in a linked list for lookup, so it's probably not
useful for the large number of latex symbols.

-erik


On Mon, Apr 4, 2016 at 7:48 AM, Scott Jones <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="MsSqDSjGAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">scott.pa...@...> wrote:

> Answering a question on StackOverflow
> (<a href="http://stackoverflow.com/questions/36385653/return-const-dictionary" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fstackoverflow.com%2Fquestions%2F36385653%2Freturn-const-dictionary\46sa\75D\46sntz\0751\46usg\75AFQjCNG4emplh4_vKJsnvhhC6rTVtU0agA&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fstackoverflow.com%2Fquestions%2F36385653%2Freturn-const-dictionary\46sa\75D\46sntz\0751\46usg\75AFQjCNG4emplh4_vKJsnvhhC6rTVtU0agA&#39;;return true;">http://stackoverflow.com/questions/36385653/return-const-dictionary),
> I'd brought up the idea of using a perfect hash (pointing out gperf for
> generating a perfect hash).
> Stefan added some very useful information about a newer, non-GPL (it's LGPL)
> library, and also the possibility of directly using the CHD algorithm
> (<a href="http://cmph.sourceforge.net/papers/esa09.pdf" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fcmph.sourceforge.net%2Fpapers%2Fesa09.pdf\46sa\75D\46sntz\0751\46usg\75AFQjCNEfsHwlp2LRyvvAFLgecx0lr41ZQA&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fcmph.sourceforge.net%2Fpapers%2Fesa09.pdf\46sa\75D\46sntz\0751\46usg\75AFQjCNEfsHwlp2LRyvvAFLgecx0lr41ZQA&#39;;return true;">http://cmph.sourceforge.net/papers/esa09.pdf) to write a pure Julia version
> under the MIT license
> (which I think might make for a good Julia GSoC 2017 project, if it hasn't
> been done before then - what do people think?)
>
> I already know of two places in Base  (in the REPL code, for Emoji and LaTex
> shortcuts) where an optimized readonly dictionary using a perfect hash could
> save significant space (by storing
> all the keys in one long string and using short offsets to each key, as well
> as removing the unnecessary \ from the LaTex keys, and the initial \: and
> trailing : from the Emoji keys).
> If the perfect hash generation were done at build time, that would also help
> speed up Julia startup time (something that has become a bit of an issue
> lately, esp. on ARM platforms).
>
> Stefan's bam (<a href="https://github.com/StefanKarpinski/bam" target="_blank" rel="nofollow" onmousedown="this.href=&#39;https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2FStefanKarpinski%2Fbam\46sa\75D\46sntz\0751\46usg\75AFQjCNFZE5tcEaP3kEXdyLDo53pLG-OeFw&#39;;return true;" onclick="this.href=&#39;https://www.google.com/url?q\75https%3A%2F%2Fgithub.com%2FStefanKarpinski%2Fbam\46sa\75D\46sntz\0751\46usg\75AFQjCNFZE5tcEaP3kEXdyLDo53pLG-OeFw&#39;;return true;">https://github.com/StefanKarpinski/bam) is a good example of
> the advantages of taking such an approach.
>
> I think this would be good to add either as a separate package, or possibly
> added to DataStructures.jl.
>
>
>
>



--
Erik Schnetter <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="MsSqDSjGAgAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">schn...@...>
<a href="http://www.perimeterinstitute.ca/personal/eschnetter/" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fwww.perimeterinstitute.ca%2Fpersonal%2Feschnetter%2F\46sa\75D\46sntz\0751\46usg\75AFQjCNGxlaNboZlt-tpAt8j3eV3SBzPUpg&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\75http%3A%2F%2Fwww.perimeterinstitute.ca%2Fpersonal%2Feschnetter%2F\46sa\75D\46sntz\0751\46usg\75AFQjCNGxlaNboZlt-tpAt8j3eV3SBzPUpg&#39;;return true;">http://www.perimeterinstitute.ca/personal/eschnetter/
Reply | Threaded
Open this post in threaded view
|

Re: Addition of readonly Dict type to Julia

Erik Schnetter
It's immutable in the sense that, by adding an element, you create a
copy, and the original remains unchanged.

-erik

On Tue, Apr 5, 2016 at 2:22 PM, Scott Jones <[hidden email]> wrote:

> Yes, I mentioned that in my answer on StackOverflow, but that's not really
> even immutable, in the sense that you can still modify it after it is
> created.
> Items can't be deleted, but can be hidden by adding a new element with the
> same key.  That's why I'd use the techniques I described above to make
> looking up latex symbols & emoji more efficient (both space and time).
>
> On Tuesday, April 5, 2016 at 8:18:32 AM UTC-4, Erik Schnetter wrote:
>>
>> I learned recently that Base has a type `ImmutableDict` (that is not
>> exported) (yet?). This is optimized for small dictionaries, as it uses
>> a linear search in a linked list for lookup, so it's probably not
>> useful for the large number of latex symbols.
>>
>> -erik
>>
>>
>> On Mon, Apr 4, 2016 at 7:48 AM, Scott Jones <[hidden email]> wrote:
>> > Answering a question on StackOverflow
>> > (http://stackoverflow.com/questions/36385653/return-const-dictionary),
>> > I'd brought up the idea of using a perfect hash (pointing out gperf for
>> > generating a perfect hash).
>> > Stefan added some very useful information about a newer, non-GPL (it's
>> > LGPL)
>> > library, and also the possibility of directly using the CHD algorithm
>> > (http://cmph.sourceforge.net/papers/esa09.pdf) to write a pure Julia
>> > version
>> > under the MIT license
>> > (which I think might make for a good Julia GSoC 2017 project, if it
>> > hasn't
>> > been done before then - what do people think?)
>> >
>> > I already know of two places in Base  (in the REPL code, for Emoji and
>> > LaTex
>> > shortcuts) where an optimized readonly dictionary using a perfect hash
>> > could
>> > save significant space (by storing
>> > all the keys in one long string and using short offsets to each key, as
>> > well
>> > as removing the unnecessary \ from the LaTex keys, and the initial \:
>> > and
>> > trailing : from the Emoji keys).
>> > If the perfect hash generation were done at build time, that would also
>> > help
>> > speed up Julia startup time (something that has become a bit of an issue
>> > lately, esp. on ARM platforms).
>> >
>> > Stefan's bam (https://github.com/StefanKarpinski/bam) is a good example
>> > of
>> > the advantages of taking such an approach.
>> >
>> > I think this would be good to add either as a separate package, or
>> > possibly
>> > added to DataStructures.jl.
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Erik Schnetter <[hidden email]>
>> http://www.perimeterinstitute.ca/personal/eschnetter/



--
Erik Schnetter <[hidden email]>
http://www.perimeterinstitute.ca/personal/eschnetter/