Use of GNU libc extensions, such as memmem

Scott Jones
I just wanted to know if it would be possible to use some of the GNU libc extensions in Julia code.
Currently, code in Base uses standard libc functions such as memchr and memcpy, but some of the GNU extensions
could be used to substantially speed up some of the functions such as search and replace.
I've been writing a faster version of replace that also eliminates the type instability, except when the input is an ASCIIString and the replacement string has non-ASCII characters.

Thanks, Scott

Elliot Saba
Well, clearly whatever code you write would not be portable to other libc implementations (e.g. Windows MSVC, OS X, OpenBSD, potentially some future ARM platforms), so that would be a problem, but ccall() should make this fairly straightforward, no? Is there any technical hurdle you're trying to overcome, or is this more of a conventions question?
-E

Scott Jones
I don't know about Windows (I've been avoiding Windows for a couple of years now 😃), but memmem is available on Linux (after libc 5.0.9, and in GNU libc 2.1 and later), on OS X (FreeBSD 6.0 and later), and on OpenBSD (5.4 and later). Where it isn't available, such as on Windows, it would be easy enough to fall back to a slower version without it. (Aren't the ARM platforms targeted by Julia using Linux or GNU libc, though?)

For example, OS X man page:

CONFORMING TO

     memmem() is a GNU extension.


HISTORY

     The memmem() function first appeared in FreeBSD 6.0.
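
For readers who have not used the function, here is a minimal C sketch of the call being discussed, assuming a glibc system. The helper name find_bytes is hypothetical and only wraps memmem to return an offset instead of a pointer. From Julia the same libc symbol would be reached through ccall, which is not shown here.

```c
#define _GNU_SOURCE /* asks glibc's <string.h> to declare memmem(); must precede the includes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* find_bytes is a hypothetical helper name, not an existing API.
 * Returns the byte offset of the first occurrence of needle in hay, or -1. */
long find_bytes(const void *hay, size_t hay_len,
                const void *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    /* memmem works on raw bytes with explicit lengths, so embedded NUL bytes are fine. */
    const char *hit = memmem(hay, hay_len, needle, needle_len);
    return hit ? (long)(hit - (const char *)hay) : -1L;
}
```

Because both lengths are passed explicitly, neither buffer has to be NUL-terminated, which is what makes the function a candidate for searching arbitrary byte strings.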


Scott Jones
In reply to this post by Elliot Saba
It was more of a conventions question. memmem is available almost everywhere and gives a good performance boost, and the code can be written to fall back to a slower version, but I didn't want to spend time writing it only to be told that it just can't be used in Julia.

Thanks, Scott

Elliot Saba
In general, if there is a performance boost on some platforms, and it doesn't majorly complicate the code or break or slow down incompatible platforms, I don't see why we wouldn't want it. So I would say go ahead and submit a PR. Having some numbers showing the performance boost is a good idea, and if we can get the overhead of checking whether the function exists as low as possible (and done only once), that would be ideal.
-E
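
One way to keep the existence check down to a single lookup is to resolve the symbol at run time on first use and cache the resulting function pointer. The C sketch below assumes a POSIX system with dlopen and dlsym (older glibc versions need -ldl at link time). The names memmem_fn, fallback_memmem and get_memmem are hypothetical, and this is only an illustration of the idea, not how Julia itself resolves symbols.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memmem_fn)(const void *, size_t, const void *, size_t);

/* Slow but portable fallback: try every starting offset. */
static void *fallback_memmem(const void *hay, size_t hay_len,
                             const void *needle, size_t needle_len)
{
    const unsigned char *h = hay;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(h + i, needle, needle_len) == 0)
            return (void *)(h + i);
    return NULL;
}

/* Cached result of the lookup; NULL means "not resolved yet".
 * Not synchronized: a threaded program would need a once-guard here. */
static memmem_fn resolved;

memmem_fn get_memmem(void)
{
    if (resolved == NULL) {
        /* dlopen(NULL, ...) returns a handle that searches the running
         * program and the shared libraries it has loaded, libc included. */
        void *self = dlopen(NULL, RTLD_LAZY);
        void *sym = self ? dlsym(self, "memmem") : NULL;
        resolved = sym ? (memmem_fn)sym : fallback_memmem;
    }
    assert(resolved != NULL);
    return resolved;
}
```

After the first call, every later call to get_memmem costs one pointer comparison, and callers can hold on to the returned pointer to avoid even that.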

Scott Jones
It wouldn't slow anything down; I'd use something like @windows_only etc. so that there's no run-time slowdown. Even without it, I think it may be an order of magnitude faster.
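
The C analogue of choosing the implementation per platform with no run-time cost is a preprocessor switch. The sketch below assumes, as stated earlier in the thread, that the Windows MSVC runtime lacks memmem. The names portable_memmem, FIND_BYTES and find_bytes are hypothetical.

```c
#define _GNU_SOURCE /* for memmem() on glibc; must precede the includes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Slower portable search, used only where the C library has no memmem. */
static void *portable_memmem(const void *hay, size_t hay_len,
                             const void *needle, size_t needle_len)
{
    const unsigned char *h = hay;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(h + i, needle, needle_len) == 0)
            return (void *)(h + i);
    return NULL;
}

/* The choice is made by the preprocessor, so nothing is tested at run time. */
#if defined(_WIN32)
#define FIND_BYTES portable_memmem
#else
#define FIND_BYTES memmem
#endif

const void *find_bytes(const void *hay, size_t hay_len,
                       const void *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    return FIND_BYTES(hay, hay_len, needle, needle_len);
}
```

On non-Windows builds portable_memmem is compiled but never selected; a real code base would probably guard its definition with the same condition.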

Isaiah Norton
If benchmarks show that there is a substantial improvement (and we can't get there yet in pure Julia for some reason), we could potentially use the musl memmem implementation (MIT licensed) on all platforms. From a brief glance it appears to be self-contained.

https://github.com/esmil/musl/blob/master/src/string/memmem.c


Scott Jones
Thanks! I'll probably try rewriting that code in Julia (with all acknowledgements, of course), benchmark it against GNU libc memmem, and then decide whether to use ccall and memmem everywhere it is available, falling back to a Julia memmem elsewhere (with the choice made at compile time!), or to use a Julia version of the musl memmem everywhere.
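
A comparison like this needs a small harness that runs each candidate on the same input and also checks that they agree. The C sketch below is one possible shape, assuming glibc for memmem. naive_memmem is only a stand-in for whatever reimplementation is being measured, and time_search and search_fn are hypothetical names. clock() measures CPU time at coarse resolution, so a serious benchmark would want many repetitions and realistic inputs.

```c
#define _GNU_SOURCE /* for memmem() on glibc; must precede the includes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*search_fn)(const void *, size_t, const void *, size_t);

/* Stand-in for "a reimplementation to be measured": try every offset. */
void *naive_memmem(const void *hay, size_t hay_len,
                   const void *needle, size_t needle_len)
{
    const unsigned char *h = hay;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(h + i, needle, needle_len) == 0)
            return (void *)(h + i);
    return NULL;
}

/* Runs fn `reps` times and returns the CPU seconds used.
 * The last search result is stored in *hit so callers can check correctness. */
double time_search(search_fn fn, const void *hay, size_t hay_len,
                   const void *needle, size_t needle_len,
                   int reps, const void **hit)
{
    assert(fn != NULL && hit != NULL && reps > 0);
    /* Re-reading the haystack pointer through a volatile on every pass is
     * meant to stop the optimizer from collapsing the loop into one call. */
    const void *volatile hay_each_pass = hay;
    const void *last = NULL;
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        last = fn(hay_each_pass, hay_len, needle, needle_len);
    clock_t stop = clock();
    *hit = last;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Typical use is to call time_search once with memmem and once with the candidate on the same buffer, compare the two hit pointers, and only then look at the timings.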

elextr
Scott,

One thing to watch out for is whether there are any vector instructions that you could use (via compiler intrinsics).

A proposal that a project I am involved with use a complex search instead of the str* operations turned out to be extremely embarrassing once it had to be benchmarked: the system str* operations used vector instructions where they could, and blew the complex code out of the water :)

Cheers
Lex

Scott Jones
Oh yes, I'm well aware of that!
I used to code those vector instructions (when available) in assembly, before any compiler intrinsics were available, for x86, POWER, Alpha, SPARC, etc.
I had the same experience: another developer wanted to change things to use the KMP algorithm, but, at least for our most common inputs, my "brute force" optimized code was a lot faster.

Thanks, Scott
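
To illustrate why a "brute force" search can compete with cleverer algorithms, here is a C sketch that lets the C library's memchr (hand-tuned with vector instructions on many platforms) skip to candidate positions and then verifies each one with memcmp. This is not the optimized code described above, which is not shown in the thread. The name brute_force_find is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of needle in hay, or NULL. */
const void *brute_force_find(const void *hay, size_t hay_len,
                             const void *needle, size_t needle_len)
{
    assert(hay != NULL && needle != NULL);
    const unsigned char *h = hay;
    const unsigned char *n = needle;

    if (needle_len == 0)
        return h; /* an empty needle matches at the start */

    while (hay_len >= needle_len) {
        /* Only offsets that leave room for the whole needle can start a match. */
        const unsigned char *p = memchr(h, n[0], hay_len - needle_len + 1);
        if (p == NULL)
            return NULL;
        if (memcmp(p, n, needle_len) == 0)
            return p;
        /* Resume one byte past the failed candidate. */
        hay_len -= (size_t)(p - h) + 1;
        h = p + 1;
    }
    return NULL;
}
```

The worst case is still quadratic, for instance when the first byte of the needle occurs everywhere in the haystack, so whether this wins depends on the inputs, as the post above notes.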
