A story on unchecked integer operations

Erik Schnetter
We have this code <https://einsteintoolkit.org> that simulates black holes and other astrophysical systems. It's written in C++ (and a few other languages). I obviously intend to rewrite it in Julia, but that's not the point here.

One of the core functions allows evaluating (interpolating) the value of a function at any point in the domain. That code was originally written in 2002, and has been used and optimized and tested extensively. So you'd think it's reasonably bug-free...

Today, a colleague ran this code on Blue Waters, using 32,000 nodes, and with some other parameters set to higher resolutions than before. Given the subject of the email, you can guess what happened.

Luckily, a debugging routine was active and caught an inconsistency (an inconsistent domain decomposition), alerting us to the problem.

Would Julia have prevented this? No: Julia's unchecked integer arithmetic would not have caught this bug either. I know that everybody wants speed (and if you are using 32,000 nodes, you want a lot of speed), but the idea of bugs that only appear when you are pushing the limits makes me uncomfortable.

Score: Julia vs. C++, both zero.

-erik


Re: A story on unchecked integer operations

John Myles White
This seems more like a use case for static analysis than for checked operations to me. The problem IIUC isn't about the use of unsafe high-performance code, but rather that the system was nominally tested, but tested in an imperfect way that didn't cover the failure cases. If you were rewriting this in Rust, it's easy for me to imagine that you would use checked arithmetic at the start, then after 5 years decide it's safe and turn off the checks, all because you had never really tested the obscure cases that only a static analyzer is likely to catch.

 -- John

On Wednesday, July 13, 2016 at 1:07:59 PM UTC-7, Erik Schnetter wrote:
--
Erik Schnetter <schn...@...> http://www.perimeterinstitute.ca/personal/eschnetter/

Re: A story on unchecked integer operations

Erik Schnetter
I'm hoping for a system where integer operations are checked by default. If this becomes expensive in a particular function or module, then one can (1) perform the corresponding checks once, when the function is entered, (2) disable further checks inside the function, and (3) (for bonus points) use static analysis to prove that these manual checks rule out accidental overflow.

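Except for the bonus part, that's how array indices are currently checked (bounds checks by default, with @inbounds to turn them off locally), and that's how we handle IEEE floating-point conformance (strict by default, with @fastmath as the local opt-out).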
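A minimal C++ sketch of steps (1) and (2), assuming a hypothetical batch of four-integer component descriptors (Component, Extents, keys_fit_in_int and linear_keys are illustrative names, not code from the Einstein Toolkit). The whole key range is checked once on entry with an overflow-safe guard, and the loop then uses plain int arithmetic. Step (3), the static-analysis proof, is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical component descriptor: four small non-negative integers.
struct Component { int a, b, c, d; };

// Exclusive upper bound of each of the four integers.
struct Extents { std::int64_t na, nb, nc, nd; };

// Step (1): returns true if every linear key (at most na*nb*nc*nd - 1) fits
// in an int. The product is built behind a division-based guard, so the
// check itself cannot overflow.
bool keys_fit_in_int(const Extents& e) {
    std::int64_t total = 1;
    for (std::int64_t n : {e.na, e.nb, e.nc, e.nd}) {
        if (n <= 0) return false;
        if (total > std::numeric_limits<int>::max() / n) return false;
        total *= n;
    }
    return true;
}

// Step (2): after the single check on entry, compute keys with plain int
// arithmetic. Every intermediate value is below na*nb*nc*nd, so nothing in
// the loop can overflow as long as each component lies within the extents.
std::vector<int> linear_keys(const std::vector<Component>& comps, const Extents& e) {
    if (!keys_fit_in_int(e)) throw std::overflow_error("component key space exceeds int");
    const int nb = static_cast<int>(e.nb);
    const int nc = static_cast<int>(e.nc);
    const int nd = static_cast<int>(e.nd);
    std::vector<int> keys;
    keys.reserve(comps.size());
    for (const Component& k : comps) {
        // Precondition on the inputs (compiled out when NDEBUG is defined).
        assert(0 <= k.a && k.a < e.na && 0 <= k.b && k.b < nb &&
               0 <= k.c && k.c < nc && 0 <= k.d && k.d < nd);
        keys.push_back(((k.a * nb + k.b) * nc + k.c) * nd + k.d);
    }
    return keys;
}
```

The cost of checking is paid once per call rather than once per arithmetic operation, which is the trade-off the proposal is after.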

Anyway -- adding one item to my to-do list: Experiment with a command-line flag to Julia that switches on checked integer operations.

-erik

On Wed, Jul 13, 2016 at 4:21 PM, John Myles White <[hidden email]> wrote:

Re: A story on unchecked integer operations

Stefan Karpinski
Ubiquitous integer overflow checking is likely to be much more expensive than you may realize. I wish that both hardware and LLVM had better support for doing this inexpensively, but they don't. Rust's choice to check in development (debug) mode and not in release mode reflects this. I'm not aware of any particularly performance-oriented language that would have caught this kind of error.
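To make the Rust policy concrete, here is a rough C++17 analogue for 32-bit integers (the names checked_mul, mul and kTrapOnOverflow are illustrative). checked_mul returns an empty optional on overflow, much like Rust's i32::checked_mul returns None; mul stops the program on overflow when assertions are enabled (NDEBUG not defined) and wraps otherwise, loosely mirroring Rust's default debug/release behavior. The extra widening, compare and branch on every multiply is the kind of cost being discussed.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <optional>

// Exact product of two 32-bit integers, or nullopt if it does not fit.
// Widening to 64 bits makes the multiplication itself exact.
std::optional<std::int32_t> checked_mul(std::int32_t a, std::int32_t b) {
    const std::int64_t wide = std::int64_t{a} * b;
    if (wide < std::numeric_limits<std::int32_t>::min() ||
        wide > std::numeric_limits<std::int32_t>::max())
        return std::nullopt;
    return static_cast<std::int32_t>(wide);
}

#ifndef NDEBUG
constexpr bool kTrapOnOverflow = true;   // "debug" build: stop on overflow
#else
constexpr bool kTrapOnOverflow = false;  // "release" build: wrap silently
#endif

std::int32_t mul(std::int32_t a, std::int32_t b) {
    if (const auto r = checked_mul(a, b)) return *r;
    if (kTrapOnOverflow) {
        std::fprintf(stderr, "integer overflow in mul(%d, %d)\n",
                     static_cast<int>(a), static_cast<int>(b));
        std::abort();
    }
    // Wrap modulo 2^32. Unsigned arithmetic is used because signed overflow is
    // undefined behavior in C++; the conversion back to int32_t is modular in
    // C++20 and implementation-defined (modular on mainstream compilers) before.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) *
                                     static_cast<std::uint32_t>(b));
}
```

The widening trick only works while a wider type exists; for 64-bit operands one would need compiler builtins or a division-based guard instead.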

On Wed, Jul 13, 2016 at 4:29 PM, Erik Schnetter <[hidden email]> wrote:

Re: A story on unchecked integer operations

Erik Schnetter
In reply to this post by Erik Schnetter
Someone asked for more details; I'm replying publicly.

The code decomposes the domain into many small "components", each characterized by a tuple of 4 (small) integers. We want to use these integers as a key in a C++ map (i.e. a Dict). To simplify things, we calculate a single, unique integer key from these 4 integers, very much like linearizing an array index. As the maximum value of each of the 4 integers is known, this is straightforward. Only a few components need to be handled by a particular node, so the map holds only a few entries out of a huge key space and stays efficient.
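A sketch of that linearization, assuming the four integers are non-negative with known exclusive maxima; Index4, Extents4, to_key, from_key and ComponentMap are hypothetical names. The key is computed in std::int64_t (the 64-bit fix). With a 32-bit int the same expression overflows, which is undefined behavior for signed int, as soon as the product of the four maxima exceeds INT_MAX (2,147,483,647).

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical types: the four small integers that identify a component, and
// the known exclusive maximum of each.
using Index4 = std::array<int, 4>;
using Extents4 = std::array<std::int64_t, 4>;

// Row-major linearization, like a 4-D array index:
// key = ((i0 * n1 + i1) * n2 + i2) * n3 + i3, computed in 64 bits.
std::int64_t to_key(const Index4& i, const Extents4& n) {
    std::int64_t key = 0;
    for (int d = 0; d < 4; ++d) {
        assert(0 <= i[d] && i[d] < n[d]);
        key = key * n[d] + i[d];
    }
    return key;
}

// Inverse of to_key: peel off the fastest-varying index first.
Index4 from_key(std::int64_t key, const Extents4& n) {
    assert(key >= 0);
    Index4 i{};
    for (int d = 3; d >= 0; --d) {
        i[d] = static_cast<int>(key % n[d]);
        key /= n[d];
    }
    return i;
}

// Sparse map from key to per-component data (an int placeholder here); each
// node inserts only the few components it handles.
using ComponentMap = std::map<std::int64_t, int>;
```

With hypothetical maxima of 2000, 2000, 2000 and 4, the largest key is 2000*2000*2000*4 - 1 = 31,999,999,999, far beyond what an int can hold. from_key inverting to_key is one way to confirm that distinct tuples get distinct keys.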

In this case, the maximum key exceeded the capacity of a C++ int. The solution was to use a 64-bit integer instead. (Another option would be to use a C++ tuple, but tuples did not exist in C++ in 2002 when the code was originally developed.)
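For comparison, the C++11 tuple alternative mentioned above might look like this (TupleKey, TupleComponentMap and add_component are illustrative names). std::tuple provides a lexicographic operator<, which is all std::map needs, so there is no key arithmetic that could overflow; the cost is a somewhat larger key and a field-by-field comparison.

```cpp
#include <map>
#include <tuple>

// The 4-tuple itself as the map key. std::tuple compares lexicographically,
// which satisfies std::map's ordering requirement.
using TupleKey = std::tuple<int, int, int, int>;
using TupleComponentMap = std::map<TupleKey, int>;  // int is a placeholder value

// Hypothetical helper: record data for component (a, b, c, d). As with
// std::map::emplace, an existing entry for the same key is left unchanged.
void add_component(TupleComponentMap& m, int a, int b, int c, int d, int value) {
    m.emplace(TupleKey{a, b, c, d}, value);
}
```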

I realize now that Julia actually would have prevented this issue on Blue Waters, since this is a 64-bit architecture and Julia uses 64-bit integers there by default. On the other hand, a simulation on a Blue Gene/Q would have exhibited the same issue, and this is a 32-bit architecture.

-erik

