Ordering assumptions with combine function for sparse matrix (or vector) construction

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Ordering assumptions with combine function for sparse matrix (or vector) construction

Scott Jones
I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.

Is it required that the combine function work on dupllicate values in the order they are present in the V vector?

In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the results
from the current sparse routine, because floating point addition (the default combine function) is not commutative.
Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.

Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.

Thanks in advance for any help.

(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)

Reply | Threaded
Open this post in threaded view
|

Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

Stefan Karpinski
Floating point addition is commutative. It isn't associative.

On Sunday, October 4, 2015, Scott Jones <[hidden email]> wrote:
I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.

Is it required that the combine function work on dupllicate values in the order they are present in the V vector?

In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the results
from the current sparse routine, because floating point addition (the default combine function) is not commutative.
Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.

Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.

Thanks in advance for any help.

(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)

Reply | Threaded
Open this post in threaded view
|

Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

Scott Jones
Argh!  Got the terminology crossed.  I expect you understood what I really meant though.

Do you have any clarification about what I was actually asking about?
There isn't anything in the documentation that I was able to find about what order duplicate entries are combined, another thing to
add to my list of minor clarifications to be added to the sparse / sort documentation.

(It might not even be necessary - because I realized that by treating the index into I/J/V as part of the key i.e. (col, row, index),
I eliminate the possibility of duplicates, and even the quicksort will preserve the order I want.)

On Sunday, October 4, 2015 at 4:03:06 PM UTC-4, Stefan Karpinski wrote:
Floating point addition is commutative. It isn't associative.

On Sunday, October 4, 2015, Scott Jones <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="1ZZ0c_nCDAAJ" rel="nofollow" onmousedown="this.href=&#39;javascript:&#39;;return true;" onclick="this.href=&#39;javascript:&#39;;return true;">scott.pa...@...> wrote:
I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.

Is it required that the combine function work on dupllicate values in the order they are present in the V vector?

In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the results
from the current sparse routine, because floating point addition (the default combine function) is not commutative.
Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.

Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.

Thanks in advance for any help.

(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)

Reply | Threaded
Open this post in threaded view
|

Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

Stefan Karpinski-2
Right, just wanted to nip potential misinformation in the bud – for some reason people often claim that fp ops aren't commutative. I don't know much about sparse stuff, so someone else will have to chime in on that.

On Oct 4, 2015, at 1:24 PM, Scott Jones <[hidden email]> wrote:

Argh!  Got the terminology crossed.  I expect you understood what I really meant though.

Do you have any clarification about what I was actually asking about?
There isn't anything in the documentation that I was able to find about what order duplicate entries are combined, another thing to
add to my list of minor clarifications to be added to the sparse / sort documentation.

(It might not even be necessary - because I realized that by treating the index into I/J/V as part of the key i.e. (col, row, index),
I eliminate the possibility of duplicates, and even the quicksort will preserve the order I want.)

On Sunday, October 4, 2015 at 4:03:06 PM UTC-4, Stefan Karpinski wrote:
Floating point addition is commutative. It isn't associative.

On Sunday, October 4, 2015, Scott Jones <<a href="javascript:" target="_blank" gdf-obfuscated-mailto="1ZZ0c_nCDAAJ" rel="nofollow" onmousedown="this.href='javascript:';return true;" onclick="this.href='javascript:';return true;">scott.pa...@...> wrote:
I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.

Is it required that the combine function work on dupllicate values in the order they are present in the V vector?

In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the results
from the current sparse routine, because floating point addition (the default combine function) is not commutative.
Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.

Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.

Thanks in advance for any help.

(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)

Reply | Threaded
Open this post in threaded view
|

Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

Tony Kelman
Given that we support very general element types and combine functions, a stable sort would be the more predictable default choice. Otherwise sorting implementation details could leak into the output results. If the performance difference is worth the API complication, perhaps an unstable sort could be opted into behind a keyword argument.
Reply | Threaded
Open this post in threaded view
|

Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

Scott Jones
Yes, thanks. I'm going to have to benchmark the three approaches, mergesort or quicksort on a work vector with (col, row) as key, index into V as value, or a quicksort with (col, row, index) as the key to make it stable, to see if there is enough of a performance difference to make the option worthwhile.

What are typical sizes and sparsity of sparse arrays used currently by Julians?
Is it true that there is only support (currently) for spare matrices (i.e. 2D)
I know sparse vector support is currently being worked on, but what about more than 3 dimensions?
My past experience was with implementing multi-dimensional sparse (associative) arrays, but I haven't seen anything
quite like that in Julia yet (nor in the packages, but there are so many, I might have missed it).

Another thing that I don't understand is why Ti<:Integer is part of the parameters for sparse matrices in Julia.
If that weren't part of the parameters, the sparse matrix code could pick different sizes for the colptr and rowval vectors,
for example, Vector{Int16} or Vector{Int32}, instead of using 8 bytes always on 64-bit systems, which can hurt performance
because of using more memory and causing more cache misses.

On Sunday, October 4, 2015 at 7:34:38 PM UTC-4, Tony Kelman wrote:
Given that we support very general element types and combine functions, a stable sort would be the more predictable default choice. Otherwise sorting implementation details could leak into the output results. If the performance difference is worth the API complication, perhaps an unstable sort could be opted into behind a keyword argument.
Reply | Threaded
Open this post in threaded view
|

Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

Steven G. Johnson


On Sunday, October 4, 2015 at 8:25:47 PM UTC-4, Scott Jones wrote: 
I know sparse vector support is currently being worked on, but what about more than 3 dimensions?
My past experience was with implementing multi-dimensional sparse (associative) arrays, but I haven't seen anything
quite like that in Julia yet (nor in the packages, but there are so many, I might have missed it).
 
SparseMatrix in Julia is designed for sparse linear algebra.

Unless you are doing sparse multi-linear algebra, why can't you use a Dict?  Passing multiple indices to a Dict works fine.
Reply | Threaded
Open this post in threaded view
|

Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

Scott Jones
I'd like to have structures that work as well or better than they currently do for vectors and matrices, but can also handle efficiently other use cases,
such as where undefined or Null values instead of zeros are the sparse entries, and that are sorted.
Yes, sorted dicts can be used for such, but that doesn't give you an optimized structure for the case where the keys are all integers, for example.

On Sunday, October 4, 2015 at 11:06:27 PM UTC-4, Steven G. Johnson wrote:


On Sunday, October 4, 2015 at 8:25:47 PM UTC-4, Scott Jones wrote: 
I know sparse vector support is currently being worked on, but what about more than 3 dimensions?
My past experience was with implementing multi-dimensional sparse (associative) arrays, but I haven't seen anything
quite like that in Julia yet (nor in the packages, but there are so many, I might have missed it).
 
SparseMatrix in Julia is designed for sparse linear algebra.

Unless you are doing sparse multi-linear algebra, why can't you use a Dict?  Passing multiple indices to a Dict works fine.