# Ordering assumptions with combine function for sparse matrix (or vector) construction

8 messages
Open this post in threaded view
|

## Ordering assumptions with combine function for sparse matrix (or vector) construction

 I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.Is it required that the combine function work on dupllicate values in the order they are present in the V vector?In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the resultsfrom the current sparse routine, because floating point addition (the default combine function) is not commutative.Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.Thanks in advance for any help.(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)
Open this post in threaded view
|

## Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

 Floating point addition is commutative. It isn't associative.On Sunday, October 4, 2015, Scott Jones <[hidden email]> wrote:I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.Is it required that the combine function work on dupllicate values in the order they are present in the V vector?In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the resultsfrom the current sparse routine, because floating point addition (the default combine function) is not commutative.Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.Thanks in advance for any help.(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)
Open this post in threaded view
|

## Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

 Argh!  Got the terminology crossed.  I expect you understood what I really meant though.Do you have any clarification about what I was actually asking about?There isn't anything in the documentation that I was able to find about what order duplicate entries are combined, another thing toadd to my list of minor clarifications to be added to the sparse / sort documentation.(It might not even be necessary - because I realized that by treating the index into I/J/V as part of the key i.e. (col, row, index),I eliminate the possibility of duplicates, and even the quicksort will preserve the order I want.)On Sunday, October 4, 2015 at 4:03:06 PM UTC-4, Stefan Karpinski wrote:Floating point addition is commutative. It isn't associative.On Sunday, October 4, 2015, Scott Jones <scott.pa...@...> wrote:I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.Is it required that the combine function work on dupllicate values in the order they are present in the V vector?In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the resultsfrom the current sparse routine, because floating point addition (the default combine function) is not commutative.Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.Thanks in advance for any help.(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)
Open this post in threaded view
|

## Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

 Right, just wanted to nip potential misinformation in the bud – for some reason people often claim that fp ops aren't commutative. I don't know much about sparse stuff, so someone else will have to chime in on that.On Oct 4, 2015, at 1:24 PM, Scott Jones <[hidden email]> wrote:Argh!  Got the terminology crossed.  I expect you understood what I really meant though.Do you have any clarification about what I was actually asking about?There isn't anything in the documentation that I was able to find about what order duplicate entries are combined, another thing toadd to my list of minor clarifications to be added to the sparse / sort documentation.(It might not even be necessary - because I realized that by treating the index into I/J/V as part of the key i.e. (col, row, index),I eliminate the possibility of duplicates, and even the quicksort will preserve the order I want.)On Sunday, October 4, 2015 at 4:03:06 PM UTC-4, Stefan Karpinski wrote:Floating point addition is commutative. It isn't associative.On Sunday, October 4, 2015, Scott Jones <scott.pa...@...> wrote:I wanted to get some clarification on whether or not there were any ordering assumptions with the combine function in the sparse() function.Is it required that the combine function work on dupllicate values in the order they are present in the V vector?In testing a faster version of the sparse function which uses quicksort (which is not stable), I noticed that the results were not identical to the resultsfrom the current sparse routine, because floating point addition (the default combine function) is not commutative.Since frequently quicksort is faster than mergesort, I'd hoped to be able to use quicksort, but ran into this issue.Also, I realized that the combine function might simply return the 1st value (first duplicate wins) or the 2nd value (last duplicate wins), in which case the ordering would be critical.Thanks in advance for any help.(Some good news, in my tests, my sparse code is >50 faster for the case that Viral put in issue #13400)
Open this post in threaded view
|

## Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

 Given that we support very general element types and combine functions, a stable sort would be the more predictable default choice. Otherwise sorting implementation details could leak into the output results. If the performance difference is worth the API complication, perhaps an unstable sort could be opted into behind a keyword argument.
Open this post in threaded view
|

## Re: Ordering assumptions with combine function for sparse matrix (or vector) construction

 Yes, thanks. I'm going to have to benchmark the three approaches, mergesort or quicksort on a work vector with (col, row) as key, index into V as value, or a quicksort with (col, row, index) as the key to make it stable, to see if there is enough of a performance difference to make the option worthwhile.What are typical sizes and sparsity of sparse arrays used currently by Julians?Is it true that there is only support (currently) for spare matrices (i.e. 2D)I know sparse vector support is currently being worked on, but what about more than 3 dimensions?My past experience was with implementing multi-dimensional sparse (associative) arrays, but I haven't seen anythingquite like that in Julia yet (nor in the packages, but there are so many, I might have missed it).Another thing that I don't understand is why Ti<:Integer is part of the parameters for sparse matrices in Julia.If that weren't part of the parameters, the sparse matrix code could pick different sizes for the colptr and rowval vectors,for example, Vector{Int16} or Vector{Int32}, instead of using 8 bytes always on 64-bit systems, which can hurt performancebecause of using more memory and causing more cache misses.On Sunday, October 4, 2015 at 7:34:38 PM UTC-4, Tony Kelman wrote:Given that we support very general element types and combine functions, a stable sort would be the more predictable default choice. Otherwise sorting implementation details could leak into the output results. If the performance difference is worth the API complication, perhaps an unstable sort could be opted into behind a keyword argument.