# @inbounds and @simd not showing any sign of speedup

4 messages

## @inbounds and @simd not showing any sign of speedup

Hello,

I have a function which is doing basically an operation inside a loop and when adding @simd or @inbounds time doesn't improve, in any case it seems slightly worse.

    julia> using BenchmarkTools

    julia> A = rand(1000,1000)

    julia> function f!(n::Integer, DA::Number, DX::AbstractArray, incx::Integer) # Original function
               i = 1
               n = min(n,length(DX))
               while i <= n
                   DX[i] *= DA
                   i += incx
               end
               DX
           end
    f! (generic function with 1 method)

    julia> function f2!(n::Integer, DA::Number, DX::AbstractArray, incx::Integer) # inner cycle @inbounds and @simd
               i = 1
               n = min(n,length(DX))
               @inbounds @simd for i in 1:incx:n
                   DX[i] *= DA
               end
               DX
           end
    f2! (generic function with 1 method)

    julia> @inbounds function f3!(n::Integer, DA::Number, DX::AbstractArray, incx::Integer) # inner cycle @simd, function @inbounds
               i = 1
               n = min(n,length(DX))
               @simd for i in 1:incx:n
                   DX[i] *= DA
               end
               DX
           end

    julia> minimum(@benchmark f!(length(A),1.0,A,1))
    BenchmarkTools.TrialEstimate:
      time:             52.04 ms
      gctime:           0.00 ns (0.00%)
      memory:           16.00 bytes
      allocs:           1
      time tolerance:   5.00%
      memory tolerance: 1.00%

    julia> minimum(@benchmark f2!(length(A),1.0,A,1))
    BenchmarkTools.TrialEstimate:
      time:             55.80 ms
      gctime:           0.00 ns (0.00%)
      memory:           16.00 bytes
      allocs:           1
      time tolerance:   5.00%
      memory tolerance: 1.00%

    julia> minimum(@benchmark f3!(length(A),1.0,A,1))
    BenchmarkTools.TrialEstimate:
      time:             55.62 ms
      gctime:           0.00 ns (0.00%)
      memory:           16.00 bytes
      allocs:           1
      time tolerance:   5.00%
      memory tolerance: 1.00%

Is there an explanation for this? Thank you

## Re: @inbounds and @simd not showing any sign of speedup

Hello colleague,

On Friday, July 29, 2016 at 8:59:36 AM UTC+2, Juan Lopez wrote:

> Hello, I have a function which is doing basically an operation inside a loop and when adding @simd or @inbounds time doesn't improve, in any case it seems slightly worse. Is there an explanation for this? Thank you

There is a non-vanishing probability that the plain loop is already compiled to optimal code. Maybe try looking at the lowered code.
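
As a rough illustration of the suggestion to look at the lowered code, here is a minimal sketch. It assumes a recent Julia (1.x), where `@code_lowered` lives in the `InteractiveUtils` standard library; the name `scale_while!` is hypothetical and simply repeats the loop from the original `f!` so the snippet runs on its own.

```julia
using InteractiveUtils  # provides @code_lowered (loaded automatically in the REPL)

# Hypothetical copy of the original while-loop, kept self-contained.
function scale_while!(n::Integer, a::Number, x::AbstractArray, inc::Integer)
    i = 1
    n = min(n, length(x))
    while i <= n
        x[i] *= a      # scale every inc-th element in place
        i += inc
    end
    return x
end

x = ones(8)
scale_while!(length(x), 2.0, x, 2)   # touches elements 1, 3, 5, 7 only

# Lowered form: the desugared code before type inference and optimization.
lowered = @code_lowered scale_while!(length(x), 2.0, x, 1)
println(lowered)
```

The lowered form only shows how the loop is desugared. To see what was actually optimized, `@code_llvm` or `@code_native` on the same call is usually more telling.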

## Re: @inbounds and @simd not showing any sign of speedup

A great tool for figuring out what is going on in these cases is `@code_llvm`. It shows you a representation of your code that is still readable, but very close to the machine.

Your simple Julia code without `@simd` is nearly optimal, but it does benefit from the inclusion of `@inbounds`:

- While loop with `@inbounds`: minimum time: 797.28 μs
- While loop without `@inbounds`: minimum time: 1.01 ms
- For loop without & with `@inbounds`: minimum time: 802/812.11 μs

The two variants:

    function simple(A, b, stride, N)
      N = min(N, length(A))
      for i in 1:stride:N
        @inbounds A[i] *= b
      end
    end

    function while_based(A, b, stride, N)
      i = 1
      N = min(N, length(A))
      while i <= N
        A[i] *= b
        i += stride
      end
    end

Now to the question of whether or not `@simd` is beneficial in this case. LLVM has a loop vectorizer that we run, and it performs a cost-benefit (and correctness) analysis when it sees a loop. The fact that we don't see vectorized code in the `@code_llvm` output means that LLVM did not deem it worthwhile to vectorize our code (as Kristoffer said, most likely because of non-unit strides). With `@simd` we (forcibly) tell LLVM to vectorize our code, to be less strict about correctness, and not to do a cost-benefit analysis. While vectorized code has great performance benefits, it also comes with costs (code size increase, overhead).

I hope this rough analysis helps.

On Friday, 29 July 2016 22:36:50 UTC+9, Kristoffer Carlsson wrote:

> It is likely because the ranges are not UnitRanges.
>
> On Friday, July 29, 2016 at 5:35:57 AM UTC-4, Andreas Lobinger wrote:
>
>> Hello colleague,
>>
>> On Friday, July 29, 2016 at 8:59:36 AM UTC+2, Juan Lopez wrote:
>>
>>> Hello, I have a function which is doing basically an operation inside a loop and when adding @simd or @inbounds time doesn't improve, in any case it seems slightly worse. Is there an explanation for this? Thank you
>>
>> There is a non-vanishing probability that the plain loop is already compiled to optimal code. Maybe try looking at the lowered code.
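
One way to check the unit-stride explanation from this thread on your own machine is to compare the LLVM IR of a `UnitRange` loop with that of a `StepRange` loop. The sketch below assumes Julia 1.x with the `InteractiveUtils` standard library. The names `scale_unit!`, `scale_strided!`, `llvm_ir` and `has_vector_ops` are hypothetical helpers, and the search for `<N x double>` is only a heuristic for spotting vector instructions, so results will vary by CPU and Julia version.

```julia
using InteractiveUtils  # provides code_llvm

# Unit-stride loop: the iteration range is a UnitRange.
function scale_unit!(x::Vector{Float64}, a::Float64)
    @inbounds @simd for i in 1:length(x)
        x[i] *= a
    end
    return x
end

# Strided loop: the iteration range is a StepRange, even when inc == 1.
function scale_strided!(x::Vector{Float64}, a::Float64, inc::Int)
    @inbounds @simd for i in 1:inc:length(x)
        x[i] *= a
    end
    return x
end

# Capture the optimized LLVM IR for a given method signature as a String.
llvm_ir(f, types) = sprint(io -> code_llvm(io, f, types))

# Heuristic (an assumption): vectorized Float64 code mentions types like <4 x double>.
has_vector_ops(ir::AbstractString) = occursin(r"<\d+ x double>", ir)

ir_unit    = llvm_ir(scale_unit!, Tuple{Vector{Float64}, Float64})
ir_strided = llvm_ir(scale_strided!, Tuple{Vector{Float64}, Float64, Int})

println("unit range shows vector ops:    ", has_vector_ops(ir_unit))
println("strided range shows vector ops: ", has_vector_ops(ir_strided))
```

If the strided version shows no vector operations, one option (not discussed in the thread) is to dispatch to a separate unit-range loop when `inc == 1`, since that is the case the benchmark above actually exercises.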