# A naive benchmark

15 messages
## A naive benchmark

 Hello everyone,

I am working on a Julia program that is a direct port of a MATLAB program, and I found that the Julia version is slower. So I did some naive benchmarking and got strange results. I looked for a solution online but found nothing useful, which is why I'm asking the question here. Does anyone have an idea why the Julia code is slower than MATLAB? (The official benchmarks say it should be faster.)

PS: I know this is a bit of a recurring question. The codes are:

Julia code

```julia
nbp = 2^12;
m = rand(nbp,nbp);
a = 0.0;
Mr = zeros(nbp,nbp);
tic()
for k = 1:nbp
    for kk = 1:nbp
        Mr[k,kk] = m[k,kk]*m[k,kk];
    end
end
toc()
```

Elapsed time: 7.481011275 seconds

MATLAB code

```matlab
nbp = 2^12;
m = rand(nbp,nbp);
a = 0.0;
Mr = zeros(nbp,nbp);
tic
for k = 1:nbp
    for kk = 1:nbp
        Mr(k,kk) = m(k,kk)*m(k,kk);
    end
end
toc
```

Elapsed time is 0.618451 seconds.
## Re: A naive benchmark

 See the top of http://docs.julialang.org/en/release-0.4/manual/performance-tips/

On Friday, July 1, 2016 at 7:16:10 AM UTC-7, baillot maxime wrote:
> Hello everyone, I am working on a Julia code which is a brutal copy of a Matlab code and found out that the Julia code is slower.....
## Re: A naive benchmark

 Ok, I did that:

```julia
function test(M,nbp)
    Mr = zeros(nbp,nbp)
    for k = 1:nbp
        for kk = 1:nbp
            Mr[k,kk] = M[k,kk]*M[k,kk];
        end
    end
    return Mr
end

function main()
    nbp = 2^12;
    m = rand(nbp,nbp);
    @time test(m,nbp)
end

main()
```

and yes, it was much faster: 0.627739 seconds (2 allocations: 128.000 MB, 1.34% gc time). I'm not sure I fully understand, but I think I do. Anyway, thanks for the answer.

On Friday, July 1, 2016 at 4:26:01 PM UTC+2, John Myles White wrote:
> See the top of http://docs.julialang.org/en/release-0.4/manual/performance-tips/
## Re: A naive benchmark

 Also remember that the first time you execute a function you should not `@time` it, because then it also measures the compilation time. You can test this by doing

```julia
@time test(m,nbp)
@time test(m,nbp)
@time test(m,nbp)
```

and you will probably see that the timings for the 2nd and 3rd runs are lower than for the first one.
## Re: A naive benchmark

 Thank you, but don't worry: in general I run it 3 or 4 times before looking at the time :)

I think the problem in my main Julia code (not the toy code I wrote for this benchmark) comes from the BLAS library. It looks much slower than the library MATLAB uses for element-wise matrix multiplication.

On Friday, July 1, 2016 at 4:53:01 PM UTC+2, Andre Bieler wrote:
> Also remember that the first time you execute a function you should not @time it because then it also measures the compilation time.
## Re: A naive benchmark

 Read about caches here: http://julialang.org/blog/2013/09/fast-numeric and add `@inbounds @simd` in front of the inner loop.

--Tim

On Friday, July 1, 2016 8:10:33 AM CDT baillot maxime wrote:
> Thank you but don't worry in general I do it like 3 or 4 times before looking at the time :)
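Tim's suggestion applied to the `test` function from earlier in the thread might look like this (a minimal sketch; the name `test_simd` is mine):

```julia
# Same elementwise square as before, but with bounds checking
# disabled and SIMD vectorization hinted on the inner loop.
function test_simd(M, nbp)
    Mr = zeros(nbp, nbp)
    for k = 1:nbp
        @inbounds @simd for kk = 1:nbp
            Mr[k, kk] = M[k, kk] * M[k, kk]
        end
    end
    return Mr
end
```

Note that `@inbounds` removes bounds checks, so it is only safe when the indices are known to be valid, as they are here.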
## Re: A naive benchmark

 To be explicit, you are looping over things in the wrong order. You want the next loop iteration to access data that is close in memory to the previous iteration. Right now you are making big jumps in memory between iterations.
## Re: A naive benchmark

 To be even more explicit: loop over rows in the innermost loop, not columns. You should get an improvement of around 6x (at least I do).

On Friday, July 1, 2016 at 7:26:20 PM UTC+2, Kristoffer Carlsson wrote:
> To be explicit, you are looping over things in the wrong order.
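A sketch of the swapped loop order (the name `test_colmajor` is mine): Julia stores arrays column-major, so the first (row) index should vary fastest.

```julia
# Column index outermost, row index innermost: consecutive inner
# iterations then touch adjacent memory locations.
function test_colmajor(M, nbp)
    Mr = zeros(nbp, nbp)
    for kk = 1:nbp        # columns
        for k = 1:nbp     # rows (innermost, fastest-varying)
            Mr[k, kk] = M[k, kk] * M[k, kk]
        end
    end
    return Mr
end
```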
## Re: A naive benchmark

 @Tim Holy: Thank you for the web page, I didn't know it. Now I understand a lot of things :)

@Kristoffer and Patrick: I just read about that in the link Tim gave me. I changed the code and the time went from 0.348052 seconds to 0.037768 seconds.

Thanks to you all. Now I understand a lot of things, and why it was slower than MATLAB.

So now I understand why a lot of people were talking about devectorizing matrix calculations. But I think it's sad, because if I wanted to write code that way I would use C or C++, not a matrix language like Julia or MATLAB.

Anyway! If I'm not mistaken, it's better for me to write a "mul()" function than to use ".*"?
## Re: A naive benchmark

 In fact no, I just tried it. Anyway, thanks to all of you :)

On Saturday, July 2, 2016 at 1:11:49 AM UTC+2, baillot maxime wrote:
> Anyway! So if I'm not mistaking... It's better for me to create a "mul()" function than use the ".*" ?
## Re: A naive benchmark

 In reply to this post by baillot maxime

BLAS will be faster for (non-trivially sized) matrix multiplications, but it doesn't apply to component-wise operations (.*, ./).

For component-wise operations, devectorization here shouldn't give much of a speedup. The main speedup actually comes from things like loop fusion, which gets rid of the intermediate arrays created by an expression like A.*B.*exp(C).

For this equation, you can devectorize it using the Devectorize.jl macro:

```julia
@devec Mr = m.*m
```

At least I think that should work. It should basically generate the code you wrote, so you get the efficiency without the ugly C/C++-like extra code.
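To illustrate the intermediates mentioned above: a hand-fused loop (the name `fused` is mine) computes each element of `A.*B.*exp(C)` in a single pass, instead of allocating a temporary array for each vectorized step.

```julia
# Without fusion, A.*B allocates one temporary array, the elementwise
# exponential of C another, and the final .* a third. A single loop
# needs only the one output allocation.
function fused(A, B, C)
    out = similar(A)
    for j = 1:size(A, 2), i = 1:size(A, 1)   # column-major order
        out[i, j] = A[i, j] * B[i, j] * exp(C[i, j])
    end
    return out
end
```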
## Re: A naive benchmark

 Ok, I tried it on my main program but it's slower. Also, in my main program I use vectors, not 2D matrices, so maybe that's why it's slower.

On Saturday, July 2, 2016 at 3:23:49 AM UTC+2, Chris Rackauckas wrote:
> BLAS will be faster for (non-trivial sized) matrix multiplications, but it doesn't apply to component-wise operations (.*, ./).
## Re: A naive benchmark

 But that's ok :) Thank you for the idea!

On Saturday, July 2, 2016 at 11:22:21 AM UTC+2, baillot maxime wrote:
> Ok, I tried it on my main program but it's slower.