

Hello everyone,
I am working on some Julia code that is a direct port of MATLAB code, and I found that the Julia version is slower.
So I did some naive benchmarks and got strange results.
I looked for a solution on the internet but found nothing useful, which is why I'm asking here.
Does anyone have an idea why the Julia code is slower than MATLAB? (The official benchmarks say it should be quicker.)
PS: I know this is a bit of a recurring question...
The codes are:
Julia code
nbp = 2^12;
m = rand(nbp,nbp); a = 0.0; Mr = zeros(nbp,nbp);
tic()
for k = 1:nbp
    for kk = 1:nbp
        Mr[k,kk] = m[k,kk]*m[k,kk];
    end
end
toc()
Elapsed time: 7.481011275 seconds
MATLAB code
nbp = 2^12;
m = rand(nbp,nbp); a = 0.0; Mr = zeros(nbp,nbp);
tic
for k = 1:nbp
    for kk = 1:nbp
        Mr(k,kk) = m(k,kk)*m(k,kk);
    end
end
toc
Elapsed time is 0.618451 seconds.


See the top of http://docs.julialang.org/en/release-0.4/manual/performance-tips/

On Friday, July 1, 2016 at 7:16:10 AM UTC-7, baillot maxime wrote: […]


Ok, I did that:

function test(M,nbp)
    Mr = zeros(nbp,nbp)
    for k = 1:nbp
        for kk = 1:nbp
            Mr[k,kk] = M[k,kk]*M[k,kk];
        end
    end
    return Mr
end

function main()
    nbp = 2^12;
    m = rand(nbp,nbp);
    @time test(m,nbp)
end

main()

and yes, it is much faster: 0.627739 seconds (2 allocations: 128.000 MB, 1.34% gc time).
I'm not sure I fully understand, but I think I do.
Anyway, thanks for the answer.

On Friday, July 1, 2016 at 4:26:01 PM UTC+2, John Myles White wrote: […]


Also remember that the first time you execute a function you should not @time it, because then you also measure the compilation time. You can test this by running
@time test(m,nbp)
@time test(m,nbp)
@time test(m,nbp)
and you will probably see that the 2nd and 3rd timings are lower than the first one.
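A minimal sketch of this effect (the function and sizes here are illustrative, not the exact benchmark from the thread):

```julia
# Square each element of M into a freshly allocated result, inside a function.
function square_all(M)
    Mr = similar(M)
    for kk in 1:size(M, 2)
        for k in 1:size(M, 1)
            Mr[k, kk] = M[k, kk] * M[k, kk]
        end
    end
    return Mr
end

m = rand(100, 100)
@time square_all(m)   # first call: includes JIT compilation time
@time square_all(m)   # second call: measures only the run time
```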


Thank you, but don't worry: in general I run it 3 or 4 times before looking at the time :)
I think the problem in my main Julia code (not the toy code I wrote for the benchmark) comes from the BLAS library. It looks much slower than the library MATLAB uses for element-wise matrix multiplication.

On Friday, July 1, 2016 at 4:53:01 PM UTC+2, Andre Bieler wrote: […]


Read about cache behavior here:
http://julialang.org/blog/2013/09/fast-numeric/
and add @inbounds @simd in front of the inner loop.
Tim

On Friday, July 1, 2016 8:10:33 AM CDT baillot maxime wrote: […]


To be explicit, you are looping over things in the wrong order. You want the next loop iteration to access data that is close in memory to the data from the previous iteration. Right now you are making big jumps in memory between iterations.


To be even more explicit: loop over rows in the innermost loop, not over columns. You should get an improvement of around 6x (at least I do).

On Friday, July 1, 2016 at 7:26:20 PM UTC+2, Kristoffer Carlsson wrote: […]
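In other words, the two loops in the benchmark just need to be swapped (a sketch, with an illustrative function name):

```julia
# Julia arrays are column-major: the first (row) index varies fastest in
# memory. Putting k in the innermost loop makes the accesses contiguous.
function test_colmajor(M, nbp)
    Mr = zeros(nbp, nbp)
    for kk = 1:nbp          # columns in the outer loop
        for k = 1:nbp       # rows in the inner loop: sequential memory access
            Mr[k, kk] = M[k, kk] * M[k, kk]
        end
    end
    return Mr
end
```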


@Tim Holy: Thank you for the web page, I didn't know it. Now I understand a lot of things :)
@Kristoffer and Patrick: I just read about that in the link Tim gave me. I changed the code and the time went from 0.348052 seconds to 0.037768 seconds.
Thanks to you all. Now I understand a lot of things, and why it was slower than MATLAB.
So now I understand why a lot of people were talking about devectorizing matrix calculations. But I think it's sad, because if I wanted to do this I would use C or C++, not a matrix language like Julia or MATLAB.
Anyway! So if I'm not mistaken... it's better for me to create a "mul()" function than to use ".*"?


In fact no; I just tried it.
Anyway, thanks to all of you :)

On Saturday, July 2, 2016 at 1:11:49 AM UTC+2, baillot maxime wrote: […]


BLAS will be faster for (nontrivially sized) matrix multiplications, but it doesn't apply to componentwise operations (.*, ./).
For componentwise operations, devectorization here shouldn't give much of a speedup. The main speedup will actually come from things like loop fusion, which gets rid of the intermediates that are made when doing something like A.*B.*exp(C).
For this equation, you can devectorize it using the @devec macro from Devectorize.jl:
@devec Mr = m.*m
At least I think that should work. It should basically generate the code you wrote, giving you the efficiency without the ugly C/C++-like extra code.

On Saturday, July 2, 2016 at 1:11:49 AM UTC+2, baillot maxime wrote: […]


Ok,
I tried it on my main program but it's slower. Also, in my main program I use vectors, not 2D matrices, so maybe that's why it's slower.

On Saturday, July 2, 2016 at 3:23:49 AM UTC+2, Chris Rackauckas wrote: […]


But that's ok :) Thank you for the idea!

On Saturday, July 2, 2016 at 11:22:21 AM UTC+2, baillot maxime wrote: […]


On Friday, July 1, 2016 at 4:11 PM -0700, baillot maxime wrote:
> So now I understand why a lot of people were talking about
> devectorizing matrix calculations. But I think it's sad, because if
> I want to do this I will use C or C++, not a matrix language
> like Julia or MATLAB.
Note that work is going on to allow vectorized syntax to be (almost) as
efficient as devectorized loops. See
https://github.com/JuliaLang/julia/issues/16285
Regards
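For reference, the dot-broadcast fusion that issue led to means the benchmark's element-wise square can be written as a single fused, in-place loop in later Julia versions (a sketch, with a small illustrative size):

```julia
nbp = 2^6
m  = rand(nbp, nbp)
Mr = zeros(nbp, nbp)

# The dotted operators fuse into one loop and write into Mr in place,
# so no intermediate array is allocated.
Mr .= m .* m
```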


Nice! Thanks for the info.

On Saturday, July 2, 2016 at 11:53:25 AM UTC+2, Milan Bouchet-Valat wrote: […]

