A naive benchmark


A naive benchmark

baillot maxime
Hello everyone,

I am working on some Julia code that is a direct port of a MATLAB code, and found that the Julia version is slower.

So I did some naive benchmarks and got strange results.

I looked for a solution on the Internet but found nothing useful, which is why I'm asking here.

Does anyone have an idea why the Julia code is slower than MATLAB? (The official benchmarks say it should be faster.)

PS: I know this is a bit of a recurring question.

Here are the two versions.

Julia code:

nbp = 2^12;

m = rand(nbp,nbp);
a = 0.0;
Mr = zeros(nbp,nbp);

tic()
for k = 1:nbp
    for kk = 1:nbp

    Mr[k,kk] = m[k,kk]*m[k,kk];

    end
end
toc()


Elapsed time: 7.481011275 seconds


MATLAB code:

nbp = 2^12;

m = rand(nbp,nbp);
a = 0.0;
Mr = zeros(nbp,nbp);


tic
for k = 1:nbp
    for kk = 1:nbp
   
    Mr(k,kk) = m(k,kk)*m(k,kk);
    
    end
end
toc


Elapsed time is 0.618451 seconds.


Re: A naive benchmark

John Myles White
See the top of http://docs.julialang.org/en/release-0.4/manual/performance-tips/ (the first tips: avoid global variables and put performance-critical code inside a function).


Re: A naive benchmark

baillot maxime
OK, I did this:

function test(M, nbp)
    Mr = zeros(nbp, nbp)
    for k = 1:nbp
        for kk = 1:nbp
            Mr[k, kk] = M[k, kk] * M[k, kk]
        end
    end
    return Mr
end


function main()
    nbp = 2^12
    m = rand(nbp, nbp)
    @time test(m, nbp)
end


main()


and yes, it was much faster: 0.627739 seconds (2 allocations: 128.000 MB, 1.34% gc time).

I'm not sure I fully understand, but I think I do.

Anyway, thanks for the answer.



Re: A naive benchmark

Andre Bieler
Also remember that you should not @time the first call to a function, because that measurement includes compilation time. You can test this by running

@time test(m,nbp)
@time test(m,nbp)
@time test(m,nbp)

and you will probably see that the 2nd and 3rd timings are lower than the first.
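
A quick sketch of how to separate compile time from run time, using the test function defined earlier in the thread. This assumes the BenchmarkTools.jl package, which is not part of the original discussion:

using BenchmarkTools

nbp = 2^12
m = rand(nbp, nbp)

test(m, nbp)            # warm-up call: the first run includes compilation
@btime test($m, $nbp)   # repeats the call many times and reports the minimum

The $ interpolation keeps the global variables m and nbp from distorting the measurement.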

Re: A naive benchmark

baillot maxime
Thank you, but don't worry: in general I run it 3 or 4 times before looking at the time :)

I think the problem in my main Julia code (not the toy code I wrote for this benchmark) comes from the BLAS library. It seems much slower than the library MATLAB uses for element-wise matrix multiplication.


Re: A naive benchmark

Tim Holy
Read about cache here:
http://julialang.org/blog/2013/09/fast-numeric
and add @inbounds @simd in front of the inner loop.

--Tim
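
A minimal sketch of that suggestion applied to the test function above (the function name is just for illustration, and the loop order is left as in the original, so the memory-access issue discussed in the following replies still applies):

function test_simd(M, nbp)
    Mr = zeros(nbp, nbp)
    for k = 1:nbp
        # skip bounds checks and let the compiler vectorize the inner loop
        @inbounds @simd for kk = 1:nbp
            Mr[k, kk] = M[k, kk] * M[k, kk]
        end
    end
    return Mr
end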


Re: A naive benchmark

Kristoffer Carlsson
To be explicit, you are looping over things in the wrong order. You want the next loop iteration to access data that is close in memory to the previous iteration. Right now you are making big jumps in memory between iterations.

Re: A naive benchmark

Patrick Kofod Mogensen
To be even more explicit: loop over rows in the innermost loop, not columns. You should get an improvement of around 6x (at least I do).
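
A minimal sketch of the reordered loops (Julia arrays are column-major, so the first index should vary fastest; the function name is just for illustration):

function test_colmajor(M, nbp)
    Mr = zeros(nbp, nbp)
    for kk = 1:nbp        # columns in the outer loop
        for k = 1:nbp     # rows in the inner loop: consecutive memory accesses
            Mr[k, kk] = M[k, kk] * M[k, kk]
        end
    end
    return Mr
end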


Re: A naive benchmark

baillot maxime
@Tim Holy: Thank you for the web page, I didn't know about it. Now I understand a lot of things :)

@Kristoffer and Patrick: I just read about that in the link Tim gave me. I changed the code and the time went from 0.348052 seconds to 0.037768 seconds.

Thanks to you all. Now I understand a lot of things, and why it was slower than MATLAB.

So now I understand why a lot of people were talking about devectorizing matrix calculations. But I think it's sad, because if I wanted to do that I would use C or C++, not an array language like Julia or MATLAB.

Anyway! So if I'm not mistaken... is it better for me to write a "mul()" function than to use ".*"?

Re: A naive benchmark

baillot maxime
In fact, no: I just tried it.

Anyway, thanks to all of you :)


Re: A naive benchmark

Chris Rackauckas
In reply to this post by baillot maxime
BLAS will be faster for (non-trivially sized) matrix multiplications, but it isn't used for component-wise operations (.*, ./).

For component-wise operations, devectorization here shouldn't give much of a speedup. The main speedup will actually come from things like loop fusion, which gets rid of the intermediate arrays created by an expression like A.*B.*exp(C).

For this equation, you can devectorize it using the macro from Devectorize.jl:

@devec Mr = m.*m

At least I think that should work. It should basically generate the code you wrote, so you get the efficiency without the ugly C/C++-like extra code.
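
For completeness, a minimal sketch of that usage (this assumes the Devectorize.jl package is installed and is my reading of its macro, not code from the original post beyond the @devec line above):

using Devectorize

nbp = 2^12
m = rand(nbp, nbp)
@devec Mr = m .* m   # expands into an explicit elementwise loop, avoiding a temporary array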


Re: A naive benchmark

baillot maxime
OK,

I tried it on my main program, but it's slower. Also, in my main program I use vectors, not 2D matrices, so maybe that's why.


Re: A naive benchmark

baillot maxime
But that's OK :)

Thank you for the idea!


Re: A naive benchmark

Milan Bouchet-Valat
In reply to this post by baillot maxime
Note that work is going on to allow vectorized syntax to be (almost) as
efficient as devectorized loops. See
https://github.com/JuliaLang/julia/issues/16285


Regards


Re: A naive benchmark

baillot maxime
Nice! Thanks for the info.
