Julia 0.5.0 together with Codecs.jl (Base64) slower than on 0.4.5

Páll Haraldsson

I was running [not my code..]:


[I was looking into why this Base64 benchmark was slower than in Ruby, and then even slower under 0.5.]

Lines 12, 13 and 21 of that file seem predictably slow (add 2 to the line numbers that the profile reports).

A. Why is it slower than Ruby in the first place? Codecs.jl is presumably not as well optimized; there is no good reason for that, and at least it is not Julia's fault.

B. Why is it slower under 0.5? I had changed ASCIIString -> String (the usual recommendation, but maybe not the right one here?).

I see the reason now.

Lines 12-13:
  str2 = ASCIIString(encode(Base64, str))
  s += length(str2)

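I was then wondering whether it would be unfair to other languages (e.g. C) to get the byte length directly instead of scanning the string. Then I realized that this is exactly what happens in 0.4: with ASCIIString every character is one byte, so length can return the byte count without scanning. In 0.5, length on a String has to scan the UTF-8 data (unless you use LegacyStrings.jl, which still provides ASCIIString), or so it seemed.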
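To illustrate the difference between counting characters and reading the byte count, here is a minimal sketch. It assumes a Julia version where String is UTF-8 encoded; the variable names are only for illustration.

```julia
# For ASCII-only data (such as Base64 output) the character count and the
# byte count agree, but `length` has to walk the UTF-8 data to find out,
# while `sizeof` just reads the stored size in bytes.
ascii_str = "SGVsbG8gV29ybGQh"      # Base64 of "Hello World!"
n_chars = length(ascii_str)         # 16, found by scanning
n_bytes = sizeof(ascii_str)         # 16, no scan needed

# For non-ASCII data the two numbers differ, which is why `length`
# cannot simply return the byte count for a general String.
utf8_str = "Páll"
utf8_chars = length(utf8_str)       # 4 characters
utf8_bytes = sizeof(utf8_str)       # 5 bytes ("á" takes two bytes)
```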

I see that other languages take the byte length directly:

https://github.com/kostya/benchmarks/blob/master/base64/test.cr [Crystal language]

str2 = Base64.strict_encode(str)
s += str2.bytesize

[I am not sure how this is defined in Crystal, or how it should be defined; does it return an ASCII-only string?]

This seemed the obvious change:
  str2 = String(encode(Base64, str))
  s += length(str2)

This solved the speed problem (at least B.):
  str2 = encode(Base64, str)
  s += length(str2)

[This is a slight semantic difference if you were to print str2, since it is now a byte vector rather than a string. That never happens in the benchmark.]
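An alternative that keeps a string result but still avoids the scan is to ask for the byte size instead of the character count. The sketch below is not the benchmark code and does not use Codecs.jl: it assumes the Base64 standard library of Julia 0.7 and later, and encoded_bytes is a hypothetical helper name.

```julia
using Base64  # standard library (Julia 0.7+); an assumption, not Codecs.jl

# Hypothetical scaled-down version of the benchmark's encode loop.
function encoded_bytes(str::AbstractString, tries::Integer)
    total = 0
    for _ in 1:tries
        encoded = base64encode(str)   # returns a String
        total += sizeof(encoded)      # byte count, no UTF-8 scan
    end
    return total
end

total = encoded_bytes("a"^1000, 10)   # 1000 bytes encode to 1336 characters
```

Since Base64 output is pure ASCII, sizeof and length give the same number here; only the cost differs.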

This is in line with the package's sample code:

using Codecs

data = "Hello World!"
encoded = encode(Base64, encode(Zlib, data))

[that example is broken, however; it gives an error]

In general, what encode returns is a Vector{UInt8}, as it should be, e.g. for Zlib. Is that also definitely intended for Base64? Or should Base64 return UTF-8 strings that happen to be pure ASCII? This may be by design. And what is the appropriate return type for decode?
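For comparison, one possible answer to the return-type question is the convention used by the Base64 standard library in current Julia (0.7 and later), sketched here. This is not how Codecs.jl behaves; it only shows one design where encoding yields a string and decoding yields bytes.

```julia
using Base64  # standard library (Julia 0.7+); an assumption, not Codecs.jl

data = "Hello World!"

encoded = base64encode(data)     # a String containing only ASCII characters
decoded = base64decode(encoded)  # a Vector{UInt8}, since the payload may be binary

# Converting back to text is an explicit, separate step.
# `copy` is used because String(bytes) may take ownership of the vector.
roundtrip = String(copy(decoded))
```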

Are these lines in the package definitely correct? Do they work for all string types?

function encode{T <: Codec}(codec::Type{T}, s::AbstractString)
    encode(codec, convert(Vector{UInt8}, s))
end

function decode{T <: Codec}(codec::Type{T}, s::AbstractString)
    decode(codec, convert(Vector{UInt8}, s))
end
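As a point of comparison for the "all string types" question, here is a minimal sketch of a string-to-bytes conversion written for current Julia (0.7 and later, where codeunits exists). The helper name to_bytes is hypothetical and is not part of Codecs.jl.

```julia
# Hypothetical helper: normalize any AbstractString to a String first,
# then copy out its UTF-8 code units as a Vector{UInt8}.
to_bytes(s::AbstractString) = collect(codeunits(String(s)))

plain_bytes = to_bytes("abc")                    # a plain String
sub_bytes   = to_bytes(SubString("xabcx", 2, 4)) # a SubString view of "abc"
wide_bytes  = to_bytes("á")                      # non-ASCII: two UTF-8 bytes
```

Going through String(s) means the result is always the UTF-8 encoding, whatever the concrete string type was.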

julia> @profile x = @timed main(100)
encode: 1333333600, 4.1400511264801025
decode: 1000000000, 2.7664570808410645
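The two totals printed above can be checked with a little arithmetic. The sketch assumes an input of 10,000,000 bytes per iteration, which is inferred from the totals and not stated in this post; base64_length is a hypothetical helper name.

```julia
# Base64 turns every group of 3 input bytes (rounded up) into 4 output characters.
base64_length(nbytes::Integer) = 4 * cld(nbytes, 3)

nbytes = 10_000_000   # assumed input size per iteration (inferred, not stated)
tries  = 100          # the argument passed to main

encode_total = base64_length(nbytes) * tries   # matches the "encode" line
decode_total = nbytes * tries                  # matches the "decode" line
```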

julia> Profile.print()
6987 ./event.jl:68; (::Base.REPL.##3#4{Base.REPL.REPLBackend})()
 6987 ./REPL.jl:95; macro expansion
  6987 ./REPL.jl:64; eval_user_input(::Any, ::Base.REPL.REPLBackend)
   6987 ./boot.jl:234; eval(::Module, ::Any)
    6987 ./<missing>:?; anonymous
     6987 ./profile.jl:16; macro expansion;
      6987 ./util.jl:278; macro expansion;
       88   ./REPL[17]:3; main(::Int64)
        1  ./strings/types.jl:172; repeat(::String, ::Int64)
        84 ./strings/types.jl:173; repeat(::String, ::Int64)
         31 ./array.jl:0; copy!(::Array{UInt8,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64)
         3  ./array.jl:60; copy!(::Array{UInt8,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64)
         5  ./array.jl:62; copy!(::Array{UInt8,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64)
         33 ./array.jl:65; copy!(::Array{UInt8,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64)
          7  ./array.jl:0; unsafe_copy!(::Array{UInt8,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64)
          19 ./array.jl:51; unsafe_copy!(::Array{UInt8,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64)
           1  ./abstractarray.jl:737; pointer
           18 ./array.jl:44; unsafe_copy!
          1  ./array.jl:56; unsafe_copy!(::Array{UInt8,1}, ::Int64, ::Array{UInt8,1}, ::Int64, ::Int64)
       2206 ./REPL[17]:10; main(::Int64)
        16  /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:60; encode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        264 /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:63; encode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        416 /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:64; encode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        270 /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:65; encode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        313 /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:66; encode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        608 /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:67; encode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        319 /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:68; encode(::Type{Codecs.Base64}, ::Array{UInt8,1})
       1929 ./REPL[17]:11; main(::Int64)
        4    ./strings/string.jl:48; length(::String)
        1925 ./strings/string.jl:49; length(::String)
       2764 ./REPL[17]:19; main(::Int64)
        1    ./strings/string.jl:48; length(::String)
        1438 ./strings/string.jl:49; length(::String)
        9    /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:106; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        121  /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:109; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        121  /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:111; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        16   /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:112; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        94   /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:113; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        17   /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:114; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        223  /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:115; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        326  /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:126; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        310  /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:127; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})
        88   /home/qwerty/.julia/v0.5/Codecs/src/Codecs.jl:128; decode(::Type{Codecs.Base64}, ::Array{UInt8,1})