@profile gives Segmentation fault in 0.4.0-rc3

Deniz Yuret
I am testing a large package written on top of JuliaGPU, and I get frequent segfaults with the profiler.  Unfortunately there is no information or error message to tell me what is causing them.  I tried profiling each part of the offending program separately, but when I go looking for them the segfaults disappear.  I am stumped.  Any advice on how to debug this?

thanks,
deniz

Re: @profile gives Segmentation fault in 0.4.0-rc3

Tim Holy
I haven't done this in ages myself, but when I last tried I also saw segfaults
when combining the profiler and CUDArt. My situation was complicated (yours
might be too), in that it involved multiple julia processes, timed waits, etc.
Because of that complexity, I did not make any progress in isolating the
problem.

--Tim

On Saturday, October 03, 2015 12:07:47 PM Deniz Yuret wrote:
> I am testing a large package written on top of JuliaGPU, and I get frequent
> segfaults with the profiler.  Unfortunately there is no information or
> error message to tell me what is causing it.  I tried profiling each part
> of the offending program separately, but when I look for it the segfaults
> disappear.  I am stumped.  Any advice on how to debug this?
>
> thanks,
> deniz

Re: @profile gives Segmentation fault in 0.4.0-rc3

Yichao Yu
In reply to this post by Deniz Yuret

Is there any self-contained example (possibly using other registered
packages) that triggers the segfault?

Re: @profile gives Segmentation fault in 0.4.0-rc3

Deniz Yuret
I could not generate a small example that consistently produces a segfault (yet).  While trying, I noticed that the segfault always occurs during profiling but does not occur in the same place in the program.  That brings some bad interaction with gc to mind.  I don't know what else is non-deterministic in Julia.  I also ran julia-debug under gdb, but the segfaults appear all over the place: https://gist.github.com/denizyuret has the first 7 examples, each produced by typing the exact same commands into a fresh julia session.

Of course, these are probably not where the offending instruction is, but rather the point at which the OS finally notices something is off.  I remember using tools like Electric Fence to trace such crashes to the offending instruction a long time ago.  I am not sure what the modern tools are or what works with Julia.  Let me know if I can provide anything else to help chase this bug.
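
Since the crash is intermittent, it may help to measure how often it happens in fresh processes, so that any later change can be compared against a baseline.  What follows is only a minimal sketch, assuming a POSIX shell such as bash or dash: count_segfaults is a hypothetical helper and repro.jl is a placeholder script name, neither of which comes from this thread.  It runs a command a given number of times and counts the runs that were killed by SIGSEGV, which these shells report as exit status 139 (128 + signal 11).

```shell
# Hypothetical helper: run a command N times and print how many runs
# were killed by SIGSEGV.
# Usage: count_segfaults N command [args...]
count_segfaults() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        # Discard the command's output; keep only its exit status.
        "$@" >/dev/null 2>&1 || status=$?
        # bash and dash report death by signal as 128 + signal number;
        # SIGSEGV is signal 11 on Linux, hence 139.
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}
```

For example, count_segfaults 20 julia repro.jl would print the number of crashing runs out of 20, where repro.jl stands for whatever script contains the commands being typed into the fresh session.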

thanks,
deniz





Re: @profile gives Segmentation fault in 0.4.0-rc3

Jameson Nash
In reply to this post by Yichao Yu
some aspects of the backtrace make me think you might not be using llvm-3.3. can you add the output of versioninfo()? libunwind won't work on newer versions of llvm currently due to https://github.com/JuliaLang/julia/issues/12060, https://github.com/JuliaLang/julia/pull/12380, and https://github.com/JuliaLang/julia/blob/f67f21398754724589cda779c0429ea9fda4b47d/src/codegen.cpp#L5967



Re: @profile gives Segmentation fault in 0.4.0-rc3

Deniz Yuret
Here is the versioninfo:

julia> versioninfo()
Julia Version 0.4.0-rc3
Commit 483d548* (2015-09-27 20:34 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

This was the v0.4.0-rc3 image I downloaded from the download page.  It does not look like llvm is dynamically linked; is this version built into the binary?

deniz



Re: @profile gives Segmentation fault in 0.4.0-rc3

Deniz Yuret
I tried valgrind and the output is at https://gist.github.com/denizyuret/d365d1215efac5e62348

The command I used was: valgrind --leak-check=yes julia-debug.

Profiling starts at line 119 of the output.  There seems to be some trouble at lines 164, 248, etc.


Re: @profile gives Segmentation fault in 0.4.0-rc3

Jameson Nash
there's only one valgrind report that looks bad to me; the rest are essentially nuisance messages from libunwind (msync_validate):

==15636== Invalid read of size 8
==15636==    at 0x58DF33D: access_mem (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/libjulia-debug.so)
==15636==    by 0x58DD3FF: is_plt_entry (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/libjulia-debug.so)
==15636==    by 0x58DD59B: _ULx86_64_step (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/libjulia-debug.so)
==15636==    by 0x4FAF6CE: rec_backtrace_ctx (task.c:661)
==15636==    by 0x4FAF5F9: rec_backtrace (task.c:645)
==15636==    by 0x4FC7030: profile_bt (signals-linux.c:17)
==15636==    by 0x63F770F: ??? (in /lib64/libpthread-2.12.so)
==15636==    by 0xD70B14F: ??? (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/sys-debug.so)
==15636==    by 0x10FEFFE2BF: ???
==15636==    by 0xD77AC1F: jlcall_finalizer_2641 (in /auto/nlg-05/dy_052/julia/v0.4.0-rc3/lib/julia/sys-debug.so)
==15636==    by 0x4F005C1: jl_apply (julia.h:1324)
==15636==    by 0x4F067EB: jl_apply_generic (gf.c:1684)
==15636==  Address 0x10feffe2c0 is not stack'd, malloc'd or (recently) free'd

it looks like this may be https://github.com/JuliaLang/julia/pull/12380 after all.




Re: @profile gives Segmentation fault in 0.4.0-rc3

Yichao Yu
In reply to this post by Deniz Yuret

The output you posted doesn't seem to include the actual code that is run.


Re: @profile gives Segmentation fault in 0.4.0-rc3

Deniz Yuret-2
I could not find a nice small standalone example, but if you don't mind installing some stuff, here are the instructions:

Pkg.init()
Pkg.clone("git://github.com/denizyuret/Knet.jl.git")
Pkg.build("Knet")
include(Pkg.dir("Knet/examples/linreg.jl"))
@time linreg()
@profile linreg()
