PkgEval auto-issues: on degradation or wait-a-day


PkgEval auto-issues: on degradation or wait-a-day

Iain Dunning
Hi all,

Some of the "issues" caught by PackageEvaluator tend to be temporary/spurious, e.g. a test had numerical convergence issues, or a binary failed to download. Other times it's actually an issue in one key package that disables many packages, and perhaps rarest of all are massive, widespread problems (e.g. the equality meaning change).

Now, the last case seems to suggest that filing issues ASAP would be best, but in all other cases it seems like waiting to make sure the tests fail two days in a row would drastically cut down on the issue count.

I've only really talked to Tim Holy about this - what does everyone else think I should do: wait a day before filing, or file immediately?

(related note: I'm going to change the badge system so it's more of a moving average of test statuses - this avoids packages wearing an ugly "test failing" badge for a full 24 hours through possibly no fault of their own, with the problem potentially already fixed. E.g. 3/5 full passes in the last 5 days = green badge, 3/5 test fails = orange, 3/5 fail to load = red...)
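A minimal sketch of that rolling-window badge logic (the status strings, colors, and majority rule here are illustrative, not PackageEvaluator's actual encoding):

```python
from collections import Counter

def badge_color(statuses):
    """Pick a badge color from the last few daily test statuses.

    `statuses` lists outcomes newest-last, e.g. "pass", "test_fail",
    "load_fail"; the majority outcome over the 5-day window wins.
    """
    window = statuses[-5:]  # only the last 5 days count
    outcome, _ = Counter(window).most_common(1)[0]
    return {"pass": "green", "test_fail": "orange", "load_fail": "red"}[outcome]
```

So a single bad day in an otherwise-passing week keeps the badge green, while a sustained failure flips it within a few days.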

Re: PkgEval auto-issues: on degradation or wait-a-day

Elliot Saba
Perhaps you can get the best of both worlds: if a package fails, you could just queue up another run of that package immediately (maybe at the end of all the already-queued runs, to give network problems a chance to reset themselves), and if it fails a second time, you've got a good idea that something is awry.
-E
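The retry-then-file scheme could be sketched like this (function names and the queue shape are hypothetical, not PackageEvaluator's real machinery):

```python
def daily_run(packages, run_tests, file_issue):
    """Run every package once; requeue failures behind all the
    first-run jobs so transient network problems have time to clear,
    and only file an issue when the retry also fails."""
    retry_queue = []
    for pkg in packages:
        if not run_tests(pkg):
            retry_queue.append(pkg)
    for pkg in retry_queue:  # second chance after all first runs finish
        if not run_tests(pkg):
            file_issue(pkg)
```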

Re: PkgEval auto-issues: on degradation or wait-a-day

Leah Hanson
I like Elliot's idea. You could also decrease re-runs by only giving packages a second chance if they are changing status (passing tests => failing tests, for example). Packages that were failing yesterday probably don't need a second chance today.

-- Leah
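That degradation check could be as simple as (status strings hypothetical):

```python
def needs_second_chance(yesterday, today):
    """Only re-run a package whose status just got worse; a package
    that was already failing yesterday keeps today's result as-is."""
    return today != "pass" and yesterday == "pass"
```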



Re: PkgEval auto-issues: on degradation or wait-a-day

Elliot Saba
I thought of that as well, but I don't know how hard it is for PackageEvaluator to get the previous state. :P
-E




Re: PkgEval auto-issues: on degradation or wait-a-day

Tim Holy
In reply to this post by Elliot Saba
I made the same suggestion. Overall I think it's the way to go. In the meantime, I did notice one category of spurious failure that this doesn't catch: suppose your package tests depend on some outside resource, like downloading a dataset, and the server is offline at the time PackageEval runs.

However, that's enough of a corner case that I favor the bug-me-earlier-as-long-as-you've-double-checked approach.

--Tim



Re: PkgEval auto-issues: on degradation or wait-a-day

Elliot Saba
We could also conceivably use that as an impetus for mirroring whatever that particular test requires.

Caching big dependencies like that would be neat as well, but eventually we'll just end up throwing the kitchen sink in. :P
-E





Re: PkgEval auto-issues: on degradation or wait-a-day

Simon Kornblith
In reply to this post by Iain Dunning

Many spurious test failures seem worth filing issues for even if the package passes on a second run. PkgEvaluator once caught a bug in LARS where I was returning uninitialized memory (fortunately only affecting a value that was always 0), and I worry that giving the package a second try might make PkgEvaluator less likely to catch similar issues. If there are tests that spuriously fail due to convergence issues, it would be better to use the same input (or call srand). As Tim notes, download failures are likely to persist through a second run if it's actually a server problem, and if they get to the point where closing the issues becomes annoying, the binary should probably be moved elsewhere anyhow.
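The srand point, sketched in Python for illustration (a Julia test would call `srand(seed)` at the top of runtests.jl to the same effect; the tolerance and data here are made up):

```python
import random

def convergence_test(seed=1234):
    """Seed the RNG before generating test data so a stochastic
    convergence check sees the same input on every run, turning a
    flaky test into a deterministic one."""
    random.seed(seed)
    data = [random.gauss(0, 1) for _ in range(100)]
    sample_mean = sum(data) / len(data)
    return abs(sample_mean) < 0.5  # loose tolerance, but reproducible now
```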

If it's not too hard, it might be nice if PkgEvaluator waited a day or two to file an issue with a package if one of its dependencies ceases to load at all, since the problem is almost certainly elsewhere and there's no need to call the maintainer's attention to it unless the dependent package remains broken.

Simon

Re: PkgEval auto-issues: on degradation or wait-a-day

Iain Dunning
That's a pretty good idea, Simon - I actually recently put together code to analyze the dependency tree of a package, so I can at least avoid things like the mass mailout that happened with the deprecation removal (e.g. DataFrames fails to load -> don't file an issue for any package dependent on DataFrames).
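A sketch of that dependency-aware filtering (the graph representation is assumed here, not PackageEvaluator's actual one):

```python
def issue_targets(failing, deps):
    """Given the set of failing packages and a map of
    package -> direct dependencies, keep only the packages with no
    failing (transitive) dependency: the likely root causes.
    E.g. if DataFrames fails to load, don't also file against
    everything downstream of DataFrames."""
    def broken_upstream(pkg, seen):
        for dep in deps.get(pkg, ()):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in failing or broken_upstream(dep, seen):
                return True
        return False
    return {p for p in failing if not broken_upstream(p, set())}
```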
