librelist archives

Retrying things that have failed

From:
Robert Slowley
Date:
2009-11-05 @ 16:10
I'd quite like to be able to put things in a queue, then if they fail,
resubmit them - up to a certain number of times. I'm thinking of
having a producer make a number of jobs (let's say 1.5 million
[thinking of a task I have at hand]) where some of these jobs will
fail, but if rerun they might succeed. Some jobs will permanently fail
so it is best to only resubmit them a few times (say 3 or 4).

Is the best way to do this to have a times_submitted parameter in
self.perform(), have the producer query the failed list, increment
times_submitted, and re-enqueue the job if times_submitted is less than
some number? Or is there a better way of persisting this information /
doing this?
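
A minimal sketch of the producer-side step described above, assuming the
failed list and the queue are plain Ruby arrays of [args, times_submitted]
pairs. The names resubmit_failed and MAX_SUBMISSIONS are hypothetical and
are not part of Resque:

```ruby
# Hypothetical cap on how many times a job may be submitted in total.
MAX_SUBMISSIONS = 3

# Drain the failed list. Jobs that still have attempts left go back on the
# queue with an incremented counter; the rest are returned as given up.
# Both `failed` and `queue` are plain arrays standing in for real queues.
def resubmit_failed(failed, queue, max = MAX_SUBMISSIONS)
  given_up = []
  until failed.empty?
    args, times_submitted = failed.shift
    if times_submitted < max
      queue << [args, times_submitted + 1]
    else
      given_up << args
    end
  end
  given_up
end
```

The counter travels with the job's arguments here, so nothing extra has to
be persisted, at the cost of every job carrying the retry bookkeeping.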

-Rob

-- 
http://www.slowley.com/
http://robhu.livejournal.com
http://robhu_bible.livejournal.com

"u r all Ceiling Catz kittenz. Sun shines on good kittehz and bad.
Also rain :-(" -- Matthew 5:45 (LOL)



Sent from Cambridge, Eng, United Kingdom

Re: Retrying things that have failed

From:
Chris Wanstrath
Date:
2009-11-05 @ 20:13
On Thu, Nov 5, 2009 at 8:10 AM, Robert Slowley <robert@slowley.com> wrote:

> I'd quite like to be able to put things in a queue, then if they fail,
> resubmit them - up to a certain number of times. I'm thinking of
> having a producer make a number of jobs (let's say 1.5 million
> [thinking of a task I have at hand]) where some of these jobs will
> fail, but if rerun they might succeed. Some jobs will permanently fail
> so it is best to only resubmit them a few times (say 3 or 4).
>
> Is the best way to do this to have a times_submitted parameter in
> self.perform(), have the producer query the failed list, increment
> times_submitted, and re-enqueue the job if times_submitted is less than
> some number? Or is there a better way of persisting this information /
> doing this?

bpo has a branch that adds auto_retry, which you might want to check out:

http://github.com/bpo/resque/tree/auto_retry

His branch adds a scheduler so jobs aren't run immediately. If I were
writing it I'd just make a simple Failure backend, similar to what he's
done, which keeps track of each class+args combo and how many times it
has seen it. So basically what you're saying, but I'd keep the retry
logic in its own class instead of the job.
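
As a rough sketch of that idea (not bpo's code and not a real Resque
Failure backend), the counting could live in a small standalone class.
RetryTracker, record_failure and max_retries are hypothetical names, and
an in-memory Hash stands in for whatever store would actually be used:

```ruby
require 'json'
require 'digest/sha1'

# Hypothetical helper: counts failures per class+args combo and decides
# whether a failed job should be re-enqueued.
class RetryTracker
  def initialize(max_retries: 3, store: {})
    @max_retries = max_retries
    @store = store # in-memory stand-in for a persistent store
  end

  # Stable key for one class+args combination.
  def key_for(klass, args)
    Digest::SHA1.hexdigest(JSON.generate([klass.to_s, args]))
  end

  # Record one failure. Returns true if the job should be retried.
  def record_failure(klass, args)
    key = key_for(klass, args)
    @store[key] = @store.fetch(key, 0) + 1
    return true if @store[key] <= @max_retries

    @store.delete(key) # giving up: drop the counter so it does not linger
    false
  end

  # Number of failures currently recorded for this combo.
  def failures(klass, args)
    @store.fetch(key_for(klass, args), 0)
  end
end
```

A failure handler would call record_failure(klass, args) and re-enqueue
only when it returns true. The job itself never sees the counter.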

I would just be careful about putting data into Redis that might get
stuck there if the workers crash.

-- 
Chris Wanstrath
http://github.com/defunkt