librelist archives

restarting worker when there is long running process, can resque handle this?

From:
Reynard
Date:
2010-04-09 @ 21:34
Hi,

It looks like if I stop the worker when it's working on a (long) job, the
job gets abandoned and lost. Shouldn't it either let the child finish
working on a job or re-queue the job?

I think I'll just explain the use case: we have some long-running jobs (a job
might take more than an hour). When we deploy new code we need to restart the
workers (to load the new code, right?), but that might kill the running
process and leave the job in an inconsistent state; worse, we might lose the
job entirely (we cannot tell the new worker to restart it). Can resque handle
this case more gracefully?

Sorry if this has been addressed on the list before. I couldn't find a way
to search the list, and browsing takes too long with the librelist
interface. Why not just use a Google Group, btw?

thanks,
-r

Re: [resque] restarting worker when there is long running process, can resque handle this?

From:
Tony Arcieri
Date:
2010-04-09 @ 21:43
On Fri, Apr 9, 2010 at 3:34 PM, Reynard <reynard.list@gmail.com> wrote:

> Hi,
>
> It looks like if I stop the worker when it's working on a (long) job, the
> job gets abandoned and lost. Shouldn't it either let the child finish
> working on a job or re-queue the job?
>

I'm sure everyone has a different opinion on this.  I'm extremely
opinionated myself...

The workers should never manage the state of the jobs.  You should always
assume workers can evaporate into the miasma at any time for whatever
reason.  Power supplies burn out.  Hardware fails.  Whatever system you
deploy, it must deal with these cases.

The latest release of Resque tracks job state outside the workers.  That's
one option.  I prefer having command-and-control processes which track the
lifetime of your "workflows" (where a workflow consists of one or more jobs
that must execute in order) and retry jobs when they fail.  This requires
that your jobs be idempotent.
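An idempotent job can be as simple as checking a completion marker before
doing the work. A minimal sketch of the idea (the job name and the in-process
Set are made up for illustration; a real app would keep the marker in an
external store such as Redis):

```ruby
require 'set'

COMPLETED = Set.new # stand-in for an external store such as Redis

class ThumbnailJob
  @queue = :images

  # Safe to retry: if the job already ran to completion, perform is a no-op,
  # so a command-and-control process can re-enqueue it after any failure.
  def self.perform(image_id)
    return if COMPLETED.include?(image_id)
    # ... expensive work goes here ...
    COMPLETED.add(image_id)
  end
end
```

Running the job twice with the same argument does the expensive work only
once, which is exactly what makes blind retries safe.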

-- 
Tony Arcieri
Medioh! A Kudelski Brand

Re: [resque] restarting worker when there is long running process, can resque handle this?

From:
Reynard
Date:
2010-04-10 @ 05:31
> It looks like if I stop the worker when it's working on a (long) job, the
>> job gets abandoned and lost. Shouldn't it either let the child finish
>> working on a job or re-queue the job?
>>
>
> I'm sure everyone has a different opinion on this.  I'm extremely
> opinionated myself...
>
> The workers should never manage the state of the jobs.  You should always
> assume workers can evaporate into the miasma at any time for whatever
> reason.  Power supplies burn out.  Hardware fails.  Whatever system you
> deploy, it must deal with these cases.
>
>
Hi Tony, thanks for the reply. So, if I understand what you mean, I should
expect that anything can go wrong with the worker, and handle it in the
worker (log the error, or requeue the job?).

Although in my case my main concern is just this: when I restart the workers
while they're working on some jobs, wouldn't it be nice if they restarted
gracefully (let the child process finish the job, then restart immediately)?
Right now I'm just testing locally, so I hit Ctrl+C to stop the worker, which
seems to kill the child process. Maybe I'm just missing how to do a proper
restart?

- reynard

Re: [resque] restarting worker when there is long running process, can resque handle this?

From:
Reynard
Date:
2010-04-10 @ 06:09
Ha! I overlooked the documentation; it looks like I can send a QUIT signal
to make the worker wait for the child to finish and then quit.
So I guess the deploy script just needs to send QUIT to the existing workers
and then start the new workers? Anyone care to share a capistrano recipe for
restarting the workers? :)

- reynard
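One possible shape for such a recipe (a hypothetical Capistrano 2-style
sketch, not a tested deploy script; the pid-file path, log path, and queue
list are placeholder assumptions to adapt to your setup):

```ruby
# Hypothetical Capistrano 2 recipe for a graceful resque restart.
namespace :resque do
  desc "Gracefully restart the resque worker (QUIT old, start new)"
  task :restart, :roles => :app do
    # QUIT lets the worker finish its current job before exiting
    run "kill -QUIT `cat #{shared_path}/pids/resque_worker.pid` || true"
    run "cd #{current_path} && RAILS_ENV=production QUEUE='*' " \
        "PIDFILE=#{shared_path}/pids/resque_worker.pid " \
        "nohup rake resque:work >> #{shared_path}/log/resque.log 2>&1 &"
  end
end

after "deploy:restart", "resque:restart"
```

Note that the old worker may keep running until its current job finishes, so
old and new workers briefly overlap; that overlap is the point of a graceful
restart.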


Re: [resque] restarting worker when there is long running process, can resque handle this?

From:
Chris Wanstrath
Date:
2010-04-10 @ 18:09
Yep, QUIT is what you want. If you're using god you can send a signal
to a group with ease:

$ god signal resque QUIT

Chris

-- 
Chris Wanstrath
http://github.com/defunkt
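For reference, the graceful part of QUIT can be pictured as a worker loop
that only checks the shutdown flag between jobs. A toy sketch (heavily
simplified; the real resque worker forks a child per job and handles several
other signals):

```ruby
# Toy sketch of graceful shutdown: the worker finishes the job in hand
# and only exits at the next loop boundary, so no job is killed mid-run.
$shutdown = false
trap('QUIT') { $shutdown = true }

def work_loop(jobs, done)
  jobs.each do |job|
    done << job        # "process" the job to completion
    break if $shutdown # exit only between jobs, never mid-job
  end
end
```

A TERM or INT, by contrast, is the "stop right now" path, which is why
Ctrl+C appears to kill the child.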

Re: [resque] restarting worker when there is long running process, can resque handle this?

From:
Reynard
Date:
2010-04-12 @ 14:05
Thanks for the tips, Chris.
- reynard
