librelist archives

Workers dying

From: Mason Jones
Date: 2010-05-22 @ 01:06

Okay, finally got a little bit of info tonight, when all four of my
workers died. Interestingly, I see this at the end of their output:

> rake aborted!
> SIGHUP
>
> (See full trace by running task with --trace)

I see the same thing for all of them. So...fair enough, they got a
SIGHUP, they quit, which would be expected. So now I'm left wondering
where the hell the SIGHUP is coming from. They were running in the
background, started with:

VERBOSE=1 QUEUE=load_data RAILS_ENV=production rake environment resque:work > log/resque1.log 2>&1 &

I'm going to go back to running them in the foreground via 'screen'
now, and I have a suspicion that I won't see this happen. I actually
started these in the background exactly in order to see if they would
die again. Running via screen is sort of okay, but is there an easy
way then to use monit/god to start new ones if one dies?

And...if anyone has great ideas of how to figure out the source of the
SIGHUP (this is on a Debian EC2 instance), I'd love to hear 'em. I'm
99% certain it's not the oom-killer. Very odd.

Thanks, all.

Re: Workers dying

From: Mason Jones
Date: 2010-05-22 @ 01:38

I'm going to perhaps answer some of my own question here, after a
moment's thought, because I might have simply made a stupid assumption
-- do the Resque workers not run as daemons? Does the rake task then
need to be run via nohup? I thought not, but the SIGHUP seems to
indicate so. However, I'm fairly certain that I've had workers started
with a simple '&' run for quite some time after their ssh session was
closed... Curious.
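
To make the nohup question concrete, here is a minimal sketch of the difference, not a statement about how Resque itself handles signals. It assumes a POSIX sh, and `sleep 30` is only a hypothetical stand-in for the real worker command. One copy is started with a plain '&', one under nohup, and both are then sent a SIGHUP by hand:

```shell
#!/bin/sh
# `sleep 30` is a placeholder (an assumption for illustration) for:
#   VERBOSE=1 QUEUE=load_data RAILS_ENV=production rake environment resque:work

# 1. Plain background job: keeps the default SIGHUP action (terminate).
sleep 30 &
plain_pid=$!

# 2. Same job under nohup: it is started with SIGHUP ignored.
nohup sleep 30 > /dev/null 2>&1 &
nohup_pid=$!

sleep 1                                # give nohup time to exec its command
kill -HUP "$plain_pid" "$nohup_pid"    # simulate the hangup by hand

# wait reports 128 + the signal number when the job was killed by a signal.
plain_status=0
wait "$plain_pid" || plain_status=$?

# kill -0 delivers nothing; it only checks that the process still exists.
if kill -0 "$nohup_pid" 2>/dev/null; then nohup_alive=yes; else nohup_alive=no; fi

echo "plain exit status: $plain_status, nohup job alive: $nohup_alive"

kill "$nohup_pid" 2>/dev/null          # clean up the survivor
```

If the plain job dies and the nohup one survives, the same wrapper should presumably work for the worker itself, along the lines of:

VERBOSE=1 QUEUE=load_data RAILS_ENV=production nohup rake environment resque:work > log/resque1.log 2>&1 &

This only stops the worker from dying on a SIGHUP. It does not explain where the SIGHUP comes from.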

