librelist archives

processing multiple jobs per fork

From:
Mick Staugaard
Date:
2010-03-30 @ 05:43
We (at Zendesk) are processing a very high number of Resque jobs, and most
of them are pretty fast.

We are now processing jobs at a rate where forking and our after_fork
block is becoming too time consuming, and have therefore started running
some of our workers with forking turned off. We realize that this is not a
viable solution, as even the smallest memory leak would grow to be a
problem really fast.

I've created an addition to Resque that allows you to specify that each
fork should process more than one job. You could start your worker like
this:

QUEUE=* JOBS_PER_FORK=100 rake resque:work

This would mean that each fork will process 100 jobs before terminating itself.

I've also added a new before_child_exit hook that allows you to execute a
block of code right before a fork terminates. We are going to use this
hook to submit our job instrumentation data to New Relic just before each
fork dies.
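
The behaviour can be sketched in plain Ruby without Resque at all (the
queue, worker loop, and hook below are simplified stand-ins for the
patch's actual JOBS_PER_FORK handling and before_child_exit hook):

```ruby
# Plain-Ruby sketch of the idea: each forked child processes a batch of
# up to jobs_per_fork jobs, and the before_child_exit hook fires once
# per child, not once per job.
def run_worker(queue, jobs_per_fork, &before_child_exit)
  log = []
  until queue.empty?
    batch = queue.shift(jobs_per_fork)            # jobs this "child" runs
    batch.each { |job| log << [:perform, job] }   # the actual work
    before_child_exit&.call(batch.size)           # fires once per child
    log << [:child_exit, batch.size]
  end
  log
end
```

With a queue of 7 jobs and jobs_per_fork of 3 you get two full batches
and one partial one, so the exit hook fires three times instead of seven.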

http://github.com/staugaard/resque/commit/d88f02d11ec6c5b49349954d6503459a8656dc62

Any thoughts on this?

Thanks,
Mick

Re: [resque] processing multiple jobs per fork

From:
Chris Wanstrath
Date:
2010-03-30 @ 18:11
On Mon, Mar 29, 2010 at 10:43 PM, Mick Staugaard <mick@zendesk.com> wrote:

> We are now processing jobs at a rate, where forking and our after_fork 
block is becoming too time consuming, and have therefore started running 
some of our workers with forking turned off. We realize that this is not a
viable solution, as even the smallest memory leak would grow to be a 
problem really fast.
>
> I've created an addition to resque, that allows you to specify that each
fork should process more than one job. You could start your worker like 
this:
>
> QUEUE=* JOBS_PER_FORK=100 rake resque:work
>
> Which would mean that each fork will process 100 jobs before terminating
itself.
>
> I've also added a new before_child_exit hook, that allows you to execute
a block of code right before a fork terminates. We are going to use this 
hook to submit our job instrumentation data to new relic just before each 
fork dies.
>
> 
http://github.com/staugaard/resque/commit/d88f02d11ec6c5b49349954d6503459a8656dc62

Have you thought about implementing this as a plugin?

http://gist.github.com/349376

Chris

Re: [resque] processing multiple jobs per fork

From:
Mick Staugaard
Date:
2010-03-30 @ 18:33
Yes, I thought about that, but I really think that it should be a "native"
feature of Resque, as it will speed up a lot of people's job processing.
Also, the required signal handling (that you mentioned in the GitHub
comments) would just be too hacky in a plugin.

My guess is that many people have jobs that depend on the Rails
environment, and all these jobs require you to have an
ActiveRecord::Base.establish_connection in your after_fork hook, which
gets pretty expensive if you start having frequent jobs.
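
For context, the hook in question is typically registered like this (a
sketch assuming Resque and ActiveRecord are already loaded, e.g. in a
Rails initializer):

```ruby
# Every forked child re-establishes its own database connection before
# running its job; with one fork per job this runs once per job.
Resque.after_fork = proc do |job|
  ActiveRecord::Base.establish_connection
end
```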

It is also a really clean solution for stuff like New Relic
instrumentation, as sending data to New Relic after each job is just
going to kill the job workers and New Relic.

Pretty neat idea to implement it in an after_fork hook, though. It would
however prevent people from setting their own after_fork hook, since we
only support one of each hook.

You don't think it smells like a "native" Resque feature?

Mick

>> We are now processing jobs at a rate, where forking and our after_fork 
block is becoming too time consuming, and have therefore started running 
some of our workers with forking turned off. We realize that this is not a
viable solution, as even the smallest memory leak would grow to be a 
problem really fast.
>> 
>> I've created an addition to resque, that allows you to specify that 
each fork should process more than one job. You could start your worker 
like this:
>> 
>> QUEUE=* JOBS_PER_FORK=100 rake resque:work
>> 
>> Which would mean that each fork will process 100 jobs before 
terminating itself.
>> 
>> I've also added a new before_child_exit hook, that allows you to 
execute a block of code right before a fork terminates. We are going to 
use this hook to submit our job instrumentation data to new relic just 
before each fork dies.
>> 
>> 
http://github.com/staugaard/resque/commit/d88f02d11ec6c5b49349954d6503459a8656dc62
> 
> Have you thought about implementing this as a plugin?
> 
> http://gist.github.com/349376
> 
> Chris

Re: [resque] processing multiple jobs per fork

From:
Tony Arcieri
Date:
2010-04-01 @ 01:24
On Tue, Mar 30, 2010 at 12:33 PM, Mick Staugaard <mick@zendesk.com> wrote:

> My guess is that many people have jobs that depend on the rails
> environment, and all these jobs require you to have an
> ActiveRecord::Base.establish_connection in your after_fork hook, which gets
> pretty expensive if you start having frequent jobs.
>

People are really loading the entire Rails environment into their workers?
That seems... excessive.

-- 
Tony Arcieri
Medioh! A Kudelski Brand

Re: [resque] processing multiple jobs per fork

From:
Chris Wanstrath
Date:
2010-04-01 @ 06:39
On Wed, Mar 31, 2010 at 6:24 PM, Tony Arcieri <tony@medioh.com> wrote:

> People are really loading the entire Rails environment into their workers?
> That seems... excessive

Resque was designed so you could load your entire Rails environment
into a worker - that's pretty much why it forks.

-- 
Chris Wanstrath
http://github.com/defunkt

Re: [resque] processing multiple jobs per fork

From:
Jason Amster
Date:
2010-04-01 @ 14:03
I've been able to selectively load parts of the Rails environment so that
the memory footprint is down by 50%.  Essentially, I created a different
environment which mimics production and selectively chose which libraries,
initializers, plugins, and even models get loaded.  It's slightly more
difficult to maintain, but gives us much more memory for workers.  Most of
my jobs don't require many of the libraries that are needed elsewhere in my
app, but I think it's a case-by-case basis.

Jason


On Thu, Apr 1, 2010 at 2:39 AM, Chris Wanstrath <chris@ozmm.org> wrote:

> On Wed, Mar 31, 2010 at 6:24 PM, Tony Arcieri <tony@medioh.com> wrote:
>
> > People are really loading the entire Rails environment into their
> workers?
> > That seems... excessive
>
> Resque was designed so you could load your entire Rails environment
> into a worker - that's pretty much why it forks.
>
> --
> Chris Wanstrath
> http://github.com/defunkt
>

Re: [resque] processing multiple jobs per fork

From:
Mason Jones
Date:
2010-04-01 @ 06:29
On Wed, Mar 31, 2010 at 6:24 PM, Tony Arcieri <tony@medioh.com> wrote:
> On Tue, Mar 30, 2010 at 12:33 PM, Mick Staugaard <mick@zendesk.com> wrote:
>>
>> My guess is that many people have jobs that depend on the rails
>> environment, and all these jobs require you to have an
>> ActiveRecord::Base.establish_connection in your after_fork hook, which gets
>> pretty expensive if you start having frequent jobs.
>
> People are really loading the entire Rails environment into their workers?
> That seems... excessive

Yeah, I'm doing that right now, because one of my jobs involves
pulling data down from a web service API and using it to fill a bunch
of data using models that combine both ActiveRecord and Redis data.
Unless I want to duplicate a bunch of business logic, I need to use
the existing models, which use ActiveRecord, which...etc. Someday when
I can move to Rails 3 I can use ActiveRecord without loading up all of
Rails, but until then I haven't come up with a better idea. Thankfully
I don't need tons of workers so it's not too horrible.

But I do agree with you, philosophically.

Re: [resque] processing multiple jobs per fork

From:
Tony Arcieri
Date:
2010-04-01 @ 16:45
On Thu, Apr 1, 2010 at 12:29 AM, Mason Jones <masonoise@gmail.com> wrote:

> Unless I want to duplicate a bunch of business logic, I need to use
> the existing models, which use ActiveRecord, which...etc. Someday when
> I can move to Rails3 I can use ActiveRecord without loading up all of
> Rails, but until then I haven't come up with a better idea.
>

We use AR models as well, however they're packaged as a separate gem which
we share both across multiple Rails apps and in our background jobs.  It's
been possible to use AR separate from Rails for several years... it's not a
Rails 3-specific feature.

-- 
Tony Arcieri
Medioh! A Kudelski Brand

Re: [resque] processing multiple jobs per fork

From:
Mason Jones
Date:
2010-04-01 @ 16:51
On Thu, Apr 1, 2010 at 9:45 AM, Tony Arcieri <tony@medioh.com> wrote:
> On Thu, Apr 1, 2010 at 12:29 AM, Mason Jones <masonoise@gmail.com> wrote:
>>
>> Unless I want to duplicate a bunch of business logic, I need to use
>> the existing models, which use ActiveRecord, which...etc. Someday when
>> I can move to Rails3 I can use ActiveRecord without loading up all of
>> Rails, but until then I haven't come up with a better idea.
>
> We use AR models as well, however they're packaged as a separate gem which
> we share both across multiple Rails apps and in our background jobs.  It's
> been possible to use AR separate from Rails for several years... it's not a
> Rails 3-specific feature.

Yeah, that's true; and as Jason mentioned, we could work out what
pieces of the environment we need and package it all up. With Rails 3
it'll be much easier to break things apart, is what I meant. I guess
right now it's just that the added complexity of packaging and
deploying hasn't been worth it. Our app is very new and, of course,
that may change, but thus far it hasn't been worth the hassle.

Re: [resque] processing multiple jobs per fork

From:
Jason Amster
Date:
2010-04-01 @ 17:01
On Thu, Apr 1, 2010 at 12:51 PM, Mason Jones <masonoise@gmail.com> wrote:

> Yeah, that's true; and as Jason mentioned, we could work out what
> pieces of the environment we need and package it all up. With Rails 3
> it'll be much easier to break things apart, is what I meant. I guess
> right now it's just that the added complexity of packaging and
> deploying hasn't been worth it. Our app is very new and, of course,
> that may change, but thus far it hasn't been worth the hassle.
>

To lighten the footprint in my special resque/worker environment I just
remove the main Rails libraries I know my workers don't need, and specify
my plugins as well.  Additionally, I monkey-patched the Rails::Initializer
class to allow me to choose which initializers I want loaded too.

http://gist.github.com/352077

Then, when starting workers I just specify RAILS_ENV=production_workers

Re: [resque] processing multiple jobs per fork

From:
Chris Wanstrath
Date:
2010-04-01 @ 18:02
On Thu, Apr 1, 2010 at 9:45 AM, Tony Arcieri <tony@medioh.com> wrote:


> We use AR models as well, however they're packaged as a separate gem which
> we share both across multiple Rails apps and in our background jobs.  It's
> been possible to use AR separate from Rails for several years... it's not a
> Rails 3-specific feature.

You must not deploy that often ;)

-- 
Chris Wanstrath
http://github.com/defunkt

Re: [resque] processing multiple jobs per fork

From:
Tony Arcieri
Date:
2010-04-01 @ 19:42
On Thu, Apr 1, 2010 at 12:02 PM, Chris Wanstrath <chris@ozmm.org> wrote:

>  You must not deploy that often ;)
>

We deploy frequently, and bundle gems upon deployment.  That's a blessing
and a curse, I suppose, but it's the same thing we'd do if we were running
our background jobs from the Rails environment *shrug*

-- 
Tony Arcieri
Medioh! A Kudelski Brand

Re: [resque] processing multiple jobs per fork

From:
Chris Wanstrath
Date:
2010-03-30 @ 19:12
On Tue, Mar 30, 2010 at 11:33 AM, Mick Staugaard <mick@zendesk.com> wrote:

> Yes, I thought about that, but I really think that it should be a 
"native" feature of Resque, as it will speed up a lot of people's job 
processing. Also the required signal handling (that you mentioned in the 
github comments) would just be too hacky in a plugin.

One of the main reasons Resque uses forking is to avoid having
children manage their own signals - children do work, parents handle
signals. Whether in a hook or worker.rb, it's something to be avoided
because it's just not reliable.

> My guess is that many people have jobs that depend on the rails 
environment, and all these jobs require you to have an
ActiveRecord::Base.establish_connection in your after_fork hook, which 
gets pretty expensive if you start having frequent jobs.

This is certainly not required. If you're getting "Mysql::Error: MySQL
server has gone away" due to idle workers, this Gist should help:
http://gist.github.com/238999

Otherwise, because the parent is not using the MySQL connection while
the child is working there's no need to establish a new connection on
every fork. It's safe to share it.
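
The general shape of that fix, illustrated with stand-in names rather
than the linked gist's actual code, is rescue-reconnect-retry:

```ruby
# Sketch of the usual "MySQL server has gone away" fix: rescue the
# stale-connection error once, reconnect, and retry the query.
# StaleConnection and the conn hash are illustrative stand-ins for
# Mysql::Error and a real adapter connection.
class StaleConnection < StandardError; end

def with_reconnect(conn)
  retried = false
  begin
    yield conn
  rescue StaleConnection
    raise if retried      # only retry once, then re-raise
    retried = true
    conn[:alive] = true   # stand-in for a real reconnect
    retry
  end
end
```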

> It is also a really clean solution for stuff like new relic 
instrumentation, as sending data to new relic after each job, is just 
going to kill the job workers and new relic.

Doesn't an after_fork plugin solve that as well?

> Pretty neat idea to implement it in an after_fork hook though. It would 
however prevent people from setting their own after_fork hook, since we 
only support one of each hook.

Easy enough, this'll be in 2.0:

http://github.com/defunkt/resque/commit/408b7e8bdf1fa9ddb39f85f36075b79b7478d5f8
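
The change amounts to keeping a list of callables per hook instead of a
single one; a minimal sketch of that shape (not the commit's actual code):

```ruby
# Minimal sketch of multiple hooks per event: store an array of callables
# per event name and run them all in registration order.
module MultiHooks
  def self.hooks
    @hooks ||= Hash.new { |h, k| h[k] = [] }
  end

  def self.register(event, &block)
    hooks[event] << block
  end

  def self.run(event, *args)
    hooks[event].each { |hook| hook.call(*args) }
  end
end
```

With this shape a plugin can add its own after_fork behaviour without
clobbering the hook an application has already set.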

> You don't think it smells like a "native" Resque feature?

Correct. This is why we have APIs for plugins and are developing job
hooks. There are already a number of plugins that provide "native"
functionality not included in Resque:

http://wiki.github.com/defunkt/resque/plugins

-- 
Chris Wanstrath
http://github.com/defunkt

Re: [resque] processing multiple jobs per fork

From:
Tony Arcieri
Date:
2010-03-30 @ 20:00
On Tue, Mar 30, 2010 at 1:12 PM, Chris Wanstrath <chris@ozmm.org> wrote:

>  > My guess is that many people have jobs that depend on the rails
> environment, and all these jobs require you to have an
> ActiveRecord::Base.establish_connection in your after_fork hook, which gets
> pretty expensive if you start having frequent jobs.
>
> This is certainly not required. If you're getting "Mysql::Error: MySQL
> server has gone away" due to idle workers, this Gist should help:
> http://gist.github.com/238999
>

For what it's worth, I encountered this problem originally and switched our
worker to JRuby.  The MySQL JDBC adapter does not experience this problem, as
it apparently has sane reconnection support already built in.

Our Resque workers have been running fine in JRuby although I'm guessing I'm
one of the first to even try that configuration.

-- 
Tony Arcieri
Medioh! A Kudelski Brand

Re: [resque] processing multiple jobs per fork

From:
Mick Staugaard
Date:
2010-03-30 @ 22:37
Alright, I guess it goes into a plugin then.

The after_fork approach you suggested did have the problem that it did
not execute the jobs in the order they entered the queue, so I had to go
with another approach.

http://github.com/staugaard/resque-multi-job-forks

Thanks for your input,
Mick

On Mar 30, 2010, at 12:12 PM, Chris Wanstrath wrote:

> On Tue, Mar 30, 2010 at 11:33 AM, Mick Staugaard <mick@zendesk.com> wrote:
> 
>> Yes, I thought about that, but I really think that it should be a 
"native" feature of Resque, as it will speed up a lot of people's job 
processing. Also the required signal handling (that you mentioned in the 
github comments) would just be too hacky in a plugin.
> 
> One of the main reasons Resque uses forking is to avoid having
> children manage their own signals - children do work, parents handle
> signals. Whether in a hook or worker.rb, it's something to be avoided
> because it's just not reliable.
> 
>> My guess is that many people have jobs that depend on the rails 
environment, and all these jobs require you to have an
ActiveRecord::Base.establish_connection in your after_fork hook, which 
gets pretty expensive if you start having frequent jobs.
> 
> This is certainly not required. If you're getting "Mysql::Error: MySQL
> server has gone away" due to idle workers, this Gist should help:
> http://gist.github.com/238999
> 
> Otherwise, because the parent is not using the MySQL connection while
> the child is working there's no need to establish a new connection on
> every fork. It's safe to share it.
> 
>> It is also a really clean solution for stuff like new relic 
instrumentation, as sending data to new relic after each job, is just 
going to kill the job workers and new relic.
> 
> Doesn't an after_fork plugin solve that as well?
> 
>> Pretty neat idea to implement it in an after_fork hook though. It would
however prevent people from setting their own after_fork hook, since we 
only support one of each hook.
> 
> Easy enough, this'll be in 2.0:
> 
> http://github.com/defunkt/resque/commit/408b7e8bdf1fa9ddb39f85f36075b79b7478d5f8
> 
>> You don't think it smells like a "native" Resque feature?
> 
> Correct. This is why we have APIs for plugins and are developing jobs
> hooks. There are already a number of plugins that provide "native"
> functionality not included in Resque:
> 
> http://wiki.github.com/defunkt/resque/plugins
> 
> -- 
> Chris Wanstrath
> http://github.com/defunkt

Re: [resque] processing multiple jobs per fork

From:
Chris Wanstrath
Date:
2010-04-01 @ 01:08
On Tue, Mar 30, 2010 at 3:37 PM, Mick Staugaard <mick@zendesk.com> wrote:

> The after_fork approach you suggested did have the problem that it did
not execute the jobs in the order they entered the queue, so I had to go 
with another approach.

I assume this is because the first job acquired gets run last - I've
updated the gist to make sure it gets run first. Were the others out
of order too?
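
The ordering quirk can be sketched without Resque: after_fork fires
before the worker performs the job it already reserved, so a hook that
drains the queue ends up running the reserved job last (stand-in code,
not the gist's):

```ruby
# The worker reserves job 1, fires after_fork (which drains jobs 2..4),
# and only then performs the reserved job - so job 1 runs last.
def run_order(reserved, queue)
  order = []
  # after_fork hook draining the rest of the queue:
  order << queue.shift until queue.empty?
  # the worker then performs the job it reserved before forking:
  order << reserved
  order
end
```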

Chris

Re: [resque] processing multiple jobs per fork

From:
Mick Staugaard
Date:
2010-04-01 @ 06:59
Right, it was only the first job that was performed last. Wouldn't your
change mean that the first job gets performed both first AND last? As in
twice?

Mick

On Mar 31, 2010, at 6:08 PM, Chris Wanstrath wrote:

> On Tue, Mar 30, 2010 at 3:37 PM, Mick Staugaard <mick@zendesk.com> wrote:
> 
>> The after_fork approach you suggested did have the problem that it did
not execute the jobs in the order they entered the queue, so I had to go 
with another approach.
> 
> I assume this is because the first job acquired gets run last - I've
> updated the gist to make sure it gets run first. Were the others out
> of order too?
> 
> Chris