Running multiple workers, on demand

From: Ian Warshak
Date: 2010-03-25 @ 02:58

Hi,

I am working on a project where hundreds of thousands of jobs get queued up,
and at the end of every month a job is run which processes all the jobs in
the queue and then quits. It has to be run during a maintenance window.

This is almost a perfect fit for resque. The exception is the worker,
which continuously polls for jobs.

So I wrote my own simple worker which forks X times; each process pulls
jobs off of the queue and processes them. I need multiple processes because
of the large number of jobs and the small window of time that I have to run
them.

I have started working on this, but am running into an issue where the
process is failing on this exception:

    undefined method `all_hashes' for nil:NilClass
    whiny_nil.rb:52:in `method_missing'
    mysql_adapter.rb:609:in `select'


So I am sure this has something to do with forking and the mysql
connection, perhaps even because I am in development mode. But the
architecture of resque depends on fork() and it works fine if I run
the normal worker. I am stumped. Has anyone run into this, or have any
insight? I'd really appreciate it.


Here is my simple worker: https://gist.github.com/b27423759a36c3e20066


Thanks,

Ian

Re: [resque] Running multiple workers, on demand

From: Chris Wanstrath
Date: 2010-03-25 @ 17:57

On Wed, Mar 24, 2010 at 7:58 PM, Ian Warshak <iwarshak@stripey.net> wrote:

> So I am sure this has something to do with forking and the mysql connection,
> perhaps even because I am in development mode. But the architecture of
> resque depends on fork() and it works fine if I run the normal worker. I am
> stumped. Has anyone run into this, or have any insight? I'd really
> appreciate it.

If you initialize a MySQL connection in a parent process and then use it
in multiple forked child processes concurrently, you can run into
trouble. That goes for most connections of this type, in my experience
(e.g. redis).

Try doing this after forking:

    ActiveRecord::Base.establish_connection

That'll open a new connection. We do this in our Unicorn `after_fork` hook.
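
A rough sketch of that pattern, using only the Ruby standard library. The
helper name `run_forked_workers` and its `after_fork:` parameter are
hypothetical and not part of Resque, Rails, or Unicorn; the hook is simply
where a call such as `ActiveRecord::Base.establish_connection` would go.

```ruby
# Hypothetical helper (not part of Resque or Rails): fork `count` children,
# run the `after_fork` hook in each child before any work, then wait for all.
def run_forked_workers(count, after_fork:)
  pids = Array.new(count) do |index|
    fork do
      # Runs in the child only. In a Rails app this is the place for
      # ActiveRecord::Base.establish_connection, so each child opens its own
      # connection instead of sharing the parent's socket.
      after_fork.call
      yield index
      # exit! skips at_exit handlers inherited from the parent process.
      exit!(0)
    end
  end
  # Reap every child and return their Process::Status objects.
  pids.map { |pid| Process.wait2(pid).last }
end
```

Assuming a Rails environment is loaded, it might be called as
`run_forked_workers(4, after_fork: -> { ActiveRecord::Base.establish_connection }) { |i| ... }`,
with the job-processing loop in the block.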

-- 
Chris Wanstrath
http://github.com/defunkt

Re: [resque] Running multiple workers, on demand

From: Ian Warshak
Date: 2010-03-25 @ 20:36

On Thu, Mar 25, 2010 at 12:57 PM, Chris Wanstrath <chris@ozmm.org> wrote:

> On Wed, Mar 24, 2010 at 7:58 PM, Ian Warshak <iwarshak@stripey.net> wrote:
>
> > So I am sure this has something to do with forking and the mysql
> > connection, perhaps even because I am in development mode. But the
> > architecture of resque depends on fork() and it works fine if I run the
> > normal worker. I am stumped. Has anyone run into this, or have any
> > insight? I'd really appreciate it.
>

After a night of sleep, I realized that the reason it works just fine in
Resque is because the Resque worker doesn't do any mysql calls itself, so it
probably isn't an issue if the parent/child share a mysql connection.


>
> If you initialize a MySQL connection in a parent process then use it
> multiple forked child processes concurrently, you can run into
> trouble. That goes for most connections of this type, in my experience
> (e.g. redis).
>

Yes, I found this out the hard way. Before forking, I called
ActiveRecord::Base.connection_pool.disconnect!, which empties the
connection pool, and it seems like a new connection (or pool) is made as
it's needed in the forked code.

I also had to reconnect to the Redis server inside my forks. I wasn't doing
that at first, and my processes were getting responses to other processes'
requests. Re-establishing the Redis connection did the trick.
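
One way to get the "new connection as it's needed" behaviour for any client
is to remember which process opened the connection and open a fresh one when
a different process asks for it. This is only an illustrative sketch:
`PerProcessConnection` is a hypothetical class, not something from Resque,
redis-rb, or ActiveRecord, and it does not close the inherited connection.

```ruby
# Hypothetical wrapper: hands out a connection that belongs to the current
# process, building a new one the first time a forked child asks for it.
class PerProcessConnection
  # The block is the factory, e.g. `{ Redis.new }` if the redis gem is loaded.
  def initialize(&factory)
    @factory = factory
    @owner_pid = nil
    @connection = nil
  end

  def connection
    # After fork, Process.pid differs from the pid that opened the
    # connection, so the child builds its own instead of reusing the
    # parent's socket.
    if @owner_pid != Process.pid
      @connection = @factory.call
      @owner_pid = Process.pid
    end
    @connection
  end
end
```

Each forked worker would then call `connection` on the wrapper rather than
holding on to a client object created before the fork.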

Here is what I ended up with. https://gist.github.com/906a32b085ecc5bf32fe

It isn't a true worker in the sense that it doesn't register itself as one,
but I am able to record failed jobs. It probably wouldn't be that much work
to whip it into a true Worker though.

Thanks for everyone's insight. This solution looks like it's going to work
well.

Ian

>
> Try doing this after forking:
>
>    ActiveRecord::Base.establish_connection
>
> That'll open a new connection. We do this in our Unicorn `after_fork` hook.
>
> --
> Chris Wanstrath
> http://github.com/defunkt
>

Re: [resque] Running multiple workers, on demand

From: Scott Tamosunas
Date: 2010-03-25 @ 16:24

Ian,

How many processes are you forking? Does it happen in the first child process
or in a subsequent child?

Scott

On Wed, Mar 24, 2010 at 7:58 PM, Ian Warshak <iwarshak@stripey.net> wrote:

> Hi,
>
> I am working on a project where hundreds of thousands of jobs get queued
> up, and at the end of every month a job is run which processes all the jobs
> in the queue and then quits. It has to be run during a maintenance window.
>
> This is almost a perfect fit for resque. The exception being the worker,
> which continuously scans for jobs.
>
> So I wrote my own simple worker which forks X times and each process pulls
> Jobs off of the queue and processes them. I need multiple processes because
> of the large amount of jobs and the small window of time that I have to run
> them
>
> I have started working on this, but am running into an issue where the
> process is failing on this exception:
>
> undefined method `all_hashes' for nil:NilClass
>
> whiny_nil.rb:52:in `method_missing'
>
> mysql_adapter.rb:609:in `select'
>
>
>
> So I am sure this has something to do with forking and the mysql
> connection, perhaps even because I am in development mode. But the
> architecture of resque depends on fork() and it works fine if I run the
> normal worker. I am stumped. Has anyone run into this, or have any
> insight? I'd really appreciate it.
>
>
> Here is my simple worker: https://gist.github.com/b27423759a36c3e20066
>
>
> Thanks,
>
> Ian
>
>