librelist archives

costly active record initialization

From:
Karl Baum
Date:
2010-05-20 @ 21:23
I am spawning hundreds of small jobs that I would like to run in parallel,
and I am noticing one downside to this approach.  It looks like some
ActiveRecord initialization code runs each time one of these small
jobs is forked.  For example, I see:

DEBUG      SQL (6.5ms)   BEGIN
DEBUG      SQL (58.6ms)   COMMIT
DEBUG      Email::Message Columns (76.1ms)   SHOW FIELDS FROM `email_messages`
DEBUG      Merchant Columns (71.1ms)   SHOW FIELDS FROM `merchants`

I guess there could be even more initialization code running for each
fork, but ActiveRecord is the only part I see logging.  Is there a way
around this?  Or is this not the best way to use Resque?

thx.

-karl

Re: [resque] costly active record initialization

From:
Philippe Lafoucrière
Date:
2010-05-20 @ 21:25
On Thu, May 20, 2010 at 11:23 PM, Karl Baum <karl.baum@gmail.com> wrote:
> I guess there could be even more initialization code that is running
> for each fork, but active record is the one I see logging.  Is there a
> way around this?  Is this not the best way to use resque?

I never noticed that, but good point; I'll check whether we have the same problem.
In the meantime, did you take a look at
http://github.com/defunkt/resque/blob/master/docs/HOOKS.md ?

Re: [resque] costly active record initialization

From:
Tony Arcieri
Date:
2010-05-20 @ 21:26
On Thu, May 20, 2010 at 3:23 PM, Karl Baum <karl.baum@gmail.com> wrote:

> I am spawning 100's of small jobs that I would like to run in parallel
> and I am noticing one downside to this approach.  It looks like some
> active record initialization code is run each time one of these small
> jobs is forked.


Have you considered making each job process a batch?  That's often very
useful when you have high setup overhead.
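A minimal sketch of the batching idea, assuming a plain Resque job class. The names `ProcessMessageBatch`, the `:messages` queue, the `enqueue_in_batches` helper and the placeholder `process_one` are all hypothetical; the only Resque conventions relied on are the `@queue` class instance variable and a class-level `perform` method, with `Resque.enqueue(klass, *args)` doing the real enqueueing. The enqueue call is injected as a lambda so the sketch runs without Redis.

```ruby
# Hypothetical batch job: one fork handles many records, so per-fork setup
# (such as ActiveRecord column lookups) is paid once per batch, not per record.
class ProcessMessageBatch
  @queue = :messages # Resque reads the queue name from this class ivar

  def self.perform(ids)
    ids.map { |id| process_one(id) }
  end

  # Placeholder for the real per-record work (assumption).
  def self.process_one(id)
    "processed #{id}"
  end
end

# Split the IDs into batches and hand each batch to `enqueue`.
# In a real app `enqueue` would wrap Resque.enqueue; it is injected here
# so the sketch runs on its own.
def enqueue_in_batches(ids, batch_size, enqueue)
  ids.each_slice(batch_size).map do |batch|
    enqueue.call(ProcessMessageBatch, batch)
    batch
  end
end

# Usage in an app (needs the resque gem and a running Redis):
#   enqueue_in_batches(ids, 50, ->(klass, batch) { Resque.enqueue(klass, batch) })
```

Resque serializes job arguments as JSON, so passing each batch as a plain array of integer IDs keeps the payload simple.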

-- 
Tony Arcieri
Medioh! A Kudelski Brand

Re: [resque] costly active record initialization

From:
Karl Baum
Date:
2010-05-20 @ 21:32
I have considered it, but I wanted to see if there was a way to avoid
this.  I would really like to parallelize this process.

Thanks!

Re: [resque] costly active record initialization

From:
Tony Arcieri
Date:
2010-05-20 @ 21:35
On Thu, May 20, 2010 at 3:32 PM, Karl Baum <karl.baum@gmail.com> wrote:

> I have considered it, but I wanted to see if there was a way to avoid this.
>

Why?  Batch processing sounds much better suited to your particular
workload.  That said, you're not the first person on this list I've seen
who is reluctant to switch to a batch processing model.


> Would really like to parallelize this process.
>

You can parallelize the process by having multiple workers working on
multiple batches at once.
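A rough cost model of that point, under loudly simplified assumptions: every fork pays a fixed setup cost, all batches are full, and batches are spread evenly over the workers. The `estimate_batch_cost` helper is hypothetical; 150 ms of setup approximates the roughly 147 ms of SHOW FIELDS queries in Karl's log, and 10 ms of work per record is an invented figure for illustration.

```ruby
# Simplified cost model (assumptions: fixed setup cost per fork, full batches,
# batches distributed evenly across workers running in parallel).
def estimate_batch_cost(jobs:, batch_size:, workers:, setup_ms:, work_ms:)
  forks = (jobs + batch_size - 1) / batch_size # ceiling division: one fork per batch
  rounds = (forks + workers - 1) / workers     # batches each worker handles, at most
  per_fork_ms = setup_ms + [batch_size, jobs].min * work_ms
  {
    forks: forks,
    setup_total_ms: forks * setup_ms,          # setup time summed over all forks
    wall_ms_approx: rounds * per_fork_ms       # rough elapsed time with all workers busy
  }
end

# One record per job vs. 25 records per job, both on 4 workers.
p estimate_batch_cost(jobs: 100, batch_size: 1, workers: 4, setup_ms: 150, work_ms: 10)
p estimate_batch_cost(jobs: 100, batch_size: 25, workers: 4, setup_ms: 150, work_ms: 10)
```

Under these assumptions, 100 single-record jobs pay 15,000 ms of setup in total, while four 25-record batches pay 600 ms and still keep all four workers busy, which is the sense in which batching and parallelism are not in conflict.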

-- 
Tony Arcieri
Medioh! A Kudelski Brand

Re: [resque] costly active record initialization

From:
Karl Baum
Date:
2010-05-20 @ 21:47
You're right.  Batch processing is better in many situations, like
when each job has a high initialization cost or when you would like
to batch costly IO interactions with a persistent store.  In my
situation, I would like to be as parallel as possible, and I want to
see how far I can push it.

Thanks!

Re: [resque] costly active record initialization

From:
Scott Tamosunas
Date:
2010-05-20 @ 21:49
Hey Karl,

That's why we added the before_first_fork hook: we were seeing each Resque
job take a long time, so we use it to preload all our models.  Try adding
something like this to your Resque initializer:

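Resque.before_first_fork do
  # Runs once in the parent worker process: load the column list of every
  # model class loaded so far, so forked jobs inherit it instead of
  # issuing their own SHOW FIELDS queries.
  ActiveRecord::Base.send(:subclasses).each { |klass| klass.columns }
end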

This runs only once, when Resque starts, and should eliminate the repeated
SHOW FIELDS queries.

Scott
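Why the hook helps: Resque forks a fresh child process for each job, and a child starts with whatever the parent had already loaded into memory. The self-contained sketch that follows simulates this with plain `fork` (so POSIX only, no Resque or ActiveRecord). The `FakeModel` class and `queries_per_child` helper are hypothetical; `FakeModel.columns` is a memoized stand-in for an ActiveRecord model's cached column list, and warming it once in the parent plays the role of `Resque.before_first_fork`.

```ruby
# Hypothetical stand-in for a model's memoized column list.
# load_count counts simulated SHOW FIELDS queries.
class FakeModel
  class << self
    attr_reader :load_count

    def reset!
      @columns = nil
      @load_count = 0
    end

    def columns
      @columns ||= begin
        @load_count += 1 # one simulated SHOW FIELDS query
        %w[id subject body]
      end
    end
  end
end

# Fork `jobs` children, one per job like a Resque worker, and report how
# many simulated queries each child had to issue itself.
def queries_per_child(warm:, jobs: 3)
  FakeModel.reset!
  FakeModel.columns if warm # like Resque.before_first_fork: runs once in the parent
  Array.new(jobs) do
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      before = FakeModel.load_count
      FakeModel.columns # what a job does when it first touches the model
      writer.write((FakeModel.load_count - before).to_s)
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close
    issued = reader.read.to_i
    reader.close
    Process.wait(pid)
    issued
  end
end

p queries_per_child(warm: false) # each child repeats the query
p queries_per_child(warm: true)  # children reuse the parent's cached columns
```

Without warming, every child issues its own query; with the cache filled before forking, none do. The real hook only covers model classes already loaded when it runs.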

Re: [resque] costly active record initialization

From:
Karl Baum
Date:
2010-05-20 @ 22:25
That worked perfectly.  Thanks!