librelist archives

Batches never finishing

From:
Bj Clark
Date:
2013-05-29 @ 17:52
Hello,

I have a worker that creates a batch and adds about 1500 other jobs (of
another type) to it. I also register an on_success notification callback
for the batch.

The batches never seem to finish, but I don't see error messages in the
logs and the workers aren't busy. When the batch is created, the workers
all seem to take jobs and chew through them very quickly, but the batch
seems to stall about halfway through and the notification never fires.

I don't see any jobs failing, nor are there any jobs under retries.

And, of course, this only happens in production. I can run a smaller 
version of the batch locally and in staging and it works fine.

Where should I be looking to debug this? Are jobs which are part of a 
batch and waiting to retry displayed under retries?

Thanks,

--------
BJ Clark
bjclark@goldstar.com

Re: [sidekiq] Batches never finishing

From:
Mike Perham
Date:
2013-05-29 @ 18:03
Hey BJ,

1) Debugging this is hard; the logs are the best we have right now.
2) Yep, batch jobs are just plain old jobs.  They are retried as normal and
look as normal in the UI.
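
Since the logs are the main debugging tool here, one option is to scan them
for jobs that started but never reported an outcome. The following is a
minimal sketch, not part of Sidekiq: the helper name unfinished_jids is
hypothetical, and it assumes each job logs a line containing "JID-<id> INFO:
start" when it begins and "INFO: done" or "INFO: fail" when it ends. Check
that against your own log format before relying on it.

```ruby
require "set"

# Hypothetical helper: returns the JIDs that logged "start" but never
# logged "done" or "fail". The log line format is an assumption.
def unfinished_jids(log_lines)
  started = Set.new
  finished = Set.new

  log_lines.each do |line|
    match = line.match(/JID-(\w+) INFO: (start|done|fail)/)
    next unless match

    # "start" marks the beginning; "done" and "fail" both mark an ending.
    (match[2] == "start" ? started : finished) << match[1]
  end

  (started - finished).to_a
end

# Example with made-up log lines in the assumed format.
sample = [
  "2013-05-29T18:03:00Z 4242 TID-abc HardWorker JID-aaa111 INFO: start",
  "2013-05-29T18:03:01Z 4242 TID-abc HardWorker JID-aaa111 INFO: done: 0.51 sec",
  "2013-05-29T18:03:01Z 4242 TID-def HardWorker JID-bbb222 INFO: start"
]
puts unfinished_jids(sample).inspect
```

Any JID this reports is a job that a worker picked up but that never logged
an outcome, for example because the process was killed mid-job.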

Perhaps your callback is not firing because you are using the success
callback, not complete?  Success means that 100% of the jobs succeeded
without error.  Complete means that 100% of the jobs executed, but any of
them could have had an error.  Some batches will never succeed, due to data
or programming bugs.
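
The difference between the two callbacks can be illustrated with a toy model
in plain Ruby. This is a sketch of the semantics described above, not
Sidekiq's implementation; the ToyBatch class and its methods are hypothetical
names made up for the example.

```ruby
# Toy model of batch callback semantics (hypothetical, not Sidekiq code).
class ToyBatch
  attr_reader :fired

  def initialize(job_ids)
    @unrun = job_ids.dup    # jobs that have not executed at all yet
    @pending = job_ids.dup  # jobs that have not yet succeeded
    @fired = []             # callbacks fired so far, in order
  end

  # Record one execution of a job; ok is true when it ran without error.
  def record(job_id, ok)
    @unrun.delete(job_id)
    @pending.delete(job_id) if ok
    fire(:complete) if @unrun.empty?    # every job has executed once
    fire(:success) if @pending.empty?   # every job has succeeded
  end

  def pending_count
    @pending.size
  end

  private

  # Each callback fires at most once.
  def fire(event)
    @fired << event unless @fired.include?(event)
  end
end

batch = ToyBatch.new([1, 2, 3])
batch.record(1, true)
batch.record(2, false)  # job 2 raised an error and would be retried
batch.record(3, true)
puts batch.fired.inspect  # only :complete has fired at this point
batch.record(2, true)   # the retry succeeds
puts batch.fired.inspect  # now :success has fired as well
```

In this model, a job that never executes at all keeps both callbacks from
firing, which matches a batch that shows pending jobs while nothing is
running.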

Re: [sidekiq] Batches never finishing

From:
Bj Clark
Date:
2013-05-29 @ 18:06
Mike,

I thought the same about the callback, but I don't think the batch ever 
completes either.

Here's what the UI looks like:
http://monosnap.com/image/iStbHgVSBcO8zSgXm43nKMUNU.png

Nothing is processing, there aren't any jobs enqueued or any workers busy,
but the batch shows a ton of jobs pending. That's where I'm stuck: I can't
see which jobs it thinks are pending or why it's not just processing them.
I don't think it would fire the complete callback either.

Re: [sidekiq] Batches never finishing

From:
Mike Perham
Date:
2013-05-29 @ 18:10
Yep, that means Sidekiq thinks there are still a lot more jobs in the batch
to be executed.

Can you open an issue with more detail about your Sidekiq config, the code
creating the batch, etc.?