librelist archives

« back to archive

resque-scheduler cannot find resque workers (some of the time)

resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-13 @ 01:09
Hi,

We run redis, resque and resque-scheduler on Heroku. Many of the scheduled
jobs do not get run. The /resque/failed web page shows the error indicating
that rails was never loaded in the worker process. Many of the errors do
not make it into the /resque/failed page listing.

I also see in /resque/workers more workers than I have running in Heroku. I
cannot explian this. I kill the workers from the the rails console, restart
the Heroku workers and it all looks good but eventually more workers appear
in the /resque/workers page.

I suspect that the mechanism that resque-scheduler uses to communicate with
waiting and ready worker processes is broken in some way.

More details:
-I noticed there were 6 workers displayed in the /resque/workers web page
and so I used the Rails console to kill all the workers. Then I restarted
the workers in Heroku. That did nothing so I restarted the resque-scheduler
process in Heroku. Still no chnage however the /resque/workers page shows
the correct number of resque workers (that number matching the number of
heroku worker processes).
-about 12 hours later I noticed there appeared more workers in the
/resque/workers page. WTF?
-some jobs do get processed, and I see these in the completed totals for
each worker (i.e., all the workers show non-zero num_processed values).
-I do see the logs showing sometimes the workers get run.

Is there an obvious issue here?
Can someone describe the keys/entries structure in Redis that are relavent
to resque-scheduler communicating with the resque workers?
I noted resque is no longer compatable with ruby 1.9.2 but I dont see that
as a probable cause since the problem is intermittent.
And I realize we use v. 1 of resque not v. 2.

Errors and Logging info:
------------------------

For every job scheduled/queued by resque-scheduler I do see a log entry
showing its doing it and doing it at the correct time, for example:

2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
22:50:00 +0000: queueing PageSync (PageSync)

Much of the time I never see the worker run. Sometimes I do (again, from
the logs):

2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG
CitrixWorker.update_webinars: starting at 2014-12-08 22:50:00 +0000

Example error in /resque/failed:
Exception
    NameError
Error
    uninitialized constant WorkerProcess
=> indicating Rails was never loaded (or is no longer loaded) in the resque
worker process.

Environment and Versions:
-------------------------
Ruby 1.9.2
Rails 3.1.11

Gems:
redis-3.0.6
resque-1.25.1
resque-scheduler-2.3.1

Procfile (used by Heroku):
web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
./config/unicorn.rb
worker: bundle exec rake jobs:work
scheduler: bundle exec rake resque:scheduler

Rake task for workers:
----------------------
resque.rake:

require 'resque/tasks'
require 'resque/server'
require 'resque_scheduler/tasks'
desc 'Called by the resque:work rake task to initialize each worker process
(I think)'
task "resque:setup" => :environment do
  ENV['QUEUE'] = '*' # watch one or all queues (priority of these queues is
in alphabetical order when * is specified)
  ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time (polling
frequency) 5 gives same erroneous results as 1 second
  puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' : ENV['QUEUE']}
Resque queues".info
end

desc "Alias for resque:work (To run workers on Heroku using standard worker
syntax, which is 'rake jobs:work')"
task "jobs:work" => "resque:work"
==============================

--
Mike Papper

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-18 @ 13:28
Your code is somewhat mangled.  Can you re-post or post a gist?

What we do is to put environment in the Procfile like so:

worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake
environment jobs:work QUEUE=whatever
scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10  bundle exec rake
environment resque:scheduler

I'm also not clear from your code on how resque:setup is ever invoked as
part of jobs:work.

It might be easier to start with less customized abstractions and just get
things running by putting all the information you need in the Procfile
using the out-of-box tasks from resque.  Once that's working, then worry
about abstracting the invocations behind other rake tasks.

As to the info in resque web—it's entirely based on information in magic
redis keys and has no bearing on what is actually happening.  It assumes
that all worker processes are populating magic keys with magic information
so the web UI reflects reality. If anything goes wrong, it gets out of sync.

We monitor some of the magic keys for stale values and clear them so our
info is good.

There's no documentation about what these keys are, although the resque
code—as spaghetti as it is—is not that hard to slog through as there's not
much too it.

Dave


On Fri, Dec 12, 2014 at 8:09 PM, Mike Papper <mpapper@salesforce.com> wrote:
>
> Hi,
>
> We run redis, resque and resque-scheduler on Heroku. Many of the scheduled
> jobs do not get run. The /resque/failed web page shows the error indicating
> that rails was never loaded in the worker process. Many of the errors do
> not make it into the /resque/failed page listing.
>
> I also see in /resque/workers more workers than I have running in Heroku.
> I cannot explian this. I kill the workers from the the rails console,
> restart the Heroku workers and it all looks good but eventually more
> workers appear in the /resque/workers page.
>
> I suspect that the mechanism that resque-scheduler uses to communicate
> with waiting and ready worker processes is broken in some way.
>
> More details:
> -I noticed there were 6 workers displayed in the /resque/workers web page
> and so I used the Rails console to kill all the workers. Then I restarted
> the workers in Heroku. That did n othing so I restarted the
> resque-scheduler process in Heroku. Still no chnage however the
> /resque/workers page shows the correct number of resque workers (that
> number matching the number of heroku worker processes).
> -about 12 hours later I noticed there appeared more workers in the
> /resque/workers page. WTF?
> -some jobs do get processed, and I see these in the completed totals for
> each worker (i.e., all the workers show non-zero num_processed values).
> -I do see the logs showing sometimes the workers get run.
>
> Is there an obvious issue here?
> Can someone describe the keys/entries structure in Redis that are relavent
> to resque-scheduler communicating with the resque workers?
> I noted resque is no longer compatable with ruby 1.9.2 but I dont see that
> as a probable cause since the problem is intermittent.
> And I realize we use v. 1 of resque not v. 2.
>
> Errors and Logging info:
> ------------------------
>
> For every job scheduled/queued by resque-scheduler I do see a log entry
> showing its doing it and doing it at the correct time, for example:
>
> 2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
> 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
> 2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
> 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
> 2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
> 22:50:00 +0000: queueing PageSync (PageSync)
>
> Much of the time I never see the worker run. Sometimes I do (again, from
> the logs):
>
> 2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG CitrixWorker.
> update_webinars: starting at 2014-12-08 22:50:00 +0000
>
> Example error in /resque/failed:
> Exception
>     NameError
> Error
>     uninitialized constant WorkerProcess
> => indicating Rails was never loaded (or is no longer loaded) in the
> resque worker process.
>
> Environment and Versions:
> -------------------------
> Ruby 1.9.2
> Rails 3.1.11
>
> Gems:
> redis-3.0.6
> resque-1.25.1
> resque-scheduler-2. 3.1
>
> Procfile (used by Heroku):
> web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
> ./config/unicorn.rb
> worker: bundle exec rake jobs:work
> scheduler: bundle exec rake resque:scheduler
>
> Rake task for workers:
> ----------------------
> resque.rake:
>
> require 'resque/tasks'
> require 'resque/server'
> require 'resque_scheduler/tasks'
> desc 'Called by the resque:work rake task to initialize each worker
> process (I think)'
> ta sk "resque:setup" => :environment do
>   ENV['QUEUE'] = '*' # watch one or all queues (priority of these queues
> is in alphabetical order when * is specified)
>   ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time (polling
> frequency) 5 gives same erroneous results as 1 second
>   puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' : ENV['QUEUE']}
> Resque queues".info
> end
>
> desc "Alias for resque:work (To run workers on Heroku using standard
> worker syntax, which is 'rake jobs:work')"
> task "jobs:work" => & quot;resque:work"
> ==============================
>
> --
> Mike Papper
>


-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-19 @ 01:38
Hi, I appreciate your reply and taking some time for this issue...

Here is the unmangled code for our rake task:

require 'resque/tasks'
require 'resque/server'
require 'resque_scheduler/tasks'

task "resque:setup" => :environment do
  ENV['QUEUE'] = '*'
  ENV['INTERVAL'] = '1.0'
  puts "INFO: waiting on ALL Resque queues"
end
task "jobs:work" => "resque:work"

Invoked via Heroku Procfile line:

worker: bundle exec rake jobs:work

------------
My understanding was that some magic in resque:work calls resque:setup that
I defined. And I do see some workers.

Nonetheless, I updated my Procfile and removed my resque:setup rake task,
heres the Procfile (very close to what you suggested):

worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9 bundle exec rake environment
resque:work QUEUE=*
scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle exec rake
environment resque:scheduler

Now I see the same results, there are 4 workers in resque, most of the jobs
are NOT processed but ONE did get processed. And the scheduler is
enqueueing jobs.

Incidentally, I stopped the worker heroku process and 2 of the resque
workers went away. When the heroku workers came back up, I saw a total of 4
resque workers (2 of which were the same as those magic ones I saw before I
killed heroku and that stayed around after I killed heroku).

So, I killed the 2 resque workers (the two that shouldnt be there)
using unregister_worker. I see 2 workers in /resque (the web page) and
Resque.workers. I see jobs scheduled but nothing is getting run. And
incidentally I dont see any failed jobs in /resque either.

Im open to any suggestions.

Mike

--
Mike


On Thu, Dec 18, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com> wrote:
>
> Your code is somewhat mangled.  Can you re-post or post a gist?
>
> What we do is to put environment in the Procfile like so:
>
> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake
> environment jobs:work QUEUE=whatever
> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10  bundle exec rake
> environment resque:scheduler
>
> I'm also not clear from your code on how resque:setup is ever invoked as
> part of jobs:work.
>
> It might be easier to start with less customized abstractions and just get
> things running by putting all the information you need in the Procfile
> using the out-of-box tasks from resque.  Once that's working, then wor ry
> about abstracting the invocations behind other rake tasks.
>
> As to the info in resque web—it's entirely based on information in magic
> redis keys and has no bearing on what is actually happening.  It assumes
> that all worker processes are populating magic keys with magic information
> so the web UI reflects reality. If anything goes wrong, it gets out of sync.
>
> We monitor some of the magic keys for stale values and clear them so our
> info is good.
>
> There's no documentation about what these keys are, although the resque
> code—as spaghetti as it is—is not that hard to slog through as there's not
> much too it.
>
> Dave
>
>
> On Fri, Dec 12, 2014 at 8:09 PM, Mike Papper <mpapper@salesforce.com>
> wrote:
>>
>> Hi,
>>
>> We run redis, resque and resque-scheduler on Heroku. Many of the
>> scheduled jobs do not get run. The /resque/failed web page shows the error
>> indicating that rails was never loaded in the worker process. Many of the
>> errors do not make it into the /resque/failed page listing.
>>
>> I also see in /resque/workers more workers than I have running in Heroku.
>> I cannot explian this. I kill the workers from the the rails console,
>> restart the Heroku workers and it all looks good but eventually more
>> workers appear in the /resque/workers page.
>>
>> I suspect that the mechanism that res que-scheduler uses to communicate
>> with waiting and ready worker processes is broken in some way.
>>
>> More details:
>> -I noticed there were 6 workers displayed in the /resque/workers web page
>> and so I used the Rails console to kill all the workers. Then I restarted
>> the workers in Heroku. That did n othing so I restarted the
>> resque-scheduler process in Heroku. Still no chnage however the
>> /resque/workers page shows the correct number of resque workers (that
>> number matching the number of heroku worker processes).
>> -about 12 hours later I noticed there appeared more workers in the
>> /resque/workers page. WTF?
>> -some jobs do get processed, and I see these in the completed totals for
>> each worker (i.e., all the workers show non-zero num_processed values).
>> -I do see the logs showing sometimes the workers get run.
>>
>> Is there an obvious issue here?
>> Can someone describe the keys/entries structure in Redis that are
>> relavent to resque-scheduler communicating with the resque workers?
>> I noted resque is no longer compatable with ruby 1.9.2 but I dont see
>> that as a probable cause since the problem is intermittent.
>> And I realize we use v. 1 of resque not v. 2.
>>
>> Errors and Logging info:
>> ------------------------
>>
>> For every job scheduled/queued by resque-scheduler I do see a log entry
>> showing its doing it and doing it at the correct time, for example:
>>
>> 2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
>> 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
>> 2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
>> 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
>> 2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
>> 22:50:00 +0000: queueing PageSync (PageSync)
>>
>> Much of the time I never see the worker run. Sometimes I do (again, from
>> the logs):
>>
>> 2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG CitrixWorker.
>> update_webinars: starting at 2014-12-08 22:50:00 +0000
>>
>> Example error in /resque/failed:
>> Exception
>>     NameError
>> Error
>>     uninitialized constant WorkerProcess
>> => indicating Rails was never loaded (or is no longer loaded) in the
>> resque worker process.
>>
>> Environment and Versions:
>> -------------------------
>> Ruby 1.9.2
>> Rails 3.1.11
>>
>> Gems:
>> redis-3.0.6
>> resque-1.25.1
>> resque-scheduler-2. 3.1
>>
>> Procfile (used by Heroku):
>> web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
>> ./config/unicorn.rb
>> worker: bundle exec rake jobs:work
>> scheduler: bundle exec rake resque:scheduler
>>
>> Rake task for workers:
>> ----------------------
>> resque.rake:
>>
>> require 'resque/tasks'
>> require 'resque/server'
>> require 'resque_scheduler/tasks'
>> desc 'Called by the resque:work rake task to initialize each worker
>> process (I think)'
>> ta sk "resque:setup" => :environment do
>>   ENV['QUEUE'] = '*' # watch one or all queues (priority of these queues
>> is in alphabetical order when * is specified)
>>   ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time
>> (polling frequency) 5 gives same erroneous results as 1 second
>>   puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' : ENV['QUEUE']}
>> Resque queues".info
>> end
>>
>> desc "Alias for resque:work (To run workers on Heroku using standard
>> worker syntax, which is 'rake jobs:work')"
>> task "jobs:work" => & quot;resque:work"
>> ==============================
>>
>> --
>> Mike Papper
>>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-19 @ 13:28
Is your concern that your jobs can't start up Rails or that you are seeing
non-existent workers in resque-web?

If you aren't seeing jobs get picked up, you can set VVERBOSE=1 in your
Procfile for each worker.  It will generate a ton of output to the logs as
to what's going on.  That's usually helped me understand what's going on.

Dave


On Thu, Dec 18, 2014 at 8:38 PM, Mike Papper <mpapper@salesforce.com> wrote:
>
> Hi, I appreciate your reply and taking some time for this issue...
>
> Here is the unmangled code for our rake task:
>
> require 'resque/tasks'
> require 'resque/server'
> require 'resque_scheduler/tasks'
>
> task "resque:setup" => :environment do
>   ENV['QUEUE'] = '*'
>   ENV['INTERVAL'] = '1.0'
>   puts "INFO: waiting on ALL Resque queues"
> end
> task "jobs:work" => "resque:work"
>
> Invoked via Heroku Procfile line:
>
> worker: bundle exec rake jobs:work
>
> ------------
> My understanding was that some magic in resque:work calls resque:setup
> that I defined. And I do see some workers.
>
> Nonetheless, I updated my Procfile and removed my resque:setu p rake task,
> heres the Procfile (very close to what you suggested):
>
> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9 bundle exec rake
> environment resque:work QUEUE=*
> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle exec rake
> environment resque:scheduler
>
> Now I see the same results, there are 4 workers in resque, most of the
> jobs are NOT processed but ONE did get processed. And the scheduler is
> enqueueing jobs.
>
> Incidentally, I stopped the worker heroku process and 2 of the resque
> workers went away. When the heroku workers came back up, I saw a total of 4
> resque workers (2 of which were the same as those magic ones I saw before I
> killed heroku and that stayed around after I killed heroku).
>
> So, I killed the 2 resque workers (the two that shouldnt be there)
> using unregister_worker. I see 2 workers in /resque (the web page) and
> Resque.workers. I see jobs schedu led but nothing is getting run. And
> incidentally I dont see any failed jobs in /resque either.
>
> Im open to any suggestions.
>
> Mike
>
> --
> Mike
>
>
> On Thu, Dec 18, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com> wrote:
>>
>> Your code is somewhat mangled.  Can you re-post or post a gist?
>>
>> What we do is to put environment in the Procfile like so:
>>
>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake
>> environment jobs:work QUEUE=whatever
>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10  bundle exec rake
>> environment resque:scheduler
>>
>> I'm also not clear from your code on how resque:setup is ever invoked as
>> part of jobs:work.
>>
>> It might be easier to start with less customized abstractions and just
>> get things running by putting all the information you need in the
>> Procfile using the out-of-box tasks from resque.  Once that's working,
>> then wor ry about abstracting the invocations behind other rake tasks.
>>
>> As to the info in resque web—it's entirely based on information in magic
>> redis keys and has no bearing on what is actually happening.  It assumes
>> that all worker processes are populating magic keys with magic information
>> so the web UI reflects reality. If anything goes wrong, it gets out of sync.
>>
>> We monitor some of the magic keys for stale values and clear them so our
>> info is good.
>>
>> There's no documentation about what these keys are, although the resque
>> code—as spaghetti as it is—is not that hard to slog through as there's not
>> much too it.
>>
>> Dave
>>
>>
>> On Fri, Dec 12, 2014 at 8:09 PM, Mike Papper <mpapper@salesforce.com>
>> wrote:
>>>
>>> Hi,
>>>
>>> We run redis, resque and resque-scheduler on Heroku. Many of the
>>> scheduled jobs do not get run. The /resque/failed web page shows the error
>>> indicating that rails was never loaded in the worker process. Many of the
>>> errors do not make it into the /resque/failed page listing.
>>>
>>> I also see in /resque/workers more workers than I have running in
>>> Heroku. I cannot explian this. I kill the workers from the the rails
>>> console, restart the Heroku workers and it all looks good but eventually
>>> more workers appear in the /resque/workers page.
>>>
>>> I suspect that the mechanism that res que-scheduler uses to communicate
>>> with waiting and ready worker processes is broken in some way.
>>>
>>> More details:
>>> -I noticed there were 6 workers displayed in the /resque/workers web
>>> page and so I used the Rails console to kill all the workers. Then I
>>> restarted the workers in Heroku. That did n othing so I restarted the
>>> resque-scheduler process in Heroku. Still no chnage however the
>>> /resque/workers page shows the correct number of resque workers (that
>>> number matching the number of heroku worker processes).
>>> -about 12 hours later I noticed there appeared more workers in the
>>> /resque/workers page. WTF?
>>> -some jobs do get processed, and I see these in the completed totals for
>>> each worker (i.e., all the workers show non-zero num_processed values).
>>> -I do see the logs showing sometimes the workers get run.
>>>
>>> Is there an obvious issue here?
>>> Can someone describe the keys/entries structure in Redis that are
>>> relavent to resque-scheduler communicating with the resque workers?
>>> I noted resque is no longer compatable with ruby 1.9.2 but I dont see
>>> that as a probable cause since the problem is intermittent.
>>> And I realize we use v. 1 of resque not v. 2.
>>>
>>> Errors and Logging info:
>>> ------------------------
>>>
>>> For every job scheduled/queued by resque-scheduler I do see a log entry
>>> showing its doing it and doing it at the correct time, for example:
>>>
>>> 2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>> 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
>>> 2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>> 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
>>> 2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>> 22:50:00 +0000: queueing PageSync (PageSync)
>>>
>>> Much of the time I never see the worker run. Sometimes I do (again, from
>>> the logs):
>>>
>>> 2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG CitrixWorker.
>>> update_webinars: starting at 2014-12-08 22:50:00 +0000
>>>
>>> Example error in /resque/failed:
>>> Exception
>>>     NameError
>>> Error
>>>     uninitialized constant WorkerProcess
>>> => indicating Rails was never loaded (or is no longer loaded) in the
>>> resque worker process.
>>>
>>> Environment and Versions:
>>> -------------------------
>>> Ruby 1.9.2
>>> Rails 3.1.11
>>>
>>> Gems:
>>> redis-3.0.6
>>> resque-1.25.1
>>> resque-scheduler-2. 3.1
>>>
>>> Procfile (used by Heroku):
>>> web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
>>> ./config/unicorn.rb
>>> worker: bundle exec rake jobs:work
>>> scheduler: bundle exec rake resque:scheduler
>>>
>>> Rake task for workers:
>>> ----------------------
>>> resque.rake:
>>>
>>> require 'resque/tasks'
>>> require 'resque/server'
>>> require 'resque_scheduler/tasks'
>>> desc 'Called by the resque:work rake task to initialize each worker
>>> process (I think)'
>>> ta sk "resque:setup" => :environment do
>>>   ENV['QUEUE'] = '*' # watch one or all queues (priority of these queues
>>> is in alphabetical order when * is specified)
>>>   ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time
>>> (polling frequency) 5 gives same erroneous results as 1 second
>>>   puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' : ENV['QUEUE']}
>>> Resque queues".info
>>> end
>>>
>>> desc "Alias for resque:work (To run workers on Heroku using standard
>>> worker syntax, which is 'rake jobs:work')"
>>> task "jobs:work" => & quot;resque:work"
>>> ==============================
>>>
>>> --
>>> Mike Papper
>>>
>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>

-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-19 @ 19:41
My concern is that the worker jobs as-scheduled by resque are not running.
The worker processes are running. But the code in the "perform" method is
not.

I suspect that since there are these extra workers the defunct ones are
getting the scheduled jobs and just dropping them.

Mike

On Fri, Dec 19, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com> wrote:
>
> Is your concern that your jobs can't start up Rails or that you are seeing
> non-existent workers in resque-web?
>
> If you aren't seeing jobs get picked up, you can set VVERBOSE=1 in your
> Procfile for each worker.  It will generate a ton of output to the logs as
> to what's going on.  That's usually helped me understand what's going on.
>
> Dave
>
>
> On Thu, Dec 18, 2014 at 8:38 PM, Mike Papper <mp apper@salesforce.com
> <mpapper@salesforce.com>> wrote:
>>
>> Hi, I appreciate your reply and taking some time for this issue...
>>
>> Here is the unmangled code for our rake task:
>>
>> require 'resque/tasks'
>> require 'resque/server'
>> require 'resque_scheduler/tasks'
>>
>> task "resque:setup" => :environment do
>>   ENV['QUEUE'] = '*'
>>   ENV['INTERVAL'] = '1.0'
>>   puts "INFO: waiting on ALL Resque queues"
>> end
>> task "jobs:work" => "resque:work"
>>
>> Invoked via Heroku Procfile line:
>>
>> worker: bundle exec rake jobs:work
>>
>> ------------
>> My understa nding was that some magic in resque:work calls resque:setup
>> that I defined. And I do see some workers.
>>
>> Nonetheless, I updated my Procfile and removed my resque:setu p rake
>> task, heres the Procfile (very close to what you suggested):
>>
>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9 bundle exec rake
>> environment resque:work QUEUE=*
>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle exec rake
>> environment resque:scheduler
>>
>> Now I see the same results, there are 4 workers in resque, most of the
>> jobs are NOT processed but ONE did get processed. And the scheduler is
>> enqueueing jobs.
>>
>> Incidentally, I stopped the worker heroku process and 2 of the resque
>> workers went away. When the heroku workers came back up, I saw a total of 4
>> resque workers (2 of which were the same as those magic ones I saw before I
>> killed heroku and that stayed around after I killed heroku).
>>
>> So, I killed the 2 resque workers (the two that shouldnt be there)
>> using unregister_worker. I see 2 workers in /resque (the web page) and
>> Resque.workers. I see jobs schedu led but nothing is getting run. And
>> incidentally I dont see any failed jobs in /resque either.
>>
>> Im open to any suggestions.
>>
>> Mike
>>
>> --
>> Mike
>>
>>
>> On Thu, Dec 18, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com>
>> wrote:
>>>
>>> Your code is somewhat mangled.  Can you re-post or post a gist?
>>>
>>> What we do is to put environment in the Procfile like so:
>>>
>>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake
>>> environment jobs:work QUEUE=whatever
>>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10  bundle exec rake
>>> environment resque:scheduler
>>>
>>> I'm also not clear from your code on how resque:setup is ever invoked
>>> as part of jobs:work.
>>>
>>> It might be easier to start with less customized abstractions and just
>>> get things running by putting all the information you need in the
>>> Procfile using the out-of-box tasks from resque.  Once that's working,
>>> t hen wor ry about abstracting the invocations behind other rake tasks.
>>>
>>> As to the info in resque web—it's entirely based on information in magic
>>> redis keys and has no bearing on what is actually happening.  It assumes
>>> that all worker processes are populating magic keys with magic information
>>> so the web UI reflects reality. If anything goes wrong, it gets out of sync.
>>>
>>> We monitor some of the magic keys for stale values and clear them so our
>>> info is good.
>>>
>>> There's no documentation about what these keys are, although the resque
>>> code —as spaghetti as it is—is not that hard to slog through as there's not
>>> much too it.
>>>
>>> Dave
>>>
>>>
>>> On Fri, Dec 12, 2014 at 8:09 PM, Mike Papper <mpapper@salesforce.com>
>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We run redis, resque and resque-scheduler on Heroku. Many of the
>>>> scheduled jobs do not get run. The /resque/failed web page shows the error
>>>> indicating that rails was never loaded in the worker process. Many of the
>>>> errors do not make it into the /resque/failed page listing.
>>>>
>>>> I also see in /resque/workers more workers than I have running in
>>>> Heroku. I cannot explian this. I kill the workers from the the rails
>>>> console, restart the Heroku workers and it all looks good but eventually
>>>> more workers appear in the /resque/workers page.
>>>>
>>>> I suspect that the mechanism that res que-scheduler uses to communicate
>>>> with waiting and ready worker processes is broken in some way.
>>>>
>>>> More details:
>>>> -I noticed there were 6 workers displayed in the /resque/workers web
>>>> page and so I used the Rails console to kill all the workers. Then I
>>>> restarted the workers in Heroku. That did n othing so I restarted the
>>>> resque-scheduler process in Heroku. Still no chnage however the
>>>> /resque/workers page shows the correct number of resque workers (that
>>>> number matching the number of heroku worker processes).
>>>> -about 12 hours later I noticed there appeared more workers in the
>>>> /resque/workers page. WTF?
>>>> -some jobs do get processed, and I see these in the completed totals
>>>> for each worker (i.e., all the workers show non-zero num_processed values).
>>>> -I do see the logs showing sometimes the workers get run.
>>>>
>>>> Is there an obvious issue here?
>>>> Can someone describe the keys/entries structure in Redis that are
>>>> relavent to resque-scheduler communicating with the resque workers?
>>>> I noted resque is no longer compatable with ruby 1.9.2 but I dont see
>>>> that as a probable cause since the problem is intermittent.
>>>> And I realize we use v. 1 of resque not v. 2.
>>>>
>>>> Errors and Logging info:
>>>> ------------------------
>>>>
>>>> For every job scheduled/queued by resque-scheduler I do see a log entry
>>>> showing its doing it and doing it at the correct time, for example:
>>>>
>>>> 2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>> 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
>>>> 2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>> 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
>>>> 2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>> 22:50:00 +0000: queueing PageSync (PageSync)
>>>>
>>>> Much of the time I never see the worker run. Sometimes I do (again,
>>>> from the logs):
>>>>
>>>> 2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG CitrixWorker.
>>>> update_webinars: starting at 2014-12-08 22:50:00 +0000
>>>>
>>>> Example error in /resque/failed:
>>>> Exception
>>>>     NameError
>>>> Error
>>>>     uninitialized constant WorkerProcess
>>>> => indicating Rails was never loaded (or is no longer loaded) in the
>>>> resque worker process.
>>>>
>>>> Environment and Versions:
>>>> -------------------------
>>>> Ruby 1.9.2
>>>> Rails 3.1.11
>>>>
>>>> Gems:
>>>> redis-3.0.6
>>>> resque-1.25.1
>>>> resque-scheduler-2. 3.1
>>>>
>>>> Procfile (used by Heroku):
>>>> web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
>>>> ./config/unicorn.rb
>>>> worker: bundle exec rake jobs:work
>>>> scheduler: bundle exec rake resque:scheduler
>>>>
>>>> Rake task for workers:
>>>> ----------------------
>>>> resque.rake:
>>>>
>>>> require 'resque/tasks'
>>>> require 'resque/server'
>>>> require 'resque_scheduler/tasks'
>>>> desc 'Called by the resque:work rake task to initialize each worker
>>>> process (I think)'
>>>> ta sk "resque:setup" => :environment do
>>>>   ENV['QUEUE'] = '*' # watch one or all queues (priority of these
>>>> queues is in alphabetical order when * is specified)
>>>>   ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time
>>>> (polling frequency) 5 gives same erroneous results as 1 second
>>>>   puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' : ENV['QUEUE']}
>>>> Resque queues".info
>>>> end
>>>>
>>>> desc "Alias for resque:work (To run workers on Heroku using standard
>>>> worker syntax, which is 'rake jobs:work')"
>>>> task "jobs:work" => & quot;resque:work"
>>>> ==============================
>>>>
>>>> --
>>>> Mike Papper
>>>>
>>>
>>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-19 @ 21:35
I have never seen resque workers consume and drop payloads, nor have I seen
workers running that weren't attached to a dyno Heroku reported on, so it
would seem odd if you have two dynos running workers that some other dynos
you can't see are still running other workers.  More likely something
bombed and left bad information in Redis.  I HAVE seen resque-scheduler
randomly not queue jobs according to the schedule, so perhaps that's it.
Definitely try VVERBOSE on your workers.

Dave

On Fri, Dec 19, 2014 at 2:41 PM, Mike Papper <mpapper@salesforce.com> wrote:
>
> My concern is that the worker jobs as-scheduled by resque are not running.
> The worker processes are running. But the code in the "perform" method is
> not.
>
> I suspect that since there are these extra workers the defunct ones are
> getting the scheduled jobs and just dropping them.
>
> Mike
>
> On Fri, Dec 19, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com> wrote:
>>
>> Is your concern that your jobs can't start up Rails or that you are
>> seeing non-existent workers in resque-web?
>>
>> If you aren't seeing jobs get picked up, you can set VVERBOSE=1 in your
>> Procfile for each worker.  It will generate a ton of output to the logs as
>> to what's going on.  That's usually helped me understand what's going on.
>>
>> Dave
>>
>>
>> On Thu, Dec 18, 2014 at 8:38 PM, Mike Papper <mp apper@salesforce.com
>> <mpapper@salesforce.com>> wrote:
>>>
>>> Hi, I appreciate your reply and taking some time for this issue...
>>>
>>> Here is the unmangled code for our rake task:
>>>
>>> require 'resque/tasks'
>>> require 'resque/server'
>>> require 'resque_scheduler/tasks'
>>>
>>> task "resque:setup" => :environment do
>>>   ENV['QUEUE'] = '*'
>>>   ENV['INTERVAL'] = '1.0'
>>>   puts "INFO: waiting on ALL Resque queues"
>>> end
>>> task "jobs:work" => "resque:work"
>>>
>>> Invoked via Heroku Procfile line:
>>>
>>> worker: bundle exec rake jobs:work
>>>
>>> ------------
>>> My understa nding was that some magic in resque:work calls resque:setup
>>> that I defined. And I do see some workers.
>>>
>>> Nonetheless, I updated my Procfile and removed my resque:setu p rake
>>> task, heres the Procfile (very close to what you suggested):
>>>
>>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9 bundle exec rake
>>> environment resque:work QUEUE=*
>>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle exec rake
>>> environment resque:scheduler
>>>
>>> Now I see the same results, there are 4 workers in resque, most of the
>>> jobs are NOT processed but ONE did get processed. And the scheduler is
>>> enqueueing jobs.
>>>
>>> Incidentally, I stopped the worker heroku process and 2 of the resque
>>> workers went away. When the heroku workers came back up, I saw a total of 4
>>> resque workers (2 of which were the same as those magic ones I saw before I
>>> killed heroku and that stayed around after I killed heroku).
>>>
>>> So, I killed the 2 resque workers (the two that shouldnt be there)
>>> using unregister_worker. I see 2 workers in /resque (the web page) and
>>> Resque.workers. I see jobs schedu led but nothing is getting run. And
>>> incidentally I dont see any failed jobs in /resque either.
>>>
>>> Im open to any suggestions.
>>>
>>> Mike
>>>
>>> --
>>> Mike
>>>
>>>
>>> On Thu, Dec 18, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com>
>>> wrote:
>>>>
>>>> Your code is somewhat mangled.  Can you re-post or post a gist?
>>>>
>>>> What we do is to put environment in the Procfile like so:
>>>>
>>>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake
>>>> environment jobs:work QUEUE=whatever
>>>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10  bundle exec rake
>>>> environment resque:scheduler
>>>>
>>>> I'm also not clear from your code on how resque:setup is ever invoked
>>>> as part of jobs:work.
>>>>
>>>> It might be easier to start with less customized abstractions and just
>>>> get things running by putting all the information you need in the
>>>> Procfile using the out-of-box tasks from resque.  Once that's working,
>>>> t hen wor ry about abstracting the invocations behind other rake tasks.
>>>>
>>>> As to the info in resque web—it's entirely based on information in
>>>> magic redis keys and has no bearing on what is actually happening.  It
>>>> assumes that all worker processes are populating magic keys with magic
>>>> information so the web UI reflects reality. If anything goes wrong, it gets
>>>> out of sync.
>>>>
>>>> We monitor some of the magic keys for stale values and clear them so
>>>> our info is good.
>>>>
>>>> There's no documentation about what these keys are, although the resque
>>>> code —as spaghetti as it is—is not that hard to slog through as there's not
>>>> much too it.
>>>>
>>>> Dave
>>>>
>>>>
>>>> On Fri, Dec 12, 2014 at 8:09 PM, Mike Papper <mpapper@salesforce.com>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> We run redis, resque and resque-scheduler on Heroku. Many of the
>>>>> scheduled jobs do not get run. The /resque/failed web page shows the error
>>>>> indicating that rails was never loaded in the worker process. Many of the
>>>>> errors do not make it into the /resque/failed page listing.
>>>>>
>>>>> I also see in /resque/workers more workers than I have running in
>>>>> Heroku. I cannot explian this. I kill the workers from the the rails
>>>>> console, restart the Heroku workers and it all looks good but eventually
>>>>> more workers appear in the /resque/workers page.
>>>>>
>>>>> I suspect that the mechanism that res que-scheduler uses to
>>>>> communicate with waiting and ready worker processes is broken in some way.
>>>>>
>>>>> More details:
>>>>> -I noticed there were 6 workers displayed in the /resque/workers web
>>>>> page and so I used the Rails console to kill all the workers. Then I
>>>>> restarted the workers in Heroku. That did n othing so I restarted the
>>>>> resque-scheduler process in Heroku. Still no chnage however the
>>>>> /resque/workers page shows the correct number of resque workers (that
>>>>> number matching the number of heroku worker processes).
>>>>> -about 12 hours later I noticed there appeared more workers in the
>>>>> /resque/workers page. WTF?
>>>>> -some jobs do get processed, and I see these in the completed totals
>>>>> for each worker (i.e., all the workers show non-zero num_processed values).
>>>>> -I do see the logs showing sometimes the workers get run.
>>>>>
>>>>> Is there an obvious issue here?
>>>>> Can someone describe the keys/entries structure in Redis that are
>>>>> relavent to resque-scheduler communicating with the resque workers?
>>>>> I noted resque is no longer compatable with ruby 1.9.2 but I dont see
>>>>> that as a probable cause since the problem is intermittent.
>>>>> And I realize we use v. 1 of resque not v. 2.
>>>>>
>>>>> Errors and Logging info:
>>>>> ------------------------
>>>>>
>>>>> For every job scheduled/queued by resque-scheduler I do see a log
>>>>> entry showing its doing it and doing it at the correct time, for example:
>>>>>
>>>>> 2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>> 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
>>>>> 2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>> 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
>>>>> 2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>> 22:50:00 +0000: queueing PageSync (PageSync)
>>>>>
>>>>> Much of the time I never see the worker run. Sometimes I do (again,
>>>>> from the logs):
>>>>>
>>>>> 2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG
>>>>> CitrixWorker. update_webinars: starting at 2014-12-08 22:50:00 +0000
>>>>>
>>>>> Example error in /resque/failed:
>>>>> Exception
>>>>>     NameError
>>>>> Error
>>>>>     uninitialized constant WorkerProcess
>>>>> => indicating Rails was never loaded (or is no longer loaded) in the
>>>>> resque worker process.
>>>>>
>>>>> Environment and Versions:
>>>>> -------------------------
>>>>> Ruby 1.9.2
>>>>> Rails 3.1.11
>>>>>
>>>>> Gems:
>>>>> redis-3.0.6
>>>>> resque-1.25.1
>>>>> resque-scheduler-2. 3.1
>>>>>
>>>>> Procfile (used by Heroku):
>>>>> web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
>>>>> ./config/unicorn.rb
>>>>> worker: bundle exec rake jobs:work
>>>>> scheduler: bundle exec rake resque:scheduler
>>>>>
>>>>> Rake task for workers:
>>>>> ----------------------
>>>>> resque.rake:
>>>>>
>>>>> require 'resque/tasks'
>>>>> require 'resque/server'
>>>>> require 'resque_scheduler/tasks'
>>>>> desc 'Called by the resque:work rake task to initialize each worker
>>>>> process (I think)'
>>>>> ta sk "resque:setup" => :environment do
>>>>>   ENV['QUEUE'] = '*' # watch one or all queues (priority of these
>>>>> queues is in alphabetical order when * is specified)
>>>>>   ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time
>>>>> (polling frequency) 5 gives same erroneous results as 1 second
>>>>>   puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' : ENV['QUEUE']}
>>>>> Resque queues".info
>>>>> end
>>>>>
>>>>> desc "Alias for resque:work (To run workers on Heroku using standard
>>>>> worker syntax, which is 'rake jobs:work')"
>>>>> task "jobs:work" => & quot;resque:work"
>>>>> ==============================
>>>>>
>>>>> --
>>>>> Mike Papper
>>>>>
>>>>
>>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>


-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-19 @ 23:59
I added the flags, heres some log file output:

2014-12-19T22:55:57.600456+00:00 app[worker.2]: ** [22:55:57 2014-12-19] 2:
Checking worker_queue
2014-12-19T22:55:57.604607+00:00 app[worker.2]: ** [22:55:57 2014-12-19] 2:
resque-1.25.1: Waiting for *
2014-12-19T22:55:58.661385+00:00 app[worker.1]: ** [22:55:58 2014-12-19] 2:
Sleeping for 5.0 seconds
2014-12-19T22:55:58.661466+00:00 app[worker.1]: ** [22:55:58 2014-12-19] 2:
resque-1.25.1: Waiting for *
2014-12-19T22:55:58.658081+00:00 app[worker.1]: ** [22:55:58 2014-12-19] 2:
Checking worker_queue

2014-12-19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
22:56:00 +0000: queueing WorkerProcess (WorkerProcess)

2014-12-19T22:56:02.643542+00:00 app[worker.2]: ** [22:56:02 2014-12-19] 2:
Sleeping for 5.0 seconds
2014-12-19T22:56:02.643644+00:00 app[worker.2]: ** [22:56:02 2014-12-19] 2:
resque-1.25.1: Waiting for *
2014-12-19T22:56:02.639433+00:00 app[worker.2]: ** [22:56:02 2014-12-19] 2:
Checking worker_queue
2014-12-19T22:56:03.731002+00:00 app[worker.1]: ** [22:56:03 2014-12-19] 2:
Sleeping for 5.0 seconds
2014-12-19T22:56:03.728197+00:00 app[worker.1]: ** [22:56:03 2014-12-19] 2:
Checking worker_queue
2014-12-19T22:56:03.731210+00:00 app[worker.1]: ** [22:56:03 2014-12-19] 2:
resque-1.25.1: Waiting for *

And I see many of these errors in the /resque/failed web page:
-------error-----------
01bb9954-48e8-4a73-a280-da2b924019e1:2 on worker_queue at 14 minutes ago
Retry or Remove
Class
WorkerProcess
Arguments
Exception
NameError
Error
uninitialized constant WorkerProcess
--------end error-------------

Code for the WorkerProcess is this:

module WorkerProcess
  @queue = :worker_queue

  def self.perform()
    begin
      $log.info("WorkerProcess.perform: invoked at #{Time.now}")
    end
end

And config/resuqe_scheduler.yml is this:

WorkerProcess: # Test workers
  cron: "*/1 * * * *"
  args:
  description: "Test workers"


And Resque.workers.length = 5

===========================================
I ran the following 16 times in the rails console:

Resque.enqueue(WorkerProcess)

From the logs I see this sequence just 2 times:

2014-12-19T23:46:38.632590+00:00 app[worker.2]: ** [23:46:38 2014-12-19] 2:
Found job on worker_queue
2014-12-19T23:46:38.632651+00:00 app[worker.2]: ** [23:46:38 2014-12-19] 2:
got: (Job{worker_queue} | WorkerProcess | [])
2014-12-19T23:46:38.634576+00:00 app[worker.2]: ** [23:46:38 2014-12-19] 2:
resque-1.25.1: Processing worker_queue since 1419032798 [WorkerProcess]
2014-12-19T23:46:38.634671+00:00 app[worker.2]: ** [23:46:38 2014-12-19] 2:
Running before_fork hooks with [(Job{worker_queue} | WorkerProcess | [])]
2014-12-19T23:46:38.679192+00:00 app[worker.2]: ** [23:46:38 2014-12-19] 2:
resque-1.25.1: Forked 74 at 1419032798
2014-12-19T23:46:38.699383+00:00 app[worker.2]: DFC-INFO
 WorkerProcess.perform: invoked at 2014-12-19 23:46:38 +0000
2014-12-19T23:46:38.706738+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
74: done: (Job{worker_queue} | WorkerProcess | [])
2014-12-19T23:46:38.630697+00:00 app[worker.2]: ** [23:46:38 2014-12-19] 2:
Checking worker_queue
2014-12-19T23:46:38.689395+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
74: Running after_fork hooks with [(Job{worker_queue} | WorkerProcess | [])]

Seems to me that the other 14 failed without any logging etc.

--
Mike

On Fri, Dec 19, 2014 at 1:35 PM, Dave Copeland <dave@stitchfix.com> wrote:
>
> I have never seen resque workers consume and drop payloads, nor have I
> seen workers running that weren't attached to a dyno Heroku reported on, so
> it would seem odd if you have two dynos running workers that some other
> dynos you can't see are still running other workers.  More likely something
> bombed and left bad information in Redis.  I HAVE seen resque-scheduler
> randomly not queue jobs according to the schedule, so perhaps that's it.
> Definitely try VVERBOSE on your workers.
>
> Dave
>
> On Fri, Dec 19, 2014 at 2:41 PM, Mike Papper <mpapper@salesforce.com>
> wrote:
>>
>> My concern is that the worker jobs as-scheduled by resque are not
>> running. The worker processes are running. But the code in the "perform"
>> method is not.
>>
>> I suspect that since there are these extra workers the defunct ones are
>> getting the scheduled jobs and just dropping them.
>>
>> Mike
>>
>> On Fri, Dec 19, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com>
>> wrote:
>>>
>>> Is your concern that your jobs can't start up Rails or that you are
>>> seeing non-existent workers in resque-web?
>>>
>>> If you aren't seeing jobs get picked up, you can set VVERBOSE=1 in your
>>> Procfile for each worker.  It will generate a ton of output to the logs as
>>> to what's going on.  That's usually helped me understand what's going on.
>>>
>>> Dave
>>>
>>>
>>> On Thu, Dec 18, 2014 at 8:38 PM, Mike Papper <mp apper@salesforce.com
>>> <mpapper@salesforce.com>> wrote:
>>>>
>>>> Hi, I appreciate your reply and taking some time for this issue...
>>>>
>>>> Here is the unmangled code for our rake task:
>>>>
>>>> require 'resque/tasks'
>>>> require 'resque/server'
>>>> require 'resque_scheduler/tasks'
>>>>
>>>> task "resque:setup" => :environment do
>>>>   ENV['QUEUE'] = '*'
>>>>   ENV['INTERVAL'] = '1.0'
>>>>   puts "INFO: waiting on ALL Resque queues"
>>>> end
>>>> task "jobs:work" => "resque:work"
>>>>
>>>> Invoked via Heroku Procfile line:
>>>>
>>>> worker: bundle exec rake jobs:work
>>>>
>>>> ------------
>>>> My understa nding was that some magic in resque:work calls resque:setup
>>>> that I defined. And I do see some workers.
>>>>
>>>> Nonetheless, I updated my Procfile and removed my resque:setu p rake
>>>> task, heres the Procfile (very close to what you suggested):
>>>>
>>>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9 bundle exec rake
>>>> environment resque:work QUEUE=*
>>>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle exec rake
>>>> environment resque:scheduler
>>>>
>>>> Now I see the same results, there are 4 workers in resque, most of the
>>>> jobs are NOT processed but ONE did get processed. And the scheduler is
>>>> enqueueing jobs.
>>>>
>>>> Incidentally, I stopped the worker heroku process and 2 of the resque
>>>> workers went away. When the heroku workers came back up, I saw a total of 4
>>>> resque workers (2 of which were the same as those magic ones I saw before I
>>>> killed heroku and that stayed around after I killed heroku).
>>>>
>>>> So, I killed the 2 resque workers (the two that shouldnt be there)
>>>> using unregister_worker. I see 2 workers in /resque (the web page) and
>>>> Resque.workers. I see jobs schedu led but nothing is getting run. And
>>>> incidentally I dont see any failed jobs in /resque either.
>>>>
>>>> Im open to any suggestions.
>>>>
>>>> Mike
>>>>
>>>> --
>>>> Mike
>>>>
>>>>
>>>> On Thu, Dec 18, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com>
>>>> wrote:
>>>>>
>>>>> Your code is somewhat mangled.  Can you re-post or post a gist?
>>>>>
>>>>> What we do is to put environment in the Procfile like so:
>>>>>
>>>>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake
>>>>> environment jobs:work QUEUE=whatever
>>>>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10  bundle exec rake
>>>>> environment resque:scheduler
>>>>>
>>>>> I'm also not clear from your code on how resque:setup is ever invoked
>>>>> as part of jobs:work.
>>>>>
>>>>> It might be easier to start with less customized abstractions and just
>>>>> get things running by putting all the information you need in the
>>>>> Procfile using the out-of-box tasks from resque.  Once that's wor
>>>>> king, t hen wor ry about abstracting the invocations behind other rake
>>>>> tasks.
>>>>>
>>>>> As to the info in resque web—it's entirely based on information in
>>>>> magic redis keys and has no bearing on what is actually happening.  It
>>>>> assumes that all worker processes are populating magic keys with magic
>>>>> information so the web UI reflects reality. If anything goes wrong, it gets
>>>>> out of sync.
>>>>>
>>>>> We monitor some of the magic keys for stale values and clear them so
>>>>> our info is good.
>>>>>
>>>>> There's no documentation about what these keys are, although the
>>>>> resque code —as spaghetti as it is—is not that hard to slog through as
>>>>> there's not much too it.
>>>>>
>>>>> Dave
>>>>>
>>>>>
>>>>> On Fri, Dec 12, 2014 at 8:09 PM, Mike Papper <mpapper@salesforce.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We run redis, resque and resque-scheduler on Heroku. Many of the
>>>>>> scheduled jobs do not get run. The /resque/failed web page shows the error
>>>>>> indicating that rails was never loaded in the worker process. Many of the
>>>>>> errors do not make it into the /resque/failed page listing.
>>>>>>
>>>>>> I also see in /resque/workers more workers than I have running in
>>>>>> Heroku. I cannot explian this. I kill the workers from the the rails
>>>>>> console, restart the Heroku workers and it all looks good but eventually
>>>>>> more workers appear in the /resque/workers page.
>>>>>>
>>>>>> I suspect that the mechanism that res que-scheduler uses to
>>>>>> communicate with waiting and ready worker processes is broken in some way.
>>>>>>
>>>>>> More details:
>>>>>> -I noticed there were 6 workers displayed in the /resque/workers web
>>>>>> page and so I used the Rails console to kill all the workers. Then I
>>>>>> restarted the workers in Heroku. That did n othing so I restarted the
>>>>>> resque-scheduler process in Heroku. Still no chnage however the
>>>>>> /resque/workers page shows the correct number of resque workers (that
>>>>>> number matching the number of heroku worker processes).
>>>>>> -about 12 hours later I noticed there appeared more workers in the
>>>>>> /resque/workers page. WTF?
>>>>>> -some jobs do get processed, and I see these in the completed totals
>>>>>> for each worker (i.e., all the workers show non-zero num_processed values).
>>>>>> -I do see the logs showing sometimes the workers get run.
>>>>>>
>>>>>> Is there an obvious issue here?
>>>>>> Can someone describe the keys/entries structure in Redis that are
>>>>>> relavent to resque-scheduler communicating with the resque workers?
>>>>>> I noted resque is no longer compatable with ruby 1.9.2 but I dont see
>>>>>> that as a probable cause since the problem is intermittent.
>>>>>> And I realize we use v. 1 of resque not v. 2.
>>>>>>
>>>>>> Errors and Logging info:
>>>>>> ------------------------
>>>>>>
>>>>>> For every job scheduled/queued by resque-scheduler I do see a log
>>>>>> entry showing its doing it and doing it at the correct time, for example:
>>>>>>
>>>>>> 2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>>> 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
>>>>>> 2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>>> 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
>>>>>> 2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>>> 22:50:00 +0000: queueing PageSync (PageSync)
>>>>>>
>>>>>> Much of the time I never see the worker run. Sometimes I do (again,
>>>>>> from the logs):
>>>>>>
>>>>>> 2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG
>>>>>> CitrixWorker. update_webinars: starting at 2014-12-08 22:50:00 +0000
>>>>>>
>>>>>> Example error in /resque/failed:
>>>>>> Exception
>>>>>>     NameError
>>>>>> Error
>>>>>>     uninitialized constant WorkerProcess
>>>>>> => indicating Rails was never loaded (or is no longer loaded) in the
>>>>>> resque worker process.
>>>>>>
>>>>>> Environment and Versions:
>>>>>> -------------------------
>>>>>> Ruby 1.9.2
>>>>>> Rails 3.1.11
>>>>>>
>>>>>> Gems:
>>>>>> redis-3.0.6
>>>>>> resque-1.25.1
>>>>>> resque-scheduler-2. 3.1
>>>>>>
>>>>>> Procfile (used by Heroku):
>>>>>> web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
>>>>>> ./config/unicorn.rb
>>>>>> worker: bundle exec rake jobs:work
>>>>>> scheduler: bundle exec rake resque:scheduler
>>>>>>
>>>>>> Rake task for workers:
>>>>>> ----------------------
>>>>>> resque.rake:
>>>>>>
>>>>>> require 'resque/tasks'
>>>>>> require 'resque/server'
>>>>>> require 'resque_scheduler/tasks'
>>>>>> desc 'Called by the resque:work rake task to initialize each worker
>>>>>> process (I think)'
>>>>>> ta sk "resque:setup" => :environment do
>>>>>>   ENV['QUEUE'] = '*' # watch one or all queues (priority of these
>>>>>> queues is in alphabetical order when * is specified)
>>>>>>   ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time
>>>>>> (polling frequency) 5 gives same erroneous results as 1 second
>>>>>>   puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' :
>>>>>> ENV['QUEUE']} Resque queues".info
>>>>>> end
>>>>>>
>>>>>> desc "Alias for resque:work (To run workers on Heroku using standard
>>>>>> worker syntax, which is 'rake jobs:work')"
>>>>>> task "jobs:work" => & quot;resque:work"
>>>>>> ==============================
>>>>>>
>>>>>> --
>>>>>> Mike Papper
>>>>>>
>>>>>
>>>>
>>>> --
>>>> *Dave Copeland, Director of Engineering*
>>>> Washington, DC (Eastern Time Zone)
>>>> dave@stitchfix.com
>>>>
>>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-20 @ 15:23
Is that code snippet correct?  It doesn't look it would work, as it's
missing an end.

Your two log snippets seem inconsistent to me.  IN the first case, no jobs
are processed, but somehow a bunch of jobs end up in the failed queue
because the resque worker cannot find your WorkerProcess class.  That would
be because Rails isn't loaded or your class isn't in a place where Rails
will autoload.

You've also used a module instead of a class.  Not sure if that matters,
but I've always used a class.

Finally, I'm not sure if you are expecting the jobs to be queued or the
scheduler to queue them.  Is the scheduler worker running?  What's the
procfile like?

Sorry I can't be more helpful, it's just hard to get a picture of what's
going on with small snippets of your code and configuration.

Since you are testing a lot of moving parts at once, it might be simplest
to take things slowly, step by step, starting with just manually queueing
jobs and seeing them in resque web, then have a worker process them, then
add in the scheduler, etc.

Dave

On Fri, Dec 19, 2014 at 6:59 PM, Mike Papper <mpapper@salesforce.com> wrote:

> I added the flags, heres some log file output:
>
> 2014-12-19T22:55:57.600456+00:00 app[worker.2]: ** [22:55:57 2014-12-19]
> 2: Checking worker_queue
> 2014-12-19T22:55:57.604607+00:00 app[worker.2]: ** [22:55:57 2014-12-19]
> 2: resque-1.25.1: Waiting for *
> 2014-12-19T22:55:58.661385+00:00 app[worker.1]: ** [22:55:58 2014-12-19]
> 2: Sleeping for 5.0 seconds
> 2014-12-19T22:55:58.661466+00:00 app[worker.1]: ** [22:55:58 2014-12-19]
> 2: resque-1.25.1: Waiting for *
> 2014-12-19T22:55:58.658081+00:00 app[worker.1]: ** [22:55:58 2014-12-19]
> 2: Checking worker_queue
>
> 2014-12-19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
> 22:56:00 +0000: queueing WorkerProcess (WorkerProcess)
>
> 2014-12-19T22:56:02.643542+00:00 app[worker.2]: ** [22:56:02 2014-12-19]
> 2: Sleeping for 5.0 seconds
> 2014-12-19T22:56:02.643644+00:00 app[worker.2]: ** [22:56:02 2014-12-19] 2
> : resque-1.25.1: Waiting for *
> 2014-12-19T22:56:02.639433+00:00 app[worker.2]: ** [22:56:02 2014-12-19]
> 2: Checking worker_queue
> 2014-12-19T22:56:03.731002+00:00 app[worker.1]: ** [22:56:03 2014-12-19]
> 2: Sleeping for 5.0 seconds
> 2014-12-19T22:56:03.728197+00:00 app[worker.1]: ** [22:56:03 2014-12-19]
> 2: Checking worker_queue
> 2014-12-19T22:56:03.731210+00:00 app[worker.1]: ** [22:56:03 2014-12-19]
> 2: resque-1.25.1: Waiting for *
>
> And I see many of these errors in the /resque/failed web page:
> -------error-----------
> 01bb9954-48e8-4a73-a280-da2b924019e1:2 on worker_queue at 14 minutes ago
> Retry or Remove
> Class
> WorkerProcess
> Arguments
> Exception
> NameError
> Error
> uninitialized constant WorkerProcess
> --------end error-------------
>
> Code for the WorkerProcess is this:
>
> mod ule WorkerProcess
>   @queue = :worker_queue
>
>   def self.perform()
>     begin
>       $log.info("WorkerProcess.perform: invoked at #{Time.now}")
>     end
> end
>
> And config/resuqe_scheduler.yml is this:
>
> WorkerProcess: # Test workers
>   cron: "*/1 * * * *"
>   args:
>   description: "Test workers"
>
>
> And Resque.workers.length = 5
>
> ===========================================
> I ran the following 16 times in the rails console:
>
> Resque.enqueue(WorkerProcess)
>
> From the logs I see this sequence just 2 times:
>
> 2014-12-19T23:46:38.632590+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
> 2: Found job on worker_queue
> 2014-12-19T23:46:38.632 651+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
> 2: got: (Job{worker_queue} | WorkerProcess | [])
> 2014-12-19T23:46:38.634576+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
> 2: resque-1.25.1: Processing worker_queue since 1419032798 [WorkerProcess]
> 2014-12-19T23:46:38.634671+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
> 2: Running before_fork hooks with [(Job{worker_queue} | WorkerProcess | [])]
> 2014-12-19T23:46:38.679192+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
> 2: resque-1.25.1: Forked 74 at 1419032798
> 2014-12-19T23:46:38.699383+00:00 app[worker.2]: DFC-INFO
>  WorkerProcess.perform: invoked at 2014-12-19 23:46:38 +0000
> 2014-12-19T23:46:38.706738+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
> 74: done: (Job{worker_queue} | WorkerProcess | [])
> 2014-12-19T23:46:38.630697+00:00 app[worker.2]: ** [23:46:38 2014-12-19]
> 2: Checking worker_queue
> 2014-12-19T23:46:38.689395+00:00 app[worker.2]: ** [23:46:38 2014-12-19] 7
> 4: Running after_fork hooks with [(Job{worker_queue} | WorkerProcess | [])]
>
> Seems to me that the other 14 failed without any logging etc.
>
> --
> Mike
>
> On Fri, Dec 19, 2014 at 1:35 PM, Dave Copeland <dave@stitchfix.com> wrote:
>>
>> I have never seen resque workers consume and drop payloads, nor have I
>> seen workers running that weren't attached to a dyno Heroku reported on, so
>> it would seem odd if you have two dynos running workers that some other
>> dynos you can't see are still running other workers.  More likely something
>> bombed and left bad information in Redis.  I HAVE seen resque-scheduler
>> randoml y not queue jobs according to the schedule, so perhaps that's it.
>> Definitely try VVERBOSE on your workers.
>>
>> Dave
>>
>> On Fri, Dec 19, 2014 at 2:41 PM, Mike Papper <mpapper@salesforce.com>
>> wrote:
>>
>>> My concern is that the worker jobs as-scheduled by resque are not
>>> running. The worker processes are running. But the code in the "perform"
>>> method is not.
>>>
>>> I suspect that since there are these extra workers the defunct ones are
>>> getting the scheduled jobs and just dropping them.
>>>
>>> Mike
>>>
>>> On Fri, Dec 19, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com>
>>> wrote:
>>>
>>>> Is your concern that your jobs can't start up Rails or that you are
>>>> seeing non-existent workers in resque-web?
>>>>
>>>> If you aren't seeing jobs get picked up, you can set VVERBOSE=1 in your
>>>> Procfile for each worker.  It will generate a ton of output to the logs as
>>>> to what's going on.  That's usually helped me understand what's going on.
>>>>
>>>> Dave
>>>>
>>>>
>>>> On Thu, Dec 18, 2014 at 8:38 PM, Mike Papper <mp apper@salesforce.com
>>>> <mpapper@salesforce.com>> wrote:
>>>>
>>>>> Hi, I appreciate your reply and taking some time for this issue...
>>>>>
>>>>> Here is the unmangled code for our rake task:
>>>>>
>>>>> require 'resque/tasks'
>>>>> require 'resque/server'
>>>>> require 'resque_scheduler/tasks'
>>>>>
>>>>> task "resque:setup" => :environment do
>>>>>   ENV['QUEUE'] = '*'
>>>>>   ENV['INTERVAL'] = '1.0'
>>>>>   puts "INFO: waiting on ALL Resque queues"
>>>>> end
>>>>> task "jobs:work" => "resque:work"
>>>>>
>>>>> Invoked via Heroku Procfile line:
>>>>>
>>>>> worker: bundle exec rake jobs:work
>>>>>
>>>>> ------------
>>>>> My understa nding was that some magic in resque:work calls
>>>>> resque:setup that I defined. And I do see some workers.
>>>>>
>>>>> Nonetheless, I updated my Procfile and removed my resque:setu p rake
>>>>> task, heres the Procfile (very close to what you suggested):
>>>>>
>>>>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9 bundle exec rake
>>>>> environment resque:work QUEUE=*
>>>>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle exec rake
>>>>> environment resque:scheduler
>>>>>
>>>>> Now I see the same results, there are 4 workers in resque, most of the
>>>>> jobs are NOT processed but ONE did get processed. And the scheduler is
>>>>> enqueueing jobs.
>>>>>
>>>>> Incidentally, I stopped the worker heroku process and 2 of the resque
>>>>> workers went away. When the heroku workers came back up, I saw a total of 4
>>>>> resque workers (2 of which were the same as those magic ones I saw before I
>>>>> killed heroku and that stayed around after I killed heroku).
>>>>>
>>>>> So, I killed the 2 resque workers (the two that shouldnt be there)
>>>>> using unregister_worker. I see 2 workers in /resque (the web page) and
>>>>> Resque.workers. I see jobs schedu led but nothing is getting run. And
>>>>> incidentally I dont see any failed jobs in /resque either.
>>>>>
>>>>> Im open to any suggestions.
>>>>>
>>>>> Mike
>>>>>
>>>>> --
>>>>> Mike
>>>>>
>>>>>
>>>>> On Thu, Dec 18, 2014 at 5:28 AM, Dave Copeland <dave@stitchfix.com>
>>>>> wrote:
>>>>>
>>>>>> Your code is somewhat mangled.  Can you re-post or post a gist?
>>>>>>
>>>>>> What we do is to put environment in the Procfile like so:
>>>>>>
>>>>>> worker: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10 bundle exec rake
>>>>>> environment jobs:work QUEUE=whatever
>>>>>> scheduler: env TERM_CHILD=1 RESQUE_TERM_TIMEOUT=10  bundle exec rake
>>>>>> environment resque:scheduler
>>>>>>
>>>>>> I'm also not clear from your code on how resque:setup is ever
>>>>>> invoked as part of jobs:work.
>>>>>>
>>>>>> It might be easier to start with less customized abstractions and
>>>>>> just get things running by putting all the information you need in the
>>>>>> Procfile using the out-of-box tasks from resque.  Once that's wor
>>>>>> king, t hen wor ry about abstracting the invocations behind other rake
>>>>>> tasks.
>>>>>>
>>>>>> As to the info in resque web—it's entirely based on information in
>>>>>> magic redis keys and has no bearing on what is actually happening.  It
>>>>>> assumes that all worker processes are populating magic keys with magic
>>>>>> information so the web UI reflects reality. If anything goes wrong, it gets
>>>>>> out of sync.
>>>>>>
>>>>>> We monitor some of the magic keys for stale values and clear them so
>>>>>> our info is good.
>>>>>>
>>>>>> There's no documentation about what these keys are, although the
>>>>>> resque code —as spaghetti as it is—is not that hard to slog through as
>>>>>> there's not much too it.
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 12, 2014 at 8:09 PM, Mike Papper <mpapper@salesforce.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We run redis, resque and resque-scheduler on Heroku. Many of the
>>>>>>> scheduled jobs do not get run. The /resque/failed web page shows the error
>>>>>>> indicating that rails was never loaded in the worker process. Many of the
>>>>>>> errors do not make it into the /resque/failed page listing.
>>>>>>>
>>>>>>> I also see in /resque/workers more workers than I have running in
>>>>>>> Heroku. I cannot explian this. I kill the workers from the the rails
>>>>>>> console, restart the Heroku workers and it all looks good but eventually
>>>>>>> more workers appear in the /resque/workers page.
>>>>>>>
>>>>>>> I suspect that the mechanism that res que-scheduler uses to
>>>>>>> communicate with waiting and ready worker processes is broken in some way.
>>>>>>>
>>>>>>> More details:
>>>>>>> -I noticed there were 6 workers displayed in the /resque/workers web
>>>>>>> page and so I used the Rails console to kill all the workers. Then I
>>>>>>> restarted the workers in Heroku. That did n othing so I restarted the
>>>>>>> resque-scheduler process in Heroku. Still no chnage however the
>>>>>>> /resque/workers page shows the correct number of resque workers (that
>>>>>>> number matching the number of heroku worker processes).
>>>>>>> -about 12 hours later I noticed there appeared more workers in the
>>>>>>> /resque/workers page. WTF?
>>>>>>> -some jobs do get processed, and I see these in the completed totals
>>>>>>> for each worker (i.e., all the workers show non-zero num_processed
values).
>>>>>>> -I do see the logs showing sometimes the workers get run.
>>>>>>>
>>>>>>> Is there an obvious issue here?
>>>>>>> Can someone describe the keys/entries structure in Redis that are
>>>>>>> relavent to resque-scheduler communicating with the resque workers?
>>>>>>> I noted resque is no longer compatable with ruby 1.9.2 but I dont
>>>>>>> see that as a probable cause since the problem is intermittent.
>>>>>>> And I realize we use v. 1 of resque not v. 2.
>>>>>>>
>>>>>>> Errors and Logging info:
>>>>>>> ------------------------
>>>>>>>
>>>>>>> For every job scheduled/queued by resque-scheduler I do see a log
>>>>>>> entry showing its doing it and doing it at the correct time, for example:
>>>>>>>
>>>>>>> 2014-12-08T22:50:00.120565+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>>>> 22:50:00 +0000: queueing CitrixWorker (CitrixWorker)
>>>>>>> 2014-12-08T22:50:00.127388+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>>>> 22:50:00 +0000: queueing InstagramFeed (InstagramFeed)
>>>>>>> 2014-12-08T22:50:00.124540+00:00 app[scheduler.1]: [INFO] 2014-12-08
>>>>>>> 22:50:00 +0000: queueing PageSync (PageSync)
>>>>>>>
>>>>>>> Much of the time I never see the worker run. Sometimes I do (again,
>>>>>>> from the logs):
>>>>>>>
>>>>>>> 2014-12-08T22:50:00.367083+00:00 app[worker.3]: DFC-DEBUG
>>>>>>> CitrixWorker. update_webinars: starting at 2014-12-08 22:50:00 +0000
>>>>>>>
>>>>>>> Example error in /resque/failed:
>>>>>>> Exception
>>>>>>>     NameError
>>>>>>> Error
>>>>>>>     uninitialized constant WorkerProcess
>>>>>>> => indicating Rails was never loaded (or is no longer loaded) in the
>>>>>>> resque worker process.
>>>>>>>
>>>>>>> Environment and Versions:
>>>>>>> -------------------------
>>>>>>> Ruby 1.9.2
>>>>>>> Rails 3.1.11
>>>>>>>
>>>>>>> Gems:
>>>>>>> redis-3.0.6
>>>>>>> resque-1.25.1
>>>>>>> resque-scheduler-2. 3.1
>>>>>>>
>>>>>>> Procfile (used by Heroku):
>>>>>>> web: bin/start-nginx bundle exec unicorn --env $RACK_ENV -c
>>>>>>> ./config/unicorn.rb
>>>>>>> worker: bundle exec rake jobs:work
>>>>>>> scheduler: bundle exec rake resque:scheduler
>>>>>>>
>>>>>>> Rake task for workers:
>>>>>>> ----------------------
>>>>>>> resque.rake:
>>>>>>>
>>>>>>> require 'resque/tasks'
>>>>>>> require 'resque/server'
>>>>>>> require 'resque_scheduler/tasks'
>>>>>>> desc 'Called by the resque:work rake task to initialize each worker
>>>>>>> process (I think)'
>>>>>>> ta sk "resque:setup" => :environment do
>>>>>>>   ENV['QUEUE'] = '*' # watch one or all queues (priority of these
>>>>>>> queues is in alphabetical order when * is specified)
>>>>>>>   ENV['INTERVAL'] = '1.0' # lower from default 5 second wait time
>>>>>>> (polling frequency) 5 gives same erroneous results as 1 second
>>>>>>>   puts "INFO: waiting on #{ENV['QUEUE'] == '*' ? 'ALL' :
>>>>>>> ENV['QUEUE']} Resque queues".info
>>>>>>> end
>>>>>>>
>>>>>>> desc "Alias for resque:work (To run workers on Heroku using standard
>>>>>>> worker syntax, which is 'rake jobs:work')"
>>>>>>> task "jobs:work" => & quot;resque:work"
>>>>>>> ==============================
>>>>>>>
>>>>>>> --
>>>>>>> Mike Papper
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> *Dave Copeland, Director of Engineering*
>>>>> Washington, DC (Eastern Time Zone)
>>>>> dave@stitchfix.com
>>>>>
>>>>
>>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>


-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-22 @ 08:03
Hi, please see inline...

But please note: this code does work, and has been working in production
but only very few jobs actually print any log output. What I see is this:

-if I go into the console and run the job manually (without using the
scheduler) by calling resque.enqueue... and do this 20 times in a row, then
Ill see the job run once or twice. This leads me to believe that the job is
mostly queued on one of the extra workers (those that should not be there)
and the queuing algorithm will choose one of the good workers when the
other workers are "in use" and thats why I have to enqueue 20 times to see
any output at all. Im going to test this idea by bringing it down to 1
worker.


On Sat, Dec 20, 2014 at 7:23 AM, Dave Copeland <dave@stitchfix.com> wrote:
>
> Is that code snippet correct?  It doesn't look it would work, as it's
> missing an end.
>
> The error is my cut-paste. My actual code is correct.


> Your two log snippets seem inconsistent to me.  IN the first case, no jobs
> are processed, but somehow a bunch of jobs end up in the failed queue
> because the resque worker cannot find your WorkerProcess class.  That would
> be because Rails isn't loaded or your class isn't in a place where Rails
> will autoload.
>
> My theory is that the defunct workers (the ones that shouldnt be there)
are getting these jobs assigned to them and they dont actually exist (as
actual rails processes) and so we get the (perhaps misleading) error.


> You've also used a module instead of a class.  Not sure if that matters,
> but I've always used a class.
>
> I have this type of code running 10s of workers in production and many
(but not all) of these do get run correctly. They are all modelled off the
same code which uses a module. FYI.


> Finally, I'm not sure if you are expecting the jobs to be queued or the
> scheduler to queue them.  Is the scheduler worker running?  What's the
> procfile like?
>
> The scheduler is running. I ultimately want the scheduler to enqueue the
jobs. You can see the scheduler's log output saying that it is enqueueing =>

2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
22:56:00 +0000: queueing WorkerProcess (WorkerProcess)



> Sorry I can't be more helpful, it's just hard to get a picture of what's
> going on with small snippets of your code and configuration.
>

Im glad for any help you can give.


>
> Since you are testing a lot of moving parts at once, it might be simplest
> to take things slowly, step by step, starting with just manually queueing
> jobs and seeing them in resque web, then have a worker process them, then
> add in the scheduler, etc.
>
> Im going to set it to 1 worker and manually enqueue.

Thanks, Mike

PS I removed the old content as the mailer daemon was complaining about the
message being too long.

> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-22 @ 13:54
FWIW, jobs don't get assigned to workers, per se, rather the workers get
forked when there are jobs available.  Resque's designed to prevent a
forked worker for eating up jobs like this because each forked worker will
exit when the job is done.

Resque is pretty aggressive about putting jobs in the failed queue if
things go wrong, catch Exception to be sure.

Of course, the forked workers can go stale, but the master/parent process—I
believe—will not fork a new worker in this case.

As a last-ditch effort—and I realize this really sucks—you can fork resque
and add even more logging yourself and deploy with that.  I've had to do
that before to diagnose issues.  Wasn't fun, but it worked.


See:

   -
   https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
   -
   https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
   -
   https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
   -
   https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190

Dave


On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com> wrote:

> Hi, please see inline...
>
> But please note: this code does work, and has been working in production
> but only very few jobs actually print any log output. What I see is this:
>
> -if I go into the console and run the job manually (without using the
> scheduler) by calling resque.enqueue... and do this 20 times in a row, then
> Ill see the job run once or twice. This leads me to believe that the job is
> mostly queued on one of the extra workers (those that should not be there)
> and the queuing algorithm will choose one of the good workers when the
> other workers are "in use" and thats why I have to enqueue 20 times to see
> any output at all. Im going to test this idea by bringing it down to 1
> worker.
>
>
> On Sat, Dec 20, 2014 at 7:23 AM, Dave Copeland <dave@stitchfix.com> wrote:
>>
>> Is that code snippet correct?  It doesn't look it would work, as it's
>> missing an end.
>>
>> The error is my cut-paste. My actual code is correct.
>
>
>> Your two log snippets seem inconsistent to me.  IN the first case, no
>> jobs are processed, but somehow a bunch of jobs end up in the failed queue
>> because the resque worker cannot find your WorkerProcess class.  That would
>> be because Rails isn't loaded or your class isn't in a place where Rails
>> will autoload.
>>
>> My theory is that the defunct workers (the ones that shouldnt be there)
> are getting these jobs assigned to them and they dont actually exist (as
> actual rails processes) and so we get the (perhaps misleading) error.
>
>
>> You've also used a module instead of a class.  Not sure if that matters,
>> but I've always used a class.
>>
>> I have this type of code running 10s of workers in production and many
> (but not all) of these do get run correctly. They are all modelled off the
> same code which uses a module. FYI.
>
>
>> Finally, I'm not sure if you are expecting the jobs to be queued or the
>> scheduler to queue them.  Is the scheduler worker running?  What's the
>> procfile like?
>>
>> The scheduler is running. I ultimately want the scheduler to enqueue the
> jobs. You can see the scheduler's log output saying that it is enqueueing =>
>
> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
> 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>
>
>
>> Sorry I can't be more helpful, it's just hard to get a picture of what's
>> going on with small snippets of your code and configuration.
>>
>
> Im glad for any help you can give.
>
>
>>
>> Since you are testing a lot of moving parts at once, it might be simplest
>> to take things slowly, step by step, starting with just manually queueing
>> jobs and seeing them in resque web, then have a worker process them, then
>> add in the scheduler, etc.
>>
>> Im going to set it to 1 worker and manually enqueue.
>
> Thanks, Mike
>
> PS I removed the old content as the mailer daemon was complaining about
> the message being too long.
>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com <dav+e@stitchfix.com>
>>
>


-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-29 @ 19:57
Hi, Ive made my environment more lean, I see this error each time I try to
run a job in the rails console:

Arguments

Exception NameError Error uninitialized constant WorkerEnqueued
<https://development.developer.salesforce.com/resque/failed#>
So, Ive reported this before, but I also see many jobs complete.
What should I look for in my rails worker code that would prevent the rails
environment from loading up correctly in the worker? I know I can "see"
that class in my rails console but evidently the worker cannot. Any ideas?
Mike

On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com> wrote:
>
> FWIW, jobs don't get assigned to workers, per se, rather the workers get
> forked when there are jobs available.  Resque's designed to prevent a
> forked worker for eating up jobs like this because each forked worker will
> exit when the job is done.
>
> Resque is pretty aggressive about putting jobs in the failed queue if
> things go wrong, catch Exception to be sure.
>
> Of course, the forked workers can go stale, but the master/parent
> process—I believe—will not fork a new worker in this case.
>
> As a last-ditch effort—and I realize this really sucks—you can fork resque
> and add even more logging yourself and deploy with that.  I've had to do
> that before to diagnose issues.  Wasn't fun, but it worked.
>
>
> See:
>
>    -
>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>    -
>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>    -
>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>    -
>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>
> Dave
>
>
> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com>
> wrote:
>
>> Hi, please see inline...
>>
>> But please note: this code does work, and has been working in production
>> but only very few jobs actually print any log output. What I see is this:
>>
>> -if I go into the console and run the job manually (without using the
>> scheduler) by calling resque.enqueue... and do this 20 times in a row, then
>> Ill see the job run once or twice. This leads me to believe that the job is
>> mostly queued on one of the extra workers (those that should not be there)
>> and the queuing algorithm will choose one of the good workers when the
>> other workers are "in use" and thats why I have to enqueue 20 times to see
>> any output at all. Im going to test this idea by bringing it down to 1
>> worker.
>>
>>
>> On Sat, Dec 20, 2014 at 7:23 AM, Dave Copeland <dave@stitchfix.com>
>>  wrote:
>>>
>>> Is that code snippet correct?  It doesn't look it would work, as it's
>>> missing an end.
>>>
>>> The error is my cut-paste. My actual code is correct.
>>
>>
>>> Your two log snippets seem inconsistent to me.  IN the first case, no
>>> jobs are processed, but somehow a bunch of jobs end up in the failed queue
>>> because the resque worker cannot find your WorkerProcess class.  That would
>>> be because Rails isn't loaded or your class isn't in a place where Rails
>>> will autoload.
>>>
>>> My theory is that the defunct workers (the ones that shouldnt be there)
>> are getting these jobs assigned to them and they dont actually exist (as
>> actual rails processes) and so we get the (perhaps misleading) error.
>>
>>
>>> You've also used a module instead of a class.  Not sure if that matters,
>>> but I've always used a class.
>>>
>>> I have this type of code running 10s of workers in production and many
>> (but not all) of these do get run correctly. They are all modelled off the
>> same code which uses a module. FYI.
>>
>>
>>> Finally, I'm not sure if you are expecting the jobs to be queued or the
>>> scheduler to queue them.  Is the scheduler worker running?  What's the
>>> procfile like?
>>>
>>> The scheduler is running. I ultimately want the scheduler to enqueue the
>> jobs. You can see the scheduler's log output saying that it is enqueueing =>
>>
>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
>> 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>
>>
>>
>>> Sorry I can't be more helpful, it's just hard to get a picture of what's
>>> going on with small snippets of your code and configuration.
>>>
>>
>> Im glad for any help you can give.
>>
>>
>>>
>>> Since you are testing a lot of moving parts at once, it might be
>>> simplest to take things slowly, step by step, starting with ju st manually
>>> queueing jobs and seeing them in resque web, then have a worker process
>>> them, then add in the scheduler, etc.
>>>
>>> Im going to set it to 1 worker and manually enqueue.
>>
>> Thanks, Mike
>>
>> PS I removed the old content as the mailer daemon was complaining about
>> the message being too long.
>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-29 @ 20:06
Is there any way you can post your entire setup somewhere?  It's really
hard to tell with just snippets of errors an dcode.

Dave

On Mon, Dec 29, 2014 at 2:57 PM, Mike Papper <mpapper@salesforce.com> wrote:

> Hi, Ive made my environment more lean, I see this error each time I try to
> run a job in the rails console:
>
> Arguments
>
> Exception NameError Error uninitialized constant WorkerEnqueued
> <https://development.developer.salesforce.com/resque/failed#>
> So, Ive reported this before, but I also see many jobs complete.
> What should I look for in my rails worker code that would prevent the
> rails environment from loading up correctly in the worker? I know I can
> "see" that class in my rails console but evidently the worker cannot. Any
> ideas?
> Mike
>
> On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com> wrote:
>>
>> FWIW, jobs do n't get assigned to workers, per se, rather the workers get
>> forked when there are jobs available.  Resque's designed to prevent a
>> forked worker for eating up jobs like this because each forked worker will
>> exit when the job is done.
>>
>> Resque is pretty aggressive about putting jobs in the failed queue if
>> things go wrong, catch Exception to be sure.
>>
>> Of course, the forked workers can go stale, but the master/parent
>> process—I believe—will not fork a new worker in this case.
>>
>> As a last-ditch effort—and I realize this really sucks—you can fork
>> resque and add even more logging yours elf and deploy with that.  I've had
>> to do that before to diagnose issues.  Wasn't fun, but it worked.
>>
>>
>> See:
>>
>>    -
>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>>    -
>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>>    -
>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>>    -
>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>>
>> Dave
>>
>>
>> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com>
>> wrote:
>>
>>> Hi, please see inline. ..
>>>
>>> But please note: this code does work, and has been working in production
>>> but only very few jobs actually print any log output. What I see is this:
>>>
>>> -if I go into the console and run the job manually (without using the
>>> scheduler) by calling resque.enqueue... and do this 20 times in a row, then
>>> Ill see the job run once or twice. This leads me to believe that the job is
>>> mostly queued on one of the extra workers (those that should not be there)
>>> and the queuing algorithm will choose one of the good workers when the
>>> other workers are "in use" and thats why I have to enqueue 20 times to see
>>> any output at all. Im going to test this idea by bringing it down to 1
>>> worker.
>>>
>>>
>>> On Sat, Dec 20, 2014 at 7:23 AM, Dave Copeland <dave @stitchfix.com>
>>>  wrote:
>>>>
>>>> Is that code snippet correct?  It doesn't look it would work, as it's
>>>> missing an end.
>>>>
>>>> The error is my cut-paste. My actual code is correct.
>>>
>>>
>>>> Your two log snippets seem inconsistent to me.  IN the first case, no
>>>> jobs are processed, but somehow a bunch of jobs end up in the failed queue
>>>> because the resque worker cannot find your WorkerProcess class.  That would
>>>> be because Rails isn't loaded or your class isn't in a place where Rails
>>>> will autoload.
>>>>
>>>> My theory is that the defunct workers (the ones that shouldnt be there)
>>> are getting these jobs assigned to them and they dont actually exist (as
>>> actual rails processes) and so we get the (perhaps misleading) error.
>>>
>>>
>>>> You've also used a module instead of a class.  Not sure if that
>>>> matters, but I've always used a class.
>>>>
>>>> I have this type of code running 10s of workers in production and many
>>> (but not all) of these do get run correctly. They are all modelled off the
>>> same code which uses a module. FYI.
>>>
>>>
>>>> Finally, I'm not sure if you are expecting the jobs to be queued or the
>>>> scheduler to queue them.  Is the scheduler worker running?  What's the
>>>> procfile like?
>>>>
>>>> The scheduler is running. I ultimately want the scheduler to enqueue
>>> the jobs. You can see the scheduler's log output saying that it is
>>> enqueueing =>
>>>
>>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
>>> 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>>
>>>
>>>
>>>> Sorry I can't be more helpful, it's just hard to get a picture of
>>>> what's going on with small snippets of your code and configuration.
>>>>
>>>
>>> Im glad for any help you can give.
>>>
>>>
>>>>
>>>> Since you are testing a lot of moving parts at once, it might be
>>>> simplest to take things slowly, step by step, starting with ju st manually
>>>> queueing jobs and seeing them in resque web, then have a worker process
>>>> them, then add in the scheduler, etc.
>>>>
>>>> Im going to set it to 1 worker and manually enqueue.
>>>
>>> Thanks, Mike
>>>
>>> PS I removed the old content as the mailer daemon was complaining about
>>> the message being too long.
>>>
>>>> --
>>>> *Dave Copeland, Director of Engineering*
>>>> Washington, DC (Eastern Time Zone)
>>>> dave@stitchfix.com
>>>>
>>>
>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>


-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-30 @ 02:20
I updated the gist and added one line to the "Procfile" gist item:

scheduler: env VVERBOSE=1 TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle
exec rake environment resque:scheduler


I have since commented out that one line from my Procfile and it fires
off the worker processes more often. What I do is to clear the workers
using "Resque.workers.each {|w| w.unregister_worker}" and then restart
the heroku worker process and wait for Resque.workers to show exactly
one item. Then enqueueing owrkers using "Resque.enqueue(Worker)" works
some of the time. In these cases I see this from the logs:


2014-12-30T02:19:12.800016+00:00 app[worker.1]: ** [02:19:12
2014-12-30] 2: Checking worker
2014-12-30T02:19:12.805709+00:00 app[worker.1]: ** [02:19:12
2014-12-30] 2: Running before_fork hooks with [(Job{worker} | Worker |
[])]
2014-12-30T02:19:12.967550+00:00 app[worker.1]: ** [02:19:12
2014-12-30] 46: Running after_fork hooks with [(Job{worker} | Worker |
[])]
2014-12-30T02:19:12.802076+00:00 app[worker.1]: ** [02:19:12
2014-12-30] 2: Found job on worker
2014-12-30T02:19:12.802144+00:00 app[worker.1]: ** [02:19:12
2014-12-30] 2: got: (Job{worker} | Worker | [])
2014-12-30T02:19:12.805619+00:00 app[worker.1]: ** [02:19:12
2014-12-30] 2: resque-1.25.2: Processing worker since 1419905952
[Worker]
2014-12-30T02:19:12.963728+00:00 app[worker.1]: ** [02:19:12
2014-12-30] 2: resque-1.25.2: Forked 46 at 1419905952
2014-12-30T02:19:13.045593+00:00 app[worker.1]: DFC-INFO
Worker.perform: invoked at 2014-12-30 02:19:13 +0000
2014-12-30T02:19:13.071967+00:00 app[worker.1]: ** [02:19:13
2014-12-30] 46: done: (Job{worker} | Worker | [])
2014-12-30T02:19:13.155321+00:00 app[worker.1]: ** [02:19:13
2014-12-30] 2: Sleeping for 5.0 seconds


--

Mike




On Mon, Dec 29, 2014 at 12:06 PM, Dave Copeland <dave@stitchfix.com> wrote:
>
> Is there any way you can post your entire setup somewhere?  It's really
> hard to tell with just snippets of errors an dcode.
>
> Dave
>
> On Mon, Dec 29, 2014 at 2:57 PM, Mike Papper <mpapper@salesforce.com>
> wrote:
>
>> Hi, Ive made my environment more lean, I see this error each time I try
>> to run a job in the rails console:
>>
>> Arguments
>>
>> Exception NameError Error uninitialized constant WorkerEnqueued
>> <https://development.developer.salesforce.com/resque/failed#>
>> So, Ive reported this before, but I also see many jobs complete.
>> What should I look for in my rails worker code that would prevent the
>> rails environment from loading up correctly in the worker? I know I can
>> "see" that class in my rails console but evidently the worker cannot. Any
>> ideas?
>> Mike
>>
>> On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com>
>> wrote:
>>>
>>> FWIW, jobs do n't get assigned to workers, per se, rather the workers
>>> get forked when there are jobs available.  Resque's designed to prevent a
>>> forked worker for eating up jobs like this because each forked worker will
>>> exit when the job is done.
>>>
>>> Resque is pretty aggressive about putting jobs in the failed queue if
>>> things go wrong, catch Exception to be sure.
>>>
>>> Of course, the forked workers can go stale, but the master/parent
>>> process—I believe—will not fork a new worker in this case.
>>>
>>> As a last-ditch effort—and I realize this really sucks—you can fork
>>> resque and add e ven more logging yours elf and deploy with that.  I've had
>>> to do that before to diagnose issues.  Wasn't fun, but it worked.
>>>
>>>
>>> See:
>>>
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>>>
>>> Dave
>>>
>>>
>>> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com>
>>> wrote:
>>>
>>>> Hi, please see inline. ..
>>>>
>>>> But please note: this code does work, and has been working in
>>>> production but only very few jobs actually print any log output. What I see
>>>> is this:
>>>>
>>>> -if I go into the console and run the job manually (without using the
>>>> scheduler) by calling resque.enqueue... and do this 20 times in a row, then
>>>> Ill see the job run once or twice. This leads me to believe that the job is
>>>> mostly queued on one of the extra workers (those that should not be there)
>>>> and the queuing algorithm will choose one of the good workers when the
>>>> other workers are "in use" and thats why I have to enqueue 20 times to see
>>>> any output at all. Im going to test this idea by bringing it down to 1
>>>> worker.
>>>>
>>>>
>>>> On Sat, Dec 20, 2014 at 7:23 AM, Dave Copeland < dave @stitchfix.com>
>>>>  wrote:
>>>>>
>>>>> Is that code snippet correct?  It doesn't look it would work, as it's
>>>>> missing an end.
>>>>>
>>>>> The error is my cut-paste. My actual code is correct.
>>>>
>>>>
>>>>> Your two log snippets seem inconsistent to me.  IN the first case, no
>>>>> jobs are processed, but somehow a bunch of jobs end up in the failed queue
>>>>> because the resque worker cannot find your WorkerProcess class.  That would
>>>>> be because Rails isn't loaded or your class isn't in a place where Rails
>>>>> will autoload.
>>>>>
>>>>> My theory is that the defunct workers (the ones that shouldnt be
>>>> there) are getting these jobs assigned to them and they dont actually exist
>>>> (as actual rails processes) and so we get the (perhaps misleading) error.
>>>>
>>>>
>>>>> You've also used a module instead of a class.  Not sure if that
>>>>> matters, but I've always used a class.
>>>>>
>>>>> I have this type of code running 10s of workers in production and many
>>>> (but not all) of these do get run correctly. They are all modelled off the
>>>> same code which uses a module. FYI.
>>>>
>>>>
>>>>> Finally, I'm not sure if you are expecting the jobs to be queued or
>>>>> the scheduler to queue them.  Is the scheduler worker running?  What's the
>>>>> procfile like?
>>>>>
>>>>> The scheduler is running. I ultimately want the scheduler to enqueue
>>>> the jobs. You can see the scheduler's log output saying that it is
>>>> enqueueing =>
>>>>
>>>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
>>>> 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>>>
>>>>
>>>>
>>>>> Sorry I can't be more helpful, it's just hard to get a picture of
>>>>> what's going on with small snippets of your code and configuration.
>>>>>
>>>>
>>>> Im glad for any help you can give.
>>>>
>>>>
>>>>>
>>>>> Since you are testing a lot of moving parts at once, it might be
>>>>> simplest to take things slowly, step by step, starting with ju st manually
>>>>> queueing jobs and seeing them in resque web, then have a worker process
>>>>> them, then add in the scheduler, etc.
>>>>>
>>>>> Im going to set it to 1 worker and manually enqueue.
>>>>
>>>> Thanks, Mike
>>>>
>>>> PS I removed the old content as the mailer daemon was complaining about
>>>> the message being too long.
>>>>
>>>>> --
>>>>> *Dave Copeland, Director of Engineering*
>>>>> Washington, DC (Eastern Time Zone)
>>>>> dave@stitchfix.com
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-30 @ 15:13
Three thing seems strange:

   - the check for the development environment
   - requiring the worker in the initializer
   - Using a custom namespace

You should need to do any of those things.

But if you are seeing things work some of the time, is there any chance
some other processes are connected to your Redis instance?

Dave

On Mon, Dec 29, 2014 at 9:20 PM, Mike Papper <mpapper@salesforce.com> wrote:

> I updated the gist and added one line to the "Procfile" gist item:
>
> scheduler: env VVERBOSE=1 TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle 
exec rake environment resque:scheduler
>
>
> I have since commented out that one line from my Procfile and it fires 
off the worker processes more often. What I do is to clear the workers 
using "Resque.workers.each {|w| w.unregister_worker}" and then restart the
heroku worker process and wait for Resque.workers to show exactly one 
item. Then enqueueing owrkers using "Resque.enqueue(Worker)" works some of
the time. In these cases I see this from the logs:
>
>
> 2014-12-30T02:19:12.800016+00:00 app[worker.1]: ** [02:19:12 2014-12-30]
2: Checking worker
> 2014-12-30T02:19:12.805709+00:00 app[worker.1]: ** [02:19:12 2014-12-30]
2: Running before_fork hooks with [(Job{worker} | Worker | [])]
> 2014-12-30T02:19:12.967550+00:00 app[worker.1]: ** [02:19:12 2014-12-30]
46: Running after_fork hooks with [(Job{worker} | Worker | [])]
> 2014-12-30T02:19:12.802076+00:00 app[worker.1]: ** [02:19:12 2014-12-30]
2: Found job on worker
> 2014-12-30T02:19:12.802144+00:00 app[worker.1]: ** [02:19:12 2014-12-30]
2: got: (Job{worker} | Worker | [])
> 2014-12-30T02:19:12.805619+00:00 app[worker.1]: ** [02:19:12 2014-12-30]
2: resque-1.25.2: Processing worker since 1419905952 [Worker]
> 2014-12-30T02:19:12.963728+00:00 app[worker.1]: ** [02:19:12 2014-12-30]
2: resque-1.25.2: Forked 46 at 1419905952
> 2014-12-30T02:19:13.045593+00:00 app[worker.1]: DFC-INFO  
Worker.perform: invoked at 2014-12-30 02:19:13 +0000
> 2014-12-30T02:19:13.071967+00:00 app[worker.1]: ** [02:19:13 2014-12-30]
46: done: (Job{worker} | Worker | [])
> 2014-12-30T02:19:13.155321+00:00 app[worker.1]: ** [02:19:13 2014-12-30]
2: Sleeping for 5.0 seconds
>
>
> --
>
> Mike
>
>
>
>
> On Mon, Dec 29, 2014 at 12:06 PM, Dave Copeland <dave@stitchfix.com>
> wrote:
>>
>> Is there any way you can post your entire setup somewhere?  It's really
>> hard to tell with just snippets of errors an dcode.
>>
>> Dave
>>
>> On Mon, Dec 29, 2014 at 2:57 PM, Mike Papper <mpapper@salesforce.com>
>> wrote:
>>
>>> Hi, Ive made my environment more lean, I see this error each time I try
>>> to run a job in the rails console:
>>>
>>> Arguments
>>>
>>> Exception NameError Error uninitialized constant WorkerEnqueued
>>> <https://development.developer.salesforce.com/resque/failed#>
>>> So, Ive reported this before, but I also see many jobs complete.
>>> What should I look for in my rails worker code that would prevent the
>>> rails environment from loading up correctly in the worker? I know I can
>>> "see" that class in my rails console but evidently the worker cannot. Any
>>> ideas?
>>> Mike
>>>
>>> On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com>
>>> wrote:
>>>>
>>>> FWIW, jobs do n't get assigned to workers, per se, rather the workers
>>>> get forked when there are jobs available.  Resque's designed to prevent a
>>>> forked worker for eating up jobs like this because each forked worker will
>>>> exit when the job is done.
>>>>
>>>> Resque is pretty aggressive about putting jobs in the failed queue if
>>>> things go wrong, catch Exception to be sure.
>>>>
>>>> Of course, the forked workers can go stale, but the master/parent
>>>> process—I believe—will not fork a new worker in this case.
>>>>
>>>> As a last-ditch effort—and I realize this really sucks—you can fork
>>>> resque and add e ven more logging yours elf and deploy with that.  I've had
>>>> to do that before to diagnose issues.  Wasn't fun, but it worked.
>>>>
>>>>
>>>> See:
>>>>
>>>>    -
>>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>>>>    -
>>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>>>>    -
>>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>>>>    -
>>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>>>>
>>>> Dave
>>>>
>>>>
>>>> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com>
>>>> wrote:
>>>>
>>>>> Hi, please see inli ne. ..
>>>>>
>>>>> But please note: this code does work, and has been working in
>>>>> production but only very few jobs actually print any log output. What I see
>>>>> is this:
>>>>>
>>>>> -if I go into the console and run the job manually (without using the
>>>>> scheduler) by calling resque.enqueue... and do this 20 times in a row, then
>>>>> Ill see the job run once or twice. This leads me to believe that the job is
>>>>> mostly queued on one of the extra workers (those that should not be there)
>>>>> and the queuing algorithm will choose one of the good workers when the
>>>>> other workers are "in use" and thats why I have to enqueue 20 times to see
>>>>> any output at all. Im going to test this idea by bringing it down to 1
>>>>> worker.
>>>>>
>>>>>
>>>>> On Sat, Dec 20, 2014 at 7:23 AM, Dave Copeland < dave @stitchfix.com>
>>>>>  wrote:
>>>>>>
>>>>>> Is that code snippet correct?  It doesn't look it would work, as it's
>>>>>> missing an end.
>>>>>>
>>>>>> The error is my cut-paste. My actual code is correct.
>>>>>
>>>>>
>>>>>> Your two log snippets seem inconsistent to me.  IN the first case, no
>>>>>> jobs are processed, but somehow a bunch of jobs end up in the failed queue
>>>>>> because the resque worker cannot find your WorkerProcess class.  That would
>>>>>> be because Rails isn't loaded or your class isn't in a place where Rails
>>>>>> will autoload.
>>>>>>
>>>>>> My theory is that the defunct workers (the ones that shouldnt be
>>>>> there) are getting these jobs assigned to them and they dont actually exist
>>>>> (as actual rails processes) and so we get the (perhaps misleading) error.
>>>>>
>>>>>
>>>>>> You've also used a module instead of a class.  Not sure if that
>>>>>> matters, but I've always used a class.
>>>>>>
>>>>>> I have this type of code running 10s of workers in production and
>>>>> many (but not all) of these do get run correctly. They are all modelled off
>>>>> the same code which uses a module. FYI.
>>>>>
>>>>>
>>>>>> Finally, I'm not sure if you are expecting the jobs to be queued or
>>>>>> the scheduler to queue them.  Is the scheduler worker running?  What's the
>>>>>> procfile like?
>>>>>>
>>>>>> The scheduler is running. I ultimately want the scheduler to enqueue
>>>>> the jobs. You can see the scheduler's log output saying that it is
>>>>> enqueueing =>
>>>>>
>>>>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
>>>>> 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>>>>
>>>>>
>>>>>
>>>>>> Sorry I can't be more helpful, it's just hard to get a picture of
>>>>>> what's going on with small snippets of your code and configuration.
>>>>>>
>>>>>
>>>>> Im glad for any help you can give.
>>>>>
>>>>>
>>>>>>
>>>>>> Since you are testing a lot of moving parts at once, it might be
>>>>>> simplest to take things slowly, step by step, starting with ju st manually
>>>>>> queueing jobs and seeing them in resque web, then have a worker process
>>>>>> them, then add in the scheduler, etc.
>>>>>>
>>>>>> Im going to set it to 1 worker and manually enqueue.
>>>>>
>>>>> Thanks, Mike
>>>>>
>>>>> PS I removed the old content as the mailer daemon was complaining
>>>>> about the message being too long.
>>>>>
>>>>>> --
>>>>>> *Dave Copeland, Director of Engineering*
>>>>>> Washington, DC (Eastern Time Zone)
>>>>>> dav e@stitchfix.com
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Dave Copeland, Director of Engineering*
>>>> Washington, DC (Eastern Time Zone)
>>>> dave@stitchfix.com
>>>>
>>>
>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>


-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-30 @ 21:22
Thanks for that input, with that I was able to track down the issue to the
use of the redis-namespace gem -> this is the call for this code:

REDIS = Redis::Namespace.new(namespace, :redis => r)
Resque.redis = REDIS


It looks like Resque doesnt work when using this namespace gem (as the gem
is supposed to pass-through get/set redis calls but it must not be working
as intended with Resque). I ended up giving Resque.redis a handle to the
un-namespaced REDIS object but still using the namespaced version in my
application's code.

I set the namespace in redis to keep our server data separate from other
data on the redis server.

I really appreciate your help - its especially useful just knowing someone
who has more knowledge tan me is there to help - which motivates me to find
the issue with my application.

--
Mike

On Tue, Dec 30, 2014 at 7:13 AM, Dave Copeland <dave@stitchfix.com> wrote:
>
> Three thing seems strange:
>
>    - the check for the development environment
>    - requiring the worker in the initializer
>    - Using a custom namespace
>
> You should need to do any of those things.
>
> But if you are seeing things work some of the time, is there any chance
> some other processes are connected to your Redis instance?
>
> Dave
>
> On Mon, Dec 29, 2014 at 9:20 PM, Mike Papper &l t;mpapper@salesforce.com>
> wrote:
>
>> I updated the gist and added one line to the "Procfile" gist item:
>>
>> scheduler: env VVERBOSE=1 TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle 
exec rake environment resque:scheduler
>>
>>
>> I have since commented out that one line from my Procfile and it fires 
off the worker processes more often. What I do is to clear the workers 
using "Resque.workers.each {|w| w.unregister_worker}" and then restart the
heroku worker process and wait for Resque.workers to show exactly one 
item. Then enqueueing owrkers using "Resque.enqueue(Worker)" works some of
the time. In these cases I see this from the logs:
>>
>>
>> 2014-12-30T02:19:12.800016+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: Checki
>>  ng worker
>> 2014-12-30T02:19:12.805709+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: Running before_fork hooks with [(Job{worker} | Worker | 
[])]
>> 2014-12-30T02:19:12.967550+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 46: Running after_fork hooks with [(Job{worker} | Worker | 
[])]
>> 2014-12-30T02:19:12.802076+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: Found job on worker
>> 2014-12-30T02:19:12.802144+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: got: (Job{worker} | Worker | [])
>> 2014-12-30T02:19:12.805619+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: resque-1.25.2: Processing worker since 1419905952 [Worker]
>> 2014-12-30T02:19:12.963728+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: resque-1.25.2: Forked 46 at 1419905952
>> 2014-12-30T02:19:13.045593+00:00 app[worker.1]: DFC-INFO  
Worker.perform: invoked at 2014-12-30 02:19:13 +0000
>> 2014-12-30T02:19:13.071967+00:00 app[worker.1]: ** [02:19:13 
2014-12-30] 46: done: (Job{worker} | Worker | [])
>> 2014-12-30T02:19:13.155321+00:00 app[worker.1]: ** [02:19:13 
2014-12-30] 2: Sleeping for 5.0 seconds
>>
>>
>> --
>>
>> Mike
>>
>>
>>
>>
>> On Mon, Dec 29, 2014 at 12:06 PM, Dave Copeland <dave@stitchfix.com>
>> wrote:
>>>
>>> Is there any way you can post your entire setup somewhere?  It's really
>>> hard to tell with just snippets of errors an dcode.
>>>
>>> Dave
>>>
>>> O n Mon, Dec 29, 2014 at 2:57 PM, Mike Papper <mpapper@salesforce.com>
>>> wrote:
>>>
>>>> Hi, Ive made my environment more lean, I see this error each time I try
>>>> to run a job in the rails console:
>>>>
>>>> Arguments
>>>>
>>>> Exception NameError Error uninitialized constant WorkerEnqueued
>>>> <https://development.developer.salesforce.com/resque/failed#>
>>>> So, Ive reported this before, but I also see many jobs complete.
>>>> What should I look for in my rails worker code that would prevent the
>>>> rails environment from loading up correctly in the worker? I know I can
>>>> "see" that class in my rails console but evidently the worker cannot. Any
>>>> ideas?
>>>> Mike
>>>>
>>>> On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com>
>>>> wrote:
>>>>>
>>>>> FWIW, jobs do n't get assigned to workers, per se, rather the workers
>>>>> get forked when there are jobs available.  Resque's designed to prevent a
>>>>> forked worker for eating up jobs like this because each forked worker will
>>>>> exit when the job is done.
>>>>>
>>>>> Resque is pretty aggressive about putting jobs in the failed queue if
>>>>> things go wrong, catch Exception to be sure.
>>>>>
>>>>> Of course, the forked workers can go stale, but the master/parent
>>>>> process—I believe—will not fork a new worker in this case.
>>>>>
>>>>> As a last-ditch effort—and I realize this really sucks—you can fork
>>>>> resque and add e ven more logging yours elf and deploy with that.  I've had
>>>>> to do that before to diagnose issues.  Wasn't fun, but it worked.
>>>>>
>>>>>
>>>>> See:
>>>>>
>>>>>    -
>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>>>>>    -
>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>>>>>    -
>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>>>>>    -
>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>>>>>
>>>>> Dave
>>>>>
>>>>>
>>>>> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, please see inli ne. ..
>>>>>>
>>>>>> But please note: this code does work, and has been working in
>>>>>> production but only very few jobs actually print any log output. What I see
>>>>>> is this:
>>>>>>
>>>>>> -if I go into the console and run the job manually (without using the
>>>>>> scheduler) by calling resque.enqueue... and do this 20 times in a row, then
>>>>>> Ill see the job run once or twice. This leads me to believe that the job is
>>>>>> mostly queued on one of the extra workers (those that should not be there)
>>>>>> and the queuing algorithm will choose one of the good workers when the
>>>>>> other workers are "in use" and thats why I have to enqueue 20 times to see
>>>>>> any output at all. Im going to test this idea by bringing it down to 1
>>>>>> worker.
>>>>>>
>>>>>>
>>>>>> On Sat, Dec 20 , 2014 at 7:23 AM, Dave Copeland < dave @stitchfix.com
>>>>>> > wrote:
>>>>>>>
>>>>>>> Is that code snippet correct?  It doesn't look it would work, as
>>>>>>> it's missing an end.
>>>>>>>
>>>>>>> The error is my cut-paste. My actual code is correct.
>>>>>>
>>>>>>
>>>>>>> Your two log snippets seem inconsistent to me.  IN the first case,
>>>>>>> no jobs are processed, but somehow a bunch of jobs end up in the failed
>>>>>>> queue because the resque worker cannot find your WorkerProcess 
class.  That
>>>>>>> would be because Rails isn't loaded or your class isn't in a place where
>>>>>>> Rails will autoload.
>>>>>>>
>>>>>>> My theory is that the defunct workers (the ones that shouldnt be
>>>>>> there) are getting these jobs assigned to them and they dont actually exist
>>>>>> (as actual rails processes) and so we get the (perhaps misleading) error.
>>>>>>
>>>>>>
>>>>>>> You've also used a module instead of a class.  Not sure if that
>>>>>>> matters, but I've always used a class.
>>>>>>>
>>>>>>> I have this type of code running 10s of workers in production and
>>>>>> many (but not all) of these do get run correctly. They are all modelled off
>>>>>> the same code which uses a module. FYI.
>>>>>>
>>>>>>
>>>>>>> Finally, I'm not sure if you are expecting the jobs to be queued or
>>>>>>> the scheduler to queue them.  Is the scheduler worker running?  What's the
>>>>>>> procfile like?
>>>>>>>
>>>>>>> The scheduler is running. I ultimately want the scheduler to enqueue
>>>>>> the jobs. You can see the scheduler's log output saying that it is
>>>>>> enqueueing =>
>>>>>>
>>>>>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
>>>>>> 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Sorry I can't be more helpful, it's just hard to get a picture of
>>>>>>> what's going on with small snippets of your code and configuration.
>>>>>>>
>>>>>>
>>>>>> Im glad for any help you can give.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Since you are testing a lot of moving parts at once, it might be
>>>>>>> simplest to take things slowly, step by step, starting with ju st manually
>>>>>>> queueing jobs and seeing them in resque web, then have a worker process
>>>>>>> them, then add in the scheduler, etc.
>>>>>>>
>>>>>>> Im going to set it to 1 worker and manually enqueue.
>>>>>>
>>>>>> Thanks, Mike
>>>>>>
>>>>>> PS I removed the old content as the mailer daemon was complaining
>>>>>> about the message being too long.
>>>>>>
>>>>>>> --
>>>>>>> *Dave Copeland, Director of Engineering*
>>>>>>> Washington, DC (Eastern Time Zone)
>>>>>>> dav e@stitchfix.com
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *Dave Copeland, Director of Engineering*
>>>>> Washington, DC (Eastern Time Zone)
>>>>> dave@stitchfix.com
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Dave Copeland
Date:
2014-12-30 @ 21:41
Yeah, Resque uses redis-namespace internally to put all of its data inside
the "resque" namespace.  I"m guessing if you put a namespaced redis inside
a redis-namespace, it doesn't work right?  Would be nice if it didn't
silently fail though :(

I will also add that it's probably not a good idea to use the same Redis
for resque and as a data store for other things.  You run the risk of
taking your application down if you fill up Redis.  For example, if you get
a sudden influx of jobs, you will no longer be able to store data. Or, the
converse, your data suddenly spikes in size and you can no longer queue
jobs or put them in the failed queue.  If using separate Redises, the
damage done is reduced in scope.

I know cost is always a concern, but it might be simpler if you have a
redis dedicated for Resque.  That's what we do, even so far as a dedicated
redis per resque per app.  Early on, we had a situation where a spike in
email jobs in our public website took down our warehouse management
application because Redis filled up.  Not good :)

Dave


On Tue, Dec 30, 2014 at 4:22 PM, Mike Papper <mpapper@salesforce.com> wrote:

> Thanks for that input, with that I was able to track down the issue to the
> use of the redis-namespace gem -> this is the call for this code:
>
> REDIS = Redis::Namespace.new(namespace, :redis => r)
> Resque.redis = REDIS
>
>
> It looks like Resque doesnt work when using this namespace gem (as the gem
> is supposed to pass-through get/set redis calls but it must not be working
> as intended with Resque). I ended up giving Resque.redis a handle to the
> un-namespaced REDIS object but still using the namespaced version in my
> application's code.
>
> I set the namespace in redis to keep our server data separate from other
> data on the redis server.
>
> I really appreciate your help - its especially useful just knowing someone
> who has more knowledge tan me is there to help - which motivates me to find
> the issue with my application.
>
> --
> Mike
>
> On Tue, Dec 30, 2014 at 7:13 AM, Dave Copeland <dave@stitchfix.com> wrote:
>>
>> Three thing seems strange:
>>
>>    - the check for the development environment
>>    - requiring the worker in the initializer
>>    - Using a custom namespace
>>
>> You should need to do any of those things.
>>
>> < /div>
>> But if you are seeing things work some of the time, is there any chance
>> some other processes are connected to your Redis instance?
>>
>> Dave
>>
>> On Mon, Dec 29, 2014 at 9:20 PM, Mike Papper &l t;mpapper@salesforce.com>
>> wrote:
>>
>>> I updated the gist and added one line to the "Procfile" gist item:
>>>
>>> scheduler: env VVERBOSE=1 TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle 
exec rake environment resque:scheduler
>>>
>>>
>>> I have since commented out that one line from my Procfile and it fires
off the worker processes more often. What I do is to clear the workers 
using "Resque.workers.each {|w| w.unregister_worker}" and then restart the
heroku worker process and wait for Resque.workers to show exactly one 
item. Then enqueueing owrkers using "Resque.enqueue(Worker)" works some of
the time. In these cases I see this from the logs:
>>>
>>>
>>> 2014-12-30T02:19:12.8
>>>  00016+00:00 app[worker.1]: ** [02:19:12 2014-12-30] 2: Checki
>>>  ng worker
>>> 2014-12-30T02:19:12.805709+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: Running before_fork hooks with [(Job{worker} | Worker | 
[])]
>>> 2014-12-30T02:19:12.967550+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 46: Running after_fork hooks with [(Job{worker} | Worker | 
[])]
>>> 2014-12-30T02:19:12.802076+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: Found job on worker
>>> 2014-12-30T02:19:12.802144+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: got: (Job{worker} | Worker | [])
>>> 2014-12-30T02:19:12.805619+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: resque-1.25.2: Processing worker since 1419905952 [Worker]
>>> 2014-12-30T02:19:12.963728+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: resque-1.25.2: Forked 46 at 1419905952
>>> 2014-12-30T02:19:13.045593+00:00 app[worker.1]: DFC-INFO  
Worker.perform: invoked at 2014-12-30 02:19:13 +0000
>>> 2014-12-30T02:19:13.071967+00:00 app[worker.1]: ** [02:19:13 
2014-12-30] 46: done: (Job{worker} | Worker | [])
>>> 2014-12-30T02:19:13.155321+00:00 app[worker.1]: ** [02:19:13 
2014-12-30] 2: Sleeping for 5.0 seconds
>>>
>>>
>>> --
>>>
>>> Mike
>>>
>>>
>>>
>>>
>>> On Mon, Dec 29, 2014 at 12:06 PM, Dave Copeland <dave@stitchfix.com>
>>> wrote:
>>>>
>>>> Is there any way you can post your entire setup somewhere?  It's really
>>>> hard to tell with just snippets of errors an dcode.
>>>>
>>>> Dave
>>>>
>>>> O n Mon, Dec 29, 2014 at 2:57 PM, Mike Papper <mpapper@salesforce.com>
>>>> wrote:
>>>>
>>>>> Hi, Ive made my environment more lean, I see this error each time I
>>>>> try to run a job in the rails console:
>>>>>
>>>>> Arguments
>>>>>
>>>>> Exception NameError Error uninitialized constant WorkerEnqueued
>>>>> <https://development.developer.salesforce.com/resque/failed#>
>>>>> So, Ive reported this before, but I also see many jobs complete.
>>>>> What should I look for in my rails worker code that would prevent the
>>>>> rails environment from loading up correctly in the worker? I know I can
>>>>> "see" that class in my rails console but evidently the worker cannot. Any
>>>>> ideas?
>>>>> Mike
>>>>>
>>>>> On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com>
>>>>> wrote:
>>>>>>
>>>>>> FWIW , jobs do n't get assigned to workers, per se, rather the
>>>>>> workers get forked when there are jobs available.  Resque's designed to
>>>>>> prevent a forked worker for eating up jobs like this because each forked
>>>>>> worker will exit when the job is done.
>>>>>>
>>>>>> Resque is pretty aggressive about putting jobs in the failed queue if
>>>>>> things go wrong, catch Exception to be sure.
>>>>>>
>>>>>> Of course, the forked workers can go stale, but the master/parent
>>>>>> process—I believe—will not fork a new worker in this case.
>>>>>>
>>>>>> As a last-ditch effort—and I realize this really sucks—you can fork
>>>>>> resque and add e ven more logging yours elf and deploy with that.  I've had
>>>>>> to do that before to diagnose issues.  Wasn't fun, but it worked.
>>>>>>
>>>>>>
>>>>>> See:
>>>>>>
>>>>>>    -
>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>>>>>>    -
>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>>>>>>    -
>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>>>>>>    -
>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>>>>>>
>>>>>> Dave
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi, please see inli ne. ..
>>>>>>>
>>>>>>> But please note: this code does work, and has been working in
>>>>>>> production but only very few jobs actually print any log output. 
What I see
>>>>>>> is this:
>>>>>>>
>>>>>>> -if I go into the console and run the job manually (without using
>>>>>>> the scheduler) by calling resque.enqueue... and do this 20 times in a row,
>>>>>>> then Ill see the job run once or twice. This leads me to believe that the
>>>>>>> job is mostly queued on one of the extra workers (those that should not be
>>>>>>> there) and the queuing algorithm will choose one of the good workers when
>>>>>>> the other workers are "in use" and thats why I have to enqueue 20 times to
>>>>>>> see any output at all. Im going to test this idea by bringing it down to 1
>>>>>>> worker.
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Dec 20 , 2014 at 7:23 AM, Dave Copeland < dave
>>>>>>> @stitchfix.com> wrote:
>>>>>>>>
>>>>>>>> Is that code snippet correct?  It doesn't look it would work, as
>>>>>>>> it's missing an end.
>>>>>>>>
>>>>>>>> The error is my cut-paste. My actual code is correct.
>>>>>>>
>>>>>>>
>>>>>>>> Your two log snippets seem inconsistent to me.  IN the first case,
>>>>>>>> no jobs are processed, but somehow a bunch of jobs end up in the failed
>>>>>>>> queue because the resque worker cannot find your WorkerProcess 
class.  That
>>>>>>>> would be because Rails isn't loaded or your class isn't in a place where
>>>>>>>> Rails will autoload.
>>>>>>>>
>>>>>>>> My theory is that the defunct workers (the ones that shouldnt be
>>>>>>> there) are getting these jobs assigned to them and they dont 
actually exist
>>>>>>> (as actual rails processes) and so we get the (perhaps misleading) error.
>>>>>>>
>>>>>>>
>>>>>>>> You've also used a module instead of a class.  Not sure if that
>>>>>>>> matters, but I've always used a class.
>>>>>>>>
>>>>>>>> I have this type of code running 10s of workers in production and
>>>>>>> many (but not all) of these do get run correctly. They are all 
modelled off
>>>>>>> the same code which uses a module. FYI.
>>>>>>>
>>>>>>>
>>>>>>>> Finally, I'm not sure if you are expecting the jobs to be queued or
>>>>>>>> the scheduler to queue them.  Is the scheduler worker running?  
What's the
>>>>>>>> procfile like?
>>>>>>>>
>>>>>>>> The scheduler is running. I ultimately want the scheduler to
>>>>>>> enqueue the jobs. You can see the scheduler's log output saying that it is
>>>>>>> enqueueing =>
>>>>>>>
>>>>>>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [I NFO]
>>>>>>> 2014-12-19 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Sorry I can't be more helpful, it's just hard to get a picture of
>>>>>>>> what's going on with small snippets of your code and configuration.
>>>>>>>>
>>>>>>>
>>>>>>> Im glad for any help you can give.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Since you are testing a lot of moving parts at once, it might be
>>>>>>>> simplest to take things slowly, step by step, starting with ju st
manually
>>>>>>>> queueing jobs and seeing them in resque web, then have a worker process
>>>>>>>> them, then add in the scheduler, etc.
>>>>>>>>
>>>>>>>> Im going to set it to 1 worker and manually enqueue.
>>>>>>>
>>>>>>> Thanks, Mike
>>>>>>>
>>>>>>> PS I removed the old content as the mailer daemon was complaining
>>>>>>> about the message being too long.
>>>>>>>
>>>>>>>> --
>>>>>>>> *Dave Copeland, Director of Engineering*
>>>>>>>> Washington, DC (Eastern Time Zone)
>>>>>>>>
>>>>>>> dav e@stitchfix.com
>>>>>>>
>>>>>>
>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>



-- 
*Dave Copeland, Director of Engineering*
Washington, DC (Eastern Time Zone)
dave@stitchfix.com

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-30 @ 21:50
Great idea to split the resque and other redis data into different servers
(Im doing something similar with memcache for different layers of caching -
session versus rails action cache versus rack page caching).

Thanks again,

Mike

On Tue, Dec 30, 2014 at 1:41 PM, Dave Copeland <dave@stitchfix.com> wrote:
>
> Yeah, Resque uses redis-namespace internally to put all of its data inside
> the "resque" namespace.  I"m guessing if you put a namespaced redis inside
> a redis-namespace, it doesn't work right?  Would be nice if it didn't
> silently fail though :(
>
> I will also add that it's probably not a good idea to use the same Redis
> for resque and as a data store for other things.  You run the risk of
> taking your application down if you fill up Redis.  For example, if you get
> a sudden influx of jobs, you will no longer be able to store data. Or, the
> converse, your data suddenly spikes in size and you can no longer queue
> jobs or put them in the failed queue.  If using separate Redises, the
> damage done is reduced in scope.
>
> I know cost is always a concern, but it might be simpler if you have a
> redis dedicated for Resque.  That's what we do, even so far as a dedicated
> redis per resque per app.  Early on, we had a situation where a spike in
> email jobs in our public website took down our warehouse management
> application because Redis filled up.  Not good :)
>
> Dave
>
>
> On Tue, Dec 30, 2014 at 4:22 PM, Mike Papper <mpapper@salesforce.com>
> wrote:
>
>> Thanks for that input, with that I was able to track down the issue to
>> the use of the redis-namespace gem -> this is the call for this code:
>>
>> REDIS = Redis::Namespace.new(namespace, :redis => r)
>> Resque.redis = REDIS
>>
>>
>> It looks like Resque doesnt work when using this namespace gem (as the
>> gem is supposed to pass-through get/set redis calls but it must not be
>> working as intended with Resque). I ended up giving Resque.redis a handle
>> to the un-namespaced REDIS object but still using the namespaced version in
>> my application's code.
>>
>> I set the namespace in redis to keep our server data separate from other
>> data on the redis server.
>>
>> I really appreciate your help - its especially useful just knowing
>> someone who has more knowledge tan me is there to help - which motivates me
>> to find the issue with my application.
>>
>> --
>> Mike
>>
>> On Tue, Dec 30, 2014 at 7:13 AM, Dave Copeland <dave@stitchfix.com>
>> wrote:
>>>
>>> Three thing seems strange:
>>>
>>>    - the check for the development environment
>>>    - requiring the worker in the initializer
>>>    - Using a custom namespace
>>>
>>> You should need to do any of those things.
>>> < font face="verdana, sans-serif">
>>> < /div>
>>> But if you are seeing things work some of the time, is there any chance
>>> some other processes are connected to your Redis instance?
>>>
>>> Dave
>>>
>>> On Mon, Dec 29, 2014 at 9:20 PM, Mike Papper &l t;mpapper@salesforce.com
>>> > wrote:
>>>
>>>> I updated the gist and added one line to the "Procfile" gist item:
>>>>
>>>> scheduler: env VVERBOSE=1 TERM_CHILD=1 RESQUE_TERM_TIMEOUT=9  bundle 
exec rake environment resque:scheduler
>>>>
>>>>
>>>> I have since commented out that one line from my Procfile and it 
fires off the worker processes more often. What I do is to clear the 
workers using "Resque.workers.each {|w| w.unregister_worker}" and then 
restart the heroku worker process and wait for Resque.workers to show 
exactly one item. Then enqueueing owrkers using "Resque.enqueue(Worker)" 
works some of the time. In these cases I see this from the logs:
>>>>
>>>>
>>>> 2014-12-30T02:19:12.8
>>>>  00016+00:00 app[worker.1]: ** [02:19:12 2014-12-30] 2: Checki
>>>>  ng worker
>>>> 2014-12-30T02:19:12.805709+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: Running before_fork hooks with [(Job{worker} | Worker | 
[])]
>>>> 2014-12-30T02:19:12.967550+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 46: Running after_fork hooks with [(Job{worker} | Worker | 
[])]
>>>> 2014-12-30T02:19:12.802076+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: Found job on worker
>>>> 2014-12-30T02:19:12.802144+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: got: (Job{worker} | Worker | [])
>>>> 2014-12-30T02:19:12.805619+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: resque-1.25.2: Processing worker since 1419905952 [Worker]
>>>> 2014-12-30T02:19:12.963728+00:00 app[worker.1]: ** [02:19:12 
2014-12-30] 2: resque-1.25.2: Forked 46 at 1419905952
>>>> 2014-12-30T02:19:13.045593+00:00 app[worker.1]: DFC-INFO  
Worker.perform: invoked at 2014-12-30 02:19:13 +0000
>>>> 2014-12-30T02:19:13.071967+00:00 app[worker.1]: ** [02:19:13 
2014-12-30] 46: done: (Job{worker} | Worker | [])
>>>> 2014-12-30T02:19:13.155321+00:00 app[worker.1]: ** [02:19:13 
2014-12-30] 2: Sleeping for 5.0 seconds
>>>>
>>>>
>>>> --
>>>>
>>>> Mike
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Dec 29, 2014 at 12:06 PM, Dave Copeland <dave@stitchfix.com>
>>>> wrote:
>>>>>
>>>>> Is there any way you can post your entire setup somewhere?  It's
>>>>> really hard to tell with just snippets of errors an dcode.
>>>>>
>>>>> Dave
>>>>>
>>>>> O n Mon, Dec 29, 2014 at 2:57 PM, Mike Papper <mpapper@salesforce.com>
>>>>> wrote:
>>>>>
>>>>>> Hi, Ive made my environment more lean, I see this error each time I
>>>>>> try to run a job in the rails console:
>>>>>>
>>>>>> Arguments
>>>>>>
>>>>>> Exception NameError Error uninitialized constant WorkerEnqueued
>>>>>> <https://development.developer.salesforce.com/resque/failed#>
>>>>>> So, Ive reported this before, but I also see many jobs complete.
>>>>>> What should I look for in my rails worker code that would prevent the
>>>>>> rails environment from loading up correctly in the worker? I know I can
>>>>>> "see" that class in my rails console but evidently the worker cannot. Any
>>>>>> ideas?
>>>>>> Mike
>>>>>>
>>>>>> On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> FWIW , jobs do n't get assigned to workers, per se, rather the
>>>>>>> workers get forked when there are jobs available.  Resque's designed to
>>>>>>> prevent a forked worker for eating up jobs like this because each forked
>>>>>>> worker will exit when the job is done.
>>>>>>>
>>>>>>> Resque is pretty aggressive about putting jobs in the failed queue
>>>>>>> if things go wrong, catch Exception to be sure.
>>>>>>>
>>>>>>> Of course, the forked workers can go stale, but the master/parent
>>>>>>> process—I believe—will not fork a new worker in this case.
>>>>>>>
>>>>>>> As a last-ditch effort—and I realize this really sucks—you can fork
>>>>>>> resque and add e ven more logging yours elf and deploy with that.
I've had
>>>>>>> to do that before to diagnose issues.  Wasn't fun, but it worked.
>>>>>>>
>>>>>>>
>>>>>>> See:
>>>>>>>
>>>>>>>    -
>>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>>>>>>>    -
>>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>>>>>>>    -
>>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>>>>>>>    -
>>>>>>>    
https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> Hi, please see inli ne. ..
>>>>>>>>
>>>>>>>> But please note: this code does work, and has been working in
>>>>>>>> production but only very few jobs actually print any log output. 
What I see
>>>>>>>> is this:
>>>>>>>>
>>>>>>>> -if I go into the console and run the job manually (without using
>>>>>>>> the scheduler) by calling resque.enqueue... and do this 20 times 
in a row,
>>>>>>>> then Ill see the job run once or twice. This leads me to believe that the
>>>>>>>> job is mostly queued on one of the extra workers (those that 
should not be
>>>>>>>> there) and the queuing algorithm will choose one of the good workers when
>>>>>>>> the other workers are "in use" and thats why I have to enqueue 20
times to
>>>>>>>> see any output at all. Im going to test this idea by bringing it 
down to 1
>>>>>>>> worker.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Dec 20 , 2014 at 7:23 AM, Dave Copeland < dave
>>>>>>>> @stitchfix.com> wrote:
>>>>>>>>>
>>>>>>>>> Is that code snippet correct?  It doesn't look it would work, as
>>>>>>>>> it's missing an end.
>>>>>>>>>
>>>>>>>>> The error is my cut-paste. My actual code is correct.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Your two log snippets seem inconsistent to me.  IN the first case,
>>>>>>>>> no jobs are processed, but somehow a bunch of jobs end up in the failed
>>>>>>>>> queue because the resque worker cannot find your WorkerProcess 
class.  That
>>>>>>>>> would be because Rails isn't loaded or your class isn't in a place where
>>>>>>>>> Rails will autoload.
>>>>>>>>>
>>>>>>>>> My theory is that the defunct workers (the ones that shouldnt be
>>>>>>>> there) are getting these jobs assigned to them and they dont 
actually exist
>>>>>>>> (as actual rails processes) and so we get the (perhaps misleading) error.
>>>>>>>>
>>>>>>>>
>>>>>>>>> You've also used a module instead of a class.  Not sure if that
>>>>>>>>> matters, but I've always used a class.
>>>>>>>>>
>>>>>>>>> I have this type of code running 10s of workers in production and
>>>>>>>> many (but not all) of these do get run correctly. They are all 
modelled off
>>>>>>>> the same code which uses a module. FYI.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Finally, I'm not sure if you are expecting the jobs to be queued
>>>>>>>>> or the scheduler to queue them.  Is the scheduler worker 
running?  What's
>>>>>>>>> the procfile like?
>>>>>>>>>
>>>>>>>>> The scheduler is running. I ultimately want the scheduler to
>>>>>>>> enqueue the jobs. You can see the scheduler's log output saying 
that it is
>>>>>>>> enqueueing =>
>>>>>>>>
>>>>>>>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [I NFO]
>>>>>>>> 2014-12-19 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> Sorry I can't be more helpful, it's just hard to get a picture of
>>>>>>>>> what's going on with small snippets of your code and configuration.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Im glad for any help you can give.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Since you are testing a lot of moving parts at once, it might be
>>>>>>>>> simplest to take things slowly, step by step, starting with ju 
st manually
>>>>>>>>> queueing jobs and seeing them in resque web, then have a worker process
>>>>>>>>> them, then add in the scheduler, etc.
>>>>>>>>>
>>>>>>>>> Im going to set it to 1 worker and manually enqueue.
>>>>>>>>
>>>>>>>> Thanks, Mike
>>>>>>>>
>>>>>>>> PS I removed the old content as the mailer daemon was complaining
>>>>>>>> about the message being too long.
>>>>>>>>
>>>>>>>>> --
>>>>>>>>> *Dave Copeland, Director of Engineering*
>>>>>>>>> Washington, DC (Eastern Ti me Zone)
>>>>>>>>>
>>>>>>>> dav e@stitchfix.com
>>>>>>>>
>>>>>>>
>>>
>>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>>
>>
>> --
>> *Dave Copeland, Director of Engineering*
>> Washington, DC (Eastern Time Zone)
>> dave@stitchfix.com
>>
>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>

Re: [resque] resque-scheduler cannot find resque workers (some of the time)

From:
Mike Papper
Date:
2014-12-30 @ 01:00
Hi, I pared down my code and put all the worker-related moving parts into a
gist here:

https://gist.github.com/mpapper/6f9c37cb10a8dc30d35f

Notes: I am testing by running a rails console that is hooked into the same
redis server. I make this call from the console:

Resque.enqueue(Worker)

If I call it once I see a corresponding error in the resque web page
(/resque/failed).

If I make that call in my console 10 times in a row very quickly I WILL see
a log message go by indicating the worker ran and I will also see some
failures in /resque/failed. All failures are like this:

Worker d0dee93e-f4d4-41da-83e1-28cd59fd7a3c:2

<https://development.developer.salesforce.com/resque/workers/d0dee93e-f4d4-41da-83e1-28cd59fd7a3c:2:*>
on *worker* at *just now
<https://development.developer.salesforce.com/resque/failed#>*
Retry <https://development.developer.salesforce.com/resque/failed/requeue/6>
or Remove
<https://development.developer.salesforce.com/resque/failed/remove/6>
 Class  Worker
<https://development.developer.salesforce.com/resque/failed/?class=Worker>
Arguments

Exception NameError Error uninitialized constant Worker
<https://development.developer.salesforce.com/resque/failed#>

On Mon, Dec 29, 2014 at 12:06 PM, Dave Copeland <dave@stitchfix.com> wrote:
>
> Is there any way you can post your entire setup somewhere?  It's really
> hard to tell with just snippets of errors an dcode.
>
> Dave
>
> On Mon, Dec 29, 2014 at 2:57 PM, Mike Papper <mpapper@salesforce.com>
> wrote:
>
>> Hi, Ive made my environment more lean, I see this error each time I try
>> to run a job in the rails console:
>>
>> Arguments
>>
>> Exception NameError Error uninitialized constant WorkerEnqueued
>> <https://development.developer.salesforce.com/resque/failed#>
>> So, Ive reported this before, but I also see many jobs complete.
>> What should I look for in my rails worker code that would prevent the
>> rails environment from loading up correctly in the worker? I know I can
>> "see" that class in my rails console but evidently the worker cannot. Any
>> ideas?
>> Mike
>>
>> On Mon, Dec 22, 2014 at 5:54 AM, Dave Copeland <dave@stitchfix.com>
>> wrote:
>>>
>>> FWIW, jobs do n't get assigned to workers, per se, rather the workers
>>> get forked when there are jobs available.  Resque's designed to prevent a
>>> forked worker for eating up jobs like this because each forked worker will
>>> exit when the job is done.
>>>
>>> Resque is pretty aggressive about putting jobs in the failed queue if
>>> things go wrong, catch Exception to be sure.
>>>
>>> Of course, the forked workers can go stale, but the master/parent
>>> process—I believe—will not fork a new worker in this case.
>>>
>>> As a last-ditch effort—and I realize this really sucks—you can fork
>>> resque and add e ven more logging yours elf and deploy with that.  I've had
>>> to do that before to diagnose issues.  Wasn't fun, but it worked.
>>>
>>>
>>> See:
>>>
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L261
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L174
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L178
>>>    -
>>>    https://github.com/resque/resque/blob/1-x-stable/lib/resque/worker.rb#L190
>>>
>>> Dave
>>>
>>>
>>> On Mon, Dec 22, 2014 at 3:03 AM, Mike Papper <mpapper@salesforce.com>
>>> wrote:
>>>
>>>> Hi, please see inline. ..
>>>>
>>>> But please note: this code does work, and has been working in
>>>> production but only very few jobs actually print any log output. What I see
>>>> is this:
>>>>
>>>> -if I go into the console and run the job manually (without using the
>>>> scheduler) by calling resque.enqueue... and do this 20 times in a row, then
>>>> Ill see the job run once or twice. This leads me to believe that the job is
>>>> mostly queued on one of the extra workers (those that should not be there)
>>>> and the queuing algorithm will choose one of the good workers when the
>>>> other workers are "in use" and thats why I have to enqueue 20 times to see
>>>> any output at all. Im going to test this idea by bringing it down to 1
>>>> worker.
>>>>
>>>>
>>>> On Sat, Dec 20, 2014 at 7:23 AM, Dave Copeland < dave @stitchfix.com>
>>>>  wrote:
>>>>>
>>>>> Is that code snippet correct?  It doesn't look it would work, as it's
>>>>> missing an end.
>>>>>
>>>>> The error is my cut-paste. My actual code is correct.
>>>>
>>>>
>>>>> Your two log snippets seem inconsistent to me.  IN the first case, no
>>>>> jobs are processed, but somehow a bunch of jobs end up in the failed queue
>>>>> because the resque worker cannot find your WorkerProcess class.  That would
>>>>> be because Rails isn't loaded or your class isn't in a place where Rails
>>>>> will autoload.
>>>>>
>>>>> My theory is that the defunct workers (the ones that shouldnt be
>>>> there) are getting these jobs assigned to them and they dont actually exist
>>>> (as actual rails processes) and so we get the (perhaps misleading) error.
>>>>
>>>>
>>>>> You've also used a module instead of a class.  Not sure if that
>>>>> matters, but I've always used a class.
>>>>>
>>>>> I have this type of code running 10s of workers in production and many
>>>> (but not all) of these do get run correctly. They are all modelled off the
>>>> same code which uses a module. FYI.
>>>>
>>>>
>>>>> Finally, I'm not sure if you are expecting the jobs to be queued or
>>>>> the scheduler to queue them.  Is the scheduler worker running?  What's the
>>>>> procfile like?
>>>>>
>>>>> The scheduler is running. I ultimately want the scheduler to enqueue
>>>> the jobs. You can see the scheduler's log output saying that it is
>>>> enqueueing =>
>>>>
>>>> 2014-12 -19T22:56:00.048070+00:00 app[scheduler.1]: [INFO] 2014-12-19
>>>> 22:56:00 +0000: queueing WorkerProcess (Worker Process)
>>>>
>>>>
>>>>
>>>>> Sorry I can't be more helpful, it's just hard to get a picture of
>>>>> what's going on with small snippets of your code and configuration.
>>>>>
>>>>
>>>> Im glad for any help you can give.
>>>>
>>>>
>>>>>
>>>>> Since you are testing a lot of moving parts at once, it might be
>>>>> simplest to take things slowly, step by step, starting with ju st manually
>>>>> queueing jobs and seeing them in resque web, then have a worker process
>>>>> them, then add in the scheduler, etc.
>>>>>
>>>>> Im going to set it to 1 worker and manually enqueue.
>>>>
>>>> Thanks, Mike
>>>>
>>>> PS I removed the old content as the mailer daemon was complaining about
>>>> the message being too long.
>>>>
>>>>> --
>>>>> *Dave Copeland, Director of Engineering*
>>>>> Washington, DC (Eastern Time Zone)
>>>>> dave@stitchfix.com
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Dave Copeland, Director of Engineering*
>>> Washington, DC (Eastern Time Zone)
>>> dave@stitchfix.com
>>>
>>
>
>
> --
> *Dave Copeland, Director of Engineering*
> Washington, DC (Eastern Time Zone)
> dave@stitchfix.com
>