librelist archives

Failures 2.0

From: Tony Arcieri
Date: 2013-04-04 @ 00:24
A lot of my work on Resque has been on the failure backend, specifically
because at LivingSocial we deal with an awful lot of failures (sometimes
millions). We have an in-house version of resque/resque-web that supports
features like drilling down on failures by the class of the failed job,
and retrying or deleting them by class as well. Unfortunately, the
original implementation performed rather poorly, which is why I wrote the
Resque::Failure::RedisMultiQueue backend in the first place.
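To make the drill-down-by-class idea concrete, here is a minimal Python sketch. It assumes an in-memory dict of lists stands in for Redis. `FailuresByClass` and its methods are hypothetical. They are not the in-house LivingSocial tool or the real `Resque::Failure::RedisMultiQueue` API.

```python
from collections import defaultdict


class FailuresByClass:
    """Hypothetical in-memory failure store that keeps one list per job class,
    so per-class counts, retries and deletes never scan every failure."""

    def __init__(self):
        self._by_class = defaultdict(list)  # job class name -> list of failures

    def record(self, job_class, args, error):
        self._by_class[job_class].append({"args": args, "error": error})

    def counts(self):
        # The drill-down view: how many failures each job class currently has
        return {cls: len(items) for cls, items in self._by_class.items() if items}

    def retry_class(self, job_class, enqueue):
        # Re-enqueue every failed job of this class via the caller's callback, then drop them
        items = self._by_class.pop(job_class, [])
        for item in items:
            enqueue(job_class, item["args"])
        return len(items)

    def delete_class(self, job_class):
        # Remove every failure of this class and report how many were dropped
        return len(self._by_class.pop(job_class, []))


failures = FailuresByClass()
failures.record("SendEmail", [1], "Timeout::Error")
failures.record("SendEmail", [2], "Timeout::Error")
failures.record("ChargeCard", [99], "Gateway down")

retried = []
failures.retry_class("SendEmail", lambda cls, args: retried.append((cls, args)))
print(failures.counts())  # {'ChargeCard': 1}
```

Keeping a separate list per class is what makes "retry all SendEmail failures" a single pop of one list rather than a filter over millions of entries.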

In doing so, I bolted more features onto an API that really needed
refactoring or rewriting, which is never good. That said, I've had some time to
reflect on the problems with the existing API, and I wrote up all of
my thoughts here:

https://github.com/resque/resque/wiki/Failures-2.0

Let me know what you think. Also, please CC me (bascule@gmail.com) on your
replies; for whatever reason, I've been having trouble getting replies via
the mailing list itself o_O

-- 
Tony Arcieri

Re: [resque] Failures 2.0

From: Jonathan Hyman
Date: 2013-04-04 @ 20:09
I think a lot of that sounds great. I'm a huge proponent of using objects
over hashes when crossing a class boundary (they give a stronger
interface guarantee, so you don't have to inspect a magic hash that
might or might not be the hash you're looking for).
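A small Python sketch of the objects-over-hashes point. It assumes a failure hash loosely modeled on the one Resque stores (queue, payload, exception, error, failed_at); the `Failure` class and its `from_hash` constructor are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Failure:
    """Hypothetical value object for one failed job. Its fields are explicit,
    so a misspelled attribute raises AttributeError instead of quietly
    yielding None the way dict.get would."""
    queue: str
    job_class: str
    args: tuple
    exception: str
    error: str
    failed_at: datetime

    @classmethod
    def from_hash(cls, h):
        # Validate the loosely structured hash once, at the class boundary;
        # a missing key raises KeyError here rather than deep in some caller.
        return cls(
            queue=h["queue"],
            job_class=h["payload"]["class"],
            args=tuple(h["payload"]["args"]),
            exception=h["exception"],
            error=h["error"],
            failed_at=datetime.fromisoformat(h["failed_at"]),
        )


raw = {
    "queue": "mail",
    "payload": {"class": "SendEmail", "args": [42]},
    "exception": "Timeout::Error",
    "error": "execution expired",
    "failed_at": "2013-04-04T00:24:00+00:00",
}
failure = Failure.from_hash(raw)
print(failure.job_class, failure.args)  # SendEmail (42,)
```

Callers past the boundary then depend on a fixed set of named fields, not on whatever keys happen to be present.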

Though, why do you need ActiveRecord as a failure backend?

> If one person views the failure page and deletes a failure, and someone
> else tries to delete a different failure, the second person will end up
> deleting the wrong failure, because its position in the failure queue has
> changed. THIS IS BAD!
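A toy Python reproduction of the race described in the quote. It assumes a plain list stands in for the shared failure list and that each user deletes by the index shown on their own, possibly stale, copy of the page; `remove_at` is a hypothetical helper.

```python
# Shared failure list, as both browser sessions see it when the page loads
failures = ["A", "B", "C", "D"]

# Both users render the failure page and see the same positions
page_seen_by_alice = list(failures)
page_seen_by_bob = list(failures)


def remove_at(store, position):
    """Delete by position, as a route taking a list index would."""
    return store.pop(position)


# Alice deletes "B" (position 1 on her page)
removed_by_alice = remove_at(failures, page_seen_by_alice.index("B"))
# Bob wants "C", which was at position 2 on his now-stale page
removed_by_bob = remove_at(failures, page_seen_by_bob.index("C"))

print(removed_by_alice, removed_by_bob, failures)  # B D ['A', 'C']
```

Bob asked for "C" but removed "D", because Alice's delete shifted every later position down by one.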

Isn't the cause here that a DELETE hits /failed/remove/x, where x is the
position in the list? If you gave every failure a unique ID, you would have
your primary key, and that should solve the problem. I'd be hesitant to
introduce ActiveRecord because it brings in more dependencies and other
complexity (Resque already requires one database; now you need two?
Is the data access/usage pattern for failures so different that you need
SQL?). I can see how SQL could be more attractive, because you could write
SELECT * FROM failures (WHERE queue=$1) LIMIT 10 OFFSET 10 instead of
handling multiple keys and/or Lua scripting, but I think giving up that
expressiveness in exchange for lower setup and production complexity is
worth it. Not everyone uses SQL nowadays.
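A minimal Python sketch of both suggestions in this paragraph. Every failure gets a unique ID that acts as its primary key, and per-queue lists of IDs make paging a slice of one list instead of a SQL query. `FailureStore`, its methods, and the use of `uuid4` IDs are assumptions for illustration, not Resque's API.

```python
import uuid
from collections import defaultdict


class FailureStore:
    """Hypothetical failure store: each failure has a unique ID (its primary key),
    and per-queue ID lists make a page a slice of a single list."""

    def __init__(self):
        self._records = {}                      # failure_id -> (queue, record)
        self._all_ids = []                      # every failure_id, oldest first
        self._ids_by_queue = defaultdict(list)  # queue -> failure_ids, oldest first

    def add(self, queue, record):
        failure_id = uuid.uuid4().hex
        self._records[failure_id] = (queue, record)
        self._all_ids.append(failure_id)
        self._ids_by_queue[queue].append(failure_id)
        return failure_id

    def remove(self, failure_id):
        # Deleting by ID only ever touches the failure the user actually saw;
        # an already-deleted ID is a no-op instead of hitting a neighbour.
        entry = self._records.pop(failure_id, None)
        if entry is None:
            return False
        queue, _ = entry
        self._all_ids.remove(failure_id)
        self._ids_by_queue[queue].remove(failure_id)
        return True

    def page(self, limit, offset, queue=None):
        # Roughly: SELECT * FROM failures (WHERE queue=$1) LIMIT limit OFFSET offset
        ids = self._all_ids if queue is None else self._ids_by_queue.get(queue, [])
        return [(fid, self._records[fid][1]) for fid in ids[offset:offset + limit]]


store = FailureStore()
ids = [store.add("mail" if i % 2 == 0 else "billing", f"job-{i}") for i in range(25)]

# Alice and Bob both loaded the page; each deletes a different failure by ID.
store.remove(ids[1])
store.remove(ids[2])
store.remove(ids[2])  # a stale second click is harmless

print([rec for _, rec in store.page(limit=10, offset=10, queue="mail")])  # ['job-22', 'job-24']
```

In Redis, the same layout could plausibly be a hash of records keyed by ID plus one list of IDs per queue, paged with `LRANGE key offset offset+limit-1`; that mapping is an assumption, not something proposed in the thread.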

