Recently I contributed a retry functionality to the cache-s3 Haskell project. After I had shared a draft PR with Alexey Kuleshevich, the project's maintainer, he suggested some improvements. In this post shall describe a simplified version of the resulting upstream pull-request and explain how it works.
Retrying in case of failure
Retrying is a simple error recovery algorithm. Communicating software components regularly experience network failures. If the network outage is intermittent, retransmitting the data shortly after a failing transfer may succeed. Thus, reducing the apparent error rate of the subsystem.
cache-s3 is used by continuous integration (CI) systems for storing a compilation cache database in an S3 bucket. Typically, a successful build is followed by the upload of the cache database, so that it can be reused by the next build. If the database upload fails the CI system fails the whole build task. This is annoying when you waited a long time for your build, but now you have no build and nor a saved cache database. You have to start everything again...
Instead, if the upload to the bucket is retried a couple of times the intermittent network problem is completely hidden from the user.
Give me an integer
In cache-s3 a complex HTTP request is retried, in this post I'm using a simpler example. The action we will retry is a question: we ask the user for an integer:
question :: IO (Maybe Int) question = do putStr "Give me an integer: " readMaybe <$> getLine
This function returns
Nothing if the user enters anything but an integer.
GHCI> question Give me an integer: 5 Just 5 GHCI> question Give me an integer: FOO Nothing
If this question is a part of a longer quiz we let the user guess again before
we fail the whole program. We would like to write a function
retry with the
following type signature:
retry :: Int -> IO (Maybe a) -> IO (Maybe a)
This function takes two arguments: the number of times to retry and the action
to run. It returns a more perseverant action with the retry logic "baked-in".
In other words the
retry alters the action's runtime behavior, but it doesn't
change its type.
retry will operate like this:
GHCI> retry 3 question Give me an integer: text Retrying 1/3 Give me an integer: 5.5 Retrying 2/3 Give me an integer: 5 Just 5
In this example, after the second additional attempt the integer's was finally received.
Here's an implementation of
retry :: Int -- ^ number of times to retry -> IO (Maybe a) -- ^ action to retry -> IO (Maybe a) retry n action = action >>= go 1 -- ① where go i (Just res) = return (Just res) -- ② go i Nothing -- ③ if i > n then return Nothing -- ④ else do putStrLn $ "Retrying " <> show i <> "/" <> show n -- ⑤ res' <- action -- ⑥ go (i + 1) res' -- ⑦
This function uses a recursive inner function, called
go by convention.
- The provided action is executed and its result is passed to
- The action succeeded, just return its result
- The action failed, try again
- If limit is reached we return
- Otherwise inform the user about the retry in progress
- Re-execute the action
- Pass the result to the next iteration of
In real code we'd wait a bit before step ⑥ , but now I'm skipping this. You can load this function into an interactive session and try how it works. You should see an output similar to that in the previous section.
retry is polymorphic in the action's return type
a. In the
function's body we don't manipulate this value at all, therefore
a can stand
for any type.
Let's develop an automatic test for this function and convince ourselves and our colleagues that this function really does what it's supposed to do. The bad news is that in its current shape this function is hard to test because it's doing too much IO operations.
Next, we are going to make
retry more polymorphic and more testable.
Make it more polymorphic
The problem with the first implementation is the appearance of
IO. We need
to get rid of that. Notice that in the function's body we don't do abitrary
IO operation but really just calling
putStrLn as a logging function. The
appearance of the bind (
>>=) operator tells us that we exploit that IO is a
Inspired by these two observations we replace
IO with a generic
constructor with two constraints:
retry :: (Monad m, HasLogFunc m) => Int -> m (Maybe a) -> m (Maybe a)
Monad typeclass is part of the Prelude ("built-in"). We define the
HasLogFunc class ourselves:
class HasLogFunc m where logInfo :: String -> m () instance HasLogFunc IO where logInfo = putStrLn
The first two lines defines the
HasLogFunc as an interface where
must be implemented. The last two lines provide
IO with an instance of this
interface: the implementation of
logInfo is just
With the following modifications we obtain to a more generic form of
retry :: (Monad m, HasLogFunc m) -- ① => Int -- ^ number of times to retry -> m (Maybe a) -- ^ action to retry -- ② -> m (Maybe a) -- same code as before logInfo $ "Retrying " <> show i <> "/" <> show n -- ③ -- same code as before
- Two constraints restrict what
mcan be: it must be a monad and must have a log function
- Instead of
IO-only, the function operates on generic actions of
- We replace
logInfowhich is available in
This version works exactly the same as the first attempt because
IO is a
monad and we took care of its
More polymorphic, more testable
This new version of
retry is more testable because in our test code we are
free to use any
m (given it satisfies the constraints we imposed) to express
our assertions. To demonstrate this let's test if the right sequence of error
messages are displayed to the user. For example, if we had an action which
alwaysFails :: FakeAction (Maybe String) alwaysFails = return Nothing
We'd expect retries until the specified number of attempts is exhausted.
We want our tests to be pure therefore, instead of using
IO, we implement a
FakeAction type using the Writer monad:
type FakeAction = Writer [String]
This is a monad which aggregates the log messages of type
logInfo means in this context:
instance HasLogFunc FakeAction where logInfo msg = tell [msg]
Now we can express the always failing scenario:
describe "retry" $ do it "gives up after the specified time" $ runWriter (retry 3 alwaysFails) `shouldBe` ( Nothing , [ "Retrying 1/3" , "Retrying 2/3" , "Retrying 3/3" ])
runWriter returns the result of the provided action, in this case
paired up with the log messages which were produced during the evaluation.
FakeAction is pure, it doesn't do any IO, but it helps us to test
This testing strategy can be further extended to cover other types of effects.
In the complete code samples on GitHub you'll see how I implemented
exponential back-off: in production
retry waits some seconds before it
re-executes the action, however the test code is pure, running fast without any
I showed that we can make a function more testable by making it a more generic. Polymorphism allows us to inject test points into our function. This is often achieved by mocks and fakes in traditional programming languages. In Haskell exploiting polymorphism is particularly elegant because we can code against powerful, abstract interfaces such as the monad.
The code is available on GitHub.