HMock: First Rate Mocks in Haskell

Published in ITNEXT, Jun 20, 2021

At the end of ZuriHac this year, I released a preview version of HMock, a new library for testing with mocks in Haskell. Let’s talk about what this is, why I wrote it, and how you can use it.

A Toy Chatbot

Let’s suppose I want to write a chatbot in Haskell. I might start with a few types, like so…

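import Control.Monad (unless, when)
import Control.Monad.Catch (Exception, MonadMask, finally)
import Data.Char (isLetter)

newtype User = User String deriving (Eq, Show)
data PermLevel = Guest | NormalUser | Admin deriving (Eq, Show)
newtype Room = Room String deriving (Eq, Show)
data BannedException = BannedException deriving (Show)

instance Exception BannedException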

Now I need to log in to the chat server, connect to the right room, read and send messages… what a mess! I could dig in and just start pounding out implementations in IO with sockets and TCP, but it’s nice to build an abstraction layer. I’ll do that with some MTL-style type classes. Here are a few things my bot needs to do.

class Monad m => MonadAuth m where
  login :: String -> String -> m ()
  logout :: m ()
  hasPermission :: PermLevel -> m Bool

class MonadAuth m => MonadChat m where
  joinRoom :: String -> m Room
  leaveRoom :: Room -> m ()
  sendChat :: Room -> String -> m ()
  pollChat :: Room -> m (User, String)
  ban :: Room -> User -> m ()

class Monad m => MonadBugReport m where
  reportBug :: String -> m ()

That’s much nicer! With the right primitives in mind, I’ll write the bot. I won’t explain the code, but here it is. Our bot listens to chat, bans any naughty users, and accepts and passes along bug reports.

type MonadChatBot m = (MonadChat m, MonadBugReport m)

chatbot :: (MonadMask m, MonadChatBot m) => String -> m ()
chatbot roomName = do
  login "HMockBot" "secretish"
  handleRoom roomName `finally` logout

handleRoom :: (MonadMask m, MonadChatBot m) => String -> m ()
handleRoom roomName = do
  room <- joinRoom roomName
  listenAndReply room `finally` leaveRoom room

listenAndReply :: MonadChatBot m => Room -> m ()
listenAndReply room = do
  (user, msg) <- pollChat room
  finished <- case words msg of
    ["!hello"] -> sendChat room "Nice to meet you." >> return False
    ["!leave"] -> return True
    ("!bug" : ws) -> reportBug (unwords ws) >> return False
    ws | any isFourLetterWord ws -> do
      banIfAdmin room user
      return False
    _ -> return False
  unless finished (listenAndReply room)

isFourLetterWord :: [Char] -> Bool
isFourLetterWord = (== 4) . length . filter isLetter

sendBugReport :: MonadChatBot m => Room -> String -> m ()
sendBugReport room bug = do
  reportBug bug
  sendChat room "Thanks for the bug report!"

banIfAdmin :: MonadChat m => Room -> User -> m ()
banIfAdmin room user = do
  isAdmin <- hasPermission Admin
  when isAdmin $ do
    ban room user
    sendChat room "Sorry for the disturbance!"

Great, all done! But wait… how do I know this wall of code actually works correctly? I can’t even try it out without an implementation for the three type classes.

Testing Effectful Code

Haskell leads the programming industry in some techniques for building highly reliable software. Haskellers routinely lean on the type system to verify properties of code, maintain stateless code and immutable data to eliminate many opportunities for bugs, exploit purely functional code to test quickly and easily, rely on QuickCheck for randomized property testing, and even use inspection testing to check properties of the code the compiler produces. Haskell programmers also move as much logic as possible out of effectful code and into purely functional code. (I didn’t really do that very well above…)

But effectful code is ultimately necessary, and that’s a side of the testing landscape where Haskell is a bit harder to use well. The chatbot above talks to lots of external systems: an authentication server, a chat server, a bug reporting system, and even other human beings! It’s hard to get these pieces (especially the humans) in place for automated testing, so we must find a way to test without them.

There are basically two options here:

Fakes are fake implementations of the API that try to act like a real system so you can test with them.
Mocks are dumb implementations that know nothing about the intended behavior of the system, but can be told how to behave by the test itself.

If you have access to a high quality fake implementation, then you should definitely test with it! Creating a quality fake, though, is a lot of work, and keeping it up-to-date with a rapidly changing API is more work still, so fakes are better suited to very stable interfaces. The type classes above are brand new interfaces which we expect to tweak heavily over time. Even with a fake, you may struggle to write tests verifying your behavior when things aren’t simple. What if the network goes down? What if the transaction fails to commit? It’s rare to cover these cases with fakes.
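To make the idea of a fake concrete, here is a minimal sketch of a hand-written fake for the bug-reporting interface alone. It is only an illustration, not part of HMock: the `fileAll` and `collectedReports` helpers are hypothetical names invented for this example, and the pair monad from base stands in for a proper Writer so that no extra packages are needed.

```haskell
{-# LANGUAGE FlexibleInstances #-}

-- The bug-reporting interface from the chatbot above.
class Monad m => MonadBugReport m where
  reportBug :: String -> m ()

-- A hand-written fake: instead of contacting a real bug tracker, it
-- collects reports in a list. The pair monad ((,) [String]) from base
-- behaves like a tiny Writer, appending each report in call order.
instance MonadBugReport ((,) [String]) where
  reportBug bug = ([bug], ())

-- Hypothetical code under test: files one bug report per input string.
fileAll :: MonadBugReport m => [String] -> m ()
fileAll = mapM_ reportBug

-- Run the code under test against the fake and return what it collected.
collectedReports :: [String] -> [String]
collectedReports bugs = fst (fileAll bugs)
```

Even this tiny fake only covers the happy path. Making it fail on demand, say to simulate an unreachable bug tracker, would take more code, which is the kind of case the paragraph above says fakes rarely cover.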

So the other answer on our plate is mocks. That’s where we’ll look next.

All About Mocks

A mock is an object that just does what you tell it to do. Rather than having any internal logic at all, it simply follows instructions, matching the calls from the code you’re testing to lines in a script, and responding as it’s told to respond.

Here’s how you can write a simple test with HMock.

makeMockable ''MonadAuth
makeMockable ''MonadChat
makeMockable ''MonadBugReport

runMockT $ do
  expect $ Login "HMockBot" "secretish"
  expect $ JoinRoom "#haskell" |-> Room "#haskell"
  expect $ PollChat (Room "#haskell")
    |-> (User "Alice", "!hello")
    |-> (User "Bob", "!leave")
  expect $ SendChat (Room "#haskell") "Nice to meet you."
  expect $ LeaveRoom (Room "#haskell")
  expect $ Logout

  chatbot "#haskell"

We can dissect the parts.

The first three lines are Template Haskell, generating a bit of boilerplate that makes these interfaces mockable.
runMockT runs the MockT monad transformer, which manages all the expected behaviors in the test.
The next block sets up expectations. Here we expect that several things will happen: the bot will log in, join a room, poll for chat messages twice, send a message, leave the room, and log out. When a method has an interesting return value, we use |-> to say what it should be.
Finally, we call the block of real code that we’re testing. Instead of an actual implementation, the implementation will use the mocks for the interfaces it uses. The right instances were generated by makeMockable.
As the code under test performs actions, MockT will look them up in the set of expectations, and respond as asked. When runMockT finishes, it will check that all of the expectations have been met.
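The mechanism described above (match each call against a script, answer as told, then check that nothing is left over) can be sketched by hand in a few lines. This is a deliberately naive illustration and not how HMock is implemented: calls are rendered as plain strings, the script is strictly ordered, and the names `Script`, `mockCall`, and `runScripted` are made up for this example.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- One scripted step per entry: the call we expect (rendered as text)
-- paired with the canned response to hand back.
type Script = [(String, String)]

-- Take the next scripted step, check that it matches the actual call,
-- and return its canned response. Any mismatch fails immediately.
mockCall :: IORef Script -> String -> IO String
mockCall ref actual = do
  script <- readIORef ref
  case script of
    (expected, response) : rest
      | expected == actual -> writeIORef ref rest >> return response
      | otherwise ->
          error ("unexpected call: " ++ actual ++ ", expected: " ++ expected)
    [] -> error ("unexpected call: " ++ actual)

-- Run an action against a script, then fail if expectations are left over.
runScripted :: Script -> (IORef Script -> IO a) -> IO a
runScripted script body = do
  ref <- newIORef script
  result <- body ref
  leftover <- readIORef ref
  if null leftover
    then return result
    else error ("unmet expectations: " ++ show (map fst leftover))
```

A script this rigid asserts the exact order of every call, which tends to make tests brittle. A real mocking framework needs a richer way to describe expectations.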

What can go wrong with mocks

Haskell has a few answers to the question of testing with mocks. Alexis King wrote a library called monad-mock, which I studied when I began this work. I also learned this weekend that Akshay Mankar wrote polysemy-mocks to work with that effect system. But I built something different.

The reason is that I think a weak mocking framework is a dangerous tool. Testing with mocks can go wrong, in several ways.

It’s easy to assert too much. Suppose your code fetches two files over a network. Do you care which order they were fetched in? In fact, do you even care how many times they were fetched? Maybe so, but do you really want every single test in your test suite to fail just because you fetched the same file twice? Using mocks well requires a language that lets you say enough about the files (like: “if this file is fetched, this is what the response will be”), but not so much that you end up with brittle tests that break when innocent changes are made to the code.

It’s easy to mock too much. You need to mock the things you can’t have available to the test: external systems, users, and so on. It’s pretty common to set up a fancy test with mocks, only to realize you’ve told the mock to do precisely what you’re supposed to be testing for. Using mocks well sometimes requires precise control over what does get stubbed out, and what doesn’t.

It’s easy to break composability. Commonly used mock systems in mainstream languages have a reputation for being error-prone. You might add an expectation hoping for it to trigger at one time, only to have it trigger unexpectedly at the wrong time, injecting the wrong behavior into your test. Havoc ensues! To test well, you want a nicely composable language for describing behavior. Mock frameworks often rely on global state, overall counts, etc. that leak details from one part of your test to another.

HMock tries to fix many of these problems.

Like best-in-class frameworks for mainstream programming languages, HMock gives you a powerful and flexible language for setting up expectations. You’re in charge of: whether actions need to be performed in order or not, whether there are restrictions on how many times they happen, whether the parameters are checked exactly, ignored entirely, or checked against predicates.

HMock also gives you control over how much of the API is mocked, and how smart those mocks are. It often pays to use HMock even if it’s just to set up a fake, both because delegating methods with HMock doesn’t require defining new monad types or transformers the way a new MonadChat implementation would, and because it’s easy to inject failures and test other hard cases not handled by the fakes.

Finally, and quite unlike most mocks in mainstream languages, HMock is based on a fundamentally composable language of primitives, as described in the paper An Expressive Semantics of Mocking by Svenningsson, et al. In fact, it implements an even more expressive set of primitives, by adding interleaved repetition to the set from the paper. Operations like choice and repeated sequences let you say more precisely when each operation should happen, so that expectations aren’t left around that might match accidentally at the wrong times. You can easily say things like: “it doesn’t matter which order you check your mirrors and fasten your seatbelt, but don’t start the car until you’ve done both,” or “either a 1040 form or a 1040EZ form must be filed, but not both”.
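To give a feel for what a composable language of expectations means, here is a small self-contained sketch. It is not HMock’s API and does not reproduce the semantics from the paper. The `Expect` type, its constructors, and the `accepts` function are invented for illustration, and calls are modeled as plain strings.

```haskell
-- A tiny expectation language: single steps, sequencing, interleaving
-- ("both, in any order"), and exclusive choice.
data Expect
  = Done
  | Step String
  | Seq Expect Expect     -- the left side first, then the right side
  | Par Expect Expect     -- both sides, interleaved in any order
  | Choice Expect Expect  -- exactly one of the two sides

-- Can the expectation be satisfied with no further calls?
finished :: Expect -> Bool
finished Done = True
finished (Step _) = False
finished (Seq a b) = finished a && finished b
finished (Par a b) = finished a && finished b
finished (Choice a b) = finished a || finished b

-- Every way the expectation can consume one call, as the expectations
-- that remain afterwards. An empty list means the call is unexpected.
advance :: String -> Expect -> [Expect]
advance _ Done = []
advance c (Step s) = [Done | c == s]
advance c (Seq a b) =
  [Seq a' b | a' <- advance c a] ++ (if finished a then advance c b else [])
advance c (Par a b) =
  [Par a' b | a' <- advance c a] ++ [Par a b' | b' <- advance c b]
advance c (Choice a b) = advance c a ++ advance c b

-- Does a whole sequence of calls satisfy the expectation?
accepts :: Expect -> [String] -> Bool
accepts e calls =
  any finished (foldl (\es c -> concatMap (advance c) es) [e] calls)

-- The two examples from the text.
driving :: Expect
driving = Seq (Par (Step "checkMirrors") (Step "fastenSeatbelt")) (Step "startCar")

taxes :: Expect
taxes = Choice (Step "file1040") (Step "file1040EZ")
```

With these definitions, `accepts driving ["fastenSeatbelt", "checkMirrors", "startCar"]` is `True`, while starting the car before fastening the seatbelt is rejected, and `accepts taxes ["file1040", "file1040EZ"]` is `False`. Because each combinator only talks about its own sub-expectations, composing them does not leave stray expectations around to match at the wrong time.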

If you want to see more testing for the chatbot I introduced above, the code comes from a demo in the HMock test suite, which has more tests and techniques!

A Case Study

My first use case for HMock was… the test suite for HMock itself! I really wasn’t expecting this, since I expect to need mocks when testing effectful systems that interact with users or complex external services. But HMock does interact with one complex external service: GHC! It does this using the Template Haskell API.

During the testing of HMock, I realized I wanted to test some of the Template Haskell code. But Template Haskell runs at build time, which has some disadvantages:

You can only test the successful cases. A test failing to build isn’t an acceptable state to be in, which means it’s not possible to test failing usage at build time.
I completely love hpc, GHC’s system for reporting on test coverage. However, hpc doesn’t see code that’s been run at build time, so the resulting coverage reports saw the TH code as completely untested.
It was sometimes useful to test internal methods rather than just the top-level declarations. These can’t be tested easily at build time.

So I wanted a way to run Template Haskell in IO. Well, supposedly there is such a way: IO is an instance of the Quasi type class used by Template Haskell. But it’s a rather poor instance, throwing errors the moment you try to do anything interesting, like look up a type or instance. I needed more. It turns out I was able to use HMock to build a mock for the Quasi type class, and use it to test HMock’s own Template Haskell code. Instead of hand-writing the responses to each TH query, I was able to run the queries at build time, then lift their responses to runtime with TH’s Lift class, and use HMock to stub out the Quasi monad to return those saved responses.
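The trick of running a query at build time and carrying its answer to runtime can be shown in miniature. The sketch below is a simplified assumption about the general technique, not the code from HMock’s test suite. It lifts only a pretty-printed String rather than the structured response, and the names `savedBoolInfo` and `mentionsTrue` are hypothetical.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Data.List (isInfixOf)
import Language.Haskell.TH (pprint, reify)
import Language.Haskell.TH.Syntax (lift)

-- Evaluated by GHC at build time: look up Bool, render what the compiler
-- reports about it, and embed that String in the program as a literal.
savedBoolInfo :: String
savedBoolInfo = $(reify ''Bool >>= lift . pprint)

-- Ordinary runtime code can now inspect the saved response, for example
-- to hand it back from a stubbed-out query during a test.
mentionsTrue :: Bool
mentionsTrue = "True" `isInfixOf` savedBoolInfo
```

Here `reify` is the kind of query that IO’s Quasi instance refuses to answer, so it runs inside the splice where GHC can answer it, and `lift` turns the result into an expression that exists at runtime.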

Here’s the result:

I found three bugs outright by just running simple tests I’d already started to write. These were mostly bugs in error cases.
I was able to look at the hpc output, which highlighted parts of the code that were not being tested. By considering when that code was necessary, I generated more test cases, leading to finding four more bugs the same day.
I was able to more quickly develop Template Haskell code later, because I could write test cases to validate how TH behaved, run them, and assert or inspect the results in a way that’s normally difficult to do with compile-time code.

Overall, this is an overwhelming success case for HMock.

Status

HMock is now released on Hackage as a pre-release. The main reason I’m calling it a pre-release is that it’s currently only implemented for MTL-style type classes. I’d like to expand that to most common effect systems, as well as servant, haxl, and more. As part of that process, I anticipate a number of major breaking changes. However, as long as you set upper version bounds appropriately, there’s no reason you can’t use HMock today.

In addition to the things mentioned above, HMock implements the following:

An extensive language of predicates that match parameters to methods, letting you easily isolate interesting calls to which you can attach responses. These predicates work with a huge variety of types, and include powerful techniques like regular expressions and order-insensitive container matching.
Configurable defaults. By default a method without an explicit response will return its default value according to the Default class. But you can set up a different default, for some or all invocations.
Per-class setup. You can configure default behaviors and bundle them with the types by writing an instance of a convenient type class.
Broad support for most kinds of classes (including multi-parameter classes with functional dependencies), and most methods (including polymorphic methods, unless the return value is a type variable bound by the method).