Transforming Data with a Functional Approach


Almost without fail, every time folks talk about the benefits of functional programming (FP), someone will mention that it is “easier to reason about.” Speakers and bloggers make it sound obvious that immutable values, pure functions, and laziness lead to code that is easier to understand, maintain, and test. But what does this actually mean when you are in front of a screen, ready to bash something out? What are some actual examples of “easier to reason about”? I recently worked on a project where I found a functional approach (using Clojure) really did bring about some of these benefits. Hopefully this post will provide some real-world examples of how FP has some amazing advantages.

Speakers and bloggers make it sound obvious that immutable values, pure functions, and laziness lead to code that is easier to understand, maintain and test.

The project involved loading and processing about a hundred thousand jobs so they can be indexed by ElasticSearch. The first step is easy enough: just load the data from the filesystem. We’ll use Extensible Data Notation (EDN), a data format that is really easy to use from within Clojure.

(def jobs (edn/read-string (slurp "jobs.edn")))

The `slurp` function takes a filename and returns all its contents as a string. `edn/read-string` parses the EDN in that string and returns a Clojure data structure: in this case, a list where each job is represented as a Clojure map that looks something like this:

{:id 123
 :location "New York, NY"
 :full_description "This job will involve …"
 :industry "Lumber"
 …}
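As a quick sanity check, `edn/read-string` works just as well on an inline string, so you can experiment without touching the filesystem (the job data below is made up for illustration):

```clojure
(require '[clojure.edn :as edn])

;; Parse a small, made-up EDN list of jobs from a string.
(def sample-jobs
  (edn/read-string "({:id 1 :location \"New York, NY\"}
                     {:id 2 :location nil})"))

(count sample-jobs)             ;; => 2
(:location (first sample-jobs)) ;; => "New York, NY"
```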

Pretty straightforward and nothing really functional-programming-y just yet. However, trying to index these immediately throws a bunch of errors. ElasticSearch complains that some of these jobs have nulls for their locations. So it seems we have some bad data, begging the question, how many of our jobs are affected? We could manually loop through all those jobs and keep a count of the ones with nulls as locations. Yuck! Instead, let’s try an FP approach. Given a single job, we want to check if it has a `nil` for a location. Then, we filter all the jobs we have down to just those `nil` ones and see how many we get.

(count (filter #(nil? (:location %)) jobs))

Note: in Clojure, a semicolon starts a comment; in the snippets here, trailing comments are used to display results.

First, the call to `filter` takes two arguments: a function and a collection. The function should take elements of the collection, in this case individual jobs, and return whether or not the given job’s location (extracted by calling the `:location` keyword on the map) is `nil` (checked with `nil?`). `count` then tallies up how many jobs made it through the filter.

Okay, so about a hundred bad jobs out of thousands. That’s not a whole lot, so we should be okay ignoring those. To do that, we want the opposite of the above and filter out jobs that *don’t* have nulls for locations.

(def no-nulls (filter #(not (nil? (:location %))) jobs))

Instead of checking for `nil?` alone, we wrap the check in `not` so we keep only the jobs whose location is present. ElasticSearch can now happily take our jobs and index them just fine.
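For what it’s worth, Clojure also ships with `remove`, which is `filter` with the predicate logically negated. A minimal sketch with made-up data:

```clojure
;; remove keeps the elements for which the predicate returns false,
;; so this is equivalent to filtering on (not (nil? ...)).
(def jobs [{:id 1 :location "New York, NY"}
           {:id 2 :location nil}])

(def no-nulls (remove #(nil? (:location %)) jobs))

(count no-nulls) ;; => 1
```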

We start doing some test searches against our ElasticSearch instance and quickly find out the results contain far more data than we need. Each job comes with a full job description that often contains a huge amount of text. But we don’t need all of that information. We want to create a new field for our jobs that contains a truncated version of this description. That way, the search results can be much smaller, reducing network traffic and time ElasticSearch spends serializing and marshaling all that data.

Let’s again try the FP approach. When we used `filter` earlier we were thinking of what had to be done with each individual job (check if a location is `nil`) and then we let `filter` and `count` do the grunt work. Here, what we need to do with each individual job is to add a new field. Since jobs are just hashmaps, we can simply add a new key-value pair. As for the truncated description, we can just take the first hundred characters of the full description. This is a bit more involved than the `filter` example, so let’s start with writing a function that handles an individual job.

(defn assoc-truncated-desc
  [job]
  (let [full-desc (:full_description job)]
    (assoc job :truncated_desc (subs full-desc 0 100))))

This function takes a single `job`, gets the full description and binds it to `full-desc`, and then calls `assoc` to create the new key-value mapping. The key is the name of our new field, `:truncated_desc`, and the value is just a substring of the first 100 characters.

A subtle thing to note is the original `job` does not change. `assoc` returns a new hashmap with the new key value pairing. This means `assoc-truncated-desc` is a pure function that takes an existing job and returns a new one, never mutating any state. Let’s say we call it on one of our jobs and find out that 100 characters is not enough. The original job remains untouched, so we can modify the code (say, change 100 to 150) and call it again until it’s just right.
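A tiny REPL-style sketch (with a made-up job map) makes the point concrete:

```clojure
(def job {:id 123 :full_description "This job will involve lots of lumber."})

;; assoc returns a brand-new map; the original is untouched.
(def updated (assoc job :truncated_desc (subs (:full_description job) 0 10)))

(:truncated_desc job)     ;; => nil
(:truncated_desc updated) ;; => "This job w"
```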

So we’ve written a function that can process a single job. Hooray! How do we do this with 100,000 jobs? Instead of filtering out some number of jobs as we did with the null locations, we want to apply this transformation to every single job. This is exactly what the `map` function is for. Let’s call it on `no-nulls` from before.

(map assoc-truncated-desc no-nulls)

And that’s it! We return a new list where every job has the new truncated description field, ready to be indexed and searched. The final code looks like this:

(defn prepare-jobs
  [jobs]
  (map assoc-truncated-desc
       (filter #(not (nil? (:location %))) jobs)))
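As the pipeline grows, the thread-last macro `->>` lets the stages read top to bottom. Here is a sketch of the same `prepare-jobs`, with one small defensive tweak of my own (the `min` guard) so that descriptions shorter than 100 characters don’t make `subs` throw:

```clojure
(defn assoc-truncated-desc
  [job]
  (let [full-desc (:full_description job)]
    ;; min guards against descriptions shorter than 100 characters
    (assoc job :truncated_desc (subs full-desc 0 (min 100 (count full-desc))))))

(defn prepare-jobs
  [jobs]
  (->> jobs
       (filter #(some? (:location %)))
       (map assoc-truncated-desc)))
```

`some?` is just shorthand for “not `nil`,” so the filter stage means the same thing as before.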

We started with a bunch of raw jobs from the filesystem and turned them into something that fits our needs. An imperative approach might have you start by writing a for loop and working out how to modify the whole list in place. Functional programming takes a different approach by separating out the actual modification (removing a null value, adding a new field) from the way you want to *apply* a transformation (either by filtering or by mapping). We start by thinking at the level of an individual element and what we want to do to that element, which lets us figure out whether we need `filter`, `map`, `partition`, etc. This approach lets us focus on the “meat” of the problem.

Functional programming takes a different approach by separating out the actual modification from the way you want to *apply* a transformation

Immutability and purity are what make this approach work so well. A job is never changed, so we can keep mucking about with it until we have what we want. A transformation does not rely on any state, so we can make sure it works in isolation before even thinking about collections and loops. Functional programming helps us focus on solving the problem at hand by isolating unrelated concerns so we can, so to speak, “reason” about it more easily.

We're Hiring Engineers at Ladders. Come join our team!

Stateful Components in Clojure: Part 2


In our previous installment, Dave was faced with the problem of stateful components in his Clojure webapp. He tried several approaches, but each of them seemed to have problems that got worse as the number of stateful things in the system grew and they started to depend on one another.

Dave’s Revelation

“I’m having a nightmare,” Dave says to no one in particular. The kittens somberly marching around him in concentric circles don’t seem to notice. The yarn from their knitted socks wraps him up like a maypole, and they chant SQL queries in transactions that never commit or roll back. A sudden flash makes him squint his eyes. He looks up and there’s a metal sign swinging back and forth over his head, catching the light, and he sees the word “complected” etched onto it in what he’s fairly certain is Comic Sans.

A gust of wind whispers to him in a theatrical voice: “reloaded.”

He notices that the fingers on his right hand are grasping something that he instinctively knows is a one-handed keyboard, and despite having never used one before he desperately tries to type (with-scissors [s (scissors)] (cut-yarn s)) but he can’t figure out how to type square braces.

The wind picks up and whispers to him more insistently: “reloaded.”

He panics because he can’t remember what arguments the run-away function takes, and suddenly he realizes that his REPL might not even be working because kittens come with an outdated version of Leiningen!

The wind is so fierce and loud now that he can barely hear himself think over the sound of it whistling “reloooooooaded” all around him.

Kittens come with an outdated version of Leiningen!

The kittens are an infinite lazy sequence and more and more are filling up his vision and he realizes that he must be holding on to the head but he can’t reach his head because his arms are recursive with no base case, time and space, and the kittens are getting higher and higher until they’re almost to the top of his head and his stack overflows.

Dave jolts awake. Looking around, he can still hear “reloaded” echoing in the back of his mind. In the eerie light of four in the morning, it’s still tugging at him as he opens up his laptop and types into Google: “Clojure reloaded.” To his surprise, the first result is this very interesting blog post by Stuart Sierra.

He reads it feverishly once, and then slowly a second time. It seems to be talking about exactly the problem that he’s been facing, and more than that, it has a solution.

The answer is so simple! Don’t use any globals, period! Build your entire application as a value. Keep state locally. Manage the lifecycle for the whole thing with a single function call that returns the started or stopped system.

He keeps going. He discovers that Stuart Sierra also wrote a library called Component to build these systems, handle interdependencies, and orchestrate their lifecycle. Ideas are spinning in his head about how to fix his app – he barely pauses to remember the kittens dream and shudder – as he reads through the documentation.

Finally, he watches Stuart’s Clojure West talk. What finally pulls it all together for him is the code snippet for building web handlers, about 24 minutes in. It’s a moment where his mind is stretching out until it can pop into place to accommodate a big new idea, and he rewinds, re-watches, and pauses it over and over again.

Later that morning, he gets to work on restructuring his app….

How Components Work

The Component library has several important pieces, but the fundamental building block is the basic lifecycle protocol. It defines two methods:

(defprotocol Lifecycle
  (start [component])
  (stop [component]))

Implementations of the start method take an un-started component, do whatever side effects or state building or anything else required to start it, and then return a started component. The stop method is similar, except implementations accept a started component, do whatever is required to tear it down, and return a stopped component. There’s a focus on immutability here, too – the component itself is not modified in place, but rather transformed much like a Clojure map and a “new” version is returned.
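To make the “transformed, not mutated” point concrete, here is a toy component of my own invention (not part of Dave’s app; the protocol is re-declared inline so the sketch stands alone, whereas real code would use the library’s `component/Lifecycle`):

```clojure
(defprotocol Lifecycle
  (start [component])
  (stop [component]))

;; A trivial component whose only "state" is a keyword flag.
(defrecord Ticker [state]
  Lifecycle
  (start [this] (assoc this :state :running))
  (stop [this] (assoc this :state :stopped)))

(def ticker (->Ticker nil))
(def started (start ticker))

(:state started) ;; => :running
(:state ticker)  ;; => nil  (start returned a new value; the original is unchanged)
```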

So what is a component? The glib answer of “something that implements the Lifecycle protocol lol” is true on its face, as the Component library is a flexible tool and doesn’t force you to use it in any particular way. However, this is a case where an opinionated answer is more helpful.

Warning to the reader: from this point forward, opinion flows freely

Your components should include three things:

  1. A Clojure Record that implements the Lifecycle protocol.
  2. A constructor function that returns a configured, non-started component.
  3. Public API functions that take the component as the first argument.

Let’s first look at an example Record for Dave’s database connection instance.

(defrecord Database
    [config conn]

  component/Lifecycle
  (start [this]
    (if conn ;; already started
      this
      (let [conn (make-pool config)]
        (assoc this :conn conn))))
  (stop [this]
    (if conn
      (do
        (try
          (.close conn)
          (catch Exception e
            (log/warn e "Error while stopping Database")))
        (assoc this :conn nil))
      this)))

By comparison, here is what Dave’s previous implementation looked like:

(def ^:dynamic *db-config* "postgresql://localhost/devdb?user=dev&password=dev")

(defonce db-conn (delay (make-pool *db-config*)))

(defn db []
  @db-conn)

The main bit of logic – creating a connection pool with (make-pool config) – is basically unchanged from what Dave had before. What has changed is that we now have much more control over how this state is managed.

In the previous approach, Dave’s database connection was tied closely to both the global environment and some hard-coded logic about when and how it was created. In the componentized approach we can control all of the following at runtime and without any code change:

  • When the database handle is connected or disconnected
  • What the database configuration is
  • How many different database connections we want

Having programmatic control over these three factors, rather than hard-coding any or all of these issues in the implementation, is one of the key advantages of this pattern.

A few best practices here:

  • The Lifecycle method implementations should always return a component.
  • Idempotency. The start method should return the component
    unmodified if it’s already started, and stop should do the same
    thing if it’s not started.
  • If something in a component’s teardown can throw an exception, wrap
    it in a try ... catch. This will help later when combining
    components into systems.
  • Component Records should have one or more fields where their config
    is stored, and components with runtime state should keep that state
    in a field that’s nil until the component is started.

Now let’s look at the constructor function.

(defn database
  "Returns Database component from config, which is either a
   connection string or JDBC connection map."
  [config]
  (assert (or (string? config) (map? config)))
  (->Database config nil))

Constructors should not do any state creation. Their job is to do whatever validation of the construction parameters necessary, and then simply create the record.

Here we do run into one drawback in Clojure. Records don’t include any support for docstrings, so the constructor is generally the best place to centralize all of the documentation about what is required to initialize the component.

Now let’s look at the public API.

(defn select
  [database query]
  (jdbc/query (:conn database) query))

(defn connection
  [database]
  (:conn database))

Compared to the previous API, the key difference is that these functions are now passed a database component and know how to use it directly. While they are not pure functions, since they perform side effects, they stick closer to the “spirit” of functional programming: they are functions of their arguments alone.

At this point you may be thinking, “API functions where the first argument determines a lot of the behavior… that sounds a lot like a protocol.” That’s an excellent point. For many common components, particularly ones that can be swapped out with different implementations like database connections, it can be valuable to define their API as a protocol. This can also enhance testability, since it gives us a flexible way to inject different mock components in our tests.
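For example, a hypothetical in-memory stub implementing the same protocol lets tests avoid a real database entirely (the `DBQuery` protocol is repeated here so the sketch is self-contained):

```clojure
(defprotocol DBQuery
  (select [db query])
  (connection [db]))

;; A stub that ignores the query and returns canned rows.
(defrecord StubDatabase [rows]
  DBQuery
  (select [_ query] rows)
  (connection [_] nil))

(select (->StubDatabase [{:id 1}]) "SELECT * FROM jobs") ;; => [{:id 1}]
```

Any code written against `DBQuery` can now be exercised with `StubDatabase` in tests and the real `Database` in production.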

Here is the database component API written as a protocol.

(defprotocol DBQuery
  (select [db query])
  (connection [db]))

(defrecord Database
    [config conn]

  component/Lifecycle
  (start [this] ...)
  (stop [this] ...)

  DBQuery
  (select [_ query]
    (jdbc/query conn query))
  (connection [_]
    conn))
Stylistically, if the implementations of select and connection were much longer than the one-liners here, it would probably be better to move much of that code out into helper functions that get called by the protocol methods. Since these are short (by nature of being somewhat contrived toy examples) we can keep them inline.

I generally prefer using the protocol approach for most common types of components. It makes the contract for the component clearer. The tradeoff is generally that the API is more rigid – fixed methods and arities – but implementation, usage and testing becomes more flexible.

Next, let’s look at how to handle dependencies in components.

Remember Dave’s cat image processing workers? Here is the original code:

(def ^:dynamic default-message "Posted to")
(def ^:dynamic number-workers 3)

(defonce stop? (atom false))

(defn start-workers []
  (reset! stop? false)
  (dotimes [worker-n number-workers]
    (future
      (while (not @stop?)
        (if-let [task (select-next-task (db))]
          (do
            (add-watermark (:image task) (or (:message task) default-message))
            (complete-task (db) task))
          (Thread/sleep 500))))))

(defn stop-workers []
  (reset! stop? true))

Let’s first refactor this as a component.

(defn select-next-task
  [db]
  (database/select db ...))

(defn complete-task
  [db task]
  (let [conn (database/connection db)]
    (jdbc/with-transaction ...)))

(defrecord ImageWorker
    [config database stop-latch]

  component/Lifecycle
  (start [this]
    (if (some? @stop-latch) ;; already started
      this
      (do
        (reset! stop-latch false)
        (dotimes [n (:workers config)]
          (future
            (while (not @stop-latch)
              (if-let [task (select-next-task database)]
                (do
                  (add-watermark (:image task)
                                 (or (:message task)
                                     (:default-message config)))
                  (complete-task database task))
                (Thread/sleep 500)))))
        this)))
  (stop [this]
    (if (some? @stop-latch)
      (do
        (reset! stop-latch true)
        (assoc this :stop-latch (atom nil)))
      this)))

(defn image-worker
  "Returns an ImageWorker initialized with config, a map of:

   * :workers          Number of workers [2]
   * :default-message  Default message to add to images
                       [\"Posted to\"]"
  [config]
  (map->ImageWorker
   {:config (merge {:workers 2
                    :default-message "Posted to"}
                   config)
    :stop-latch (atom nil)}))

The business part of this code hasn’t changed much, but we do have two key differences.

  • Config about number of workers and the default watermark message is
    now part of the component.
  • The stop method refers to local rather than global state.

What’s missing is the database – how does the ImageWorker component get that? That leads us to the next core concept in the Component library: systems.

If the simple definition of a component is something that knows how to start and stop itself, the simple definition of a system is something that knows how to stop and start other things. Systems deal with the relationships between components and orchestrate startup and shutdown.

Let’s define a system with our database and image workers.

(defn system
  [config]
  (component/system-map
   :db (database (:db config))
   :image-worker (component/using (image-worker (:image-worker config))
                                  {:database :db})))

Our system function returns a Component system from a config. The component/system-map function is the main way to create systems. It’s called with keyvals, where the keys are the system’s internal names for components, and the values are the initialized components.

The database component is straightforward, but our image worker has something interesting – this component/using function. This function is one of the ways to define the dependencies of components on other components. It accepts the component itself, and then a collection describing the dependencies.

Systems deal with the relationships between components and orchestrate startup and shutdown.

If you pass a map, as we have here, the key is the name of the dependency in the component, and the value is the internal name of the dependency in the system. So in this example we say that we want the :database field in our image worker to be filled with the component called :db in the system.

The other option is to pass a vector of dependencies, which works when the internal names of components in the system and the names of dependencies inside the component are the same.

The component/system-map function returns a system record which implements the Lifecycle protocol. In other words it returns something that you can call component/start on and it will start up all the components in the system.

Let’s talk about what happens during system startup. Using the relationships defined by the dependencies, Component builds a dependency graph and starts up each component in order. In other words, the dependencies for a component will always be started before it’s started. Then the dependencies are passed in to the components that depend on them, and they are started in turn.

For example, here’s the startup sequence of the system we just created.

  1. Start the Database component, assoc the started component into the
    system map.
  2. Assoc the started Database component into the image worker
  3. Start the image worker component and assoc it into the system map.
  4. Return the resulting system map.

If we called component/stop on the resulting, started, system map the same thing would happen in reverse order.
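A minimal sketch of that ordering, using two toy components made up for illustration (this assumes the `com.stuartsierra.component` library is on the classpath):

```clojure
(require '[com.stuartsierra.component :as component])

(defrecord Db []
  component/Lifecycle
  (start [this] (assoc this :started? true))
  (stop [this] (assoc this :started? false)))

(defrecord Worker [db]
  component/Lifecycle
  (start [this]
    ;; By the time Worker starts, the already-started Db has been assoc'd in.
    (assert (:started? db))
    this)
  (stop [this] this))

(def started
  (component/start
   (component/system-map
    :db (->Db)
    :worker (component/using (map->Worker {}) {:db :db}))))

(:started? (:db started)) ;; => true
```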

With that in mind, let’s take a look at how we could build the web handler part of Dave’s system to use the database component.

The web handler is a stateless component, but the Component pattern is actually quite good for managing those too and integrating them into the system.

We know that Dave’s web handler needs the database connection. Here’s what it might look like, built as a component.

(defrecord WebApp
    [database])

(defn web-app
  []
  (->WebApp nil))

(defn web-handler
  [web-app]
  (GET "/" req (homepage-handler web-app req)))

The web-handler function returns a web handler which has closed over the web-app – a WebApp record – and injects it into the handler functions, so they have access to the database.

You may notice that the WebApp record does not have an implementation of the Lifecycle protocol. That’s because in this case start and stop would just return the record unchanged, and Component already contains a default implementation of Lifecycle that does exactly that.

Then we can define a web server component that ties it all together.

(defrecord ImmutantServer
    [config web-app server-handle]

  component/Lifecycle
  (start [this]
    (if server-handle ;; already started
      this
      (let [handler (web-handler web-app)
            server-handle (web/run handler (select-keys config [:path :port]))]
        (assoc this :server-handle server-handle))))
  (stop [this]
    (if server-handle
      (do
        (web/stop server-handle)
        (assoc this :server-handle nil))
      this)))

(defn immutant-server
  "Config is map with keys:

   * :port  Server port [8080]
   * :path  Root URI path [\"/\"]"
  [config]
  (->ImmutantServer config nil))

The immutant-server constructor initializes a component that will create an Immutant web server, building a handler using the component passed in as :web-app.

Let’s see how the whole system ties together, with an updated system constructor.

(defn system
  [config]
  (component/system-map
   :db (database (:db config))
   :image-worker (component/using (image-worker (:image-worker config))
                                  {:database :db})
   :web-app (component/using (web-app)
                             {:database :db})
   :web-server (component/using (immutant-server (:web-server config))
                                [:web-app])))

Acknowledgements and Further Reading

There are basically zero new ideas in this article. Pretty much all of the credit goes to Stuart Sierra, whose code, writing, and presentations informed everything here. Thanks, Stuart!

My basic motive for writing this was to provide a more narrative, progressive guide for understanding the concepts behind Stuart’s Component pattern, and hang it on a specific example, in the hopes of providing an easier on-ramp for learning how to build applications that leverage it.

Further reading, including all the links referenced in this post, can be found in Stuart Sierra’s own collection of links on the subject.


What is a Good Program?


How do you know if the software you are building is “good” software?  How do you know if the programmers on your team are “good” programmers?  As programmers, how do we systematically improve ourselves?

The goal of most programmers should be to improve their craft of building programs.  A good programmer builds good programs.  A great programmer builds great programs.  As programmers, we need a way to judge the quality of the programs we build if we have any hope of becoming better programmers.

What is the problem you are trying to solve?

A traveler approaches a heavily wooded area he must pass in order to get to his destination. There are several other travelers already there. They are scrambling to cut a path through the woods so they too can get to the other side. The traveler pulls out a machete and starts chopping away at the brush and trees. After several hours of hard physical labor in the hot sun, chopping away with his machete, the traveler steps back to see his progress. He has barely made a dent in the heavily wooded area, his hands are bruised and worn, he has used up most of his water, and he is exhausted. A second traveler walks up to the first traveler and asks him what he is doing. The first traveler responds, “I am trying to cut a path through these woods.” The second traveler responds, “Why?” The first traveler snaps back in frustration, “Obviously, I need to get to the other side of these woods!” The second traveler responds, “That wasn’t obvious at all! You see, I just came down that hill over there and from there you can clearly see that the wooded area is deep, but narrow. You will die before you cut your way through the woods with that machete. It would be much easier to just go around. As a matter of fact, if you look to your right you can see a taxi stand in the distance. He can get you to the other side quickly.”

As programmers, what is the problem we are trying to solve?

The first traveler lost sight of his goal. Once he encountered the wooded area with all of the other travelers already cutting their way through the woods, the problem went from getting to the other side to chopping down trees. Instead of stepping back to evaluate the possibilities and try to find the most efficient way to the other side, he joined the crowd that was already chopping away at the woods.

Programs are tools

Programs are solutions to problems. They are tools to help people accomplish their goals. Just like the first traveler in our story, programmers often lose sight of the problem they are trying to solve, wasting most of their time solving the wrong problems. Understanding the problem you are trying to solve is the key to writing good software.

Programs are tools designed to solve a problem for an intended user.

Good tools are effective. They solve the problem they were built to solve. A good hammer can hammer in nails. A good screwdriver can screw and unscrew screws. A good web browser can browse the web. A good music player can play music. A good program solves the problem it is supposed to solve. A good program is effective.

Good tools are robust. A hammer that falls apart after just one use is not a very good hammer. Similarly a program that crashes with bad inputs is not a very good program. A good program is robust.

Good tools are efficient. An electric screwdriver that takes a long time to screw in a screw is not as good as an electric screwdriver that can screw in screws quickly. Similarly, a web browser that takes a long time to render a web page is not as good as one that does so quickly. Good programs are efficient.

Like any other good tool, good programs are effective, robust, and efficient. Most tools are built to solve a well defined problem that is not expected to change. A nail will always behave as a nail does, thus a hammer’s job will never need to change. Programs, on the other hand are typically built to solve a problem that is not well defined. Requirements change all the time. A good program has to be flexible so it can be modified easily when requirements do change.

Good programs are flexible.

Creating flexible programs that can easily be adapted to meet changing requirements is one of the biggest challenges in software development. I stated earlier that the key to building a good program is understanding the problem you are trying to solve. Programming is an exercise in requirements refinement. We start with an understanding, in plain language, of the fundamental problem we are trying to solve. In order to create a solution for the problem we start defining requirements. Some requirements are based on fact and some are based on assumptions. Throughout the software development process we refine those requirements, adding more detail at every step. Fully specified, detailed requirements are called code. The code we write is nothing more than a very detailed requirements document that a compiler can turn into an executable program.

The challenge comes from changing requirements over time. Our understanding of a problem may change. The landscape in which we are operating may change. Technology may change. The scope of the problem may change. We have to be ready for it all.

When requirements change we have three choices: do nothing, build a new program, or modify the original program. Building a new program is a perfectly acceptable solution, and may be the right answer in some cases. Most of the time, due to time and budget constraints, the best answer is to modify the original program. A good program will spend most of its life in production. During that time the requirements of the users and the landscape are likely to change. When that happens your program no longer meets our first requirement for good programs: a good program is effective. The requirements of the problem no longer match the requirements specified in your code. Your program no longer solves the problem it was intended to solve. It is broken. You have to fix it as quickly as possible. If the program is flexible enough you can modify it quickly and cheaply to meet the new requirements. Most programs are not built this way and end up failing as a consequence.

To build a flexible program, a programmer should ask the question: “What requirements are least likely to change, and what requirements are most likely to change?” The answer to this question will inform every aspect of your program architecture and design. It is impossible to build a good program without the answer to this question.

Good code?

Programs are made of code. Good programs are made of good code.
In this post I have specified the top-level requirements for what a good program is. In the next post I will start to refine these requirements further to answer the question, “What is good code?”


Stateful Components in Clojure: Part 1


You’re building a web app with Clojure, and things are going great. Your codebase is full of these fantastic pure and composable functions, you’re enjoying all the great ways in the Clojure ecosystem to manipulate HTML and CSS, generate data for your web APIs, and maybe you’ve even written a Clojurescript front-end. It’s all smooth sailing until you realize that it’s time to connect to a database…

If you’re like many Clojure developers, you’ve lost some time learning about various bad ways to solve the problem of managing stateful things like database connections, caches, queues, authentication backends, and so on in Clojure apps.

Let’s look at the hypothetical journey of one developer – we’ll call him Dave, since everyone knows a Dave – and his attempts to solve this problem.

Dave’s Journey

Dave is building a web app. Probably one that’s a lot like yours. And he too is just now running into the problem with stateful services, as he needs to add a database to his app.

Dave is first tempted to avoid the state problem entirely – a noble impulse for any engineer – by stashing the database config in a dynamic var, reasoning with some justification that it’s just data so that’s not so bad. He ends up with the following:

(def ^:dynamic *db-config* "postgresql://localhost/devdb?user=dev&password=dev")

(defn select [query]
  (jdbc/query *db-config* query))

This approach certainly has its benefits. None of Dave’s handlers have to know or pass around anything about the database; they simply call the select function. Dave is initially satisfied with this approach. It looks pretty easy, so he writes some tests.
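What makes this testable at all is Clojure’s dynamic rebinding. Stripped of the database entirely, the mechanism looks like this (a self-contained sketch, not code from Dave’s app):

```clojure
(def ^:dynamic *db-config* "postgresql://localhost/devdb")

;; Any function that reads the var sees the thread-local binding, if one exists.
(defn current-config []
  *db-config*)

;; binding installs a thread-local value for the extent of its body...
(binding [*db-config* "postgresql://localhost/testdb"]
  (current-config))
;; => "postgresql://localhost/testdb"

;; ...and the root value is untouched afterwards.
(current-config)
;; => "postgresql://localhost/devdb"
```

Swapping in a test database is just a matter of wrapping the call under test in a `binding` form.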

(deftest homepage-handler-test
  (with-bindings {#'app.db/*db-config* test-config}
    (is (re-find #"Hits: [0-9]+" (homepage-handler {})))))

It’s reasonably testable code, so long as he always remembers to correctly bind the *db-config* var. He keeps working on his code until he notices that some of his pages seem to be loading slowly, especially the ones that do a lot of database queries. Pretty soon the reason becomes clear – Dave’s app has to open a new connection to the database for every query.

The obvious solution is to use a persistent database connection. The question is where to put it? Using a dynamic var to hold the config is one thing – that’s just data – but a database connection is a stateful object. Like so many have done before him, Dave accepts a compromise on his functional programming principles and decides to stash the database connection in a delay. Now his code looks like this:
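A delay is a natural fit here because it defers its body until the first deref and caches the result, so the pool gets created exactly once, on first use. The semantics can be seen without any database at all (a standalone sketch, with a counter standing in for the expensive pool creation):

```clojure
(def connect-count (atom 0))

(def conn (delay
            (swap! connect-count inc)   ; stand-in for the expensive make-pool call
            :the-pool))

(realized? conn)   ;; => false, nothing has run yet
@conn              ;; first deref runs the body and returns :the-pool
@conn              ;; cached; the body does not run again
@connect-count     ;; => 1
```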

(def ^:dynamic *db-config* "postgresql://localhost/devdb?user=dev&password=dev")

(defonce db-conn (delay (make-pool *db-config*)))

(defn db []
  @db-conn)

(defn select [query]
  (jdbc/query (db) query))

Testing is now a bit trickier. He has to redef the db function and handle setup and teardown.

(def test-db-conn (atom nil))

(defn with-test-db-conn [f]
  (reset! test-db-conn (make-pool test-config))
  (f)
  (shutdown @test-db-conn)
  (reset! test-db-conn nil))

(use-fixtures :once with-test-db-conn)

(defn test-db []
  @test-db-conn)

(deftest homepage-handler-test
  (with-redefs [db test-db]
    (is (re-find #"Hits: [0-9]+" (homepage-handler {})))))

Dave is feeling nervous. It’s a nagging, itching sort of feeling in the back of his mind, like the first stages of athlete’s foot. This code is starting to bother him. Just this little bit of global state is already requiring a lot of machinery to work with. His worries haven’t erupted into a full scale breakout yet, however, and he’s got more story points to complete.

His next task is to add a new feature that’s so important that his boss’s boss’s boss is regularly asking for status updates from his boss’s boss, which get translated into urgent requests to his boss, which finally become exhortations so anxiety-ridden that they are nearly incoherent to him. Their web app is the premiere service for posting pictures of cats wearing socks, and Dave needs to add a feature that either watermarks the pictures of these unfortunate animals with the site’s URL for non-paying users, or with a user-selected message for paying users.

To do this he needs to build a system that reads from a job queue in the app’s database and processes each new image to add the appropriate message. He’ll need to have several workers doing this for performance reasons, since few things have less patience than a cat forced to wear socks. This is a tricky problem for his approach to state management because he needs to be able to initialize and shut down these workers. Not only that, but they depend on the database connection.

Here’s what he comes up with.

(def ^:dynamic default-message "Posted to")
(def ^:dynamic number-workers 3)

(defonce stop? (atom false))

(defn start-workers []
  (reset! stop? false)
  (dotimes [worker-n number-workers]
    (future
      (while (not @stop?)
        (if-let [task (select-next-task (db))]
          (do
            (add-watermark (:image task) (or (:message task) default-message))
            (complete-task (db) task))
          (Thread/sleep 500))))))

(defn stop-workers []
  (reset! stop? true))

He adds a call to start-workers to his main function, and it seems to work well enough. He wasn’t able to do much iteration at the REPL while developing because he kept needing to manually restart things, and it was tedious to keep things pointed at the right database. His test code needs to redef even more things now. There are still the database redefs from before, along with the custom setup and teardown testing code, and now he also has to redef the stop? atom so that he can test this worker code that uses a completely different start and stop mechanism.
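The stop? flag pattern itself is simple enough to demonstrate in isolation. Here is a self-contained sketch, with the database work replaced by a counter:

```clojure
(defonce stop? (atom false))
(def processed (atom 0))

(defn start-worker []
  (reset! stop? false)
  (future
    (while (not @stop?)
      (swap! processed inc)   ; stand-in for pulling and completing a task
      (Thread/sleep 10))))

(defn stop-worker []
  (reset! stop? true))
```

The trouble Dave is feeling isn’t that this code is complicated on its own; it’s that the lifecycle lives in global mutable state that every test and every REPL session has to manage by hand.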

Dave’s morale is down, too. Nothing about this code seems functional anymore – he may as well have done it in Java. It just feels wrong to him, but he doesn’t see a better way out. He’s reached the low point in his journey. What Dave needs is a hero, and one arrives in the unlikely form of Stuart Sierra.

Continued in Part 2 and Part 3
