The Man Who Knew Infinity: Coding Ramanujan’s Taxi

Ramanujan is on the left

Have you see the movie (or read the book) The Man Who Knew Infinity?

This new movie — which stars Dev Patel and Jeremy Irons — explores Indian mathematician Srinivasa Ramanujan and his profound understanding, ingenuity, and love of math.  The film inspired me on both an intellectual and emotional level. But what really drew my attention was a particular five second scene.

The scene takes place in 1918. Ramanujan‘s mentor and friend G.H. Hardy quips that he had just taken taxi number 1729 and finds the number “a rather dull one.”  Ramanujan passionately replies, “No, Hardy, it’s a very interesting number! It’s the smallest number expressible as the sum of two cubes in two different ways.”

Ramanujan was able to see beyond the simple taxi cab number and into the depths of the expression behind it: a³ + b³ = c³ + d³…better known as Ramanujan’s Taxi. I thought this problem was fascinating and wondered how the code implementation would look. Little did I realize there were many optimization layers to this algorithm onion.

The taxi Ramanujan took — at least in the movie

First Crack at Implementing Ramanujan’s Taxi

I started with a straight forward implementation written in Scala. The code, with performance timings, can be found on GitHub:


We begin with a brute-force implementation by looping though all combinations to find where a³ + b³ = c³ + d³. We achieve O(n⁴) performance because of the four loops used to calculate all values of a³, b³, c³, and d³ equal or less than parameter n, which bounds our search field.

This brute-force implementation, with O(n⁴) performance, kinda sucks. So, how can we do better?

We Can Do Better

First question to ask is: do we always need to calculate all the values of a³, b³, c³, and d³? Remember, the equation we are using is a³ + b³ = c³ + d³. If we solve for d³, we get d³ = a³ + b³ – c³. Thus, once we know a³, b³, and c³, we can calculate the value of d³ directly instead looping through all values of d³.

My next implementation, again in Scala, replaces the fourth loop with the calculation d³ = a³ + b³ — c³:

The 2nd version has O(n³) performance since we get to skip that final loop. Neat!

Third Time’s A Charm

We’re not done yet. There is a third, and the best yet, enhancement to consider. What if we don’t need to solve for all values of not only d³, but c³ too? A few things to understand:

  • If we calculate all values of a³ and b³ equal to or less than n, we essentially have all possible values of not only a³ and b³, but also c³ and d³.
  • The sum of a³ + b³ is equal to the sum of c³ + d³
  • If the sum of #2 above for a given pair of integers (a³, b³) matches the sum of another pair of integers (a³, b³), we have in essence found the c³ and d³ pair.

If we store every combination of the sum of a³ + b³ and the corresponding pair (a³, b³), any sum that has two pairs means we have found a³ + b³ = c³ + d³ where the first pair in the list can be considered (a³, b³) and the next (c³, d³).

For example, if we iterate through the combinations of a³ + b³, we will store the sum 1729 with the pair (1³, 12³). Continuing to iterate, we will see another sum of 1729 arise, but this time with the pair (9³, 10³). Because we have two different pairs both summing to 1729, we have found a Ramanujan Taxi that solves for a³ + b³ = c³ + d³.

In the third version, we use a Hashmap to store the sum (key) and the corresponding list of pairs as a Sorted Set (value). If the list contains more than one pair, we’ve got a winner!

This implementation has O(n²) performance since we only need two loops to calculate the combinations for a³ and b³. Very neat!

I suspect there is a forth optimization where we only need to calculate values of a³ and derive b³ from a³ (the ‘b’ loop is just an offset of the ‘a’ loop) with O(n) performance.

Also, another challenge is to re-write the implementations as a functional programming pattern. I’ll leave that for you to explore.

An Amazing Movie, an Amazing Man

After watching The Man Who Knew Infinity, I was in awe of Ramanujan’s genius. By implementing his taxi algorithm — with its several performance optimizations — I got a glimpse of the beauty he saw in “No, Hardy, it’s a very interesting number!”

Ramanujan’s Taxi, at almost a century old, is still making new discoveries. Mathematicians at Emory University have found the number 1729 relates to elliptic curves and K3 surfaces — objects important today in string theory and quantum physics.

I expect we have only scratched the surface of Ramanujan’s taxi cab number and the man’s amazing genius.

Article originally appeared at Free Code Camp

We're Hiring Engineers at Ladders. Come join our team!

Fighting the Dreaded Drive-By

drive

Here Comes the Drive-By

A drive-by is an immediate demand of your attention. In today’s modern office a drive-by can take many forms. A co-worker walks up to your desk to ask you a question, or sends you multiple yo’s in Slack, or spams your handle in a public channel, or waits to catch your eye and then waves for you to come look at something on their screen.

Drive-by’s are usually innocent and followed by phrases like “I’m sorry, are you busy? I can come back later.” The person breaking your focus isn’t trying to be malicious. They need your help. Don’t you want to be a helpful coworker, be a team player?

What’s So Bad About a Few Drive-By’s?

A lot! If you are a software engineer, or really anyone that needs deep focus and a large memory cache while working, you know why drive-by’s are bad. It takes a long time to load information back into your brain — most studies find it to be over fifteen minutes. Have you ever had a day where you felt like you got nothing done? Of course! I’d wager, the number of unscheduled interruptions on those days is pretty high.

Don’t Be the Drive-By

You can start fixing the problem by fixing your own behavior. Here are a few rules I follow to make sure I commit as few drive-by’s as possible.

  • Always message people before stopping by their desk. Try to keep it as a single message, 4 slack notifications for this request can be just as bad.
    Yo,
    I have a question
    It’s about that module you just added
    Mind if I stop by your desk?
  • If people are offline, assume they don’t want to be bothered. If you are at your desk and can handle messages, be online. If you are never online, it makes people less likely to follow this rule.
  • Only use @here or @handle in Slack (if you’re not using Slack, make step 1 use Slack) when something is urgent.
  • Use the daily stand up to ask questions. Stand up is your chance to share everything that someone might ask you that day. Be proactive. Take note of the types of questions you get asked throughout the day. Could you have addressed any of those questions in standup?

Becoming Drive-By Proof

Drive-by’s will happen and are impossible to eradicate. These are some of the ways I protect myself against them.

  • TDD (Test Driven Development) and Stubbing: Use a test as an anchor to help you find your place again and reload your mental stack. Did you just make a test go green before the disturbance? What did you stub? Stubs are a great way to keep track of what still needs to be done. Have all your stubs been implemented? You are ready to move on to your next requirement.
  • Requirements: Hopefully you have a clear set of requirements. My favorite stories have long lists of requirements. I use (/) in Jira to check off each one as it’s completed. If your stories don’t have a requirements list, take the time to convert the requirements to an actionable list. It will almost always pay off.
  • Mental Notes: If someone interrupts you, take a minute to make a mental note of what you were doing and what you need to do once you resume. Even better, write it down. This has two benefits: First, it lets you store your mental cache, which hopefully helps getting back in the flow easier once you resume. Second, if you make this your routine, coworkers will understand they won’t get an immediate response from you and will have to wait before asking you a question. This might make them more likely to try other sources before interrupting you.
  • Central Repository of Information: If people know you are only going to tell them to look it up in Confluence, they will stop asking. If you are asked a question, take the time to put your answer into the repo. This can also be a good exercise for the question asker, but make sure what they create is clear and complete.
  • Headphones are Key: Luckily I can code well while listening to music. This isn’t true for all engineers. I’ve known engineers that wear headphones with no music playing. This is an unfortunate last resort and usually comes across pretty passive aggressive.
  • Progress Reports: Hopefully you don’t need to check in with the team more often than stand up. For some highly coordinated tasks, you need to work closely with other team members. Make sure to announce your progress as you hit milestones and more importantly are in a good mental state to pause. Taking this initiative makes you less likely to be interrupted by questions like “Can I start working on X, did you finish Y?”

Epilogue

Clearly most of these suggestions are overhead and less necessary if you aren’t interrupted very often. There is no better solution than no interruptions!

TLDR: Only interrupt people when truly necessary. Break tasks into small manageable chunks and keep a list of completed and to-dos. Before you are fully interrupted, unload your mental cache into a short note.

Want to dig into this topic more? http://blog.ninlabs.com/2013/01/programmer-interrupted/ is a great article that really digs into some of the topics covered here and provides links to the full studies.

Have any other tips for preventing drive-by’s or resuming work after an interruption? Let us know in the comments.

We're Hiring Engineers at Ladders. Come join our team!

Stateful Components in Clojure: Part 3 on Testing

5cbbc6fc230200b38db266f28f810f58

Previously in Part 1 and Part 2, we looked at some problems with using globals for stateful components in Clojure, and an approach to solving this problem using local, scoped state managed with Stuart Sierra’s Component library.

A few people pointed out that I failed to deliver on the promise of showing how this approach helps with testing. From the outside it may look as though I simply forgot to include this part, and in a startling display of poor priorities I spent my time writing a deranged dream sequence instead. It might look that way! But in the best tradition of a certain prominent politician, I boldly claim that my so-called “mistake” was completely deliberate – merely a part of my grand plan – and it is you who are mistaken.

So let’s take a look at testing these things.

Simple Web Handler Test

Remember in Part 1, there was this simple test of a web handler that uses a database from the global state:

(deftest homepage-handler-test
  (with-bindings {app.db/*db-config* test-config}
    (is (re-find #"Hits: [0-9]." (homepage-handler {})))))

We re-wrote our database connection as a record in the Component style, implementing a DBQuery protocol, like this:

(defprotocol DBQuery
  (select [db query])
  (connection [db]))

(defrecord Database
    [config conn]

  component/Lifecycle
  (start [this] ...)
  (stop [this] ...)

  DBQuery
  (select [_ query]
    (jdbc/query conn query))
  (connection [_]
    conn))

(defn database
  "Returns Database component from config, which is either a
   connection string or JDBC connection map."
  [config]
  (assert (or (string? config) (map? config)))
  (->Database config nil))

We also re-wrote our web app as a component that gets the database record injected in, like this:

(defrecord WebApp
    [database])

(defn web-app
  []
  (->WebApp nil))

(defn web-handler
  [web-app]
  (GET "/" req (homepage-handler web-app req)))

This gives us several ways to test our web app and handler functionality. First, we can duplicate the original test logic that uses a testing database.

(def test-db (atom nil))

(defn with-test-db
  [f]
  (reset! test-db (component/start (database test-config)))
  (f)
  (component/stop @test-db))

(use-fixtures :once with-test-db)

(deftest homepage-handler-test
  (let [test-web-app (assoc (web-app) :database @test-db)]
    (is (re-find #"Hits: [0-9].") (homepage-handler test-web-app {}))))

If we don’t want to use an actual database to test against, we can also mock the functionality we need in a mock database component.

(defrecord MockDatabase
    [select-fn]

  DBQuery
  (select [_ query]
    (select-fn query))
  (connection [_]
    nil))

(defn mock-database
  "Returns a mock database component that calls select-fn with the
   passed in query when DBQuery/select is called."
  [select-fn]
  (->MockDatabase select-fn))

(deftest homepage-handler-test
  (let [mock-db (mock-database (constantly {:hits 10}))
        test-web-app (assoc (web-app) :database mock-db)]
    (is (re-find #"Hits: 10" (homepage-handler test-web-app {})))))

Testing the Image Workers

A lot of this code can be re-usable in testing the image worker component, which also uses a database connection. The code for this component is a bit longer, so refer to Part 2 if you need a refresher. It’s very straightforward to write a test for it that uses the test database from our previous example.

(deftest image-worker-test
  (let [... insert some tasks into the test db ...
        test-image-worker (component/start
                           (assoc (image-worker {:workers 2})
                                  :database @test-db))]
    (Thread/sleep 2000)
    (is (... check if the test-db's tasks were completed ... ))
    (component/stop test-image-worker)))

Also observe that we’ve hard-coded the idea that the task queue is in the database into our image worker component. This too could be improved by turning it into its own component with its own protocol API.

(defprotocol TaskQueue
  (next-task [this]
    "Returns the next task in the queue, or nil if no task is in the queue.")
  (complete-task [this task]
    "Sets task as complete."))

(defrecord DBTaskQueue
    [config db]

  TaskQueue
  (next-task [this]
    (database/select db ...))
  (complete-task [this task]
    (let [conn (database/connection db)]
      (jdbc/with-transaction ...))))

This lets us flexibly mock out the task queue for testing components that use it.

(defrecord MockTaskQueue
    [tasks]

  TaskQueue
  (next-task [this]
    (let [[nxt & rst :as ts] @tasks]
      (if (compare-and-set! tasks ts rst)
        nxt
        (recur))))
  (complete-task [this task]
    true))

(defn mock-task-queue
  "Takes collection of tasks, returns mock task queue."
  [tasks]
  (->MockTaskQueue (atom tasks)))

(deftest image-worker-test
  (let [tq (mock-task-queue [test-task1 test-task2])
        test-image-worker (component/start (assoc (image-worker {:workers 2})
                                                  :database tq))]
    (Thread/sleep 2000)
    (is (... check if the test task images were processed ...))
    (component/stop test-image-worker)))

One thing that was skipped over in the discussion in the previous articles was the fact that the image workers are doing side effects other than touching a database. They make a call to an `add-watermark` function, which presumably reads the image, does processing of some kind, and re-saves it somewhere. The testability of this, too, could be improved by converting it into a component which could be mocked out or redefined for tests of the image worker.

Test Systems

It’s possible to take this test style even further, and build complete Component system maps for our testing. To refresh our memory, a component system map is a structure that defines all the components in our system, describes their dependencies, and can start them in order and pass in dependencies to return a running system.

In Part 2, we built a system that includes a web server, web handlers, and database. Let’s look at defining a system-wide test.

(def test-system
  (component/system-map
   :db (database test-db-config)
   :web-app (component/using (web-app) {:database :db})
   :web-server (component/using (immutant-server {:port 8990})
                                [:web-app])))

(deftest system-test
  (let [started-system (component/start test-system)
        test-response (http/get "http://localhost:8990/")]
    (is (re-find #"Hits: [0-9].") (:body test-response))
    (component/stop started-system))

Advantages

The most natural way to work in Clojure – as a functional language with a fantastic REPL-driven workflow – is to call functions and control their behavior by passing them arguments. This enhances our workflow in development, and also gives us flexible and straightforward ways to test our code.

A component-driven code style provides the same kind of leverage when dealing with areas of our systems that deal in side effects, state, and external systems. Access to controlling the behavior of these components is through passing them different arguments on construction, and passing in different dependencies. This delivers the same kinds of benefits as pure functions, while isolating side effects and state.

The Recap

I wrote this series because when I started in Clojure the process of building clean systems was painful.  There wasn’t a community consensus on how to manage stateful resources and there certainly wasn’t a framework I could leverage.  My goal was to give the guide I wish I had; concepts could be tied together and beginners could experience the joy of Clojure even earlier than I did.

In this series, we looked at a few strategies for managing stateful resources in the global environment and some problems these approaches cause for interactive development and testing. Then we had a nightmare about being complected by kittens. Then we looked at a design pattern using Stuart Sierra’s component library for managing stateful resources in a local scope with dependency injection. Lastly, we looked at some strategies for testing components built in this style.

After reading this series, I hope you see the beauty I see in building clean systems in Clojure using components.  Go forth and build, and may the kittens always be in your favor.

Previous post in Part 1 and Part 2

Previously in Part 1 and Part 2.

We're Hiring Engineers at Ladders. Come join our team!

Transforming Data with a Functional Approach

integrated-1168815

Almost without fail, every time folks talk about the benefits of functional programming (FP) someone will mention it is “easier to reason about.” Speakers and bloggers make it sound obvious that immutable values, pure functions, and laziness lead to code that is easier to understand, maintain and test. But what does this actually mean when you are in front of a screen, ready to bash something out? What are some actual examples of “easier to reason about?” I recently worked on a project where I found a functional approach (using Clojure) really did bring about some of these benefits. Hopefully this post will provide some real-world examples how FP has some amazing advantages.

Speakers and bloggers make it sound obvious that immutable values, pure functions, and laziness lead to code that is easier to understand, maintain and test.

The project involved loading and processing about a hundred thousand jobs so they can be indexed by ElasticSearch. The first step is easy enough: just load the data from the filesystem. We’ll use Extensible Data Notation (EDN), a data format that is really easy to use from within Clojure.

(def jobs (edn/read-string (slurp “jobs.edn”)))

The `slurp` function takes a filename and returns all its contents as a string. `edn/read-string` parses the EDN in that string and returns a Clojure data structure. In this case, a list where each job is represented as a Clojure map that looks something like this

{ :id 123

  :location “New York, NY”

  :full_description “This job will involve …”

  :industry “Lumber”

  … }

Pretty straightforward and nothing really functional programingy just yet. However, trying to index these immediately throws a bunch of errors. ElasticSearch complains that some of these jobs have nulls for their locations. So it seems we have some bad data, begging the question, how many of our jobs are affected? We could manually loop through all those jobs and keep a count of the ones with nulls as locations. Yuck! Instead, let’s try an FP approach. Given a single job, we want to check if it has a `nil` for a location. Then, we filter all the jobs we have to just those `nil` ones and see how many we get.

(count (filter #(nil? (:location %)) jobs))

Note: The semicolon character simply starts a comment and is used here to display the result.

First, the call to `filter` takes two arguments, a function and a collection. The function should take elements of the collection, in this case individual jobs, and return whether or not the given job’s location (extracted using `get`) is equal to `nil` (equality checking is done with `=`).

Okay, so about a hundred bad jobs out of thousands. That’s not a whole lot, so we should be okay ignoring those. To do that, we want the opposite of the above and filter out jobs that *don’t* have nulls for locations.

(def no-nulls (filter #(not (nil? (:location %))) jobs))

Instead of `=` we use `not=` to check the opposite. ElasticSearch can now happily take our jobs and index them just fine.

We start doing some test searches against our ElasticSearch instance and quickly find out the results contain far more data than we need. Each job comes with a full job description that often contains a huge amount of text. But we don’t need all of that information. We want to create a new field for our jobs that contains a truncated version of this description. That way, the search results can be much smaller, reducing network traffic and time ElasticSearch spends serializing and marshaling all that data.

Let’s again try the FP approach. When we used `filter` earlier we were thinking of what had to be done with each individual job (check if a location is `nil`) and then we let `filter` and `count` do the grunt work. Here, what we need to do with each individual job is to add a new field. Since jobs are just hashmaps, we can simply add a new key-value pair. As for the truncated description, we can just take the first hundred characters of the full description. This is a bit more involved than the `filter` example, so let’s start with writing a function that handles an individual job.

(defn assoc-truncated-desc

  [job]

  (let

      [full-desc (:full_description job)]

    (assoc job :truncated_desc (subs full-desc 0 100))))

This function takes a single `job`, gets the full description, and assigns it to `full-desc`, and then calls `assoc` to create the new key-value mapping. The key is the name of our new field, `:truncated_desc`, and the value is just a substring of the first 100 characters.

A subtle thing to note is the original `job` does not change. `assoc` returns a new hashmap with the new key value pairing. This means `assoc-truncated-desc` is a pure function that takes an existing job and returns a new one, never mutating any state. Let’s say we call it on one of our jobs and find out that 100 characters is not enough. The original job remains untouched, so we can modify the code (say, change 100 to 150) and call it again until it’s just right.

So we’ve written a function that can process a single job. Hooray! How do we do this with 100,000 jobs? Instead of filtering out some number of jobs as we did with the null locations, we want to apply this transformation to every single job. This is exactly what the `map` function is for. Let’s call it on `no-nulls` from before.

(map assoc-truncated-desc no-nulls)

And that’s it! We return a new list where every job has the new truncated description field, ready to be indexed and searched. The final code looks like this:

(defn prepare-jobs

  [jobs]

  (map assoc-truncated-desc

       (filter #(not (nil? (:location %))) jobs)))

We started with a bunch of raw jobs from the filesystem and turned them into something that fits our needs. An imperative approach might have you start with writing a for loop and filter out how you want to modify the whole list. Functional programming takes a different approach by separating out the actual modification (remove a null value, adding a new field) from the way you wanted to *apply* a transformation (either by filtering or by mapping). We start with thinking at the level of an individual element and what we want to do to that element., allowing us to figure out whether we need to `filter`, `map`, `partition`, etc. This approach lets us focus on the “meat” of the problem.

Functional programming takes a different approach by separating out the actual modification from the way you wanted to *apply* a transformation

Immutability and purity are simply what makes this approach optimal. A job is never changed so we can keep mucking about with it until we have what we want. A transformation does not rely on any state so we can make sure it works in isolation before even thinking about collections and loops. Functional programming helps us focus on solving the problem at hand by isolating unrelated concerns so we can, so to speak, “reason” about it more easily.

We're Hiring Engineers at Ladders. Come join our team!