Ladders Engineering Cultural Values

The Ladders team is very excited to share our Engineering Cultural Values with the world. We hold these truths to be self-evid…just kidding. Creating a set of inspiring, long-lasting, and fundamental values isn’t easy. It is a reflective process that surfaces many different, but often surprisingly similar, opinions.

We asked ourselves:

  • What does it mean to be an engineer at Ladders?
  • What do we expect from ourselves, our colleagues, and our company?
  • What do we take pride in and where do we need to do better?
  • What are our core values that will form the bedrock now and in the future?

Wow! That is a lot to consider. After many brainstorming sessions, inspiration from other companies (thank you, Zillow and Amazon), and plenty of tossed and consolidated ideas, we came up with seven core cultural values.

Without further ado, we present Ladders Engineering Cultural Values.


AWS IO Performance: What’s Bottlenecking Me Now?


When moving to AWS, it can be difficult to pinpoint where your previously high-functioning system is underperforming. AWS has its own internal rhyme and reason, and anyone moving to it must become familiar with how it operates. One tool that can help you on this front is AWS CloudWatch, which provides metrics and monitoring on core system performance. The focus of this article is how to configure your IO subsystem in AWS, which requires using CloudWatch to understand that subsystem’s performance. Are you hitting throughput limits? Are you utilizing all your allocated IOPS? What’s the average queue length?

Defining all of these terms from scratch is out of the scope of this article – we assume you have some basic familiarity with AWS I/O terminology, or are at least willing to consult the AWS documentation to get started. Our goal here is instead to provide some high-level guidelines on how to begin thinking through your I/O performance in AWS.

It’s Not Just the Volume

Let’s take a step back for a second to let the implications of using EBS with our instances sink in. EBS (Elastic Block Store) is a type of durable, persistent, network-attached storage available for AWS EC2 instances. Emphasis here goes on “network-attached”: whenever you read/write to an EBS volume, there is an unavoidable network component that sits directly between the EC2 instance and the EBS storage. In many cases, this network component will be your bottleneck unless you are running on a particularly capable instance network setup (2), since it isn’t very difficult to surpass an instance’s network limits.

The first step in assessing your IO woes is validating this network connection from your instance to your EBS volume. To further complicate things, your performance is limited not only by the volume you provision, but by the instance you attach said volume to! Amazon lists the network limits per instance type, ranging from 4K IOPS and 500MiB/s limits to 65K IOPS and 10GiB/s limits, so check your documentation! It is incredibly important that you take your expected disk throughput and IOPS into consideration when selecting an instance. A perfectly performant IO subsystem can’t outshine a low-throughput network connection to an instance, so be sure to test with every application.

When asking “what’s bottlenecking me now?”, you have to consider both the EBS volume type and the instance type when setting up your system. You could provision 20,000 IOPS on an io1 disk but, because of instance bottlenecks, max out around 8,000 IOPS at a throughput of 260MiB/s. That’s 12,000 IOPS you’re paying for that go to waste, all because you forgot to consider your instance type as a bottleneck.
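To make the math concrete, here is a minimal sketch in Python of the reasoning above. The numbers are the hypothetical figures from this scenario; your real caps come from the AWS documentation for your volume and instance types.

```python
# Hypothetical limits from the scenario above. The effective ceiling is
# the minimum of the volume's and the instance's caps.
volume_iops = 20000    # provisioned on the io1 volume
instance_iops = 8000   # the instance's EBS IOPS cap (check AWS docs!)

effective_iops = min(volume_iops, instance_iops)
wasted_iops = volume_iops - effective_iops

print(f"Effective IOPS: {effective_iops}")           # 8000
print(f"IOPS paid for but unusable: {wasted_iops}")  # 12000
```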

When consulting AWS documentation about your instance, be cautious about taking the listed IO limits at face value. In a fair bit of documentation, AWS achieves the listed instance benchmarks only by assuming a read-only workload in “perfect weather.” The reality is that these numbers are rounded approximations of best-case performance, using block sizes that may not be relevant to your application. Read your documentation carefully and be prepared to test your assumptions. Of course, this is just the tip of the iceberg – you’ll need to weigh other factors as well, such as the file system being used, the read/write workload, and whether options like encryption are enabled on the EBS volumes connected to the instance.

What About the Disk?

Go ahead and start monitoring your instance to quickly determine where you are particularly struggling in the IO department (if at all!). AWS CloudWatch metrics make it very easy to validate whether your high-throughput I/O application has disk-related performance bottlenecks. Combining this with standard disk benchmarking tools at the OS level will allow a clear picture of your IO subsystem to emerge. We’ve found it is fairly simple to get an IO subsystem to outperform the EBS-to-instance network connection (RAID setups come to mind), so after validating the network pipe limitations, you’ll be much better equipped to provision your EBS volume appropriately for your desired IOPS and/or throughput.
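As a sketch of what this monitoring can look like in practice, the snippet below pulls a few per-volume EBS metrics from CloudWatch using boto3. The volume ID is hypothetical, and the metric interpretation in the comments is a starting point rather than gospel.

```python
# Minimal CloudWatch pull for EBS volume metrics (boto3 assumed installed
# and AWS credentials configured). The volume ID below is hypothetical.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")
volume_id = "vol-0123456789abcdef0"  # hypothetical

now = datetime.now(timezone.utc)
for metric in ("VolumeReadOps", "VolumeWriteOps", "VolumeQueueLength"):
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName=metric,
        Dimensions=[{"Name": "VolumeId", "Value": volume_id}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,  # 5-minute buckets
        Statistics=["Average", "Sum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        # For the Ops metrics, Sum / Period approximates average IOPS in
        # the bucket; VolumeQueueLength is most useful as an Average.
        print(metric, point["Timestamp"], point["Sum"], point["Average"])
```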

Appropriate drive selection depends heavily on your primary performance attribute (2): IOPS or bandwidth. For higher-IOPS applications, SSDs are available in general purpose burst (gp2) and provisioned IOPS (io1) formats. SSDs are more expensive than their HDD counterparts and are optimized for low-latency, smaller-block-size operations. Most general purpose and mission-critical applications are just a matter of appropriately sizing one of these two SSD types. The other EBS storage options offered by AWS are throughput optimized (st1) and cold (sc1) HDD volumes. What is unique about these storage types is their maximum bandwidth per drive: 500MiB/s and 250MiB/s respectively. These HDD volumes require a larger block size and a preferably sequential workload to perform effectively, due to their painfully low IOPS limits (500 and 250 IOPS respectively).


At the end of the day, most AWS applications outside of big data/log processing will boil down to a choice between io1 and gp2 volumes. HDD volumes have their place, and their admirably high bandwidth limits might be appealing, but their extreme IOPS limitations demand careful consideration.
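To summarize the selection logic, here is a small illustrative sketch. The st1/sc1 caps are the ones quoted above; the gp2/io1 figures are assumptions from roughly this article’s vintage, and AWS revises all of these over time, so confirm against current documentation before relying on them.

```python
# Illustrative per-volume caps: (max IOPS, max throughput in MiB/s).
# st1/sc1 numbers come from the discussion above; gp2/io1 are assumed
# period-appropriate figures -- always check current AWS docs.
EBS_VOLUME_LIMITS = {
    "gp2": (10000, 160),
    "io1": (20000, 320),
    "st1": (500, 500),
    "sc1": (250, 250),
}

def plausible_types(required_iops: int, required_mib_s: int) -> list:
    """Return volume types whose per-volume caps cover the workload."""
    return [
        vtype
        for vtype, (max_iops, max_mib) in EBS_VOLUME_LIMITS.items()
        if required_iops <= max_iops and required_mib_s <= max_mib
    ]

print(plausible_types(4000, 100))  # small-block random IO -> ['gp2', 'io1']
print(plausible_types(300, 400))   # large sequential IO   -> ['st1']
```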

EBS Snapshots: A Lesson in Prudence


Let’s turn now to some of the functionality provided in tandem with EBS volumes. AWS offers powerful snapshot functionality in conjunction with EBS volumes, allowing the user to make point-in-time backups of a given volume. Better yet, these snapshots are incremental – your first snapshot of a volume will take time, but any subsequent snapshot of the same volume will only snapshot blocks that have been modified since the last snapshot. AWS accomplishes this through a bit of smoke and mirrors that ultimately boils down to two things: pointers and S3. In short, all snapshotted blocks of a given volume are stored in S3. When a new EBS volume is spun up from a snapshot, AWS will initialize the volume using the newest blocks from all snapshots associated with the original drive. This means that an end user can set up a rolling backup schedule for a given EBS volume, and be assured of both data integrity to a point in time, as well as speedy incremental snapshots.
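As a sketch of what such a rolling schedule might look like via the API, the snippet below takes a snapshot and prunes old ones with boto3. The volume ID and retention window are hypothetical; a production version would want error handling and probably tags to scope the pruning.

```python
# Rolling-backup sketch with boto3 (hypothetical volume ID and retention).
import boto3

ec2 = boto3.client("ec2")
volume_id = "vol-0123456789abcdef0"  # hypothetical
keep = 7                             # retain the seven newest snapshots

# Take a new point-in-time snapshot (incremental after the first one).
ec2.create_snapshot(VolumeId=volume_id, Description="nightly backup")

# Prune this volume's snapshots beyond the retention window. Deleting a
# snapshot in an incremental chain is safe: AWS keeps any blocks that
# newer snapshots still reference.
snapshots = ec2.describe_snapshots(
    Filters=[{"Name": "volume-id", "Values": [volume_id]}],
    OwnerIds=["self"],
)["Snapshots"]
for snap in sorted(snapshots, key=lambda s: s["StartTime"], reverse=True)[keep:]:
    ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
```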

Hearing all this, it is easy to get carried away. Incremental snapshots + speedy initialization of a drive from a snapshot + programmatic control of instances and volumes via AWS API calls = all sorts of possibilities, right? One might think of backup systems, easy setup / maintenance of slave nodes or QA environments, or perhaps a data refresh scheme using snapshots? Sounds good, but stop for a second and remember: THERE IS NO FREE LUNCH. Not here, not anywhere. We’re dealing with computers, and everything has a trade-off cost. So where are we paying here?

Remember when we mentioned how AWS stores snapshot blocks in S3? So how do we get those blocks back from S3, an entirely separate system, into a drive that we initialize from a snapshot? Indeed, AWS does you no favors (or perhaps one too many favors!) on this – when you navigate a volume spun up from a snapshot, all your files look like they’re there!

Or are they? Perhaps you might notice slight performance degradation when you access a file from your snapshot-initialized volume. The first time you open a file, your IO performance is poor; the second time, much better. This pattern repeats for every file on your volume.

There it is – no free lunch. When a drive is initialized from a snapshot (whose blocks are stored on S3) and then attached to an instance, AWS fetches blocks from S3 lazily. Until you access a given block on your disk, it is not physically on your volume. So the first time you access a block, you’re paying two costs: network latency to fetch the block from S3, and then IO latency for reading it from disk. And to make things worse, recall from earlier in this article that an EBS volume is essentially storage accessed through a network, which means you have even more layers of latency and bottlenecks to work through for an IO read!

Okay, so what can we do? Well, if your volume is small and/or you have time on your side, there are a few options. The main option suggested by AWS is to use the dd tool (3) to pull all blocks from your newly-initialized volume and redirect the output to /dev/null. In other words, read everything so that all blocks are pulled from S3 (the drive is “pre-warmed”), and then discard those reads. Now you’ve eliminated the S3 latency penalty.
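A minimal sketch of the same idea in Python follows. The device path is hypothetical, and reading a raw block device typically requires root, so handle with care.

```python
# Pre-warm a snapshot-initialized volume by reading every block and
# discarding the data -- equivalent in spirit to:
#   dd if=/dev/xvdf of=/dev/null bs=1M
# The device path is hypothetical; adjust for your instance.
CHUNK = 1024 * 1024  # 1 MiB sequential reads

with open("/dev/xvdf", "rb") as device:
    while device.read(CHUNK):  # read() returns b"" at end of device
        pass  # discard, as dd does with of=/dev/null
```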

Of course, depending on the size of your drive, this process can be prohibitively slow. It also requires extra manual work or automation from your engineers, which can backfire depending on the complexity of your implementation.

None of this is meant to deter you from making use of snapshots – they’re a wonderful, well-implemented tool. But as with any tool, they come with drawbacks and gotchas, and this is a biggie. So just remember: EBS snapshots are backed by S3, and blocks for snapshot-initialized volumes are fetched lazily.

AWS is Not Magic

Your best bet is to test the network connection directly via an I/O load test utility (such as SQLIO.exe or diskspd.exe in the Windows world). Generating an insanely high IO/bandwidth workload is very easy when you’re using the right utility, and it allows you to validate the EBS-to-instance pipe and disk limitations not only at the AWS layer (CloudWatch), but at the OS layer as well. You can easily find the network pipe limit in terms of bandwidth or I/O using an overly performant drive setup and a correctly tuned load tester, such as RAID 0 volumes and diskspd.exe (RAID 0 being what it is, of course, exercise caution!).
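For a feel of what such a utility measures, here is a toy random-read probe in Python. It is only illustrative: the file path is hypothetical, and buffered reads through the page cache will flatter the numbers unless the test file is much larger than RAM, which is exactly why purpose-built tools like diskspd or fio are the right answer for real benchmarking.

```python
# Toy random-read probe: counts 4 KiB reads per second against a large
# test file. Illustrative only -- use diskspd/fio for real benchmarks.
import os, random, time

PATH = "/data/testfile"  # hypothetical file, ideally much larger than RAM
BLOCK = 4096             # 4 KiB, a common block size for IOPS figures
SECONDS = 10

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY)
ops = 0
deadline = time.time() + SECONDS
while time.time() < deadline:
    offset = random.randrange(0, size - BLOCK)
    os.pread(fd, BLOCK, offset)  # random 4 KiB read at the chosen offset
    ops += 1
os.close(fd)
print(f"~{ops / SECONDS:.0f} random read ops/sec at {BLOCK}-byte blocks")
```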

AWS offers flexibility and power that far exceed those of your average data center, but (at risk of a groan) with great power does indeed come great responsibility. You MUST be prepared to play by AWS’s rules rather than conventional datacenter wisdom. This means a few things:

  1. Live and breathe AWS documentation.
  2. At the same time, believe nothing and test everything in documentation.
  3. Perform your proof-of-concepts at scale and as robustly as possible.
  4. Remember that AWS services are nothing more than complex abstractions – you will face bottlenecks from things AWS lists, and bottlenecks from things they never mention once. We refer you back to points one through three.

If nothing else, the intent of this article is to convey the complexity of moving to AWS through the lens of configuring I/O, and to offer some suggestions and principles for navigating the world of AWS. The payoff of moving to AWS can be wonderful, but AWS is not magic. Learn to navigate by its rules, and you’ll be rewarded – otherwise, you’ll wonder what all the hype was about!



Fighting the Dreaded Drive-By


Here Comes the Drive-By

A drive-by is an immediate demand on your attention. In the modern office, a drive-by can take many forms: a co-worker walks up to your desk to ask you a question, sends you multiple yo’s in Slack, spams your handle in a public channel, or waits to catch your eye and then waves for you to come look at something on their screen.

Drive-bys are usually innocent and followed by phrases like “I’m sorry, are you busy? I can come back later.” The person breaking your focus isn’t trying to be malicious. They need your help. Don’t you want to be a helpful coworker, a team player?

What’s So Bad About a Few Drive-Bys?

A lot! If you are a software engineer, or really anyone who needs deep focus and a large memory cache while working, you know why drive-bys are bad. It takes a long time to load information back into your brain; most studies find it to be over fifteen minutes. Have you ever had a day where you felt like you got nothing done? Of course! I’d wager the number of unscheduled interruptions on those days was pretty high.

Don’t Be the Drive-By

You can start fixing the problem by fixing your own behavior. Here are a few rules I follow to make sure I commit as few drive-bys as possible.

  • Always message people before stopping by their desk. Try to keep it to a single message; four Slack notifications for one request can be just as bad:
    I have a question
    It’s about that module you just added
    Mind if I stop by your desk?
  • If people are offline, assume they don’t want to be bothered. If you are at your desk and can handle messages, be online. If you are never online, it makes people less likely to follow this rule.
  • Only use @here or @handle in Slack when something is urgent (if you’re not using Slack, make the first rule “start using Slack”).
  • Use the daily stand-up to ask questions. Stand-up is your chance to share everything that someone might ask you that day. Be proactive. Take note of the types of questions you get asked throughout the day. Could you have addressed any of those questions in stand-up?

Becoming Drive-By Proof

Drive-bys will happen and are impossible to eradicate entirely. Here are some of the ways I protect myself against them.

  • TDD (Test Driven Development) and Stubbing: Use a test as an anchor to help you find your place again and reload your mental stack. Did you just make a test go green before the disturbance? What did you stub? Stubs are a great way to keep track of what still needs to be done. Have all your stubs been implemented? Then you are ready to move on to your next requirement. (A minimal sketch of this idea follows this list.)
  • Requirements: Hopefully you have a clear set of requirements. My favorite stories have long lists of requirements. I use (/) in Jira to check off each one as it’s completed. If your stories don’t have a requirements list, take the time to convert the requirements to an actionable list. It will almost always pay off.
  • Mental Notes: If someone interrupts you, take a minute to make a mental note of what you were doing and what you need to do once you resume. Even better, write it down. This has two benefits: First, it lets you store your mental cache, which hopefully makes getting back into the flow easier once you resume. Second, if you make this your routine, coworkers will understand they won’t get an immediate response from you and will have to wait before asking you a question. This might make them more likely to try other sources before interrupting you.
  • Central Repository of Information: If people know you are only going to tell them to look it up in Confluence, they will stop asking. If you are asked a question, take the time to put your answer into the repo. This can also be a good exercise for the question asker, but make sure what they create is clear and complete.
  • Headphones are Key: Luckily, I can code well while listening to music. This isn’t true for all engineers; I’ve known engineers who wear headphones with no music playing. This is an unfortunate last resort and usually comes across as pretty passive-aggressive.
  • Progress Reports: Hopefully you don’t need to check in with the team more often than stand-up, but some highly coordinated tasks require working closely with other team members. Make sure to announce your progress as you hit milestones and, more importantly, when you are at a good point to pause. Taking this initiative makes you less likely to be interrupted by questions like “Can I start working on X, did you finish Y?”
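Here is a minimal sketch of the stub-as-anchor idea from the first bullet, in Python with pytest-style tests (all names are hypothetical). After an interruption, running the tests points you straight back to where you left off: green tests mark finished requirements, and the stub marks the next one.

```python
# Hypothetical example: stubs mark what's left; the first failing test
# marks where to resume after an interruption.
def load_records(path: str) -> list:
    # Done: this went green before the interruption.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def parse_record(line: str) -> dict:
    raise NotImplementedError("stub: next requirement to implement")

def test_load_records_skips_blank_lines(tmp_path):
    data = tmp_path / "data.txt"
    data.write_text("a\n\nb\n")
    assert load_records(str(data)) == ["a", "b"]

def test_parse_record():  # red: resume here
    assert parse_record("id=1") == {"id": "1"}
```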


Clearly, most of these suggestions are overhead and become less necessary if you aren’t interrupted very often. There is no better solution than no interruptions!

TL;DR: Only interrupt people when truly necessary. Break tasks into small, manageable chunks and keep a list of completed items and to-dos. Before you are fully interrupted, unload your mental cache into a short note.

Want to dig into this topic more? There are great articles that dig further into the topics covered here and provide links to the full studies.

Have any other tips for preventing drive-bys or resuming work after an interruption? Let us know in the comments.

Why Do I Have to Test Your Shit? (The 5-Minute Sniff Test)


If you can’t spend 5 minutes testing your work, then don’t make me.

I’ve spent 18 years in the software engineering industry, and have been called a programmer, then a developer, and now a software engineer — next is “delivery demon,” mark my words. In that time, I’ve tested my fair share of code. I’ve worked as a part-time tester, tested code I’ve written, run QA groups, and poked my nose into systems I shouldn’t have.

There are many titles for those who professionally test software. You might be called a Tester, a Quality Assurance Analyst, or an Engineer in Test. You might be creating incredibly advanced automation scripts that produce beautiful reports via SonarQube, or you might be doing fully manual testing. Regardless of your title or approach, you are trying to discover issues before they make it to production, and users discover them.

A Tester is an Engineer’s partner in crime. Just like the editor of a book, you take what others (i.e. Engineers) have written and make it greater. As Steinbeck said of his editor Pat Covici, “He was much more than my friend. He was my editor. Only a writer can understand how a great editor is a father, mother, teacher, personal devil, and personal god. For 30 years Pat was my collaborator and my conscience. He demanded more than I had, and thereby caused me to be more than I should have been without him.”

Testers help Software Engineers go from Good to Great.

Over the years I’ve come to believe two things about testing:

First, what a tester does is hard. Their task requires the application of precision to a project’s specific requirements, unconventional thinking (box, what box?), channelling the end user, and a holistic/system-level view. Testers do an amazing job and have been critical to my success in releasing awesome software.

Second, a Tester’s job is made even harder by Software Engineers, because Engineers hand over shit code that doesn’t pass what I call the “5-Minute Sniff Test.”

The 5-Minute Sniff Test

As a Software Engineer, you chose a career building great software. You love solving fun and interesting problems using the latest frameworks, the coolest algorithms, and rainbows & unicorns. On a daily basis, “You Crush It!” However, all that screeches to a halt if you have to stumble through the muck of poorly defined (sometimes incomprehensible) requirements just to get to the heart of the problem.

Your progress is delayed, your time is wasted, and you’re pissed off! All of this could have been avoided if the Product Manager had just put a little more care into what they handed you. Doesn’t the PM take pride in their work? Do they even know what they are doing? Garbage in is garbage out.

When it is the Software Engineer’s turn to hand over their work, the code quality determines the efficiency, success, and morale of the person testing. The Tester expects the same level of quality as the Software Engineer expected from the PM. This doesn’t mean your code is bug free — we don’t want to be putting anyone out of a job — but it does mean your code meets a certain level of quality and professionalism.

How do we quantify the quality and professionalism expected? Enter the 5-Minute Sniff Test. Basically, if someone (Tester, QA, Spouse, or Priest) can find major and obvious bugs in the first five minutes of testing your code, then you didn’t spend enough time testing. You’ve wasted your time, disrespected the Tester’s time, and started to build a reputation for delivering sloppy work (question: do you even know what you’re doing?). You laid a rotten egg that smells bad, everyone knows it is you, and you should be embarrassed. Garbage in is garbage out.

A Lesson from the Front

That “I want to crawl under a rock” feeling

My team and I once built an awesome new feature in our iPad app that was technically challenging and had the coolest design, and I was very proud of the work we did. It rocked and we had crushed it!

With a Cheshire-cat grin, I presented the app to our CTO of Asia (a senior guy with about 10,000 people reporting to him). He started clicking everywhere except where I wanted him to click.

No. Don’t! STOP!

But he just kept clicking. The app crashed, data wouldn’t load, and mayhem ensued. He eventually handed the iPad back to me and grumbled, “This doesn’t look ready.”

He never actually saw the new features we had built. I was devastated, embarrassed, and wanted to crawl under a rock, but I had also learned an important lesson.

Take Care of Your Stuff

Don’t make someone else pick up after you.

Typically I’ll hear software engineers argue that it’s too hard to change their mindset from coding to testing.

That is why we have a testing team, right?


I’m not asking you to walk, rub your tummy, and chew gum at the same time (it is harder than you think). I’m asking you to literally spend five minutes testing what you built and what you should be proudly standing behind.

Just because the city has street cleaners, doesn’t mean you should toss your candy wrapper onto the ground.

Don’t make someone else pick up after you.

We need to do better.

And we can.

How To Perform Your Own 5-Minute Sniff Test

The 5-Minute Sniff Test is the first line of defense against the “I will just hand over anything” mentality.

It means before you hand over your feature, stop, take a deep breath, clear your mind, and for the next 300 seconds, test what you just built.

  • Outside the Box — 90 Seconds: Did you step outside of your engineering shoes and test from a user’s perspective? Does the feature look nice and feel right, aside from the requirements? I’ve painted myself into corners arguing a feature is valid because it is “what the requirements asked for.”
  • Check Your Unit Tests — 30 Seconds: Do all the unit tests pass, and is there full code coverage? You can’t test everything by hand, so automated unit tests are vital for code quality. The 30 seconds allotted isn’t to create the unit tests, but just to verify that they pass (a minimal sketch follows this list). You should also consider failing the build if the unit tests don’t pass, which would allow more time for the obvious.
  • The Obvious — 190 Seconds: Did you test all the obvious cases? For example, if the pressed button should cause unicorns to dance across the screen, be sure to test if pressing the button actually makes unicorns dance across the screen. Do I get any errors in the logs, do rainbows show even though that isn’t part of the requirements (although it should be, because that would be awesome), and what happens if I double-click the button? Review the requirements and make sure everything adds up. For example, did you catch that the total number of seconds listed here equals 310 instead of 300 seconds? Let’s correct “The Obvious” to 180 seconds.
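As promised above, here is a minimal, hypothetical sketch of what the 30-second unit-test check assumes is already in place: a suite you can run with one command (pytest here, but any runner works) and simply verify is green.

```python
# Hypothetical example -- the sniff test doesn't write these tests, it
# just runs them (e.g. `pytest`) and confirms everything passes.
def dancing_unicorns(button_pressed: bool) -> int:
    """Return how many unicorns dance when the button is pressed."""
    return 7 if button_pressed else 0

def test_button_press_summons_unicorns():
    assert dancing_unicorns(True) == 7

def test_no_press_no_unicorns():
    assert dancing_unicorns(False) == 0
```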

Five minutes is all it takes to make sure you’re not handing over shit. If all else fails, grab a colleague and ask them to sniff around.

Looking Ahead

In his book “Blink,” Malcolm Gladwell relates the story of interviewers making snap judgements about the quality of a candidate within the first five seconds of meeting them. The interviewer then continues to subconsciously reinforce that initial impression by subjectively weighing the candidate’s answers toward their bias.

In other words, your first impression is your last impression. I don’t believe I ever fully recovered my reputation with the CTO of Asia after that bad first impression.

Performing the 5-Minute Sniff Test to check your shit takes very little time, but it can be of tremendous benefit to you, your colleagues, and your company:

  • It helps you build a Reputation for excellence and quality,
  • It Respects your partner in crime, the Tester, and
  • It is more Efficient with everyone’s time.

When everything looks and feels right after the 5-Minute Sniff Test, go ahead and hand off the feature you are proud of and can stand behind. You’ll have a thankful Tester, and have built yourself a reputation as an awesome engineer.

Originally published at Simple Programmer