Hack+Startup: Mike Nolet, Founder and CTO at AppNexus
At Hack+Startup Brooklyn Beta Edition, Mike Nolet, Co-Founder and CTO at AppNexus, talks about DevOps and how to overcome processes that get difficult at scale, like deployment and infrastructure.
AppNexus is an advertising technology company that works with some of the biggest companies in the world, including Microsoft, eBay, and Orange. At its core, AppNexus helps these businesses shift away from a traditional, hand-sold advertising market, where publishers have to deal directly with advertisers, to one built for the 21st century. In the process, AppNexus has raised over $65 million in venture capital from firms like Venrock, Khosla and First Round Capital (us!). Today, it has over 400 employees in New York City.
What Is DevOps Anyway?
Wikipedia describes DevOps as software engineering mixed with quality assurance and some technical operations.
As a CTO, you need to realize that DevOps is not just a new name for your sysadmins. If you’re just trying to sex up the sysadmin title, call them site reliability engineers.
DevOps, on the other hand, requires dedicated focus and attention. As you scale, you will want internal tools that help you manage production, and you need to staff accordingly.
When AppNexus first began, they had their sysadmin team write code to automate jobs. But they found that most great sysadmins were not a good fit for this tool-building work.
Here’s why: Most sysadmins function in an interruption-driven world — most of their days are spent constantly reacting to the changing state of production environments and the needs of engineering. This makes it very hard for them to find the large chunks of time required to build solid tools and software applications.
Because the role is so unusual, finding people dedicated to the DevOps job is hard. At AppNexus, they found that rotational programs worked best: engineers did Ops work and vice versa. Once each person had become familiar with their counterpart’s role, the DevOps team was able to write good, repeatable code (checked into SVN with unit tests, even).
But getting the right people isn’t enough. There are two other keys to success in DevOps.
DevOps Guys on Pager
Don’t do it! Your DevOps people can’t be in the line of fire taking normal Ops tickets. If you’re asking them to build the best internal tools possible, they won’t be able to deliver while they’re also trying to knock out a never-ending queue of bugs. Split off a couple of people and dedicate them to this kind of firefighting.
Promote the Tools
At AppNexus, they have a tech team of about 150. Even today it’s hard to get people to collaborate. They assumed that if they built a fantastic API-driven framework with scripting languages, everyone would just use it... but no one did. Instead, each team wrote its own.
Have your DevOps Team act as internal advocates and do individual ride-alongs with teams; this helps the rest of the company understand exactly how the tools can benefit them. Ultimately, make sure the team is aligned properly and is clearly communicating and advocating for what it’s building.
So now you have the right people in the room. They’re focused on the right problem. And they’re talking to the right people. Now what?
Treat Them Like an Engineering Team
While AppNexus’s DevOps Team technically sits under the company’s IT branch, it’s treated like an engineering team: it’s held to the same standards of quality and bug control. Otherwise, it couldn’t be held accountable for a system that runs across 3,000 servers. It’s simple math: even if you raise your per-server success rate from 90% to 99.9%, you can still expect something to go wrong on every deploy across a thousand servers. Do your best to write testable code that actually works 100% of the time.
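To make that arithmetic concrete, here is a small sketch. It assumes each server’s deploy succeeds or fails independently with the same probability, which is a simplification (real failures are often correlated, e.g. one bad config breaking every box at once). Under that assumption, a 99.9% per-server success rate means about one expected failure per deploy across 1,000 servers, and roughly a 63% chance that at least one server fails; across 3,000 servers the expectation rises to about three.

```python
def expected_failures(success_rate: float, servers: int) -> float:
    """Expected number of servers where a deploy goes wrong, assuming each
    server independently fails with probability (1 - success_rate)."""
    return servers * (1.0 - success_rate)


def prob_any_failure(success_rate: float, servers: int) -> float:
    """Probability that at least one server fails on a single deploy,
    under the same independence assumption."""
    return 1.0 - success_rate ** servers


if __name__ == "__main__":
    for rate in (0.90, 0.999, 0.99999):
        for servers in (1_000, 3_000):
            print(
                f"success={rate:<8} servers={servers:>5}  "
                f"expected failures={expected_failures(rate, servers):7.2f}  "
                f"P(any failure)={prob_any_failure(rate, servers):.3f}"
            )
```

Under the same assumption, even 99.999% per server still leaves about a 3% chance that some server out of 3,000 fails on a given deploy, which is why the bar has to be code that works 100% of the time.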
Using Open Source Effectively
There’s simply too much to build everything from scratch: monitoring, metrics, production management, and more. It might seem obvious, but use open source tools effectively.
Some of AppNexus’s decisions were driven by the tools that were available five years ago, when the company started. For example, Maestro was built from scratch before Chef even existed. For each tool you need (production management, continuous deployment, monitoring, metrics), evaluate the open source options before rolling your own. Some of the tools AppNexus uses are Nagios, Ganglia, Graphite, and Puppet (for config management and system-level stuff).
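Part of what makes a tool like Nagios cheap to adopt is how small its extension point is: a check plugin is any executable that prints a one-line status (optionally followed by `|` and performance data) and exits with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. The sketch below is a hypothetical disk-usage check written to that convention; the thresholds and the `classify`, `check_disk_usage`, and `main` names are illustrative assumptions, not anything AppNexus describes using.

```python
import shutil
import sys

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct: float, warn: float, crit: float) -> int:
    """Map a usage percentage onto a Nagios status using simple thresholds."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk_usage(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Print one Nagios-style status line for `path` and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - cannot stat {path}: {exc}")
        return UNKNOWN
    if usage.total == 0:
        print(f"DISK UNKNOWN - {path} reports zero capacity")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    status = classify(used_pct, warn, crit)
    # Human-readable text first; performance data follows the "|"
    # as label=value[UOM];warn;crit;min;max.
    print(
        f"DISK {STATUS_NAMES[status]} - {used_pct:.1f}% used on {path} "
        f"| used_pct={used_pct:.1f}%;{warn:g};{crit:g};0;100"
    )
    return status


def main(argv=None) -> int:
    """Parse an optional path argument and run the check."""
    argv = sys.argv if argv is None else argv
    target = argv[1] if len(argv) > 1 else "/"
    return check_disk_usage(target)


# A real plugin script would finish with:
#     if __name__ == "__main__":
#         sys.exit(main())
# so that Nagios sees the status as the process exit code.
```

Nagios only cares about the printed line and the exit code, so a check like this can be written, reviewed, and unit-tested like any other small program.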
A caveat: if you do use these open source tools, treat them like production applications. Don’t just “yum install” and think you’re done. Automate spool-up, make sure you can roll out releases, and test changes in staging and test environments just as you would with production code.
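One way to act on “test changes in staging and test environments” is to make promotion an explicit, automated step: a release moves to the next environment only if it passed health checks in the previous one. This is a minimal sketch under stated assumptions; `promote`, the environment names, and the `deploy`/`healthy` hooks are hypothetical stand-ins for whatever your config management (Puppet, in AppNexus’s case) and monitoring actually expose.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical hooks: a real setup would call into config management
# and monitoring. Here they are just plain callables.
DeployFn = Callable[[str, str], None]   # (environment, version) -> None
HealthFn = Callable[[str, str], bool]   # (environment, version) -> healthy?


def promote(
    version: str,
    environments: Sequence[str],
    deploy: DeployFn,
    healthy: HealthFn,
) -> Tuple[List[str], bool]:
    """Roll `version` through `environments` in order.

    Stops at the first environment whose health check fails, so a bad
    release never reaches the environments after it. Returns the
    environments that received the release and whether every check passed.
    """
    deployed: List[str] = []
    for env in environments:
        deploy(env, version)
        deployed.append(env)
        if not healthy(env, version):
            return deployed, False
    return deployed, True


if __name__ == "__main__":
    installed = {}

    def fake_deploy(env: str, version: str) -> None:
        installed[env] = version

    def fake_healthy(env: str, version: str) -> bool:
        # Pretend release "r2" breaks in staging.
        return not (version == "r2" and env == "staging")

    envs = ["test", "staging", "production"]
    print(promote("r1", envs, fake_deploy, fake_healthy))  # (['test', 'staging', 'production'], True)
    print(promote("r2", envs, fake_deploy, fake_healthy))  # (['test', 'staging'], False)
    print(installed)  # production still runs r1
```

Because the hooks are plain callables, the same loop can be exercised with fakes in a unit test before it ever touches a real server.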
One More Thing: Metrics
Too many companies under-invest in metrics. AppNexus uses Graphite, which has worked really well: each team has dashboards that show the real status of everything the company has in production, and everyone, from the CEO down to the last engineer, uses them religiously.
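Getting data into Graphite is simple: its Carbon daemon accepts a plaintext protocol over TCP (port 2003 by default), one metric per line in the form `<metric path> <value> <unix timestamp>`. The sketch below assumes a Carbon server at the hypothetical host `graphite.example.internal`; the metric path and the `format_metric`/`send_metrics` helpers are also made up for illustration.

```python
import socket
import time
from typing import Iterable, Optional

CARBON_HOST = "graphite.example.internal"  # hypothetical host; point at your Carbon server
CARBON_PORT = 2003  # Carbon's default plaintext-protocol port


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Return one newline-terminated plaintext-protocol line: '<path> <value> <timestamp>'."""
    if not path or any(ch.isspace() for ch in path):
        raise ValueError(f"invalid metric path: {path!r}")
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: Iterable[str], host: str = CARBON_HOST, port: int = CARBON_PORT) -> None:
    """Open a TCP connection to Carbon and write all lines in one batch."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    line = format_metric("teams.adserver.requests_per_sec", 1234.5)
    print(line, end="")
    # send_metrics([line])  # enable once CARBON_HOST points at a real Carbon listener
```

Once lines like these are flowing, each team can build its Graphite dashboards on top of the metric paths it owns.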