Hack+Startup: Mike Nolet, Founder and CTO at AppNexus
At Hack+Startup Brooklyn Beta Edition, Mike Nolet, Co-Founder and CTO at AppNexus, talks DevOps and how to overcome difficult processes at scale, like deployment and infrastructure.
AppNexus is an advertising technology company that works with some of the biggest companies in the world, including Microsoft, eBay, and Orange. At its core, AppNexus helps these businesses shift away from a traditional hand-sold advertising market, where publishers have to deal with advertisers directly, to one geared for the 21st century. In the process, AppNexus has raised over $65 million in venture capital from firms like Venrock, Khosla and First Round Capital (us!). Today, it has over 400 employees in New York City.
What Is DevOps Anyways?
Wikipedia describes DevOps as software engineering mixed with quality assurance and some technical operations.
As a CTO, you need to realize that DevOps is not just a new name for your sysadmin. If you’re just trying to sex up sysadmin, then call them site reliability engineers.
DevOps, on the other hand, requires dedicated focus and attention. As you scale, you will want internal tools that help you manage your production and you need to staff accordingly.
When AppNexus first began, they had their sysadmin team build code to automate jobs. But they found that most great sysadmins were not a good fit for this tool-building focus.
Here’s why: Most sysadmins function in an interruption-driven world — most of their days are spent constantly reacting to the changing state of production environments and the needs of engineering. This makes it very hard for them to find the large chunks of time required to build solid tools and software applications.
Given the unique role, finding dedicated people to do the DevOps job is hard. At AppNexus, they found that rotational programs were most efficient. They’d have engineers do Ops work and vice versa. Once each person had familiarized themselves with their counterpart’s role, the DevOps team was able to write good repeatable code (that had unit tests in SVN, even).
But getting the right people isn’t enough. There are two other keys to success in DevOps.
DevOps Guys on Pager
Don’t do it! Your DevOps people can’t be in the line of fire and take normal Ops tickets. If you’re asking them to build the best internal tools possible, they won’t be able to deliver if they’re simultaneously trying to knock out a never-ending queue of bugs. You need to split off a couple of people and have them be dedicated to this type of firefighting.
Promote the Tools
At AppNexus, they have a tech team of about 150. Even today it’s hard to get people to collaborate. They assumed that if they built this fantastic API-driven framework with scripting languages, everyone would just use it... but no one did. Instead, each team just wrote their own.
Have your DevOps team advocate for its tools internally and do individual ride-alongs with other teams. This helps the rest of the company understand exactly how the tools can benefit them. Ultimately, make sure the DevOps team is aligned with the teams it serves and that it is communicating and advocating for precisely what it is doing.
So now you have the right people in the room. They’re focused on the right problem. And they’re talking to the right people. Now what?
Treat Them Like an Engineering Team
While AppNexus’s DevOps Team is technically under the company’s IT branch, they’re treated like engineers in the sense that they’re held to the same level of quality and bug control. Otherwise, they couldn't be held accountable for a system that works across 3,000 servers. It's simple math: even if you improve your per-server accuracy from 90% to 99.9%, you should still expect something to go wrong on roughly one server every time you deploy across a thousand servers. Do your best to create testable code that actually works 100% of the time.
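To make the "simple math" concrete, here is a minimal sketch of the arithmetic. It assumes each server succeeds or fails independently with the same probability, which is a simplification not stated in the talk; the function names are illustrative only.

```python
def expected_failures(per_server_success: float, servers: int) -> float:
    """Average number of servers where a deploy step fails."""
    return (1.0 - per_server_success) * servers


def prob_any_failure(per_server_success: float, servers: int) -> float:
    """Chance that at least one server fails in a single deploy.

    Assumes failures are independent across servers (an assumption
    made for this sketch, not a claim about AppNexus's systems).
    """
    return 1.0 - per_server_success ** servers


if __name__ == "__main__":
    # Compare the two accuracy levels and fleet sizes mentioned in the text.
    for success_rate in (0.90, 0.999):
        for fleet_size in (1000, 3000):
            avg = expected_failures(success_rate, fleet_size)
            any_fail = prob_any_failure(success_rate, fleet_size)
            print(
                f"success={success_rate:.1%} servers={fleet_size}: "
                f"~{avg:.0f} failed servers on average, "
                f"P(at least one failure)={any_fail:.1%}"
            )
```

At 99.9% per-server accuracy the average works out to about one failed server per thousand-server deploy, and about three across 3,000 servers, which is why "pretty reliable" tooling still produces an incident on most deploys at this scale.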
Using Open Source Effectively
There’s simply too much to build everything from scratch: monitoring, metrics, production management, etc. So, it might seem obvious, but use open source tools effectively.
Some of the decisions AppNexus made were driven by the tools that were available five years ago when they started. For example, Maestro was built from scratch before Chef even existed. For each tool you require (production management, continuous deployment, monitoring, metrics), make sure to evaluate open source options before rolling your own. Some of the tools AppNexus uses are Nagios, Ganglia, Graphite, and Puppet (for config management and system-level stuff).
A caveat: If you do use these open source tools, treat them like production applications. Don’t just “yum install” and think you’re done. Automate spool-up, make sure you can roll releases and test changes in staging and test environments just like you would with production code.
One More Thing: Metrics
Too many companies consistently under-invest in metrics. AppNexus uses Graphite, which has worked really well: Each team has dashboards that show real status for everything the company has in production. The company is religious about utilizing these from the CEO down to the last engineer.