
Tell us a little about yourself.

I’m Joy, an engineering director here at Optimizely. Our team is responsible for managing the infrastructure and tooling that Optimizely engineers use to build products.

Can you explain what infrastructure is?

At its most basic, when we release software we need a server for it to live on and a network connection so people can reach it. Infrastructure provides both and acts as the base layer on top of which we build all of our applications. It can take different forms, such as infrastructure as a service (IaaS) like Amazon Web Services, Google Cloud, or Microsoft Azure; a more managed platform as a service (PaaS), like Heroku or Google App Engine; or your own servers running in your own data center.

How is it possible to experiment on infrastructure?

For a long time it was challenging to experiment on infrastructure, since it was all physical hardware that you would have to modify in person. However, as companies switch from owning their own servers to running in the cloud, it is now possible to treat infrastructure as code. We no longer need to physically plug in, rack, and stack servers every time we need a host to run our software on. We can launch virtualized servers (also known as instances) and containers via an API. We can also write code that does this for us and check it into a version control system such as Git. This gives us auditability, reproducibility, automation, and, best of all, the ability to run experiments, since we can easily add conditional logic to test out different configurations.
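
To make that concrete, here is a minimal sketch of launching an instance through a cloud API, using AWS's boto3 library as an example. The region, AMI ID, instance type, and tags are placeholders, not anything from our actual setup, and it assumes AWS credentials are already configured.

```python
# Infrastructure as code: launch a virtual server through a cloud API
# instead of racking hardware by hand. All values below are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder machine image
    InstanceType="t3.medium",          # easy to change and re-run as an experiment
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "team", "Value": "platform"}],
    }],
)

print(response["Instances"][0]["InstanceId"])
```

Because this lives in version control alongside the rest of our code, every change to it is reviewed, audited, and reproducible.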

What type of experiments can you run on your infrastructure, and how is it different from typical A/B testing?

Usually when people think about experimentation, they think about product and marketing: A/B testing, audience recommendations, and so on. Those are impactful, but it’s important to think about experimentation all the way down to the base of your stack. In the DevOps world, we can run experiments to answer questions like: how does this new instance type work for our application profile? What if we change the I/O layer from ephemeral disk to a durable EBS volume? We can measure our application’s performance baseline and then use experimentation to iterate from there, with low risk. These changes can have major impacts on quality and cost.
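
As a hedged sketch of what that conditional logic might look like: a small, deterministic slice of hosts gets a candidate configuration (a different instance type and a durable EBS volume) while the rest keep the baseline. The variant names, instance types, and the 10% split are illustrative, not our real configuration.

```python
# Assign hosts to an infrastructure experiment variant deterministically,
# so the same host always gets the same configuration.
import hashlib

INFRA_VARIANTS = {
    "baseline":  {"instance_type": "m5.large", "volume_type": "instance-store"},
    "candidate": {"instance_type": "c5.large", "volume_type": "gp3"},  # durable EBS
}

def config_for(hostname: str, candidate_percent: int = 10) -> dict:
    """Hash the hostname into a bucket and give a small slice the candidate config."""
    bucket = int(hashlib.sha256(hostname.encode()).hexdigest(), 16) % 100
    variant = "candidate" if bucket < candidate_percent else "baseline"
    return INFRA_VARIANTS[variant]

print(config_for("web-042.prod.internal"))
```

From there it is the same discipline as any experiment: measure the candidate against the baseline before rolling it out further.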

How does experimentation fit into the build and deploy process?

Building involves taking code and turning it into runnable software. Deploying means taking that runnable software artifact and installing it (on servers, instances, containers, or even as a function) so your customers can use it.

There are two experimentation enablers in common use today around the build and deploy process: Continuous Integration and Continuous Deployment (also known as CI/CD). The idea is that we make small changes all the time; as they are added, we rebuild our software (continuous integration) and can deploy that software (continuous deployment). This allows for rapid iteration on ideas. When we make small changes we can easily validate them: in staging, via automated testing, and even in production. It also makes it easier to roll back when there is an issue, because it’s much easier to reason about the impact of rolling back one or a few small changes than a massive collection of interlocking changes.
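
Here is a minimal sketch of the deploy half of that loop: ship one small change, verify it, and roll back automatically if verification fails. The deploy(), health_check(), and rollback() functions are placeholder stubs standing in for whatever tooling a team actually uses.

```python
# Deploy one small change, verify it, and revert if it misbehaves.
import time

def deploy(version: str) -> None:
    print(f"deploying {version}")               # stub: push the new artifact

def health_check(timeout_seconds: int) -> bool:
    time.sleep(1)                               # stub: poll real health endpoints
    return True

def rollback() -> None:
    print("rolling back to previous version")   # stub: redeploy the last good artifact

def release(version: str) -> bool:
    deploy(version)
    if health_check(timeout_seconds=60):
        return True        # small change validated in production
    rollback()             # one small change is easy to reason about and revert
    return False

release("1.42.0")
```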

How do teams limit risk when rolling out new features?

There are two types of metrics that can be affected when you roll out a new feature: your system performance metrics and your business metrics, and if you hurt the former, you often hurt the latter. Canary deployments are a common method of reducing system performance risk. The name comes from the idea of the poor “canary in a coal mine”: you roll a change out to a small percentage of your servers and watch how it affects things like CPU usage, latency, and error rate. This is similar to a feature rollout, which our own Full Stack product provides: you release a feature to a small audience and measure the impact on business metrics.

For infrastructure changes we can segment our rollouts server by server or container by container. We can monitor system performance using a tool like New Relic and see how we perform versus our baseline. Improved system performance means better user experience in your application, as well as reduced infrastructure spend.
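
A hedged sketch of the decision at the heart of a canary: compare the canary fleet’s error rate against the baseline fleet’s and decide whether to keep rolling out. The numbers, thresholds, and the canary_is_healthy() helper are illustrative; in practice the metrics would come from a monitoring tool like New Relic.

```python
# Promote or halt a canary rollout based on a simple error-rate comparison.
def canary_is_healthy(baseline_error_rate: float,
                      canary_error_rate: float,
                      allowed_regression: float = 0.005) -> bool:
    """Allow the canary a small absolute regression before failing it."""
    return canary_error_rate <= baseline_error_rate + allowed_regression

baseline = 0.010   # 1.0% errors on servers still running the old release
canary = 0.012     # 1.2% errors on the small slice running the change

if canary_is_healthy(baseline, canary):
    print("promote: roll the change out to more servers")
else:
    print("halt: roll the canary servers back to the previous release")
```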

What about feature flags? How do those fit into DevOps experimentation?

Feature flags are basically toggles that let you turn code on or off without doing a deployment or rollback. This means we don’t have to wait for a deployment process to complete to roll back a feature, and we can release code in a hidden fashion, decoupling our release cadence from our feature launches.

Feature flags are generally short-lived and retired after a feature is fully deployed, but they can be used as long-term ops toggles too! For instance, if you hit unusually high load, a feature flag can toggle off a non-vital and computationally expensive feature, like a Recommendations panel. Think of it as a manual circuit breaker.
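
A minimal sketch of that circuit-breaker idea, assuming an in-memory flag store; a real flag service (Full Stack, for example) would evaluate the flag per user and let you flip it from a dashboard rather than in code.

```python
# A feature flag used as a manual ops toggle for an expensive feature.
FLAGS = {"recommendations_panel": True}   # flip to False under heavy load

def render_page(user_id: str) -> str:
    page = f"core content for {user_id}"
    if FLAGS.get("recommendations_panel", False):
        page += " + recommendations panel"   # non-vital, computationally expensive
    return page

print(render_page("user-123"))
FLAGS["recommendations_panel"] = False       # circuit breaker: no deploy needed
print(render_page("user-123"))
```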

What are blue/green deployments, and how are those different from A/B testing?

Blue/green deployments allow you to experiment down to the server layer. You deploy a service by standing up a new release (green) next to the old one (blue) and then switching over. This also works well with canary deployments (a gradual shift of traffic from one to the other). The difference between blue/green deployments and A/B testing is that you are splitting traffic between servers instead of users.

With blue/green deployments, it’s easy to experiment by changing your server type (or instance type, or container) and seeing how your software behaves. You can upgrade your operating system or core libraries. You can upgrade your language version! And through all of this you can easily switch back to the last working release if something does go wrong.
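
A hedged sketch of the traffic switch itself: a router splits requests between the blue (current) and green (new) fleets, shifts traffic gradually, and can flip everyone back instantly. The Router class is illustrative; in practice this is usually a load balancer or DNS weight change.

```python
# Split traffic between a blue (current) and green (new) fleet by weight.
import random

class Router:
    def __init__(self) -> None:
        self.weights = {"blue": 1.0, "green": 0.0}   # all traffic on the old release

    def shift(self, green_share: float) -> None:
        """Move a fraction of traffic to the green (new) fleet."""
        self.weights = {"blue": 1.0 - green_share, "green": green_share}

    def route(self) -> str:
        return random.choices(list(self.weights), list(self.weights.values()))[0]

router = Router()
router.shift(0.10)    # canary-style: 10% of traffic to the new release
print(router.route())
router.shift(0.0)     # something went wrong: switch everyone back instantly
```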

Lastly, what are the biggest advantages of experimenting in your server layer?

Faster, safer improvements! You can observe issues before they hit your customers, roll back quickly when problems do arise, and experiment at every level: application, operating system, and hardware.

This turns every failure into a chance to learn, and reduces stress across engineering. We create happier engineers, a happier product team, and hopefully a happier bottom line, which is what we all want.