May 26, 2020

The 6 Types of Feature Flags You’ll Meet at Optimizely: Feature Rollouts

Tl;dr

🔒Access Level? Any engineer at Optimizely.
😬Risk Level? Depends on what is being rolled out.
👩‍💻Tests? Unit, Integration, End to End, and Manual (All the tests!)
⏰Lifetime? Short. Remove these as soon as features are adopted.

Feature Rollouts are a trunk-based development technique our developers use to practice Continuous Delivery. Teams check code into a shared integration branch (master) while keeping that branch deployable to production at any time. This gives engineering teams the flexibility to control when they release a feature rather than waiting for a code deployment: feature flags make deploying code an internal workflow decision and releasing a feature a business decision.

While this flexibility is nice, the real power is that engineers can control how much of their customer base can access a feature with the flick of a switch. Development teams can release a feature to a specific group of customers (for example, beta customers) or to a specific percentage of traffic (for example, 20% of production). If the feature is working, they can roll it out to everyone; if something goes wrong, they can roll the change back completely, which minimizes the risk of the release. Because these flags are used to deploy permanent features, the flag needs to be removed as soon as the feature has been adopted.
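To make the idea of releasing to a beta group or to 20% of traffic concrete, here is a minimal sketch of deterministic percentage bucketing. It is not Optimizely's actual bucketing algorithm: the SHA-256 hash, the 10,000-bucket range, and the names `bucket`, `is_enabled`, and `allowlist` are all assumptions chosen for illustration.

```python
import hashlib


def bucket(user_id: str, flag_key: str, buckets: int = 10_000) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given flag.

    Hypothetical helper: hashing flag_key together with user_id means the
    same user lands in the same bucket on every call for that flag.
    """
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(user_id: str, flag_key: str, rollout_percent: float,
               allowlist: frozenset = frozenset()) -> bool:
    """Return True if the feature should be shown to this user.

    Users on the allowlist (for example, a beta group) always get the
    feature; everyone else gets it if their bucket falls below the
    rollout percentage (with 10,000 buckets, 20% covers buckets 0..1999).
    """
    if user_id in allowlist:
        return True
    return bucket(user_id, flag_key) < rollout_percent * 100


# Usage: a beta customer sees the feature even at 0% rollout.
beta_customers = frozenset({"beta-customer-1"})
print(is_enabled("beta-customer-1", "new_dashboard", 0, beta_customers))
```

Because a user's bucket never changes, raising the percentage from 20 to 50 only adds users; nobody who already had the feature loses it mid-rollout. Setting the percentage to 0 acts as the rollback switch for everyone outside the allowlist.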

How do we use Feature Rollouts at Optimizely?

Rocket Launch

Photo by SpaceX

The Optimizely engineering team ships almost all of its features behind feature rollouts. There are a few exceptions (beautifying code, external documentation, internal documentation, small bug fixes), but the majority of our features are deployed behind a feature rollout.

The general lifecycle of a typical feature rollout is the following:

During the Technical Design Documentation phase, the Developer and Product Team discuss:
1. Design of new feature
2. Risk to roll back (see the section on risk below)
3. Deployment strategy (staged rollouts)
4. Removal strategy & exit criteria (who will remove the flag, and when)
The developer builds the feature:
1. Finishes coding and building the feature
2. Adds the Optimizely feature flag to gate the feature
3. Adds unit tests for the new component behind the feature flag
4. Adds any E2E tests (see the section on testing below)
The developer creates a pull request and links a feature flag Jira ticket to the PR for tracking and governance. The Jira ticket contains the following information:
1. Feature flag owner
2. Risk to roll back
3. Details on how the feature behaves with the flag on vs. off, and its repository location
4. What state the flag is currently in (In Dev, Testing, Rolling Out, Deployed, Ready to be Removed, Removed)
Product then uses the feature flag to roll out the new feature. Rollouts usually proceed in the following steps:
1. A targeted audience such as QA, or a beta customer group – This is our canary testing stage, which allows us to test in production.
2. Increase Traffic – We start rolling out the feature to a wider share of our customer base. Depending on the risk, size, and impact of the change, we may choose a smaller percentage (15-45%) or a greater one (60-80%). Sending traffic to the new feature lets us check logging and ensure the feature works correctly at scale. This staged process reduces the blast radius if the feature has any bugs.
3. Product is Fully Rolled Out – Our feature is now available to all customers.
4. Feature Flag is Removed – Once our exit criteria are met (e.g., the flag has been fully rolled out and in use for 30 days without incident), teams remove the flag. Some teams make removing the flag part of their requirements for “feature complete.” Others rely on our Feature Flag Removal day to prompt them to remove their flags.
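The tracking information in the Jira ticket can be pictured as a small record with a state field. The sketch below is only an illustration: the field names are hypothetical, and the rule that a flag moves strictly forward through the six listed states is an assumption, not a documented Optimizely policy.

```python
from dataclasses import dataclass
from enum import Enum


class FlagState(Enum):
    # The six states listed for the Jira ticket, in lifecycle order.
    IN_DEV = "In Dev"
    TESTING = "Testing"
    ROLLING_OUT = "Rolling Out"
    DEPLOYED = "Deployed"
    READY_TO_BE_REMOVED = "Ready to be Removed"
    REMOVED = "Removed"


@dataclass
class FlagTicket:
    """Hypothetical stand-in for the feature flag Jira ticket."""
    owner: str
    rollback_risk: str          # "High", "Medium", or "None"
    on_off_details: str         # what the flag does when on vs. off
    repository_location: str
    state: FlagState = FlagState.IN_DEV

    def advance(self) -> FlagState:
        """Move the flag to the next state (assumed forward-only)."""
        order = list(FlagState)  # Enum iteration follows definition order
        index = order.index(self.state)
        if index == len(order) - 1:
            raise ValueError("flag has already been removed")
        self.state = order[index + 1]
        return self.state


# Usage: a new ticket starts in "In Dev" and moves to "Testing".
ticket = FlagTicket("jane", "None", "on: new editor, off: old editor", "ui/editor")
ticket.advance()
print(ticket.state.value)  # prints "Testing"
```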

Sample pull request

Who is allowed to make changes to Feature Rollouts?

At Optimizely, multiple teams collaborate closely with each other, so all engineers have access to all of our feature rollout flags (multiple teams might be using a flag, or an audience condition for that flag). We do, however, audit Change History to keep oversight of who changed a flag, and we ask that anyone who makes a change update the linked Jira ticket with their reasoning. Combined, these two practices give engineering clarity over what is happening with the rollout.

What is the risk level for using a Feature Rollout?

We categorize risk for our rollouts on a scale of High, Medium, and None. These are the definitions we assign to each level:

High – DO NOT ROLL BACK. Bad things will happen. These are rollouts that cannot be rolled back without some catastrophic impact. For example, we may have a feature that modifies an entity and the backend datastore that modifies it. Turning off the feature may strand customers whose entities have already changed but who no longer have access to the backend datastore (the elements are now out of sync). Rolling back usually requires some patching for customers that had the feature rolled out to them.
Medium – These can be rolled back, but doing so is not recommended without first discussing it with the engineering owner. Depending on the reason, the owner may agree to let the flag be rolled back, but there will be some repercussions. For example, a beta customer may be using the feature, and removing it will cause customer friction.
None – This signifies that the flag can be turned off at any time. However, anyone who turns off another engineer’s flag must message them afterwards to give them a heads-up (at Optimizely, it’s a nicely worded Slack message, but this could be in Jira).

Funny turn off feature flag

These risk levels are used mostly during any engineering incident when an offending flag may have caused some service level disruption. The on-call engineer will usually check the risk level (High/Medium/None), and if it is None, they can disable the flag immediately. This helps keep our Mean Time to Recovery low as we can resolve our incidents without shipping code.
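The on-call decision described above can be summarized as a lookup from risk level to action. This is a rough sketch of the policy as stated in this post, not tooling we ship; `RollbackRisk` and `on_call_action` are hypothetical names.

```python
from enum import Enum


class RollbackRisk(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    NONE = "None"


def on_call_action(risk: RollbackRisk) -> str:
    """Return what the on-call engineer should do with an offending flag."""
    if risk is RollbackRisk.NONE:
        # Safe to turn off at any time; the owner gets a heads-up afterwards.
        return "disable the flag immediately, then message the owner"
    if risk is RollbackRisk.MEDIUM:
        # Rollback is possible but has repercussions, so ask first.
        return "discuss with the engineering owner before disabling"
    # HIGH: rolling back would strand customers or leave data out of sync.
    return "do not roll back; escalate to the engineering owner"


# Usage during an incident:
print(on_call_action(RollbackRisk.NONE))
```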

How does Optimizely test its deployment of Feature Rollouts?

Testing Pyramid

Testing Pyramid for Feature Rollouts (From the book: Ship Confidently with Progressive Delivery and Experimentation)

Optimizely uses a combination of testing strategies. Implemented together, these give us high confidence in feature coverage and release quality:

Unit Test – Every feature rollout is required to have unit tests that validate each component. Our unit tests are small and are not aware of whether a feature flag affects them. Each code path created has its own set of independent unit tests.
Integration Test – Since integration tests exercise combinations of components interacting with each other, we use mocks and stubs to control feature states. For example, if we want to test a certain API response to an external system, we can stub the response from the Optimizely SDK to force our test into a specific state mirroring the feature rollout’s path. This lets the test run deterministically.
End to End Test – These are the most expensive tests to run. Therefore, we focus only on the most critical variations, and we use automated audience targeting or whitelisting in our test runners to create a deterministic path to assert on.
Manual Verification – We use heuristic testing to determine which variations or paths are the highest risk and need to be checked. Manual verification is very similar to end to end testing in that we force-bucket our testers into one variation or another. We also expose information from the Optimizely logger and Notification Listener to them so they have better visibility into the behavior of the flag.
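As an illustration of the stubbing approach used in integration tests, the sketch below replaces the flag client with a `unittest.mock.Mock` so each test runs against a known flag state. The function under test, the `new_checkout` flag key, and the response shape are hypothetical; the stubbed method is named after the Python SDK's `is_feature_enabled`, but the stub never touches the real SDK.

```python
from unittest.mock import Mock


def build_checkout_response(flag_client, user_id: str) -> dict:
    """Hypothetical code under test: picks a code path based on the flag."""
    if flag_client.is_feature_enabled("new_checkout", user_id):
        return {"version": "v2", "express_pay": True}
    return {"version": "v1"}


def make_stub(enabled: bool) -> Mock:
    """Build a stand-in flag client that always returns the given state."""
    stub = Mock()
    stub.is_feature_enabled.return_value = enabled
    return stub


def test_flag_on_uses_new_path():
    stub = make_stub(True)
    response = build_checkout_response(stub, "user-1")
    assert response == {"version": "v2", "express_pay": True}
    # Confirm the code asked about the expected flag and user.
    stub.is_feature_enabled.assert_called_once_with("new_checkout", "user-1")


def test_flag_off_uses_old_path():
    stub = make_stub(False)
    assert build_checkout_response(stub, "user-1") == {"version": "v1"}
```

Passing the client in as an argument is what makes the stub easy to swap in; neither test depends on live flag configuration, so both paths stay covered regardless of the current rollout percentage.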

When does Optimizely remove its Feature Rollouts?

Feature rollouts are meant to deploy code that becomes permanent once we have high confidence that what we deployed works correctly. For every feature, we have the product team determine what the feature’s exit criteria are. Here are some examples:

X number of customers have begun using the feature without any major incidents
Y amount of traffic has flowed through the feature without any service degradation
Z number of days the feature has been exposed to the public without any incidents

Once the exit criteria have been met, individual teams implement their feature removal strategy. As an engineering organization, we track how many flags have been moved into the “ready to be removed” state and ensure that our feature rollout flags are constantly being removed.
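Exit criteria like the X, Y, and Z examples above lend themselves to a simple automated check. The following is a minimal sketch under assumptions: the field names are hypothetical, and treating every configured threshold as required (with any incident blocking removal) is one possible policy, not necessarily the one each team uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExitCriteria:
    """Thresholds set by the product team; None means 'not used'."""
    min_customers: Optional[int] = None      # the "X" example
    min_requests: Optional[int] = None       # the "Y" example
    min_days_exposed: Optional[int] = None   # the "Z" example


@dataclass
class RolloutStats:
    """Observed numbers for a fully rolled out flag (hypothetical)."""
    customers: int
    requests: int
    days_exposed: int
    incidents: int


def ready_to_remove(criteria: ExitCriteria, stats: RolloutStats) -> bool:
    """True when every configured threshold is met with no incidents."""
    if stats.incidents > 0:
        return False
    checks = [
        (criteria.min_customers, stats.customers),
        (criteria.min_requests, stats.requests),
        (criteria.min_days_exposed, stats.days_exposed),
    ]
    active = [(minimum, actual) for minimum, actual in checks if minimum is not None]
    # With no criteria configured, nothing has been agreed, so do not remove.
    return bool(active) and all(actual >= minimum for minimum, actual in active)


# Usage: 30 days of exposure required, 45 days observed, no incidents.
criteria = ExitCriteria(min_days_exposed=30)
stats = RolloutStats(customers=120, requests=50_000, days_exposed=45, incidents=0)
print(ready_to_remove(criteria, stats))  # prints True
```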

This is part of a series that dives into the different feature flag types we have here at Optimizely, as well as the engineering teams that implement them. See you next time as we dive deeper into the Bug Fix feature flag type.

If you’re looking to get started with or scale feature flags, check out our free solution: Optimizely Rollouts!

Are you using feature rollouts at your company? I’d love to hear about your experience! Twitter, LinkedIn