Operations

The Friday deploy that broke Monday

Shipped at 4:47pm. Worked in staging. Discovered the edge case at 6am Monday, when ten thousand customers found it first.

Illustration · Deimar Gutiérrez

A senior engineer merged a feature at 4:47pm on a Friday. It passed CI, it passed staging, it passed every manual check on the list. He closed his laptop and went into a weekend with the satisfaction of having shipped. The Slack channel where he had announced the deploy showed three thumbs-up reactions and one celebratory GIF.

At 6:03am Monday, the on-call engineer paged him. The feature had silently failed for every customer in the EU region — the test environment did not include the timezone that exposed the bug. Ten thousand customers had hit the error path over the weekend. The customer support queue had filled with tickets the on-call had been responding to since 4am. The senior engineer spent his Monday morning writing an incident report and his Monday afternoon writing the fix that he had told himself, on Friday, he would have plenty of time to write if anything went wrong.

The Friday push has a predictable cost profile. The deploy works often enough — eight times out of ten — that the engineer who shipped it ends Friday feeling like a professional. The two times out of ten it fails, the failure happens at the worst possible moment, with the worst possible audience, and the cost is paid by the on-call engineer rather than the deployer. The deployer learns no lesson because the lesson is being absorbed by someone else's weekend. The on-call engineer absorbs the lesson and rarely has the standing to translate it into a structural change to the deploy practice.

The popular fix is a no-Friday-deploy rule. The fix works for one quarter and erodes thereafter. Engineers route around it. "It's only a tiny change." "The feature flag is off, so it's not really a deploy." "The customer needs it before Tuesday." "The cherry-pick is technically a hotfix, not a deploy." The exceptions multiply. Within two quarters the rule is decorative. The engineering manager who tries to enforce it ends up policing edge cases that the team has invented specifically to route around the policy.

The version that holds is operator-relative rather than calendar-relative. Deploys happen in windows where there is a named, paid, on-call human who agreed to be paged for the consequences. Friday afternoon often is not that window, not because the day is cursed but because the operator pool shrinks for the next sixty hours. The conversation moves from "can I ship this?" to "who has agreed to handle this if it breaks?" Most Friday deploys do not survive that question, because the engineer doing the deploy is not the named on-call, and the named on-call has not been asked whether they consent to absorbing the deploy's risk over their weekend.
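As a rough illustration, the operator-relative rule can be written as a gate in deploy tooling. The sketch below is hypothetical throughout: the OnCallShift record, its consented_deploys field, and the deploy_allowed function are invented for this example and do not correspond to any real paging or deploy system. The sixty-hour default mirrors the weekend gap described above and is an assumption, not a recommendation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class OnCallShift:
    """Hypothetical record of one named on-call shift."""
    engineer: str
    start: datetime
    end: datetime
    # IDs of deploys this engineer has explicitly agreed to carry.
    consented_deploys: set = field(default_factory=set)


def deploy_allowed(deploy_id, deploy_time, shifts,
                   risk_window=timedelta(hours=60)):
    """Return (allowed, reason) for a proposed deploy.

    The deploy is allowed only if named on-call shifts cover the whole
    risk window without a gap, and every engineer on those shifts has
    consented to this specific deploy.
    """
    window_end = deploy_time + risk_window
    # Keep only shifts that overlap the risk window, earliest first.
    overlapping = sorted(
        (s for s in shifts if s.start < window_end and s.end > deploy_time),
        key=lambda s: s.start,
    )
    covered_until = deploy_time
    for shift in overlapping:
        if shift.start > covered_until:
            break  # nobody is named for part of the window
        if deploy_id not in shift.consented_deploys:
            return False, f"{shift.engineer} has not agreed to carry {deploy_id}"
        covered_until = max(covered_until, shift.end)
    if covered_until < window_end:
        return False, "no named on-call covers the full risk window"
    return True, "named on-call has consented"
```

Given a single weekend shift whose engineer has not acknowledged the deploy, the function refuses and names the person who still needs to be asked. The gate does not replace the conversation; it only makes skipping it visible.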

The operator-relative framing produces different behavior than the calendar rule. The engineer wanting to ship on Friday now has to identify who is on call, ask them explicitly, and explain the deploy's risk profile. Most engineers will not run this conversation for a non-urgent change. The conversation itself is the friction that produces the right behavior. The team that internalizes the conversation stops needing the calendar rule, because the conversation has become the deploy's natural precondition.

The underlying culture point is harder. Shipping is the end of the engineer's work and the beginning of the operator's. Companies that treat the two as a single job, where the engineer who deploys is also the person paged when the deploy fails, end up with healthier deploy practices, because the person who takes the risk is the person who bears its cost. The Friday push has a different cost when the deployer is the on-call. Most deployers, asked to also carry the on-call, push less on Friday afternoon.

Companies that have separated the two roles have to bridge the incentive gap with policy, and policy decays. The on-call engineer absorbs the cost without the authority to prevent it. The deploying engineer produces the risk without absorbing the consequence. The incentive misalignment is the cause; the calendar rule is the symptom. Fixing the calendar rule without fixing the incentive does not produce durable behavior change.

The structural fix, where the company can absorb it, is to rotate on-call across the deploying engineers. The week each engineer is on-call is the week they think most carefully about what they deploy. Deploys an engineer ships during their own on-call week tend to be more careful, more reversible, more thoroughly tested. Deploys shipped into someone else's on-call week tend to be the ones that introduce the weekend incidents. Once each engineer has been the on-call during another engineer's bad Friday deploy, the team's collective behavior around Friday deploys shifts. The lesson spreads through the team by experience rather than through policy.
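A minimal sketch of such a rotation, assuming one-week shifts that start on a Monday and a simple round-robin order. The function names and the engineer names in the usage note are hypothetical, and a real schedule would also need to handle swaps, holidays, and handoff times.

```python
from datetime import date, timedelta


def build_rotation(engineers, first_monday, weeks):
    """Assign one on-call week per engineer, round-robin.

    Returns a list of (week_start, engineer) tuples.
    """
    if not engineers:
        raise ValueError("rotation needs at least one engineer")
    return [
        (first_monday + timedelta(weeks=i), engineers[i % len(engineers)])
        for i in range(weeks)
    ]


def on_call_for(day, rotation):
    """Return the engineer whose on-call week contains `day`, or None."""
    for week_start, engineer in rotation:
        if week_start <= day < week_start + timedelta(days=7):
            return engineer
    return None


def deployer_carries_own_risk(deployer, deploy_day, rotation):
    """True when the person shipping is also the person who would be paged."""
    return on_call_for(deploy_day, rotation) == deployer
```

For a rotation built from ["ana", "ben", "chen"] starting Monday 3 June 2024, a deploy by "ben" on Friday 7 June lands in "ana"'s week, so deployer_carries_own_risk returns False. That is the case where the conversation with the on-call has to happen.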

The deeper habit is to recognize that Friday's deploy decision is a decision about other people's weekends. Most engineers, presented with the framing directly, defer the deploy without being asked. The framing requires being said explicitly: "The change you are about to deploy will be carried by the on-call through the weekend. The on-call did not agree to that." The sentence shifts the deploy from a personal completion to a team consequence. Some Friday deploys survive this framing. Most don't, and the ones that don't are the ones that should have been deferred.

The Friday push feels like closure. It is not closure. Closure is the deploy that worked through the weekend and was already proven on Monday morning. The version that shipped at 4:47pm Friday is, until further notice, a hypothesis. Treat it that way.

Before your next Friday deploy decision, ask:

  • Is the named on-call aware of this deploy, and have they explicitly consented to carry its risk this weekend?
  • If the deploy waited until Tuesday morning, what specifically would the company lose?
  • What is the reversal plan if the deploy fails between 8pm Friday and 7am Monday?
  • If I were the on-call this weekend, would I want this deploy to ship on Friday at 4:47pm?

The fourth question is the honest one. Most Friday deploys do not survive it. The deploys that survive it are the deploys that have already been coordinated with the on-call and have a clean reversal path. Those are the deploys worth shipping on Friday. The rest are the deploys that should ship on Tuesday, against an on-call who has been told and has consented.
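For teams that want the four questions written down rather than remembered, they can be sketched as a small pre-deploy check. Everything here is hypothetical: the FridayDeployAnswers record and the blockers function are invented for illustration, and treating an empty answer to the second and third questions as a reason to defer is an assumption about how a team might apply the list.

```python
from dataclasses import dataclass


@dataclass
class FridayDeployAnswers:
    """Hypothetical answers to the four pre-deploy questions."""
    on_call_consented: bool         # Q1: named on-call is aware and agreed
    cost_of_waiting: str            # Q2: what is lost by shipping Tuesday
    reversal_plan: str              # Q3: how to roll back over the weekend
    would_accept_as_on_call: bool   # Q4: the honest one


def blockers(answers):
    """Return the reasons to defer; an empty list means nothing blocks."""
    reasons = []
    if not answers.on_call_consented:
        reasons.append("on-call has not explicitly consented")
    if not answers.cost_of_waiting.strip():
        reasons.append("no concrete cost of waiting until Tuesday")
    if not answers.reversal_plan.strip():
        reasons.append("no reversal plan for the weekend")
    if not answers.would_accept_as_on_call:
        reasons.append("deployer would not accept this as the on-call")
    return reasons
```

The check cannot verify that the answers are honest. Its only use is to force each question to be answered in writing before the deploy goes out.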