Foetron · Microsoft cloud operations

Backups that have been restored at least once.

Microsoft 365, Azure, and on-prem workloads tiered by recovery objective. Backups configured, restores rehearsed on a calendar, decision rights agreed before the incident. Recovery is the thing that matters, so we test it.

Recovery · tiered SLA
Tier 0
RPO ≤ 15m · RTO ≤ 1h
Tier 1
RPO ≤ 1h · RTO ≤ 4h
Tier 2
RPO ≤ 4h · RTO ≤ 24h
Tier 3
RPO ≤ 24h · RTO ≤ 72h
Veeam · Azure Storage
4 tiers · rehearsed monthly to annually

recovery.tiers.ts

Tiered recovery posture

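```ts
// Recovery targets per tier, plus the cadence of each tier's restore rehearsal.
// Tier 0 and Tier 1 also get a quarterly failover drill on top of the monthly restore.
export const recoveryTiers = {
  tier0: { rpo: '15m', rto: '1h', rehearsal: 'monthly' },
  tier1: { rpo: '1h', rto: '4h', rehearsal: 'monthly' },
  tier2: { rpo: '4h', rto: '24h', rehearsal: 'quarterly' },
  tier3: { rpo: '24h', rto: '72h', rehearsal: 'annual' },
} as const;
```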
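To show how targets like these can be checked rather than just stated, here is a minimal sketch that grades one rehearsal result against one tier. It assumes durations are always written as an integer followed by `m` or `h` (the format used in the map), treats targets as inclusive (matching the ≤ in the tier table), and takes the recovery-point age and the restore time in minutes. `parseMinutes`, `meetsTier`, and the result shape are hypothetical names, not part of any Foetron tooling.

```ts
// Hypothetical helper: grades one rehearsal result against one tier's targets.
// Assumes durations use the map's format: an integer followed by 'm' or 'h'.
type TierTarget = { rpo: string; rto: string };

type RehearsalResult = {
  recoveryPointAgeMin: number; // data-loss window: incident time minus last usable backup
  restoreMin: number;          // elapsed time until the workload is usable again
};

function parseMinutes(duration: string): number {
  const match = /^(\d+)(m|h)$/.exec(duration);
  if (!match) throw new Error(`Unrecognised duration: ${duration}`);
  const value = Number(match[1]);
  return match[2] === 'h' ? value * 60 : value;
}

// Targets are inclusive upper bounds, so hitting the target exactly still passes.
function meetsTier(target: TierTarget, result: RehearsalResult) {
  return {
    rpoMet: result.recoveryPointAgeMin <= parseMinutes(target.rpo),
    rtoMet: result.restoreMin <= parseMinutes(target.rto),
  };
}

// Illustrative values only: a Tier 0 target against a 47-minute restore.
console.log(meetsTier({ rpo: '15m', rto: '1h' }, { recoveryPointAgeMin: 10, restoreMin: 47 }));
// { rpoMet: true, rtoMet: true }
```

Keeping RPO and RTO as separate results matters: a restore can land inside its time budget and still lose more data than the tier allows.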

What we keep seeing

The backup is fine. The restore is the problem.

Four patterns we see in tenants that have backups but have never actually exercised recovery.

  • 01

    Backup jobs green, restore unverified

    Every workload shows a healthy backup, but no one in the org has ever timed a full restore. The first restore is the incident.

  • 02

    M365 'Microsoft has backup' assumption

    Native retention is not backup. Deleted mailboxes age out, SharePoint version history is finite, Teams chat retention defaults are short. Surprises come at audit time.

  • 03

    RPO / RTO never agreed with the business

    IT picked a backup product. The business never said what an hour of downtime costs for which workload. Targets are unmeasured because they're unspoken.

  • 04

    Decision owner unknown at 2am

    When something goes wrong, the org spends the first hour deciding who decides: fail over, restore in place, declare an incident, or wait. The clock runs the whole time.

Tiered recovery

Four tiers, agreed once, exercised on a calendar.

Every workload lands on one of four tiers. The tier defines the RPO, the RTO, the mechanism, and how often we prove it works. This table is the working document — not a marketing artefact.

Tier 0

Workloads: Identity (Entra ID), financial systems of record, customer-facing transactional apps.
RPO: ≤ 15 min · RTO: ≤ 1 h
Mechanism: Azure Site Recovery + geo-redundant backup + Entra ID hardening
Rehearsal cadence: Monthly full restore + quarterly region failover

Tier 1

Workloads: Email, collaboration (Teams, SharePoint), ERP, line-of-business apps with daily transactions.
RPO: ≤ 1 h · RTO: ≤ 4 h
Mechanism: Veeam for M365 + Azure Backup + immutable retention
Rehearsal cadence: Monthly workload restore + quarterly failover drill

Tier 2

Workloads: Internal tooling, reporting databases, secondary file shares, knowledge bases.
RPO: ≤ 4 h · RTO: ≤ 24 h
Mechanism: Azure Backup with weekly synthetic full + offsite copy
Rehearsal cadence: Quarterly file-level restore

Tier 3

Workloads: Dev/test environments, archival, low-change reference data.
RPO: ≤ 24 h · RTO: ≤ 72 h
Mechanism: Azure Backup standard tier + long-term retention
Rehearsal cadence: Annual restore verification

Tier assignment is a business conversation, not an IT one. We facilitate it; the customer signs off.
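One way to make that conversation repeatable, as a sketch: the business owner states the most data loss and the most downtime a workload can tolerate, and the workload is proposed for the least demanding tier whose targets still fit inside both. The tier numbers below are the table's targets converted to minutes; `BusinessTolerance` and `proposeTier` are illustrative names, and the output is a proposal for sign-off, not a decision.

```ts
// Tier targets from the table, in minutes (inclusive upper bounds).
const TIERS = [
  { tier: 0, rpoMin: 15, rtoMin: 60 },
  { tier: 1, rpoMin: 60, rtoMin: 240 },
  { tier: 2, rpoMin: 240, rtoMin: 1440 },
  { tier: 3, rpoMin: 1440, rtoMin: 4320 },
] as const;

// Hypothetical input: what the business owner says the workload can tolerate.
type BusinessTolerance = { maxDataLossMin: number; maxDowntimeMin: number };

// Returns the least demanding (highest-numbered) tier whose RPO and RTO both fit
// within the stated tolerance, or null if even Tier 0 is not tight enough.
function proposeTier(tol: BusinessTolerance): number | null {
  // Walk from Tier 3 down to Tier 0; the first tier that fits is the cheapest one.
  for (const { tier, rpoMin, rtoMin } of [...TIERS].reverse()) {
    if (rpoMin <= tol.maxDataLossMin && rtoMin <= tol.maxDowntimeMin) return tier;
  }
  return null;
}
```

A workload that can lose two hours of data but must be back within eight lands on Tier 1, because the downtime tolerance rules out Tier 2. A `null` result means the business is asking for something tighter than Tier 0, which is a conversation rather than a configuration.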

Rehearsal cadence

What gets exercised, when, and what it produces.

A backup that has never been restored is a hypothesis. We turn it into a fact on a published schedule, and we keep the artefacts so the next auditor doesn't have to take our word for it.

Monthly

Muscle memory

  • File-level restore from Microsoft 365 backup (random sample)
    Lead: Foetron Ops
    Artefact: Restore log + checksum diff vs source

  • Tier 0 workload point-in-time restore to isolated subscription
    Lead: Foetron Ops + Customer IT lead
    Artefact: Timing report + restored-app smoke test result

  • Backup job health audit: coverage, success rate, retention drift
    Lead: Foetron Ops
    Artefact: Monthly coverage report shared with customer
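The checksum diff named as the file-restore artefact can be sketched as follows, assuming the sampled source files and their restored copies are both available as bytes (here, in-memory maps keyed by relative path). It uses Node's built-in `crypto.createHash('sha256')`; `checksumDiff` and its three result buckets are illustrative, not a Foetron or Veeam interface.

```ts
import { createHash } from 'node:crypto';

// SHA-256 of a file's bytes, hex-encoded.
function sha256(data: Uint8Array): string {
  return createHash('sha256').update(data).digest('hex');
}

type FileSet = Map<string, Uint8Array>; // relative path -> file contents

type ChecksumDiff = { missing: string[]; mismatched: string[]; extra: string[] };

// Compares a restored sample against its source by content hash.
// Hashes are used rather than raw byte comparison because the hash is what
// gets written into the restore log as evidence.
function checksumDiff(source: FileSet, restored: FileSet): ChecksumDiff {
  const diff: ChecksumDiff = { missing: [], mismatched: [], extra: [] };
  for (const [path, bytes] of source) {
    const copy = restored.get(path);
    if (copy === undefined) diff.missing.push(path);                  // not restored at all
    else if (sha256(copy) !== sha256(bytes)) diff.mismatched.push(path); // restored but altered
  }
  for (const path of restored.keys()) {
    if (!source.has(path)) diff.extra.push(path); // present only in the restore
  }
  return diff;
}
```

An empty result in all three buckets is the pass condition; anything else goes into the restore log alongside the paths involved.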

Quarterly

Failover muscle

  • Tier 0/1 region failover drill into secondary Azure region
    Lead: Foetron + Customer exec sponsor
    Artefact: Failover runbook with measured RTO + lessons captured

  • Identity recovery rehearsal: Entra ID + Conditional Access restore
    Lead: Foetron IR + Customer IT lead
    Artefact: Tested runbook for break-glass identity recovery

  • Cross-team tabletop on a chosen failure scenario
    Lead: Foetron facilitator
    Artefact: Tabletop notes + decision-gate updates

Annual

End-to-end proof

  • Full DR scenario: declared incident, decision gates, failover, cutback
    Lead: Foetron + Customer exec sponsor
    Artefact: End-to-end DR report; tier RTO/RPO targets re-validated

  • Tier 3 archival restore verification (samples)
    Lead: Foetron Ops
    Artefact: Restore log; retention policy reaffirmed

  • Tier assignment review with business owners
    Lead: Customer exec sponsor; Foetron facilitates
    Artefact: Updated tier register signed off by business

Cadence is published; rehearsals are calendared 12 months out. Skipped rehearsals are reported, not hidden.
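Calendaring 12 months out and reporting skips reduce to two small functions. This sketch assumes every rehearsal is pinned to the first of its month (UTC) and that completion records are keyed by the scheduled date they satisfy; both are simplifications for illustration, and `calendar` and `skipped` are hypothetical names.

```ts
type Cadence = 'monthly' | 'quarterly' | 'annual';

// How many months apart consecutive rehearsals of each cadence fall.
const MONTHS_BETWEEN: Record<Cadence, number> = { monthly: 1, quarterly: 3, annual: 12 };

// Rehearsal slots for the 12 months starting at `start`, as YYYY-MM-DD strings.
// Assumption: every slot is pinned to the first day of its month, in UTC.
function calendar(start: Date, cadence: Cadence): string[] {
  const slots: string[] = [];
  for (let offset = 0; offset < 12; offset += MONTHS_BETWEEN[cadence]) {
    // Date.UTC rolls month overflow into the next year (month 12 becomes January).
    const slot = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth() + offset, 1));
    slots.push(slot.toISOString().slice(0, 10));
  }
  return slots;
}

// A slot counts as skipped once its date is in the past and nothing completed it.
// ISO dates sort correctly as strings, so no Date parsing is needed here.
function skipped(slots: string[], completed: Set<string>, today: string): string[] {
  return slots.filter((slot) => slot < today && !completed.has(slot));
}

const quarterly = calendar(new Date('2025-01-15T00:00:00Z'), 'quarterly');
console.log(quarterly); // [ '2025-01-01', '2025-04-01', '2025-07-01', '2025-10-01' ]
console.log(skipped(quarterly, new Set(['2025-01-01']), '2025-05-10')); // [ '2025-04-01' ]
```

The skipped list is derived from the schedule rather than from what was run, which is what makes a missed rehearsal show up instead of silently disappearing.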

What we don't do

We don't promise nines we can't prove. We tier workloads, rehearse the restore, and tell you when the calendar slipped.

Recovery posture is operational, not aspirational. The proof is in last month's restore log.

Restore path

What actually happens between incident and cutback.

Five steps. Each one has an owner, a decision, an expected duration, and an artefact it produces. The path is the same whether it's a deleted folder or a region failure — only the scale changes.

  1. Incident

    Duration: 0–15 min
    Owner: Whoever notices · Foetron NOC paged
    Decision / artefact: Confirm scope; classify by tier

  2. Decision

    Duration: 10–30 min
    Owner: Customer exec sponsor
    Decision / artefact: Failover vs in-place restore vs wait-and-monitor

  3. Failover / Restore

    Duration: 30 min – 4 h
    Owner: Foetron Ops + Customer IT lead
    Decision / artefact: Execute runbook; pause at validation gate

  4. Validate

    Duration: 30–90 min
    Owner: Customer business owner
    Decision / artefact: Confirm workload usable; sign off before announcing recovery

  5. Cutback

    Duration: Scheduled
    Owner: Foetron + Customer exec sponsor
    Decision / artefact: Return to primary on a planned window; postmortem within 7 days

The decision step is the one most orgs skip. We rehearse it explicitly so the exec sponsor isn't the bottleneck on the night.
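Measuring the decision step means timestamping each gate. A minimal sketch, assuming one timestamp is captured per gate for steps 1 to 4 and that the RTO clock stops at business sign-off (step 4) rather than when the restore job finishes; that stopping point is an assumption, and `PathLog` and `timeline` are hypothetical names.

```ts
// Timestamps captured at each gate of the restore path (steps 1 to 4).
// Cutback (step 5) is left out because it runs on a planned window after recovery.
type PathLog = {
  incident: Date;    // 1: incident noticed, NOC paged
  decision: Date;    // 2: exec sponsor chooses failover, in-place restore, or wait
  restoreDone: Date; // 3: runbook executed, paused at the validation gate
  validated: Date;   // 4: business owner confirms the workload is usable
};

const minutesBetween = (from: Date, to: Date): number =>
  Math.round((to.getTime() - from.getTime()) / 60000);

// Assumption: recovery time runs from the incident to business sign-off.
function timeline(log: PathLog, rtoMin: number) {
  const recoveryMin = minutesBetween(log.incident, log.validated);
  return {
    decisionMin: minutesBetween(log.incident, log.decision),
    restoreMin: minutesBetween(log.decision, log.restoreDone),
    validateMin: minutesBetween(log.restoreDone, log.validated),
    recoveryMin,
    rtoMet: recoveryMin <= rtoMin,
  };
}
```

Splitting the total this way makes the decision step visible on its own: if `decisionMin` dominates the timeline, the fix is decision rights, not tooling.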

Recent recovery work

Verified restores, on the record.

One representative engagement. Customer name held back; outcomes signed off by their CIO.

Mid-market financial services · India

Ransomware tabletop turned into a real restore — and the runbook held.

Customer ran nightly backups but had never timed a restore. We tiered their workloads, set up Veeam for M365 + Azure Backup with immutability, and rehearsed Tier 0 monthly. Six months in, a contained ransomware event hit a file server. The Tier 1 restore ran inside its RTO, and the exec sponsor made the failover call inside 20 min because the decision rights were already mapped.

  • Tier 0 RTO achieved in production: 47 min (target ≤ 1 h)
  • M365 file restores verified monthly for 8 consecutive months
  • Decision gate from incident → failover call: 18 min on the live event
  • Postmortem produced 3 runbook updates, all merged within a week
  • Zero data loss on the affected file server (last backup 22 min before)

Mechanisms we operate

Tooling that's been picked because it has restored, not because it's been demoed.

Microsoft-native first; Veeam where M365 backup is the right answer; immutability everywhere it's available.


Next step

Request a recovery review.

We'll spend a session mapping your current workloads to tiers, identifying the gap between current and target RPO/RTO, and proposing a 90-day rehearsal calendar. No deck, no fear-selling.