
Resilient Architecture Pattern: Tier 4

📊 Tier 4: Primary + Recovery Site (Backup/Restore Only)

Tier 4 represents the simplest and lowest-cost resiliency model. It relies entirely on backups, image-based archives, or infrastructure-as-code rehydration to restore services in an alternate location following a primary site failure. There is no continuous replication, no warm standby, and no pre-staged compute ready for failover.

This model is intended for workloads where extended downtime is acceptable and where cost, simplicity, or workload characteristics make higher tiers unnecessary.

The interactive architecture diagram above shows the Tier 4 configuration with scheduled backup transfers to cold storage and infrastructure-as-code templates for manual recovery processes.


Purpose and Positioning

Tier 4 focuses on recoverability rather than continuity. It provides a mechanism to rebuild or restore workloads after a site outage using validated backups or redeployment processes.

Compared to Tier 3:

  • There is no ongoing replication
  • RTO can range from hours to days depending on complexity
  • RPO is bounded by the time elapsed since the most recent successful backup
  • Recovery operations require significant manual steps or orchestration tooling

Tier 4 is suitable for systems that are important enough to require off-site backups, but not critical enough to justify active replication or warm standby infrastructure.


Architecture Summary

Primary Site

  • Hosts all production workloads
  • Generates scheduled backups, snapshots, or images
  • Stores backup metadata required for recovery
  • Acts as the authoritative source of data and configuration until a disaster occurs

Recovery Site

  • Contains storage for off-site backups or object storage archives
  • May have defined but unallocated compute capacity
  • No running workloads under normal conditions
  • Used only during DR events or periodic restoration tests
  • Often cloud-based due to cost flexibility

Traffic Flow and DNS Behavior

Normal Operations

  • DNS points exclusively to the primary site
  • No load balancing or traffic distribution
  • Recovery site endpoints are not provisioned or routable

Failover Operations

  • Workloads are restored or redeployed at the recovery site
  • DNS records are manually updated after platform restoration
  • Recovery order is dictated by runbooks and restore dependencies
  • DNS propagation time contributes directly to the RTO
  • This tier assumes no automated detection or promotion
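
Because recovery order follows restore dependencies, a dependency map can be turned into a restore sequence mechanically. The sketch below is illustrative only: the component names and the `DEPENDENCIES` map are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each component lists what must be restored first.
DEPENDENCIES = {
    "network": set(),
    "identity": {"network"},
    "database": {"network"},
    "app": {"database", "identity"},
    "dns_cutover": {"app"},
}


def recovery_order(dependencies):
    """Return a restore order in which every component follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, component in enumerate(recovery_order(DEPENDENCIES), start=1):
        print(f"{step}. restore {component}")
```

A cyclic dependency map raises `graphlib.CycleError`, which is a useful signal that the documented dependencies need correcting before a DR event.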

Backup and Rehydration Model

Backup Types

Tier 4 recovery is based on one or more of the following:

  • VM-level backups or image archives
  • Database dumps (full + incremental if needed)
  • Application configuration exports
  • Filesystem-level snapshots
  • Object-storage–based backup repositories
  • Infrastructure-as-code definition sets for redeployment

RPO Characteristics

  • Actual data loss equals the time elapsed since the most recent backup completed
  • Nightly backups result in up to roughly 24 hours of potential data loss
  • More frequent backups reduce RPO at additional cost
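
The relationship between backup schedule and data loss can be checked with simple arithmetic. The following is a minimal sketch; the function names and the example timestamps are assumptions for illustration, not part of any backup product.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup_completed: datetime, failure_time: datetime) -> timedelta:
    """Data written after the last completed backup is lost when the site fails."""
    if failure_time < last_backup_completed:
        raise ValueError("failure_time must not precede the last completed backup")
    return failure_time - last_backup_completed


def meets_rpo(backup_interval: timedelta, rpo_target: timedelta) -> bool:
    """In the worst case the failure occurs just before the next backup completes,
    so the backup interval must not exceed the RPO target."""
    return backup_interval <= rpo_target


if __name__ == "__main__":
    # Hypothetical nightly backup finishing at 01:00, failure at 23:00 the same day.
    loss = data_loss_window(datetime(2024, 1, 2, 1, 0), datetime(2024, 1, 2, 23, 0))
    print(f"Data loss window: {loss}")
    print("Nightly backups meet a 4-hour RPO:", meets_rpo(timedelta(hours=24), timedelta(hours=4)))
```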

Rehydration Requirements

To restore service at the recovery site:

  • Deploy base infrastructure
  • Provision compute resources
  • Restore application data and configurations
  • Reconnect identity and security controls
  • Validate application behavior before DNS changes

Time required varies widely depending on system complexity and automation maturity.
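
A rough RTO estimate can be built by summing the rehydration steps above and adding the DNS TTL, since clients may keep resolving the old address until cached records expire. This is a sketch under stated assumptions: every duration below is a placeholder to be replaced with figures measured during restore tests, and the steps are assumed to run sequentially.

```python
from datetime import timedelta

# Placeholder durations for each rehydration step (hypothetical values).
STEP_DURATIONS = {
    "deploy base infrastructure": timedelta(hours=2),
    "provision compute resources": timedelta(hours=1),
    "restore application data and configurations": timedelta(hours=6),
    "reconnect identity and security controls": timedelta(hours=1),
    "validate application behavior": timedelta(hours=2),
}


def estimate_rto(step_durations, dns_ttl: timedelta) -> timedelta:
    """Sum sequential step durations and add the DNS TTL as propagation delay."""
    return sum(step_durations.values(), timedelta()) + dns_ttl


if __name__ == "__main__":
    print("Estimated RTO:", estimate_rto(STEP_DURATIONS, dns_ttl=timedelta(hours=1)))
```

Comparing this estimate with the RTO agreed with stakeholders shows whether more automation or a shorter DNS TTL is needed.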


Failure Scenarios and Outcomes

Primary Site Loss

  • Administrators initiate the DR runbook
  • Rebuild or restore infrastructure in the recovery location
  • Restore applications, databases, and dependencies
  • Validate health before DNS cutover
  • RTO can range from several hours (well-automated) to multiple days

Partial Failure

  • Individual application stacks can be restored independently
  • Recovery complexity depends on backup granularity and dependency mapping

Data Corruption Events

  • Restore from the last known good backup taken before the corruption occurred
  • RPO depends on how frequently restore points were taken
  • May require selective restoration rather than full environment rebuild
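
Selecting a restore point after a corruption event amounts to picking the newest backup taken before the corruption began. The sketch below assumes the start time of the corruption is known or can be estimated; `last_good_backup` and the sample timestamps are hypothetical.

```python
from datetime import datetime
from typing import Iterable, Optional


def last_good_backup(restore_points: Iterable[datetime],
                     corruption_started_at: datetime) -> Optional[datetime]:
    """Return the newest restore point taken strictly before the corruption began,
    or None when every available restore point is already affected."""
    candidates = [point for point in restore_points if point < corruption_started_at]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    # Hypothetical nightly restore points at 01:00.
    points = [datetime(2024, 3, 1, 1), datetime(2024, 3, 2, 1), datetime(2024, 3, 3, 1)]
    print(last_good_backup(points, corruption_started_at=datetime(2024, 3, 2, 14)))
```

A `None` result means retention is shorter than the time taken to detect the corruption, which argues for longer retention of restore points.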

Operational Considerations

Backup Integrity and Testing

Regular restore testing is mandatory.

Test coverage should include:

  • Full platform restore
  • Single application restore
  • Backup validation checks
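
One simple form of backup validation check is comparing a backup file's checksum with the digest recorded when the backup was written. The sketch below assumes a SHA-256 digest was stored alongside the backup metadata; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: Path, expected_sha256: str) -> bool:
    """Return True when the file's digest matches the digest recorded at backup time."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A matching checksum only shows that the file was not altered in storage or transit. It does not prove that the backup can be restored, so it complements the full and single-application restore tests rather than replacing them.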

Automation Opportunities

  • Automated infrastructure provisioning (Terraform, Ansible)
  • Automated database restore pipelines
  • Automated dependency validation
  • Predefined recovery landing zones (cloud or on-premises)

Documentation and Runbooks

Tier 4 success depends heavily on:

  • Accurate system inventory
  • Documented backup locations
  • Dependency maps
  • Step-by-step recovery procedures
  • Clear RTO expectations agreed with stakeholders

Appropriate Workloads

Tier 4 is suitable for:

  • Non-critical internal tools
  • Batch workloads with restart capability
  • Development and test environments
  • Reporting and analytics systems where data can be regenerated
  • Archived or legacy workloads with minimal change frequency
  • Systems with low usage and a high tolerance for downtime, but where data preservation is essential

Unsuitable Workloads

Avoid Tier 4 for:

  • Customer-facing services with SLA-based uptime
  • Any workload requiring low RTO or RPO objectives
  • Real-time or transactional systems
  • Identity, authentication, or shared services
  • Platforms that are difficult to redeploy due to heavy statefulness or complex dependencies

Risks and Tradeoffs

  • Long RTO due to rebuild process
  • RPO dependent entirely on backup frequency
  • Recovery site may not be capacity-ready
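  • Restore failures may occur if backups are incomplete or inconsistent with one another (for example, a database dump and a filesystem snapshot taken at different times)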
  • Human error risk is higher due to manual procedures
  • Dependency mismatches may surface only during DR restoration

Summary

Tier 4 is the simplest and most cost-efficient resiliency pattern, relying on backups and rehydration rather than replication. It provides a structured recovery option for workloads that do not justify multi-site infrastructure. When combined with disciplined backup validation, clear runbooks, and modern automation, Tier 4 can be reliable for systems with flexible recovery requirements.