Hybrid Cloud Architecture: When (and Why) to Mix AWS, Azure, and Bare Metal

The “all-in on cloud” era is maturing. After a decade of lift-and-shift migrations and cloud-first mandates, many organizations are discovering that the optimal infrastructure strategy is not a single platform — it is a deliberate mix of public cloud, private cloud, and bare metal, chosen workload by workload.

This is not a retreat from the cloud. It is an evolution toward cost-aware, performance-aware architecture. In this post, we lay out a practical framework for hybrid cloud decisions that we use with our consulting clients.

The Workload Placement Framework

Not every workload belongs in the same place. We evaluate placement across five dimensions:

Burst elasticity — Does this workload need to scale from 2 to 200 instances in minutes?
Data gravity — Where does the data live, and how much of it moves?
Compliance and sovereignty — Are there regulatory constraints on data location?
Steady-state cost — What does this workload cost at predictable, constant load?
Operational complexity — Does your team have the skills to run this themselves?

A workload that scores high on burst elasticity and low on steady-state volume (think: seasonal e-commerce, batch ML training) is a textbook public cloud candidate. A workload that runs 24/7 at predictable load with large data volumes (think: video transcoding, database servers, CI runners) is where bare metal or private cloud wins on cost — often dramatically.
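One way to make the framework concrete is to score each workload on the five dimensions and apply a simple rule. The sketch below is illustrative only: the 1-to-5 scale, the `Workload` fields, the thresholds, and the `suggest_placement` function are all assumptions made for this example, not a calibrated model.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Scores from 1 (low) to 5 (high) per framework dimension (hypothetical scale)."""
    name: str
    burst_elasticity: int   # how sharply it must scale up and down
    data_gravity: int       # how much data it stores and moves
    compliance: int         # how strict the data-location constraints are
    steady_state_load: int  # how constant and predictable the load is
    ops_capability: int     # how well the team can run it themselves


def suggest_placement(w: Workload) -> str:
    """Return a rough placement suggestion; thresholds are illustrative assumptions."""
    # Without in-house operational skills, self-hosting is risky regardless of cost.
    if w.ops_capability <= 2:
        return "public cloud"
    # Spiky workloads with little steady-state volume benefit most from elasticity.
    if w.burst_elasticity >= 4 and w.steady_state_load <= 2:
        return "public cloud"
    # Constant load, heavy data, or strict sovereignty favor infrastructure you control.
    if w.steady_state_load >= 4 or w.data_gravity >= 4 or w.compliance >= 4:
        return "bare metal / private cloud"
    return "review case by case"


if __name__ == "__main__":
    examples = [
        Workload("seasonal e-commerce", 5, 2, 2, 1, 4),
        Workload("CI runners", 2, 2, 1, 5, 4),
        Workload("internal wiki", 2, 2, 2, 3, 4),
    ]
    for w in examples:
        print(f"{w.name}: {suggest_placement(w)}")
```

A rule this coarse will not settle real placement debates, but writing the criteria down gives teams a shared starting point to argue from.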

The Real Cost of Cloud Compute

Consider a concrete example, a compute-heavy workload that needs 32 vCPUs and 128 GB of RAM and runs 24/7:

Provider	Instance	Monthly Cost (approx.)
AWS	m6i.8xlarge (on-demand)	~$1,115
AWS	m6i.8xlarge (1yr reserved)	~$700
Azure	Standard_D32s_v5 (on-demand)	~$1,100
Hetzner	AX162 dedicated	~$200
OVH	Advance-4 dedicated	~$250

Against on-demand pricing, that is a 4-5x cost difference for comparable compute; against a 1-year reservation it is closer to 3x. Over three years and ten such servers, the compute savings alone run from roughly $160,000 to $330,000. The question is: what do you lose, and what do you need to build yourself?
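To make the comparison easy to rerun with your own numbers, the sketch below computes cost ratios and multi-year savings from the approximate prices in the table. It looks only at compute spend, and it assumes (as a simplification) that each monthly price stays constant for the whole period. The dictionary keys and function names are ours, chosen for illustration.

```python
# Approximate monthly prices (USD) taken from the table above.
MONTHLY_COST = {
    "aws_on_demand": 1115,
    "aws_reserved_1yr": 700,
    "azure_on_demand": 1100,
    "hetzner_dedicated": 200,
    "ovh_dedicated": 250,
}


def savings(cloud: str, dedicated: str, servers: int = 10, months: int = 36) -> int:
    """Compute-only savings in USD from running on dedicated instead of cloud."""
    monthly_delta = MONTHLY_COST[cloud] - MONTHLY_COST[dedicated]
    return monthly_delta * servers * months


def cost_ratio(cloud: str, dedicated: str) -> float:
    """How many times more the cloud option costs per month."""
    return MONTHLY_COST[cloud] / MONTHLY_COST[dedicated]


if __name__ == "__main__":
    # Best case for bare metal: on-demand cloud vs. the cheapest dedicated server.
    print(savings("aws_on_demand", "hetzner_dedicated"))   # 329400
    # Conservative case: reserved cloud pricing vs. the pricier dedicated server.
    print(savings("aws_reserved_1yr", "ovh_dedicated"))    # 162000
    print(round(cost_ratio("aws_on_demand", "hetzner_dedicated"), 1))  # 5.6
    print(round(cost_ratio("aws_reserved_1yr", "ovh_dedicated"), 1))   # 2.8
```

The spread between the two cases is the useful part: the headline multiple depends heavily on whether you compare against on-demand or reserved pricing.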

What you lose is the bundle of managed services (load balancers, managed databases, IAM, monitoring) that cloud providers build into the platform. On bare metal, you have to provide these yourself, which is where your team’s operational maturity becomes the deciding factor.

Data Gravity: The Force That Shapes Architecture

Data gravity is the single most underestimated factor in cloud architecture. Data attracts compute, not the other way around. If you have 50 TB of data in Amazon S3, moving it out costs real money (egress fees) and real time. More importantly, every service that processes that data wants to be close to it.

This means your cloud strategy often starts with a question: where does your data need to live?

For organizations with large on-premises datasets — manufacturing telemetry, media assets, scientific data — it can be cheaper to bring compute to the data (bare metal or private cloud) than to move the data to the compute (public cloud). We covered infrastructure cost optimization in our IT infrastructure audit guide, and data gravity analysis is always one of the first things we examine.
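A quick back-of-the-envelope calculation shows why moving data is rarely free. The sketch below estimates the egress bill and the transfer time for a dataset of a given size. The per-GB price, link speed, and 70% sustained utilization are placeholder assumptions, not quoted figures; substitute your provider's current rates and your measured throughput.

```python
def egress_cost_usd(data_tb: float, price_per_gb: float) -> float:
    """Cost of moving data out, using decimal units (1 TB = 1000 GB)."""
    return data_tb * 1000 * price_per_gb


def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.7) -> float:
    """Days needed to move the data over a link at the given sustained utilization."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86400


if __name__ == "__main__":
    # 0.09 USD/GB is an assumed flat rate for illustration, not a quoted price.
    print(f"egress: ${egress_cost_usd(50, 0.09):,.0f}")
    print(f"transfer over 1 Gbps: {transfer_days(50, 1.0):.1f} days")
```

Under these assumptions, the 50 TB example costs a few thousand dollars and takes most of a week to move once, which is why recurring data movement between environments deserves its own line in the cost model.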

graph TD
    subgraph cloud["AWS / Azure"]
        EKS["Managed K8s"]
        RDS[("Managed DB")]
        S3["Object Storage"]
    end
    subgraph hosted["Hetzner / Colo"]
        CI["CI Runners"]
        DEV["Dev / Staging"]
        DATA["Data Processing"]
    end
    subgraph metal["Bare Metal"]
        PROX["Proxmox Cluster"]
        CEPH[("Ceph Storage")]
    end
    EKS <-->|Site-to-Site VPN| CI
    CI <-->|WireGuard| PROX

A Practical Hybrid Architecture

Here is a pattern we deploy frequently for mid-size companies:

┌──────────────────────────────────────────────────────┐
│               Public Cloud (AWS/Azure)               │
│  ┌────────────┐  ┌────────────┐  ┌────────────────┐  │
│  │ Web Apps   │  │ Serverless │  │ Managed DBs    │  │
│  │ (EKS/AKS)  │  │ Functions  │  │ (RDS/CosmosDB) │  │
│  └────────────┘  └────────────┘  └────────────────┘  │
└───────────────────────────┬──────────────────────────┘
                            │ Site-to-Site VPN / Direct Connect
┌───────────────────────────┴──────────────────────────┐
│           Private Cloud (Proxmox/Hetzner)            │
│  ┌────────────┐  ┌────────────┐  ┌────────────────┐  │
│  │ CI/CD      │  │ Dev/Stage  │  │ Data Pipeline  │  │
│  │ Runners    │  │ Envs       │  │ Processing     │  │
│  └────────────┘  └────────────┘  └────────────────┘  │
└──────────────────────────────────────────────────────┘

The key elements:

Customer-facing workloads in public cloud for elasticity and global edge presence
CI/CD infrastructure on bare metal or private cloud — build runners are compute-hungry, predictable, and cost-sensitive (we discussed this in our Kubernetes security hardening post in the context of securing build pipelines)
Development and staging environments on private cloud — they run 24/7 but do not need cloud-grade SLAs
Data-heavy processing colocated with data storage

The Networking Layer

Hybrid means networking complexity. You need reliable, secure connectivity between environments. The options, in increasing order of performance and cost:

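WireGuard or IPsec VPN tunnels: simple and portable, but traffic crosses the public internet, so latency and throughput vary
Cloud provider VPN gateways (AWS Site-to-Site VPN, Azure VPN Gateway): managed IPsec endpoints, still over the public internet
AWS Direct Connect / Azure ExpressRoute: dedicated circuits with the lowest and most consistent latency, at the highest cost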

For most mid-size deployments, a combination of WireGuard tunnels for management traffic and a cloud VPN gateway for production traffic works well. We manage the entire network fabric as code — a topic we explore in depth in our network engineering for DevOps post.

A Terraform snippet for an AWS VPN gateway:

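# Public IP of the on-premises VPN endpoint.
variable "on_prem_public_ip" {
  type = string
}

# On-premises network range reachable through the tunnel (hypothetical variable).
variable "on_prem_cidr" {
  type = string
}

# AWS side of the tunnel, attached to the VPC (aws_vpc.main is defined elsewhere).
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "hybrid-vpn-gw"
  }
}

# On-premises side of the tunnel.
resource "aws_customer_gateway" "on_prem" {
  bgp_asn    = 65000 # private-range ASN
  ip_address = var.on_prem_public_ip
  type       = "ipsec.1"

  tags = {
    Name = "on-prem-gateway"
  }
}

resource "aws_vpn_connection" "main" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.on_prem.id
  type                = "ipsec.1"
  static_routes_only  = true # static routing, no BGP session
}

# With static routing, the connection needs an explicit route to the on-premises range.
resource "aws_vpn_connection_route" "on_prem" {
  destination_cidr_block = var.on_prem_cidr
  vpn_connection_id      = aws_vpn_connection.main.id
}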

When to Repatriate from Cloud

Cloud repatriation is not about rejecting cloud. It is about recognizing that some workloads were placed in the cloud by default rather than by design. Signals that a workload is a repatriation candidate:

Steady, predictable load with no meaningful burst requirements
High egress costs dominating the bill
Performance-sensitive workloads where cloud multi-tenancy adds unwanted variance
Large data volumes that make cloud storage expensive at rest and in transit
Compliance requirements that are simpler to satisfy on infrastructure you fully control

The repatriation process itself should be incremental: move one workload at a time, run it in parallel with the cloud deployment, and validate performance and reliability before cutting over. Use the same infrastructure-as-code tooling (Terraform, Ansible, Packer) across both environments so that the operational model stays consistent. If you are considering Proxmox as your private cloud platform, community-maintained Terraform providers make this workflow straightforward.
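The "validate before cutting over" step works best when the pass criteria are written down in advance. The sketch below compares p95 latency and error rate between the existing cloud deployment and the repatriated candidate during a parallel run. The function names, the 10% latency allowance, and the 0.1% error budget are assumptions for illustration; your own SLOs should set the real thresholds.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def ready_to_cut_over(
    cloud_latencies_ms: list[float],
    candidate_latencies_ms: list[float],
    candidate_error_rate: float,
    max_p95_regression: float = 1.10,  # assumed: allow up to 10% slower p95
    max_error_rate: float = 0.001,     # assumed: at most 0.1% failed requests
) -> bool:
    """True if the repatriated deployment held up during the parallel run."""
    cloud_p95 = percentile(cloud_latencies_ms, 95)
    candidate_p95 = percentile(candidate_latencies_ms, 95)
    latency_ok = candidate_p95 <= cloud_p95 * max_p95_regression
    errors_ok = candidate_error_rate <= max_error_rate
    return latency_ok and errors_ok
```

Feeding both functions from the same platform-agnostic metrics source keeps the comparison fair, since both environments are then measured the same way.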

Making the Decision: A CTO Checklist

Before committing to a hybrid approach, ask:

Do we have (or can we hire) engineers who can operate bare metal or private cloud infrastructure?
Is our infrastructure-as-code maturity high enough to manage multiple platforms consistently?
Have we modeled the 3-year TCO including personnel costs, not just compute costs?
Do we have clear workload classification criteria, or will every team argue for their preferred platform?
Is our monitoring and observability stack platform-agnostic?

If you answer “no” to most of these, staying in a single public cloud with cost optimization (reserved instances, spot, right-sizing) may be the better path for now. Hybrid infrastructure done poorly costs more than single-cloud done well.
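The TCO question in the checklist is worth working through with numbers, because personnel costs can erase the compute savings. The sketch below adds a staffing term to a three-year infrastructure total. The monthly prices reuse the approximate figures from the cost table earlier in this post; the half-time engineer and the $120,000 fully loaded annual cost are assumptions picked purely for illustration.

```python
def three_year_tco(
    monthly_infra_usd: float,
    servers: int,
    extra_ops_fte: float = 0.0,
    fte_annual_usd: float = 0.0,
) -> float:
    """3-year total: infrastructure plus the additional staff needed to run it."""
    infra = monthly_infra_usd * servers * 36
    personnel = extra_ops_fte * fte_annual_usd * 3
    return infra + personnel


if __name__ == "__main__":
    # Ten servers on 1-year reserved cloud pricing, no extra operations staff.
    cloud = three_year_tco(700, servers=10)
    # Ten dedicated servers plus an assumed half-time engineer at 120,000 USD/year.
    bare_metal = three_year_tco(200, servers=10, extra_ops_fte=0.5, fte_annual_usd=120_000)
    print(f"cloud (reserved): ${cloud:,.0f}")
    print(f"bare metal + ops: ${bare_metal:,.0f}")
```

With these particular assumptions the two totals come out equal at $252,000, which illustrates the point: at small scale, the staffing line can cancel the hardware savings entirely, and the case for bare metal strengthens only as the server count grows relative to the people needed to run it.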

Conclusion

The hybrid cloud conversation has matured past ideology. It is no longer “cloud vs. on-prem” — it is “which workload, on which platform, and why.” The organizations getting this right are the ones treating infrastructure placement as a continuous, data-driven decision rather than a one-time migration project.

At robto, we help companies map their workloads to the right platforms, build the networking and automation layers that make hybrid work, and continuously optimize placement as costs and requirements evolve. If your cloud bill is growing faster than your business, it is worth having the conversation.