
Azure Capacity: Tricks and Tips You Should Know

Azure · Architecture · FinOps · Infrastructure

Azure has one of the largest global footprints of any cloud provider, so it’s easy to assume capacity is basically unlimited. It isn’t, and you’ll usually find that out at the worst possible moment: mid-deployment, post-patching window, or mid-incident.

Seven years working on Azure for a European ISV leaves you with a few hard-won lessons. Here are the ones worth writing down.

Capacity Reservations vs. Reserved Instances

Most people know Reserved Instances: commit for one or three years and get a discount, up to 72% versus pay-as-you-go on some SKUs. What gets missed is that an RI is purely a billing construct. It doesn’t hold a physical slot anywhere. If a region is under pressure when you need to spin something up, your RI won’t help you.

On-Demand Capacity Reservations are different. They hold capacity for a specific SKU in a specific region or zone so it’s there whenever you need it, backed by the Azure VM SLA. You pay for that slot whether anything’s running on it or not. The Azure Infrastructure blog published a good deep-dive on this recently: Demystifying On-Demand Capacity Reservations. I particularly like the parking garage analogy; worth a read.

For DR workloads, the answer is usually both. Capacity Reservation so the slot is actually there when you need it. Reserved Instance so you’re not paying full PAYG rates while it sits idle. A failover VM can’t depend on capacity being available by luck.

One gotcha worth knowing: a “regional” capacity reservation doesn’t mean what it sounds like. Regional reservations don’t provide zone-level resilience or automatic failover guarantees. If you need resilience across zones, create separate reservations per zone.

Capacity reservations can also be shared across subscriptions, useful for lending DR capacity to pre-prod temporarily, or pooling reserved slots across environments without duplicating the cost.
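For the DR pattern, the CLI flow is short enough to sketch. All names, the region and the SKU below are illustrative, and exact flags can vary by CLI version, so treat this as a starting point rather than a recipe:

```shell
# Sketch: hold a DR slot in a specific zone, then attach the VM to it at
# failover time. Resource names, region and SKU are placeholders.

# 1. Create a capacity reservation group pinned to zone 1
az capacity reservation group create \
  -n crg-dr -g rg-dr \
  --location swedencentral \
  --zones 1

# 2. Reserve two Standard_D8s_v5 slots in it -- billing starts immediately
az capacity reservation create \
  -c crg-dr -n cr-dr-d8sv5 -g rg-dr \
  --sku Standard_D8s_v5 \
  --capacity 2 \
  --zone 1

# 3. At failover, allocate the VM against the reserved slot
az vm create \
  -n vm-dr -g rg-dr \
  --image Ubuntu2204 \
  --size Standard_D8s_v5 \
  --zone 1 \
  --capacity-reservation-group crg-dr
```

Pair the reservation with an RI covering the same SKU and scope, and the idle slot bills at the discounted rate rather than full PAYG.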

Quota Is Not Capacity Either

Quota is the ceiling on what you’re allowed to request in a given subscription and region. Capacity is what’s physically available in the infrastructure. You can have 1000 vCPU quota in West Europe and still fail to allocate a Standard_D16s_v5 because that SKU is constrained there right now. Two different problems, and the error codes tell you which one you’re dealing with:

  • Quota exceeded: OperationNotAllowed, ResourceQuotaExceeded
  • Capacity failure: AllocationFailed, ZonalAllocationFailed, NotAvailableForSubscription

There’s a second thing quota requests do that most people don’t realise: they appear to act as one of the demand signals to Microsoft’s capacity planning team. The process broadly flows from usage data → demand forecasting → procurement → hardware deployment. So raising quota early isn’t just about unblocking yourself; it’s how Microsoft knows to provision more capacity for your region and SKU. Sitting on default quota and hoping for the best doesn’t send that signal.

Raise quota before you need it. Request quota in your secondary region too: it costs nothing and gives you a real fallback if you ever need one.

When an allocation fails: check the quota blade first, then try a different availability zone. One of those two usually resolves it without a support ticket.
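If you automate that zone fallback, a small wrapper keeps the retry logic in one place. This is a sketch with illustrative names; try_create wraps the actual az vm create call so the loop itself can be exercised without touching Azure:

```shell
# Sketch: attempt allocation in each zone before giving up. Names and
# region are placeholders; try_create wraps the real az vm create call.
try_create() {  # $1 = zone
  az vm create \
    -n app-vm -g rg-app \
    --location swedencentral \
    --size Standard_D8s_v5 \
    --image Ubuntu2204 \
    --zone "$1" \
    --only-show-errors
}

allocate_with_zone_fallback() {
  for zone in 1 2 3; do
    if try_create "$zone"; then
      echo "allocated in zone $zone"
      return 0
    fi
  done
  echo "allocation failed in all zones" >&2
  return 1
}

# Usage: allocate_with_zone_fallback || <fall back to another SKU or region>
```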

Subscription Vending and the Default Quota Problem

This one catches nearly every enterprise landing zone team eventually.

Under MCA, every new subscription starts with default quota, regardless of how long your billing account has existed, how many subscriptions sit under it, or what you spend with Microsoft. Azure largely treats each new subscription like a new customer from a quota perspective, though billing and account context does seem to matter in some backend processes (especially when escalating quota requests).

There’s another layer to this: when a region is under capacity pressure, Microsoft may restrict newly created subscriptions from allocating there at all, to protect existing customers’ ability to scale. New subscriptions in congested regions like West Europe can hit this on day one, before they’ve deployed anything. The error looks like NotAvailableForSubscription or a message that the selected region is not accepting new customers.

If you’re vending subscriptions automatically as part of a landing zone, both of these will bite you. The fix: include quota raise requests in the vending pipeline. Before the subscription gets handed over, request quota for the SKU families and regions you know will be used. Also worth keeping in mind: if you urgently need to unblock a new deployment, check whether an existing subscription’s quota can be used temporarily while the new one’s request is processed.

# Check current quota for a subscription
# (requires the "quota" CLI extension: az extension add --name quota)
az quota list \
  --scope /subscriptions/<sub-id>/providers/Microsoft.Compute/locations/swedencentral \
  --output table

# Request an increase
az quota create \
  --scope /subscriptions/<sub-id>/providers/Microsoft.Compute/locations/swedencentral \
  --resource-name standardDSv5Family \
  --limit-object value=100 \
  --resource-type dedicated

Quota approvals on new subscriptions also tend to be slower: Microsoft’s system looks at usage history, and a subscription created five minutes ago has none. Raising the request at provisioning time, even before workloads exist, at least starts that clock.

If you run into quota issues where you need help from Microsoft Support, it’s worth looking into the Azure Support REST API as well.
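If you go that route, the az support extension wraps the same API. A hedged sketch of the discovery steps; the service GUID placeholder has to come from the first command’s output:

```shell
# Sketch: find the service and problem classification IDs needed for a
# quota support ticket. Assumes the "support" CLI extension is installed
# (az extension add --name support); <service-guid> is a placeholder.
az support services list \
  --query "[?contains(displayName, 'quota')].{Name:displayName, Id:name}" \
  --output table

az support services problem-classifications list \
  --service-name <service-guid> \
  --output table
```

From there, az support tickets create takes the classification ID plus contact details, which makes ticket creation scriptable from the same vending pipeline.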

Quota Groups

Managing quota per-subscription at scale is a mess. Dozens of increase requests across dozens of subscriptions, no central view, and every new subscription starting from zero.

Quota Groups let you define a pool at the Management Group level and distribute it across subscriptions. New subscriptions draw from the group rather than starting with defaults. Total consumption is visible in one place. Particularly useful for ISVs and service providers managing multiple customer subscriptions, or enterprises that shard workloads across many subscriptions by department or team.

Set this up early. Retrofitting it when you already have 30 subscriptions running is painful.

Configured through the Azure Quota portal under Management Group quota.

Useful Quota References

  • Azure Quota portal (portal.azure.com/#view/Microsoft_Azure_Capacity/QuotaMenuBlade) — limits per subscription, region and SKU family, with increase requests built in
  • CLI (az vm list-usage --location swedencentral) — vCPU consumption vs. limits without opening the portal

Checking Capacity Before You Deploy

Some useful commands to help you get started before you deploy:

# Check if a SKU is available in a region/zone:
az vm list-skus \
  --location swedencentral \
  --size Standard_D8s_v5 \
  --output table

Look at the Restrictions column. If it says None, the SKU is not restricted for your subscription. If you see NotAvailableForSubscription or zone-level restrictions, that’s your signal. Worth noting: these commands show subscription-level restrictions, not real-time physical capacity. A SKU can show as unrestricted and still fail to allocate under pressure; that surfaces as AllocationFailed rather than NotAvailableForSubscription.

# Check across all SKUs in a region at once:
az vm list-skus \
  --location swedencentral \
  --resource-type virtualMachines \
  --query "[?restrictions[?reasonCode=='NotAvailableForSubscription' || reasonCode=='NotAvailableForLocation']].[name,restrictions[0].reasonCode]" \
  --output table

# Check a specific zone:
az vm list-skus \
  --location swedencentral \
  --size Standard_D8s_v5 \
  --query "[].{SKU:name, Zones:locationInfo[0].zones, Restrictions:restrictions}" \
  --output json

Availability Zones Are Not Interchangeable

Each zone has its own capacity pool. Zone 1 having headroom for Standard_D8s_v5 says nothing about Zone 2. Even distribution across zones isn’t guaranteed.

For VMSS in flexible orchestration mode, zoneBalance: false lets Azure place instances where capacity exists rather than enforcing an even split. An uneven deployment that runs beats a balanced one that doesn’t.

Define SKU fallbacks too. If Standard_D8s_v5 isn’t available, fall through to Standard_D8s_v4 or Standard_D8as_v5. The workload usually doesn’t care which generation it’s on, and giving the platform options makes allocation failures less likely.
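A fallback list is easy to encode as a preference-ordered probe. This is a sketch with assumed names; sku_restriction wraps az vm list-skus so the selection logic itself can run without touching Azure, and remember it only checks restrictions, not live capacity:

```shell
# Sketch: return the first SKU in the preference list that shows no
# subscription-level restriction in the target region. Allocation can
# still fail under pressure -- this filters the obvious non-starters.
sku_restriction() {  # $1 = region, $2 = SKU; prints a reasonCode if restricted
  az vm list-skus --location "$1" --size "$2" \
    --query "[0].restrictions[0].reasonCode" --output tsv 2>/dev/null
}

pick_sku() {  # $1 = region, remaining args = SKUs in preference order
  region="$1"; shift
  for sku in "$@"; do
    if [ -z "$(sku_restriction "$region" "$sku")" ]; then
      echo "$sku"
      return 0
    fi
  done
  return 1
}

# Usage: pick_sku swedencentral Standard_D8s_v5 Standard_D8s_v4 Standard_D8as_v5
```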

The Logical vs. Physical Zone Problem

Physical zones are mapped to logical zones per subscription, and the mapping differs between subscriptions. Azure assigns this mapping at creation time and it’s effectively fixed for the lifetime of the subscription, meaning subscription A’s logical Zone 1 and subscription B’s logical Zone 1 may point to completely different physical datacenters. Microsoft can and does remap zones during major infrastructure expansions, though this is rare and communicated in advance. Read up on availability zones if this is new to you.

Azure does this deliberately. If every subscription mapped Zone 1 to the same physical datacenter, that DC would be overloaded within days. Shuffling the assignments distributes load across the physical infrastructure.

The risk is that deploying to “Zone 1” across subscriptions doesn’t guarantee the same physical datacenter, which creates hidden problems in HA and DR architectures. Production in Subscription A (Zone 1) and DR in Subscription B (Zone 1) can both map to the same physical datacenter, meaning your DR strategy provides zero protection. You can’t choose which physical zones your subscription maps to, and you can’t change the mappings later.

To validate:

# Azure CLI -- query the locations API:
az rest --method get \
  --uri '/subscriptions/{subscriptionId}/locations?api-version=2022-12-01' \
  --query 'value[?name==`westeurope`].availabilityZoneMappings'

# Cross-subscription peering -- use the Check Zone Peers API:
az rest --method post \
  --uri 'https://management.azure.com/subscriptions/{subId}/providers/Microsoft.Resources/checkZonePeers/?api-version=2022-12-01' \
  --body '{"location": "westeurope", "subscriptionIds": ["subscriptions/{otherSubId}"]}'

This gives you the availabilityZoneMappings array showing which logical zone (1, 2, 3) maps to which physical zone (e.g., westeurope-az2).
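With those arrays in hand, the comparison is a one-liner per subscription. A sketch assuming jq is available, with sample data shaped like the API response; the values are invented to show the collision case:

```shell
# Sketch: compare which physical zone sits behind logical zone 1 in two
# subscriptions. mappings_a/b stand in for the availabilityZoneMappings
# arrays returned by the locations API; assumes jq is installed.
mappings_a='[{"logicalZone":"1","physicalZone":"westeurope-az3"},
             {"logicalZone":"2","physicalZone":"westeurope-az1"}]'
mappings_b='[{"logicalZone":"1","physicalZone":"westeurope-az3"},
             {"logicalZone":"2","physicalZone":"westeurope-az2"}]'

phys_a=$(printf '%s' "$mappings_a" | jq -r '.[] | select(.logicalZone=="1") | .physicalZone')
phys_b=$(printf '%s' "$mappings_b" | jq -r '.[] | select(.logicalZone=="1") | .physicalZone')

if [ "$phys_a" = "$phys_b" ]; then
  echo "WARNING: logical zone 1 maps to $phys_a in both subscriptions"
fi
```

In this sample both subscriptions’ logical Zone 1 sits on westeurope-az3, which is exactly the prod-and-DR-in-the-same-building scenario described above.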

The key scenarios where this bites in practice:

  • Hub-and-spoke topologies across subscriptions — your “Zone 1” hub and “Zone 1” spoke may be in different datacenters, adding latency
  • DR setups — prod and DR in the same logical zone number across subscriptions could accidentally land in the same physical DC
  • Low-latency workloads (financial, gaming, SAP) — cross-subscription co-location requires explicit physical zone validation

Always validate zone mappings per subscription before deploying, especially in multi-subscription architectures.

Stop Defaulting to West Europe

West Europe (Amsterdam) is one of the oldest Azure regions and one of the most congested. Standard_D and Standard_E families are often constrained.

As best I can tell, West Europe and its pair, North Europe, are treated in Microsoft’s capacity planning as regions prioritised for existing workload growth, not accelerated expansion. Sweden Central on the other hand has seen significant recent investment.

There’s a data residency angle too: West Europe sits in the Netherlands. For customers in Belgium, Germany, or elsewhere in the EU, that’s not always what’s actually needed, even if it’s what gets picked by default.

The latency argument for staying in West Europe is usually weaker than assumed. West Europe to Sweden Central typically sits under ~20ms in tests. For most workloads, not a real constraint, but measure it rather than guess.

azurespeed.com runs latency tests from your browser to Azure regions using Microsoft’s own test blobs. Good enough for a sanity check before you’ve deployed anything.

From within Azure, Network Watcher’s Connection Monitor is the right tool. Continuous latency metrics, Log Analytics integration, alerting. Worth setting up for any workload where inter-region latency matters.

What I did in practice: spin up a small VM in the candidate region and run these from the existing environment:

# TCP latency -- more realistic than ICMP for actual application traffic
hping3 -S -p 443 -c 10 <target-ip>

# Windows equivalent
psping <target-ip>:443

# curl timing breakdown
curl -o /dev/null -s -w "Connect: %{time_connect}s\nTotal: %{time_total}s\n" https://<target-endpoint>

Measure it and let the numbers decide. West Europe being the lowest-latency option for Belgian or Dutch customers is wrong more often than people expect.

Reservation Hygiene

Advisor’s RI recommendations are worth acting on. They’re based on up to 30 days of usage data and are usually accurate. What Advisor won’t surface: stale reservations. Something bought 18 months ago for a SKU generation that’s since been replaced can sit at 20% utilisation and nobody notices. Check Cost Management → Reservations periodically. Below 80% utilisation is worth investigating: either the workload changed, or the scope is wrong.

Scope is the common issue. A reservation scoped to one subscription doesn’t apply anywhere else. In EA and MCA environments, scope at the Management Group or Billing Account level so the discount floats to wherever the usage actually is.
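Both checks can be scripted rather than clicked through. A hedged sketch; the order ID is a placeholder from the first command’s output, and the exact output fields can differ by CLI version:

```shell
# Sketch: list reservation orders, then pull monthly utilisation for one.
# Assumes the az reservations / az consumption command groups are
# available; <order-id> is a placeholder.
az reservations reservation-order list --output table

az consumption reservation summary list \
  --reservation-order-id <order-id> \
  --grain monthly \
  --query "[].{Month:usageDate, AvgUtilisation:avgUtilizationPercentage}" \
  --output table
```

Wire it into a monthly job and stale reservations surface themselves instead of waiting for someone to open the portal.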


Summary

  • DR workloads needing guaranteed startup — Capacity Reservation + Reserved Instance
  • New subscription via vending pipeline — raise quota at provisioning time via API/CLI
  • Multi-subscription enterprise estate — set up Quota Groups at Management Group level
  • Allocation failure during deployment — check the error code, try an adjacent zone, try SKU alternatives
  • Multi-zone VMSS in constrained region — set zoneBalance to false, define a SKU fallback list
  • Greenfield deployment, flexible on region — consider Sweden Central
  • Reservation utilisation dropping — review scope, check if the SKU generation changed

Capacity planning is unglamorous work. But it’s the kind of thing that ruins a day when ignored, usually the day you can least afford it.