Backup and DR in the Time of Hybrid Clouds (The Microsoft Way): Part Two

Written by Ermin Mlinaric | Dec 6, 2018 12:11:20 AM

Part One: Azure Backup Server

Part Two: Azure Site Recovery

At last, the long-awaited (and long-overdue) second part of the series is in front of you. Today, I’ll cover the basics of Azure Site Recovery (ASR), Microsoft’s Disaster Recovery-as-a-Service (DRaaS) offering, and how it can help you protect your data by creating replicas of your production workloads, whether they are virtual or physical, running on Hyper-V or VMware, or even running in Azure (in case you need a Disaster Recovery site in a different Azure region or even in a different subscription). ASR is also an excellent migration tool for your workloads, regardless of the source, as long as your destination is Microsoft Azure. In this article, I’ll focus on ASR as a DR solution for your on-premises VMware workloads. Let’s begin.

Where to start?

As with any other deployment, the most important part is planning and design. My suggestion is to first assess your DR requirements, and then head over to the ASR support matrix to check supported components and settings. ASR does have some limitations (e.g. a maximum data disk size of 4 TB at the time of writing), and you’d be wise to check them before beginning.
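A check against the support matrix can be scripted before you start. The following is only a minimal sketch: the inventory dictionary, the VM and disk names, and the function name are made up for illustration, and the 4 TB figure is simply the limit quoted above, so confirm the current value in the support matrix.

```python
# Limit quoted in the article; verify against the current ASR support matrix.
MAX_DATA_DISK_TB = 4.0


def oversized_disks(inventory, limit_tb=MAX_DATA_DISK_TB):
    """Return (vm, disk, size_tb) for every disk larger than limit_tb."""
    return [
        (vm, disk, size_tb)
        for vm, disks in inventory.items()
        for disk, size_tb in disks.items()
        if size_tb > limit_tb
    ]


# Hypothetical inventory: VM name -> {disk name: size in TB}
inventory = {
    "web01": {"data1": 0.5},
    "sql01": {"data1": 2.0, "data2": 6.0},
}

for vm, disk, size_tb in oversized_disks(inventory):
    print(f"{vm}/{disk} is {size_tb} TB, above the {MAX_DATA_DISK_TB} TB limit")
```

In practice the inventory would come from your vCenter export or from the Deployment Planner report rather than a hand-written dictionary.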

One very important step when planning your replication and failover strategy is the network. How will you connect to the Azure VMs after failover? Will you use different IP addresses or retain the same ones? ExpressRoute or VPN? Are there public-facing VMs? Are DNS changes needed? All these questions need to be answered before you begin.

It’s also important to understand the VMware to Azure replication process and architecture.

The main ASR components are a Recovery Services vault, a storage account and a virtual network on the Azure side, and a configuration server on the VMware side. Replicated data from on-premises VMs is stored in the storage account. Azure VMs are created from the replicated data only when you run a failover from on-premises to Azure. The Azure VMs connect to the Azure virtual network when they're created. The configuration server coordinates communication between on-premises and Azure, and manages data replication.

 

Figure 1. VMware to Azure replication process

 

Microsoft has made capacity planning and scaling for your DR really easy by creating the Azure Site Recovery Deployment Planner, an extremely useful tool that will help you with the following: compatibility assessment, network bandwidth need versus RPO assessment, Azure infrastructure requirements, on-premises infrastructure requirements and even the estimated cost of disaster recovery to Azure.

I highly recommend running this tool before any ASR project, for at least 7 days and preferably more, to get the best results. Indeed, during the site recovery infrastructure preparation, the wizard will ask you: Have you completed deployment planning? Ensure you have.
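To get a feel for the bandwidth-versus-RPO question before the planner has finished profiling, a back-of-the-envelope calculation helps. This is a rough sketch and not how the Deployment Planner works internally: the function name and the 200 GB/day churn figure are assumptions for illustration, and it uses decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second).

```python
def average_mbps_for_churn(daily_churn_gb: float) -> float:
    """Average bandwidth (Mbps) needed to ship one day's changed data in one day."""
    bits_per_day = daily_churn_gb * 8e9   # GB -> bits
    seconds_per_day = 86_400
    return bits_per_day / seconds_per_day / 1e6


# Hypothetical example: 200 GB of changed data per day across all protected VMs
print(f"{average_mbps_for_churn(200):.1f} Mbps")  # 18.5 Mbps
```

This is an average only. Churn is rarely spread evenly over the day, so peak periods need more headroom to stay within your RPO, which is exactly why profiling for a week or more is worth the wait.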

Deployment and Configuration

On the Azure side, things are simple. All you need to do is create an Azure storage account that will store images of replicated VMs, then create a Recovery Services vault that will hold your metadata and configuration information for VMs. Finally, set up an Azure network that will be used for failed-over VMs.

After that, the Recovery Services vault will help you prepare your site recovery infrastructure in five simple steps, using a very straightforward wizard.

 

Figure 2. ASR infrastructure preparation

 

On the VMware side of things, you need to deploy and set up a configuration server, which coordinates communications between on-premises VMware and Azure, and manages data replication. Microsoft created an OVA template for easy deployment to VMware, which you can download separately or from within the configuration wizard above.

After you sign in to your new Windows Server 2016 VM, the Azure Site Recovery Configuration Tool will start and take you through a few configuration steps: connecting to your Azure vault, installing third-party software (MySQL and VMware PowerCLI), and configuring vCenter/ESXi and VM credentials.

Now it’s time to configure and enable replication. This, like any other management task, is done in the Recovery Services vault in the Azure portal. Your first step is to create replication policies that align with your RTO and RPO objectives, then select the VMs to be replicated. Finally, enable replication, which will trigger installation of the ASR agent (aka Mobility Service) on the selected VMs, using the credentials you configured earlier. Initial replication will take some time, depending on your VM size and available network throughput. After the initial replication is complete, ASR replicates only changed data, in incremental chunks, at an interval defined by your replication policy. The Recovery Services vault in the Azure portal has a nice, intuitive interface with an impressive level of detail that helps you monitor and troubleshoot your ASR infrastructure and jobs.
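How long is "some time" for the initial replication? A simple estimate divides the amount of data by the usable throughput. The sketch below is an approximation under stated assumptions: the function name and the efficiency factor (to allow for protocol overhead and other traffic on the link) are mine, the example numbers are hypothetical, and units are decimal.

```python
def initial_replication_hours(total_gb: float, bandwidth_mbps: float,
                              efficiency: float = 1.0) -> float:
    """Estimate hours to copy total_gb over a link of bandwidth_mbps.

    efficiency is the assumed fraction of the link actually available
    to replication (0 < efficiency <= 1).
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    total_bits = total_gb * 8e9                       # GB -> bits
    usable_bits_per_second = bandwidth_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


# Hypothetical example: 500 GB of VM disks over a 100 Mbps link at 80% efficiency
print(f"{initial_replication_hours(500, 100, efficiency=0.8):.1f} hours")  # 13.9 hours
```

Real-world times will also depend on compression, disk performance on the source and the load on the configuration server, so treat the result as a lower bound for planning.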

 

Figure 3. ASR overview in Azure portal

 

Failover and Failback

Now that the initial replication is complete, you need to validate the setup and determine what actions need to be performed in case of failover. Microsoft has provided an easy way to do this with Test Failover. All you need to do is select the replicated VM, click “Test Failover”, and choose a recovery point and a destination Azure network. This will create a VM from the replicated data according to the settings you specified under the VM’s Compute and Network settings, and connect it to the network you chose. Make sure you use a network different from production! Once you’re happy with your VM testing, select the VM again under “Replicated items” and click “Cleanup test failover”. This will delete the test VM and all associated components.

Test or real failover can be orchestrated using Recovery plans. These are sets of steps, scripts (automation runbooks) and manual actions that run in a defined order once failover is initiated (e.g. check prerequisites, synchronise latest changes, shut down the source, reconfigure the network, etc.). Imagine doing your DR testing with the click of a button, without having to worry about performing multiple complex configuration steps in a particular order. For anyone who has done DR testing before, myself included, this sounds like a dream come true. This is where the true power and advantages of Infrastructure as a Service (IaaS) become obvious, especially compared to traditional infrastructure.
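Conceptually, a recovery plan is an ordered list of steps that stops as soon as one of them fails. The toy model below illustrates only that idea. It is not the ASR API: the function, the step names (borrowed from the examples above) and the always-succeeding actions are placeholders.

```python
from typing import Callable, List, Tuple

# A step is a name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_plan(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure."""
    log = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break
    return log


# Placeholder actions; in a real plan these would be runbooks or manual checks.
plan = [
    ("check prerequisites", lambda: True),
    ("synchronise latest changes", lambda: True),
    ("shut down the source", lambda: True),
    ("reconfigure network", lambda: True),
]

for line in run_plan(plan):
    print(line)
```

The value of the real feature is the same as in this sketch: the order is written down once and executed the same way every time, whether for a test or for the real thing.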

When you execute a planned failover, you need to re-protect the machines after they’ve failed over. Once your source site is back up, you can fail back the VMs using the process server, the master target server and a failback policy.

Conclusion

Overall, Microsoft Azure Site Recovery is an excellent, cost-effective and easy-to-deploy DR solution. The fact that it’s free for the first 31 days makes it a perfect candidate for workload migrations to Azure as well. After the first 31 days, the price is $31.85/month per protected instance (at the time of writing this article). In addition, there’s a storage account cost, plus any other data transfer related costs. With a large number of VMs, costs can add up quickly. However, it’s still cheaper than running your DR on dedicated hardware in a data centre. And, more importantly, what is the price of the peace of mind of knowing your production is protected in case of a disaster? Priceless.
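For a rough sense of how the per-instance charge scales, the arithmetic is simple. This sketch covers the ASR charge only, using the $31.85 rate quoted above. Storage and data transfer costs are excluded, the function name and VM counts are hypothetical, and the number of billed months is left to you to work out after the free period.

```python
ASR_RATE_PER_INSTANCE = 31.85  # USD per month, as quoted at the time of writing


def asr_charge(instances: int, billed_months: int,
               rate: float = ASR_RATE_PER_INSTANCE) -> float:
    """ASR per-instance charge only; excludes storage and data transfer."""
    return round(instances * billed_months * rate, 2)


# Hypothetical example: 20 protected VMs
print(f"{asr_charge(20, 1):.2f} USD per billed month")     # 637.00 USD per billed month
print(f"{asr_charge(20, 11):.2f} USD for 11 billed months")  # 7007.00 USD for 11 billed months
```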

 

Ermin Mlinaric

Microsoft Infrastructure and Cloud Subject Matter Expert

Enabling digital transformation using the best of Microsoft and Citrix technologies with strong focus on Cloud.