Demystifying the Connection from VPC to your DC

By: Josiah Huckins - 8/27/2019

Amazon Web Services' Virtual Private Cloud (VPC) is a game changer.

VPC allows customers to set up any addressing scheme they want. The platform is not limited in the types of IP ranges and subnets that customers can define. This means you can have a truly complete virtual network, with all hosts in AWS, and you can even define the same CIDR blocks and routes as on your physical network. That makes for an easier migration to VPC, without the need for drastic redistribution of assigned address space.

While this is great for a physical-to-virtual (P2V) migration, most customers prefer a hybrid approach with some critical services still behind the corporate firewall. There are legitimate business and legal reasons for this. VPC supports this use case and provides the means to connect a VPC to your datacenter.

VPC Scalability

Before we get into the options for connecting a customer datacenter to VPC, it helps to understand how VPC is able to support "anything goes" addressing for its customers. As should be expected for virtual networking, VPC uses Virtual Routing and Forwarding tables (VRFs) to split up its customer networks, and virtual local area networks (VLANs) to create subnets. This poses a problem. There is a hard limit on the number of VLANs that can be defined per Ethernet network, that is, per switch: the 802.1Q VLAN ID is a 12-bit field, which leaves 4094 usable IDs. Many switches support far fewer. VRFs are typically limited to around 512 per device, even on the largest enterprise routers. For a single company network this is not so bad, but for a company that hosts and supports the networks of millions of other companies, these would be used up quickly. Not to mention, a single customer may want the option of setting up multiple virtual networks in VPC. Along with these volume challenges, Amazon wanted to support having any instance in any VPC on any physical server in its infrastructure.
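To make the VLAN ceiling concrete, here is a small sketch of the arithmetic. The 12-bit ID width and the two reserved values come from the 802.1Q standard; the function name and the one-million-customer figure are assumptions chosen purely for illustration, not numbers from AWS.

```python
# Sketch of the VLAN ID arithmetic. The 802.1Q tag carries a 12-bit VLAN ID.
VLAN_ID_BITS = 12
RESERVED_IDS = {0, 4095}  # 0 marks priority-tagged frames, 4095 is reserved

total_ids = 2 ** VLAN_ID_BITS                  # 4096 possible values
usable_vlans = total_ids - len(RESERVED_IDS)   # 4094 usable VLANs


def networks_needed(customer_networks: int) -> int:
    """Hypothetical helper: how many separate Ethernet networks would be
    needed if every customer network consumed exactly one VLAN."""
    # Ceiling division without floating point.
    return -(-customer_networks // usable_vlans)


print(usable_vlans)                # 4094
print(networks_needed(1_000_000))  # 245
```

Even under the generous assumption of one VLAN per customer, a million customer networks would have to be spread over hundreds of isolated Ethernet networks, which is at odds with the goal of placing any instance on any physical server.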

VPC solves these problems by assigning each customer VPC a unique identifier, the VPC ID. Hosts in a VPC can run on any of Amazon's physical servers and exist in the context of a single VRF, under a single VPC ID, all while still being able to communicate with other virtual instances in the same VPC across router boundaries.

VPC also uses a mapping service, which is a middle layer between VPCs. When a virtual instance wants to send packets to another virtual instance, the packets are first trapped and wrapped with a VPC header that indicates which VPC the packet belongs to. The packet is also wrapped with an IP header for the source and destination hosts (these are the physical hosts that run the VPC virtual servers). Once encapsulated, the packet is sent on to the physical destination host. When it arrives, the destination host cross-checks the received packet's VPC header with the mapping service. The mapping service confirms or denies that the source is in the correct VPC (that is, has the same VPC ID as the destination). If the source is in the same VPC, the packet is accepted and passed to the destination virtual instance, and communication continues.

This allows the same subnets and addresses to exist in different VPCs without conflicting with each other in the Amazon network. In addition, each VPC is isolated, so a customer's data does not leak into other customers' VPCs. It also allows customers to distinguish and separate multiple VPCs as needed. VPC scalability is on point.
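The encapsulate-and-check flow described above can be modelled in a few lines. This is a toy sketch, not Amazon's implementation: every class, function, VPC ID, and host name below is hypothetical, and the real wire formats and mapping protocol are not described in this article.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class InnerPacket:
    """The packet as the virtual instance sees it (customer addresses)."""
    src_ip: str
    dst_ip: str
    payload: bytes


@dataclass(frozen=True)
class Encapsulated:
    """Inner packet wrapped with a VPC header and physical host addresses."""
    vpc_id: str
    src_host: str
    dst_host: str
    inner: InnerPacket


class MappingService:
    """Hypothetical lookup table: (VPC ID, instance IP) -> physical host."""

    def __init__(self) -> None:
        self._locations: Dict[Tuple[str, str], str] = {}

    def register(self, vpc_id: str, instance_ip: str, host: str) -> None:
        self._locations[(vpc_id, instance_ip)] = host

    def locate(self, vpc_id: str, instance_ip: str) -> Optional[str]:
        return self._locations.get((vpc_id, instance_ip))


def send(mapping: MappingService, vpc_id: str, src_host: str,
         packet: InnerPacket) -> Optional[Encapsulated]:
    """Trap the packet and wrap it, if the destination exists in this VPC."""
    dst_host = mapping.locate(vpc_id, packet.dst_ip)
    if dst_host is None:
        return None
    return Encapsulated(vpc_id, src_host, dst_host, packet)


def receive(mapping: MappingService, frame: Encapsulated) -> Optional[InnerPacket]:
    """Deliver only if the mapping service confirms that the claimed source
    instance really lives on the sending host within the same VPC."""
    if mapping.locate(frame.vpc_id, frame.inner.src_ip) != frame.src_host:
        return None
    return frame.inner


# Two VPCs reuse the same addresses without conflict.
mapping = MappingService()
mapping.register("vpc-a", "10.0.0.5", "host-1")
mapping.register("vpc-a", "10.0.0.6", "host-2")
mapping.register("vpc-b", "10.0.0.5", "host-3")
mapping.register("vpc-b", "10.0.0.6", "host-2")

packet = InnerPacket("10.0.0.5", "10.0.0.6", b"hello")
delivered = receive(mapping, send(mapping, "vpc-a", "host-1", packet))

# A frame from host-1 that claims to belong to vpc-b is rejected.
forged = Encapsulated("vpc-b", "host-1", "host-2", packet)
rejected = receive(mapping, forged)
```

Because every lookup is keyed on the VPC ID as well as the instance address, two customers can both use 10.0.0.5 and the destination host still knows which one it is talking to.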

Connecting VPC to your Company DC

The VPC design allows instance-to-instance communication within a VPC, and, via the same packet encapsulation process described above, it also allows communication between VPC instances and hosts on the customer's network. It does this by creating edge routes between what are referred to as VPC Edges (which are just other physical hosts in the Amazon network, at the network edge) and the customer's public address(es).

There are two primary types of connection from VPC to the customer datacenter: VPN and Direct Connect. With a VPN connection, customer-destined packets are encapsulated via IPsec. These packets go through a virtual private gateway on their way out of the VPC. With this setup, you can have multiple VPNs connected to a VPC, requiring just one external route for the VPC on your routers.

Branch Offices Diagram | Source: docs.aws.amazon.com

A cloud bursting scenario comes to mind here, allowing you to expand into a VPC when your on-site network is at capacity. I've seen situations where large datacenters have used up their IP host ranges. Being able to pivot new host addresses to a VPC could help prevent this issue.

Direct Connect provides a dedicated physical connection over fiber. Another difference is that, rather than wrapping packets in IPsec, it tags them with a VLAN ID for routing to the customer.

One important note regarding VPN connections

VPC does allow you to define multiple VPCs with the same subnet address ranges in each VPC. However, if you do this, you cannot connect them all to your on-premises network through the VPN. A best practice is to define unique CIDR blocks for each VPC so that all of them can communicate home over VPN.
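One way to catch this before building anything is to check the planned address ranges for overlap. The sketch below uses Python's standard ipaddress module; the function name, VPC names, and CIDR blocks are made up for illustration.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def find_overlaps(vpc_cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of VPC names whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical plan: vpc-dev sits inside vpc-prod's range.
planned = {
    "vpc-prod": "10.0.0.0/16",
    "vpc-dev": "10.0.128.0/17",
    "vpc-test": "10.1.0.0/16",
}
overlaps = find_overlaps(planned)
print(overlaps)  # [('vpc-prod', 'vpc-dev')]
```

An empty result means every VPC in the plan has a unique range and can be routed back to the datacenter unambiguously. The same check could be extended to include your on-premises ranges.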

Closing Thoughts

We looked at Amazon's Virtual Private Cloud service and the options available for connecting it to a customer's own network. There is a lot more that can be discussed around specific scenarios and configurations. If you're interested in setting up a VPC, I suggest reading through AWS's well-written documentation and FAQs.

Thank you for reading.