Private network connectivity between AWS and IBM Cloud via IPSec

dmytrorakytskyi

November 12, 2023

Problem statement

Our customer has an environment split between cloud providers, partly for isolation and partly because of cloud-specific features and functional needs.

The main CI infrastructure (Jenkins, Artifactory) is located in AWS. A smaller part, which is nevertheless essential for the customer, is located in IBM Cloud (a K8s cluster).

At first, the two parts communicated over the public internet, but since smaller parts tend to grow, a need arose to connect the VPCs of the AWS and IBM accounts. Our task was to create private network connectivity between the two clouds (AWS and IBM).

Solution design

There are various ways to complete this task, and the right one depends on the reasons for the connection and its requirements, but an IPSec Site-to-Site VPN connection was the most convenient for us. It allows secure communication over any protocol and is fully transparent to the applications.

This is how communication looked schematically on a high level before the VPN:

This is what we wanted to achieve after adding the VPN:

List of requirements:

Site-to-Site VPN connection
Resources on both sides of the connection (AWS and IBM) are managed with Terraform.
Control of network ports from the AWS side: all outgoing ports (from AWS to IBM) should be blocked except for a few.

There are two types of IPSec VPN in IBM:

Policy-based
Route-based

You can read more about each of them here

The main difference between them is the number of active tunnels:
A policy-based VPN operates in Active-Standby mode with a single VPN gateway IP shared between the members, while a route-based VPN offers Active-Active redundancy with two VPN gateway IPs (link)

Active-Active seems better at first glance, but according to IBM it requires four tunnels on the AWS side (two AWS VPN connections for one IBM gateway):
You must have one IBM VPN gateway and two AWS VPN connections (a total of four tunnels) for this setup (link)

The difference, shown schematically:

Policy-Based (Active-Standby)

Route-Based (Active-Active)

Since our smaller but essential part has not grown enough to justify twice as many tunnels, we decided to use a policy-based VPN.

Integration process

VPNs are interdependent: the IBM side requires the gateway IP of the AWS side, whereas the AWS side requires the gateway IP of the IBM side. In AWS, it is impossible to create a VPN Connection without supplying full information about both the Virtual Private Gateway (AWS side) and the Customer Gateway (IBM side).

IBM Side Part 1:

Fortunately for us, the IBM implementation differs from the AWS one: an IBM VPN gateway can exist without a Site-to-Site connection. So neither the AWS Virtual Private Gateway nor the AWS Customer Gateway has to exist before we create the IBM gateway and obtain the IP of the IBM side of the tunnel (the Customer Gateway in AWS terms).
The IBM VPN Gateway can therefore be created and receive a routable public gateway IP while it has no VPN connection and no routing configured yet.

Creation of the VPN Gateway on the IBM Side:

resource "ibm_is_vpn_gateway" "main" {
 name   = "${var.name_prefix}-vpn-gateway"
 subnet = var.subnet_id
 mode   = "policy"
}
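
To hand the gateway address over to the AWS configuration, the public IP can be exposed as a Terraform output. This is a minimal sketch rather than part of the original setup: the output name is hypothetical, and it assumes the `public_ip_address` attribute exported by `ibm_is_vpn_gateway` in the provider version in use. It is worth checking the value after the first apply, since a gateway has more than one member.

```hcl
# Hypothetical output: publishes the public IP of the IBM VPN gateway so it
# can be passed to the AWS side as var.ibm_vpn_gateway_ip.
# Assumption: the IBM provider exports `public_ip_address` for this resource;
# verify the attribute name and value against the provider version in use.
output "ibm_vpn_gateway_ip" {
  description = "Public IP of the IBM VPN gateway (Customer Gateway address on the AWS side)"
  value       = ibm_is_vpn_gateway.main.public_ip_address
}
```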

AWS Side:

Once that is done, the AWS side can be fully configured and left waiting for the IBM connection.

The AWS part consists of:

Customer Gateway – the representation of the IBM side from the AWS point of view.
Virtual Private Gateway – the VPN endpoint on the Amazon side of the Site-to-Site VPN connection. It can be attached to a single VPC (but to multiple VPN Connections).
VPN Connection – the entity that ties the Customer Gateway and the Virtual Private Gateway together via secure VPN tunnels. We use only static routes.

resource "aws_customer_gateway" "cgw_ibm" {
 bgp_asn    = 65000
 ip_address = var.ibm_vpn_gateway_ip
 type       = "ipsec.1"

 tags = {
   Name = "IBM-Gateway"
 }
}

resource "aws_vpn_gateway" "vpn_gw" {
 vpc_id = aws_vpc.main.id

 tags = {
   Name = "AWS VPC"
 }
}

resource "aws_vpn_connection" "vpn_ibm" {
 customer_gateway_id                     = aws_customer_gateway.cgw_ibm.id
 outside_ip_address_type                 = "PublicIpv4"
 vpn_gateway_id                          = aws_vpn_gateway.vpn_gw.id
 type                                    = "ipsec.1"
 static_routes_only                      = true

 tunnel1_preshared_key                   = var.preshared_key
 tunnel2_preshared_key                   = var.preshared_key

 tags = {
   Name = "VPN Connection between IBM Gateway and AWS VPC"
 }
}
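
The resources above reference a pre-shared key variable, and the IBM side later needs the outside address of an AWS tunnel. A possible way to declare and expose these is sketched below. The variable declaration and the output names are assumptions, not taken from the original module. `tunnel1_address` and `tunnel2_address` are attributes exported by `aws_vpn_connection`.

```hcl
# Hypothetical declaration of the shared secret used by both tunnels.
# `sensitive = true` hides the value in plan/apply console output.
variable "preshared_key" {
  description = "Pre-shared key for the IPSec tunnels; must match the IBM side"
  type        = string
  sensitive   = true
}

# Outside IP addresses of the two AWS tunnel endpoints.
# One of them can be supplied as the peer address on the IBM side.
output "aws_tunnel1_address" {
  value = aws_vpn_connection.vpn_ibm.tunnel1_address
}

output "aws_tunnel2_address" {
  value = aws_vpn_connection.vpn_ibm.tunnel2_address
}
```

Note that a sensitive variable is still written to the Terraform state, so the state backend should be protected accordingly.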

In addition to the resources above, routing must also be configured in two places:

The VPN Connection itself, which is configured with the CIDR of the IBM-side VPC we want to talk to
The VPC’s Route Table, where the Virtual Private Gateway is set as the next hop for the destination CIDR of the IBM-side VPC

NOTE: IBM documentation states that only a single CIDR record should be present in the list of the VPN Connection’s routes: To reach multiple contiguous subnets in IBM VPC, use a larger CIDR range that covers all the required subnets (link)

resource "aws_vpn_connection_route" "route_ibm" {
 destination_cidr_block = var.ibm_vpn_cidr
 vpn_connection_id      = aws_vpn_connection.vpn_ibm.id
}

resource "aws_route" "vpn_route_ibm" {
 route_table_id         = var.route_table_id
 destination_cidr_block = var.ibm_vpn_cidr
 gateway_id             = var.vpn_gateway_id
}
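
Because both routes depend on one CIDR value, it may help to validate that value at plan time. The declaration below is a sketch under assumptions: the original module's variable definition is not shown in the article, and the example CIDR in the error message is purely illustrative.

```hcl
# Hypothetical declaration for the IBM-side CIDR used by both routes above.
variable "ibm_vpn_cidr" {
  description = "Single CIDR that covers all required IBM VPC subnets"
  type        = string

  validation {
    # cidrhost() fails on malformed input; can() turns that failure into false.
    condition     = can(cidrhost(var.ibm_vpn_cidr, 0))
    error_message = "The ibm_vpn_cidr value must be a valid CIDR block, for example 10.240.0.0/16."
  }
}
```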

Control of network ports (Requirement 3):

One of our requirements is to restrict traffic leaving the AWS side, so we use the egress (outbound) rules of a Network ACL.

This is a straightforward task, with one caveat: NACLs are stateless, so return traffic is not allowed automatically. Besides the ports required for communication, we also have to open the ephemeral port range, because once a connection is established on an application’s port, the other end of that connection uses a port from this range. Rules are evaluated in ascending order of rule number, so the allow rules are numbered below the final deny rule:

resource "aws_network_acl_rule" "allow_ssh" {
 network_acl_id = var.network_acl_id
 rule_number    = 98
 egress         = true
 protocol       = "tcp"
 rule_action    = "allow"
 cidr_block     = var.ibm_vpn_cidr
 from_port      = 22
 to_port        = 22
}

resource "aws_network_acl_rule" "allow_https" {
 network_acl_id = var.network_acl_id
 rule_number    = 97
 egress         = true
 protocol       = "tcp"
 rule_action    = "allow"
 cidr_block     = var.ibm_vpn_cidr
 from_port      = 443
 to_port        = 443
}

resource "aws_network_acl_rule" "allow_k8sAPI" {
 network_acl_id = var.network_acl_id
 rule_number    = 96
 egress         = true
 protocol       = "tcp"
 rule_action    = "allow"
 cidr_block     = var.ibm_vpn_cidr
 from_port      = 6443
 to_port        = 6443
}

resource "aws_network_acl_rule" "allow_ping" {
 network_acl_id = var.network_acl_id
 rule_number    = 95
 egress         = true
 protocol       = "icmp"
 rule_action    = "allow"
 cidr_block     = var.ibm_vpn_cidr
 icmp_type      = "-1"
 icmp_code      = "-1"
}

resource "aws_network_acl_rule" "allow_ephemeral_ports_tcp" {
 network_acl_id = var.network_acl_id
 rule_number    = 94
 egress         = true
 protocol       = "tcp"
 rule_action    = "allow"
 cidr_block     = var.ibm_vpn_cidr
 from_port      = 32768
 to_port        = 60999
}

resource "aws_network_acl_rule" "deny_all_IBM" {
 network_acl_id = var.network_acl_id
 rule_number    = 99
 egress         = true
 protocol       = "-1"
 rule_action    = "deny"
 cidr_block     = var.ibm_vpn_cidr
 from_port      = 0
 to_port        = 0
}
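
The four TCP allow rules differ only in rule number and port range, so they could be collapsed into one resource with `for_each`. This is a sketch of an alternative layout, not the configuration we ran: the local value and the resource name are hypothetical, and it would replace the individual TCP allow rules rather than sit next to them, since rule numbers must be unique within a NACL. The ICMP rule and the final deny rule stay as separate resources.

```hcl
# Hypothetical map of allowed TCP egress port ranges, keyed by NACL rule number.
# All numbers stay below 99 so that they are evaluated before the deny rule.
locals {
  ibm_allowed_tcp_egress = {
    "98" = { from = 22, to = 22 }       # SSH
    "97" = { from = 443, to = 443 }     # HTTPS
    "96" = { from = 6443, to = 6443 }   # Kubernetes API
    "94" = { from = 32768, to = 60999 } # ephemeral port range
  }
}

# One allow rule per map entry.
resource "aws_network_acl_rule" "allow_tcp_to_ibm" {
  for_each = local.ibm_allowed_tcp_egress

  network_acl_id = var.network_acl_id
  rule_number    = tonumber(each.key)
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.ibm_vpn_cidr
  from_port      = each.value.from
  to_port        = each.value.to
}
```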

IBM Side Part 2:

AWS VPN requires the Perfect Forward Secrecy (PFS) option of the IPSec VPN to be enabled.

PFS improves the security of the VPN tunnel by calculating a new Diffie-Hellman key upon connection establishment and during rekey procedures, when a Security Association is renegotiated. This way, even if the original DH key is compromised, it will be regenerated either on connection establishment or during the connection’s lifetime upon periodic rekey. (link)

In IBM, PFS is implemented through IPSec policies, and the default policy does not have PFS enabled, so we create a new policy:

resource "ibm_is_ipsec_policy" "main" {
 name                     = "${var.name_prefix}-ipsec-policy"
 authentication_algorithm = "sha256"
 encryption_algorithm     = "aes256"
 pfs                      = "group_14"
}
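
If both ends should negotiate exactly the same Phase 2 proposal, the AWS tunnels can be pinned to the values used in the IBM policy. The block below is a sketch of a variant of the `aws_vpn_connection` resource shown earlier, not the configuration from our setup, where AWS was left on its default algorithm set. It assumes that restricting the tunnels to a single proposal is actually wanted.

```hcl
# Variant of the earlier aws_vpn_connection with Phase 2 parameters pinned
# to match the IBM IPSec policy (AES-256, SHA-256, DH group 14 for PFS).
# Assumption: a single fixed proposal is desired; by default AWS accepts
# a wider set of algorithms and DH groups.
resource "aws_vpn_connection" "vpn_ibm" {
  customer_gateway_id = aws_customer_gateway.cgw_ibm.id
  vpn_gateway_id      = aws_vpn_gateway.vpn_gw.id
  type                = "ipsec.1"
  static_routes_only  = true

  tunnel1_preshared_key                = var.preshared_key
  tunnel1_phase2_encryption_algorithms = ["AES256"]
  tunnel1_phase2_integrity_algorithms  = ["SHA2-256"]
  tunnel1_phase2_dh_group_numbers      = [14]

  tunnel2_preshared_key                = var.preshared_key
  tunnel2_phase2_encryption_algorithms = ["AES256"]
  tunnel2_phase2_integrity_algorithms  = ["SHA2-256"]
  tunnel2_phase2_dh_group_numbers      = [14]
}
```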

Since we are now in phase two, we can set ‘create_connection’ to ‘true’ and run terraform apply to create the missing part of the IBM-side VPN, which encapsulates the IPSec configuration options and routing in a single resource called a VPN Connection.

peer_address – the public IP address of the AWS VPN Gateway
local_cidrs – the CIDR of the IBM VPC that we want to connect to AWS (IBM requires only a single CIDR record here)
peer_cidrs – the CIDR of the AWS VPC that we want to connect to IBM (IBM requires only a single CIDR record here)
preshared_key – the password for the VPN connection, which must match on both sides of the tunnel
ipsec_policy – the new policy with PFS enabled

resource "ibm_is_vpn_gateway_connection" "main" {
 count = var.create_connection ? 1 : 0

 name           = "${var.name_prefix}-vpn-gateway-connection"
 admin_state_up = true
 vpn_gateway    = ibm_is_vpn_gateway.main.id
 peer_address   = var.peer_ip
 preshared_key  = var.preshared_key

 # Use a single CIDR for each of local_cidrs and peer_cidrs.
 # IBM docs suggest that: https://cloud.ibm.com/docs/vpc?topic=vpc-aws-config#aws-to-policy-based-ibm-config
 local_cidrs = [var.local_cidr]
 peer_cidrs  = [var.peer_cidr]

 # AWS requires IPSec phase 2 to have PFS enabled
 ipsec_policy = ibm_is_ipsec_policy.main.id
}
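
The two-phase rollout relies on the `create_connection` switch. A possible declaration is sketched below. The defaults and descriptions are assumptions, since the article does not show the module's variable definitions.

```hcl
# Hypothetical declaration of the phase switch used by `count` above.
# Phase 1: leave the default (false) so that only the VPN gateway is created.
# Phase 2: enable it once the AWS side exists, for example:
#   terraform apply -var="create_connection=true" -var="peer_ip=<AWS tunnel IP>"
variable "create_connection" {
  description = "Create the IBM VPN gateway connection (enable after the AWS side is built)"
  type        = bool
  default     = false
}

# Hypothetical declaration: outside IP of the AWS tunnel to peer with.
# The null default lets phase 1 run before the AWS address is known.
variable "peer_ip" {
  description = "Public IP of the AWS VPN tunnel endpoint"
  type        = string
  default     = null
}
```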

Afterward, open the required ports in the Security Groups on the IBM and AWS sides as usual, and connectivity should be established between the two cloud providers.

Both IBM and AWS sides of VPN display healthy and connected statuses:

Costs:

IBM side:

The estimated billing for a VPN connection is around $65 a month. For comparison, the smallest available IBM instance (2 vCPU / 8 GB RAM) is estimated to cost $75-80, depending on the discount.

AWS side:

Billing for almost a full month came to $35.

Total:

Rounding the AWS cost up to $40, the total cost of the Site-to-Site VPN is around $110 a month.

Latency:

A ping with standard packets from AWS (Frankfurt) to IBM (Washington DC) over the public internet runs at a stable 92 ms:

100 packets transmitted, 100 received, 0% packet loss, time 99134ms
rtt min/avg/max/mdev = 92.326/92.455/93.316/0.098 ms

The same ping over the VPN tunnel takes slightly longer, averaging about 97.8 ms, which is roughly 5 ms (about 6%) more:

100 packets transmitted, 100 received, 0% packet loss, time 99114ms
rtt min/avg/max/mdev = 97.377/97.822/105.036/0.992 ms

Conclusion:

In this article, we connected two VPCs in different cloud providers via a secure VPN tunnel, allowing applications on both ends to connect to each other transparently and with no significant latency overhead. NACLs control the traffic flow from our trusted account, and our networking infrastructure is stored as code. The VPN solutions on both ends are managed by the cloud providers and cost about as much as a couple of instances.

Since AWS requires the peer IP when the VPN is created, we first created an IBM VPN Gateway that was not connected anywhere, then created an AWS VPN Connection to that gateway. After building the AWS side, we created the IBM VPN Connection, which ties the IBM VPC to the VPN Gateway, adds routing into the VPC, and tunes the VPN policies to match AWS security standards.