Enhanced Networking - SR-IOV in EC2
Traditionally, EC2 instances send network traffic through the Xen hypervisor. With SR-IOV (Single Root I/O Virtualization) support in the C3 and I2 families, each physical Ethernet NIC presents itself as multiple independent PCIe Ethernet NICs, each of which can be assigned to a Xen guest.
Thus, an EC2 instance running on hardware that supports Enhanced Networking can "own" one of the virtualized network interfaces, which means it can send and receive network traffic without invoking the Xen hypervisor.
Enabling Enhanced Networking is as simple as:
- Create a VPC and subnet
- Pick an HVM AMI with the Intel ixgbevf Virtual Function driver
- Launch a C3 or I2 instance using the HVM AMI
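Once the instance is up, it can be useful to confirm that the interface is actually bound to the ixgbevf driver. The following is a minimal sketch, not part of the original setup: it assumes a Linux guest where sysfs exposes the bound driver as the symlink /sys/class/net/<iface>/device/driver, and that the primary interface is named eth0. The function names are hypothetical.

```python
import os


def driver_for_interface(iface="eth0", sysfs_root="/sys/class/net"):
    """Return the name of the kernel driver bound to `iface`, or None.

    Assumption: sysfs exposes the driver as a symlink at
    <sysfs_root>/<iface>/device/driver, as on typical Linux guests.
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends with the driver name, e.g. ".../drivers/ixgbevf".
    return os.path.basename(os.readlink(link))


def uses_enhanced_networking(iface="eth0", sysfs_root="/sys/class/net"):
    """True if `iface` is driven by the Intel ixgbevf Virtual Function driver."""
    return driver_for_interface(iface, sysfs_root) == "ixgbevf"
```

Calling uses_enhanced_networking("eth0") on the instance should return True when the Virtual Function driver is in use. Any other driver name, or None, suggests the traffic is still going through the hypervisor path.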
Benchmarking
We use the Amazon Linux AMI, as it already has the ixgbevf driver installed and is available in all regions. We use netperf to benchmark C3 instances running in a VPC (i.e. Enhanced Networking enabled) against C3 instances running outside a VPC (i.e. Enhanced Networking disabled).
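For readers who want to run a similar comparison, here is a rough sketch of how the netperf runs could be driven from Python. The exact netperf options behind the numbers in this post are not listed here, so the test names (TCP_RR for request/response latency, TCP_STREAM for bandwidth), the 30-second duration, and the helper names are assumptions for illustration. It also assumes netperf is installed locally and netserver is already running on the target host.

```python
import subprocess


def netperf_command(host, test="TCP_RR", duration_s=30):
    """Build a netperf command line.

    -H selects the remote host running netserver, -t the test type,
    and -l the test length in seconds.
    """
    if test not in ("TCP_RR", "TCP_STREAM"):
        raise ValueError("unsupported test in this sketch: %s" % test)
    return ["netperf", "-H", host, "-t", test, "-l", str(duration_s)]


def run_netperf(host, test="TCP_RR", duration_s=30):
    """Run netperf and return its raw text output (parsing is left out)."""
    result = subprocess.run(
        netperf_command(host, test, duration_s),
        capture_output=True,
        text=True,
        check=True,  # raise if netperf exits with an error
    )
    return result.stdout
```

Running the same command against a VPC instance and a non-VPC instance of the same type, then comparing the reported figures, reproduces the shape of the experiment described above.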
Round-trip Latency
Many message-passing (MPI) and HPC applications are latency sensitive. Here Enhanced Networking support really shines, with a maximum speedup of 2.37x over the normal EC2 networking stack.
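The speedup figure is simply the ratio of the two round-trip latencies. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the two latency values in the usage note are placeholders chosen to illustrate the calculation, not measurements from our runs.

```python
def latency_speedup(baseline_latency, enhanced_latency):
    """Speedup of Enhanced Networking over the normal stack.

    Both latencies must be in the same unit. A value above 1.0 means
    the Enhanced Networking path has the lower round-trip latency.
    """
    if baseline_latency <= 0 or enhanced_latency <= 0:
        raise ValueError("latencies must be positive")
    return baseline_latency / enhanced_latency
```

For example, latency_speedup(237.0, 100.0) gives a ratio of about 2.37, meaning the baseline round trip takes roughly 2.37 times as long as the enhanced one.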
Amazon says that both the c3.large and c3.xlarge instances have "Moderate" network performance, but we found that c3.large peaks at around 415 Mbps, while c3.xlarge almost reaches 1 Gbps. We believe the extra bandwidth headroom is for EBS traffic, as c3.xlarge can be configured as "EBS-optimized" while c3.large cannot.
Conclusion 2
Notice that c3.2xlarge with Enhanced Networking enabled has a round-trip latency of 92 milliseconds, which is much higher than that of the smaller instance types in the C3 family. We repeated the test in both the us-east-1 and us-west-2 regions and got identical results.
Currently AWS has a shortage of C3 instances: all c3.4xlarge and c3.8xlarge launch requests we have issued so far resulted in "Insufficient capacity". We are closely monitoring the situation, and we plan to benchmark the c3.4xlarge and c3.8xlarge instance types to see whether we can reproduce the increased latency.
Updated Jan 8, 2014: We have published Enhanced Networking in the AWS Cloud (Part 2) that includes the benchmark results for the remaining C3 types.