Steps to Optimize Network Quality of Service in Your Data Center

Virtualization technologies have evolved such that support for multiple networks on a single host is a must-have feature. For example, Red Hat Enterprise Virtualization allows administrators to configure multiple NICs using bonding for several networks to allow high throughput or high availability. In this configuration, different networks can be used for connecting virtual machines (using layer 2 Linux bridges) or for other uses such as host storage access (iSCSI, NFS), migration, display (SPICE, VNC), or for virtual machine management.  While it is possible to consolidate all of these networks into a single network, separating them into multiple networks enables simplified management, improved security, and an easier way to track errors and/or downtime.

This configuration works well but leaves us with a network bottleneck at the host level. All networks compete for the same queue on the NIC (or on the bond), and by default Linux enforces only a trivial queuing algorithm, pfifo_fast. It keeps a small number of FIFO bands side by side and enqueues each packet into one of them based on its Type of Service bits or assigned priority; it knows nothing about which network a packet belongs to. One can easily imagine a case where a single network hogs the outgoing link (e.g. during a migration storm, where many virtual machines are being migrated off the host simultaneously, or when there is an attacker VM). The consequences can include lost connectivity to the management engine or lost storage for the host.
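To make the problem concrete, here is a toy Python model of pfifo_fast. It is a sketch, not kernel code: the class and packet names are made up for illustration, and only the three-band layout and the default priority-to-band map are taken from the real qdisc.

```python
from collections import deque

# Default Linux priomap: maps the 16 packet priority values to bands 0-2.
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


class PfifoFast:
    """Toy model: three FIFO bands with strict priority between them."""

    def __init__(self):
        self.bands = [deque(), deque(), deque()]

    def enqueue(self, packet, priority=0):
        # The band depends only on the packet priority, never on the network.
        self.bands[PRIOMAP[priority & 0xF]].append(packet)

    def dequeue(self):
        # Band 0 is always drained before band 1, and band 1 before band 2.
        for band in self.bands:
            if band:
                return band.popleft()
        return None


q = PfifoFast()
for i in range(3):
    q.enqueue(f"migration-{i}")        # priority 0 -> band 1
q.enqueue("management-heartbeat")      # same band, so it waits behind migration
q.enqueue("interactive", priority=6)   # priority 6 -> band 0, served first

order = [q.dequeue() for _ in range(5)]
print(order)
```

The management packet leaves the host only after every migration packet queued ahead of it, which is exactly the starvation scenario described above.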

A simple solution is to configure known network-hogs (e.g. migration) on top of a separate host NIC. However, this wastes bandwidth and hardware resources.

Red Hat Enterprise Virtualization 3.6 introduces a feature called host network QoS to solve this challenge. Because Red Hat Enterprise Virtualization is co-engineered with Red Hat Enterprise Linux 7, virtualization administrators can leverage a plethora of network QoS algorithms called qdiscs (queueing disciplines), any of which can be configured on the host NICs. tc (the traffic control command line tool) can classify different kinds of traffic and apply different priorities to them. Administrators who love the command line can fine-tune these algorithms to their liking. For those who prefer the GUI, RHEV-M (RHEV’s management interface) makes things easier. In addition, QoS configured via RHEV-M is manageable at multiple layers (data centers, networks, hosts), is stored persistently in the engine, and persists across host reboots.

RHEV-M leverages the HFSC (hierarchical fair service curve) implementation that is embedded in Red Hat Enterprise Linux 7. HFSC allows the administrator to guarantee both bandwidth and latency with relatively low performance overhead.

Using HFSC, the administrator can configure a few invariants on the egress traffic including:

  1. Link sharing. Ensure that a certain class of traffic is held to a certain share of the bandwidth whenever there is contention on the NIC.
  2. Real Time. Allow bursts of traffic to temporarily breach the link share, in order to ensure lower latency.
  3. Upper limit. A hard upper limit on the egress traffic.
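Each of these three settings is expressed in HFSC as a service curve. In tc's notation a curve has up to three parameters: an initial rate `m1`, a duration `d` for which that rate applies, and a long-term rate `m2`. The following sketch evaluates such a two-piece linear curve. The function name and the sample numbers are assumptions chosen for illustration; time is in milliseconds and rates are in kbit/s, so the result is in bits.

```python
def service_curve(t_ms, m2_kbit, m1_kbit=None, d_ms=0):
    """Bits of service guaranteed after t_ms milliseconds.

    kbit/s multiplied by ms gives bits, so integer arithmetic is enough.
    With no m1 the curve is a straight line of slope m2.
    """
    if m1_kbit is None:
        return m2_kbit * t_ms
    if t_ms <= d_ms:
        return m1_kbit * t_ms
    return m1_kbit * d_ms + m2_kbit * (t_ms - d_ms)


# Hypothetical real-time curve: 10 Mbit/s for the first 10 ms, then 1 Mbit/s.
burst = service_curve(10, m2_kbit=1_000, m1_kbit=10_000, d_ms=10)
later = service_curve(1_010, m2_kbit=1_000, m1_kbit=10_000, d_ms=10)

# The same long-term rate without the initial burst.
flat = service_curve(10, m2_kbit=1_000)

print(burst, later, flat)
```

A steep first segment is what lets a low-bandwidth class get its packets out quickly: in the first 10 ms the bursty curve guarantees ten times as much service as the flat one, while both settle to the same long-term rate.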

RHEV provides an easy way to configure HFSC. Under the hood, RHEV-M uses tc to configure everything needed to classify the traffic correctly and apply the desired QoS values to the networks the administrator chooses.
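For readers curious what a hand-written tc setup might look like, the sketch below builds (but does not run) a root HFSC qdisc and one class per network. This is not the command sequence RHEV-M itself generates; the device name, class ids, and rates are hypothetical, and the filters that steer traffic into each class are left out.

```python
def hfsc_commands(dev, default_minor, classes):
    """Return tc command strings for a root HFSC qdisc and its classes.

    classes maps a class minor id (string, as tc would read it) to a dict
    with any of the keys "rt", "ls", "ul", each a rate in Mbit/s.
    """
    cmds = [f"tc qdisc add dev {dev} root handle 1: hfsc default {default_minor}"]
    for minor, curves in classes.items():
        if "ul" in curves and "ls" not in curves:
            # An upper limit only makes sense together with a link share.
            raise ValueError(f"class 1:{minor}: 'ul' requires 'ls'")
        cmd = f"tc class add dev {dev} parent 1: classid 1:{minor} hfsc"
        for curve in ("rt", "ls", "ul"):
            if curve in curves:
                cmd += f" {curve} m2 {curves[curve]}mbit"
        cmds.append(cmd)
    return cmds


# Hypothetical layout: class 1:10 for management, class 1:20 for VM traffic.
commands = hfsc_commands(
    "eth0",
    default_minor="10",
    classes={
        "10": {"rt": 5, "ls": 20},
        "20": {"ls": 80, "ul": 900},
    },
)
for line in commands:
    print(line)
```

Printing the commands rather than executing them keeps the example safe to run on any machine; review them against the tc-hfsc man page before applying anything to a real host.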

It is important to note that since the separation of networks in RHEV is performed via VLAN tagging, traffic is automatically classified by its VLAN tag, and this is currently the finest granularity of classification that is supported.
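Classifying by VLAN tag amounts to reading the 802.1Q header of each outgoing frame. The Python sketch below shows the idea on raw bytes; it illustrates the tag layout only and is not how the kernel or RHEV implements the filter. The VLAN ids and class names in the mapping are hypothetical.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tagged frame


def vlan_id(frame):
    """Return the VLAN id of an Ethernet frame, or None if it is untagged."""
    if len(frame) < 16:
        return None
    # Bytes 12-13 hold the TPID, bytes 14-15 the tag control information.
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF  # lower 12 bits; the upper bits are PCP and DEI


def classify(frame, vlan_to_class, default_class):
    """Pick a traffic class from the VLAN tag alone."""
    return vlan_to_class.get(vlan_id(frame), default_class)


def make_frame(vid=None, pcp=0):
    """Build a minimal test frame with zeroed MAC addresses."""
    header = bytes(12)
    if vid is not None:
        header += struct.pack("!HH", TPID_8021Q, (pcp << 13) | vid)
    return header + struct.pack("!H", 0x0800) + bytes(46)


# Hypothetical mapping of VLAN ids to QoS classes.
classes = {100: "management", 200: "vm"}

tagged = classify(make_frame(vid=200, pcp=5), classes, "default")
untagged = classify(make_frame(), classes, "default")
print(tagged, untagged)
```

Because the VLAN id is the only key, two flows on the same VLAN always land in the same class, which is the granularity limit mentioned above.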

Let’s take a look at a real-life example of the feature in use: a management network and a VM network sharing the same NIC. While the VM network can be expected to consume high bandwidth (as there may be many busy VMs running on the host simultaneously), the management network is expected to require much lower bandwidth. However, in terms of latency, the management network may be more sensitive than the VM network. In other words, at times when the NIC is saturated, we can allow short delays in VM traffic, but we would like to ensure that packets sent back to the engine via the management network are not delayed beyond a certain bound; otherwise the engine might consider the host unreachable. To remedy this, a high link share can be configured for the VM network and a low link share for the management network, with a real-time setting on the management network so that its traffic can temporarily breach that low share. As a result, HFSC will allow bursts of management traffic even while the link is saturated.
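The link share part of this example can be approximated with a simple weighted fair-share calculation. The sketch below is a simplified model, not the HFSC scheduler: it ignores real-time curves and latency entirely, and the function name, weights, and demands are assumptions chosen for illustration. Weights must be positive.

```python
def share_link(capacity, classes):
    """Split capacity among classes by weight, never exceeding a demand.

    classes maps a name to (link_share_weight, demand); capacity and
    demands use the same unit, e.g. Mbit/s.
    """
    alloc = {name: 0.0 for name in classes}
    active = {name for name, (_, demand) in classes.items() if demand > 0}
    remaining = float(capacity)
    while active:
        total_weight = sum(classes[n][0] for n in active)
        fair = {n: remaining * classes[n][0] / total_weight for n in active}
        satisfied = {n for n in active if classes[n][1] <= fair[n]}
        if not satisfied:
            # Every class wants more than its share: split by weight.
            alloc.update(fair)
            break
        for n in satisfied:
            # A class that needs less than its share frees the rest.
            alloc[n] = float(classes[n][1])
            remaining -= classes[n][1]
        active -= satisfied
    return alloc


# Hypothetical 1000 Mbit/s NIC with VM weight 80 and management weight 20.
quiet_mgmt = share_link(1000, {"vm": (80, 2000), "management": (20, 50)})
busy_mgmt = share_link(1000, {"vm": (80, 2000), "management": (20, 500)})
print(quiet_mgmt)
print(busy_mgmt)
```

When management traffic is light, the VM network absorbs the unused bandwidth; when both are busy, each is held to its weighted share. What this model cannot show is how long a management packet waits in the queue, which is the part the real-time setting addresses.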

To take this new feature for a spin, visit our download page.

This post was authored by Ido Barkan before leaving Red Hat earlier this year.  While Ido will be missed, his work continues in the capable hands of the Red Hat Enterprise Virtualization team.
