Operational Visibility in the SDDC
Today, developers and operators face thousands of users who expect hundreds of applications to run without disruption, at high speed, from anywhere around the globe. That is why cloud computing is, and will remain, of rising importance to companies of all sizes. As a logical consequence, so is the demand for capacity and availability of the underlying IT infrastructure. Developers, operators, and users should have the luxury of focusing on applications rather than on the capabilities and limitations of data centers. To this end, implementing an adaptable software-defined data center (SDDC) is a viable option that is gaining more and more traction.
The foundation of an SDDC is the virtualization of CPU, memory, storage, and lately even networks. Virtual servers are created and maintained through an API, which provides unprecedented flexibility and saves considerable time compared to provisioning physical servers. The main challenges of an SDDC are the same as with traditional data centers: keeping costs at a minimum while providing a good user experience. Software-defined data centers are inherently built to scale all kinds of resources up and down in a timely manner by utilizing provisioning and orchestration tools. However, this dynamism is not feasible with traditional networks that rely on human beings configuring switches, routers, and firewalls by hand. That’s where network virtualization comes into play. Software-defined networks (SDN) provide the required network programmability and are thereby a cornerstone of every SDDC. The shift to software-defined networks doesn’t remove the need to monitor network traffic, but the approach to monitoring needs to be adapted.
Up until the rise of network virtualization, the best way to monitor network traffic was to place a physical probe that was able to capture and identify all traffic and the communication endpoints of all applications. In combination with real user data and performance metrics of the application servers, operators were able to pinpoint the exact reason for performance degradation with ease. In increasingly virtualized environments, the placement of physical probes becomes more and more difficult and less practicable. If two virtual servers running on the same hypervisor talk to each other, it is very unlikely that even one byte of data will pass the physical network interface of the host running the hypervisor, because the whole network communication between the two virtual machines happens within the hypervisor. This can leave important gaps in probe-based monitoring.
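To make the blind spot concrete, here is a minimal Python sketch. It assumes a simplified model in which a flow reaches a physical network interface only when its two virtual machines run on different hosts. The names (Flow, placement, visible_to_physical_probe) and the example hosts are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """A conversation between two virtual machines (hypothetical model)."""
    src_vm: str
    dst_vm: str


def visible_to_physical_probe(flow, placement):
    """Return True if the flow crosses a physical NIC.

    Simplifying assumption: traffic between VMs on the same host is
    switched inside the hypervisor and never reaches the physical NIC.
    """
    return placement[flow.src_vm] != placement[flow.dst_vm]


# Hypothetical placement of VMs on hypervisor hosts.
placement = {"web-1": "host-a", "app-1": "host-a", "db-1": "host-b"}

flows = [Flow("web-1", "app-1"), Flow("app-1", "db-1")]

# Flows a physical probe cannot see under the assumption above.
blind_spots = [f for f in flows if not visible_to_physical_probe(f, placement)]
```

In this toy setup the web-1 to app-1 conversation stays inside host-a, so a probe on the physical network would only ever see the app-1 to db-1 traffic.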
Agent-based monitoring is a complementary approach that helps operators stay on top of the complexity and dynamics of software-defined data centers. One drawback is that operators need to install agents in multiple locations throughout the data center: in hypervisors, in virtual machines, in containers, and ideally also in active networking components. Then again, modern provisioning and orchestration tools provide mechanisms to take care of the agent installation, so you don’t have to do it manually.
Network interfaces, networks, and the Internet are inherently shared resources and therefore have the largest potential to cause end user performance degradation. So there are obvious reasons to monitor traffic and activities in software-defined networks. To that end, it is essential to understand how the technologies implementing SDNs work. Network overlays, one manifestation of SDN, are often used to create networks for containers on top of an existing network infrastructure. For operators it is imperative to know whether the virtual network has been configured correctly and works as intended. Therefore agents must be able to detect network overlays, and the data collector must have the means to correlate network data from network overlays with physical network traffic in order to uniquely identify all communicating parties.
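One building block for such a correlation is decoding the overlay header. The following Python sketch assumes the overlay uses VXLAN as specified in RFC 7348 and that the caller has already extracted the UDP payload of a captured packet. The function name parse_vxlan is hypothetical, and a real agent would need considerably more (capture, other overlay protocols, error handling).

```python
import struct

VXLAN_HEADER_LEN = 8          # bytes, per RFC 7348
VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: VNI field is valid


def parse_vxlan(udp_payload: bytes):
    """Split a VXLAN UDP payload into (vni, inner_ethernet_frame).

    Header layout: 8 flag bits, 24 reserved bits,
    24-bit VXLAN Network Identifier (VNI), 8 reserved bits.
    """
    if len(udp_payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload too short for a VXLAN header")

    # Read the header as two big-endian 32-bit words.
    flags_word, vni_word = struct.unpack("!II", udp_payload[:VXLAN_HEADER_LEN])

    flags = flags_word >> 24
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set, not a valid VXLAN header")

    vni = vni_word >> 8  # drop the trailing 8 reserved bits
    return vni, udp_payload[VXLAN_HEADER_LEN:]


# Build a synthetic payload: VNI 5001 followed by placeholder inner bytes.
sample = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, 5001 << 8) + b"inner-frame"
vni, inner = parse_vxlan(sample)
```

With the VNI and the inner frame in hand, a collector could key overlay conversations by virtual network and match them against the outer, physical endpoints that carried them.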
With today’s network virtualization capabilities it is hard to determine if the network connectivity of any given operating system is physical or virtual. Of course network traffic has to hit a physical transfer medium at some point, unless the communication takes place within the hypervisor. But it’s virtually impossible to tell how many layers of network virtualization are in place from an operating system’s point of view. Since encapsulation adds overhead to network traffic, and encapsulating and decapsulating packets takes time, it is valid to argue that network virtualization has a negative effect on overall performance. Everything comes at a price: impaired performance appears to be the tradeoff for faster provisioning and easier administration. That is why operators need several observation points at different levels of virtualization to get a holistic overview of the infrastructure and to correctly pinpoint failures.
Let’s say you start two containers connected by a network in two virtual machines running in a tenant network in a distributed OpenStack cluster that runs on a Ravello application, published to AWS. You would then have at least four layers of network virtualization in place: VXLAN encapsulation of the container network overlay, VXLAN encapsulation of the tenant network on OpenStack, Ravello’s HVX SDN abstraction of AWS, and the AWS SDN abstraction of the physical network. Nevertheless, the network configuration in the containers still looks as if they had a physical network interface. A lower MTU of said network interface might be a good indicator of this virtualization. Then again, in contrast to VXLAN, alternative protocols like STT (Stateless Transport Tunneling Protocol) and Geneve (Generic Network Virtualization Encapsulation) make use of the network interfaces’ capability to do TCP segmentation offloading, which on the one hand might result in a steady MTU value of virtualized network interfaces and on the other hand might result in a dramatic performance increase under heavy loads.
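The MTU hint can be turned into rough arithmetic. The Python sketch below assumes VXLAN over IPv4 without VLAN tags, where each layer costs 50 bytes (20 outer IPv4, 8 UDP, 8 VXLAN, and 14 for the encapsulated Ethernet header), and a 1500-byte MTU at the bottom of the stack. It covers only the two VXLAN layers from the scenario; the overhead of the Ravello and AWS layers is not modeled here. The function names are hypothetical.

```python
# Per-layer VXLAN overhead over IPv4, in bytes (assumes no VLAN tags):
# outer IPv4 (20) + UDP (8) + VXLAN (8) + encapsulated Ethernet header (14).
VXLAN_OVERHEAD_IPV4 = 20 + 8 + 8 + 14


def inner_mtu(underlay_mtu: int, vxlan_layers: int) -> int:
    """MTU left for the innermost interface after nested VXLAN layers."""
    return underlay_mtu - vxlan_layers * VXLAN_OVERHEAD_IPV4


def estimate_vxlan_layers(observed_mtu: int, underlay_mtu: int = 1500):
    """Guess the number of VXLAN layers from an observed interface MTU.

    Returns None when the difference is not a whole multiple of the
    per-layer overhead, i.e. when this simple model does not fit.
    """
    difference = underlay_mtu - observed_mtu
    if difference < 0 or difference % VXLAN_OVERHEAD_IPV4 != 0:
        return None
    return difference // VXLAN_OVERHEAD_IPV4


# Container overlay on top of the OpenStack tenant network: two layers.
container_mtu = inner_mtu(1500, 2)
guessed_layers = estimate_vxlan_layers(container_mtu)
```

Under these assumptions a container interface would show an MTU of 1400. The estimate is only a heuristic: an administrator can set MTUs freely, and offloading-based protocols may leave the value unchanged, as noted above.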
Admittedly, this multilayered network “inception” scenario sounds a bit complex at first. However, this is already a reality in some production environments and should give you a rough idea of the complexity to expect from network virtualization, and virtualization in general, in the mid-term. Needless to say, smart monitoring solutions that can manage this complexity and provide precise information about topologies and connectivity status at all layers of network virtualization are mandatory, but have yet to be built. An agent-based approach, where hundreds of observation points are distributed throughout the different layers of the entire application infrastructure, appears to be a reasonable path towards application performance management in the era of holistic virtualization.