Part 3: The Evolution to Virtual Routing

In the previous two parts of this three-post series, we provided a taxonomy for routers based on how their data plane and control plane components are implemented. We put emphasis on disaggregated routers in the context of telecom operators, where the data plane is usually implemented in hardware and the control plane in software, with each provided by a different vendor. We also presented several software alternatives for the control plane, both closed source and open source, and we introduced the cloud control plane, where the software that implements the control plane runs in the cloud instead of in the same hardware as the data plane.

In this last post of the series, we go into detail on the cloud control plane, exploring its scalability and high-availability benefits, and explaining how it enables the creation of multiple virtual routers in the same hardware where the data plane runs. We then cover the benefits of these multi-tenant virtual routers, including automatic network growth in already deployed hardware and customer separation in the same device at the administrative level as well as at the VRF/VLAN level. We also explain how this dual separation enables support for both hard and soft network slicing, which in turn enables support for new use cases such as RAN sharing in 5G and multi-customer applications in MEC. Finally, we show the management advantages of having all virtual routers in the same cloud platform, especially when the platform can be programmatically accessed via standard APIs such as gRPC and NETCONF/YANG.

In the Volta platform, the Virtual Route Processor, or VRP, is the set of routing processes in the cloud that implement the cloud control plane of a router. VRPs run in the cloud in Virtual Machines hosted on dedicated servers, private clouds, or public clouds. When multiple VRPs share the same compute node, and thus its RAM and CPU resources, statistical multiplexing gains can be achieved. This is analogous to RAN functional splits in the context of open RAN, where the DU and CU run in a common pool of resources. In addition, cloud elasticity enables compute nodes to scale both vertically (more RAM and CPU) and horizontally (adding more compute nodes and redistributing existing VRPs between them). Together, statistical multiplexing gain and cloud elasticity enable unprecedented scaling for control planes, especially when compared to the fixed CPU and RAM of a route processor in a legacy router.
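The statistical multiplexing gain can be illustrated with a small calculation. The following is a minimal sketch with made-up numbers: the VRP names and CPU demand samples are hypothetical and are not measurements from the Volta platform. It compares sizing one fixed route processor per router for its own peak against sizing a single shared compute node for the peak of the aggregate demand.

```python
# Hypothetical CPU demand (in cores) of three VRPs over the same four time slots.
# The peaks do not coincide, which is where the multiplexing gain comes from.
demand = {
    "vrp-a": [1.0, 4.0, 1.0, 1.0],
    "vrp-b": [1.0, 1.0, 4.0, 1.0],
    "vrp-c": [4.0, 1.0, 1.0, 1.0],
}


def dedicated_capacity(demand):
    """Cores needed when every VRP has its own processor sized for its own peak."""
    return sum(max(samples) for samples in demand.values())


def shared_capacity(demand):
    """Cores needed when all VRPs share one node sized for the aggregate peak."""
    return max(sum(slot) for slot in zip(*demand.values()))


print(dedicated_capacity(demand))  # 12.0
print(shared_capacity(demand))     # 6.0
```

With these assumed samples, the shared node needs half the capacity of the dedicated processors. The gain shrinks as the demand peaks of the VRPs become more correlated.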

In terms of high availability, different compute nodes can be placed in different geographical locations, and an orchestration process in the cloud, such as the one managing VRPs in the Volta platform, can monitor the health of VRP processes, and restart them in the existing compute node if needed, or in alternative compute nodes if the old compute node goes down or its entire location becomes unreachable. Therefore, failure modes such as process down, node down or location down can be handled for cloud control planes. Even new failure modes introduced by the physical separation between the control plane and data plane, such as a network partition between the VRP and the HW where the data plane runs, can be addressed by having a heartbeat mechanism between each compute node’s location and the HW, to determine the valid compute nodes to host the control planes for that HW.
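The failure handling described above can be sketched as a simple placement decision. This is an illustrative model only: `ComputeNode`, `valid_nodes`, and `place_vrp` are hypothetical names, not part of the Volta platform, and a real orchestrator would also weigh load, affinity, and state synchronization.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    name: str
    location: str
    healthy: bool


def valid_nodes(nodes, reachable_locations):
    """Nodes that are up and whose location still exchanges heartbeats with the HW."""
    return [n for n in nodes if n.healthy and n.location in reachable_locations]


def place_vrp(current, nodes, reachable_locations, process_alive):
    """Decide what to do with one VRP; returns (action, node_name)."""
    candidates = valid_nodes(nodes, reachable_locations)
    if current in [n.name for n in candidates]:
        # The node is fine: keep the VRP, or restart it in place if the process died.
        return ("keep", current) if process_alive else ("restart", current)
    if candidates:
        # Node down or location partitioned from the HW: move to a valid node.
        return ("relocate", candidates[0].name)
    return ("unavailable", None)


nodes = [ComputeNode("n1", "east", True), ComputeNode("n2", "west", True)]
print(place_vrp("n1", nodes, {"east", "west"}, process_alive=False))  # ('restart', 'n1')
print(place_vrp("n1", nodes, {"west"}, process_alive=True))           # ('relocate', 'n2')
```

The first call models a process-down failure, and the second models a network partition between the `east` location and the hardware. A node-down failure follows the same relocation path because the node drops out of the valid set.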

In the context of a control plane running in the cloud and a data plane running in HW, with a physical separation between them, we define a virtual router, a type of disaggregated router, as the combination of the VRP in the cloud and the set of resources in the device (e.g., a white-label switch) necessary to implement the data plane.

By assigning different resources in the white-label switch, such as logical or physical ports, to different virtual routers, multi-tenancy is accomplished, where multiple virtual routers run in the same physical switch. That is, the VRP components of the different virtual routers are already separated in the cloud, so the only additional piece needed is the slicing of data plane resources, so that each virtual router also has a separate and dedicated data plane. In the Volta platform, this is managed by the Volta Agent process running in the white-label switch, along with the corresponding virtual representation of the device in the cloud, the Device Manager process.
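The slicing of data plane resources can be pictured as an ownership table that maps each port to at most one virtual router. The sketch below is a hypothetical model, not the Volta Agent or Device Manager implementation; the class name, method names, and port names are assumptions made for illustration.

```python
class DeviceSlicer:
    """Tracks which virtual router owns each port of one white-label switch."""

    def __init__(self, ports):
        # None means the port is not yet assigned to any virtual router.
        self.owner = {port: None for port in ports}

    def assign(self, virtual_router, ports):
        """Dedicate ports to a virtual router; refuse ports owned by another one."""
        unknown = [p for p in ports if p not in self.owner]
        if unknown:
            raise KeyError(f"unknown ports: {unknown}")
        taken = [p for p in ports if self.owner[p] not in (None, virtual_router)]
        if taken:
            raise ValueError(f"ports already assigned: {taken}")
        for p in ports:
            self.owner[p] = virtual_router

    def ports_of(self, virtual_router):
        """Sorted list of ports dedicated to the given virtual router."""
        return sorted(p for p, o in self.owner.items() if o == virtual_router)


switch = DeviceSlicer(["eth1", "eth2", "eth3"])
switch.assign("vr-a", ["eth1", "eth2"])
switch.assign("vr-b", ["eth3"])
print(switch.ports_of("vr-a"))  # ['eth1', 'eth2']
```

Rejecting an assignment for a port that already belongs to another virtual router is what keeps each data plane slice dedicated to a single tenant.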

Multi-tenancy virtual routers

Each virtual router is administratively separated from other virtual routers running in the same device, which introduces an additional layer of customer separation. By contrast, in legacy routers, or even in disaggregated routers where the control plane runs in the same device as the data plane, only one router can be implemented per device. In that context, customer separation is done at the VRF or the VLAN level, by assigning a different port to each of the services of each customer. Customers are therefore separated at the VRF/VLAN level, but they remain under the same administrative domain, the one the router belongs to. Because there is a single router per device, there is no administrative separation at the router or device level.

With virtual routers, VRF/VLAN-based customer separation is still achieved within a given virtual router, and an additional, administrative-level separation is introduced at the device level, given that there are multiple virtual routers per device and each virtual router can be managed separately from the others.

Customer separation at the VRF/VLAN level is known as Soft Network Slicing, and customer separation at the virtual router level is known as Hard Network Slicing. In a hard slice, virtual routers are grouped under the same administrative domain (e.g., an RBAC domain) and can be managed independently of the rest of the virtual routers running in the same set of devices.
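The two levels of separation can be expressed in a small data model. This is a minimal sketch under stated assumptions: the `VirtualRouter` fields, the domain names, and the `manageable_by` helper are hypothetical and do not reflect the platform's actual RBAC model.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualRouter:
    name: str
    device: str
    admin_domain: str                         # hard slice: who may manage this router
    vrfs: dict = field(default_factory=dict)  # soft slices: VRF name -> customer


def manageable_by(routers, admin_domain):
    """Names of the virtual routers that belong to one administrative domain."""
    return [r.name for r in routers if r.admin_domain == admin_domain]


# Two virtual routers share one device but sit in different hard slices.
routers = [
    VirtualRouter("vr-1", "switch-1", "operator-a", {"vrf-red": "cust-1", "vrf-blue": "cust-2"}),
    VirtualRouter("vr-2", "switch-1", "operator-b", {"vrf-green": "cust-3"}),
]
print(manageable_by(routers, "operator-a"))  # ['vr-1']
```

In this model, `cust-1` and `cust-2` are separated only at the VRF level inside `vr-1` (soft slicing), while `operator-a` and `operator-b` cannot manage each other's routers even though both run on `switch-1` (hard slicing).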

Network Slicing abstraction

This administrative separation at the router and network slice level empowers telco operators to share the management of network resources in a controlled fashion, and it supports new use cases such as 5G RAN sharing or combining internal and external applications in the same edge infrastructure within MEC.

In the context of 5G RAN sharing, different virtual routers running in the same Disaggregated Cell Site Gateway (DCSG) node can be assigned to different Mobile Network Operators (MNOs), enabling each one to provide DCSG-related services, such as QoS services, independently. A separate virtual router can also be created and assigned to the owner of the infrastructure, for managing device-level services such as time synchronization.

RAN sharing use case

In the context of Multi-access Edge Computing (MEC), compute and storage resources are placed at the edge of the network to support both internal and external applications. Examples of internal applications for telecom operators are the DU and CU running as VNFs in the context of open RAN, or the cloud control plane of a virtual router itself. Examples of external customer applications that need expensive real-time computation are autonomous driving, immersive UIs based on AR/VR, and IoT applications. Virtualization is needed to share the same edge infrastructure among these different applications, so that each one gets its own storage, compute, and network services isolated from the rest, and Hard Network Slicing is one way to address the network part.

In terms of network growth, note that deploying new virtual routers in an existing deployment of white-label switches is an automatic process because there is no need to deploy additional HW. Instead, a new VRP is created in the cloud and assigned to the virtual router, and resources in the device are reserved for that virtual router too. This automatic process significantly reduces deployment costs because it avoids sending field engineers on site.

When new white-label switches do need to be deployed, their deployment can be simplified with the Volta platform by preloading their configurations in an internal provisioning database. When a new white-label switch boots up, its Volta Agent process will “call home”, and the cloud platform will check whether there is any configuration for it in the provisioning database. If so, the Volta Agent process will apply the configuration automatically, thus creating one or more virtual routers along with their specified services. This automatic process also significantly reduces deployment costs because it does not require expert field engineers on site, just technicians to plug in and boot up the white-label switch.
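The call-home flow reduces to a lookup keyed by the identity of the device. The sketch below is a simplified model with assumed names: the device identifier, the shape of the provisioning record, and the `call_home` function are hypothetical and are not the Volta Agent's actual interface.

```python
# Hypothetical provisioning database, preloaded before the switch is shipped.
provisioning_db = {
    "switch-0042": {
        "virtual_routers": [
            {"name": "vr-a", "ports": ["eth1", "eth2"], "services": ["l3vpn"]},
            {"name": "vr-b", "ports": ["eth3"], "services": ["qos"]},
        ]
    }
}


def call_home(device_id, provisioning_db):
    """Return the names of the virtual routers created for a booting device.

    An empty list means no configuration was preloaded, so nothing is applied.
    """
    config = provisioning_db.get(device_id)
    if config is None:
        return []
    created = []
    for vr in config["virtual_routers"]:
        # A real agent would reserve the ports and configure the services here.
        created.append(vr["name"])
    return created


print(call_home("switch-0042", provisioning_db))  # ['vr-a', 'vr-b']
print(call_home("switch-9999", provisioning_db))  # []
```

Because the configuration is keyed by device and stored ahead of time, the on-site work is limited to cabling and powering on the switch.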

The simplified deployment of new white-label switches, and the automatic deployment of new virtual routers in already deployed white-label switches, illustrate one of the benefits of centralized management: automatic device provisioning. Other benefits of centralized management include a single CLI session with the platform to manage all virtual routers it hosts, and a gRPC API to access the platform programmatically, either directly or via higher-level mechanisms that consume the API, such as NETCONF/YANG or a library of scripts that implement common configurations.

This concludes the last post of this series, in which we have described the cloud control plane and shown the benefits it can bring in terms of scalability and high availability. We have also introduced the concept of multi-tenant virtual routers enabled by the cloud control plane. We have shown the benefits of multi-tenant virtual routers in terms of ease of deployment and network growth, the ability to provide customer separation at both the administrative level and the VRF/VLAN level, the support of both levels of network slicing, soft and hard, and how all this enables support for new use cases such as RAN sharing in 5G and multi-customer applications in MEC.