Network Working Group                                          L. Dunbar
Internet Draft                                                 Futurewei
Intended status: Informational                               B. Sarikaya
Expires: February 22, 2020                           Denpel Informatique
                                                           B. Khasnabish
                                                             Independent
                                                              T. Herbert
                                                                   Intel
                                                              S. Dikshit
                                                               Aruba-HPE
                                                         August 22, 2019

   Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks
                         draft-ietf-nvo3-vmm-05
Abstract

   This document describes virtual machine mobility solutions commonly
   used in data centers built with an overlay-based network
   virtualization approach. It describes the solutions and the impact
   of moving VMs (or applications) from one rack to another connected
   by the overlay networks.

   For Layer 2, the solution is based on using an NVA (Network
   Virtualization Authority) - NVE (Network Virtualization Edge)
   protocol to update the ARP (Address Resolution Protocol) table or
   neighbor cache entries after a VM (Virtual Machine) moves from the
   Old NVE to the New NVE. For Layer 3, it is based on address and
   connection migration after the move.
Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on February 22, 2020.
Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Requirements
   4. Overview of the VM Mobility Solutions
      4.1. VM Migration in Layer 2 Network
      4.2. Task Migration in Layer-3 Network
         4.2.1. Address and Connection Migration in Task Migration
   5. Handling Packets in Flight
   6. Moving Local State of VM
   7. Handling of Hot, Warm and Cold VM Mobility
   8. VM Operation
   9. Security Considerations
   10. IANA Considerations
   11. Acknowledgments
   12. Change Log
   13. References
      13.1. Normative References
      13.2. Informative References
   Authors' Addresses
1. Introduction

   This document describes overlay-based data center network solutions
   that support multitenancy and VM (Virtual Machine) mobility. Many
   large data centers (DCs), especially cloud DCs, host tasks (or
   workloads) for multiple tenants, where the tenants can be multiple
   departments of one organization or multiple organizations. There is
   communication among tasks belonging to one tenant, and there are
   communications among tasks belonging to different tenants or with
   external entities.

   Server virtualization, which is used in almost all of today's data
   centers, enables many VMs to run on a single physical computer or
   compute server sharing the processor/memory/storage. Network
   connectivity among VMs is provided by the network virtualization
   edge (NVE) [RFC8014]. It is highly desirable [RFC7364] to allow VMs
   to be moved dynamically (live, hot, or cold move) from one server
   to another for dynamic load balancing or optimized work
   distribution.

   There are many challenges and requirements related to VM mobility
   in large data centers, including dynamic attaching/detaching of VMs
   to/from Virtual Network Edges (VNEs). Retaining IP addresses after
   a move is a key requirement [RFC7364]; it is needed in order to
   maintain existing transport connections.

   In traditional Layer-3 based networks, retaining IP addresses after
   a move is generally not recommended, because frequent moves cause
   non-aggregated IP addresses (a.k.a. fragmented IP addresses), which
   introduces complexity in IP address management.

   In view of the many VM mobility schemes that exist today, there is
   a desire to document comprehensive VM mobility solutions that cover
   both IPv4 and IPv6. Large data center networks can be organized as
   one large Layer-2 network geographically distributed over several
   buildings or cities, or as Layer-3 networks with a large number of
   host routes that cannot be aggregated, as the result of frequent
   moves from one location to another without changing IP addresses.
   The connectivity between Layer 2 boundaries can be achieved by the
   NVE functioning as a Layer 3 gateway routing across bridging
   domains, such as in Warehouse Scale Computers (WSCs).
2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
   this document are to be interpreted as described in RFC 2119
   [RFC2119] and [RFC8014].

   This document uses the terminology defined in [RFC7364]. In
   addition, we make the following definitions:

   VM: Virtual Machine.

   Task: A program instantiated or running on a virtual machine or
      container. Tasks in virtual machines or containers can be
      migrated from one server to another. This document uses task,
      workload, and virtual machine interchangeably.

   Hot VM Mobility: A given VM could be moved from one server to
      another in running state.

   Warm VM Mobility: In the case of warm VM mobility, the VM states
      are mirrored to the secondary server (or domain) at predefined
      (configurable) regular intervals. This reduces the overhead and
      complexity, but it may also lead to a situation where the two
      servers do not contain exactly the same data (state
      information).

   Cold VM Mobility: A given VM could be moved from one server to
      another in stopped or suspended state.

   Old NVE: The NVE to which packets were forwarded before the
      migration.

   New NVE: The NVE serving the VM after the migration.

   Packets in flight: Packets sent by correspondents that have an old
      ARP or neighbor cache entry from before the VM or task
      migration, and that are therefore received by the Old NVE.

   End user clients: Users of VMs in diskless systems or in systems
      not using configuration files.

   Cloud DC: Third-party data centers that host applications, tasks,
      or workloads owned by different organizations or tenants.
3. Requirements

   This section states the requirements on data center network
   virtual machine mobility.

   Data center networks should support both IPv4 and IPv6 VM
   mobility.

   VM mobility should not require VMs to change their IP addresses
   after the move.

   There is "Hot Migration", with the transport service continuing,
   and there is "Cold Migration", with the transport service
   restarted, i.e., the task running on the Old NVE is stopped and
   moved to the New NVE before being restarted, as described under
   Task Migration (Section 4.2).

   VM mobility solutions/procedures should minimize triangular
   routing, except for handling packets in flight.

   VM mobility solutions/procedures should not need to use tunneling,
   except for handling packets in flight.
4. Overview of the VM Mobility Solutions

   Layer 2 and Layer 3 mobility solutions are described respectively
   in the following sections.
4.1. VM Migration in Layer 2 Network

   Being able to move VMs dynamically, from one server to another,
   makes dynamic load balancing or work distribution possible.
   Therefore, it is highly desirable for large-scale multi-tenant
   data centers.

   In a Layer-2 based approach, a VM moving to another server does
   not change its IP address. But, since this VM is now under a new
   NVE, previously communicating NVEs will continue to send their
   packets to the Old NVE. To solve this problem, the Address
   Resolution Protocol (ARP) cache in IPv4 [RFC0826] or the neighbor
   cache in IPv6 [RFC4861] in the NVEs needs to be updated: the NVEs
   need to change the cache entries associating the VM's Layer-2 or
   Medium Access Control (MAC) address with the NVE's IP address.
   Such a change enables the NVEs to encapsulate the outgoing MAC
   frames with the current target NVE address. It may take some time
   to refresh the ARP/ND caches when a VM is moved to a New NVE.
   During this period, a tunnel is needed so that the Old NVE can
   forward packets destined to the VM to the New NVE.
   In IPv4, immediately after the move, the VM should send a
   gratuitous ARP request message containing its IPv4 address and
   Layer 2 MAC address under its new NVE. This message's destination
   address is the broadcast address, so the Old NVE also receives
   this message. Both the Old and New NVEs should update the VM's ARP
   entry in the central directory at the NVA, so that its mappings
   record the IPv4 address and MAC address of the moving VM along
   with the new NVE IPv4 address. An NVE-to-NVA protocol is used for
   this purpose [RFC8014].
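
   The following non-normative Python sketch illustrates the
   gratuitous ARP request described above. It is an illustration
   only: the MAC and IPv4 addresses are example values, and no
   programming interface is implied by this document.

      # Sketch: the gratuitous ARP request a VM sends after a move.
      # Sender and target protocol addresses are both the VM's own
      # IPv4 address, and the frame is broadcast so that NVEs
      # (including the Old NVE) can observe it.
      import socket
      import struct

      def gratuitous_arp(vm_mac: bytes, vm_ipv4: str) -> bytes:
          broadcast = b"\xff" * 6
          spa = tpa = socket.inet_aton(vm_ipv4)   # sender == target
          eth = broadcast + vm_mac + struct.pack("!H", 0x0806)
          arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)  # request
          arp += vm_mac + spa + b"\x00" * 6 + tpa
          return eth + arp

      frame = gratuitous_arp(b"\x02\x00\x00\xaa\xbb\xcc", "192.0.2.10")
      assert len(frame) == 42   # 14-byte Ethernet + 28-byte ARP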
   Reverse ARP (RARP), which enables a host to discover its IPv4
   address when it boots from a local server [RFC0903], is not used
   by VMs because a VM already knows its IPv4 address. Next, we
   describe a case where RARP is used.

   There are some vendor deployments (diskless systems or systems
   without configuration files) wherein VM users, i.e., end-user
   clients, ask for the same MAC address upon migration. This can be
   achieved by the clients sending a RARP request message that
   carries the old MAC address and asks for an IP address allocation.
   The server, in this case the New NVE, needs to communicate with
   the NVA, just as in the gratuitous ARP case, to ensure that the
   same IPv4 address is assigned to the VM. The NVA uses the MAC
   address as the key to search its ARP cache for the corresponding
   IP address and informs the New NVE, which in turn sends the RARP
   reply message. This completes the IP address assignment to the
   migrating VM.
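
   A minimal, non-normative sketch of the NVA behavior in this RARP
   case follows. The table layout and the function names are
   assumptions made for illustration; this document does not define
   an NVA data model.

      # Sketch: an NVA table keyed by MAC address.  On a RARP request
      # relayed by the New NVE, the MAC address is used to find the
      # IPv4 address, so the VM keeps the same address after the move.
      class Nva:
          def __init__(self):
              # MAC address -> (VM IPv4 address, serving NVE address)
              self.arp_cache = {}

          def register(self, mac, vm_ip, nve_ip):
              self.arp_cache[mac] = (vm_ip, nve_ip)

          def lookup_by_mac(self, mac, new_nve_ip):
              vm_ip, _old_nve = self.arp_cache[mac]
              self.arp_cache[mac] = (vm_ip, new_nve_ip)  # re-home entry
              return vm_ip

      nva = Nva()
      nva.register("02:00:00:aa:bb:cc", "192.0.2.10", "198.51.100.1")
      # After the move, the New NVE (198.51.100.2) queries by MAC:
      assert nva.lookup_by_mac("02:00:00:aa:bb:cc",
                               "198.51.100.2") == "192.0.2.10"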
   Other NVEs communicating with this VM could still have the old ARP
   entry. If any VMs attached to those NVEs need to communicate with
   the VM now attached to the New NVE, the old ARP entries might be
   used, and the packets are thus delivered to the Old NVE. The Old
   NVE MUST tunnel these in-flight packets to the New NVE.

   When an ARP entry for such a VM times out, its corresponding NVE
   should access the NVA for an update.

   IPv6 operation is slightly different:
   In IPv6, immediately after the move, the VM sends an unsolicited
   neighbor advertisement message containing its IPv6 address and
   Layer-2 MAC address under its new NVE. This message is sent to the
   IPv6 Solicited Node Multicast Address corresponding to the target
   address, which is the VM's IPv6 address. The NVE receiving this
   message should send a request to update the VM's neighbor cache
   entry in the central directory of the NVA. The NVA's neighbor
   cache entry should include the IPv6 address of the VM, the MAC
   address of the VM, and the NVE IPv6 address. An NVE-to-NVA
   protocol is used for this purpose [RFC8014].
   Other NVEs communicating with this VM might still use the old
   neighbor cache entry. If any VM attached to those NVEs needs to
   communicate with the VM attached to the New NVE, it could use the
   old neighbor cache entry, and the packets are thus delivered to
   the Old NVE. The Old NVE MUST tunnel these in-flight packets to
   the New NVE.

   When a neighbor cache entry in those VMs times out, their
   corresponding NVEs should access the NVA for an update.
4.2. Task Migration in Layer-3 Network

   Layer-2 based data center networks quickly become prohibitive,
   because ARP/neighbor caches don't scale. Scaling can be
   accomplished seamlessly in Layer-3 data center networks by giving
   each virtual network an IP subnet and a default route that points
   to the NVE. This avoids the explosion of ARP/neighbor cache
   entries in VMs and NVEs (just one ARP/neighbor cache entry for the
   default route), and there is no need to carry an Ethernet header
   in the encapsulation [RFC7348], which saves at least 16 bytes.
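
   For illustration, the per-VM state under this model collapses to a
   single default-route neighbor entry. The sketch below is a
   schematic example using documentation addresses, not a required
   configuration.

      # Sketch: with one IP subnet per virtual network and a default
      # route at the NVE, each VM resolves only the NVE's address,
      # regardless of how many peers it communicates with.
      vm_routing_table = {
          "prefix": "0.0.0.0/0",        # default route
          "next_hop": "192.0.2.1",      # NVE's address in the subnet
      }
      vm_neighbor_cache = {
          "192.0.2.1": "02:00:00:00:00:01",   # NVE MAC (example)
      }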
   Even though the terms VM and Task are used interchangeably in this
   document, the term Task is used in the context of Layer-3
   migration mainly to place a slight emphasis on moving an entity
   (Task) that is instantiated on a VM or a container.
   Traditional Layer-3 based data center networks require the IP
   address of a task to change after a move, because the prefixes of
   the IP address usually reflect locations. It is therefore
   necessary to have an IP-based VM migration solution that allows IP
   addresses to stay the same after moving to different locations.
   Identifier Locator Addressing (ILA) [I-D.herbert-nvo3-ila] is one
   such solution.

   Because broadcast is not available in Layer-3 based networks,
   multicast of neighbor solicitations in IPv6 would need to be
   emulated.
   Cold task migration, which is a common practice in many data
   centers, involves the following steps (a non-normative sketch
   follows the list):

   - Stop running the task.

   - Package the runtime state of the task.

   - Send the runtime state of the task to the New NVE where the task
     is to run.

   - Instantiate the task's state on the new machine.

   - Start the task, continuing from the point at which it was
     stopped.
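
   The sketch below walks through the five steps. The task and NVE
   objects and their method names are hypothetical stand-ins for
   hypervisor and NVE facilities, not interfaces defined by this
   document.

      def cold_migrate(task, new_nve):
          task.stop()                            # 1. stop the task
          state = task.package_runtime_state()   # 2. package state
          new_nve.receive_state(task.id, state)  # 3. send to New NVE
          replica = new_nve.instantiate(task.id, state)  # 4. rebuild
          replica.start()                        # 5. resume where stopped
          return replica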
   Address migration and connection migration in moving tasks or VMs
   are addressed next.

4.2.1. Address and Connection Migration in Task Migration

   Address migration is achieved as follows (a non-normative sketch
   follows the list):
   - Configure the IPv4/v6 address on the target task.

   - Suspend use of the address on the old task. This includes
     handling established connections. A state may be established to
     drop packets or to send ICMPv4 or ICMPv6 Destination Unreachable
     messages when packets to the migrated address are received.

   - Push the new mapping to the VMs. Communicating VMs will learn of
     the new mapping via a control plane, either by participating in
     a protocol for mapping propagation or by getting the new mapping
     from a central database such as the Domain Name System (DNS).
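
   In the sketch below, the object model and the "DROP" versus
   "ICMP_UNREACHABLE" markers are illustrative assumptions mirroring
   the two behaviors the text allows for packets that still arrive at
   the old location.

      class OldTaskAddress:
          def __init__(self, address, behavior="DROP"):
              self.address = address
              self.behavior = behavior   # or "ICMP_UNREACHABLE"
              self.suspended = False

          def on_packet(self, dst):
              if self.suspended and dst == self.address:
                  return self.behavior   # drop, or send ICMPv4/v6 error
              return "DELIVER"

      def migrate_address(address, old, new_task, mapping_db):
          new_task.configure_address(address)      # 1. configure target
          old.suspended = True                     # 2. suspend on old task
          mapping_db[address] = new_task.locator   # 3. push new mapping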
   Connection migration involves reestablishing the existing TCP
   connections of the task in the new place.

   The simplest course of action is to drop TCP connections across a
   migration. If migrations are relatively rare events, it is
   conceivable that TCP connections could be automatically closed in
   the network stack during a migration event. If the applications
   running are known to handle this gracefully (i.e., they reopen
   dropped connections), then this may be viable.
   A more involved approach to connection migration entails pausing
   the connection, packaging the connection state and sending it to
   the target, instantiating the connection state in the peer stack,
   and restarting the connection. From the time the connection is
   paused to the time it is running again in the new stack, packets
   received for the connection could be silently dropped. For some
   period of time, the old stack will need to keep a record of the
   migrated connection. If it receives a packet, it can either
   silently drop the packet or forward it to the new location,
   similarly as in Section 5.
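
   The record-keeping at the old stack can be sketched as follows.
   All names are illustrative; a real implementation would operate on
   kernel TCP state, which this document does not specify.

      # Sketch: pause/package/instantiate/restart, with the old stack
      # keeping a record of migrated connections for some period.
      migrated = {}   # TCP 4-tuple -> new location

      def migrate_connection(conn, old_stack, new_stack, new_loc):
          old_stack.pause(conn)                  # freeze timers/state
          blob = old_stack.package_state(conn)   # sequence numbers etc.
          new_stack.instantiate_state(blob)      # install at target
          migrated[conn.four_tuple] = new_loc
          new_stack.restart(conn)

      def old_stack_rx(packet, forward=False):
          loc = migrated.get(packet.four_tuple)
          if loc is None:
              return "DELIVER_LOCALLY"
          # Either silently drop, or forward as in Section 5.
          return ("FORWARD_TO", loc) if forward else "DROP_SILENTLY"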
5. Handling Packets in Flight

   The Old NVE may receive packets from the VM's ongoing
   communications. These packets should not be lost; they should be
   sent to the New NVE to be delivered to the VM. The steps involved
   in handling packets in flight are as follows:
   Preparation Step: It takes some time, possibly a few seconds, for
   a VM to move from its Old NVE to a New NVE. During this period, a
   tunnel needs to be established so that the Old NVE can forward
   packets to the New NVE. The Old NVE gets the New NVE address from
   the NVA in the request to move the VM. The Old NVE can store the
   New NVE address for the VM with a timer; when the timer expires,
   the entry of the New NVE for the VM can be deleted.

   Tunnel Establishment - IPv6: In-flight packets are tunneled to the
   New NVE using an encapsulation protocol such as VXLAN in IPv6.

   Tunnel Establishment - IPv4: In-flight packets are tunneled to the
   New NVE using an encapsulation protocol such as VXLAN in IPv4.

   Tunneling Packets - IPv6: IPv6 packets received for the migrating
   VM are encapsulated in an IPv6 header at the Old NVE. The New NVE
   decapsulates the packet and sends the IPv6 packet to the migrating
   VM.

   Tunneling Packets - IPv4: IPv4 packets received for the migrating
   VM are encapsulated in an IPv4 header at the Old NVE. The New NVE
   decapsulates the packet and sends the IPv4 packet to the migrating
   VM.

   Stop Tunneling Packets: The Old NVE stops tunneling when it no
   longer receives packets destined to the VM that has just moved to
   the New NVE. The timer for storing the New NVE address for the VM
   should be long enough for all other NVEs that need to communicate
   with the VM to get their NVE-VM cache entries updated.
6. Moving Local State of VM

   In addition to the VM mobility related signaling (VM Mobility
   Registration Request/Reply), the VM state needs to be transferred
   to the New NVE. The state includes the VM's memory and file system
   if the VM cannot access them after moving to the New NVE. The Old
   NVE opens a TCP connection with the New NVE over which the VM's
   memory state is transferred.

   The file system or local storage is more complicated to transfer.
   The transfer should ensure consistency, i.e., the VM at the New
   NVE should find the same file system it had at the Old NVE.
   Pre-copying is a commonly used technique for transferring the file
   system. First, the whole disk image is transferred while the VM
   continues to run. After the VM is moved, any changes in the file
   system are packaged together and sent to the New NVE's hypervisor,
   which applies these changes to the file system locally at the
   destination.
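
   A non-normative sketch of the pre-copy loop follows. The disk and
   dirty-block tracking interfaces are assumptions made for
   illustration.

      def precopy_transfer(disk, send):
          # 'send' writes to the TCP connection toward the New NVE.
          epoch = disk.snapshot_epoch()
          for block_id, data in disk.blocks():   # bulk copy, VM running
              send(("BLOCK", block_id, data))
          # After the VM is moved, package the changes made meanwhile
          # so the destination can apply them to its local file system.
          for block_id, data in disk.dirty_since(epoch):
              send(("BLOCK", block_id, data))
          send(("DONE",))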
7. Handling of Hot, Warm and Cold VM Mobility

   Both Cold and Warm VM mobility (or migration) refer to the VM
   being completely shut down at the Old NVE before being restarted
   at the New NVE. Therefore, all transport services to the VM are
   restarted.

   Upon starting at the New NVE, the VM should send an ARP or
   Neighbor Discovery message. Cold VM mobility also allows the Old
   NVE and all communicating NVEs to time out the ARP/neighbor cache
   entries of the VM. It is then necessary for the NVA to push the
   updated ARP/neighbor cache entry to the NVEs, or for the NVEs to
   pull the updated ARP/neighbor cache entry from the NVA.

   Cold VM mobility can be facilitated by a cold standby entity
   receiving scheduled backup information. The cold standby entity
   can be a VM or another form factor, which is beyond the scope of
   this document. The cold mobility option can be used for
   non-critical applications and services that can tolerate
   interrupted TCP connections.

   Warm VM mobility refers to the backup entities receiving backup
   information at more frequent intervals. The duration of the
   interval determines the warmth of the option: the larger the
   duration, the less warm (and hence the colder) the Warm VM
   mobility option becomes.

   There is also a Hot Standby option, in addition to Hot Mobility,
   where VMs in both primary and secondary NVEs have identical
   information and can provide services simultaneously, as in the
   load-share mode of operation. If the VM in the primary NVE fails,
   there is no need to actively move the VM to the secondary NVE,
   because the VM in the secondary NVE already contains identical
   information. The Hot Standby option is the most costly mechanism,
   and hence it is utilized only for mission-critical applications
   and services. With the Hot Standby option, regarding TCP
   connections, one option is to start with and maintain TCP
   connections to two different VMs at the same time. The least
   loaded VM responds first and starts providing service, while the
   sender (origin) still continues to receive ACKs from the more
   heavily loaded (secondary) VM and chooses not to use the service
   of the secondary responding VM. If the situation (loading
   condition of the primary responding VM) changes, the secondary
   responding VM may start providing service to the sender (origin).
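
   The dual-connection behavior described for the Hot Standby option
   can be sketched as below. The endpoints are hypothetical examples
   from the documentation address range, and the "use the first
   responder" policy is the only aspect taken from the text above.

      import selectors
      import socket

      def first_responder(request, primary, secondary, timeout=5.0):
          sel = selectors.DefaultSelector()
          socks = []
          for addr in (primary, secondary):
              s = socket.create_connection(addr, timeout=timeout)
              s.sendall(request)        # both VMs hold identical state
              sel.register(s, selectors.EVENT_READ, addr)
              socks.append(s)
          events = sel.select(timeout)  # least loaded VM answers first
          if not events:
              raise TimeoutError("no VM responded")
          key = events[0][0]
          reply = key.fileobj.recv(65536)
          for s in socks:               # ignore the slower responder
              if s is not key.fileobj:
                  s.close()
          return key.data, reply

      # Example (hypothetical endpoints):
      # vm, data = first_responder(b"GET /", ("192.0.2.10", 80),
      #                            ("192.0.2.20", 80))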
8. VM Operation

   Once a VM moves to a New NVE, the VM's IP address does not change,
   and the VM should be able to continue to receive packets to its
   address(es).

   The VM needs to send a gratuitous Address Resolution message or an
   unsolicited Neighbor Advertisement message upstream after each
   move.

   VM lifecycle management is a complicated task, which is beyond the
   scope of this document. Not only does it involve monitoring server
   utilization and balancing the distribution of workloads, it also
   needs to manage VM migration from one server to another
   seamlessly.
9. Security Considerations

   Security threats for the data and control planes of overlay
   networks are discussed in [RFC8014]. There are several issues in a
   multi-tenant environment that create problems. In Layer-2 based
   overlay data center networks, the lack of security in VXLAN and
   the corruption of the VNI can lead to delivery to the wrong
   tenant. Also, ARP in IPv4 and ND in IPv6 are not secure,
   especially if gratuitous versions are accepted. When these are
   carried over a UDP encapsulation, like VXLAN, the problem is
   worse, since it is trivial for a non-trusted entity to spoof UDP
   packets.

   In Layer-3 based overlay data center networks, the problem of
   address spoofing may arise. An NVE may have untrusted tasks
   attached. This usually happens in cases like VMs (tasks) running
   third-party applications. This requires the use of stronger
   security mechanisms.
10. IANA Considerations

   This document makes no request to IANA.

11. Acknowledgments

   The authors are grateful to Bob Briscoe, David Black, Dave R.
   Worley, Qiang Zu, and Andrew Malis for helpful comments.

12. Change Log

   - Submitted version -00 as a working group draft after adoption.

   - Submitted version -01 with these changes: references are
     updated; added the packets-in-flight definition to Section 2.

   - Submitted version -02 with an updated address.

   - Submitted version -03 to fix the nits.

   - Submitted version -04 in reference to the WG Last Call comments.

   - Submitted version -05 to address IETF LC comments from the TSV
     area.
13. References

13.1. Normative References

   [RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol: Or
             Converting Network Protocol Addresses to 48.bit Ethernet
             Address for Transmission on Ethernet Hardware", STD 37,
             RFC 826, DOI 10.17487/RFC0826, November 1982,
             <https://www.rfc-editor.org/info/rfc826>.

   [RFC0903] Finlayson, R., Mann, T., Mogul, J., and M. Theimer, "A
             Reverse Address Resolution Protocol", STD 38, RFC 903,
             DOI 10.17487/RFC0903, June 1984,
             <https://www.rfc-editor.org/info/rfc903>.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119,
             DOI 10.17487/RFC2119, March 1997,
             <https://www.rfc-editor.org/info/rfc2119>.

   [RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629,
             DOI 10.17487/RFC2629, June 1999,
             <https://www.rfc-editor.org/info/rfc2629>.

   [RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
             "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
             DOI 10.17487/RFC4861, September 2007,
             <https://www.rfc-editor.org/info/rfc4861>.

   [RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
             L., Sridhar, T., Bursell, M., and C. Wright, "Virtual
             eXtensible Local Area Network (VXLAN): A Framework for
             Overlaying Virtualized Layer 2 Networks over Layer 3
             Networks", RFC 7348, DOI 10.17487/RFC7348, August 2014,
             <https://www.rfc-editor.org/info/rfc7348>.

   [RFC7364] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L.,
             Kreeger, L., and M. Napierala, "Problem Statement:
             Overlays for Network Virtualization", RFC 7364,
             DOI 10.17487/RFC7364, October 2014,
             <https://www.rfc-editor.org/info/rfc7364>.

   [RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T.
             Narten, "An Architecture for Data-Center Network
             Virtualization over Layer 3 (NVO3)", RFC 8014,
             DOI 10.17487/RFC8014, December 2016,
             <https://www.rfc-editor.org/info/rfc8014>.

13.2. Informative References

   [I-D.herbert-nvo3-ila]
             Herbert, T. and P. Lapukhov, "Identifier-locator
             addressing for IPv6", draft-herbert-nvo3-ila-04 (work in
             progress), March 2017.
Authors' Addresses

   Linda Dunbar
   Futurewei
   Email: ldunbar@futurewei.com

   Behcet Sarikaya
   Denpel Informatique
   Email: sarikaya@ieee.org

   Bhumip Khasnabish
   Independent
   55 Madison Avenue, Suite 160
   Morristown, NJ 07960
   Email: vumip1@gmail.com

   Tom Herbert
   Intel
   Email: tom@herbertland.com

   Saumya Dikshit
   Aruba-HPE
   Bangalore, India
   Email: saumya.dikshit@hpe.com