draft-ietf-nvo3-vmm-05.txt   draft-ietf-nvo3-vmm-06.txt 
Network Working Group L. Dunbar Network Working Group L. Dunbar
Internet Draft Futurewei Internet Draft Futurewei
Intended status: Informational B. Sarikaya Intended status: Informational B. Sarikaya
Expires: Dec 2019 Denpel Informatique Expires: May 18, 2020 Denpel Informatique
B.Khasnabish B.Khasnabish
Independent Independent
T. Herbert T. Herbert
Intel Intel
S. Dikshit S. Dikshit
Aruba-HPE Aruba-HPE
August 22, 2019 November 18, 2019
Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks
draft-ietf-nvo3-vmm-05 draft-ietf-nvo3-vmm-06
Abstract Abstract
This document describes virtual machine mobility solutions commonly This document discusses Virtual Machine (VM) mobility solutions that
used in data centers built with overlay-based network. This document are commonly used in overlay-based Data Center (DC) networks. The
is intended for describing the solutions and the impact of moving objective is to describe the solutions and their impact on moving
VMs (or applications) from one Rack to another connected by the VMs (and applications) from one rack to another connected by the
Overlay networks. Overlay networks.
For layer 2, it is based on using an NVA (Network Virtualization For layer 2 networks, it is based on using an NVA (Network
Authority) - NVE (Network Virtualization Edge) protocol to update Virtualization Authority) - NVE (Network Virtualization Edge)
ARP (Address Resolution Protocol) table or neighbor cache entries protocol to update the ARP (Address Resolution Protocol) table or
after VM (virtual machine) moves from Old NVE to the New NVE. For neighbor cache entries after a VM (virtual machine) moves from an
Layer 3, it is based on address and connection migration after the Old NVE to a New NVE. For Layer 3, it is based on migration of
move. address and connection after the move.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.This Internet-Draft is submitted in
full conformance with the provisions of BCP 78 and BCP 79. This
This Internet-Draft is submitted in full conformance with the document may not be modified, and derivative works of it may not be
provisions of BCP 78 and BCP 79. This document may not be modified, created, except to publish it as an RFC and to translate it into
and derivative works of it may not be created, except to publish it languages other than English.
as an RFC and to translate it into languages other than English.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Drafts.
Internet-Drafts are draft documents valid for a maximum of six Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress." reference material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html http://www.ietf.org/shadow.html
This Internet-Draft will expire on February 22, 2009. This Internet-Draft will expire on May 10, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 43 skipping to change at page 2, line 40
respect to this document. Code Components extracted from this respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License. warranty as described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction...................................................3 1. Introduction...................................................3
2. Conventions used in this document..............................4 2. Conventions used in this document..............................4
3. Requirements...................................................5 3. Requirements...................................................5
4. Overview of the VM Mobility Solutions..........................6 4. Overview of the VM Mobility Solutions..........................5
4.1. VM Migration in Layer 2 Network...........................6 4.1. VM Migration in Layer-2 Network...........................5
4.2. Task Migration in Layer-3 Network.........................7 4.2. Task Migration in Layer-3 Network.........................7
4.2.1. Address and Connection Migration in Task Migration...8 4.2.1. Address and Connection Migration in Task Migration...8
5. Handling Packets in Flight.....................................9 5. Handling Packets in Flight.....................................9
6. Moving Local State of VM......................................10 6. Moving Local State of VM......................................10
7. Handling of Hot, Warm and Cold VM Mobility....................10 7. Handling of Hot, Warm and Cold VM Mobility....................10
8. VM Operation..................................................11 8. VM Operation..................................................11
9. Security Considerations.......................................12 9. Security Considerations.......................................11
10. IANA Considerations..........................................12 10. IANA Considerations..........................................12
11. Acknowledgments..............................................12 11. Acknowledgments..............................................12
12. Change Log...................................................12 12. Change Log...................................................12
13. References...................................................13 13. References...................................................12
13.1. Normative References....................................13 13.1. Normative References....................................13
13.2. Informative References..................................14 13.2. Informative References..................................14
1. Introduction 1. Introduction
This document describes the overlay-based data center networks This document describes the overlay-based DC networking solutions
solutions in supporting multitenancy and VM (Virtual Machine) in support of multi-tenancy and VM mobility. Many large DCs,
mobility. Many large DCs, especially Cloud DCs, host tasks (or especially Cloud DCs, host tasks (or workloads) for multiple
workloads) for multiple tenants, which can be multiple departments tenants. A tenant can be a department of one organization or an
of one organization or multiple organizations. There is organization. There is communication among tasks belonging to one
communication among tasks belonging to one tenant and tenant and communication among tasks belonging to different
communications among tasks belonging to different tenants or with tenants or with external entities.
external entities.
Server Virtualization, which is being used in almost all of Server Virtualization, which is being used in almost all of
today's data centers, enables many VMs to run on a single physical today's DCs, enables many VMs to run on a single physical computer
computer or compute server sharing the processor/memory/storage. or server sharing the processor/memory/storage. Network
Network connectivity among VMs is provided by the network connectivity among VMs is provided by the network virtualization
virtualization edge (NVE) [RFC8014]. It is highly desirable edge (NVE) [RFC8014]. It is highly desirable [RFC7364] to allow
[RFC7364] to allow VMs to be moved dynamically (live, hot, or cold VMs to move dynamically (live, hot, or cold move) from one
move) from one server to another for dynamic load balancing or server to another for dynamic load balancing or optimized workload
optimized work distribution. distribution.
There are many challenges and requirements related to VM mobility There are many challenges and requirements related to VM mobility
in large data centers, including dynamic attaching/detaching VMs in large data centers, including dynamically attaching/detaching
to/from Network Virtualization Edges (NVEs). Retaining IP addresses VMs to/from Network Virtualization Edges (NVEs). In addition, retaining
after a move is a key requirement [RFC7364]. Such a requirement the IP addresses after a move is a key requirement [RFC7364].
is needed in order to maintain existing transport connections. Such a requirement is needed in order to maintain existing
transport connections.
In traditional Layer-3 based networks, retaining IP addresses In traditional Layer-3 based networks, retaining IP addresses
after a move is generally not recommended because the frequent after a move is generally not recommended because the frequent
move will cause non-aggregated IP addresses (a.k.a. fragmented IP move will cause fragmented IP addresses, which complicates IP
addresses), which introduces complexity in IP address management. address management.
In view of many VM mobility schemes that exist today, there is a In view of many VM mobility schemes that exist today, there is a
desire to document comprehensive VM mobility solutions that cover need to document comprehensive VM mobility solutions that cover
both IPv4 and IPv6. The large Data Center networks can be both IPv4 and IPv6. Large DC networks can be organized as one
organized as one large Layer-2 network geographically distributed large (a) Layer-2 network geographically distributed across
in several buildings/cities or Layer-3 networks with large number buildings/cities or (b) Layer-3 networks with large number of host
of host routes that cannot be aggregated as the result of frequent routes that cannot be aggregated as a result of frequent moves
move from one location to another without changing their IP from one location to another without changing the IP addresses.
addresses. The connectivity between Layer 2 boundaries can be
achieved by the network virtualization edge (NVE) functioning as The connectivity between Layer 2 boundaries can be achieved by the
Layer 3 gateway routing across bridging domain such as in NVE functioning as Layer-3 gateway, performing routing across
Warehouse Scale Computers (WSC). bridging domain such as in Warehouse Scale Computers (WSC).
2. Conventions used in this document 2. Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in "OPTIONAL" in this document are to be interpreted as described in
RFC 2119 [RFC2119] and [RFC8014]. RFC 2119 [RFC2119] and [RFC8014].
This document uses the terminology defined in [RFC7364]. In This document uses the terminology defined in [RFC7364]. In
addition, we make the following definitions: addition, we make the following definitions:
skipping to change at page 5, line 8 skipping to change at page 4, line 40
Warm VM Mobility: In case of warm VM mobility, the VM states are Warm VM Mobility: In case of warm VM mobility, the VM states are
mirrored to the secondary server (or domain) at mirrored to the secondary server (or domain) at
predefined (configurable) regular intervals. This predefined (configurable) regular intervals. This
reduces the overheads and complexity, but this may also reduces the overheads and complexity, but this may also
lead to a situation when both servers may not contain lead to a situation when both servers may not contain
the exact same data (state information). the exact same data (state information).
Cold VM Mobility: A given VM could be moved from one server to Cold VM Mobility: A given VM could be moved from one server to
another in stopped or suspended state. another in stopped or suspended state.
Old NVE: refers to the old NVE where packets were forwarded to Old NVE: This refers to the old NVE where packets were forwarded
before migration. to before migration.
New NVE: refers to the new NVE after migration. New NVE: This refers to the new NVE after migration.
Packets in flight: refers to the packets received by the Old NVE Packets in flight: This refers to the packets received by the Old
sent by the correspondents that have old ARP or neighbor NVE sent by the correspondents that have old ARP or
cache entry before VM or task migration. neighbor cache entry before VM or task migration.
Users of VMs in diskless systems or systems not using Users of VMs in diskless systems or the systems that are not
configuration files are called end user clients. using configuration files are called end user clients.
Cloud DC: Third party data centers that host applications, Cloud DC: Third party DCs that host applications, tasks or
tasks or workloads owned by different organizations or workloads and owned by different organizations or
tenants. tenants.
3. Requirements 3. Requirements
This section states requirements on data center network virtual This section states VM mobility requirements on DC networks.
machine mobility.
Data center network should support both IPv4 and IPv6 VM mobility. DC networks should support both IPv4 and IPv6 VM mobility.
Virtual machine mobility should not require changing their IP VM mobility should not require changing their IP addresses after the
addresses after the move. move.
There is "Hot Migration" with transport service continuing, and There exist "Hot Migration" where transport service continuity is
there is a "Cold Migration" with transport service restarted, i.e. maintained, and "Cold Migration" where the transport service needs
stop the task running on the Old NVE and move to the New NVE before to be restarted, i.e., execution of the tasks is stopped on the
restart as described in the Task Migration. "Old" NVE, moved to the "New" NVE and the task is restarted.
VM mobility solutions/procedures should minimize triangular routing VM mobility solutions/procedures should minimize triangular routing
except for handling packets in flight. except for handling packets in flight.
VM mobility solutions/procedures should not need to use tunneling VM mobility solutions/procedures should not need to use tunneling
except for handling packets in flight. except for handling packets in flight.
4. Overview of the VM Mobility Solutions 4. Overview of the VM Mobility Solutions
Layer 2 and Layer 3 mobility solutions are described respectively Layer-2 and Layer-3 mobility solutions are described respectively
in the following sections. in the following sections.
4.1. VM Migration in Layer 2 Network 4.1. VM Migration in Layer-2 Network
Being able to move VMs dynamically, from one server to another, Ability to move VMs dynamically, from one server to another, makes
makes it possible for dynamic load balancing or work distribution. it possible for dynamic load balancing or workload distribution.
Therefore, it is highly desirable for large scale multi-tenants
data centers.
In a Layer-2 based approach, VM moving to another server does not Therefore, this scheme is highly desirable for utilization in
change its IP address, but this VM is now under a new NVE, large scale multi-tenant DCs.
previously communicating NVEs will continue to send their packets
to the Old NVE. To solve this problem, Address Resolution
Protocol (ARP) cache in IPv4 [RFC0826] or neighbor cache in IPv6
[RFC4861] in the NVEs need to be updated. NVEs need to change
their caches associating the VM Layer-2 or Medium Access Control
(MAC) address with the NVE's IP address. Such a change enables
NVEs to encapsulate the outgoing MAC frames with the current
target NVE address. It may take some time to refresh ARP/ND cache
when a VM is moved to a New NVE. During this period, a tunnel is
needed so that Old NVE can forwards packets destined to the VM to
the New NVE.
In IPv4, the VM immediately after the move should send a In a Layer-2 based VM migration approach, a VM that is moving to
gratuitous ARP request message containing its IPv4 and Layer 2 MAC another server does not change its IP address. But since this VM
address in its new NVE. This message's destination address is the is now under a new NVE, previously communicating NVEs will
broadcast address. Old NVE receives this message. Both Old and continue sending their packets to the Old NVE. To solve this
New NVEs should update VM's ARP entry in the central directory at problem, Address Resolution Protocol (ARP) cache in IPv4 [RFC0826]
the NVA, to update its mappings to record the IPv4 address & MAC or neighbor cache in IPv6 [RFC4861] in the NVEs need to be updated
promptly. All NVEs need to change their caches associating the VM
Layer-2 or Medium Access Control (MAC) address with the new NVE's
IP address as soon as the VM moves. Such a change enables all NVEs
to encapsulate the outgoing MAC frames with the current target NVE
IP address. It may take some time to refresh the ARP/ND cache when
a VM has moved to a New NVE. During this period, a tunnel is
needed for that Old NVE to forward packets destined to the VM
under the New NVE.
In case of IPv4, immediately after the move, the VM should send a
gratuitous ARP request message containing its IPv4 and Layer-2 MAC
address to its new NVE. This message's destination address is the
broadcast address. Upon receiving this message, both old and new
NVEs should update the VM's ARP entry in the central directory at
the NVA, to update its mappings to record the IPv4 address and MAC
address of the moving VM along with the new NVE IPv4 address. An address of the moving VM along with the new NVE IPv4 address. An
NVE-to-NVA protocol is used for this purpose [RFC8014]. NVE-to-NVA protocol is used for this purpose [RFC8014].
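The directory update described above can be illustrated with a minimal Python sketch. It is not taken from either draft version and does not model any real NVE-to-NVA protocol: the class NvaDirectory, the function on_gratuitous_arp, and all addresses are hypothetical names chosen for illustration, and the directory is assumed to be a simple in-memory table keyed by the VM's IPv4 address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    vm_mac: str   # Layer-2 MAC address of the VM
    nve_ip: str   # IPv4 address of the NVE currently serving the VM


class NvaDirectory:
    """Hypothetical central directory: VM IPv4 address -> (VM MAC, NVE IPv4)."""

    def __init__(self):
        self._by_vm_ip = {}

    def update(self, vm_ip, vm_mac, nve_ip):
        """Record the VM behind nve_ip; return the previous mapping, if any."""
        previous = self._by_vm_ip.get(vm_ip)
        self._by_vm_ip[vm_ip] = Mapping(vm_mac, nve_ip)
        return previous

    def lookup(self, vm_ip):
        return self._by_vm_ip.get(vm_ip)


def on_gratuitous_arp(directory, receiving_nve_ip, sender_ip, sender_mac):
    """Assumed handler run by the new NVE when a local VM sends a gratuitous ARP."""
    return directory.update(sender_ip, sender_mac, receiving_nve_ip)


nva = NvaDirectory()
# The VM initially sits behind the old NVE (192.0.2.1).
nva.update("10.0.0.5", "52:54:00:aa:bb:cc", "192.0.2.1")
# After the move, the new NVE (192.0.2.2) sees the gratuitous ARP.
old = on_gratuitous_arp(nva, "192.0.2.2", "10.0.0.5", "52:54:00:aa:bb:cc")
```

Returning the previous mapping is a design choice of this sketch: it would let the caller learn which NVE has to be told to tunnel in-flight packets.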
Reverse ARP (RARP) which enables the host to discover its IPv4 Reverse ARP (RARP) which enables the host to discover its IPv4
address when it boots from a local server [RFC0903], is not used address when it boots from a local server [RFC0903], is not used
by VMs because the VM already knows its IPv4 address. Next, we by VMs because the VM already knows its IPv4 address. Next, we
describe a case where RARP is used. describe a case where RARP is used.
There are some vendor deployments (diskless systems or systems There are some vendor deployments (e.g., diskless systems or
without configuration files) wherein VM users, i.e. end-user systems without configuration files) where the VM's user, i.e.,
clients ask for the same MAC address upon migration. This can be end-user client asks for the same MAC address upon migration.
achieved by the clients sending RARP request message which carries This can be achieved by the clients sending RARP request message
the old MAC address looking for an IP address allocation. The which carries the MAC address looking for an IP address
server, in this case the new NVE needs to communicate with NVA, allocation. The server, in this case the new NVE, needs to
just like in the gratuitous ARP case to ensure that the same IPv4 communicate with NVA, just like in the gratuitous ARP case to
address is assigned to the VM. NVA uses the MAC address as the ensure that the same IPv4 address is assigned to the VM. NVA uses
key in the search of ARP cache to find the IP address and informs the MAC address as the key in the search of ARP cache to find the
this to the new NVE which in turn sends RARP reply message. This IP address and informs this to the new NVE which in turn sends
completes IP address assignment to the migrating VM. RARP reply message. This completes IP address assignment to the
migrating VM.
Other NVEs communicating with this VM could have the old ARP Other NVEs communicating with this VM could have the old ARP
entry. If any VMs in those NVEs need to communicate with the VM entry. If any VMs in those NVEs need to communicate with the VM
attached to the New NVE, old ARP entries might be used. Thus, the attached to the new NVE, old ARP entries might be used. Thus, the
packets are delivered to the Old NVE. The Old NVE MUST tunnel packets are delivered to the old NVE. The old NVE MUST tunnel
these in-flight packets to the New NVE. these in-flight packets to the new NVE.
When an ARP entry for those VMs times out, their corresponding When an ARP entry for those VMs times out, their corresponding
NVEs should access the NVA for an update. NVEs should access the NVA for an update.
IPv6 operation is slightly different: IPv6 operation is slightly different:
In IPv6, after the move, the VM immediately sends an unsolicited In IPv6, after the move, the VM immediately sends an unsolicited
neighbor advertisement message containing its IPv6 address and neighbor advertisement message containing its IPv6 address and
Layer-2 MAC address to its new NVE. This message is sent to the Layer-2 MAC address to its new NVE. This message is sent to the
IPv6 Solicited Node Multicast Address corresponding to the target IPv6 Solicited Node Multicast Address corresponding to the target
address which is the VM's IPv6 address. The NVE receiving this address which is the VM's IPv6 address. The NVE receiving this
message should send a request to update the VM's neighbor cache entry in message should send a request to update the VM's neighbor cache entry in
the central directory of the NVA. The NVA's neighbor cache entry the central directory of the NVA. The NVA's neighbor cache entry
should include IPv6 address of the VM, MAC address of the VM and should include IPv6 address of the VM, MAC address of the VM and
the NVE IPv6 address. An NVE-to-NVA protocol is used for this the NVE IPv6 address. An NVE-to-NVA protocol is used for this
purpose [RFC8014]. purpose [RFC8014].
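As an illustration of the addressing mentioned above, the solicited-node multicast address for a given IPv6 address is formed by appending the low-order 24 bits of that address to the prefix ff02::1:ff00:0/104. The Python sketch below uses only the standard ipaddress module; the function name solicited_node_multicast and the sample address are assumptions made for this example.

```python
import ipaddress

# Well-known solicited-node multicast prefix, ff02::1:ff00:0/104.
_SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_multicast(addr):
    """Return the solicited-node multicast address for an IPv6 unicast address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF  # keep the last 24 bits
    return ipaddress.IPv6Address(_SOLICITED_NODE_BASE | low24)


vm_addr = "2001:db8::1:800:200e:8c6c"   # documentation-prefix example address
group = solicited_node_multicast(vm_addr)
```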
Other NVEs communicating with this VM might still use the old Other NVEs communicating with this VM might still use the old
neighbor cache entry. If any VM in those NVEs needs to communicate neighbor cache entry. If any VM in those NVEs needs to communicate
with the VM attached to the New NVE, it could use the old neighbor with the VM attached to the new NVE, it could use the old neighbor
cache entry. Thus, the packets are delivered to the Old NVE. The cache entry. Thus, the packets are delivered to the old NVE. The
Old NVE MUST tunnel these in-flight packets to the New NVE. old NVE MUST tunnel these in-flight packets to the new NVE.
When a neighbor cache entry in those VMs times out, their When a neighbor cache entry in those VMs times out, their
corresponding NVEs should access the NVA for an update. corresponding NVEs should access the NVA for an update.
4.2. Task Migration in Layer-3 Network 4.2. Task Migration in Layer-3 Network
Layer-2 based data center networks become quickly prohibitive Layer-2 based DC networks become quickly prohibitive because
because ARP/neighbor caches don't scale. Scaling can be ARP/neighbor caches don't scale. Scaling can be accomplished
accomplished seamlessly Layer-3 data center networks by just seamlessly in Layer-3 data center networks by just giving each
giving each virtual network an IP subnet and a default route that virtual network an IP subnet and a default route that points to
points to NVE. This means no explosion of ARP/neighbor cache in its NVE. This means no explosion of ARP/neighbor cache in VMs
VMs and NVEs (just one ARP/neighbor cache entry for default and NVEs (just one ARP/neighbor cache entry for the default
route) and there is no need to have Ethernet header in route) and there is no need to have Ethernet header in
encapsulation [RFC7348] which saves at least 16 bytes. encapsulation [RFC7348] which saves at least 16 bytes.
Even though the terms VM and Task are used interchangeably in this Even though the terms VM and Task are used interchangeably in this
document, the term Task is used in the context of Layer-3 document, the term Task is used in the context of Layer-3
migration mainly to have slight emphasis on the moving an entity migration mainly to have slight emphasis on the task of moving an
(Task) that is instantiated on a VM or a container. entity that is instantiated on a VM or a container.
Traditional Layer-3 based data center networks require IP address Traditional Layer-3 based DC networks require IP address of the
of the task to change after moving because the prefixes of the IP task to change after moving because the pre-fixes of the IP
address usually reflect the locations. It is necessary to have an address usually reflect the locations. It is necessary to have an
IP based VM migration solution that can allow IP addresses staying IP based VM migration solution that can allow IP addresses staying
the same after moving to different locations. The Identifier the same after the VMs move to different locations. The Identifier
Locator Addressing or ILA [I-D.herbert-nvo3-ila] is one such Locator Addressing or ILA [I-D.herbert-nvo3-ila] is one such
solution. solution.
Because broadcasting is not available in Layer-3 based networks, Because broadcasting is not available in Layer-3 based networks,
multicast of neighbor solicitations in IPv6 would need to be multicast of neighbor solicitations in IPv6 would need to be
emulated. emulated.
Cold task migration, which is a common practice in many data Cold task migration, which is a common practice in many data
centers, involves the following steps: centers, involves the following steps:
- Stop running the task. - Stop running the task.
- Package the runtime state of the task. - Package the runtime state of the task.
- Send the runtime state of the task to the New NVE where the - Send the runtime state of the task to the new NVE where the
task is to run. task is to run.
- Instantiate the task's state on the new machine. - Instantiate the task's state on the new machine.
- Start the tasks for the task continuing from the point at which - Start the tasks continuing it from the point at which it was
it was stopped. stopped.
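The cold migration steps listed above can be sketched in Python as follows. This is a toy model under stated assumptions, not a description of any real hypervisor or orchestration API: Task, package_state, instantiate, and cold_migrate are hypothetical names, the runtime state is reduced to a counter, and the "send" step simply hands the serialized bytes to the receiving side.

```python
import json


class Task:
    """Toy task that performs one unit of work per step, up to `limit`."""

    def __init__(self, done=0, limit=5):
        self.done = done
        self.limit = limit
        self.running = False

    def step(self):
        if self.running and self.done < self.limit:
            self.done += 1


def package_state(task):
    """Serialize the runtime state of a stopped task."""
    return json.dumps({"done": task.done, "limit": task.limit}).encode()


def instantiate(blob):
    """Rebuild a task from packaged state on the new machine."""
    state = json.loads(blob.decode())
    return Task(done=state["done"], limit=state["limit"])


def cold_migrate(task):
    task.running = False            # stop running the task
    blob = package_state(task)      # package the runtime state
    # Send step: in this sketch the bytes are handed over directly.
    new_task = instantiate(blob)    # instantiate the state on the new machine
    new_task.running = True         # start from the point where it was stopped
    return new_task


old_task = Task()
old_task.running = True
old_task.step()
old_task.step()
new_task = cold_migrate(old_task)
new_task.step()
```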
Address migration and connection migration in moving tasks or VMs Address migration and connection migration in moving tasks or VMs
are addressed next. are addressed next.
4.2.1. Address and Connection Migration in Task Migration 4.2.1. Address and Connection Migration in Task Migration
Address migration is achieved as follows: Address migration is achieved as follows:
- Configure IPv4/v6 address on the target Task. - Configure IPv4/v6 address on the target Task.
- Suspend use of the address on the old Task. This includes - Suspend use of the address on the old Task. This includes
handling established connections. A state may be established handling established connections. A state may be established
to drop packets or send ICMPv4 or ICMPv6 destination to drop packets or send ICMPv4 or ICMPv6 destination
unreachable message when packets to the migrated address are unreachable message when packets to the migrated address are
received. received.
- Push the new mapping to VM. Communicating VMs will learn of - Push the new mapping to VM. Communicating VMs will learn of
the new mapping via a control plane either by participation in the new mapping via a control plane either by participating in
a protocol for mapping propagation or by getting the new a protocol for mapping propagation or by getting the new
mapping from a central database such as Domain Name System mapping from a central database such as Domain Name System
(DNS). (DNS).
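The three address migration steps above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: Host, migrate_address, the "unreachable" return value standing in for an ICMP destination unreachable message, and the dictionaries standing in for a central mapping database and for the caches of communicating VMs.

```python
class Host:
    """Toy endpoint that tracks active and migrated-away addresses."""

    def __init__(self, name):
        self.name = name
        self.active = set()     # addresses currently configured here
        self.migrated = set()   # addresses that have moved elsewhere

    def receive(self, dst):
        if dst in self.active:
            return "deliver"
        if dst in self.migrated:
            # Stand-in for sending an ICMP destination unreachable message;
            # silently dropping the packet is the other option in the text.
            return "unreachable"
        return "drop"


def migrate_address(addr, old, new, central_db, peer_caches):
    new.active.add(addr)            # configure the address on the target
    old.active.discard(addr)        # suspend use of the address on the old task
    old.migrated.add(addr)
    central_db[addr] = new.name     # record the new mapping centrally
    for cache in peer_caches:       # push the new mapping to communicating VMs
        cache[addr] = new.name


old_host, new_host = Host("old"), Host("new")
old_host.active.add("10.1.1.7")
central_db = {"10.1.1.7": "old"}
peer_cache = {"10.1.1.7": "old"}
migrate_address("10.1.1.7", old_host, new_host, central_db, [peer_cache])
```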
Connection migration involves reestablishing existing TCP Connection migration involves reestablishing existing TCP
connections of the task in the new place. connections of the task in the new place.
The simplest course of action is to drop TCP connections across a The simplest course of action is to drop all TCP connections to
migration. It the migrations are relatively rare events, it is the VM across a migration. If the migrations are relatively rare
conceivable that TCP connections could be automatically closed in events in a data center, impact is relatively small when TCP
the network stack during a migration event. If the applications connections are automatically closed in the network stack during a
running are known to handle this gracefully (i.e. reopen dropped migration event. If the applications running are known to handle
connections) then this may be viable. this gracefully (i.e. reopen dropped connections) then this
approach may be viable.
A more involved approach to connection migration entails pausing the A more involved approach to connection migration entails pausing the
connection, packaging connection state and sending to target, connection, packaging connection state and sending to target,
instantiating connection state in the peer stack, and restarting instantiating connection state in the peer stack, and restarting
the connection. From the time the connection is paused to the the connection. From the time the connection is paused to the
time it is running again in the new stack, packets received for time it is running again in the new stack, packets received for
the connection could be silently dropped. For some period of the connection could be silently dropped. For some period of
time, the old stack will need to keep a record of the migrated time, the old stack will need to keep a record of the migrated
connection. If it receives a packet, it can either silently drop connection. If it receives a packet, it can either silently drop
the packet or forward it to the new location, similarly as in the packet or forward it to the new location, as described in
Section 5. Section 5.
5. Handling Packets in Flight 5. Handling Packets in Flight
The Old NVE may receive packets from the VM's ongoing The Old NVE may receive packets from the VM's ongoing
communications and these packets should not be lost, and they communications. These packets should not be lost; they should be
should be sent to the New NVE to be delivered to the VM. The sent to the New NVE to be delivered to the VM. The steps involved
steps involved in handling packets in flight are as follows: in handling packets in flight are as follows:
Preparation Step: It takes some time, possibly a few seconds for Preparation Step: It takes some time, possibly a few seconds for
a VM to move from its Old NVE to a New NVE. During this period, a a VM to move from its Old NVE to a New NVE. During this period, a
tunnel needs to be established so that the Old NVE can forward tunnel needs to be established so that the Old NVE can forward
packets to the New NVE. Old NVE gets New NVE address from NVA in packets to the New NVE. Old NVE gets New NVE address from NVA in
the request to move the VM. The Old NVE can store the New NVE the request to move the VM. The Old NVE can store the New NVE
address for the VM with a timer. When the timer expires, the entry address for the VM with a timer. When the timer expires, the entry
for the New NVE for the VM can be deleted. for the New NVE for the VM can be deleted.
Tunnel Establishment - IPv6: Inflight packets are tunneled to the Tunnel Establishment - IPv6: Inflight packets are tunneled to the
skipping to change at page 10, line 16 skipping to change at page 10, line 9
New NVE using the encapsulation protocol such as VXLAN in IPv4. New NVE using the encapsulation protocol such as VXLAN in IPv4.
Tunneling Packets - IPv6: IPv6 packets received for the migrating Tunneling Packets - IPv6: IPv6 packets received for the migrating
VM are encapsulated in an IPv6 header at the Old NVE. New NVE VM are encapsulated in an IPv6 header at the Old NVE. New NVE
decapsulates the packet and sends IPv6 packet to the migrating VM. decapsulates the packet and sends IPv6 packet to the migrating VM.
Tunneling Packets - IPv4: IPv4 packets received for the migrating Tunneling Packets - IPv4: IPv4 packets received for the migrating
VM are encapsulated in an IPv4 header at the Old NVE. New NVE VM are encapsulated in an IPv4 header at the Old NVE. New NVE
decapsulates the packet and sends IPv4 packet to the migrating VM. decapsulates the packet and sends IPv4 packet to the migrating VM.
Stop Tunneling Packets: When Old NVE stops receiving packets Stop Tunneling Packets: When the Timer for storing the New NVE
destined to the VM that has just moved to the New NVE. The Timer address for the VM expires. The Timer should be long enough for
for storing the New NVE address for the VM should be long enough all other NVEs that need to communicate with the VM to get their
for all other NVEs that need to communicate with the VM to get NVE-VM cache entries updated.
their NVE-VM cache entries updated.
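The timer-bounded forwarding entry kept by the Old NVE can be sketched as below. This is a minimal sketch under stated assumptions: the class name OldNveForwarder, its methods, and the hold time parameter are hypothetical, and the encapsulation itself (for example VXLAN) is outside the sketch.

```python
import time


class OldNveForwarder:
    """Hypothetical model of the Old NVE's temporary forwarding state."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # VM address -> (New NVE address, expiry time)

    def vm_moved(self, vm_addr, new_nve_addr, hold_seconds):
        """Store the New NVE address learned from the NVA, with a timer."""
        self._entries[vm_addr] = (new_nve_addr, self._clock() + hold_seconds)

    def next_hop(self, vm_addr):
        """Return the New NVE to tunnel to, or None once tunneling has stopped."""
        entry = self._entries.get(vm_addr)
        if entry is None:
            return None
        new_nve_addr, expiry = entry
        if self._clock() >= expiry:
            del self._entries[vm_addr]  # timer expired: delete the entry
            return None
        return new_nve_addr
```

The hold time would be chosen to be long enough for the other NVEs to refresh their NVE-VM cache entries; the sketch does not attempt to compute it.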
6. Moving Local State of VM 6. Moving Local State of VM
In addition to the VM mobility related signaling (VM Mobility In addition to the VM mobility related signaling (VM Mobility
Registration Request/Reply), the VM state needs to be transferred Registration Request/Reply), the VM state needs to be transferred
to the New NVE. The state includes its memory and file system if to the New NVE. The state includes its memory and file system if
the VM cannot access the memory and the file system after moved to the VM cannot access the memory and the file system after moving
the New NVE. Old NVE opens a TCP connection with New NVE over to the New NVE. Old NVE opens a TCP connection with New NVE over
which VM's memory state is transferred. which VM's memory state is transferred.
File system or local storage is more complicated to transfer. The File system or local storage is more complicated to transfer. The
transfer should ensure consistency, i.e. the VM at the New NVE transfer should ensure consistency, i.e. the VM at the New NVE
should find the same file system it had at the Old NVE. Pre- should find the same file system it had at the Old NVE. Pre-
copying is a commonly used technique for transferring the file copying is a commonly used technique for transferring the file
system. First the whole disk image is transferred while VM system. First the whole disk image is transferred while VM
continues to run. After the VM is moved any changes in the file continues to run. After the VM is moved, any changes in the file
system are packaged together and sent to the New NVE Hypervisor system are packaged together and sent to the New NVE Hypervisor
which reflects these changes to the file system locally at the which reflects these changes to the file system locally at the
destination. destination.
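The two phases of pre-copying can be illustrated with a toy model in which the disk is a mapping from block number to contents. The function name pre_copy and the representation of writes as (block, data) pairs are assumptions made for this sketch; real hypervisors track dirty blocks differently.

```python
def pre_copy(source_blocks, writes_during_copy):
    """Toy pre-copy: bulk transfer first, then replay changed blocks.

    source_blocks: dict of block number -> bytes on the Old NVE side.
    writes_during_copy: iterable of (block number, bytes) written by the
        VM while the bulk transfer is in progress.
    Returns the destination image and the sorted list of dirty blocks.
    """
    # Phase 1: transfer the whole disk image while the VM keeps running.
    destination = dict(source_blocks)

    # The running VM keeps writing; record which blocks changed.
    dirty = {}
    for block, data in writes_during_copy:
        source_blocks[block] = data
        dirty[block] = data

    # Phase 2: after the VM has moved, package the changes and apply
    # them at the destination so both file systems are consistent.
    destination.update(dirty)
    return destination, sorted(dirty)
```

After the second phase the destination image equals the source image, which is the consistency property the text asks for.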
7. Handling of Hot, Warm and Cold VM Mobility 7. Handling of Hot, Warm and Cold VM Mobility
Both Cold and Warm VM mobility (or migration) refers to the VM Both Cold and Warm VM mobility (migration), refers to the VM being
being completely shut down at the Old NVE before restarted at the completely shut down at the old NVE before restarted at the new
New NVE. Therefore, all transport services to the VM are NVE. Therefore, all transport services to the VM need to restart.
restarted.
Upon starting at the New NVE, the VM should send an ARP or Upon starting at the new NVE, the VM should send an ARP or
Neighbor Discovery message. Cold VM mobility also allows the Old Neighbor Discovery message. Cold VM mobility also allows the Old
NVE and all communicating NVEs to time out ARP/neighbor cache NVE and all communicating NVEs to time out ARP/neighbor cache
entries of the VM. It is necessary for the NVA to push the entries of the VM. It is necessary for the NVA to push the
updated ARP/neighbor cache entry to NVEs or for NVEs to pull the updated ARP/neighbor cache entry to NVEs or for NVEs to pull the
updated ARP/neighbor cache entry from NVA. updated ARP/neighbor cache entry from NVA.
The Cold VM mobility can be facilitated by cold standby entity The Cold VM mobility can be facilitated by cold standby entity
receiving scheduled backup information. The cold standby entity receiving scheduled backup information. The cold standby entity
can be a VM or can be other form factors which is beyond the scope can be a VM or other form factors which is beyond the scope of
of this document. The cold mobility option can be used for non- this document. The cold mobility option can be used for non-
critical applications and services that can tolerate interrupted critical applications and services that can tolerate interrupted
TCP connections. TCP connections.
Warm VM mobility refers to backup entities receiving backup Warm VM mobility refers to backup entities receiving backup
information at more frequent intervals. The duration of the information at more frequent intervals. The duration of the
interval determines the warmth of the option. The larger the interval determines the warmth of the option. The larger the
duration, the less warm (and hence cold) the Warm VM mobility duration, the less warm (and hence cold) the Warm VM mobility
option becomes. option becomes.
There is also a Hot Standby option in addition to the Hot There is also a Hot Standby option in addition to the Hot
Mobility, where there are VMs in both primary and secondary NVEs Mobility, where there are VMs in both primary and secondary NVEs.
and they identical information and can provide services They have identical information and can provide services
simultaneously as in load-share mode of operation. If the VMs in simultaneously as in load-share mode of operation. If the VM in
the primary NVE fails, there is no need to actively move the VMs the primary NVE fails, there is no need to actively move the VM to
to the secondary NVE because the VMs in the secondary NVE already the secondary NVE because the VM in the secondary NVE already
contain identical information. The hot standby option is the most contains identical information. The Hot Standby option is the
costly mechanism, and hence this option is utilized only for costliest mechanism, and hence this option is utilized only for
mission-critical applications and services. In hot standby mission-critical applications and services. In Hot Standby
option, regarding TCP connections, one option is to start with and option, regarding TCP connections, one option is to start with and
maintain TCP connections to two different VMs at the same time. maintain TCP connections to two different VMs at the same time.
The least loaded VM responds first and pickup providing service The least loaded VM responds first and starts providing service
while the sender (origin) still continues to receive Ack from the while the sender (origin) still continues to receive Ack from the
heavily loaded (secondary) VM and chooses not use the service of heavily loaded (secondary) VM and chooses not to use the service
the secondary responding VM. If the situation (loading condition of the secondary responding VM. If the situation (loading
of the primary responding VM) changes the secondary responding VM condition of the primary responding VM) changes the secondary VM
may start providing service to the sender (origin). may start providing service to the sender (origin).
8. VM Operation 8. VM Operation
Once VM moves to a New NVE, VM IP address does not change and VM Once a VM moves to a new NVE, the VM's IP address does not change
should be able to continue to receive packets to its address(es). and the VM should be able to continue to receive packets to its
address(es).
VM needs to send a gratuitous Address Resolution message or The VM needs to send a gratuitous Address Resolution message or
unsolicited Neighbor Advertisement message upstream after each unsolicited Neighbor Advertisement message upstream after each
move. move.
The VM lifecycle management is a complicated task, which is beyond The VM lifecycle management is a complicated task, which is beyond
the scope of this document. Not only does it involve monitoring server the scope of this document. Not only does it involve monitoring server
utilization, balanced distribution of workload, etc., but also utilization, balancing the distribution of workload, etc., but
needs to manage seamlessly VM migration from one server to also needs seamless management of VM migration from one server to
another. another.
9. Security Considerations 9. Security Considerations
Security threats for the data and control plane for overlay Security threats for the data and control plane for overlay
networks are discussed in [RFC8014]. There are several issues in networks are discussed in [RFC8014]. There are several issues in
a multi-tenant environment that create problems. In Layer-2 based a multi-tenant environment that create problems. In Layer-2 based
overlay data center networks, lack of security in VXLAN, overlay DC networks, lack of security in VXLAN, and corruption of
corruption of VNI can lead to delivery to wrong tenant. Also, ARP VNI can lead to delivery of information to the wrong tenant.
in IPv4 and ND in IPv6 are not secure, especially if we accept
gratuitous versions. When these are done over a UDP Also, ARP in IPv4 and ND in IPv6 are not secure, especially if we
encapsulation, like VXLAN, the problem is worse since it is accept the gratuitous versions. When these are done over a UDP
encapsulation, as in VXLAN, the problem gets worse since it is
trivial for a non-trusted entity to spoof UDP packets. trivial for a non-trusted entity to spoof UDP packets.
In Layer-3 based overlay data center networks, the problem of In Layer-3 based overlay data center networks, the problem of
address spoofing may arise. An NVE may have untrusted tasks address spoofing may arise. An NVE may have untrusted tasks
attached. This usually happens in cases like the VMs (tasks) attached to it. This usually happens in situations when the VMs
running third party applications. This requires the usage of (tasks) are running third party applications. This requires the usage
stronger security mechanisms. of stronger security mechanisms.
10. IANA Considerations 10. IANA Considerations
This document makes no request to IANA. This document makes no request to IANA.
11. Acknowledgments 11. Acknowledgments
The authors are grateful to Bob Briscoe, David Black, Dave R. The authors are grateful to Bob Briscoe, David Black, Dave R.
Worley, Qiang Zu, Andrew Malis for helpful comments. Worley, Qiang Zu, and Andrew Malis for helpful comments.
12. Change Log 12. Change Log
. submitted version -00 as a working group draft after adoption . submitted version -00 as a working group draft after adoption
. submitted version -01 with these changes: references are updated, . submitted version -01 with these changes: references are updated,
o added packets in flight definition to Section 2 o added packets in flight definition to Section 2
. submitted version -02 with updated address. . submitted version -02 with updated address.
skipping to change at page 15, line 17 skipping to change at page 15, line 17
Linda Dunbar Linda Dunbar
Futurewei Futurewei
Email: ldunbar@futurewei.com Email: ldunbar@futurewei.com
Behcet Sarikaya Behcet Sarikaya
Denpel Informatique Denpel Informatique
Email: sarikaya@ieee.org Email: sarikaya@ieee.org
Bhumip Khasnabish Bhumip Khasnabish
Independent Independent
55 Madison Avenue, Suite 160
Morristown, NJ 07960
Email: vumip1@gmail.com Email: vumip1@gmail.com
Tom Herbert Tom Herbert
Intel Intel
Email: tom@herbertland.com Email: tom@herbertland.com
Saumya Dikshit Saumya Dikshit
Aruba-HPE Aruba-HPE
Bangalore, India Bangalore, India
Email: saumya.dikshit@hpe.com Email: saumya.dikshit@hpe.com
 End of changes. 59 change blocks. 
186 lines changed or deleted 184 lines changed or added
