Network Working Group                                         L. Dunbar
Internet Draft                                                Futurewei
Intended status: Informational                              B. Sarikaya
Expires: December 15, 2020                          Denpel Informatique
                                                           B.Khasnabish
                                                            Independent
                                                             T. Herbert
                                                                  Intel
                                                             S. Dikshit
                                                              Aruba-HPE
                                                          June 15, 2020

   Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks
                        draft-ietf-nvo3-vmm-15

Abstract

This document describes virtual machine (VM) mobility solutions
commonly used in data centers built with an overlay network. This
document describes the solutions and the impact of moving VMs, or
applications, from one rack to another connected by the overlay
network.

For Layer 2, it is based on using an NVA (Network Virtualization
Authority) to NVE (Network Virtualization Edge) protocol to update
ARP (Address Resolution Protocol) tables or neighbor cache entries
after a VM moves from an old NVE to a new NVE. For Layer 3, it is
based on address and connection migration after the move.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. This document may not be modified,
and derivative works of it may not be created, except to publish it
as an RFC and to translate it into languages other than English.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other documents
at any time. It is inappropriate to use Internet-Drafts as
reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on December 10, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with
respect to this document. Code Components extracted from this
document must include Simplified BSD License text as described in
Section 4.e of the Trust Legal Provisions and are provided without
warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Requirements
   4. Overview of the VM Mobility Solutions
      4.1. Inter-VN and External Communication
      4.2. VM Migration in a Layer 2 Network
      4.3. VM Migration in Layer-3 Network
      4.4. Address and Connection Management in VM Migration
   5. Handling Packets in Flight
   6. Moving Local State of VM
   7. Handling of Hot, Warm and Cold VM Mobility
   8. Other Options
   9. VM Lifecycle Management
   10. Security Considerations
   11. IANA Considerations
   12. Acknowledgments
   13. Change Log
   14. References
      14.1. Normative References
      14.2. Informative References

1. Introduction

This document describes the overlay-based data center network
solutions in support of multitenancy and VM mobility. Being able
to move VMs dynamically from one server to another makes dynamic
load balancing and work distribution possible. Therefore, dynamic
VM mobility is highly desirable for large scale multi-tenant DCs.

This document is strictly within the DCVPN, as defined by the NVO3
Framework [RFC7365]. The intent is to describe Layer 2 and Layer 3
network behavior when VMs are moved from one NVE to another.

This document assumes that the VM's move is initiated by the VM
management system, i.e. a planned move. How and when to move VMs is
out of the scope of this document. [RFC7666] already describes the
MIB for VMs controlled by a hypervisor. The impact of VM mobility
on higher layer protocols and applications is outside the scope of
this document.

Many large DCs (Data Centers), especially Cloud DCs, host tasks
(or workloads) for multiple tenants. A tenant can be an
organization or a department of an organization. There are
communications among tasks belonging to one tenant and
communications among tasks belonging to different tenants or with
external entities.

Server Virtualization, which is being used in almost all of
today's data centers, enables many VMs to run on a single physical
computer or server sharing the processor/memory/storage. Network
connectivity among VMs is provided by the network virtualization
edge (NVE) [RFC8014]. It is highly desirable [RFC7364] to allow
VMs to be moved dynamically (live, hot, or cold move) from one
server to another for dynamic load balancing or optimized work
distribution.

There are many challenges and requirements related to VM mobility
in large data centers, including dynamic attachment and detachment
of VMs to/from Network Virtualization Edges (NVEs). In addition,
retaining IP addresses after a move is a key requirement
[RFC7364]. Such a requirement is needed in order to maintain
existing transport layer connections.

In traditional Layer-3 based networks, retaining IP addresses
after a move is generally not recommended because frequent moves
will cause fragmented IP addresses, which introduces complexity in
IP address management.

In view of the many VM mobility schemes that exist today, there is
a desire to document comprehensive VM mobility solutions that
cover both IPv4 and IPv6. Large Data Center networks can be
organized as one large Layer-2 network geographically distributed
across several buildings or cities, or as Layer-3 networks with a
large number of host routes that cannot be aggregated because VMs
frequently move from one location to another without changing
their IP addresses. The connectivity between Layer 2 boundaries
can be achieved by the NVE functioning as a Layer 3 gateway router
across bridging domains.

2. Conventions used in this document

This document uses the terminology defined in [RFC7364]. In
addition, we make the following definitions:

VM: Virtual Machine

Task: A task is a program instantiated or running on a VM or a
      container. Tasks running in VMs or containers can be
      migrated from one server to another. We use task,
      workload and VM interchangeably in this document.

Hot VM Mobility: A given VM could be moved from one server to
      another in a running state without terminating the VM.

Warm VM Mobility: In case of warm VM mobility, the VM states are
      mirrored to the secondary server (or domain) at
      predefined regular intervals. This reduces the
      overheads and complexity, but it may also lead to a
      situation in which both servers do not contain the
      exact same data (state information).

Cold VM Mobility: A given VM could be moved from one server to
      another in a stopped or suspended state.

Old NVE: refers to the NVE to which packets for the VM were
      forwarded before the migration.

New NVE: refers to the NVE to which the VM is attached after the
      migration.

Packets in flight: refers to packets sent by correspondents that
      still hold an old ARP or neighbor cache entry from before
      the VM or task migration and that are therefore received
      by the old NVE.

Users of VMs in diskless systems or systems not using
configuration files are called end user clients.

Cloud DC: Third party data centers that host applications,
      tasks or workloads owned by different organizations or
      tenants.

3. Requirements

This section states requirements on data center network VM
mobility.

- The data center network should support both IPv4 and IPv6 VM
  mobility.

- VM mobility should not require changing a VM's IP address(es)
  after the move.

- "Hot Migration" requires transport service continuity across the
  move, while in "Cold Migration" the transport service is
  restarted, i.e. the task is stopped on the old NVE, is moved to
  the new NVE and then restarted. Not all DCs support "Hot
  Migration". DCs that only support Cold Migration should make
  their customers aware of the potential service interruption
  during a Cold Migration.

- VM mobility solutions/procedures should minimize triangular
  routing except for handling packets in flight.

- VM mobility solutions/procedures should not need to use
  tunneling except for handling packets in flight.

4. Overview of the VM Mobility Solutions

4.1. Inter-VN and External Communication

Inter-VN (Virtual Network) communication refers to communication
among tenants (or hosts) belonging to different VNs. Those tenants
can be attached to NVEs co-located in the same Data Center or in
different Data Centers. When a VM communicates with an external
entity, the VM is effectively communicating with a peer in a
different network or a globally reachable host.

This document assumes that the inter-VN communication and the
communication with external entities are via the NVO3 Gateway
functionality described in Section 5.3 of [RFC8014]. NVO3
Gateways relay traffic onto and off of a virtual network, enabling
communication both across different VNs and with external
entities.

NVO3 Gateway functionality enforces appropriate policies to
control communication among VNs and with external entities (e.g.,
hosts).

Moving a VM to a new NVE may move the VM away from the NVO3
Gateway(s) used by the VM's traffic, e.g., some traffic may be
better handled by an NVO3 Gateway that is closer to the new NVE
than the NVO3 Gateway that was used before the VM move. If NVO3
Gateway changes are not possible for some reason, then the VM's
traffic can continue to use the prior NVO3 Gateway(s), which may
have some drawbacks, e.g., longer network paths.

4.2. VM Migration in a Layer 2 Network

In a Layer-2 based approach, a VM moving to another NVE does not
change its IP address. But since this VM is now under a new NVE,
previously communicating NVEs may continue sending their packets
to the old NVE. Therefore, the previously communicating NVEs need
to promptly update their Address Resolution Protocol (ARP) caches
for IPv4 [RFC0826] or neighbor caches for IPv6 [RFC4861]. If the
VM being moved has communication with external entities, the NVO3
Gateway needs to be notified of the new NVE to which the VM has
moved.

In IPv4, immediately after the move the VM should send a
gratuitous ARP request message containing its IPv4 address and
Layer 2 MAC address under its new NVE. Upon receiving this
message, the new NVE can update its ARP cache. The new NVE should
send a notification of the newly attached VM to the central
directory [RFC7067] embedded in the NVA to update the mapping of
the IPv4 and MAC addresses of the moving VM along with the new NVE
address. An NVE-to-NVA protocol is used for this purpose
[RFC8014]. The old NVE, once the VM has moved away, should send an
ARP scan to all its attached VMs to refresh its ARP cache.

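The following Python sketch is illustrative only and is not part of
this specification; it shows how a new NVE might process a
gratuitous ARP from a newly attached VM and notify the NVA's
central directory. The NVA client object and its update_mapping()
method are hypothetical stand-ins for an NVE-to-NVA protocol.

   # Illustrative only, not part of this specification.
   from dataclasses import dataclass

   @dataclass
   class GratuitousArp:
       sender_ip: str    # IPv4 address announced by the VM
       sender_mac: str   # VM's Layer 2 MAC address

   class NewNVE:
       def __init__(self, nve_address, nva_client):
           self.nve_address = nve_address
           self.nva_client = nva_client   # hypothetical NVE-to-NVA channel
           self.arp_cache = {}            # VM IP -> VM MAC

       def on_gratuitous_arp(self, garp):
           # Refresh the local ARP cache for the newly attached VM.
           self.arp_cache[garp.sender_ip] = garp.sender_mac
           # Notify the NVA's central directory of the new mapping so
           # that other NVEs can learn the VM's new location.
           self.nva_client.update_mapping(garp.sender_ip,
                                          garp.sender_mac,
                                          self.nve_address)

   class FakeNVA:
       """Stand-in for the NVA directory; only to make the sketch run."""
       def __init__(self):
           self.directory = {}
       def update_mapping(self, ip, mac, nve):
           self.directory[ip] = (mac, nve)

   nva = FakeNVA()
   nve = NewNVE("192.0.2.10", nva)
   nve.on_gratuitous_arp(GratuitousArp("198.51.100.7", "00:00:5e:00:53:01"))
   # nva.directory now maps the VM's IP to its MAC and its new NVE.
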
Reverse ARP (RARP), which enables a host to discover its IPv4
address when it boots from a local server [RFC0903], is not used
by VMs if the VM already knows its IPv4 address (most common
Other NVEs that have attached VMs, or the NVO3 Gateway that has
external entities communicating with this VM, may still have the
old ARP entry. To avoid old ARP entries being used by other NVEs,
the old NVE, upon discovering a VM is detached, should send a
notification to all other NVEs and its NVO3 Gateway to time out
the ARP cache entry for the VM [RFC8171]. When an NVE (including
the old NVE) receives a packet or ARP request destined towards a
VM (its MAC or IP address) that is not in the NVE's ARP cache, the
NVE should send a query to the NVA's Directory Service to get the
associated NVE address for the VM. This is how the old NVE tunnels
these in-flight packets to the new NVE to avoid packet loss.

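As an informal illustration (not normative), the lookup-and-tunnel
behavior of the old NVE described above could be sketched as
follows; query_directory() and send_encapsulated() are hypothetical
helpers standing in for the NVA Directory Service query and the
tunnel encapsulation.

   # Illustrative sketch only; the helper functions are hypothetical.
   def handle_packet(dst_ip, local_arp_cache, query_directory,
                     send_encapsulated):
       if dst_ip in local_arp_cache:
           return "deliver-locally"            # VM is still attached here
       new_nve = query_directory(dst_ip)       # ask the NVA Directory Service
       if new_nve is None:
           return "drop"                       # no known location for the VM
       send_encapsulated(dst_ip, new_nve)      # tunnel the in-flight packet
       return "tunneled-to-" + new_nve

   # Tiny stand-ins so the sketch runs:
   directory = {"198.51.100.7": "192.0.2.20"}  # VM IP -> new NVE address
   print(handle_packet("198.51.100.7", {}, directory.get,
                       lambda ip, nve: None))  # tunneled-to-192.0.2.20
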
When the VM's address is IPv6, the operation is similar:

In IPv6, after the move, the VM immediately sends an unsolicited
Neighbor Advertisement message containing its IPv6 address and
Layer-2 MAC address to its new NVE. This message is sent to the
IPv6 Solicited-Node Multicast Address corresponding to the target
address, which is the VM's IPv6 address. The NVE receiving this
message should send a request to update the VM's neighbor cache
entry in the central directory of the NVA. The NVA's neighbor
cache entry
reports from an NVE of all the subnets being attached/detached, as
described by [RFC8171].

Hot VM Migration in Layer 3 involves coordination among many
entities, such as the VM management system and the NVA. Cold task
migration, which is a common practice in many data centers,
involves the following steps:

- Stop running the task.

- Package the runtime state of the job.

- Send the runtime state of the task to the new NVE where the
  task is to run.

- Instantiate the task's state on the new machine.

- Start the task, continuing from the point at which it was
  stopped.

[RFC7666] has a more detailed description of the state machine of
VMs controlled by a hypervisor.

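The cold migration steps above can be summarized by the following
non-normative Python sketch; the stop/package/transfer/instantiate/
start primitives are hypothetical operations assumed to be exposed
by the VM management system.

   # Non-normative sketch; the mgmt primitives are hypothetical.
   def cold_migrate(task, old_server, new_server, mgmt):
       mgmt.stop(task, old_server)                # stop running the task
       state = mgmt.package_runtime_state(task, old_server)
       mgmt.transfer(state, new_server)           # send state to the new server
       mgmt.instantiate(task, state, new_server)  # rebuild the task's state
       mgmt.start(task, new_server)               # resume from the stop point
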
4.4. Address and Connection Management in VM Migration

Since the VM attached to the new NVE needs to keep the same
address that it had when attached to the old NVE, extra processing
or configuration is needed, such as:

- Configure the IPv4/v6 address on the target VM/NVE.

- Suspend use of the address on the old NVE. This includes the
  old NVE sending a query to the NVA upon receiving packets
  destined towards the VM that has moved away. If there is no
  response from the NVA about the new NVE for the VM, the old NVE
  can only drop the packets. Refer to the VM state machine
  described in [RFC7666].

- Trigger the NVA to push the new NVE-VM mapping to other NVEs which
connection. From the time the connection is paused to the time it
is running again in the new stack, packets received for the
connection could be silently dropped. For some period of time,
the old stack will need to keep a record of the migrated
connection. If it receives a packet, it can either silently drop
the packet or forward it to the new location, as described in
Section 5.

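A minimal, non-normative sketch of the behavior just described,
assuming a hypothetical table of migrated connections kept by the
old stack:

   # Sketch only; the forwarding mechanism itself is out of scope here.
   migrated = {}   # connection 5-tuple -> new location, kept temporarily

   def on_packet(five_tuple, packet, forward):
       new_location = migrated.get(five_tuple)
       if new_location is None:
           return "drop"                 # silently drop unknown connections
       forward(packet, new_location)     # or forward to the new location
       return "forwarded"
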
5. Handling Packets in Flight

The old NVE may receive packets from the VM's ongoing
communications. These packets should not be lost; they should be
sent to the new NVE to be delivered to the VM. The steps involved
in handling packets in flight are as follows:

Preparation Step: It takes some time, possibly a few seconds, for
a VM to move from its old NVE to a new NVE. During this period, a
tunnel needs to be established so that the old NVE can forward
packets to the new NVE. The old NVE gets the new NVE address from
its NVA, assuming that the NVA is notified when a VM is moved from
one NVE to another. Which entity manages the VM move and how the
NVA gets notified of the move are out of the scope of this
document. The old NVE can store the new NVE address for the VM
with a timer. When the timer expires, the entry of the new NVE for
the VM can be deleted.

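A non-normative sketch of this preparation step is shown below; the
30-second lifetime used here is an arbitrary assumption, and the
text above only requires it to be long enough for other NVEs' cache
entries to be updated.

   # Sketch only; the 30-second lifetime is an arbitrary assumption.
   import time

   class ForwardingEntry:
       def __init__(self, new_nve, lifetime_s):
           self.new_nve = new_nve
           self.expires_at = time.monotonic() + lifetime_s

       def expired(self):
           return time.monotonic() >= self.expires_at

   forwarding_table = {}   # VM IP -> ForwardingEntry on the old NVE

   def on_vm_moved(vm_ip, new_nve, lifetime_s=30.0):
       # Learned from the NVA when it reports the VM's new NVE.
       forwarding_table[vm_ip] = ForwardingEntry(new_nve, lifetime_s)

   def lookup_new_nve(vm_ip):
       entry = forwarding_table.get(vm_ip)
       if entry is None or entry.expired():
           forwarding_table.pop(vm_ip, None)   # delete the entry once expired
           return None
       return entry.new_nve
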
Tunnel Establishment - IPv6: In-flight packets are tunneled to the
new NVE using an encapsulation protocol such as VXLAN over IPv6.

Tunnel Establishment - IPv4: In-flight packets are tunneled to the
new NVE using an encapsulation protocol such as VXLAN over IPv4.

Tunneling Packets - IPv6: IPv6 packets received for the migrating
VM are encapsulated in an IPv6 header at the old NVE. The new NVE
decapsulates the packet and sends the IPv6 packet to the migrating
VM.

Tunneling Packets - IPv4: IPv4 packets received for the migrating
VM are encapsulated in an IPv4 header at the old NVE. The new NVE
decapsulates the packet and sends the IPv4 packet to the migrating
VM.

Stop Tunneling Packets: Tunneling stops when the timer for storing
the new NVE address for the VM expires. The timer should be long
enough for all other NVEs that need to communicate with the VM to
get their NVE-VM cache entries updated.

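The choice of outer header family for these tunnels can be
illustrated with the following non-normative sketch;
encapsulate_vxlan() is a hypothetical helper, not an actual library
call.

   # Illustrative only; encapsulate_vxlan() is a hypothetical helper.
   import ipaddress

   def tunnel_inflight(inner_packet, old_nve_addr, new_nve_addr,
                       encapsulate_vxlan):
       # The outer header family follows the NVE underlay addresses; the
       # inner (VM) packet is carried unchanged and decapsulated by the
       # new NVE before delivery to the migrating VM.
       outer_is_v6 = ipaddress.ip_address(new_nve_addr).version == 6
       return encapsulate_vxlan(inner_packet, src=old_nve_addr,
                                dst=new_nve_addr, ipv6=outer_is_v6)
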
6. Moving Local State of VM

In addition to the VM mobility related signaling (VM Mobility
Registration Request/Reply), the VM state needs to be transferred
to the new NVE. The state includes its memory and file system if
the VM cannot access the memory and the file system after moving
to the new NVE.

The mechanism of transferring VM states and the file system is out
of the scope of this document. Refer to [RFC7666] for detailed
information.

7. Handling of Hot, Warm and Cold VM Mobility

Both Cold and Warm VM mobility (or migration) refer to the
complete shut down of the VM at the old NVE before restarting the
VM at the new NVE. Therefore, all transport services to the VM
need to be restarted.

In this document, all VM mobility is initiated by the VM
Management System. In case of Cold VM mobility, the exchange of
states between the old NVE and the new NVE occurs after the VM
attached to the old NVE is completely shut down. There is a time
delay before the new VM is launched. The cold mobility option can
be used for non-mission-critical applications and services that
can tolerate interruptions of TCP connections.

For Hot VM Mobility, a VM moving to a new NVE does not change its
IP address and the service running on the VM is not interrupted.
The VM needs to send a gratuitous Address Resolution message or an
unsolicited Neighbor Advertisement message upstream after each
move.

In case of Warm VM mobility, the functional components of the
new NVE receive the running status of the VM at frequent
intervals. Consequently, it takes less time to launch the VM
under the new NVE. Other NVEs that communicate with the VM can be
notified promptly about the VM migration. The duration of the
time interval determines the effectiveness (or benefit) of Warm VM
mobility. The larger the time duration, the less effective the
Warm VM mobility becomes.

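The interval tradeoff described above can be illustrated with a
rough, non-normative sketch of a state-mirroring loop; the
get_vm_state() and send_to_secondary() callbacks are hypothetical.

   # Rough sketch only; the callbacks are hypothetical stand-ins.
   import time

   def mirror_loop(get_vm_state, send_to_secondary, interval_s, rounds):
       for _ in range(rounds):
           snapshot = get_vm_state()      # capture the VM's running status
           send_to_secondary(snapshot)    # push it to the secondary server/NVE
           # A larger interval means more state can be lost at failover,
           # i.e. the less effective Warm VM mobility becomes.
           time.sleep(interval_s)

   # Example with trivial stand-ins:
   mirror_loop(lambda: {"ts": time.time()}, lambda s: None,
               interval_s=0.0, rounds=3)
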
In case of Cold VM mobility, the VM on the old NVE is completely
shut down and the VM is launched on the new NVE. To minimize the
chance of the previously communicating NVEs sending packets to the
old NVE, the NVA should push the updated ARP/neighbor cache entry
to all previously communicating NVEs when the VM is started on the
new NVE. Alternatively, all NVEs can periodically pull the updated
ARP/neighbor cache entry from the NVA to shorten the time span
during which packets are sent to the old NVE.

Upon starting at the new NVE, the VM should send an ARP or
Neighbor Discovery message.

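The push variant described above might look like the following
non-normative sketch; peer.update_cache() is a hypothetical
NVA-to-NVE notification, and the pull variant would instead have
each NVE poll the NVA periodically.

   # Sketch only; peer.update_cache() is a hypothetical notification.
   def push_updated_mapping(nva_directory, peer_nves, vm_ip, vm_mac, new_nve):
       # Record the VM's new location in the NVA's central directory.
       nva_directory[vm_ip] = (vm_mac, new_nve)
       # Push the updated ARP/neighbor cache entry to every NVE that
       # previously communicated with the VM.
       for peer in peer_nves:
           peer.update_cache(vm_ip, vm_mac, new_nve)
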
8. Other Options

Hot, Warm and Cold mobility are planned activities which are
managed by the VM management system.

For unexpected events, such as overloads and failures, a VM might
need to move to a new NVE without any service interruption; this
is called Hot VM Failover in this document. In such a case, there
are redundant primary and secondary VMs whose states are
continuously synchronized by methods that are outside the scope of
this draft. If the VM in the primary NVE fails, there is no need
to actively move the VM to the secondary NVE because the VM in the
secondary NVE can immediately pick up and continue processing the
applications/services.

The Hot VM Failover is transparent to the peers that communicate
with this VM. This can be achieved via distributed load balancing
when both the active VM and the standby VM share the same TCP port
and the same IP address. In the absence of a failure, the new VM
can pick up providing service while the sender (peer) continues to
receive Acks from the old VM. If the situation (the loading
condition of the primary responding VM) changes, the secondary
responding VM may start providing service to the sender (peers).
When a failure occurs, the sender (peer) may have to retry the
request, so this structure is limited to requests that can be
safely retried.

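A toy, non-normative sketch of the responder selection performed by
such distributed load balancing, under the assumption that both VMs
share the same service IP address and TCP port:

   # Toy sketch only; real load balancers track much richer state.
   def pick_responder(primary_healthy, primary_overloaded):
       # The primary serves while healthy and not overloaded; otherwise
       # the secondary takes over, transparently to the peers because
       # both VMs share the same IP address and TCP port.
       if primary_healthy and not primary_overloaded:
           return "primary"
       return "secondary"

   assert pick_responder(True, False) == "primary"
   assert pick_responder(False, False) == "secondary"
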
If the load balancing functionality is not used, the Hot VM
Failover can be made transparent to the sender (peers) without
relying on request retry, by using the techniques described in
Section 4 that do not depend on the primary VM or its associated
NVE doing anything after the failure. This restriction is
necessary because a failure that affects the primary VM may also
cause its associated NVE to fail; for example, a physical server
failure can cause both the VM and its NVE to fail.

The Hot VM Failover option is the costliest mechanism, and hence
this option is utilized only for mission-critical applications and
services.

9. VM Lifecycle Management

VM lifecycle management is a complicated task, which is beyond the
scope of this document. Not only does it involve monitoring server
utilization and balancing the distribution of workloads, it also
needs to support seamless migration of VMs from one server to
another.

10. Security Considerations

Security threats for the data and control plane for overlay
networks are discussed in [RFC8014]. ARP (IPv4) and ND (IPv6) are
not secure, especially if they can be sent gratuitously across
tenant boundaries in a multi-tenant environment.

In overlay data center networks, ARP and ND messages can be used
to mount address spoofing attacks from untrusted VMs and/or other
untrusted sources. Examples of untrusted VMs are VMs instantiated
with third party applications that are not written by the tenant
of the VMs. Those untrusted VMs can send false ARP (IPv4) and ND
(IPv6) messages, causing significant overloads in NVEs, NVO3
Gateways, and NVAs. An attacker can intercept, modify, or even
stop in-transit ARP/ND messages intended for other VNs and
initiate DDoS attacks against other VMs attached to the same NVE.
A simple black-hole attack can be mounted by sending a false
ARP/ND message to indicate that the victim's IP address has moved
to the attacker's VM. That technique can also be used to mount
man-in-the-middle attacks, although additional effort is required
to ensure that the intercepted traffic is eventually delivered to
the impacted VMs.

The locator-identifier mechanism given as an example (ILA) doesn't
include secure binding. It does not discuss how to securely bind
the new locator to the identifier.

Because of those threats, the VM management system needs to apply
stronger security mechanisms when adding a VM to an NVE. Some
tenants may have requirements that prohibit their VMs from being
co-attached to NVEs with other tenants. Some Data Centers deploy
additional functionality in their NVO3 Gateways to mitigate the
ARP/ND threats. This may include periodically sending each
Gateway's ARP/ND cache contents to the NVA or another central
control system. The objective is to identify ARP/ND cache entries
that are not consistent with the locations of VMs and their IP
addresses indicated by the VM Management System.

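The consistency check described above could, for example, be
sketched (non-normatively) as a comparison between a gateway's
ARP/ND cache and the VM locations the VM Management System has
reported to the NVA; the data structures below are assumptions made
for illustration only.

   # Non-normative sketch; the cache layout is an illustrative assumption.
   def find_suspect_entries(gateway_cache, nva_view):
       """Both arguments map a VM IP to a (MAC, NVE address) tuple."""
       suspects = []
       for ip, entry in gateway_cache.items():
           if nva_view.get(ip) != entry:
               # Entry disagrees with the NVA/VM Management System view:
               # a possible spoofed or stale ARP/ND binding.
               suspects.append((ip, entry, nva_view.get(ip)))
       return suspects

   print(find_suspect_entries(
       {"198.51.100.7": ("00:00:5e:00:53:01", "192.0.2.20")},
       {"198.51.100.7": ("00:00:5e:00:53:01", "192.0.2.10")}))
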
11. IANA Considerations

This document makes no request to IANA.

12. Acknowledgments

The authors are grateful to Bob Briscoe, David Black, Dave R.
Worley, Qiang Zu, and Andrew Malis for helpful comments.

13. Change Log

. submitted version -01 with these changes: references are updated,

o added packets in flight definition to Section 2

. submitted version -02 with updated address.

. submitted version -03 to fix the nits.

. submitted version -04 in reference to the WG Last call comments.

. Submitted version - 05, 06, 07, 08, 09, 10, 11, 12, 13, 14 to
  address IETF LC comments from TSV area.

14. References

14.1. Normative References

[RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol: Or
          Converting Network Protocol Addresses to 48.bit Ethernet
          Address for Transmission on Ethernet Hardware", STD 37,
          RFC 826, DOI 10.17487/RFC0826, November 1982,
          <https://www.rfc-editor.org/info/rfc826>.

[RFC0903] Finlayson, R., Mann, T., Mogul, J., and M. Theimer, "A
          Reverse Address Resolution Protocol", STD 38, RFC 903,
Authors' Addresses

Linda Dunbar
Futurewei
Email: ldunbar@futurewei.com

Behcet Sarikaya
Denpel Informatique
Email: sarikaya@ieee.org

Bhumip Khasnabish
Independent
Info.: https://about.me/bhumip
Email: vumip1@gmail.com

Tom Herbert
Intel
Email: tom@herbertland.com

Saumya Dikshit
Aruba-HPE
Bangalore, India
Email: saumya.dikshit@hpe.com