Network Working Group                                         L. Dunbar
Internet Draft                                                Futurewei
Intended status: Informational                             B. Sarikaya
Expires: December 17, 2020                          Denpel Informatique
                                                          B. Khasnabish
                                                            Independent
                                                             T. Herbert
                                                                  Intel
                                                             S. Dikshit
                                                              Aruba-HPE
                                                          June 17, 2020
    Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks
                         draft-ietf-nvo3-vmm-16
Abstract

   This document describes virtual machine (VM) mobility solutions
   commonly used in data centers built with an overlay network. It
   describes the solutions and the impact of moving VMs, or
   applications, from one rack to another connected by the overlay
   network.

   For Layer 2, the solution is based on using an NVA (Network
   Virtualization Authority) to NVE (Network Virtualization Edge)
   protocol to update ARP (Address Resolution Protocol) tables or
   neighbor cache entries after a VM moves from an old NVE to a new
   NVE. For Layer 3, it is based on address and connection migration
   after the move.
Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79. This document may not be modified,
   and derivative works of it may not be created, except to publish it
   as an RFC and to translate it into languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on December 17, 2020.
Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.
Table of Contents

   1. Introduction...............................................3
   2. Conventions used in this document..........................4
   3. Requirements...............................................5
   4. Overview of the VM Mobility Solutions......................6
      4.1. Inter-VN and External Communication...................6
      4.2. VM Migration in a Layer 2 Network.....................7
      4.3. VM Migration in Layer-3 Network.......................8
      4.4. Address and Connection Management in VM Migration.....9
   5. Handling Packets in Flight................................10
   6. Moving Local State of VM..................................11
   7. Handling of Hot, Warm and Cold VM Mobility................12
   8. Other Options.............................................13
   9. VM Lifecycle Management...................................13
   10. Security Considerations..................................14
   11. IANA Considerations......................................15
   12. Acknowledgments..........................................15
   13. References...............................................15
      13.1. Normative References................................15
      13.2. Informative References..............................16
1. Introduction

   This document describes overlay-based data center network solutions
   in support of multitenancy and VM mobility. Being able to move VMs
   dynamically from one server to another makes dynamic load balancing
   and work distribution possible. Therefore, dynamic VM mobility is
   highly desirable for large scale multi-tenant DCs.

   This document is strictly within the DCVPN, as defined by the NVO3
   Framework [RFC7365]. The intent is to describe Layer 2 and Layer 3
   network behavior when VMs are moved from one NVE to another. This
   document assumes that a VM's move is initiated by the VM management
   system, i.e. a planned move. How and when to move VMs is out of the
   scope of this document. RFC7666 already describes the MIB for VMs
   controlled by a hypervisor. The impact of VM mobility on higher
   layer protocols and applications is outside its scope.

   Many large DCs (Data Centers), especially Cloud DCs, host tasks (or
   workloads) for multiple tenants. A tenant can be an organization or
   a department of an organization. There are communications among
   tasks belonging to one tenant, and communications among tasks
   belonging to different tenants or with external entities.

   Server virtualization, which is used in almost all of today's data
   centers, enables many VMs to run on a single physical computer or
   server, sharing the processor, memory, and storage. Network
   connectivity among VMs is provided by the network virtualization
   edge (NVE) [RFC8014]. It is highly desirable [RFC7364] to allow VMs
   to be moved dynamically (live, hot, or cold move) from one server
   to another for dynamic load balancing or optimized work
   distribution.

   There are many challenges and requirements related to VM mobility
   in large data centers, including dynamic attachment and detachment
   of VMs to/from Network Virtualization Edges (NVEs). In addition,
   retaining IP addresses after a move is a key requirement [RFC7364];
   it is needed in order to maintain existing transport layer
   connections.

   In traditional Layer-3 based networks, retaining IP addresses after
   a move is generally not recommended because frequent moves cause
   fragmented IP addresses, which introduces complexity in IP address
   management.

   In view of the many VM mobility schemes that exist today, there is
   a desire to document comprehensive VM mobility solutions that cover
   both IPv4 and IPv6. Large data center networks can be organized as
   one large Layer-2 network geographically distributed across several
   buildings/cities, or as Layer-3 networks with a large number of
   host routes that cannot be aggregated, as a result of frequent
   moves from one location to another without changing IP addresses.
   Connectivity across Layer 2 boundaries can be achieved by the NVE
   functioning as a Layer 3 gateway router across bridging domains.
2. Conventions used in this document

   This document uses the terminology defined in [RFC7364]. In
   addition, we make the following definitions:

   VM:    Virtual Machine.

   Task:  A task is a program instantiated or running on a VM or a
          container. Tasks running in VMs or containers can be
          migrated from one server to another. We use task, workload
          and VM interchangeably in this document.

   Hot VM Mobility:  A given VM could be moved from one server to
          another in a running state, without terminating the VM.

   Warm VM Mobility:  In case of warm VM mobility, the VM states are
          mirrored to the secondary server (or domain) at predefined
          regular intervals. This reduces the overheads and
          complexity, but may also lead to a situation where both
          servers do not contain exactly the same data (state
          information).

   Cold VM Mobility:  A given VM could be moved from one server to
          another in a stopped or suspended state.

   Old NVE:  The NVE to which packets were forwarded before the
          migration.

   New NVE:  The NVE after the migration.

   Packets in flight:  Packets received by the old NVE, sent by
          correspondents that have an old ARP or neighbor cache entry
          from before the VM or task migration.

   End user clients:  Users of VMs in diskless systems or systems not
          using configuration files.

   Cloud DC:  Third party data centers that host applications, tasks
          or workloads owned by different organizations or tenants.
3. Requirements

   This section states requirements on data center network VM
   mobility.

   - The data center network should support both IPv4 and IPv6 VM
     mobility.

   - VM mobility should not require changing a VM's IP address(es)
     after the move.

   - "Hot Migration" requires transport service continuity across the
     move, while in "Cold Migration" the transport service is
     restarted, i.e. the task is stopped on the old NVE, moved to the
     new NVE, and then restarted. Not all DCs support "Hot Migration".
     DCs that only support Cold Migration should make their customers
     aware of the potential service interruption during a Cold
     Migration.

   - VM mobility solutions/procedures should minimize triangular
     routing, except for handling packets in flight.

   - VM mobility solutions/procedures should not need to use
     tunneling, except for handling packets in flight.
4. Overview of the VM Mobility Solutions

4.1. Inter-VN and External Communication

   Inter-VN (Virtual Network) communication refers to communication
   among tenants (or hosts) belonging to different VNs. Those tenants
   can be attached to NVEs co-located in the same data center or in
   different data centers. When a VM communicates with an external
   entity, the VM is effectively communicating with a peer in a
   different network or a globally reachable host.

   This document assumes that inter-VN communication and communication
   with external entities are via NVO3 Gateway functionality as
   described in Section 5.3 of RFC 8014 [RFC8014]. NVO3 Gateways relay
   traffic onto and off of a virtual network, enabling communication
   both across different VNs and with external entities.

   NVO3 Gateway functionality enforces appropriate policies to control
   communication among VNs and with external entities (e.g., hosts).

   Moving a VM to a new NVE may move the VM away from the NVO3
   Gateway(s) used by the VM's traffic, e.g., some traffic may be
   better handled by an NVO3 Gateway that is closer to the new NVE
   than the NVO3 Gateway that was used before the VM move. If NVO3
   Gateway changes are not possible for some reason, then the VM's
   traffic can continue to use the prior NVO3 Gateway(s), which may
   have some drawbacks, e.g., longer network paths.
4.2. VM Migration in a Layer 2 Network

   In a Layer-2 based approach, a VM moving to another NVE does not
   change its IP address. But because this VM is now under a new NVE,
   previously communicating NVEs may continue sending their packets to
   the old NVE. Therefore, the previously communicating NVEs need to
   promptly update their Address Resolution Protocol (ARP) caches for
   IPv4 [RFC826] or neighbor caches for IPv6 [RFC4861]. If the VM
   being moved communicates with external entities, the NVO3 gateway
   needs to be notified of the new NVE to which the VM has moved.

   In IPv4, immediately after the move the VM should send a gratuitous
   ARP request message containing its IPv4 address and Layer 2 MAC
   address to its new NVE. Upon receiving this message, the new NVE
   can update its ARP cache. The new NVE should send a notification of
   the newly attached VM to the central directory [RFC7067] embedded
   in the NVA, to update the mapping of the IPv4 address and MAC
   address of the moving VM along with the new NVE address. An NVE-to-
   NVA protocol is used for this purpose [RFC8014]. The old NVE, when
   a VM is moved away, should send an ARP scan to all its attached VMs
   to refresh its ARP cache.
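   As an illustration only (not part of any NVO3 protocol), the
   gratuitous ARP announcement described above can be sketched as the
   following Python fragment, which builds a raw Ethernet/ARP frame
   per the [RFC826] layout; the MAC and IPv4 values in the usage note
   below are hypothetical.

```python
import struct

def gratuitous_arp(mac: bytes, ipv4: bytes) -> bytes:
    """Build a gratuitous ARP request: the sender and target IPv4
    fields both carry the VM's own address, broadcast at Layer 2."""
    assert len(mac) == 6 and len(ipv4) == 4
    eth = b"\xff" * 6 + mac + b"\x08\x06"  # dst=broadcast, src, EtherType=ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)  # Ethernet/IPv4, op=1 (request)
    arp += mac + ipv4                      # sender hardware & protocol address
    arp += b"\x00" * 6 + ipv4              # target MAC unknown, target IP = own
    return eth + arp
```

   For example, gratuitous_arp(bytes.fromhex("020000000001"),
   bytes([192, 0, 2, 10])) yields a 42-byte broadcast frame whose
   sender and target IPv4 fields both carry 192.0.2.10, which is what
   lets every receiver refresh its ARP cache entry for that address.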
   Reverse ARP (RARP), which enables a host to discover its IPv4
   address when it boots from a local server [RFC903], is not used by
   VMs if the VM already knows its IPv4 address (the most common
   scenario). Next, we describe a case where RARP is used.

   There are some vendor deployments (diskless systems or systems
   without configuration files) wherein the VM's user, i.e. the end-
   user client, asks for the same MAC address upon migration. This can
   be achieved by the client sending a RARP request message carrying
   the MAC address and looking for an IP address allocation. The
   server, in this case the new NVE, needs to communicate with the
   NVA, just as in the gratuitous ARP case, to ensure that the same
   IPv4 address is assigned to the VM. The NVA uses the MAC address as
   the key to search its ARP cache for the IP address, and informs the
   new NVE, which in turn sends a RARP reply message. This completes
   IP address assignment to the migrating VM.
   Other NVEs that have attached VMs, or the NVO3 Gateway that has
   external entities communicating with this VM, may still have the
   old ARP entry. To avoid old ARP entries being used by other NVEs,
   the old NVE, upon discovering that a VM is detached, should send a
   notification to all other NVEs and its NVO3 Gateway to time out the
   ARP cache entry for the VM [RFC8171]. When an NVE (including the
   old NVE) receives a packet or an ARP request destined towards a VM
   (its MAC or IP address) that is not in the NVE's ARP cache, the NVE
   should send a query to the NVA's Directory Service to get the
   associated NVE address for the VM. This is how the old NVE tunnels
   in-flight packets to the new NVE to avoid packet loss.
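   The old NVE's handling of in-flight packets, as described above,
   can be sketched as follows. This is purely illustrative: the NVA
   directory is modeled as a simple lookup table, and the names
   (handle_inflight, tunnel, drop) are hypothetical, not defined by
   any NVO3 specification.

```python
def handle_inflight(dst_ip, local_arp_cache, nva_directory, tunnel, drop):
    """Old-NVE behavior for a packet whose destination VM may have moved:
    deliver locally if still attached; otherwise query the NVA directory
    for the VM's current NVE and tunnel the packet there; if the NVA has
    no mapping, the packet can only be dropped."""
    if dst_ip in local_arp_cache:           # VM still attached locally
        return "deliver"
    new_nve = nva_directory.get(dst_ip)     # NVE-to-NVA query (abstracted)
    if new_nve is None:
        drop(dst_ip)                        # no known location: drop
        return "drop"
    tunnel(dst_ip, new_nve)                 # forward the in-flight packet
    return "tunnel"
```

   A query answered by the NVA turns a would-be lost packet into a
   tunneled one; only packets for VMs unknown to the NVA are dropped.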
   When the VM address is IPv6, the operation is similar:

   In IPv6, after the move, the VM immediately sends an unsolicited
   neighbor advertisement message containing its IPv6 address and
   Layer-2 MAC address to its new NVE. This message is sent to the
   IPv6 Solicited-Node Multicast Address corresponding to the target
   address, which is the VM's IPv6 address. The NVE receiving this
   message should send a request to update the VM's neighbor cache
   entry in the central directory of the NVA. The NVA's neighbor cache
   entry should include the IPv6 address of the VM, the MAC address of
   the VM, and the NVE IPv6 address. An NVE-to-NVA protocol is used
   for this purpose [RFC8014].
   To avoid other NVEs communicating with this VM using the old
   neighbor cache entry, the old NVE upon discovering that a VM has
   been moved, or the VM management system which initiates the VM
   move, should send a notification to all NVEs to time out the ND
   cache entry for the VM being moved. When the ND cache entry for
   such a VM times out, its corresponding NVE should send a query to
   the NVA for an update.
4.3. VM Migration in Layer-3 Network

   Traditional Layer-3 based data center networks usually have all
   hosts (tasks) within one subnet attached to one NVE. By this
   design, the NVE becomes the default route for all hosts (tasks)
   within the subnet. But this design requires the IP address of a
   host (task) to change after a move, to comply with the IP prefixes
   under the new NVE.

   A VM migration in Layer 3 network solution allows IP addresses to
   stay the same after a move to a different location. The Identifier
   Locator Addressing, or ILA [Herbert-ILA], is one such solution.

   Because broadcasting is not available in Layer-3 based networks,
   multicast of neighbor solicitations in IPv6 and ARP for IPv4 would
   need to be emulated. Scalability of the multicast (such as IPv6 ND
   and IPv4 ARP) can become problematic because the hosts belonging to
   one subnet (or one VLAN) can span many NVEs. Sending broadcast
   traffic to all NVEs can cause unnecessary traffic in the DCN if the
   hosts belonging to one subnet are attached to only a very small
   number of NVEs. It is preferable to have a directory [RFC7067] or
   NVA to manage updates to an NVE about the other NVEs to which a
   specific subnet may be attached, and to get periodic reports from
   an NVE of all the subnets being attached/detached, as described by
   RFC8171.
Hot VM Migration in Layer 3 involves coordination among many Hot VM Migration in Layer 3 involves coordination among
entities, such as VM management system and NVA. Cold task many entities, such as VM management system and NVA. Cold
migration, which is a common practice in many data centers, task migration, which is a common practice in many data
involves the following steps: centers, involves the following steps:
- Stop running the task.

- Package the runtime state of the task.

- Send the runtime state of the task to the new NVE where the
  task is to run.

- Instantiate the task's state on the new machine.

- Restart the task, continuing from the point at which it was
  stopped.
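The five steps above can be sketched in Python. This is an
illustrative sketch only, not part of the specification; the
`Task` and `NveNode` classes and their fields are hypothetical
stand-ins for a real VM management system.

```python
# Illustrative sketch of cold task migration (not normative).
# Task, NveNode, and their fields are hypothetical names.

class Task:
    def __init__(self, name, state=None):
        self.name = name
        self.state = state or {}   # stand-in for real runtime state
        self.running = False

class NveNode:
    def __init__(self):
        self.tasks = {}            # task name -> Task

def cold_migrate(task, old_nve, new_nve):
    # 1. Stop running the task.
    task.running = False
    # 2. Package the runtime state of the task.
    packaged = dict(task.state)
    # 3. Send the runtime state to the new NVE where the task runs.
    del old_nve.tasks[task.name]
    # 4. Instantiate the task's state on the new machine.
    moved = Task(task.name, packaged)
    new_nve.tasks[moved.name] = moved
    # 5. Restart the task from the point at which it was stopped.
    moved.running = True
    return moved
```

A caller would invoke `cold_migrate(task, old_nve, new_nve)` once
the VM management system has decided on the target NVE.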
RFC7666 provides a more detailed description of the state machine
of VMs controlled by a hypervisor.
4.4. Address and Connection Management in VM Migration
Since the VM attached to the new NVE needs to be assigned the
same address as it had when attached to the old NVE, extra
processing or configuration is needed, such as:
- Configure the IPv4/IPv6 address on the target VM/NVE.

- Suspend use of the address on the old NVE. This includes the
  old NVE sending a query to the NVA upon receiving packets
  destined for the VM that has moved away. If the NVA returns no
  new NVE for the VM, the old NVE can only drop the packets.
  Refer to the VM state machine described in RFC7666.

- Trigger the NVA to push the new NVE-VM mapping to other NVEs
  whose attached VMs are communicating with the VM being moved.
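The old NVE's behavior in the second bullet (query the NVA, then
forward or drop) can be sketched as follows. The dictionary-based
NVA mapping and the function name are hypothetical illustrations,
not part of any protocol specification.

```python
# Illustrative sketch of the old NVE's handling of packets for a
# moved VM (not normative). The NVA is modeled as a simple mapping
# from VM address to its current NVE; all names are hypothetical.

def handle_packet_for_moved_vm(vm_addr, packet, nva_mapping, forwarded):
    """Query the NVA; forward toward the VM's new NVE, or drop."""
    new_nve = nva_mapping.get(vm_addr)   # query to the NVA
    if new_nve is None:
        return "drop"                    # no response: can only drop
    forwarded.append((new_nve, packet))  # tunnel toward the new NVE
    return "forward"
```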
Connection management for the applications running on the VM
being moved involves reestablishing existing TCP connections in
the new place.
The simplest course of action is to drop all TCP connections to
the applications running on the VM during a migration. If
migrations are relatively rare events in a data center, the
impact is relatively small when TCP connections are automatically
closed in the network stack during a migration event. If the
running applications are known to handle this gracefully (i.e.,
reopen dropped connections), then this approach may be viable.
A more involved approach to connection migration entails a proxy
to the application (or the application itself) pausing the
connection, packaging the connection state and sending it to the
target, instantiating the connection state in the peer stack, and
restarting the connection. From the time the connection is paused
to the time it is running again in the new stack, packets
received for the connection could be silently dropped. For some
period of time, the old stack will need to keep a record of the
migrated connection. If it receives a packet, it can either
silently drop the packet or forward it to the new location, as
described in Section 5.
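The pause/package/instantiate/restart sequence above can be
sketched as follows. The state dictionary is a hypothetical
stand-in for real TCP connection state (sequence numbers,
buffers); the class and function names are illustrative only.

```python
# Illustrative sketch of connection state migration (not
# normative). The "state" dict stands in for real TCP state;
# all names are hypothetical.

class ConnEndpoint:
    def __init__(self):
        self.conns = {}        # conn_id -> state dict
        self.migrated = set()  # record of connections moved away

def migrate_connection(conn_id, old_stack, new_stack):
    # Pause the connection and package its state.
    state = old_stack.conns.pop(conn_id)
    state["paused"] = True
    # The old stack keeps a record so late packets can be
    # dropped or forwarded (see Section 5).
    old_stack.migrated.add(conn_id)
    # Instantiate the state in the peer stack and restart.
    state["paused"] = False
    new_stack.conns[conn_id] = state
    return state
```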
5. Handling Packets in Flight
The old NVE may receive packets from the VM's ongoing
communications. These packets should not be lost; they should be
sent to the new NVE to be delivered to the VM. The steps involved
in handling packets in flight are as follows:
Preparation Step: It takes some time, possibly a few seconds, for
a VM to move from its old NVE to a new NVE. During this period, a
tunnel needs to be established so that the old NVE can forward
packets to the new NVE. The old NVE gets the new NVE address from
its NVA, assuming that the NVA is notified when a VM is moved
from one NVE to another. Which entity manages the VM move and how
the NVA is notified of the move are out of the scope of this
document. The old NVE can store the new NVE address for the VM
with a timer; when the timer expires, the entry of the new NVE
for the VM can be deleted.
Tunnel Establishment - IPv6: Inflight packets are tunneled to the
new NVE using an encapsulation protocol such as VXLAN in IPv6.

Tunnel Establishment - IPv4: Inflight packets are tunneled to the
new NVE using an encapsulation protocol such as VXLAN in IPv4.
Tunneling Packets - IPv6: IPv6 packets received for the migrating
VM are encapsulated in an IPv6 header at the old NVE. The new NVE
decapsulates the packet and sends the IPv6 packet to the
migrating VM.

Tunneling Packets - IPv4: IPv4 packets received for the migrating
VM are encapsulated in an IPv4 header at the old NVE. The new NVE
decapsulates the packet and sends the IPv4 packet to the
migrating VM.
Stop Tunneling Packets: Tunneling stops when the timer for
storing the new NVE address for the VM expires. The timer should
be long enough for all other NVEs that need to communicate with
the VM to get their NVE-VM cache entries updated.
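The timer-driven forwarding behavior described in the steps above
can be sketched as follows. This is a non-normative illustration;
the class name, the integer clock, and the string results are all
hypothetical simplifications.

```python
# Illustrative sketch of the old NVE's in-flight packet handling
# with a forwarding timer (not normative); names are hypothetical.

class OldNve:
    def __init__(self, timer_seconds):
        self.forwarding = {}       # vm_addr -> (new_nve, expiry)
        self.timer = timer_seconds

    def on_vm_moved(self, vm_addr, new_nve, now):
        # Store the new NVE address for the VM with a timer.
        self.forwarding[vm_addr] = (new_nve, now + self.timer)

    def handle_inflight(self, vm_addr, packet, now):
        entry = self.forwarding.get(vm_addr)
        if entry is None:
            return "drop"
        new_nve, expiry = entry
        if now >= expiry:
            # Timer expired: delete the entry, stop tunneling.
            del self.forwarding[vm_addr]
            return "drop"
        # Encapsulate (e.g., VXLAN) and tunnel to the new NVE.
        return ("tunnel", new_nve, packet)
```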
6. Moving Local State of VM
In addition to the VM mobility related signaling (VM Mobility
Registration Request/Reply), the VM state needs to be transferred
to the new NVE. The state includes the VM's memory and file
system if the VM cannot access them after moving to the new NVE.
The mechanism for transferring the VM state and file system is
out of the scope of this document. Refer to RFC7666 for detailed
information.
7. Handling of Hot, Warm and Cold VM Mobility
Both Cold and Warm VM mobility (or migration) refer to the
complete shutdown of the VM at the old NVE before restarting the
VM at the new NVE. Therefore, all transport services to the VM
need to be restarted.
In this document, all VM mobility is initiated by the VM
Management System. In the case of Cold VM mobility, the exchange
of states between the old NVE and the new NVE occurs after the VM
attached to the old NVE is completely shut down. There is a time
delay before the new VM is launched. The cold mobility option can
be used for non-mission-critical applications and services that
can tolerate interruptions of TCP connections.
For Hot VM Mobility, a VM moving to a new NVE does not change its
IP address, and the service running on the VM is not interrupted.
The VM needs to send a gratuitous Address Resolution message or
an unsolicited Neighbor Advertisement message upstream after each
move.
In the case of Warm VM mobility, the functional components of the
new NVE receive the running status of the VM at frequent
intervals. Consequently, it takes less time to launch the VM
under the new NVE, and other NVEs that communicate with the VM
can be notified promptly about the VM migration. The duration of
the interval determines the effectiveness (or benefit) of Warm VM
mobility: the larger the interval, the less effective Warm VM
mobility becomes.
In the case of Cold VM mobility, the VM on the old NVE is
completely shut down and the VM is launched on the new NVE. To
minimize the chance of the previously communicating NVEs sending
packets to the old NVE, the NVA should push the updated
ARP/neighbor cache entry to all previously communicating NVEs
when the VM is started on the new NVE. Alternatively, all NVEs
can periodically pull the updated ARP/neighbor cache entry from
the NVA to shorten the time span during which packets are sent to
the old NVE.
Upon starting at the new NVE, the VM should send an ARP or
Neighbor Discovery message.
8. Other Options
Hot, Warm and Cold mobility are planned activities that are
managed by the VM management system.
For unexpected events, such as overloads and failures, a VM might
need to move to a new NVE without any service interruption; this
is called Hot VM Failover in this document. In such a case, there
are redundant primary and secondary VMs whose states are
continuously synchronized by methods that are outside the scope
of this document. If the VM in the primary NVE fails, there is no
need to actively move the VM to the secondary NVE because the VM
in the secondary NVE can immediately pick up and continue
processing the applications/services.
Hot VM Failover is transparent to the peers that communicate with
this VM. This can be achieved via distributed load balancing when
both the active VM and the standby VM share the same TCP port and
the same IP address. In the absence of a failure, the new VM can
pick up providing service while the sender (peer) continues to
receive Acks from the old VM. If the situation (the loading
condition of the primary responding VM) changes, the secondary
responding VM may start providing service to the sender (peers).
When a failure occurs, the sender (peer) may have to retry the
request, so this structure is limited to requests that can be
safely retried.
If the load balancing functionality is not used, Hot VM Failover
can be made transparent to the sender (peers), without relying on
request retry, by using the techniques described in Section 4.
This does not depend on the primary VM or its associated NVE
doing anything after the failure. This restriction is necessary
because a failure that affects the primary VM may also cause its
associated NVE to fail. For example, a physical server failure
can cause both the VM and its NVE to fail.
The Hot VM Failover option is the costliest mechanism, and hence
this option is utilized only for mission-critical applications
and services.
9. VM Lifecycle Management
VM lifecycle management is a complicated task, which is beyond
the scope of this document. Not only does it involve monitoring
server utilization, balancing the distribution of workload, etc.,
but it also needs to support seamless migration of VMs from one
server to another.
10. Security Considerations
Security threats for the data and control planes of overlay
networks are discussed in [RFC8014]. ARP (IPv4) and ND (IPv6) are
not secure, especially if they can be sent gratuitously across
tenant boundaries in a multi-tenant environment.
In overlay data center networks, ARP and ND messages can be used
to mount address spoofing attacks from untrusted VMs and/or other
untrusted sources. Examples of untrusted VMs are VMs instantiated
with third-party applications that are not written by the tenant
of the VMs. Those untrusted VMs can send false ARP (IPv4) and ND
(IPv6) messages, causing significant overloads in NVEs, NVO3
Gateways, and NVAs. An attacker can intercept, modify, or even
stop in-transit ARP/ND messages intended for other VNs and
initiate DDoS attacks on other VMs attached to the same NVE. A
simple black-hole attack can be mounted by sending a false ARP/ND
message indicating that the victim's IP address has moved to the
attacker's VM. The same technique can also be used to mount
man-in-the-middle attacks. Additional effort is required to
ensure that the intercepted traffic can eventually be delivered
to the impacted VMs.
The locator-identifier mechanism given as an example (ILA) does
not include secure binding; it does not discuss how to securely
bind the new locator to the identifier.
Because of these threats, the VM management system needs to apply
stronger security mechanisms when adding a VM to an NVE. Some
tenants may have requirements that prohibit their VMs from being
co-attached to NVEs with other tenants' VMs. Some data centers
deploy additional functionality in their NVO3 Gateways to
mitigate the ARP/ND threats. This may include periodically
sending each Gateway's ARP/ND cache contents to the NVA or
another central control system. The objective is to identify the
ARP/ND cache entries that are not consistent with the locations
of VMs and their IP addresses as indicated by the VM Management
System.
11. IANA Considerations

This document makes no request to IANA.
12. Acknowledgments

The authors are grateful to Bob Briscoe, David Black, Dave R.
Worley, Qiang Zu, and Andrew Malis for helpful comments.
13. References

13.1. Normative References

[RFC826]  Plummer, D., "An Ethernet Address Resolution Protocol:
          Or Converting Network Protocol Addresses to 48.bit
          Ethernet Address for Transmission on Ethernet
          Hardware", STD 37, RFC 826, November 1982.

[RFC903]  Finlayson, R., Mann, T., Mogul, J., and M. Theimer, "A
          Reverse Address Resolution Protocol", STD 38, RFC 903,
          June 1984.

[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman,
          "Neighbor Discovery for IP version 6 (IPv6)", RFC 4861,
          September 2007.

[RFC7067] Dunbar, L., Eastlake, D., Perlman, R., and I.
          Gashinsky, "Directory Assistance Problem and High-Level
          Design Proposal", RFC 7067, November 2013.

[RFC7348] Mahalingam, M., Dutt, D., Duda, K., Agarwal, P.,
          Kreeger, L., Sridhar, T., Bursell, M., and C. Wright,
          "Virtual eXtensible Local Area Network (VXLAN): A
          Framework for Overlaying Virtualized Layer 2 Networks
          over Layer 3 Networks", RFC 7348, August 2014.

[RFC7364] Narten, T., Ed., Gray, E., Ed., Black, D., Fang, L.,
          Kreeger, L., and M. Napierala, "Problem Statement:
          Overlays for Network Virtualization", RFC 7364, October
          2014.

[RFC7365] Lasserre, M., et al., "Framework for Data Center (DC)
          Network Virtualization", RFC 7365, October 2014.

[RFC7666] Asai, H., et al., "Management Information Base for
          Virtual Machines Controlled by a Hypervisor", RFC 7666,
          October 2015.

[RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and
          T. Narten, "An Architecture for Data-Center Network
          Virtualization over Layer 3 (NVO3)", RFC 8014, December
          2016.

[RFC8171] Eastlake, D., Dunbar, L., Perlman, R., and Y. Li, "Edge
          Directory Assistance Mechanisms", RFC 8171, June 2017.

13.2. Informative References

[Herbert-ILA] Herbert, T. and P. Lapukhov, "Identifier-locator
          addressing for IPv6", draft-herbert-intarea-ila-01
          (work in progress), September 2018.
Authors' Addresses

Linda Dunbar
Futurewei
Email: ldunbar@futurewei.com

Behcet Sarikaya
Denpel Informatique
Email: sarikaya@ieee.org