draft-ietf-nvo3-vmm-06.txt | draft-ietf-nvo3-vmm-07.txt | |||
---|---|---|---|---|
Network Working Group L. Dunbar | Network Working Group L. Dunbar | |||
Internet Draft Futurewei | Internet Draft Futurewei | |||
Intended status: Informational B. Sarikaya | Intended status: Informational B. Sarikaya | |||
Expires: May 18, 2020 Denpel Informatique | Expires: August 21, 2020 Denpel Informatique | |||
B.Khasnabish | B.Khasnabish | |||
Independent | Independent | |||
T. Herbert | T. Herbert | |||
Intel | Intel | |||
S. Dikshit | S. Dikshit | |||
Aruba-HPE | Aruba-HPE | |||
November 18, 2019 | February 21, 2020 | |||
Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks | Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks | |||
draft-ietf-nvo3-vmm-06 | draft-ietf-nvo3-vmm-07 | |||
Abstract | Abstract | |||
This document discusses Virtual Machine (VM) mobility solutions that | This document describes virtual machine mobility solutions commonly | |||
are commonly used in overlay-based Data Center (DC) networks. The | used in data centers built with overlay-based networks. This document | |||
objective is to describe the solutions and their impact on moving | is intended to describe the solutions and the impact of moving | |||
VMs (and applications) from one rack to another connected by the | VMs (or applications) from one Rack to another connected by the | |||
Overlay networks. | Overlay networks. | |||
For layer 2 networks, it is based on using an NVA (Network | For layer 2, it is based on using an NVA (Network Virtualization | |||
Virtualization Authority) - NVE (Network Virtualization Edge) | Authority) - NVE (Network Virtualization Edge) protocol to update | |||
protocol to update the ARP (Address Resolution Protocol) table or | ARP (Address Resolution Protocol) table or neighbor cache entries | |||
neighbor cache entries after a VM (virtual machine) moves from an | after a VM (virtual machine) moves from an Old NVE to a New NVE. | |||
Old NVE to a New NVE. For Layer 3, it is based on migration of | For Layer 3, it is based on address and connection migration after | |||
address and connection after the move. | the move. | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79.This Internet-Draft is submitted in | provisions of BCP 78 and BCP 79. | |||
full conformance with the provisions of BCP 78 and BCP 79. This | ||||
document may not be modified, and derivative works of it may not be | This Internet-Draft is submitted in full conformance with the | |||
created, except to publish it as an RFC and to translate it into | provisions of BCP 78 and BCP 79. This document may not be modified, | |||
languages other than English. | and derivative works of it may not be created, except to publish it | |||
as an RFC and to translate it into languages other than English. | ||||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
months and may be updated, replaced, or obsoleted by other documents | months and may be updated, replaced, or obsoleted by other documents | |||
at any time. It is inappropriate to use Internet-Drafts as | at any time. It is inappropriate to use Internet-Drafts as | |||
reference material or to cite them other than as "work in progress." | reference material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt | http://www.ietf.org/ietf/1id-abstracts.txt | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html | http://www.ietf.org/shadow.html | |||
This Internet-Draft will expire on May 10, 2020. | This Internet-Draft will expire on August 21, 2020. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2019 IETF Trust and the persons identified as the | Copyright (c) 2020 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with | carefully, as they describe your rights and restrictions with | |||
respect to this document. Code Components extracted from this | respect to this document. Code Components extracted from this | |||
document must include Simplified BSD License text as described in | document must include Simplified BSD License text as described in | |||
Section 4.e of the Trust Legal Provisions and are provided without | Section 4.e of the Trust Legal Provisions and are provided without | |||
warranty as described in the Simplified BSD License. | warranty as described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction...................................................3 | 1. Introduction...................................................3 | |||
2. Conventions used in this document..............................4 | 2. Conventions used in this document..............................4 | |||
3. Requirements...................................................5 | 3. Requirements...................................................5 | |||
4. Overview of the VM Mobility Solutions..........................5 | 4. Overview of the VM Mobility Solutions..........................6 | |||
4.1. VM Migration in Layer-2 Network...........................5 | 4.1. VM Migration in Layer 2 Network...........................6 | |||
4.2. Task Migration in Layer-3 Network.........................7 | 4.2. Task Migration in Layer-3 Network.........................7 | |||
4.2.1. Address and Connection Migration in Task Migration...8 | 4.2.1. Address and Connection Migration in Task Migration...8 | |||
5. Handling Packets in Flight.....................................9 | 5. Handling Packets in Flight.....................................9 | |||
6. Moving Local State of VM......................................10 | 6. Moving Local State of VM......................................10 | |||
7. Handling of Hot, Warm and Cold VM Mobility....................10 | 7. Handling of Hot, Warm and Cold VM Mobility....................10 | |||
8. VM Operation..................................................11 | 8. Other VM Mobility Options.....................................11 | |||
9. Security Considerations.......................................11 | 9. VM Lifecycle Management.......................................11 | |||
10. IANA Considerations..........................................12 | 10. Security Considerations......................................11 | |||
11. Acknowledgments..............................................12 | 11. IANA Considerations..........................................12 | |||
12. Change Log...................................................12 | 12. Acknowledgments..............................................12 | |||
13. References...................................................12 | 13. Change Log...................................................12 | |||
13.1. Normative References....................................13 | 14. References...................................................12 | |||
13.2. Informative References..................................14 | 14.1. Normative References....................................13 | |||
14.2. Informative References..................................14 | ||||
1. Introduction | 1. Introduction | |||
This document describes the overlay-based DC networking solutions | This document describes the overlay-based data center network | |||
in support of multi-tenancy and VM mobility. Many large DCs, | solutions in support of multitenancy and VM (Virtual Machine) | |||
especially Cloud DCs, host tasks (or workloads) for multiple | mobility. Many large DCs (Data Centers), especially Cloud DCs, | |||
tenants. A tenant can be a department of one organization or an | host tasks (or workloads) for multiple tenants. A tenant can be a | |||
organization. There is communication among tasks belonging to one | department of one organization or an organization. There is | |||
tenant and communication among tasks belonging to different | communication among tasks belonging to one tenant and | |||
tenants or with external entities. | communication among tasks belonging to different tenants or with | |||
external entities. | ||||
Server Virtualization, which is being used in almost all of | Server Virtualization, which is being used in almost all of | |||
today's DCs, enables many VMs to run on a single physical computer | today's data centers, enables many VMs to run on a single physical | |||
or server sharing the processor/memory/storage. Network | computer or server sharing the processor/memory/storage. Network | |||
connectivity among VMs is provided by the network virtualization | connectivity among VMs is provided by the network virtualization | |||
edge (NVE) [RFC8014]. It is highly desirable [RFC7364] to allow | edge (NVE) [RFC8014]. It is highly desirable [RFC7364] to allow | |||
VMs to move dynamically (live, hot, or cold move) from one | VMs to be moved dynamically (live, hot, or cold move) from one | |||
server to another for dynamic load balancing or optimized workload | server to another for dynamic load balancing or optimized work | |||
distribution. | distribution. | |||
There are many challenges and requirements related to VM mobility | There are many challenges and requirements related to VM mobility | |||
in large data centers, including dynamically attaching/detaching | in large data centers, including dynamically attaching/detaching VMs | |||
VMs to/from Virtual Network Edges (VNEs). In addition, retaining | to/from Virtual Network Edges (VNEs). In addition, retaining IP | |||
the IP addresses after a move is a key requirement [RFC7364]. | addresses after a move is a key requirement [RFC7364]. Such a | |||
Such a requirement is needed in order to maintain existing | requirement is needed in order to maintain existing transport | |||
transport connections. | connections. | |||
In traditional Layer-3 based networks, retaining IP addresses | In traditional Layer-3 based networks, retaining IP addresses | |||
after a move is generally not recommended because the frequent | after a move is generally not recommended because the frequent | |||
move will cause fragmented IP addresses, which complicates IP | move will cause fragmented IP addresses, which introduces | |||
address management. | complexity in IP address management. | |||
In view of many VM mobility schemes that exist today, there is a | In view of many VM mobility schemes that exist today, there is a | |||
need to document comprehensive VM mobility solutions that cover | desire to document comprehensive VM mobility solutions that cover | |||
both IPv4 and IPv6. Large DC networks can be organized as one | both IPv4 and IPv6. The large Data Center networks can be | |||
large (a) Layer-2 network geographically distributed across | organized as one large Layer-2 network geographically distributed | |||
buildings/cities or (b) Layer-3 networks with a large number of host | in several buildings/cities or Layer-3 networks with a large number | |||
routes that cannot be aggregated as a result of frequent moves | of host routes that cannot be aggregated as the result of frequent | |||
from one location to another without changing the IP addresses. | moves from one location to another without changing their IP | |||
addresses. The connectivity between Layer 2 boundaries can be | ||||
The connectivity between Layer 2 boundaries can be achieved by the | achieved by the network virtualization edge (NVE) functioning as | |||
NVE functioning as Layer-3 gateway, performing routing across | Layer 3 gateway routing across bridging domain such as in | |||
bridging domain such as in Warehouse Scale Computers (WSC). | Warehouse Scale Computers (WSC). | |||
2. Conventions used in this document | 2. Conventions used in this document | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL | |||
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and | |||
"OPTIONAL" in this document are to be interpreted as described in | "OPTIONAL" in this document are to be interpreted as described in | |||
RFC 2119 [RFC2119] and [RFC8014]. | RFC 2119 [RFC2119] and [RFC8014]. | |||
This document uses the terminology defined in [RFC7364]. In | This document uses the terminology defined in [RFC7364]. In | |||
addition, we make the following definitions: | addition, we make the following definitions: | |||
skipping to change at page 4, line 40 ¶ | skipping to change at page 5, line 8 ¶ | |||
Warm VM Mobility: In case of warm VM mobility, the VM states are | Warm VM Mobility: In case of warm VM mobility, the VM states are | |||
mirrored to the secondary server (or domain) at | mirrored to the secondary server (or domain) at | |||
predefined (configurable) regular intervals. This | predefined (configurable) regular intervals. This | |||
reduces the overheads and complexity, but this may also | reduces the overheads and complexity, but this may also | |||
lead to a situation when both servers may not contain | lead to a situation when both servers may not contain | |||
the exact same data (state information). | the exact same data (state information). | |||
Cold VM Mobility: A given VM could be moved from one server to | Cold VM Mobility: A given VM could be moved from one server to | |||
another in stopped or suspended state. | another in stopped or suspended state. | |||
Old NVE: This refers to the old NVE where packets were forwarded | Old NVE: refers to the old NVE where packets were forwarded to | |||
to before migration. | before migration. | |||
New NVE: This refers to the new NVE after migration. | New NVE: refers to the new NVE after migration. | |||
Packets in flight: This refers to the packets received by the Old | Packets in flight: refers to the packets received by the Old NVE | |||
NVE sent by the correspondents that have old ARP or | sent by the correspondents that have old ARP or neighbor | |||
neighbor cache entry before VM or task migration. | cache entry before VM or task migration. | |||
Users of VMs in diskless systems or the systems that are not | Users of VMs in diskless systems or systems not using | |||
using configuration files are called end user clients. | configuration files are called end user clients. | |||
Cloud DC: Third party DCs that host applications, tasks or | Cloud DC: Third party data centers that host applications, | |||
workloads and owned by different organizations or | tasks or workloads owned by different organizations or | |||
tenants. | tenants. | |||
3. Requirements | 3. Requirements | |||
This section states VM mobility requirements on DC networks. | This section states requirements on data center network virtual | |||
machine mobility. | ||||
DC networks should support both IPv4 and IPv6 VM mobility. | Data center networks should support both IPv4 and IPv6 VM mobility. | |||
VM mobility should not require changing their IP addresses after the | Virtual machine (VM) mobility should not require changing VMs' IP | |||
move. | addresses after the move. | |||
There exist "Hot Migration" where transport service continuity is | There is "Hot Migration" with transport service continuing, and | |||
maintained, and "Cold Migration" where the transport service needs | "Cold Migration" with transport service restarted, i.e. the task | |||
to be restarted, i.e., execution of the tasks is stopped on the | running is stopped on the Old NVE, moved to the New NVE and the task | |||
"Old" NVE, moved to the "New" NVE and the task is restarted. | is restarted. Not all DCs support "Hot Migration". DCs that only | |||
support Cold Migration should make their customers aware of the | ||||
potential service interruption during the Cold Migration. | ||||
VM mobility solutions/procedures should minimize triangular routing | VM mobility solutions/procedures should minimize triangular routing | |||
except for handling packets in flight. | except for handling packets in flight. | |||
VM mobility solutions/procedures should not need to use tunneling | VM mobility solutions/procedures should not need to use tunneling | |||
except for handling packets in flight. | except for handling packets in flight. | |||
4. Overview of the VM Mobility Solutions | 4. Overview of the VM Mobility Solutions | |||
Layer-2 and Layer-3 mobility solutions are described respectively | Layer 2 and Layer 3 mobility solutions are described respectively | |||
in the following sections. | in the following sections. | |||
4.1. VM Migration in Layer-2 Network | 4.1. VM Migration in Layer 2 Network | |||
Ability to move VMs dynamically, from one server to another, makes | ||||
it possible for dynamic load balancing or workload distribution. | ||||
Therefore, this scheme is highly desirable for utilization in | Being able to move VMs dynamically, from one server to another, | |||
large scale multi-tenant DCs. | makes it possible for dynamic load balancing or work distribution. | |||
Therefore, dynamic VM Mobility is highly desirable for large scale | ||||
multi-tenant DCs. | ||||
In a Layer-2 based VM migration approach, a VM that is moving to | In a Layer-2 based approach, a VM moving to another server does not | |||
another server does not change its IP address. But since this VM | change its IP address. But since this VM is now under a new NVE, | |||
is now under a new NVE, previously communicating NVEs will | previously communicating NVEs will continue sending their packets | |||
continue sending their packets to the Old NVE. To solve this | to the Old NVE. To solve this problem, Address Resolution | |||
problem, Address Resolution Protocol (ARP) cache in IPv4 [RFC0826] | Protocol (ARP) cache in IPv4 [RFC0826] or neighbor cache in IPv6 | |||
or neighbor cache in IPv6 [RFC4861] in the NVEs needs to be updated | [RFC4861] in the NVEs needs to be updated promptly. All NVEs need | |||
promptly. All NVEs need to change their caches associating the VM | to change their caches associating the VM Layer-2 or Medium Access | |||
Layer-2 or Medium Access Control (MAC) address with the new NVE's | Control (MAC) address with the new NVE's IP address as soon as the | |||
IP address as soon as the VM moves. Such a change enables all NVEs | VM is moved. Such a change enables all NVEs to encapsulate the | |||
to encapsulate the outgoing MAC frames with the current target NVE | outgoing MAC frames with the current target NVE IP address. It may | |||
IP address. It may take some time to refresh the ARP/ND cache when | take some time to refresh ARP/ND cache when a VM is moved to a New | |||
a VM has moved to a New NVE. During this period, a tunnel is | NVE. During this period, a tunnel is needed for that Old NVE to | |||
needed for that Old NVE to forward packets destined to the VM | forward packets destined to the VM to the New NVE. | |||
under the New NVE. | ||||
In case of IPv4, immediately after the move, the VM should send a | In IPv4, the VM immediately after the move should send a | |||
gratuitous ARP request message containing its IPv4 and Layer-2 MAC | gratuitous ARP request message containing its IPv4 and Layer 2 MAC | |||
address to its new NVE. This message's destination address is the | address in its new NVE. This message's destination address is the | |||
broadcast address. Upon receiving this message, both old and new | broadcast address. Upon receiving this message, both Old and New | |||
NVEs should update the VM's ARP entry in the central directory at | NVEs should update the VM's ARP entry in the central directory at | |||
the NVA, to update its mappings to record the IPv4 address and MAC | the NVA, to update its mappings to record the IPv4 address & MAC | |||
address of the moving VM along with the new NVE IPv4 address. An | address of the moving VM along with the new NVE IPv4 address. An | |||
NVE-to-NVA protocol is used for this purpose [RFC8014]. | NVE-to-NVA protocol is used for this purpose [RFC8014]. | |||
Reverse ARP (RARP), which enables the host to discover its IPv4 | Reverse ARP (RARP), which enables the host to discover its IPv4 | |||
address when it boots from a local server [RFC0903], is not used | address when it boots from a local server [RFC0903], is not used | |||
by VMs because the VM already knows its IPv4 address. Next, we | by VMs if the VM already knows its IPv4 address (most common | |||
describe a case where RARP is used. | scenario). Next, we describe a case where RARP is used. | |||
There are some vendor deployments (e.g., diskless systems or | There are some vendor deployments (diskless systems or systems | |||
systems without configuration files) where the VM's user, i.e., | without configuration files) wherein the VM's user, i.e. end-user | |||
end-user client asks for the same MAC address upon migration. | client asks for the same MAC address upon migration. This can be | |||
This can be achieved by the clients sending RARP request message | achieved by the clients sending RARP request message which carries | |||
which carries the MAC address looking for an IP address | the MAC address looking for an IP address allocation. The server, | |||
allocation. The server, in this case the new NVE, needs to | in this case the new NVE, needs to communicate with NVA, just like | |||
communicate with NVA, just like in the gratuitous ARP case to | in the gratuitous ARP case to ensure that the same IPv4 address is | |||
ensure that the same IPv4 address is assigned to the VM. NVA uses | assigned to the VM. NVA uses the MAC address as the key in the | |||
the MAC address as the key in the search of ARP cache to find the | search of ARP cache to find the IP address and informs this to the | |||
IP address and informs this to the new NVE which in turn sends | new NVE which in turn sends RARP reply message. This completes IP | |||
RARP reply message. This completes IP address assignment to the | address assignment to the migrating VM. | |||
migrating VM. | ||||
Other NVEs communicating with this VM could have the old ARP | Other NVEs communicating with this VM could have the old ARP | |||
entry. If any VMs in those NVEs need to communicate with the VM | entry. If any VMs in those NVEs need to communicate with the VM | |||
attached to the new NVE, old ARP entries might be used. Thus, the | attached to the New NVE, old ARP entries might be used. Thus, the | |||
packets are delivered to the old NVE. The old NVE MUST tunnel | packets are delivered to the Old NVE. The Old NVE MUST tunnel | |||
these in-flight packets to the new NVE. | these in-flight packets to the New NVE. | |||
When an ARP entry for those VMs times out, their corresponding | When an ARP entry for those VMs times out, their corresponding | |||
NVEs should access the NVA for an update. | NVEs should access the NVA for an update. | |||
IPv6 operation is slightly different: | IPv6 operation is slightly different: | |||
In IPv6, after the move, the VM immediately sends an unsolicited | In IPv6, after the move, the VM immediately sends an unsolicited | |||
neighbor advertisement message containing its IPv6 address and | neighbor advertisement message containing its IPv6 address and | |||
Layer-2 MAC address to its new NVE. This message is sent to the | Layer-2 MAC address to its new NVE. This message is sent to the | |||
IPv6 Solicited Node Multicast Address corresponding to the target | IPv6 Solicited Node Multicast Address corresponding to the target | |||
address which is the VM's IPv6 address. The NVE receiving this | address which is the VM's IPv6 address. The NVE receiving this | |||
message should send a request to update the VM's neighbor cache entry in | message should send a request to update the VM's neighbor cache entry in | |||
the central directory of the NVA. The NVA's neighbor cache entry | the central directory of the NVA. The NVA's neighbor cache entry | |||
should include IPv6 address of the VM, MAC address of the VM and | should include IPv6 address of the VM, MAC address of the VM and | |||
the NVE IPv6 address. An NVE-to-NVA protocol is used for this | the NVE IPv6 address. An NVE-to-NVA protocol is used for this | |||
purpose [RFC8014]. | purpose [RFC8014]. | |||
Other NVEs communicating with this VM might still use the old | Other NVEs communicating with this VM might still use the old | |||
neighbor cache entry. If any VM in those NVEs needs to communicate | neighbor cache entry. If any VM in those NVEs needs to communicate | |||
with the VM attached to the new NVE, it could use the old neighbor | with the VM attached to the New NVE, it could use the old neighbor | |||
cache entry. Thus, the packets are delivered to the old NVE. The | cache entry. Thus, the packets are delivered to the Old NVE. The | |||
old NVE MUST tunnel these in-flight packets to the new NVE. | Old NVE MUST tunnel these in-flight packets to the New NVE. | |||
When a neighbor cache entry in those VMs times out, their | When a neighbor cache entry in those VMs times out, their | |||
corresponding NVEs should access the NVA for an update. | corresponding NVEs should access the NVA for an update. | |||
4.2. Task Migration in Layer-3 Network | 4.2. Task Migration in Layer-3 Network | |||
Layer-2 based DC networks become quickly prohibitive because | ARP/neighbor cache scalability considerations can limit the size | |||
ARP/neighbor caches don't scale. Scaling can be accomplished | of Layer-2 based DC networks. Scaling can be accomplished | |||
seamlessly in Layer-3 data center networks by just giving each | seamlessly in Layer-3 data center networks by just giving each | |||
virtual network an IP subnet and a default route that points to | virtual network an IP subnet and a default route that points to | |||
its NVE. This means no explosion of ARP/ neighbor cache in VMs | its NVE. This means no explosion of ARP/ neighbor cache in VMs | |||
and NVEs (just one ARP/ neighbor cache entry for the default | and NVEs (just one ARP/ neighbor cache entry for the default | |||
route) and there is no need to have Ethernet header in | route) and there is no need to have Ethernet header in | |||
encapsulation [RFC7348] which saves at least 16 bytes. | encapsulation [RFC7348] which saves at least 16 bytes. | |||
Even though the terms VM and Task are used interchangeably in this | Even though the terms VM and Task are used interchangeably in this | |||
document, the term Task is used in the context of Layer-3 | document, the term Task is used in the context of Layer-3 | |||
migration mainly to have slight emphasis on the task of moving an | migration mainly to have slight emphasis on moving an entity | |||
entity that is instantiated on a VM or a container. | (Task) that is instantiated on a VM or a container. | |||
Traditional Layer-3 based DC networks require IP address of the | Traditional Layer-3 based data center networks require IP address | |||
task to change after moving because the pre-fixes of the IP | of the task to change after moving because the prefixes of the IP | |||
address usually reflect the locations. It is necessary to have an | address usually reflect the locations. It is necessary to have an | |||
IP based VM migration solution that can allow IP addresses staying | IP based VM migration solution that can allow IP addresses staying | |||
the same after the VMs move to different locations. The Identifier | the same after moving to different locations. The Identifier | |||
Locator Addressing or ILA [I-D.herbert-nvo3-ila] is one such | Locator Addressing or ILA [I-D.herbert-nvo3-ila] is one such | |||
solution. | solution. | |||
Because broadcasting is not available in Layer-3 based networks, | Because broadcasting is not available in Layer-3 based networks, | |||
multicast of neighbor solicitations in IPv6 would need to be | multicast of neighbor solicitations in IPv6 would need to be | |||
emulated. | emulated. | |||
Cold task migration, which is a common practice in many data | Cold task migration, which is a common practice in many data | |||
centers, involves the following steps: | centers, involves the following steps: | |||
- Stop running the task. | - Stop running the task. | |||
- Package the runtime state of the job. | - Package the runtime state of the job. | |||
- Send the runtime state of the task to the new NVE where the | - Send the runtime state of the task to the New NVE where the | |||
task is to run. | task is to run. | |||
- Instantiate the task's state on the new machine. | - Instantiate the task's state on the new machine. | |||
- Start the task, continuing it from the point at which it was | - Start the task, continuing from the point at which | |||
stopped. | it was stopped. | |||
Address migration and connection migration in moving tasks or VMs | ||||
are addressed next. | ||||
4.2.1. Address and Connection Migration in Task Migration | 4.2.1. Address and Connection Migration in Task Migration | |||
Address migration is achieved as follows: | Address migration is achieved as follows: | |||
- Configure IPv4/v6 address on the target Task. | - Configure IPv4/v6 address on the target Task. | |||
- Suspend use of the address on the old Task. This includes | - Suspend use of the address on the old Task. This includes | |||
handling established connections. A state may be established | handling established connections. A state may be established | |||
to drop packets or send ICMPv4 or ICMPv6 destination | to drop packets or send ICMPv4 or ICMPv6 destination | |||
unreachable message when packets to the migrated address are | unreachable message when packets to the migrated address are | |||
skipping to change at page 9, line 37 ¶ | skipping to change at page 9, line 40 ¶ | |||
5. Handling Packets in Flight | 5. Handling Packets in Flight | |||
The Old NVE may receive packets from the VM's ongoing | The Old NVE may receive packets from the VM's ongoing | |||
communications. These packets should not be lost; they should be | communications. These packets should not be lost; they should be | |||
sent to the New NVE to be delivered to the VM. The steps involved | sent to the New NVE to be delivered to the VM. The steps involved | |||
in handling packets in flight are as follows: | in handling packets in flight are as follows: | |||
Preparation Step: It takes some time, possibly a few seconds for | Preparation Step: It takes some time, possibly a few seconds for | |||
a VM to move from its Old NVE to a New NVE. During this period, a | a VM to move from its Old NVE to a New NVE. During this period, a | |||
tunnel needs to be established so that the Old NVE can forward | tunnel needs to be established so that the Old NVE can forward | |||
packets to the New NVE. Old NVE gets New NVE address from NVA in | packets to the New NVE. Old NVE gets New NVE address from its NVA | |||
the request to move the VM. The Old NVE can store the New NVE | assuming that the NVA gets the notification when a VM is moved | |||
address for the VM with a timer. When the timer expires, the entry | from one NVE to another. It is out of the scope of this document | |||
for the New NVE for the VM can be deleted. | on which entity manages the VM move and how NVA gets notified of | |||
the move. The Old NVE can store the New NVE address for the VM | ||||
with a timer. When the timer expires, the entry for the New NVE | ||||
for the VM can be deleted. | ||||
Tunnel Establishment - IPv6: Inflight packets are tunneled to the | Tunnel Establishment - IPv6: Inflight packets are tunneled to the | |||
New NVE using the encapsulation protocol such as VXLAN in IPv6. | New NVE using the encapsulation protocol such as VXLAN in IPv6. | |||
Tunnel Establishment - IPv4: Inflight packets are tunneled to the | Tunnel Establishment - IPv4: Inflight packets are tunneled to the | |||
New NVE using the encapsulation protocol such as VXLAN in IPv4. | New NVE using the encapsulation protocol such as VXLAN in IPv4. | |||
Tunneling Packets - IPv6: IPv6 packets received for the migrating | Tunneling Packets - IPv6: IPv6 packets received for the migrating | |||
VM are encapsulated in an IPv6 header at the Old NVE. New NVE | VM are encapsulated in an IPv6 header at the Old NVE. New NVE | |||
decapsulates the packet and sends IPv6 packet to the migrating VM. | decapsulates the packet and sends IPv6 packet to the migrating VM. | |||
skipping to change at page 10, line 19 ¶ | skipping to change at page 10, line 26 ¶ | |||
Stop Tunneling Packets: Tunneling stops when the Timer for storing the New NVE | Stop Tunneling Packets: Tunneling stops when the Timer for storing the New NVE | |||
address for the VM expires. The Timer should be long enough for | address for the VM expires. The Timer should be long enough for | |||
all other NVEs that need to communicate with the VM to get their | all other NVEs that need to communicate with the VM to get their | |||
NVE-VM cache entries updated. | NVE-VM cache entries updated. | |||
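The timer-based handling described in this section can be illustrated with a minimal sketch. It is not taken from either draft version. The class name InFlightForwarder, its methods, and the injectable clock are hypothetical, and the encapsulation step itself (for example VXLAN) is left out. The sketch only shows an Old NVE storing the New NVE address for a VM with a timer and deleting the entry once the timer expires.

```python
import time


class InFlightForwarder:
    """Hypothetical Old NVE helper: remembers where a moved VM went, for a limited time."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the expiry logic can be exercised without waiting.
        self._clock = clock
        self._entries = {}  # vm_address -> (new_nve_address, expiry_time)

    def vm_moved(self, vm_address, new_nve_address, lifetime_seconds):
        """Preparation Step: store the New NVE address for the VM with a timer."""
        expiry = self._clock() + lifetime_seconds
        self._entries[vm_address] = (new_nve_address, expiry)

    def next_hop(self, vm_address):
        """Return the New NVE to tunnel to, or None if no live entry exists."""
        entry = self._entries.get(vm_address)
        if entry is None:
            return None
        new_nve_address, expiry = entry
        if self._clock() >= expiry:
            # Stop Tunneling Packets: the timer expired, so the entry is deleted.
            del self._entries[vm_address]
            return None
        return new_nve_address
```

In such a sketch the lifetime passed to vm_moved would be chosen long enough for the other NVEs that communicate with the VM to get their NVE-VM cache entries updated; the text gives no concrete value, so none is assumed here.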
6. Moving Local State of VM | 6. Moving Local State of VM | |||
In addition to the VM mobility related signaling (VM Mobility | In addition to the VM mobility related signaling (VM Mobility | |||
Registration Request/Reply), the VM state needs to be transferred | Registration Request/Reply), the VM state needs to be transferred | |||
to the New NVE. The state includes its memory and file system if | to the New NVE. The state includes its memory and file system if | |||
the VM cannot access the memory and the file system after moving | the VM cannot access the memory and the file system after moving | |||
to the New NVE. Old NVE opens a TCP connection with New NVE over | to the New NVE. | |||
which VM's memory state is transferred. | ||||
File system or local storage is more complicated to transfer. The | The mechanism of transferring VM States and file system is out of | |||
transfer should ensure consistency, i.e. the VM at the New NVE | the scope of this document. | |||
should find the same file system it had at the Old NVE. Pre- | ||||
copying is a commonly used technique for transferring the file | ||||
system. First the whole disk image is transferred while VM | ||||
continues to run. After the VM is moved, any changes in the file | ||||
system are packaged together and sent to the New NVE Hypervisor | ||||
which reflects these changes to the file system locally at the | ||||
destination. | ||||
7. Handling of Hot, Warm and Cold VM Mobility | 7. Handling of Hot, Warm and Cold VM Mobility | |||
Both Cold and Warm VM mobility (migration), refers to the VM being | Both Cold and Warm VM mobility (or migration) refers to the VM | |||
completely shut down at the old NVE before restarted at the new | being completely shut down at the Old NVE before restarted at the | |||
NVE. Therefore, all transport services to the VM need to restart. | New NVE. Therefore, all transport services to the VM are | |||
restarted. | ||||
Upon starting at the new NVE, the VM should send an ARP or | Upon starting at the New NVE, the VM should send an ARP or | |||
Neighbor Discovery message. Cold VM mobility also allows the Old | Neighbor Discovery message. Cold VM mobility also allows the Old | |||
NVE and all communicating NVEs to time out ARP/neighbor cache | NVE and all communicating NVEs to time out ARP/neighbor cache | |||
entries of the VM. It is necessary for the NVA to push the | entries of the VM. It is necessary for the NVA to push the | |||
updated ARP/neighbor cache entry to NVEs or for NVEs to pull the | updated ARP/neighbor cache entry to NVEs or for NVEs to pull the | |||
updated ARP/neighbor cache entry from NVA. | updated ARP/neighbor cache entry from NVA. | |||
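The push and pull alternatives mentioned above can be contrasted with a small sketch. It is an illustration under stated assumptions, not the NVA-NVE protocol itself: the classes Nva and Nve, their methods, and the tuple layout of a cache entry are all hypothetical, and a cache timeout is modelled by an explicit expire call.

```python
class Nve:
    """Hypothetical NVE holding ARP/neighbor cache entries for remote VMs."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # vm_ip -> (vm_mac, name of the NVE the VM is attached to)

    def update_entry(self, vm_ip, vm_mac, nve_name):
        # Push model: the NVA calls this to overwrite the entry.
        self.cache[vm_ip] = (vm_mac, nve_name)

    def expire(self, vm_ip):
        # Models the cache entry timing out.
        self.cache.pop(vm_ip, None)

    def lookup(self, vm_ip, nva):
        # Pull model: on a cache miss, ask the NVA for the current mapping.
        if vm_ip not in self.cache:
            entry = nva.query(vm_ip)
            if entry is not None:
                self.cache[vm_ip] = entry
        return self.cache.get(vm_ip)


class Nva:
    """Hypothetical NVA holding the authoritative VM-to-NVE mappings."""

    def __init__(self):
        self.mappings = {}
        self.subscribers = []  # NVEs that receive pushed updates

    def register(self, nve):
        self.subscribers.append(nve)

    def vm_started(self, vm_ip, vm_mac, nve_name):
        # Record the new attachment point and push it to registered NVEs.
        self.mappings[vm_ip] = (vm_mac, nve_name)
        for nve in self.subscribers:
            nve.update_entry(vm_ip, vm_mac, nve_name)

    def query(self, vm_ip):
        return self.mappings.get(vm_ip)
```

In this sketch an NVE that relies on pull keeps a stale entry until it times out, which is consistent with the observation that Cold VM mobility allows NVEs to time out their cache entries before the VM restarts.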
Cold VM mobility can be facilitated by a cold standby entity | Cold VM mobility can be facilitated by a cold standby entity | |||
receiving scheduled backup information. The cold standby entity | receiving scheduled backup information. The cold standby entity | |||
can be a VM or other form factors which is beyond the scope of | can be a VM or can be other form factors which is beyond the scope | |||
this document. The cold mobility option can be used for non- | of this document. The cold mobility option can be used for non- | |||
critical applications and services that can tolerate interrupted | critical applications and services that can tolerate interrupted | |||
TCP connections. | TCP connections. | |||
Warm VM mobility refers to backup entities receiving backup | Warm VM mobility refers to backup entities receiving backup | |||
information at more frequent intervals. The duration of the | information at more frequent intervals. The duration of the | |||
interval determines the warmth of the option. The larger the | interval determines the warmth of the option. The larger the | |||
duration, the less warm (and hence colder) the Warm VM mobility | duration, the less warm (and hence colder) the Warm VM mobility | |||
option becomes. | option becomes. | |||
For Hot VM Mobility, once a VM moves to a New NVE, the VM IP | ||||
address does not change and the VM should be able to continue to | ||||
receive packets to its address(es). The VM needs to send a | ||||
gratuitous Address Resolution message or unsolicited Neighbor | ||||
Advertisement message upstream after each move. | ||||
8. Other VM Mobility Options | ||||
There is also a Hot Standby option in addition to the Hot | There is also a Hot Standby option in addition to the Hot | |||
Mobility, where there are VMs in both primary and secondary NVEs. | Mobility, where there are VMs in both primary and secondary NVEs. | |||
They have identical information and can provide services | They have identical information and can provide services | |||
simultaneously as in load-share mode of operation. If the VM in | simultaneously as in load-share mode of operation. If the VM in | |||
the primary NVE fails, there is no need to actively move the VM to | the primary NVE fails, there is no need to actively move the VM to | |||
the secondary NVE because the VM in the secondary NVE already | the secondary NVE because the VM in the secondary NVE already | |||
contains identical information. The Hot Standby option is the | contain identical information. The Hot Standby option is the | |||
costliest mechanism, and hence this option is utilized only for | costliest mechanism, and hence this option is utilized only for | |||
mission-critical applications and services. In Hot Standby | mission-critical applications and services. In Hot Standby | |||
option, regarding TCP connections, one option is to start with and | option, regarding TCP connections, one option is to start with and | |||
maintain TCP connections to two different VMs at the same time. | maintain TCP connections to two different VMs at the same time. | |||
The least loaded VM responds first and starts providing service | The least loaded VM responds first and pickup providing service | |||
while the sender (origin) still continues to receive Ack from the | while the sender (origin) still continues to receive Ack from the | |||
heavily loaded (secondary) VM and chooses not to use the service | heavily loaded (secondary) VM and chooses not to use the service | |||
of the secondary responding VM. If the situation (loading | of the secondary responding VM. If the situation (loading | |||
condition of the primary responding VM) changes the secondary VM | condition of the primary responding VM) changes the secondary | |||
may start providing service to the sender (origin). | responding VM may start providing service to the sender (origin). | |||
8. VM Operation | ||||
Once a VM moves to a new NVE, the VM's IP address does not change | ||||
and the VM should be able to continue to receive packets to its | ||||
address(es). | ||||
The VM needs to send a gratuitous Address Resolution message or | ||||
unsolicited Neighbor Advertisement message upstream after each | ||||
move. | ||||
9. VM Lifecycle Management | ||||
The VM lifecycle management is a complicated task, which is beyond | The VM lifecycle management is a complicated task, which is beyond | |||
the scope of this document. Not only does it involve monitoring server | the scope of this document. Not only does it involve monitoring server | |||
utilization, balancing the distribution of workload, etc., but | utilization, balanced distribution of workload, etc., but also | |||
also needs seamless management VM migration from one server to | needs to manage seamlessly VM migration from one server to | |||
another. | another. | |||
9. Security Considerations | 10. Security Considerations | |||
Security threats for the data and control plane for overlay | Security threats for the data and control plane for overlay | |||
networks are discussed in [RFC8014]. There are several issues in | networks are discussed in [RFC8014]. There are several issues in | |||
a multi-tenant environment that create problems. In Layer-2 based | a multi-tenant environment that create problems. In Layer-2 based | |||
overlay DC networks, lack of security in VXLAN, and corruption of | overlay data center networks, lack of security in VXLAN, | |||
VNI can lead to delivery of information to the wrong tenant. | corruption of VNI can lead to delivery to wrong tenant. Also, ARP | |||
in IPv4 and ND in IPv6 are not secure, especially if we accept | ||||
Also, ARP in IPv4 and ND in IPv6 are not secure, especially if we | gratuitous versions. When these are done over a UDP | |||
accept the gratuitous versions. When these are done over a UDP | encapsulation, like VXLAN, the problem is worse since it is | |||
encapsulation, as in VXLAN, the problem gets worse since it is | ||||
trivial for a non-trusted entity to spoof UDP packets. | trivial for a non-trusted entity to spoof UDP packets. | |||
In Layer-3 based overlay data center networks, the problem of | In Layer-3 based overlay data center networks, the problem of | |||
address spoofing may arise. An NVE may have untrusted tasks | address spoofing may arise. An NVE may have untrusted tasks | |||
attached to it. This usually happens in situations when the VMs | attached. This usually happens in cases like the VMs (tasks) | |||
(tasks) running third party applications. This requires the usage | running third party applications. This requires the usage of | |||
of stronger security mechanisms. | stronger security mechanisms. | |||
10. IANA Considerations | 11. IANA Considerations | |||
This document makes no request to IANA. | This document makes no request to IANA. | |||
11. Acknowledgments | 12. Acknowledgments | |||
The authors are grateful to Bob Briscoe, David Black, Dave R. | The authors are grateful to Bob Briscoe, David Black, Dave R. | |||
Worley, Qiang Zu, and Andrew Malis for helpful comments. | Worley, Qiang Zu, Andrew Malis for helpful comments. | |||
12. Change Log | 13. Change Log | |||
. submitted version -00 as a working group draft after adoption | . submitted version -00 as a working group draft after adoption | |||
. submitted version -01 with these changes: references are updated, | . submitted version -01 with these changes: references are updated, | |||
o added packets in flight definition to Section 2 | o added packets in flight definition to Section 2 | |||
. submitted version -02 with updated address. | . submitted version -02 with updated address. | |||
. submitted version -03 to fix the nits. | . submitted version -03 to fix the nits. | |||
. submitted version -04 in reference to the WG Last call comments. | . submitted version -04 in reference to the WG Last call comments. | |||
. submitted version -05 to address IETF LC comments from TSV area. | . submitted version -05 to address IETF LC comments from TSV area. | |||
13. References | 14. References | |||
13.1. Normative References | 14.1. Normative References | |||
[RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol: Or | [RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol: Or | |||
Converting Network Protocol Addresses to 48.bit Ethernet | Converting Network Protocol Addresses to 48.bit Ethernet | |||
Address for Transmission on Ethernet Hardware", STD 37, | Address for Transmission on Ethernet Hardware", STD 37, | |||
RFC 826, DOI 10.17487/RFC0826, November 1982, | RFC 826, DOI 10.17487/RFC0826, November 1982, | |||
<https://www.rfc-editor.org/info/rfc826>. | <https://www.rfc-editor.org/info/rfc826>. | |||
[RFC0903] Finlayson, R., Mann, T., Mogul, J., and M. Theimer, "A | [RFC0903] Finlayson, R., Mann, T., Mogul, J., and M. Theimer, "A | |||
Reverse Address Resolution Protocol", STD 38, RFC 903, | Reverse Address Resolution Protocol", STD 38, RFC 903, | |||
DOI 10.17487/RFC0903, June 1984, <https://www.rfc- | DOI 10.17487/RFC0903, June 1984, <https://www.rfc- | |||
skipping to change at page 14, line 11 ¶ | skipping to change at page 14, line 11 ¶ | |||
Overlays for Network Virtualization", RFC 7364, DOI | Overlays for Network Virtualization", RFC 7364, DOI | |||
10.17487/RFC7364, October 2014, <https://www.rfc- | 10.17487/RFC7364, October 2014, <https://www.rfc- | |||
editor.org/info/rfc7364>. | editor.org/info/rfc7364>. | |||
[RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. | [RFC8014] Black, D., Hudson, J., Kreeger, L., Lasserre, M., and T. | |||
Narten, "An Architecture for Data-Center Network | Narten, "An Architecture for Data-Center Network | |||
Virtualization over Layer 3 (NVO3)", RFC 8014, DOI | Virtualization over Layer 3 (NVO3)", RFC 8014, DOI | |||
10.17487/RFC8014, December 2016, <https://www.rfc- | 10.17487/RFC8014, December 2016, <https://www.rfc- | |||
editor.org/info/rfc8014>. | editor.org/info/rfc8014>. | |||
13.2. Informative References | 14.2. Informative References | |||
[I-D.herbert-nvo3-ila] Herbert, T. and P. Lapukhov, "Identifier- | [I-D.herbert-nvo3-ila] Herbert, T. and P. Lapukhov, "Identifier- | |||
locator addressing for IPv6", draft-herbert-nvo3-ila-04 | locator addressing for IPv6", draft-herbert-nvo3-ila-04 | |||
(work in progress), March 2017. | (work in progress), March 2017. | |||
Authors' Addresses | Authors' Addresses | |||
Linda Dunbar | Linda Dunbar | |||
Futurewei | Futurewei | |||
Email: ldunbar@futurewei.com | Email: ldunbar@futurewei.com | |||
End of changes. 62 change blocks. | ||||
188 lines changed or deleted | 182 lines changed or added | |||