--- 1/draft-ietf-nvo3-vmm-11.txt  2020-03-30 20:13:16.705756674 -0700
+++ 2/draft-ietf-nvo3-vmm-12.txt  2020-03-30 20:13:16.741757588 -0700
@@ -4,21 +4,21 @@
 Expires: September 30, 2020                      Denpel Informatique
                                                        B. Khasnabish
                                                          Independent
                                                           T. Herbert
                                                                Intel
                                                           S. Dikshit
                                                            Aruba-HPE
                                                       March 30, 2020
 
   Virtual Machine Mobility Solutions for L2 and L3 Overlay Networks
-                        draft-ietf-nvo3-vmm-11
+                        draft-ietf-nvo3-vmm-12
 
 Abstract
 
    This document describes virtual machine mobility solutions
    commonly used in data centers built with overlay-based networks.
    It describes these solutions and the impact of moving VMs (or
    applications) from one rack to another when the racks are
    connected by overlay networks. For layer 2, it is based on using
    an NVA (Network Virtualization
@@ -47,21 +47,21 @@
    months and may be updated, replaced, or obsoleted by other
    documents at any time. It is inappropriate to use Internet-Drafts
    as reference material or to cite them other than as "work in
    progress."
 
    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt
 
    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html
 
-   This Internet-Draft will expire on September 27, 2020.
+   This Internet-Draft will expire on September 30, 2020.
 
 Copyright Notice
 
    Copyright (c) 2020 IETF Trust and the persons identified as the
    document authors. All rights reserved.
 
    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
@@ -70,35 +70,35 @@
    document must include Simplified BSD License text as described in
    Section 4.e of the Trust Legal Provisions and are provided without
    warranty as described in the Simplified BSD License.
 
 Table of Contents
 
    1. Introduction...................................................3
    2. Conventions used in this document..............................4
    3. Requirements...................................................5
    4. Overview of the VM Mobility Solutions..........................6
-      4.1. Inter-VNs communication...................................6
+      4.1. Inter-VNs and External Communication......................6
       4.2. VM Migration in Layer 2 Network...........................6
       4.3. VM Migration in Layer-3 Network...........................8
       4.4. Address and Connection Management in VM Migration.........9
    5. Handling Packets in Flight....................................10
    6. Moving Local State of VM......................................11
    7. Handling of Hot, Warm and Cold VM Mobility....................11
    8. Other Options.................................................12
    9. VM Lifecycle Management.......................................13
    10. Security Considerations......................................13
-   11. IANA Considerations..........................................13
+   11. IANA Considerations..........................................14
    12. Acknowledgments..............................................14
    13. Change Log...................................................14
    14. References...................................................14
-      14.1. Normative References....................................14
+      14.1. Normative References....................................15
       14.2. Informative References..................................16
 
 1. Introduction
 
    This document describes overlay-based data center network
    solutions that support multitenancy and VM (Virtual Machine)
    mobility.
    Being able to move VMs dynamically from one server to another
    makes dynamic load balancing and work distribution possible.
    Therefore, dynamic VM mobility is highly desirable for large-scale
    multi-tenant DCs. This document is strictly within the DCVPN, as
    defined by the NVO3
@@ -208,42 +208,42 @@
    except for handling packets in flight. VM mobility
    solutions/procedures should not need to use tunneling except for
    handling packets in flight.
 
 4. Overview of the VM Mobility Solutions
 
    Layer 2 and Layer 3 mobility solutions are described respectively
    in the following sections.
 
-4.1. Inter-VNs communication
+4.1. Inter-VNs and External Communication
 
    Inter-VNs (Virtual Networks) communication refers to communication
    among tenants (or hosts) belonging to different VNs. Those tenants
    can be attached to the NVEs co-located in the same Data Center or
-   in different Data centers. This document assumes that the inter-
-   VNs communication is via the NVO3 Gateway as described in RFC8014
-   (NVO3 Architecture). RFC 8014 (Section 5.3) describes the NVO3
-   Gateway function which is to relay traffic onto and off of a
-   virtual network, i.e. among different VNs.
+   in different Data Centers. When a VM communicates with an external
+   entity, the VM is effectively communicating with a peer in a
+   different network or a globally reachable host.
 
-   When a VM communicates with an external entity, the VM is
-   effectively communicating with a peer in a different network or a
-   globally reachable host. Communicating with hosts in other VNs
-   and external hosts are all through the NVO3 Gateway. There are
-   different policies on the NVo3 Gateway to govern the communication
-   among VNs and with external hosts.
+   This document assumes that both the inter-VNs communication and
+   the communication with external entities are via the NVO3 Gateway
+   as described in RFC8014 (NVO3 Architecture). RFC 8014 (Section
+   5.3) describes the NVO3 Gateway function, which is to relay
+   traffic onto and off of a virtual network, i.e., among different
+   VNs.
+
+   There are different policies on the NVO3 Gateway to govern the
+   communication among VNs and with external entities (or hosts).
 
    After a VM is moved to a new NVE, the VM's corresponding Gateway
    may need to change as well. If such a change is not possible, then
-   the path to the external entity need to be hair-pinned to the NVO3
-   Gateway used prior to the VM move.
+   the path to the external entities needs to be hair-pinned to the
+   NVO3 Gateway used prior to the VM move.
 
 4.2. VM Migration in Layer 2 Network
 
    In a Layer-2 based approach, a VM moving to another NVE does not
    change its IP address. But because this VM is now under a new NVE,
    previously communicating NVEs may continue sending their packets
    to the old NVE. Therefore, the Address Resolution Protocol (ARP)
    cache in IPv4 [RFC0826] or the neighbor cache in IPv6 [RFC4861] in
    the NVEs that have attached VMs communicating with the VM being
    moved needs to be updated promptly. If the VM being moved has
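   A common way to refresh stale ARP caches promptly after a Layer-2
   move is for the destination hypervisor to send a gratuitous ARP on
   the moved VM's behalf (an unsolicited Neighbor Advertisement plays
   the same role for IPv6 ND). The following is a minimal sketch in
   Python, assuming the scapy packet library; the MAC address, IP
   address, and interface name are hypothetical, and the draft itself
   does not mandate this particular mechanism.

      # Sketch only: announce a moved VM's IP-to-MAC binding with a
      # gratuitous ARP so that stale ARP caches are refreshed.
      # Assumes the scapy library; addresses and the interface name
      # are hypothetical.
      from scapy.all import ARP, Ether, sendp

      vm_mac = "52:54:00:12:34:56"   # hypothetical MAC of moved VM
      vm_ip = "192.0.2.10"           # hypothetical IP, unchanged

      # In a gratuitous ARP, the sender and target IP are both the
      # VM's own address; broadcasting lets all listeners update.
      garp = (Ether(src=vm_mac, dst="ff:ff:ff:ff:ff:ff") /
              ARP(op=2, hwsrc=vm_mac, psrc=vm_ip,
                  hwdst="ff:ff:ff:ff:ff:ff", pdst=vm_ip))
      sendp(garp, iface="eth0")      # hypothetical interface name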
@@ -479,89 +479,97 @@
 8. Other Options
 
    VM Hot mobility is to enable uninterrupted running of the
    application or workload instantiated on the VM when the VM's
    running conditions change, such as utilization overload, hardware
    running condition changes, or others. Hot, Warm and Cold mobility
    are planned activities which are managed by the VM management
    system. For unexpected events, such as an unexpected failure, a
    VM might need to move to a new NVE, which is called Hot VM
    Failover in this
-   document. For Hot VM Failover, there are VMs in both primary and
-   secondary NVEs. They can provide services simultaneously as in
-   load-share mode of operation. If the VM in the primary NVE fails,
-   there is no need to actively move the VM to the secondary NVE
-   because the VM in the secondary NVE can immediately pick up the
-   processing. It is out of the scope of this document on how and
-   what information are exchange between the two VMs under two
-   different NVE.
+   document. For Hot VM Failover, there are redundant primary and
+   secondary VMs whose states are synchronized by means that are
+   outside the scope of this document. If the VM in the primary NVE
+   fails, there is no need to actively move the VM to the secondary
+   NVE because the VM in the secondary NVE can immediately pick up
+   the processing.
 
    The VM Failover to the new NVE is transparent to the peers that
    communicate with this VM. This can be achieved by having both the
    active VM
-   and standby VM share the same TCP port and same IP address. There
-   must be a load balancer that can distribute the packets to the VM
-   under the new NVE. The new VM can pick up providing service while
-   the sender (peer) still continues to receive Ack from the old VM
-   and chooses not to use the service of the secondary responding VM.
-   If the situation (loading condition of the primary responding VM)
-   changes the secondary responding VM may start providing service to
-   the sender (peers).
+   and standby VM share the same TCP port and the same IP address,
+   and by using distributed load balancing functionality that
+   controls which VM responds to each service request. In the
+   absence of a failure, the new VM can pick up providing service
+   while the sender (peer) still continues to receive ACKs from the
+   old VM. If the situation (loading condition of the primary
+   responding VM) changes, the secondary responding VM may start
+   providing service to the sender (peers). On failure, the sender
+   (peer) may have to retry the request, so this structure is
+   limited to requests that can be safely retried.
 
-   If TCP states are not properly synchronized among the two VMs, the
-   VM under the New NVE after failover can force the peers to re-
-   establish a new TCP connection by stopping the previous TCP
-   connection. As most TCP connections are short lived, re-
-   establishing a new one is not a big problem.
+   If load balancing functionality is not used, the VM Failover can
+   be made transparent to the sender (peers) without relying on
+   request retry by using techniques described in Section 4 that do
+   not depend on the primary VM or its associated NVE doing anything
+   after the failure. This restriction is necessary because a
+   failure that affects the primary VM may also cause its associated
+   NVE to fail (e.g., if the NVE is located in the hypervisor that
+   hosts the primary VM and the underlying physical server fails,
+   both the primary VM and the hypervisor that contains the NVE fail
+   as a consequence).
 
    The Hot VM Failover option is the costliest mechanism, and hence
    this option is utilized only for mission-critical applications
    and services.
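   The retry behavior described in Section 8, where a peer re-issues a
   safely retryable request toward the shared IP address and TCP port
   after the primary VM fails, can be sketched as follows. This is a
   minimal illustration assuming a hypothetical TCP request/response
   service behind a shared virtual IP; the address, port, and payload
   are placeholders, not values defined by this document.

      # Sketch only: a peer-side retry wrapper for Hot VM Failover.
      # Because the primary and secondary VMs share one virtual IP
      # and TCP port, a retry after the primary fails is answered by
      # the secondary. Only safely retryable (idempotent) requests
      # should use this path.
      import socket

      def request_with_retry(vip, port, payload,
                             attempts=3, timeout=2.0):
          last_err = None
          for _ in range(attempts):
              try:
                  with socket.create_connection((vip, port),
                                                timeout=timeout) as s:
                      s.sendall(payload)
                      return s.recv(4096)
              except OSError as err:
                  # The primary may have failed; the next attempt may
                  # be answered by the secondary VM.
                  last_err = err
          raise last_err

      # Hypothetical usage:
      # reply = request_with_retry("203.0.113.10", 8080, b"GET-STATE\n")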
 9. VM Lifecycle Management
 
    VM lifecycle management is a complicated task, which is beyond
    the scope of this document. Not only does it involve monitoring
    server utilization, balanced distribution of workload, etc., but
    it also needs to manage VM migration from one server to another
    seamlessly.
 
 10. Security Considerations
 
    Security threats for the data and control plane for overlay
-   networks are discussed in [RFC8014]. ARP (IPv40 and ND (IPv6) are
-   not secure, especially if we accept gratuitous versions in multi-
-   tenant environment.
+   networks are discussed in [RFC8014]. ARP (IPv4) and ND (IPv6) are
+   not secure, especially if they can be sent gratuitously across
+   tenant boundaries in a multi-tenant environment.
 
-   In Layer-3 based overlay data center networks, ARP and ND messages
-   can be used to mount address spoofing attacks. An NVE may have
-   untrusted VMs attached. This usually happens in cases like the VMs
-   running third party applications. Those untrusted VMs can send
-   falsified ARP (IPv4) and ND (IPv6) messages, causing NVE, NVO3
-   Gateway, and NVA to be overwhelmed and not able to perform
-   legitimate functions. The attacker can intercept, modify, or even
-   stop data in-transit ARP/ND messages intended for other VNs and
-   initiate DDOS attacks to other VMs attached to the same NVE. A
-   simple black-hole attacks can be mounted by sending a falsified
-   ARP/ND message to indicate that the victim's IP address has moved
-   to the attacker's VM. That technique can also be used to mount
-   man-in-the-middle attacks with some more effort to ensure that the
-   intercepted traffic is eventually delivered to the victim.
+   In overlay data center networks, ARP and ND messages can be used
+   to mount address spoofing attacks from untrusted VMs and/or other
+   untrusted sources. Examples of untrusted VMs include VMs running
+   third-party applications (i.e., applications not written by the
+   tenant who controls the VM). Those untrusted VMs can send
+   falsified ARP (IPv4) and ND (IPv6) messages, causing the NVE,
+   NVO3 Gateway, and NVA to be overwhelmed and unable to perform
+   legitimate functions. The attacker can intercept, modify, or even
+   stop in-transit ARP/ND messages intended for other VNs and
+   initiate DDoS attacks against other VMs attached to the same NVE.
+   A simple black-hole attack can be mounted by sending a falsified
+   ARP/ND message indicating that the victim's IP address has moved
+   to the attacker's VM. That technique can also be used to mount
+   man-in-the-middle attacks with some more effort to ensure that
+   the intercepted traffic is eventually delivered to the victim.
 
    The locator-identifier mechanism given as an example (ILA)
    doesn't include secure binding; it doesn't discuss how to
    securely bind the new locator to the identifier.
 
    Because of those threats, the VM management system needs to apply
    stronger security mechanisms when adding a VM to an NVE. Some
    tenants may have requirements that prohibit their VMs from being
    co-attached to NVEs with other tenants' VMs. Some Data Centers
    have their NVO3 Gateways equipped with the capability to mitigate
    ARP/ND threats, such as periodically exchanging its ARP/ND cache
    with
-   NVA's central control system.
+   the NVA's central control system to validate the ARP/ND cache
+   learned by the NVE with the VM Management System.
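   The cache validation described above, in which a gateway
   reconciles the ARP/ND entries learned by the NVE against the NVA's
   authoritative bindings, can be sketched as follows. This is a
   minimal illustration assuming plain IP-to-MAC dictionaries; the
   draft does not define a concrete data model or interface for this
   exchange, and all values shown are hypothetical.

      # Sketch only: validate a gateway's learned IP->MAC cache
      # against the NVA's authoritative bindings. Entries that
      # disagree may have been planted by falsified ARP/ND messages
      # (black-hole or man-in-the-middle attempts) and are evicted.
      def validate_arp_nd_cache(learned_cache, nva_bindings):
          suspects = {}
          for ip, mac in list(learned_cache.items()):
              authoritative = nva_bindings.get(ip)
              if authoritative is not None and authoritative != mac:
                  suspects[ip] = {"learned": mac,
                                  "authoritative": authoritative}
                  del learned_cache[ip]  # drop possibly spoofed entry
          return suspects

      # Example: the victim 192.0.2.10 was rebound to an attacker MAC.
      cache = {"192.0.2.10": "66:66:66:66:66:66",
               "192.0.2.11": "52:54:00:00:00:0b"}
      nva = {"192.0.2.10": "52:54:00:00:00:0a",
             "192.0.2.11": "52:54:00:00:00:0b"}
      print(validate_arp_nd_cache(cache, nva))  # flags 192.0.2.10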
 11. IANA Considerations
 
    This document makes no request to IANA.
 
 12. Acknowledgments
 
    The authors are grateful to Bob Briscoe, David Black, Dave R.
    Worley, Qiang Zu, and Andrew Malis for helpful comments.
@@ -571,25 +579,24 @@
    . submitted version -01 with these changes: references are
      updated,
      o added packets in flight definition to Section 2
    . submitted version -02 with updated address.
    . submitted version -03 to fix the nits.
    . submitted version -04 in reference to the WG Last Call comments.
-   . Submitted version - 05, 06, 07, and 08 to address IETF LC
-     comments from TSV area.
+   . Submitted version - 05, 06, 07, 08, 09, 10, 11, and 12 to
+     address IETF LC comments from the TSV area.
 
 14. References
-
 14.1. Normative References
 
    [RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol:
              Or Converting Network Protocol Addresses to 48.bit
              Ethernet Address for Transmission on Ethernet Hardware",
              STD 37, RFC 826, DOI 10.17487/RFC0826, November 1982,
              <https://www.rfc-editor.org/info/rfc826>.
 
    [RFC0903] Finlayson, R., Mann, T., Mogul, J., and M. Theimer, "A
              Reverse Address Resolution Protocol", STD 38, RFC 903,