draft-ietf-nvo3-dataplane-requirements-01.txt | draft-ietf-nvo3-dataplane-requirements-02.txt | |||
---|---|---|---|---|
Internet Engineering Task Force Nabil Bitar | Internet Engineering Task Force Nabil Bitar | |||
Internet Draft Verizon | Internet Draft Verizon | |||
Intended status: Informational | Intended status: Informational | |||
Expires: January 2014 Marc Lasserre | Expires: May 2014 Marc Lasserre | |||
Florin Balus | Florin Balus | |||
Alcatel-Lucent | Alcatel-Lucent | |||
Thomas Morin | Thomas Morin | |||
France Telecom Orange | France Telecom Orange | |||
Lizhong Jin | Lizhong Jin | |||
Bhumip Khasnabish | Bhumip Khasnabish | |||
ZTE | ZTE | |||
July 1, 2013 | November 12, 2013 | |||
NVO3 Data Plane Requirements | NVO3 Data Plane Requirements | |||
draft-ietf-nvo3-dataplane-requirements-01.txt | draft-ietf-nvo3-dataplane-requirements-02.txt | |||
Status of this Memo | Status of this Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six | Internet-Drafts are draft documents valid for a maximum of six | |||
months and may be updated, replaced, or obsoleted by other documents | months and may be updated, replaced, or obsoleted by other documents | |||
at any time. It is inappropriate to use Internet-Drafts as | at any time. It is inappropriate to use Internet-Drafts as | |||
reference material or to cite them other than as "work in progress." | reference material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 1, 2013. | This Internet-Draft will expire on May 12, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 24 | skipping to change at page 2, line 24 | |||
Abstract | Abstract | |||
Several IETF drafts relate to the use of overlay networks to support | Several IETF drafts relate to the use of overlay networks to support | |||
large scale virtual data centers. This draft provides a list of data | large scale virtual data centers. This draft provides a list of data | |||
plane requirements for Network Virtualization over L3 (NVO3) that | plane requirements for Network Virtualization over L3 (NVO3) that | |||
have to be addressed in solutions documents. | have to be addressed in solutions documents. | |||
Table of Contents | Table of Contents | |||
1. Introduction................................................3 | 1. Introduction..................................................3 | |||
1.1. Conventions used in this document.......................3 | 1.1. Conventions used in this document........................3 | |||
1.2. General terminology.....................................3 | 1.2. General terminology......................................3 | |||
2. Data Path Overview..........................................4 | 2. Data Path Overview............................................4 | |||
3. Data Plane Requirements......................................5 | 3. Data Plane Requirements.......................................5 | |||
3.1. Virtual Access Points (VAPs)............................5 | 3.1. Virtual Access Points (VAPs).............................5 | |||
3.2. Virtual Network Instance (VNI)..........................5 | 3.2. Virtual Network Instance (VNI)...........................5 | |||
3.2.1. L2 VNI...............................................5 | 3.2.1. L2 VNI.................................................5 | |||
3.2.2. L3 VNI...............................................6 | 3.2.2. L3 VNI.................................................6 | |||
3.3. Overlay Module.........................................7 | 3.3. Overlay Module...........................................7 | |||
3.3.1. NVO3 overlay header...................................8 | 3.3.1. NVO3 overlay header....................................8 | |||
3.3.1.1. Virtual Network Context Identification..............8 | 3.3.1.1. Virtual Network Context Identification...............8 | |||
3.3.1.2. Service QoS identifier..............................8 | 3.3.1.2. Service QoS identifier...............................8 | |||
3.3.2. Tunneling function....................................9 | 3.3.2. Tunneling function.....................................9 | |||
3.3.2.1. LAG and ECMP.......................................10 | 3.3.2.1. LAG and ECMP........................................10 | |||
3.3.2.2. DiffServ and ECN marking...........................10 | 3.3.2.2. DiffServ and ECN marking............................10 | |||
3.3.2.3. Handling of BUM traffic............................11 | 3.3.2.3. Handling of BUM traffic.............................11 | |||
3.4. External NVO3 connectivity.............................11 | 3.4. External NVO3 connectivity..............................11 | |||
3.4.1. GW Types............................................12 | 3.4.1. GW Types..............................................12 | |||
3.4.1.1. VPN and Internet GWs...............................12 | 3.4.1.1. VPN and Internet GWs................................12 | |||
3.4.1.2. Inter-DC GW........................................12 | 3.4.1.2. Inter-DC GW.........................................12 | |||
3.4.1.3. Intra-DC gateways..................................12 | 3.4.1.3. Intra-DC gateways...................................12 | |||
3.4.2. Path optimality between NVEs and Gateways............12 | 3.4.2. Path optimality between NVEs and Gateways.............12 | |||
3.4.2.1. Triangular Routing Issues (Traffic Tromboning)......13 | 3.4.2.1. Load-balancing......................................14 | |||
3.5. Path MTU..............................................14 | 3.4.2.2. Triangular Routing Issues (a.k.a. Traffic Tromboning)14 | |||
3.6. Hierarchical NVE.......................................15 | 3.5. Path MTU................................................14 | |||
3.7. NVE Multi-Homing Requirements..........................15 | 3.6. Hierarchical NVE........................................15 | |||
3.8. OAM...................................................16 | 3.7. NVE Multi-Homing Requirements...........................15 | |||
3.9. Other considerations...................................16 | 3.8. Other considerations....................................16 | |||
3.9.1. Data Plane Optimizations.............................16 | 3.8.1. Data Plane Optimizations..............................16 | |||
3.9.2. NVE location trade-offs..............................17 | 3.8.2. NVE location trade-offs...............................16 | |||
4. Security Considerations.....................................17 | 4. Security Considerations......................................17 | |||
5. IANA Considerations........................................17 | 5. IANA Considerations..........................................17 | |||
6. References.................................................18 | 6. References...................................................17 | |||
6.1. Normative References...................................18 | 6.1. Normative References....................................17 | |||
6.2. Informative References.................................18 | 6.2. Informative References..................................17 | |||
7. Acknowledgments............................................19 | 7. Acknowledgments..............................................18 | |||
1. Introduction | 1. Introduction | |||
1.1. Conventions used in this document | 1.1. Conventions used in this document | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC-2119 [RFC2119]. | document are to be interpreted as described in RFC-2119 [RFC2119]. | |||
In this document, these words will appear with that interpretation | In this document, these words will appear with that interpretation | |||
skipping to change at page 6, line 4 | skipping to change at page 6, line 4 | |||
tenants. | tenants. | |||
There are different VNI types differentiated by the virtual network | There are different VNI types differentiated by the virtual network | |||
service they provide to Tenant Systems. Network virtualization can | service they provide to Tenant Systems. Network virtualization can | |||
be provided by L2 and/or L3 VNIs. | be provided by L2 and/or L3 VNIs. | |||
3.2.1. L2 VNI | 3.2.1. L2 VNI | |||
An L2 VNI MUST provide an emulated Ethernet multipoint service as if | An L2 VNI MUST provide an emulated Ethernet multipoint service as if | |||
Tenant Systems are interconnected by a bridge (but instead by using | Tenant Systems are interconnected by a bridge (but instead by using | |||
a set of NVO3 tunnels). The emulated bridge MAY be 802.1Q enabled | a set of NVO3 tunnels). The emulated bridge could be 802.1Q enabled | |||
(allowing use of VLAN tags as a VAP). An L2 VNI provides per tenant | (allowing use of VLAN tags as a VAP). An L2 VNI provides per tenant | |||
virtual switching instance with MAC addressing isolation and L3 | virtual switching instance with MAC addressing isolation and L3 | |||
tunneling. Loop avoidance capability MUST be provided. | tunneling. Loop avoidance capability MUST be provided. | |||
Forwarding table entries provide mapping information between tenant | Forwarding table entries provide mapping information between tenant | |||
system MAC addresses and VAPs on directly connected VNIs and L3 | system MAC addresses and VAPs on directly connected VNIs and L3 | |||
tunnel destination addresses over the overlay. Such entries MAY be | tunnel destination addresses over the overlay. Such entries could be | |||
populated by a control or management plane, or via data plane. | populated by a control or management plane, or via data plane. | |||
In the absence of a management or control plane, data plane learning | By default, data plane learning MUST be used to populate forwarding | |||
MUST be used to populate forwarding tables. As frames arrive from | tables. As frames arrive from VAPs or from overlay tunnels, standard | |||
VAPs or from overlay tunnels, standard MAC learning procedures are | MAC learning procedures are used: The tenant system source MAC | |||
used: The tenant system source MAC address is learned against the | address is learned against the VAP or the NVO3 tunneling | |||
VAP or the NVO3 tunneling encapsulation source address on which the | encapsulation source address on which the frame arrived. This | |||
frame arrived. This implies that unknown unicast traffic be flooded | implies that unknown unicast traffic will be flooded (i.e. | |||
i.e. broadcast. | broadcast). | |||
When flooding is required, either to deliver unknown unicast, or | When flooding is required, either to deliver unknown unicast, or | |||
broadcast or multicast traffic, the NVE MUST either support ingress | broadcast or multicast traffic, the NVE MUST either support ingress | |||
replication or multicast. In this latter case, the NVE MUST have one | replication or multicast. | |||
or more multicast trees that can be used by local VNIs for flooding | ||||
to NVEs belonging to the same VN. For each VNI, there is one | When using multicast, the NVE MUST have one or more multicast trees | |||
flooding tree, and a multicast tree may be dedicated per VNI or | that can be used by local VNIs for flooding to NVEs belonging to the | |||
shared across VNIs. In such cases, multiple VNIs MAY share the same | same VN. For each VNI, there is at least one flooding tree used for | |||
default flooding tree. The flooding tree is equivalent with a | Broadcast, Unknown Unicast and Multicast forwarding. This tree MAY | |||
| | be shared across VNIs. The flooding tree is equivalent with a | |||
multicast (*,G) construct where all the NVEs for which the | multicast (*,G) construct where all the NVEs for which the | |||
corresponding VNI is instantiated are members. The multicast tree | corresponding VNI is instantiated are members. | |||
MAY be established automatically via routing and signaling or pre- | ||||
provisioned. | ||||
When tenant multicast is supported, it SHOULD also be possible to | When tenant multicast is supported, it SHOULD also be possible to | |||
select whether the NVE provides optimized multicast trees inside the | select whether the NVE provides optimized multicast trees inside the | |||
VNI for individual tenant multicast groups or whether the default | VNI for individual tenant multicast groups or whether the default | |||
VNI flooding tree is used. If the former option is selected the VNI | VNI flooding tree is used. If the former option is selected the VNI | |||
SHOULD be able to snoop IGMP/MLD messages in order to efficiently | SHOULD be able to snoop IGMP/MLD messages in order to efficiently | |||
join/prune Tenant Systems from multicast trees. | join/prune Tenant Systems from multicast trees. | |||
3.2.2. L3 VNI | 3.2.2. L3 VNI | |||
skipping to change at page 7, line 22 | skipping to change at page 7, line 21 | |||
L2 and L3 VNIs can be deployed in isolation or in combination to | L2 and L3 VNIs can be deployed in isolation or in combination to | |||
optimize traffic flows per tenant across the overlay network. For | optimize traffic flows per tenant across the overlay network. For | |||
example, an L2 VNI may be configured across a number of NVEs to | example, an L2 VNI may be configured across a number of NVEs to | |||
offer L2 multi-point service connectivity while a L3 VNI can be co- | offer L2 multi-point service connectivity while a L3 VNI can be co- | |||
located to offer local routing capabilities and gateway | located to offer local routing capabilities and gateway | |||
functionality. In addition, integrated routing and bridging per | functionality. In addition, integrated routing and bridging per | |||
tenant MAY be supported on an NVE. An instantiation of such service | tenant MAY be supported on an NVE. An instantiation of such service | |||
may be realized by interconnecting an L2 VNI as access to an L3 VNI | may be realized by interconnecting an L2 VNI as access to an L3 VNI | |||
on the NVE. | on the NVE. | |||
The L3 VNI does not require support for Broadcast and Unknown | When multicast is supported, it MAY be possible to select whether | |||
Unicast traffic. The L3 VNI MAY provide support for customer | the NVE provides optimized multicast trees inside the VNI for | |||
multicast groups. When multicast is supported, it SHOULD be possible | individual tenant multicast groups or whether a default VNI | |||
to select whether the NVE provides optimized multicast trees inside | multicasting tree, where all the NVEs of the corresponding VNI are | |||
the VNI for individual tenant multicast groups or whether a default | members, is used. | |||
VNI multicasting tree, where all the NVEs of the corresponding VNI | ||||
are members, is used. | ||||
3.3. Overlay Module | 3.3. Overlay Module | |||
The overlay module performs a number of functions related to NVO3 | The overlay module performs a number of functions related to NVO3 | |||
header and tunnel processing. | header and tunnel processing. | |||
The following figure shows a generic NVO3 encapsulated frame: | The following figure shows a generic NVO3 encapsulated frame: | |||
+--------------------------+ | +--------------------------+ | |||
| Tenant Frame | | | Tenant Frame | | |||
skipping to change at page 8, line 21 | skipping to change at page 8, line 18 | |||
this packet. | this packet. | |||
. Outer underlay header: Can be either IP or MPLS | . Outer underlay header: Can be either IP or MPLS | |||
. Outer link layer header: Header specific to the physical | . Outer link layer header: Header specific to the physical | |||
transmission link used | transmission link used | |||
3.3.1. NVO3 overlay header | 3.3.1. NVO3 overlay header | |||
An NVO3 overlay header MUST be included after the underlay tunnel | An NVO3 overlay header MUST be included after the underlay tunnel | |||
header when forwarding tenant traffic. Note that this information | header when forwarding tenant traffic. | |||
can be carried within existing protocol headers (when overloading of | ||||
specific fields is possible) or within a separate header. | Note that this information can be carried within existing protocol | |||
| | headers (when overloading of specific fields is possible) or within | |||
| | a separate header. | |||
3.3.1.1. Virtual Network Context Identification | 3.3.1.1. Virtual Network Context Identification | |||
The overlay encapsulation header MUST contain a field which allows | The overlay encapsulation header MUST contain a field which allows | |||
the encapsulated frame to be delivered to the appropriate virtual | the encapsulated frame to be delivered to the appropriate virtual | |||
network endpoint by the egress NVE. The egress NVE uses this field | network endpoint by the egress NVE. | |||
to determine the appropriate virtual network context in which to | ||||
process the packet. This field MAY be an explicit, unique (to the | ||||
administrative domain) virtual network identifier (VNID) or MAY | ||||
express the necessary context information in other ways (e.g. a | ||||
locally significant identifier). | ||||
It SHOULD be aligned on a 32-bit boundary so as to make it | The egress NVE uses this field to determine the appropriate virtual | |||
efficiently processable by the data path. It MUST be distributable | network context in which to process the packet. This field MAY be an | |||
by a control-plane or configured via a management plane. | explicit, unique (to the administrative domain) virtual network | |||
| | identifier (VNID) or MAY express the necessary context information | |||
| | in other ways (e.g. a locally significant identifier). | |||
In the case of a global identifier, this field MUST be large enough | In the case of a global identifier, this field MUST be large enough | |||
to scale to hundreds of thousands of virtual networks. Note that there | to scale to hundreds of thousands of virtual networks. Note that there | |||
is no such constraint when using a local identifier. | is typically no such constraint when using a local identifier. | |||
3.3.1.2. Service QoS identifier | 3.3.1.2. Service QoS identifier | |||
Traffic flows originating from different applications could rely on | Traffic flows originating from different applications could rely on | |||
differentiated forwarding treatment to meet end-to-end availability | differentiated forwarding treatment to meet end-to-end availability | |||
and performance objectives. Such applications may span across one or | and performance objectives. Such applications may span across one or | |||
more overlay networks. To enable such treatment, support for | more overlay networks. To enable such treatment, support for | |||
multiple Classes of Service across or between overlay networks MAY | multiple Classes of Service across or between overlay networks MAY | |||
be required. | be required. | |||
skipping to change at page 10, line 7 | skipping to change at page 10, line 7 | |||
ISID tags and MPLS TC bits in the VPN labels. | ISID tags and MPLS TC bits in the VPN labels. | |||
3.3.2. Tunneling function | 3.3.2. Tunneling function | |||
This section describes the underlay tunneling requirements. From an | This section describes the underlay tunneling requirements. From an | |||
encapsulation perspective, IPv4 or IPv6 MUST be supported; both IPv4 | encapsulation perspective, IPv4 or IPv6 MUST be supported; both IPv4 | |||
and IPv6 SHOULD be supported; MPLS tunneling MAY be supported. | and IPv6 SHOULD be supported; MPLS tunneling MAY be supported. | |||
3.3.2.1. LAG and ECMP | 3.3.2.1. LAG and ECMP | |||
For performance reasons, multipath over LAG and ECMP paths SHOULD be | For performance reasons, multipath over LAG and ECMP paths MAY be | |||
supported. | supported. | |||
LAG (Link Aggregation Group) [IEEE 802.1AX-2008] and ECMP (Equal | LAG (Link Aggregation Group) [IEEE 802.1AX-2008] and ECMP (Equal | |||
Cost Multi Path) are commonly used techniques to perform load- | Cost Multi Path) are commonly used techniques to perform load- | |||
balancing of microflows over a set of parallel links either at | balancing of microflows over a set of parallel links either at | |||
Layer-2 (LAG) or Layer-3 (ECMP). Existing deployed hardware | Layer-2 (LAG) or Layer-3 (ECMP). Existing deployed hardware | |||
implementations of LAG and ECMP use a hash of various fields in the | implementations of LAG and ECMP use a hash of various fields in the | |||
encapsulation (outermost) header(s) (e.g. source and destination MAC | encapsulation (outermost) header(s) (e.g. source and destination MAC | |||
addresses for non-IP traffic, source and destination IP addresses, | addresses for non-IP traffic, source and destination IP addresses, | |||
L4 protocol, L4 source and destination port numbers, etc). | L4 protocol, L4 source and destination port numbers, etc). | |||
Furthermore, hardware deployed for the underlay network(s) will be | Furthermore, hardware deployed for the underlay network(s) will be | |||
most often unaware of the carried, innermost L2 frames or L3 packets | most often unaware of the carried, innermost L2 frames or L3 packets | |||
transmitted by the TS. Thus, in order to perform fine-grained load- | transmitted by the TS. | |||
balancing over LAG and ECMP paths in the underlying network, the | ||||
encapsulation MUST result in sufficient entropy to exercise all | Thus, in order to perform fine-grained load-balancing over LAG and | |||
paths through several LAG/ECMP hops. The entropy information MAY be | ECMP paths in the underlying network, the encapsulation MUST result | |||
inferred from the NVO3 overlay header or underlay header. If the | in sufficient entropy to exercise all paths through several LAG/ECMP | |||
overlay protocol does not support the necessary entropy information | hops. | |||
or the switches/routers in the underlay do not support parsing of | ||||
the additional entropy information in the overlay header, underlay | The entropy information can be inferred from the NVO3 overlay header | |||
switches and routers should be programmable, i.e. select the | or underlay header. If the overlay protocol does not support the | |||
appropriate fields in the underlay header for hash calculation based | necessary entropy information or the switches/routers in the | |||
on the type of overlay header. | underlay do not support parsing of the additional entropy | |||
| | information in the overlay header, underlay switches and routers | |||
| | should be programmable, i.e. select the appropriate fields in the | |||
| | underlay header for hash calculation based on the type of overlay | |||
| | header. | |||
All packets that belong to a specific flow MUST follow the same path | All packets that belong to a specific flow MUST follow the same path | |||
in order to prevent packet re-ordering. This is typically achieved | in order to prevent packet re-ordering. This is typically achieved | |||
by ensuring that the fields used for hashing are identical for a | by ensuring that the fields used for hashing are identical for a | |||
given flow. | given flow. | |||
All paths available to the overlay network SHOULD be used | The goal is for all paths available to the overlay network to be | |||
efficiently. Different flows SHOULD be distributed as evenly as | used efficiently. Different flows should be distributed as evenly as | |||
possible across multiple underlay network paths. For instance, this | possible across multiple underlay network paths. For instance, this | |||
can be achieved by ensuring that some fields used for hashing are | can be achieved by ensuring that some fields used for hashing are | |||
randomly generated. | randomly generated. | |||
3.3.2.2. DiffServ and ECN marking | 3.3.2.2. DiffServ and ECN marking | |||
When traffic is encapsulated in a tunnel header, there are numerous | When traffic is encapsulated in a tunnel header, there are numerous | |||
options as to how the Diffserv Code-Point (DSCP) and Explicit | options as to how the Diffserv Code-Point (DSCP) and Explicit | |||
Congestion Notification (ECN) markings are set in the outer header | Congestion Notification (ECN) markings are set in the outer header | |||
and propagated to the inner header on decapsulation. | and propagated to the inner header on decapsulation. | |||
[RFC2983] defines two modes for mapping the DSCP markings from inner | [RFC2983] defines two modes for mapping the DSCP markings from inner | |||
to outer headers and vice versa. The Uniform model copies the inner | to outer headers and vice versa. The Uniform model copies the inner | |||
DSCP marking to the outer header on tunnel ingress, and copies that | DSCP marking to the outer header on tunnel ingress, and copies that | |||
outer header value back to the inner header at tunnel egress. The | outer header value back to the inner header at tunnel egress. The | |||
Pipe model sets the DSCP value to some value based on local policy | Pipe model sets the DSCP value to some value based on local policy | |||
at ingress and does not modify the inner header on egress. Both | at ingress and does not modify the inner header on egress. Both | |||
models SHOULD be supported. | models SHOULD be supported. | |||
ECN marking MUST be performed according to [RFC6040] which describes | [RFC6040] defines ECN marking and processing for IP tunnels. | |||
the correct ECN behavior for IP tunnels. | ||||
3.3.2.3. Handling of BUM traffic | 3.3.2.3. Handling of BUM traffic | |||
NVO3 data plane support for either ingress replication or point-to- | NVO3 data plane support for either ingress replication or point-to- | |||
multipoint tunnels is required to send traffic destined to multiple | multipoint tunnels is required to send traffic destined to multiple | |||
locations on a per-VNI basis (e.g. L2/L3 multicast traffic, L2 | locations on a per-VNI basis (e.g. L2/L3 multicast traffic, L2 | |||
broadcast and unknown unicast traffic). It is possible that both | broadcast and unknown unicast traffic). It is possible that both | |||
methods be used simultaneously. | methods be used simultaneously. | |||
There is a bandwidth vs state trade-off between the two approaches. | There is a bandwidth vs state trade-off between the two approaches. | |||
User-definable knobs MUST be provided to select which method(s) gets | User-configurable knobs MUST be provided to select which method(s) | |||
used based upon the amount of replication required (i.e. the number | gets used based upon the amount of replication required (i.e. the | |||
of hosts per group), the amount of multicast state to maintain, the | number of hosts per group), the amount of multicast state to | |||
duration of multicast flows and the scalability of multicast | maintain, the duration of multicast flows and the scalability of | |||
protocols. | multicast protocols. | |||
When ingress replication is used, NVEs MUST track for each VNI the | When ingress replication is used, NVEs MUST maintain for each VNI | |||
related tunnel endpoints to which it needs to replicate the frame. | the related tunnel endpoints to which it needs to replicate the | |||
| | frame. | |||
For point-to-multipoint tunnels, the bandwidth efficiency is | For point-to-multipoint tunnels, the bandwidth efficiency is | |||
increased at the cost of more state in the Core nodes. The ability | increased at the cost of more state in the Core nodes. The ability | |||
to auto-discover or pre-provision the mapping between VNI multicast | to auto-discover or pre-provision the mapping between VNI multicast | |||
trees to related tunnel endpoints at the NVE and/or throughout the | trees to related tunnel endpoints at the NVE and/or throughout the | |||
core SHOULD be supported. | core SHOULD be supported. | |||
3.4. External NVO3 connectivity | 3.4. External NVO3 connectivity | |||
NVO3 services MUST interoperate with current VPN and Internet | NVO3 services MUST interoperate with current VPN and Internet | |||
skipping to change at page 12, line 39 | skipping to change at page 12, line 43 | |||
3.4.1.3. Intra-DC gateways | 3.4.1.3. Intra-DC gateways | |||
Even within one DC there may be End Devices that do not support NVO3 | Even within one DC there may be End Devices that do not support NVO3 | |||
encapsulation, for example bare metal servers, hardware appliances | encapsulation, for example bare metal servers, hardware appliances | |||
and storage. A gateway device, e.g. a ToR, is required to translate | and storage. A gateway device, e.g. a ToR, is required to translate | |||
the NVO3 to Ethernet VLAN encapsulation. | the NVO3 to Ethernet VLAN encapsulation. | |||
3.4.2. Path optimality between NVEs and Gateways | 3.4.2. Path optimality between NVEs and Gateways | |||
Within the NVO3 overlay, a default assumption is that NVO3 traffic | Within an NVO3 overlay, a default assumption is that NVO3 traffic | |||
will be equally load-balanced across the underlying network | will be equally load-balanced across the underlying network | |||
consisting of LAG and/or ECMP paths. This assumption is valid only | consisting of LAG and/or ECMP paths. This assumption is valid only | |||
as long as: a) all traffic is load-balanced equally among each of | as long as: a) all traffic is load-balanced equally among each of | |||
the component-links and paths; and, b) each of the component- | the component-links and paths; and, b) each of the component- | |||
links/paths is of identical capacity. During the course of normal | links/paths is of identical capacity. During the course of normal | |||
operation of the underlying network, it is possible that one, or | operation of the underlying network, it is possible that one, or | |||
more, of the component-links/paths of a LAG may be taken out-of- | more, of the component-links/paths of a LAG may be taken out-of- | |||
service in order to be repaired, e.g.: due to hardware failure of | service in order to be repaired, e.g.: due to hardware failure of | |||
cabling, optics, etc. In such cases, the administrator should | cabling, optics, etc. In such cases, the administrator should | |||
configure the underlying network such that an entire LAG bundle in | configure the underlying network such that an entire LAG bundle in | |||
skipping to change at page 13, line 32 | skipping to change at page 13, line 37 | |||
On the other hand, for Inter-DC and DC to External Network cases | On the other hand, for Inter-DC and DC to External Network cases | |||
that use a WAN, the costs of the underlying network and/or service | that use a WAN, the costs of the underlying network and/or service | |||
(e.g.: IPVPN service) are more expensive; therefore, there is a | (e.g.: IPVPN service) are more expensive; therefore, there is a | |||
requirement on administrators to both: a) ensure high availability | requirement on administrators to both: a) ensure high availability | |||
(active-backup failover or active-active load-balancing); and, b) | (active-backup failover or active-active load-balancing); and, b) | |||
maintaining substantial utilization of the WAN transport capacity at | maintaining substantial utilization of the WAN transport capacity at | |||
nearly all times, particularly in the case of active-active load- | nearly all times, particularly in the case of active-active load- | |||
balancing. With respect to the dataplane requirements of NVO3 | balancing. With respect to the dataplane requirements of NVO3 | |||
solutions, in the case of active-backup fail-over, all of the | solutions, in the case of active-backup fail-over, all of the | |||
ingress NVE's MUST dynamically adapt to the failure of an active NVE | ingress NVE's need to dynamically adapt to the failure of an active | |||
GW when the backup NVE GW announces itself into the NVO3 overlay | NVE GW when the backup NVE GW announces itself into the NVO3 overlay | |||
immediately following a failure of the previously active NVE GW and | immediately following a failure of the previously active NVE GW and | |||
update their forwarding tables accordingly, (e.g.: perhaps through | update their forwarding tables accordingly, (e.g.: perhaps through | |||
dataplane learning and/or translation of a gratuitous ARP, IPv6 | dataplane learning and/or translation of a gratuitous ARP, IPv6 | |||
Router Advertisement, etc.) Note that active-backup fail-over could | Router Advertisement). Note that active-backup fail-over could be | |||
be used to accomplish a crude form of load-balancing by, for | used to accomplish a crude form of load-balancing by, for example, | |||
example, manually configuring each tenant to use a different NVE GW, | manually configuring each tenant to use a different NVE GW, in a | |||
in a round-robin fashion. On the other hand, with respect to active- | round-robin fashion. | |||
active load-balancing across physically separate NVE GW's (e.g.: | ||||
two, separate chassis) an NVO3 solution SHOULD support forwarding | ||||
tables that can simultaneously map a single egress NVE to more than | ||||
one NVO3 tunnels. The granularity of such mappings, in both active- | ||||
backup and active-active, MUST be unique to each tenant. | ||||
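A minimal sketch of how an ingress NVE might re-point traffic after an active-backup failover through data-plane learning. It assumes, hypothetically, a forwarding table keyed by tenant and inner MAC, and a backup NVE GW that takes over a shared virtual gateway MAC and announces it with a gratuitous ARP. `IngressNveTable` and `assign_round_robin` are illustrative names; the latter shows the crude per-tenant round-robin load-balancing mentioned in the text.

```python
from __future__ import annotations


class IngressNveTable:
    """Hypothetical per-tenant forwarding table in an ingress NVE."""

    def __init__(self) -> None:
        # (tenant, inner MAC) -> outer IP address of the egress NVE / NVE GW
        self.fib: dict[tuple[str, str], str] = {}

    def learn(self, tenant: str, inner_src_mac: str, outer_src_ip: str) -> None:
        # Data-plane learning: bind the inner source MAC of a decapsulated
        # frame to the tunnel endpoint it came from. A gratuitous ARP from
        # the newly active GW therefore overwrites the stale entry that
        # still points at the failed GW.
        self.fib[(tenant, inner_src_mac)] = outer_src_ip

    def lookup(self, tenant: str, dst_mac: str) -> str | None:
        return self.fib.get((tenant, dst_mac))


def assign_round_robin(tenants: list[str], gateways: list[str]) -> dict[str, str]:
    """Crude load-balancing: give each tenant a different active NVE GW."""
    return {t: gateways[i % len(gateways)] for i, t in enumerate(tenants)}


GW_MAC = "00:00:5e:00:01:01"  # example VRRP-style virtual MAC shared by both GWs
nve = IngressNveTable()
nve.learn("tenant-a", GW_MAC, "192.0.2.1")  # active GW
nve.learn("tenant-a", GW_MAC, "192.0.2.2")  # backup GW announces itself
print(nve.lookup("tenant-a", GW_MAC))       # 192.0.2.2
```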
3.4.2.1. Triangular Routing Issues (Traffic Tromboning) | 3.4.2.1. Load-balancing | |||
When using active-active load-balancing across physically separate | ||||
NVE GW's (e.g.: two, separate chassis) an NVO3 solution SHOULD | ||||
support forwarding tables that can simultaneously map a single | ||||
egress NVE to more than one NVO3 tunnels. The granularity of such | ||||
mappings, in both active-backup and active-active, MUST be specific | ||||
to each tenant. | ||||
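The requirement that a single egress NVE map to more than one NVO3 tunnel, with tenant-specific granularity, could look roughly like the following sketch. It assumes per-flow hashing over inner header fields so that all packets of one flow stay on one tunnel. `MultiTunnelTable`, the CRC-32 hash, and the flow-tuple layout are illustrative choices, not mandated by the draft.

```python
from __future__ import annotations

import zlib


class MultiTunnelTable:
    """Hypothetical forwarding table for active-active NVE GWs.

    Key (tenant, egress NVE) keeps the mapping specific to each tenant;
    the value lists every NVO3 tunnel (outer destination) that reaches it.
    """

    def __init__(self) -> None:
        self.table: dict[tuple[str, str], list[str]] = {}

    def add_tunnel(self, tenant: str, egress_nve: str, tunnel: str) -> None:
        tunnels = self.table.setdefault((tenant, egress_nve), [])
        if tunnel not in tunnels:
            tunnels.append(tunnel)

    def select(self, tenant: str, egress_nve: str, flow: tuple) -> str:
        # Hash the inner flow fields so every packet of a flow uses the same
        # tunnel (no reordering) while different flows spread across GWs.
        tunnels = self.table.get((tenant, egress_nve))
        if not tunnels:
            raise KeyError((tenant, egress_nve))
        key = "|".join(map(str, flow)).encode()
        return tunnels[zlib.crc32(key) % len(tunnels)]


t = MultiTunnelTable()
t.add_tunnel("tenant-a", "ext-gw", "192.0.2.1")  # GW chassis 1
t.add_tunnel("tenant-a", "ext-gw", "192.0.2.2")  # GW chassis 2
flow = ("10.1.1.5", "203.0.113.9", 6, 40000, 443)  # inner 5-tuple
print(t.select("tenant-a", "ext-gw", flow))
```

Plain modulo hashing remaps many flows when a GW is added or removed; a consistent-hashing scheme would limit that churn at the cost of some extra state.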
3.4.2.2. Triangular Routing Issues (a.k.a. Traffic Tromboning) | ||||
L2/ELAN over NVO3 service may span multiple racks distributed across | L2/ELAN over NVO3 service may span multiple racks distributed across | |||
different DC regions. Multiple ELANs belonging to one tenant may be | different DC regions. Multiple ELANs belonging to one tenant may be | |||
interconnected or connected to the outside world through multiple | interconnected or connected to the outside world through multiple | |||
Router/VRF gateways distributed throughout the DC regions. In this | Router/VRF gateways distributed throughout the DC regions. In this | |||
scenario, without aid from an NVO3 or other type of solution, | scenario, without aid from an NVO3 or other type of solution, | |||
traffic from an ingress NVE destined to External gateways will take | traffic from an ingress NVE destined to External gateways will take | |||
a non-optimal path that will result in higher latency and costs, | a non-optimal path that will result in higher latency and costs, | |||
(since it is using more expensive resources of a WAN). In the case | (since it is using more expensive resources of a WAN). In the case | |||
of traffic from an IP/MPLS network destined toward the entrance to | of traffic from an IP/MPLS network destined toward the entrance to | |||
an NVO3 overlay, well-known IP routing techniques MAY be used to | an NVO3 overlay, well-known IP routing techniques MAY be used to | |||
optimize traffic into the NVO3 overlay, (at the expense of | optimize traffic into the NVO3 overlay, (at the expense of | |||
additional routes in the IP/MPLS network). In summary, these issues | additional routes in the IP/MPLS network). In summary, these issues | |||
are well known as triangular routing. | are well known as triangular routing. | |||
Procedures for gateway selection to avoid triangular routing issues | Procedures for gateway selection to avoid triangular routing issues | |||
SHOULD be provided. The details of such procedures are, most likely, | SHOULD be provided. | |||
part of the NVO3 Management and/or Control Plane requirements and, | ||||
thus, out of scope of this document. However, a key requirement on | The details of such procedures are, most likely, part of the NVO3 | |||
the dataplane of any NVO3 solution to avoid triangular routing is | Management and/or Control Plane requirements and, thus, out of scope | |||
stated above, in Section 3.4.2, with respect to active-active load- | of this document. However, a key requirement on the dataplane of any | |||
balancing. More specifically, an NVO3 solution SHOULD support | NVO3 solution to avoid triangular routing is stated above, in | |||
forwarding tables that can simultaneously map a single egress NVE to | Section 3.4.2, with respect to active-active load-balancing. More | |||
more than one NVO3 tunnels. The expectation is that, through the | specifically, an NVO3 solution SHOULD support forwarding tables that | |||
Control and/or Management Planes, this mapping information MAY be | can simultaneously map a single egress NVE to more than one NVO3 | |||
dynamically manipulated to, for example, provide the closest | tunnel. | |||
geographic and/or topological exit point (egress NVE) for each | ||||
ingress NVE. | The expectation is that, through the Control and/or Management | |||
Planes, this mapping information may be dynamically manipulated to, | ||||
for example, provide the closest geographic and/or topological exit | ||||
point (egress NVE) for each ingress NVE. | ||||
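One way the Control or Management Plane might use such a multi-tunnel mapping to avoid tromboning is to install, per ingress NVE, only the egress NVE GWs with the lowest cost. The sketch assumes a hypothetical per-ingress cost map (for example an IGP metric or measured RTT); `closest_exits`, its `tolerance` parameter, and the gateway names are illustrative.

```python
from __future__ import annotations


def closest_exits(costs: dict[str, float], tolerance: float = 0.0) -> list[str]:
    """Egress NVE GWs within `tolerance` of the cheapest one, as seen from a
    single ingress NVE. Near-equal exits can be installed together as that
    ingress NVE's multi-tunnel mapping."""
    if not costs:
        return []
    best = min(costs.values())
    return sorted(gw for gw, cost in costs.items() if cost <= best + tolerance)


# Hypothetical costs from an ingress NVE in the "east" DC region.
east_costs = {"gw-east-1": 10, "gw-east-2": 12, "gw-west-1": 80}
print(closest_exits(east_costs))               # ['gw-east-1']
print(closest_exits(east_costs, tolerance=2))  # ['gw-east-1', 'gw-east-2']
```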
3.5. Path MTU | 3.5. Path MTU | |||
The tunnel overlay header can cause the MTU of the path to the | The tunnel overlay header can cause the MTU of the path to the | |||
egress tunnel endpoint to be exceeded. | egress tunnel endpoint to be exceeded. | |||
IP fragmentation SHOULD be avoided for performance reasons. | IP fragmentation SHOULD be avoided for performance reasons. | |||
The interface MTU as seen by a Tenant System SHOULD be adjusted such | The interface MTU as seen by a Tenant System SHOULD be adjusted such | |||
that no fragmentation is needed. This can be achieved by | that no fragmentation is needed. This can be achieved by | |||
skipping to change at page 15, line 14 | skipping to change at page 15, line 30 | |||
o The underlay network MAY be designed in such a way that the MTU | o The underlay network MAY be designed in such a way that the MTU | |||
can accommodate the extra tunnel overhead. | can accommodate the extra tunnel overhead. | |||
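The two options above, lowering the MTU seen by the Tenant System or enlarging the underlay MTU, are two sides of the same arithmetic. This sketch assumes a VXLAN-style encapsulation (outer IP header, 8-byte UDP header, 8-byte overlay header, plus the tenant's 14-byte Ethernet header carried inside the tunnel); the draft does not fix these sizes, so the constants are examples only.

```python
# Example assumption: VXLAN-style encapsulation. Sizes in bytes; the draft
# does not define these values.
OUTER_IPV4 = 20
OUTER_IPV6 = 40
OUTER_UDP = 8
OVERLAY_HDR = 8   # e.g. VXLAN header
INNER_ETH = 14    # untagged tenant Ethernet header carried in the tunnel


def tunnel_overhead(ipv6_underlay: bool = False) -> int:
    """Bytes added on top of the tenant's IP packet (the outer Ethernet
    header is not counted because it is not part of the underlay IP MTU)."""
    outer_ip = OUTER_IPV6 if ipv6_underlay else OUTER_IPV4
    return outer_ip + OUTER_UDP + OVERLAY_HDR + INNER_ETH


def max_tenant_mtu(underlay_mtu: int, ipv6_underlay: bool = False) -> int:
    """Option 1: lower the MTU seen by the Tenant System."""
    return underlay_mtu - tunnel_overhead(ipv6_underlay)


def min_underlay_mtu(tenant_mtu: int, ipv6_underlay: bool = False) -> int:
    """Option 2: raise the underlay MTU so tenants keep their MTU."""
    return tenant_mtu + tunnel_overhead(ipv6_underlay)


print(max_tenant_mtu(1500))                      # 1450
print(max_tenant_mtu(1500, ipv6_underlay=True))  # 1430
print(min_underlay_mtu(1500))                    # 1550
```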
3.6. Hierarchical NVE | 3.6. Hierarchical NVE | |||
It might be desirable to support the concept of hierarchical NVEs, | It might be desirable to support the concept of hierarchical NVEs, | |||
such as spoke NVEs and hub NVEs, in order to address possible NVE | such as spoke NVEs and hub NVEs, in order to address possible NVE | |||
performance limitations and service connectivity optimizations. | performance limitations and service connectivity optimizations. | |||
For instance, spoke NVE functionality MAY be used when processing | For instance, spoke NVE functionality may be used when processing | |||
capabilities are limited. A hub NVE would provide additional data | capabilities are limited. A hub NVE would provide additional data | |||
processing capabilities such as packet replication. | processing capabilities such as packet replication. | |||
NVEs can be either connected in an any-to-any or hub and spoke | NVEs can be either connected in an any-to-any or hub and spoke | |||
topology on a per VNI basis. | topology on a per VNI basis. | |||
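The benefit of a hub NVE for a spoke with limited processing capacity can be illustrated with broadcast, unknown-unicast, and multicast (BUM) replication. This is a sketch under assumed names (`hub_replicate`, `copies_sent_by_ingress`) and an assumed per-VNI membership map: the hub fans a frame out to every other spoke in the VNI, while the spoke sends only one copy.

```python
from __future__ import annotations


def hub_replicate(members: dict[int, set[str]], vni: int, from_spoke: str) -> list[str]:
    """Hub NVE: copy one BUM frame from a spoke to every other spoke NVE in
    the same VNI, never back to the sender."""
    return sorted(nve for nve in members.get(vni, set()) if nve != from_spoke)


def copies_sent_by_ingress(n_members: int, hub_and_spoke: bool) -> int:
    """Copies an ingress NVE must emit for one BUM frame in a VNI with
    n_members NVEs: any-to-any needs ingress replication to every other
    member; hub-and-spoke needs a single copy to the hub."""
    return 1 if hub_and_spoke else n_members - 1


vni_members = {5001: {"nve-1", "nve-2", "nve-3", "nve-4"}}  # hypothetical VNI
print(hub_replicate(vni_members, 5001, "nve-1"))       # ['nve-2', 'nve-3', 'nve-4']
print(copies_sent_by_ingress(4, hub_and_spoke=False))  # 3
print(copies_sent_by_ingress(4, hub_and_spoke=True))   # 1
```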
3.7. NVE Multi-Homing Requirements | 3.7. NVE Multi-Homing Requirements | |||
Multi-homing techniques SHOULD be used to increase the reliability | Multi-homing techniques SHOULD be used to increase the reliability | |||
of an nvo3 network. It is also important to ensure that physical | of an nvo3 network. It is also important to ensure that physical | |||
skipping to change at page 16, line 5 | skipping to change at page 16, line 18 | |||
system is co-located with an NVE, IP routing can be relied upon to | system is co-located with an NVE, IP routing can be relied upon to | |||
handle routing over diverse links to TORs. | handle routing over diverse links to TORs. | |||
External connectivity MAY be handled by two or more nvo3 gateways. | External connectivity MAY be handled by two or more nvo3 gateways. | |||
Each gateway is connected to a different domain (e.g. ISP) and runs | Each gateway is connected to a different domain (e.g. ISP) and runs | |||
BGP multi-homing. They serve as an access point to external networks | BGP multi-homing. They serve as an access point to external networks | |||
such as VPNs or the Internet. When a connection to an upstream | such as VPNs or the Internet. When a connection to an upstream | |||
router is lost, the alternative connection is used and the failed | router is lost, the alternative connection is used and the failed | |||
route withdrawn. | route withdrawn. | |||
3.8. OAM | 3.8. Other considerations | |||
NVE MAY be able to originate/terminate OAM messages for connectivity | ||||
verification, performance monitoring, statistic gathering and fault | ||||
isolation. Depending on configuration, NVEs SHOULD be able to | ||||
process or transparently tunnel OAM messages, as well as supporting | ||||
alarm propagation capabilities. | ||||
Given the critical requirement to load-balance NVO3 encapsulated | ||||
packets over LAG and ECMP paths, it will be equally critical to | ||||
ensure existing and/or new OAM tools allow NVE administrators to | ||||
proactively and/or reactively monitor the health of various | ||||
component-links that comprise both LAG and ECMP paths carrying NVO3 | ||||
encapsulated packets. For example, it will be important that such | ||||
OAM tools allow NVE administrators to reveal the set of underlying | ||||
network hops (topology) in order that the underlying network | ||||
administrators can use this information to quickly perform fault | ||||
isolation and restore the underlying network. | ||||
The NVE MUST provide the ability to reveal the set of ECMP and/or | ||||
LAG paths used by NVO3 encapsulated packets in the underlying | ||||
network from an ingress NVE to egress NVE. The NVE MUST provide the | ||||
ability to provide a "ping"-like functionality that can be used to | ||||
determine the health (liveness) of remote NVE's or their VNI's. The | ||||
NVE SHOULD provide a "ping"-like functionality to more expeditiously | ||||
aid in troubleshooting performance problems, i.e.: blackholing or | ||||
other types of congestion occurring in the underlying network, for | ||||
NVO3 encapsulated packets carried over LAG and/or ECMP paths. | ||||
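A common way to make OAM probes follow each ECMP member is to vary the field the underlay hashes on, for example the outer UDP source port when the encapsulation carries flow entropy there. The sketch below simulates this under stated assumptions: `ecmp_member` is a stand-in for a router's hash (real hashes are vendor-specific and not visible to the NVE), and UDP port 4789 (the VXLAN port) is used only as an example.

```python
from __future__ import annotations

import zlib


def ecmp_member(outer: tuple, n_paths: int) -> int:
    """Stand-in for an underlay router's ECMP hash over the outer headers.
    Real hash functions are vendor-specific and not visible to the NVE."""
    return zlib.crc32("|".join(map(str, outer)).encode()) % n_paths


def ports_covering_paths(src_ip: str, dst_ip: str, dst_port: int,
                         n_paths: int) -> dict[int, int]:
    """Sweep the outer UDP source port (the entropy field) until every ECMP
    member has been hit; returns {path index: source port to probe with}."""
    found: dict[int, int] = {}
    for sport in range(49152, 65536):  # dynamic/private port range
        path = ecmp_member((src_ip, dst_ip, 17, sport, dst_port), n_paths)
        found.setdefault(path, sport)
        if len(found) == n_paths:
            break
    return found


# 4789 is the VXLAN port, used here only as an example destination port.
probes = ports_covering_paths("192.0.2.1", "198.51.100.7", 4789, n_paths=4)
for path, sport in sorted(probes.items()):
    print(f"path {path}: probe with UDP source port {sport}")
```

Because the NVE cannot compute the underlay hash, a real tool would more likely send one "ping"-like probe per candidate source port toward the egress NVE, possibly with increasing outer TTL, and infer each member path from the responses.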
3.9. Other considerations | ||||
3.9.1. Data Plane Optimizations | 3.8.1. Data Plane Optimizations | |||
Data plane forwarding and encapsulation choices SHOULD consider the | Data plane forwarding and encapsulation choices SHOULD consider the | |||
limitation of possible NVE implementations, specifically in software | limitation of possible NVE implementations, specifically in software | |||
based implementations (e.g. servers running VSwitches) | based implementations (e.g. servers running VSwitches) | |||
NVE SHOULD provide efficient processing of traffic. For instance, | NVE SHOULD provide efficient processing of traffic. For instance, | |||
packet alignment, the use of offsets to minimize header parsing, | packet alignment, the use of offsets to minimize header parsing, | |||
padding techniques SHOULD be considered when designing NVO3 | padding techniques SHOULD be considered when designing NVO3 | |||
encapsulation types. | encapsulation types. | |||
The NVO3 encapsulation/decapsulation processing in software-based | The NVO3 encapsulation/decapsulation processing in software-based | |||
NVEs SHOULD make use of hardware assist provided by NICs in order to | NVEs SHOULD make use of hardware assist provided by NICs in order to | |||
speed up packet processing. | speed up packet processing. | |||
3.9.2. NVE location trade-offs | 3.8.2. NVE location trade-offs | |||
In the case of DC traffic, traffic originated from a VM is native | In the case of DC traffic, traffic originated from a VM is native | |||
Ethernet traffic. This traffic can be switched by a local VM switch | Ethernet traffic. This traffic can be switched by a local VM switch | |||
or ToR switch and then by a DC gateway. The NVE function can be | or ToR switch and then by a DC gateway. The NVE function can be | |||
embedded within any of these elements. | embedded within any of these elements. | |||
The NVE function can be supported in various DC network elements | The NVE function can be supported in various DC network elements | |||
such as a VM, VM switch, ToR switch or DC GW. | such as a VM, VM switch, ToR switch or DC GW. | |||
The following criteria SHOULD be considered when deciding where the | The following criteria SHOULD be considered when deciding where the | |||
skipping to change at page 19, line 10 | skipping to change at page 18, line 41 | |||
[RFC6391] Bryant, S. et al, "Flow-Aware Transport of Pseudowires | [RFC6391] Bryant, S. et al, "Flow-Aware Transport of Pseudowires | |||
over an MPLS Packet Switched Network", RFC6391, November | over an MPLS Packet Switched Network", RFC6391, November | |||
2011 | 2011 | |||
7. Acknowledgments | 7. Acknowledgments | |||
In addition to the authors the following people have contributed to | In addition to the authors the following people have contributed to | |||
this document: | this document: | |||
Shane Amante, Level3 | Shane Amante, Dimitrios Stiliadis, Rotem Salomonovitch, Larry | |||
Kreeger, and Eric Gray. | ||||
Dimitrios Stiliadis, Rotem Salomonovitch, Alcatel-Lucent | ||||
Larry Kreeger, Cisco | ||||
This document was prepared using 2-Word-v2.0.template.dot. | This document was prepared using 2-Word-v2.0.template.dot. | |||
Authors' Addresses | Authors' Addresses | |||
Nabil Bitar | Nabil Bitar | |||
Verizon | Verizon | |||
40 Sylvan Road | 40 Sylvan Road | |||
Waltham, MA 02145 | Waltham, MA 02145 | |||
Email: nabil.bitar@verizon.com | Email: nabil.bitar@verizon.com | |||
End of changes. 31 change blocks. | ||||
165 lines changed or deleted | 140 lines changed or added | |||