MBONED                                                       M. McBride
Internet-Draft                                                Futurewei
Intended status: Informational                              O. Komolafe
Expires: January 24, 2020                               Arista Networks
                                                          July 23, 2019

                 Multicast in the Data Center Overview
                    draft-ietf-mboned-dc-deploy-07
Abstract

The volume and importance of one-to-many traffic patterns in data centers is likely to increase significantly in the future. Reasons for this increase are discussed and then attention is paid to the manner in which this traffic pattern may be judiciously handled in data centers. The intuitive solution of deploying conventional IP multicast within data centers is explored and evaluated. Thereafter, a number of emerging innovative approaches are described before a
skipping to change at page 1, line 38
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 24, 2020.
Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents
skipping to change at page 2, line 16
the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Requirements Language . . . . . . . . . . . . . . . . . .   3
   2.  Reasons for increasing one-to-many traffic patterns . . . . .   3
     2.1.  Applications  . . . . . . . . . . . . . . . . . . . . . .   3
     2.2.  Overlays  . . . . . . . . . . . . . . . . . . . . . . . .   5
     2.3.  Protocols . . . . . . . . . . . . . . . . . . . . . . . .   6
     2.4.  Summary . . . . . . . . . . . . . . . . . . . . . . . . .   6
   3.  Handling one-to-many traffic using conventional multicast  . .   7
     3.1.  Layer 3 multicast . . . . . . . . . . . . . . . . . . . .   7
     3.2.  Layer 2 multicast . . . . . . . . . . . . . . . . . . . .   7
     3.3.  Example use cases . . . . . . . . . . . . . . . . . . . .   9
     3.4.  Advantages and disadvantages  . . . . . . . . . . . . . .   9
   4.  Alternative options for handling one-to-many traffic . . . . .  10
     4.1.  Minimizing traffic volumes  . . . . . . . . . . . . . . .  11
     4.2.  Head end replication  . . . . . . . . . . . . . . . . . .  12
     4.3.  Programmable Forwarding Planes  . . . . . . . . . . . . .  12
     4.4.  BIER  . . . . . . . . . . . . . . . . . . . . . . . . . .  13
     4.5.  Segment Routing . . . . . . . . . . . . . . . . . . . . .  14
   5.  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . .  15
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  15
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  15
   8.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  15
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  15
     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  15
     9.2.  Informative References  . . . . . . . . . . . . . . . . .  16
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  18
1. Introduction

The volume and importance of one-to-many traffic patterns in data centers is likely to increase significantly in the future. Reasons for this increase include the nature of the traffic generated by applications hosted in the data center, the need to handle broadcast, unknown unicast and multicast (BUM) traffic within the overlay technologies used to support multi-tenancy at scale, and the use of certain protocols that traditionally require one-to-many control message exchanges.

These trends, allied with the expectation that future highly virtualized large-scale data centers must support communication between potentially thousands of participants, may lead to the natural assumption that IP multicast will be widely used in data centers, specifically given the bandwidth savings it potentially offers. However, such an assumption would be wrong. In fact, there is widespread reluctance to enable conventional IP multicast in data centers for a number of reasons, mostly pertaining to concerns about its scalability and reliability.
This draft discusses some of the main drivers for the increasing volume and importance of one-to-many traffic patterns in data centers. Thereafter, the manner in which conventional IP multicast may be used to handle this traffic pattern is discussed and some of the associated challenges highlighted. Following this discussion, a number of alternative emerging approaches are introduced, before concluding by discussing key trends and making a number of recommendations.
skipping to change at page 4, line 11
requirement for robustness, stability and predictability has meant the TV broadcast industry has traditionally used TV-specific protocols, infrastructure and technologies for transmitting video signals between end points such as cameras, monitors, mixers, graphics devices and video servers. However, the growing cost and complexity of supporting this approach, especially as the bit rates of the video signals increase due to demand for formats such as 4K-UHD and 8K-UHD, means there is a consensus that the TV broadcast industry will transition from industry-specific transmission formats (e.g. SDI, HD-SDI) over TV-specific infrastructure to using IP-based infrastructure. The development of pertinent standards by the Society of Motion Picture and Television Engineers (SMPTE) [SMPTE2110], along with the increasing performance of IP routers, means this transition is gathering pace. A possible outcome of this transition will be the building of IP data centers in broadcast plants. Traffic flows in the broadcast industry are frequently one-to-many and so if IP data centers are deployed in broadcast plants, it is imperative that this traffic pattern is supported efficiently in that infrastructure. In fact, a pivotal consideration for broadcasters considering transitioning to IP is the manner in which these one-to-many traffic flows will be managed and monitored in a data center with an IP fabric.
One of the few success stories in using conventional IP multicast has been for disseminating market trading data. For example, IP multicast is commonly used today to deliver stock quotes from stock exchanges to financial service providers and then to the stock analysts or brokerages. It is essential that the network infrastructure delivers very low latency and high throughput, especially given the proliferation of automated and algorithmic trading, which means stock analysts or brokerages may gain an edge on competitors simply by receiving an update a few milliseconds earlier. As would be expected, in such deployments reliability is critical. The network must be designed with no single point of failure and in such a way that it can respond in a deterministic manner to failure. Typically, redundant servers (in a primary/backup or live-live mode) send multicast streams into the network, with diverse paths being used across the network. The stock exchange generating the one-to-many traffic and the stock analysts/brokerages that receive the traffic will typically have their own data centers. Therefore, the manner in which one-to-many traffic patterns are handled in these data centers is extremely important, especially given the requirements and constraints mentioned.
Another reason for the growing volume of one-to-many traffic patterns in modern data centers is the increasing adoption of streaming telemetry. This transition is motivated by the observation that traditional poll-based approaches for monitoring network devices are usually inadequate in modern data centers. These approaches typically suffer from poor scalability, extensibility and responsiveness. In contrast, in streaming telemetry, network devices in the data center stream highly-granular real-time updates to a telemetry collector/database. This collector then collates, normalizes and encodes this data for convenient consumption by monitoring applications. The monitoring applications can subscribe to the notifications of interest, allowing them to gain insight into pertinent state and performance metrics. Thus, the traffic flows associated with streaming telemetry are typically many-to-one between the network devices and the telemetry collector and then one-to-many from the collector to the monitoring applications.

The use of publish and subscribe applications is growing within data centers, contributing to the rising volume of one-to-many traffic flows. Such applications are attractive as they provide a robust low-latency asynchronous messaging service, allowing senders to be decoupled from receivers. The usual approach is for a publisher to create and transmit a message to a specific topic. The publish and subscribe application will retain the message and ensure it is delivered to all subscribers to that topic. The flexibility in the number of publishers and subscribers to a specific topic means such applications cater for one-to-one, one-to-many and many-to-one traffic patterns.
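To make the one-to-many fan-out inherent in this pattern concrete, the following minimal Python sketch (purely illustrative; the class and topic names are hypothetical and do not refer to any particular messaging product) shows a single publish operation being delivered to every subscriber of a topic.

   # Minimal, purely illustrative sketch of topic-based publish/subscribe
   # fan-out: one published message is delivered to every subscriber of
   # the topic, i.e. a one-to-many traffic pattern.
   from collections import defaultdict

   class Broker:
       def __init__(self):
           self.subscribers = defaultdict(list)   # topic -> list of callbacks

       def subscribe(self, topic, callback):
           self.subscribers[topic].append(callback)

       def publish(self, topic, message):
           # A single publish operation fans out to N subscribers.
           for deliver in self.subscribers[topic]:
               deliver(message)

   broker = Broker()
   for i in range(3):
       broker.subscribe("market-data", lambda m, i=i: print(f"subscriber {i}: {m}"))
   broker.publish("market-data", "quote: XYZ 101.42")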
2.2. Overlays

Another key contributor to the rise in one-to-many traffic patterns is the proposed architecture for supporting large-scale multi-tenancy in highly virtualized data centers [RFC8014]. In this architecture, a tenant's VMs are distributed across the data center and are connected by a virtual network known as the overlay network. A number of different technologies have been proposed for realizing the overlay network, including VXLAN [RFC7348], VXLAN-GPE [I-D.ietf-nvo3-vxlan-gpe], NVGRE [RFC7637] and GENEVE [I-D.ietf-nvo3-geneve]. The often fervent and arguably partisan debate about the relative merits of these overlay technologies belies the fact that, conceptually, these overlays essentially provide a means to encapsulate and tunnel Ethernet frames from the VMs over the data center IP fabric, thus emulating a Layer 2 segment between the VMs. Consequently, the VMs believe and behave as if they are connected to the tenant's other VMs by a conventional Layer 2 segment, regardless of their physical location within the data center.

Naturally, in a Layer 2 segment, point-to-multipoint traffic can result from handling BUM (broadcast, unknown unicast and multicast) traffic. And, compounding this issue within data centers, since the tenant's VMs attached to the emulated segment may be dispersed throughout the data center, the BUM traffic may need to traverse the data center fabric.

Hence, regardless of the overlay technology used, due consideration must be given to handling BUM traffic, forcing the data center operator to pay attention to the manner in which one-to-many communication is handled within the data center. And this consideration is likely to become increasingly important with the anticipated rise in the number and importance of overlays. In fact, it may be asserted that the manner in which one-to-many communications arising from overlays is handled is pivotal to the performance and stability of the entire data center network.
2.3. Protocols

Conventionally, some key networking protocols used in data centers require one-to-many communications for control messages. Thus, the data center operator must pay due attention to how these control message exchanges are supported.

For example, ARP [RFC0826] and ND [RFC4861] use broadcast and multicast messages within IPv4 and IPv6 networks respectively to discover IP address to MAC address mappings. Furthermore, when these protocols are running within an overlay network, it is essential to ensure the messages are delivered to all the hosts on the emulated Layer 2 segment, regardless of physical location within the data center. The challenges associated with optimally delivering ARP and ND messages in data centers have attracted lots of attention [RFC6820].

Another example of a protocol that may necessitate one-to-many traffic flows in the data center is IGMP [RFC2236], [RFC3376]. If the VMs attached to the Layer 2 segment wish to join a multicast group, they must send IGMP reports in response to queries from the querier. As these devices could be located at different locations within the data center, there is the somewhat ironic prospect of IGMP itself leading to an increase in the volume of one-to-many communications in the data center.
2.4. Summary

Section 2.1, Section 2.2 and Section 2.3 have discussed how the trends in the types of applications, the overlay technologies used and some of the essential networking protocols result in an increase in the volume of one-to-many traffic patterns in modern highly-virtualized data centers. Section 3 explores how such traffic flows may be handled using conventional IP multicast.
3. Handling one-to-many traffic using conventional multicast

Faced with ever increasing volumes of one-to-many traffic flows for the reasons presented in Section 2, arguably the intuitive initial course of action for a data center operator is to explore if and how conventional IP multicast could be deployed within the data center. This section introduces the key protocols, discusses some example use cases where they are deployed in data centers and highlights some of the advantages and disadvantages of such deployments.
3.1. Layer 3 multicast

PIM is the most widely deployed multicast routing protocol and so, unsurprisingly, is the primary multicast routing protocol considered for use in the data center. There are three potential popular modes of PIM that may be used: PIM-SM [RFC4601], PIM-SSM [RFC4607] or PIM-BIDIR [RFC5015]. It may be said that these different modes of PIM trade off the optimality of the multicast forwarding tree for the amount of multicast forwarding state that must be maintained at routers. SSM provides the most efficient forwarding between sources
skipping to change at page 8, line 5
With IPv4 unicast address resolution, the translation of an IP address to a MAC address is done dynamically by ARP. With multicast address resolution, the mapping from a multicast IPv4 address to a multicast MAC address is done by assigning the low-order 23 bits of the multicast IPv4 address to fill the low-order 23 bits of the multicast MAC address. Each IPv4 multicast address has 28 unique bits (the multicast address range is 224.0.0.0/4) therefore mapping a multicast IP address to a MAC address ignores 5 bits of the IP address. Hence, groups of 32 multicast IP addresses are mapped to the same MAC address. And so a multicast MAC address cannot be uniquely mapped to a multicast IPv4 address. Therefore, IPv4 multicast addresses must be chosen judiciously in order to avoid unnecessary address aliasing. When sending IPv6 multicast packets on an Ethernet link, the corresponding destination MAC address is a direct mapping of the last 32 bits of the 128 bit IPv6 multicast address into the 48 bit MAC address. It is possible for more than one IPv6 multicast address to map to the same 48 bit MAC address.
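The following short Python sketch (illustrative only) makes the two mappings concrete, using the well-known 01:00:5e prefix for IPv4 multicast MAC addresses and the 33:33 prefix for IPv6, and shows the resulting address aliasing for IPv4.

   # Illustrative sketch of the IP multicast to Ethernet MAC mappings
   # described above.
   import ipaddress

   def ipv4_multicast_mac(addr: str) -> str:
       # 01:00:5e prefix plus the low-order 23 bits of the IPv4 address.
       ip = int(ipaddress.IPv4Address(addr))
       low23 = ip & 0x7FFFFF
       octets = [0x01, 0x00, 0x5E,
                 (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF]
       return ":".join(f"{o:02x}" for o in octets)

   def ipv6_multicast_mac(addr: str) -> str:
       # 33:33 prefix plus the low-order 32 bits of the IPv6 address.
       ip = int(ipaddress.IPv6Address(addr))
       low32 = ip & 0xFFFFFFFF
       octets = [0x33, 0x33] + [(low32 >> s) & 0xFF for s in (24, 16, 8, 0)]
       return ":".join(f"{o:02x}" for o in octets)

   # 224.1.1.1 and 239.129.1.1 differ only in the 5 ignored bits, so they
   # alias to the same multicast MAC address.
   assert ipv4_multicast_mac("224.1.1.1") == ipv4_multicast_mac("239.129.1.1")
   print(ipv4_multicast_mac("224.1.1.1"))   # 01:00:5e:01:01:01
   print(ipv6_multicast_mac("ff02::1:3"))   # 33:33:00:01:00:03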
The default behaviour of many hosts (and, in fact, routers) is to block multicast traffic. Consequently, when a host wishes to join an IPv4 multicast group, it sends an IGMP [RFC2236], [RFC3376] report to the router attached to the Layer 2 segment and also instructs its data link layer to receive Ethernet frames that match the corresponding MAC address. The data link layer filters the frames, passing those with matching destination addresses to the IP module. Similarly, hosts simply hand the multicast packet for transmission to the data link layer which would add the Layer 2 encapsulation, using the MAC address derived in the manner previously discussed.

When this Ethernet frame with a multicast MAC address is received by a switch configured to forward multicast traffic, the default behaviour is to flood it to all the ports in the Layer 2 segment. Clearly there may not be a receiver for this multicast group present on each port and IGMP snooping is used to avoid sending the frame out of ports without receivers.
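The effect of IGMP snooping on the flooding decision can be sketched as follows (a minimal, purely illustrative Python sketch with hypothetical names; the snooping behaviour itself is described further in the next paragraph).

   # Minimal sketch of how IGMP snooping constrains flooding: the switch
   # learns, from IGMP reports, which ports have receivers for each group
   # and forwards only to those ports (plus ports towards multicast routers).
   class SnoopingSwitch:
       def __init__(self, ports, router_ports):
           self.ports = set(ports)
           self.router_ports = set(router_ports)
           self.group_ports = {}                 # group -> set of receiver ports

       def on_igmp_report(self, group, port):
           self.group_ports.setdefault(group, set()).add(port)

       def forward_multicast(self, group, ingress_port):
           # Without snooping state the frame would be flooded to all ports.
           egress = self.group_ports.get(group, self.ports) | self.router_ports
           return sorted(egress - {ingress_port})

   sw = SnoopingSwitch(ports=[1, 2, 3, 4], router_ports=[4])
   sw.on_igmp_report("239.1.1.1", 2)
   print(sw.forward_multicast("239.1.1.1", ingress_port=1))   # [2, 4]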
A switch running IGMP snooping listens to the IGMP messages exchanged between hosts and the router in order to identify which ports have active receivers for a specific multicast group, allowing the forwarding of multicast frames to be suitably constrained. Normally, the multicast router will generate IGMP queries to which the hosts send IGMP reports in response. However, a number of optimizations in
skipping to change at page 9, line 46
associated with the VXLAN interface.

Another use case of PIM and IGMP in data centers is when IPTV servers use multicast to deliver content from the data center to end users. IPTV is typically a one-to-many application where the hosts are configured for IGMPv3, the switches are configured with IGMP snooping, and the routers are running PIM-SSM mode. Often redundant servers send multicast streams into the network and the network forwards the data across diverse paths.
Windows Media servers send multicast streams to clients. Windows Media Services streams to an IP multicast address and all clients subscribe to that IP address to receive the same stream. This allows a single stream to be played simultaneously by multiple clients, thus reducing bandwidth utilization.
3.4. Advantages and disadvantages

Arguably the biggest advantage of using PIM and IGMP to support one-to-many communication in data centers is that these protocols are relatively mature. Consequently, PIM is available in most routers and IGMP is supported by most hosts and routers. As such, no specialized hardware or relatively immature software is involved in using these protocols in data centers. Furthermore, the maturity of these protocols means their behaviour and performance in operational networks is well-understood, with widely available best-practices and deployment guides for optimizing their performance. For these reasons, PIM and IGMP have been used successfully for supporting one-to-many traffic flows within modern data centers, as discussed earlier.

However, somewhat ironically, the relative disadvantages of PIM and IGMP usage in data centers also stem mostly from their maturity. Specifically, these protocols were standardized and implemented long before the highly-virtualized multi-tenant data centers of today existed. Consequently, PIM and IGMP are neither optimally placed to deal with the requirements of one-to-many communication in modern data centers nor to exploit idiosyncrasies of data centers. For example, there may be thousands of VMs participating in a multicast session, with some of these VMs migrating to servers within the data center, new VMs being continually spun up and wishing to join the sessions while all the time other VMs are leaving. In such a scenario, the churn in the PIM and IGMP state machines, the volume of control messages they would generate and the amount of state they would necessitate within routers, especially if they were deployed naively, would be untenable. Furthermore, PIM is a relatively complex protocol. As such, PIM can be challenging to debug even in significantly more benign deployments than those envisaged for future data centers, a fact that has evidently had a dissuasive effect on data center operators considering enabling it within the IP fabric.
4. Alternative options for handling one-to-many traffic

Section 2 has shown that there is likely to be an increasing amount of one-to-many communications in data centers for multiple reasons. And Section 3 has discussed how conventional multicast may be used to handle this traffic, presenting some of the associated advantages and disadvantages. Unsurprisingly, as discussed in the remainder of Section 4, there are a number of alternative options for handling this traffic pattern in data centers. Critically, it should be noted that many of these techniques are not mutually exclusive; in fact many deployments involve a combination of more than one of these techniques. Furthermore, as will be shown, introducing a centralized controller or a distributed control plane typically makes these techniques more potent.
4.1. Minimizing traffic volumes

If handling one-to-many traffic flows in data centers is considered onerous, then arguably the most intuitive solution is to aim to minimize the volume of said traffic.

It was previously mentioned in Section 2 that the three main contributors to one-to-many traffic in data centers are applications, overlays and protocols. Typically the applications running on VMs are outside the control of the data center operator and thus, relatively speaking, little can be done about the volume of one-to-many traffic generated by applications. Fortunately, there is more scope for attempting to reduce the volume of such traffic generated by overlays and protocols (and often by protocols within overlays). This reduction is possible by exploiting certain characteristics of data center networks such as a fixed and regular topology, single administrative control, consistent hardware and software, well-known overlay encapsulation endpoints and systematic IP address allocation.
A way of minimizing the amount of one-to-many traffic that traverses the data center fabric is to use a centralized controller. For example, whenever a new VM is instantiated, the hypervisor or encapsulation endpoint can notify a centralized controller of this new MAC address, the associated virtual network, IP address etc. The controller could subsequently distribute this information to every encapsulation endpoint. Consequently, when any endpoint receives an ARP request from a locally attached VM, it could simply consult its local copy of the information distributed by the controller and reply. Thus, the ARP request is suppressed and does not result in one-to-many traffic traversing the data center IP fabric.
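A minimal Python sketch of this suppression logic is given below (hypothetical class and method names, for illustration only): the encapsulation endpoint answers ARP requests from local VMs out of a table pushed by the controller, so the request never traverses the fabric.

   # Minimal sketch of controller-driven ARP suppression at an
   # encapsulation endpoint.
   class EncapEndpoint:
       def __init__(self):
           self.arp_table = {}                      # IP address -> MAC address

       def on_controller_update(self, ip, mac):
           # Pushed by the central controller whenever a VM is instantiated.
           self.arp_table[ip] = mac

       def on_local_arp_request(self, target_ip):
           mac = self.arp_table.get(target_ip)
           if mac is not None:
               return f"ARP reply: {target_ip} is-at {mac}"   # answered locally
           return "flood request into the overlay"            # fall back to BUM handling

   vtep = EncapEndpoint()
   vtep.on_controller_update("10.0.0.5", "52:54:00:12:34:56")
   print(vtep.on_local_arp_request("10.0.0.5"))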
Alternatively, the functionality supported by the controller can be realized by a distributed control plane. BGP-EVPN [RFC7432, RFC8365] is the most popular control plane used in data centers. Typically, the encapsulation endpoints will exchange pertinent information with each other by all peering with a BGP route reflector (RR). Thus, information such as local MAC addresses, MAC to IP address mappings, virtual network identifiers, IP prefixes, and local IGMP group membership can be disseminated. Consequently, for example, ARP requests from local VMs can be suppressed by the encapsulation endpoint using the information learnt from the control plane about the MAC to IP mappings at remote peers. In a similar fashion, encapsulation endpoints can use information gleaned from the BGP-EVPN messages to proxy for both IGMP reports and queries for the attached VMs, thus obviating the need to transmit IGMP messages across the data center fabric.
4.2. Head end replication

A popular option for handling one-to-many traffic patterns in data centers is head end replication (HER). HER means the traffic is duplicated and sent to each end point individually using conventional IP unicast. Obvious disadvantages of HER include traffic duplication and the additional processing burden on the head end. Nevertheless, HER is especially attractive when overlays are in use as the replication can be carried out by the hypervisor or encapsulation end point. Consequently, the VMs and IP fabric are unmodified and unaware of how the traffic is delivered to the multiple end points. Additionally, it is possible to use a number of approaches for constructing and disseminating the list of which endpoints should receive what traffic and so on.
For example, the reluctance of data center operators to enable PIM within the data center fabric means VXLAN is often used with HER. Thus, BUM traffic from each VNI is replicated and sent using unicast to remote VTEPs with VMs in that VNI. The list of remote VTEPs to which the traffic should be sent may be configured manually on the VTEP. Alternatively, the VTEPs may transmit pertinent local state to a centralized controller which in turn sends each VTEP the list of remote VTEPs for each VNI. Lastly, HER also works well when a distributed control plane is used instead of the centralized controller. Again, BGP-EVPN may be used to distribute the information needed to facilitate HER to the VTEPs.
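The replication step itself is simple, as the following minimal Python sketch illustrates (the function name, VNI value and VTEP addresses are made up for illustration; the per-VNI flood list may come from static configuration, a controller or BGP-EVPN, as described above).

   # Minimal sketch of head end replication (HER) at a VTEP: a single BUM
   # frame from a local VM is encapsulated and unicast once per remote
   # VTEP that has VMs in the same VNI.
   def head_end_replicate(bum_frame, vni, flood_list, send_unicast):
       """flood_list: dict mapping VNI -> list of remote VTEP IP addresses."""
       for remote_vtep in flood_list.get(vni, []):
           # One unicast copy of the encapsulated frame per remote VTEP.
           send_unicast(remote_vtep, vni, bum_frame)

   flood_list = {10020: ["192.0.2.11", "192.0.2.12", "192.0.2.13"]}
   head_end_replicate(b"\xff\xff...", 10020, flood_list,
                      lambda vtep, vni, f: print(f"unicast VXLAN to {vtep}, VNI {vni}"))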
4.3. Programmable Forwarding Planes

As discussed in Section 3, one of the main functions of PIM is to build and maintain multicast distribution trees. Such a tree indicates the path a specific flow will take through the network. Thus, in routers traversed by the flow, the information from PIM is ultimately used to create a multicast forwarding entry for the specific flow and insert it into the multicast forwarding table. The multicast forwarding table will have entries for each multicast flow traversing the router, with the lookup key usually being a concatenation of the source and group addresses. Critically, each entry will contain information such as the legal input interface for the flow and a list of output interfaces to which matching packets should be replicated.
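A minimal sketch of such forwarding state is shown below (purely illustrative; the names, addresses and interfaces are hypothetical): a table keyed on (source, group) whose entries carry the expected incoming interface and the list of outgoing interfaces to replicate to. A controller could install such entries directly, without PIM.

   # Minimal sketch of per-flow multicast forwarding state keyed on (S,G).
   from dataclasses import dataclass, field

   @dataclass
   class MfibEntry:
       iif: str                                   # expected (RPF) input interface
       oifs: list = field(default_factory=list)   # interfaces to replicate to

   mfib = {("10.1.1.1", "232.1.1.1"): MfibEntry(iif="eth0", oifs=["eth1", "eth3"])}

   def forward(src, grp, in_if):
       entry = mfib.get((src, grp))
       if entry is None or in_if != entry.iif:
           return []                              # drop: no state or wrong input interface
       return entry.oifs                          # replicate out of these interfaces

   print(forward("10.1.1.1", "232.1.1.1", "eth0"))    # ['eth1', 'eth3']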
Viewed in this way, there is nothing remarkable about the multicast forwarding state constructed in routers based on the information gleaned from PIM. And, in fact, it is perfectly feasible to build such state in the absence of PIM. Such prospects have been significantly enhanced with the increasing popularity and performance of network devices with programmable forwarding planes. These devices are attractive for use in data centers since they are amenable to being programmed by a centralized controller. If such a controller has a global view of the sources and receivers for each multicast flow (which can be provided by the devices attached to the end hosts in the data center communicating with the controller) and an accurate representation of the data center topology (which is usually well-known), then it can readily compute the multicast forwarding state that must be installed at each router to ensure the one-to-many traffic flow is delivered properly to the correct receivers. All that is needed is an API to program the forwarding planes of all the network devices that need to handle the flow appropriately. Such APIs do in fact exist and so, unsurprisingly, handling one-to-many traffic flows using such an approach is attractive for data centers.
Being able to program the forwarding plane in this manner offers the enticing possibility of introducing novel algorithms and concepts for forwarding multicast traffic in data centers. These schemes typically aim to exploit the idiosyncrasies of the data center network architecture to create ingenious, pithy and elegant encodings of the information needed to facilitate multicast forwarding. Depending on the scheme, this information may be carried in packet headers, stored in the multicast forwarding table in routers or a combination of both. The key characteristic is that the terseness of the forwarding information means the volume of forwarding state is significantly reduced. Additionally, the overhead associated with building and maintaining a multicast forwarding tree is eliminated. The result of these reductions in the overhead associated with multicast forwarding is a significant and impressive increase in the effective number of multicast flows that can be supported within the data center.
[Shabaz19] is a good example of such an approach and also presents a comprehensive discussion of other schemes in its review of related work. Although a number of promising schemes have been proposed, no consensus has yet emerged as to which approach is best, and in fact what "best" means. Even if a clear winner were to emerge, it would face significant challenges in gaining the vendor and operator buy-in needed to ensure it is widely deployed in data centers.
4.4. BIER

As discussed in Section 3.4, PIM and IGMP face potential scalability challenges when deployed in data centers. These challenges are typically due to the requirement to build and maintain a distribution tree and the requirement to hold per-flow state in routers. Bit Index Explicit Replication (BIER) [RFC8279] is a new multicast forwarding paradigm that avoids these two requirements.
When a multicast packet enters a BIER domain, the ingress router, known as the Bit-Forwarding Ingress Router (BFIR), adds a BIER header to the packet. This header contains a bit string in which each bit maps to an egress router, known as a Bit-Forwarding Egress Router (BFER). If a bit is set, then the packet should be forwarded to the associated BFER. The routers within the BIER domain, Bit-Forwarding Routers (BFRs), use the BIER header in the packet and information in the Bit Index Forwarding Table (BIFT) to carry out simple bit-wise operations to determine how the packet should be replicated optimally so it reaches all the appropriate BFERs.
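The following Python sketch is a deliberately simplified rendering of this idea (it collapses the per-entry details of the BIFT defined in [RFC8279] into a plain bit-to-neighbour mapping, and all names are hypothetical): the router sends one copy of the packet per next-hop neighbour, with each copy's bit string masked down to the BFERs reachable through that neighbour, and no per-flow state is consulted.

   # Simplified sketch of BIER-style bit string forwarding.
   def bier_forward(bitstring, bift, send):
       """bitstring: int with bit (bfer_id - 1) set per destination BFER.
          bift: dict mapping bfer_id -> next-hop neighbour name."""
       copies = {}                                  # neighbour -> masked bit string
       bfer_id = 1
       bits = bitstring
       while bits:
           if bits & 1:
               nbr = bift[bfer_id]
               copies[nbr] = copies.get(nbr, 0) | (1 << (bfer_id - 1))
           bits >>= 1
           bfer_id += 1
       for nbr, masked in copies.items():
           send(nbr, masked)                        # one replica per neighbour

   bift = {1: "leaf1", 2: "leaf1", 3: "leaf2"}
   bier_forward(0b111, bift, lambda nbr, bs: print(f"to {nbr}: bitstring {bs:03b}"))
   # to leaf1: bitstring 011
   # to leaf2: bitstring 100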
BIER is deemed to be attractive for facilitating one-to-many communications in data centers [I-D.ietf-bier-use-cases]. The deployment envisioned with overlay networks is that the encapsulation endpoints would be the BFIRs. So knowledge about the actual multicast groups does not reside in the data center fabric, improving the scalability compared to conventional IP multicast. Additionally, a centralized controller or a BGP-EVPN control plane may be used with BIER to ensure the BFIRs have the required information. A challenge associated with using BIER is that it requires changes to the forwarding behaviour of the routers used in the data center IP fabric.
4.4. Segment Routing | 4.5. Segment Routing | |||
Segment Routing (SR) [I-D.ietf-spring-segment-routing] adopts the the | Segment Routing (SR) [RFC8402] is a manifestation of the source | |||
source routing paradigm in which the manner in which a packet | routing paradigm, so called as the path a packet takes through a | |||
traverses a network is determined by an ordered list of instructions. | network is determined at the source. The source encodes this | |||
These instructions are known as segments may have a local semantic to | information in the packet header as a sequence of instructions. | |||
an SR node or global within an SR domain. SR allows enforcing a flow | These instructions are followed by intermediate routers, ultimately | |||
through any topological path while maintaining per-flow state only at | resulting in the delivery of the packet to the desired destination. | |||
the ingress node to the SR domain. Segment Routing can be applied to | In SR, the instructions are known as segments and a number of | |||
the MPLS and IPv6 data-planes. In the former, the list of segments | different kinds of segments have been defined. Each segment has an | |||
is represented by the label stack and in the latter it is represented | identifier (SID) which is distributed throughout the network by newly | |||
as a routing extension header. Use-cases are described in [I-D.ietf- | defined extensions to standard routing protocols. Thus, using this | |||
spring-segment-routing] and are being considered in the context of | information, sources are able to determine the exact sequence of | |||
BGP-based large-scale data-center (DC) design [RFC7938]. | segments to encode into the packet. The manner in which these | |||
instructions are encoded depends on the underlying data plane. | ||||
Segment Routing can be applied to the MPLS and IPv6 data planes. In | ||||
the former, the list of segments is represented by the label stack | ||||
and in the latter it is represented as an IPv6 routing extension | ||||
header. Advantages of segment routing include the reduction in the | ||||
amount of forwarding state routers need to hold and the removal of | ||||
the need to run a signaling protocol, thus improving the network | ||||
scalability while reducing the operational complexity. | ||||
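The difference between the two encodings can be sketched informally as
follows; the SID values are hypothetical and the fragment merely
illustrates how the ordered segment list is commonly represented on
each data plane (the first segment at the top of the MPLS label stack;
the segment list stored in reverse order in the IPv6 routing extension
header, with Segments Left indexing the active segment).

   # Hypothetical segment list, in the order the packet traverses it.
   segments = ["S1", "S2", "S3"]

   # SR-MPLS: the segment list becomes a label stack, with the first
   # segment to be traversed at the top of the stack.
   mpls_label_stack = list(segments)             # top of stack = segments[0]

   # SRv6: the segment list is carried in an IPv6 routing extension
   # header, stored in reverse order; Segments Left indexes the active
   # segment and is initially n - 1 at the source.
   srv6_srh = {
       "segment_list": list(reversed(segments)), # element 0 = final segment
       "segments_left": len(segments) - 1,
   }

   print(mpls_label_stack, srv6_srh)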
Multicast in SR continues to be discussed in a variety of drafts and | The advantages of segment routing and the ability to run it over an | |||
working groups. The SPRING WG has not yet been chartered to work on | unmodified MPLS data plane mean that one of its anticipated use | |||
Multicast in SR. Multicast can include locally allocating a Segment | cases is in BGP-based large-scale data centers [RFC7938]. The exact | |||
Identifier (SID) to existing replication solutions, such as PIM, | manner in which multicast traffic will be handled in SR has not yet | |||
mLDP, P2MP RSVP-TE and BIER. It may also be that a new way to signal | been standardized, with a number of different options being | |||
and install trees in SR is developed without creating state in the | considered. For example, since, with the MPLS data plane, segments | |||
network. | are simply encoded as a label stack, the protocols traditionally | |||
used to create point-to-multipoint LSPs could be reused to allow SR | ||||
to support one-to-many traffic flows. Alternatively, a special SID | ||||
may be defined for a multicast distribution tree, with a centralized | ||||
controller being used to program routers appropriately to ensure the | ||||
traffic is delivered to the desired destinations, while avoiding the | ||||
costly process of building and maintaining a multicast distribution | ||||
tree through a distributed signaling protocol. | ||||
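A rough sketch of the latter, controller-programmed option follows;
the tree SID value, router names and data structures are invented for
illustration and do not correspond to any standardized behaviour.

   # Hypothetical replication state a centralized controller might
   # install for one multicast distribution tree, identified by a SID.
   TREE_SID = 16010

   REPLICATION = {
       "spine1": ["to_leaf2", "to_leaf3"],
       "leaf2":  ["host_port_7"],
       "leaf3":  ["host_port_2", "host_port_5"],
   }

   def replication_interfaces(router, sid):
       """Interfaces on which this router copies a packet carrying the
       tree SID, as programmed by the controller; no tree-building
       protocol runs in the fabric itself."""
       if sid != TREE_SID:
           return []
       return REPLICATION.get(router, [])

   # spine1 replicates towards both leaves; leaf3 towards two host ports.
   print(replication_interfaces("spine1", TREE_SID))
   print(replication_interfaces("leaf3", TREE_SID))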
5. Conclusions | 5. Conclusions | |||
As the volume and importance of one-to-many traffic in data centers | As the volume and importance of one-to-many traffic in data centers | |||
increases, conventional IP multicast is likely to become increasingly | increases, conventional IP multicast is likely to become increasingly | |||
unattractive for deployment in data centers for a number of reasons, | unattractive for deployment in data centers for a number of reasons, | |||
mostly pertaining to its inherent relatively poor scalability and | mostly pertaining to its relatively poor scalability and inability to | |||
inability to exploit characteristics of data center network | exploit characteristics of data center network architectures. Hence, | |||
architectures. Hence, even though IGMP/MLD is likely to remain the | even though IGMP/MLD is likely to remain the most popular manner in | |||
most popular manner in which end hosts signal interest in joining a | which end hosts signal interest in joining a multicast group, it is | |||
multicast group, it is unlikely that this multicast traffic will be | unlikely that this multicast traffic will be transported over the | |||
transported over the data center IP fabric using a multicast | data center IP fabric using a multicast distribution tree built and | |||
distribution tree built by PIM. Rather, approaches which exploit | maintained by PIM in the future. Rather, approaches which exploit | |||
characteristics of data center network architectures (e.g. fixed and | idiosyncrasies of data center network architectures are better placed | |||
regular topology, single administrative control, consistent hardware | to deliver one-to-many traffic in data centers, especially when | |||
and software, well-known overlay encapsulation endpoints etc.) are | judiciously combined with a centralized controller and/or a | |||
better placed to deliver one-to-many traffic in data centers, | distributed control plane, particularly one based on BGP-EVPN. | |||
especially when judiciously combined with a centralized controller | ||||
and/or a distributed control plane (particularly one based on BGP- | ||||
EVPN). | ||||
6. IANA Considerations | 6. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
7. Security Considerations | 7. Security Considerations | |||
No new security considerations result from this document. | No new security considerations result from this document. | |||
8. Acknowledgements | 8. Acknowledgements | |||
skipping to change at page 13, line 25 ¶ | skipping to change at page 16, line 10 ¶ | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
<https://www.rfc-editor.org/info/rfc2119>. | <https://www.rfc-editor.org/info/rfc2119>. | |||
9.2. Informative References | 9.2. Informative References | |||
[I-D.ietf-bier-use-cases] | [I-D.ietf-bier-use-cases] | |||
Kumar, N., Asati, R., Chen, M., Xu, X., Dolganow, A., | Kumar, N., Asati, R., Chen, M., Xu, X., Dolganow, A., | |||
Przygienda, T., Gulko, A., Robinson, D., Arya, V., and C. | Przygienda, T., Gulko, A., Robinson, D., Arya, V., and C. | |||
Bestler, "BIER Use Cases", draft-ietf-bier-use-cases-06 | Bestler, "BIER Use Cases", draft-ietf-bier-use-cases-09 | |||
(work in progress), January 2018. | (work in progress), January 2019. | |||
[I-D.ietf-nvo3-geneve] | [I-D.ietf-nvo3-geneve] | |||
Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic | Gross, J., Ganga, I., and T. Sridhar, "Geneve: Generic | |||
Network Virtualization Encapsulation", draft-ietf- | Network Virtualization Encapsulation", draft-ietf- | |||
nvo3-geneve-11 (work in progress), March 2019. | nvo3-geneve-13 (work in progress), March 2019. | |||
[I-D.ietf-nvo3-vxlan-gpe] | [I-D.ietf-nvo3-vxlan-gpe] | |||
Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol | Maino, F., Kreeger, L., and U. Elzur, "Generic Protocol | |||
Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-06 (work | Extension for VXLAN", draft-ietf-nvo3-vxlan-gpe-07 (work | |||
in progress), April 2018. | in progress), April 2019. | |||
[I-D.ietf-spring-segment-routing] | [RFC0826] Plummer, D., "An Ethernet Address Resolution Protocol: Or | |||
Filsfils, C., Previdi, S., Ginsberg, L., Decraene, B., | Converting Network Protocol Addresses to 48.bit Ethernet | |||
Litkowski, S., and R. Shakir, "Segment Routing | Address for Transmission on Ethernet Hardware", STD 37, | |||
Architecture", draft-ietf-spring-segment-routing-15 (work | RFC 826, DOI 10.17487/RFC0826, November 1982, | |||
in progress), January 2018. | <https://www.rfc-editor.org/info/rfc826>. | |||
[RFC2236] Fenner, W., "Internet Group Management Protocol, Version | [RFC2236] Fenner, W., "Internet Group Management Protocol, Version | |||
2", RFC 2236, DOI 10.17487/RFC2236, November 1997, | 2", RFC 2236, DOI 10.17487/RFC2236, November 1997, | |||
<https://www.rfc-editor.org/info/rfc2236>. | <https://www.rfc-editor.org/info/rfc2236>. | |||
[RFC2710] Deering, S., Fenner, W., and B. Haberman, "Multicast | [RFC2710] Deering, S., Fenner, W., and B. Haberman, "Multicast | |||
Listener Discovery (MLD) for IPv6", RFC 2710, | Listener Discovery (MLD) for IPv6", RFC 2710, | |||
DOI 10.17487/RFC2710, October 1999, | DOI 10.17487/RFC2710, October 1999, | |||
<https://www.rfc-editor.org/info/rfc2710>. | <https://www.rfc-editor.org/info/rfc2710>. | |||
skipping to change at page 14, line 20 ¶ | skipping to change at page 17, line 5 ¶ | |||
[RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, | [RFC4601] Fenner, B., Handley, M., Holbrook, H., and I. Kouvelas, | |||
"Protocol Independent Multicast - Sparse Mode (PIM-SM): | "Protocol Independent Multicast - Sparse Mode (PIM-SM): | |||
Protocol Specification (Revised)", RFC 4601, | Protocol Specification (Revised)", RFC 4601, | |||
DOI 10.17487/RFC4601, August 2006, | DOI 10.17487/RFC4601, August 2006, | |||
<https://www.rfc-editor.org/info/rfc4601>. | <https://www.rfc-editor.org/info/rfc4601>. | |||
[RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for | [RFC4607] Holbrook, H. and B. Cain, "Source-Specific Multicast for | |||
IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, | IP", RFC 4607, DOI 10.17487/RFC4607, August 2006, | |||
<https://www.rfc-editor.org/info/rfc4607>. | <https://www.rfc-editor.org/info/rfc4607>. | |||
[RFC4861] Narten, T., Nordmark, E., Simpson, W., and H. Soliman, | ||||
"Neighbor Discovery for IP version 6 (IPv6)", RFC 4861, | ||||
DOI 10.17487/RFC4861, September 2007, | ||||
<https://www.rfc-editor.org/info/rfc4861>. | ||||
[RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, | [RFC5015] Handley, M., Kouvelas, I., Speakman, T., and L. Vicisano, | |||
"Bidirectional Protocol Independent Multicast (BIDIR- | "Bidirectional Protocol Independent Multicast (BIDIR- | |||
PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007, | PIM)", RFC 5015, DOI 10.17487/RFC5015, October 2007, | |||
<https://www.rfc-editor.org/info/rfc5015>. | <https://www.rfc-editor.org/info/rfc5015>. | |||
[RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution | [RFC6820] Narten, T., Karir, M., and I. Foo, "Address Resolution | |||
Problems in Large Data Center Networks", RFC 6820, | Problems in Large Data Center Networks", RFC 6820, | |||
DOI 10.17487/RFC6820, January 2013, | DOI 10.17487/RFC6820, January 2013, | |||
<https://www.rfc-editor.org/info/rfc6820>. | <https://www.rfc-editor.org/info/rfc6820>. | |||
skipping to change at page 15, line 23 ¶ | skipping to change at page 18, line 11 ¶ | |||
Explicit Replication (BIER)", RFC 8279, | Explicit Replication (BIER)", RFC 8279, | |||
DOI 10.17487/RFC8279, November 2017, | DOI 10.17487/RFC8279, November 2017, | |||
<https://www.rfc-editor.org/info/rfc8279>. | <https://www.rfc-editor.org/info/rfc8279>. | |||
[RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | [RFC8365] Sajassi, A., Ed., Drake, J., Ed., Bitar, N., Shekhar, R., | |||
Uttaro, J., and W. Henderickx, "A Network Virtualization | Uttaro, J., and W. Henderickx, "A Network Virtualization | |||
Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | Overlay Solution Using Ethernet VPN (EVPN)", RFC 8365, | |||
DOI 10.17487/RFC8365, March 2018, | DOI 10.17487/RFC8365, March 2018, | |||
<https://www.rfc-editor.org/info/rfc8365>. | <https://www.rfc-editor.org/info/rfc8365>. | |||
[RFC8402] Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., | ||||
Decraene, B., Litkowski, S., and R. Shakir, "Segment | ||||
Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, | ||||
July 2018, <https://www.rfc-editor.org/info/rfc8402>. | ||||
[Shabaz19] | ||||
Shahbaz, M., Suresh, L., Rexford, J., Feamster, N., | ||||
Rottenstreich, O., and M. Hira, "Elmo: Source Routed | ||||
Multicast for Public Clouds", ACM SIGCOMM 2019 Conference | ||||
(SIGCOMM '19), ACM, DOI 10.1145/3341302.3342066, August | ||||
2019. | ||||
[SMPTE2110] | ||||
SMPTE, Society of Motion Picture and Television Engineers, | ||||
"SMPTE2110 Standards Suite", | ||||
<http://www.smpte.org/st-2110>. | ||||
Authors' Addresses | Authors' Addresses | |||
Mike McBride | Mike McBride | |||
Futurewei | Futurewei | |||
Email: michael.mcbride@futurewei.com | Email: michael.mcbride@futurewei.com | |||
Olufemi Komolafe | Olufemi Komolafe | |||
Arista Networks | Arista Networks | |||