draft-ietf-v6ops-pmtud-ecmp-problem-04.txt | draft-ietf-v6ops-pmtud-ecmp-problem-05.txt | |||
---|---|---|---|---|
v6ops M. Byerly | v6ops M. Byerly | |||
Internet-Draft Fastly | Internet-Draft Fastly | |||
Intended status: Informational M. Hite | Intended status: Informational M. Hite | |||
Expires: March 1, 2016 Evernote | Expires: April 20, 2016 Evernote | |||
J. Jaeggli | J. Jaeggli | |||
Fastly | Fastly | |||
August 29, 2015 | October 18, 2015 | |||
Close encounters of the ICMP type 2 kind (near misses with ICMPv6 PTB) | Close encounters of the ICMP type 2 kind (near misses with ICMPv6 PTB) | |||
draft-ietf-v6ops-pmtud-ecmp-problem-04 | draft-ietf-v6ops-pmtud-ecmp-problem-05 | |||
Abstract | Abstract | |||
This document calls attention to the problem of delivering ICMPv6 | This document calls attention to the problem of delivering ICMPv6 | |||
type 2 "Packet Too Big" (PTB) messages to the intended destination | type 2 "Packet Too Big" (PTB) messages to the intended destination | |||
(typically the server) in ECMP load balanced or anycast network | (typically the server) in ECMP load balanced or anycast network | |||
architectures. It discusses operational mitigations that can be | architectures. It discusses operational mitigations that can be | |||
employed to address this class of failures. | employed to address this class of failures. | |||
Status of This Memo | Status of This Memo | |||
skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on March 1, 2016. | This Internet-Draft will expire on April 20, 2016. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2015 IETF Trust and the persons identified as the | Copyright (c) 2015 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 34 | skipping to change at page 2, line 34 | |||
Operators of popular Internet services face complex challenges | Operators of popular Internet services face complex challenges | |||
associated with scaling their infrastructure. One scaling approach | associated with scaling their infrastructure. One scaling approach | |||
is to utilize equal-cost multi-path (ECMP) routing to perform | is to utilize equal-cost multi-path (ECMP) routing to perform | |||
stateless distribution of incoming TCP or UDP sessions to multiple | stateless distribution of incoming TCP or UDP sessions to multiple | |||
servers or to middle boxes such as load balancers. Distribution of | servers or to middle boxes such as load balancers. Distribution of | |||
traffic in this manner presents a problem when dealing with ICMP | traffic in this manner presents a problem when dealing with ICMP | |||
signaling. Specifically, an ICMP error is not guaranteed to hash via | signaling. Specifically, an ICMP error is not guaranteed to hash via | |||
ECMP to the same destination as its corresponding TCP or UDP session. | ECMP to the same destination as its corresponding TCP or UDP session. | |||
A case where this is particularly problematic operationally is path | A case where this is particularly problematic operationally is path | |||
MTU discovery (PMTUD). | MTU discovery (PMTUD) [RFC1981]. | |||
2. Problem | 2. Problem | |||
A common application for stateless load balancing of TCP or UDP flows | A common application for stateless load balancing of TCP or UDP flows | |||
is to perform an initial subdivision of flows in front of a stateful | is to perform an initial subdivision of flows in front of a stateful | |||
load balancer tier or multiple servers so that the workload becomes | load balancer tier or multiple servers so that the workload becomes | |||
divided into manageable fractions of the total number of flows. The | divided into manageable fractions of the total number of flows. The | |||
flow division is performed using ECMP forwarding and a stateless but | flow division is performed using ECMP forwarding and a stateless but | |||
sticky algorithm for hashing across the available paths. This | sticky algorithm for hashing across the available paths (see RFC 2991 | |||
nexthop selection for the purposes of flow distribution is a | [RFC2991] for background on ECMP routing). This nexthop selection | |||
constrained form of anycast topology, where all anycast destinations | for the purposes of flow distribution is a constrained form of | |||
are equidistant from the upstream router responsible for making the | anycast topology, where all anycast destinations are equidistant from | |||
last next-hop forwarding decision before the flow arrives on the | the upstream router responsible for making the last next-hop | |||
destination device. In this approach, the hash is performed across | forwarding decision before the flow arrives on the destination | |||
some set of available protocol headers. Typically, these headers may | device. In this approach, the hash is performed across some set of | |||
include all or a subset of (IPv6) Flow-Label, IP-source, IP- | available protocol headers. Typically, these headers may include all | |||
destination, protocol, source-port, destination-port and potentially | or a subset of (IPv6) Flow-Label, IP-source, IP-destination, | |||
others such as ingress interface. | protocol, source-port, destination-port and potentially others such | |||
as ingress interface. | ||||
A problem common to this approach of distribution through hashing is | A problem common to this approach of distribution through hashing is | |||
its impact on path MTU discovery. An ICMPv6 type 2 PTB message generated | its impact on path MTU discovery. An ICMPv6 type 2 PTB message generated | |||
on an intermediate device for a packet sent from a server that is | on an intermediate device for a packet sent from a server that is | |||
part of an ECMP load balanced service to a client will have the load | part of an ECMP load balanced service to a client will have the load | |||
balanced anycast address as the destination and hence will be | balanced anycast address as the destination and hence will be | |||
statelessly load balanced to one of the servers. While the ICMPv6 | statelessly load balanced to one of the servers. While the ICMPv6 | |||
PTB message contains as much of the packet that could not be | PTB message contains as much of the packet that could not be | |||
forwarded as possible, the payload headers are not considered in the | forwarded as possible, the payload headers are not considered in the | |||
forwarding decision and are ignored. Because the PTB message is not | forwarding decision and are ignored. Because the PTB message is not | |||
skipping to change at page 4, line 29 | skipping to change at page 4, line 29 | |||
(for example, endpoint VPN clients set the tunnel interface MTU | (for example, endpoint VPN clients set the tunnel interface MTU | |||
accordingly to avoid fragmentation for performance reasons) makes the | accordingly to avoid fragmentation for performance reasons) makes the | |||
problem sufficiently rare that some existing deployments have chosen | problem sufficiently rare that some existing deployments have chosen | |||
to ignore it. | to ignore it. | |||
3. Mitigation | 3. Mitigation | |||
Mitigation of the potential for PTB messages to be mis-delivered | Mitigation of the potential for PTB messages to be mis-delivered | |||
involves ensuring that an ICMPv6 error message is distributed to the | involves ensuring that an ICMPv6 error message is distributed to the | |||
same anycast server responsible for the flow for which the error is | same anycast server responsible for the flow for which the error is | |||
generated. Ideally, mitigation could be done by the mechanism hosts | generated. With appropriate hardware support, mitigation could be | |||
use to identify the flow, by looking into the payload of the ICMPv6 | done by the mechanism hosts use to identify the flow, by looking into | |||
message (to determine which TCP flow it was associated with) before | the payload of the ICMPv6 message (to determine which TCP flow it was | |||
making a forwarding decision. Because the encapsulated IP header | associated with) before making a forwarding decision. Because the | |||
occurs at a fixed offset in the ICMP message, it is not outside the | encapsulated IP header occurs at a fixed offset in the ICMP message, | |||
realm of possibility that routers with sufficient header processing | it is not outside the realm of possibility that routers with | |||
capability could parse that far into the payload. Employing a | sufficient header processing capability could parse that far into the | |||
mediation device that handles the parsing and distribution of PTB | payload. Employing a mediation device that handles the parsing and | |||
messages after policy routing or on each load-balancer/server is a | distribution of PTB messages after policy routing or on each load- | |||
possibility. | balancer/server is a possibility. | |||
Another mitigation approach is predicated upon distributing the PTB | Another mitigation approach is predicated upon distributing the PTB | |||
message to all anycast servers under the assumption that the one for | message to all anycast servers under the assumption that the one for | |||
which the message was intended will be able to match it to the flow | which the message was intended will be able to match it to the flow | |||
and update the route cache with the new MTU and that devices not able | and update the route cache with the new MTU and that devices not able | |||
to match the flow will discard these packets. Such distribution has | to match the flow will discard these packets. Such distribution has | |||
potentially significant implications for resource consumption and for | potentially significant implications for resource consumption and for | |||
self-inflicted denial-of-service if not carefully employed. | self-inflicted denial-of-service if not carefully employed. | |||
Fortunately, in real-world deployments we have observed that the | Fortunately, in real-world deployments we have observed that the | |||
number of flows for which this problem occurs is relatively small | number of flows for which this problem occurs is relatively small | |||
skipping to change at page 8, line 4 | skipping to change at page 8, line 4 | |||
6. IANA Considerations | 6. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
7. Security Considerations | 7. Security Considerations | |||
The employed mitigation has the potential to greatly amplify the | The employed mitigation has the potential to greatly amplify the | |||
impact of a deliberately malicious sending of ICMPv6 PTB messages. | impact of a deliberately malicious sending of ICMPv6 PTB messages. | |||
Sensible ingress rate limiting can reduce the potential for impact; | Sensible ingress rate limiting can reduce the potential for impact; | |||
however, legitimate traffic may be lost once the rate limit is | however, legitimate PMTUD messages may be lost once the rate limit is | |||
reached. | reached, analogous to other cases where DoS traffic can crowd out | |||
legitimate traffic. | ||||
The proxy replication results in devices not associated with the flow | The proxy replication results in devices on the subnet not associated | |||
that generated the PTB being recipients of an ICMPv6 message, which | with the flow that generated the PTB being recipients of the ICMPv6 | |||
contains a fragment of a packet. This could arguably result in | PTB message, which contains a large fragment of the packet that | |||
information disclosure. Recipient machines should be in a common | exceeded the allowable MTU. This replication of the packet fragment | |||
administrative domain. | could arguably result in information disclosure. Recipient machines | |||
should be in a common administrative domain. | ||||
8. Informative References | 8. Informative References | |||
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery | ||||
for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August | ||||
1996, <http://www.rfc-editor.org/info/rfc1981>. | ||||
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and | ||||
Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/ | ||||
RFC2991, November 2000, | ||||
<http://www.rfc-editor.org/info/rfc2991>. | ||||
[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | |||
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | |||
<http://www.rfc-editor.org/info/rfc4821>. | <http://www.rfc-editor.org/info/rfc4821>. | |||
Authors' Addresses | Authors' Addresses | |||
Matt Byerly | Matt Byerly | |||
Fastly | Fastly | |||
Kapolei, HI | Kapolei, HI | |||
US | US | |||
End of changes. 10 change blocks. | ||||
32 lines changed or deleted | 44 lines changed or added | |||
This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |
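Section 3 of the draft describes a payload-aware mitigation, and the Python sketch below illustrates it. It is a minimal illustration under stated assumptions. It is not the draft's method and not any router's implementation.

The following are hypothetical:
- the CRC32-based `ecmp_index` hash;
- the helper names `build_ptb` and `flow_key_from_ptb`;
- the documentation-prefix addresses.

The sketch also assumes that the quoted packet carries no IPv6 extension headers. It relies on the 8-byte ICMPv6 PTB header (type, code, checksum, MTU), which places the invoking packet's IPv6 header at a fixed offset, as the draft notes. The quoted packet travelled from server to client, so its addresses and ports are swapped to reproduce the hash of the inbound flow.

```python
import ipaddress
import struct
import zlib

IPPROTO_TCP = 6
IPPROTO_UDP = 17
IPPROTO_ICMPV6 = 58
ICMPV6_PACKET_TOO_BIG = 2
ICMPV6_HEADER_LEN = 8  # type, code, checksum, MTU
IPV6_HEADER_LEN = 40


def ecmp_index(src, dst, proto, sport, dport, n_paths):
    """Hypothetical stateless ECMP hash over a 5-tuple (not any vendor's algorithm)."""
    key = (ipaddress.IPv6Address(src).packed
           + ipaddress.IPv6Address(dst).packed
           + struct.pack("!BHH", proto, sport, dport))
    return zlib.crc32(key) % n_paths


def build_ptb(server, client, sport, dport, mtu):
    """Hypothetical helper: ICMPv6 PTB body quoting a server->client TCP packet."""
    # Quoted IPv6 header: version 6, payload length, next header, hop limit.
    inner = struct.pack("!IHBB", 6 << 28, 1400, IPPROTO_TCP, 64)
    inner += ipaddress.IPv6Address(server).packed + ipaddress.IPv6Address(client).packed
    inner += struct.pack("!HH", sport, dport)  # first 4 bytes of the TCP header
    # Checksum left as 0: this sketch never puts the message on a wire.
    return struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, mtu) + inner


def flow_key_from_ptb(icmp):
    """Recover the inbound client->server tuple from a PTB's quoted packet.

    Assumes the quoted packet carries no IPv6 extension headers.
    """
    if len(icmp) < ICMPV6_HEADER_LEN + IPV6_HEADER_LEN + 4:
        return None
    if icmp[0] != ICMPV6_PACKET_TOO_BIG:
        return None
    inner = icmp[ICMPV6_HEADER_LEN:]  # fixed offset of the quoted packet
    next_header = inner[6]
    if next_header not in (IPPROTO_TCP, IPPROTO_UDP):
        return None
    src = ipaddress.IPv6Address(inner[8:24])
    dst = ipaddress.IPv6Address(inner[24:40])
    sport, dport = struct.unpack("!HH", inner[40:44])
    # The quoted packet went server->client; swap ends to match the inbound flow.
    return str(dst), str(src), next_header, dport, sport


VIP = "2001:db8::80"         # anycast/ECMP service address (documentation prefix)
CLIENT = "2001:db8:1::10"
ROUTER = "2001:db8:ffff::1"  # intermediate device that emits the PTB
N_PATHS = 4

flow_path = ecmp_index(CLIENT, VIP, IPPROTO_TCP, 50000, 443, N_PATHS)
ptb = build_ptb(VIP, CLIENT, 443, 50000, 1280)

# Outer-header-only hashing: ICMPv6 carries no ports, and the source is the router.
naive_path = ecmp_index(ROUTER, VIP, IPPROTO_ICMPV6, 0, 0, N_PATHS)
# Payload-aware hashing: parse the quoted packet at its fixed offset.
aware_path = ecmp_index(*flow_key_from_ptb(ptb), N_PATHS)

print("flow:", flow_path, "naive PTB:", naive_path, "payload-aware PTB:", aware_path)
```

A router that hashes only the PTB's outer header (`naive_path`) sees the router's source address, the ICMPv6 protocol number and no ports. It can therefore pick a different server from the one holding the flow. Hashing the recovered tuple (`aware_path`) lands on the same path as the flow by construction. This sketch deliberately leaves out several things:
- extension-header walking;
- ICMPv6 checksum validation;
- the IPv6 Flow-Label, which some hashes also include.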