--- 1/draft-ietf-v6ops-pmtud-ecmp-problem-04.txt 2015-10-18 19:15:12.611639947 -0700
+++ 2/draft-ietf-v6ops-pmtud-ecmp-problem-05.txt 2015-10-18 19:15:12.635640531 -0700
@@ -1,21 +1,21 @@

 v6ops                                                          M. Byerly
 Internet-Draft                                                    Fastly
 Intended status: Informational                                   M. Hite
-Expires: March 1, 2016                                          Evernote
+Expires: April 20, 2016                                         Evernote
                                                               J. Jaeggli
                                                                   Fastly
-                                                         August 29, 2015
+                                                        October 18, 2015

  Close encounters of the ICMP type 2 kind (near misses with ICMPv6 PTB)
-                 draft-ietf-v6ops-pmtud-ecmp-problem-04
+                 draft-ietf-v6ops-pmtud-ecmp-problem-05

 Abstract

    This document calls attention to the problem of delivering ICMPv6
    type 2 "Packet Too Big" (PTB) messages to the intended destination
    (typically the server) in ECMP load balanced or anycast network
    architectures. It discusses operational mitigations that can be
    employed to address this class of failures.

 Status of This Memo
@@ -26,21 +26,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF). Note that other groups may also distribute
    working documents as Internet-Drafts. The list of current Internet-
    Drafts is at http://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time. It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on March 1, 2016.
+   This Internet-Draft will expire on April 20, 2016.

 Copyright Notice

    Copyright (c) 2015 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
@@ -69,39 +69,40 @@

    Operators of popular Internet services face complex challenges
    associated with scaling their infrastructure.
    One scaling approach
    is to utilize equal-cost multi-path (ECMP) routing to perform
    stateless distribution of incoming TCP or UDP sessions to multiple
    servers or to middle boxes such as load balancers. Distribution of
    traffic in this manner presents a problem when dealing with ICMP
    signaling. Specifically, an ICMP error is not guaranteed to hash via
    ECMP to the same destination as its corresponding TCP or UDP session.
    A case where this is particularly problematic operationally is path
-   MTU discovery (PMTUD).
+   MTU discovery RFC 1981 PMTUD [RFC1981].

 2. Problem

    A common application for stateless load balancing of TCP or UDP flows
    is to perform an initial subdivision of flows in front of a stateful
    load balancer tier or multiple servers so that the workload becomes
    divided into manageable fractions of the total number of flows. The
    flow division is performed using ECMP forwarding and a stateless but
-   sticky algorithm for hashing across the available paths. This
-   nexthop selection for the purposes of flow distribution is a
-   constrained form of anycast topology, where all anycast destinations
-   are equidistant from the upstream router responsible for making the
-   last next-hop forwarding decision before the flow arrives on the
-   destination device. In this approach, the hash is performed across
-   some set of available protocol headers. Typically, these headers may
-   include all or a subset of (IPv6) Flow-Label, IP-source, IP-
-   destination, protocol, source-port, destination-port and potentially
-   others such as ingress interface.
+   sticky algorithm for hashing across the available paths (see RFC 2991
+   [RFC2991] for background on ECMP routing). This nexthop selection
+   for the purposes of flow distribution is a constrained form of
+   anycast topology, where all anycast destinations are equidistant from
+   the upstream router responsible for making the last next-hop
+   forwarding decision before the flow arrives on the destination
+   device.
+   In this approach, the hash is performed across some set of
+   available protocol headers. Typically, these headers may include all
+   or a subset of (IPv6) Flow-Label, IP-source, IP-destination,
+   protocol, source-port, destination-port and potentially others such
+   as ingress interface.

    A problem common to this approach of distribution through hashing is
    impact on path MTU discovery. An ICMPv6 type 2 PTB message generated
    on an intermediate device for a packet sent from a server that is
    part of an ECMP load balanced service to a client will have the load
    balanced anycast address as the destination and hence will be
    statelessly load balanced to one of the servers. While the ICMPv6
    PTB message contains as much of the packet that could not be
    forwarded as possible, the payload headers are not considered in the
    forwarding decision and are ignored. Because the PTB message is not
@@ -157,30 +158,30 @@
    (for example, endpoint VPN clients set the tunnel interface MTU
    accordingly to avoid fragmentation for performance reasons) makes the
    problem sufficiently rare that some existing deployments have chosen
    to ignore it.

 3. Mitigation

    Mitigation of the potential for PTB messages to be mis-delivered
    involves ensuring that an ICMPv6 error message is distributed to the
    same anycast server responsible for the flow for which the error is
-   generated. Ideally, mitigation could be done by the mechanism hosts
-   use to identify the flow, by looking into the payload of the ICMPv6
-   message (to determine which TCP flow it was associated with) before
-   making a forwarding decision. Because the encapsulated IP header
-   occurs at a fixed offset in the ICMP message it is not outside the
-   realm of possibility that routers with sufficient header processing
-   capability could parse that far into the payload. Employing a
-   mediation device that handles the parsing and distribution of PTB
-   messages after policy routing or on each load-balancer/server is a
-   possibility.
+   generated.
+   With appropriate hardware support, mitigation could be
+   done by the mechanism hosts use to identify the flow; by looking into
+   the payload of the ICMPv6 message (to determine which TCP flow it was
+   associated with) before making a forwarding decision. Because the
+   encapsulated IP header occurs at a fixed offset in the ICMP message
+   it is not outside the realm of possibility that routers with
+   sufficient header processing capability could parse that far into the
+   payload. Employing a mediation device that handles the parsing and
+   distribution of PTB messages after policy routing or on each load-
+   balancer/server is a possibility.

    Another mitigation approach is predicated upon distributing the PTB
    message to all anycast servers under the assumption that the one for
    which the message was intended will be able to match it to the flow
    and update the route cache with the new MTU and that devices not able
    to match the flow will discard these packets. Such distribution has
    potentially significant implications for resource consumption and for
    self-inflicted denial-of-service if not carefully employed.
    Fortunately, in real-world deployments we have observed that the
    number of flows for which this problem occurs is relatively small
@@ -314,31 +315,42 @@

 6. IANA Considerations

    This memo includes no request to IANA.

 7. Security Considerations

    The employed mitigation has the potential to greatly amplify the
    impact of a deliberately malicious sending of ICMPv6 PTB messages.
    Sensible ingress rate limiting can reduce the potential for impact;
-   however, legitimate traffic may be lost once the rate limit is
-   reached.
+   however, legitimate PMTUD messages may be lost once the rate limit is
+   reached; analogous to other cases where DOS traffic can crowd out
+   legitimate traffic.

-   The proxy replication results in devices not associated with the flow
-   that generated the PTB being recipients of an ICMPv6 message which
-   contains a fragment of a packet.
-   This could arguably result in
-   information disclosure. Recipient machines should be in a common
-   administrative domain.
+   The proxy replication results in devices on the subnet not associated
+   with the flow that generated the PTB, being recipients of the ICMPv6
+   PTB message; which contains a large fragment of the packet that
+   exceeded the allowable MTU. This replication of the packet fragment
+   could arguably result in information disclosure. Recipient machines
+   should be in a common administrative domain.

 8. Informative References

+   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
+              for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August
+              1996, .
+
+   [RFC2991]  Thaler, D. and C. Hopps, "Multipath Issues in Unicast and
+              Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/
+              RFC2991, November 2000,
+              .
+
    [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
               Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007,
               .

 Authors' Addresses

    Matt Byerly
    Fastly
    Kapolei, HI
    US