draft-ietf-v6ops-pmtud-ecmp-problem-06.txt | rfc7690.txt | |||
---|---|---|---|---|
v6ops M. Byerly | Internet Engineering Task Force (IETF) M. Byerly | |||
Internet-Draft Fastly | Request for Comments: 7690 Fastly | |||
Intended status: Informational M. Hite | Category: Informational M. Hite | |||
Expires: April 20, 2016 Evernote | ISSN: 2070-1721 Evernote | |||
J. Jaeggli | J. Jaeggli | |||
Fastly | Fastly | |||
October 18, 2015 | January 2016 | |||
Close encounters of the ICMP type 2 kind (near misses with ICMPv6 PTB) | Close Encounters of the ICMP Type 2 Kind | |||
draft-ietf-v6ops-pmtud-ecmp-problem-06 | (Near Misses with ICMPv6 Packet Too Big (PTB)) | |||
Abstract | Abstract | |||
This document calls attention to the problem of delivering ICMPv6 | This document calls attention to the problem of delivering ICMPv6 | |||
type 2 "Packet Too Big" (PTB) messages to the intended destination | type 2 "Packet Too Big" (PTB) messages to the intended destination | |||
(typically the server) in ECMP load balanced or anycast network | (typically the server) in ECMP load-balanced or anycast network | |||
architectures. It discusses operational mitigations that can be | architectures. It discusses operational mitigations that can be | |||
employed to address this class of failures. | employed to address this class of failures. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This document is not an Internet Standards Track specification; it is | |||
provisions of BCP 78 and BCP 79. | published for informational purposes. | |||
Internet-Drafts are working documents of the Internet Engineering | ||||
Task Force (IETF). Note that other groups may also distribute | ||||
working documents as Internet-Drafts. The list of current Internet- | ||||
Drafts is at http://datatracker.ietf.org/drafts/current/. | ||||
Internet-Drafts are draft documents valid for a maximum of six months | This document is a product of the Internet Engineering Task Force | |||
and may be updated, replaced, or obsoleted by other documents at any | (IETF). It represents the consensus of the IETF community. It has | |||
time. It is inappropriate to use Internet-Drafts as reference | received public review and has been approved for publication by the | |||
material or to cite them other than as "work in progress." | Internet Engineering Steering Group (IESG). Not all documents | |||
approved by the IESG are a candidate for any level of Internet | ||||
Standard; see Section 2 of RFC 5741. | ||||
This Internet-Draft will expire on April 20, 2016. | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | ||||
http://www.rfc-editor.org/info/rfc7690. | ||||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2015 IETF Trust and the persons identified as the | Copyright (c) 2016 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 | 2. Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Mitigation . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 3. Mitigation . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3.1. Alternative Mitigations . . . . . . . . . . . . . . . . . 5 | 3.1. Alternative Mitigations . . . . . . . . . . . . . . . . . 5 | |||
3.2. Implementation . . . . . . . . . . . . . . . . . . . . . 5 | 3.2. Implementation . . . . . . . . . . . . . . . . . . . . . 5 | |||
3.2.1. Alternative Implementation . . . . . . . . . . . . . 6 | 3.2.1. Alternative Implementation . . . . . . . . . . . . . 6 | |||
4. Improvements . . . . . . . . . . . . . . . . . . . . . . . . 7 | 4. Improvements . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 7 | 5. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7 | 6. Informative References . . . . . . . . . . . . . . . . . . . 8 | |||
7. Security Considerations . . . . . . . . . . . . . . . . . . . 7 | Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
8. Informative References . . . . . . . . . . . . . . . . . . . 8 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 8 | ||||
1. Introduction | 1. Introduction | |||
Operators of popular Internet services face complex challenges | Operators of popular Internet services face complex challenges | |||
associated with scaling their infrastructure. One scaling approach | associated with scaling their infrastructure. One scaling approach | |||
is to utilize equal-cost multi-path (ECMP) routing to perform | is to utilize equal-cost multipath (ECMP) routing to perform | |||
stateless distribution of incoming TCP or UDP sessions to multiple | stateless distribution of incoming TCP or UDP sessions to multiple | |||
servers or to middle boxes such as load balancers. Distribution of | servers or to middle boxes such as load balancers. Distribution of | |||
traffic in this manner presents a problem when dealing with ICMP | traffic in this manner presents a problem when dealing with ICMP | |||
signaling. Specifically, an ICMP error is not guaranteed to hash via | signaling. Specifically, an ICMP error is not guaranteed to hash via | |||
ECMP to the same destination as its corresponding TCP or UDP session. | ECMP to the same destination as its corresponding TCP or UDP session. | |||
A case where this is particularly problematic operationally is path | A case where this is particularly problematic operationally is path | |||
MTU discovery [RFC1981]. | MTU discovery (PMTUD) [RFC1981]. | |||
2. Problem | 2. Problem | |||
A common application for stateless load balancing of TCP or UDP flows | A common application for stateless load balancing of TCP or UDP flows | |||
is to perform an initial subdivision of flows in front of a stateful | is to perform an initial subdivision of flows in front of a stateful | |||
load balancer tier or multiple servers so that the workload becomes | load-balancer tier or multiple servers so that the workload becomes | |||
divided into manageable fractions of the total number of flows. The | divided into manageable fractions of the total number of flows. The | |||
flow division is performed using ECMP forwarding and a stateless but | flow division is performed using ECMP forwarding and a stateless but | |||
sticky algorithm for hashing across the available paths (see | sticky algorithm for hashing across the available paths (see | |||
[RFC2991] for background on ECMP routing). This nexthop selection | [RFC2991] for background on ECMP routing). For the purposes of flow | |||
for the purposes of flow distribution is a constrained form of | distribution, this next-hop selection is a constrained form of | |||
anycast topology, where all anycast destinations are equidistant from | anycast topology, where all anycast destinations are equidistant from | |||
the upstream router responsible for making the last next-hop | the upstream router responsible for making the last next-hop | |||
forwarding decision before the flow arrives on the destination | forwarding decision before the flow arrives on the destination | |||
device. In this approach, the hash is performed across some set of | device. In this approach, the hash is performed across some set of | |||
available protocol headers. Typically, these headers may include all | available protocol headers. Typically, these headers may include all | |||
or a subset of (IPv6) Flow-Label, IP-source, IP-destination, | or a subset of (IPv6) Flow-Label, IP-source, IP-destination, | |||
protocol, source-port, destination-port and potentially others such | protocol, source-port, destination-port, and potentially others such | |||
as ingress interface. | as ingress interface. | |||
A problem common to this approach of distribution through hashing is | A problem common to this approach of distribution through hashing is | |||
impact on path MTU discovery. An ICMPv6 type 2 PTB message generated | impact on path MTU discovery. An ICMPv6 type 2 PTB message generated | |||
on an intermediate device for a packet sent from a server that is | on an intermediate device for a packet sent from a server that is | |||
part of an ECMP load balanced service to a client will have the load | part of an ECMP load-balanced service to a client will have the load- | |||
balanced anycast address as the destination and hence will be | balanced anycast address as the destination and hence will be | |||
statelessly load balanced to one of the servers. While the ICMPv6 | statelessly load balanced to one of the servers. While the ICMPv6 | |||
PTB message contains as much of the packet that could not be | PTB message contains as much of the packet that could not be | |||
forwarded as possible, the payload headers are not considered in the | forwarded as possible, the payload headers are not considered in the | |||
forwarding decision and are ignored. Because the PTB message is not | forwarding decision and are ignored. Because the PTB message is not | |||
identifiable as part of the original flow by the IP or upper layer | identifiable as part of the original flow by the IP or upper-layer | |||
packet headers, the results of the ICMPv6 ECMP hash calculation are | packet headers, the results of the ICMPv6 ECMP hash calculation are | |||
unlikely to be hashed to the same nexthop as packets matching the TCP | unlikely to be hashed to the same next hop as packets matching the | |||
or UDP ECMP hash of the flow. | TCP or UDP ECMP hash of the flow. | |||
An example packet flow and topology follow. The packet for which the | An example packet flow and topology follow. The packet for which the | |||
PTB message was generated was intended for the client. | PTB message was generated was intended for the client. | |||
ptb -> router ecmp -> nexthop L4/L7 load balancer -> destination | ptb -> router ecmp -> next hop L4/L7 load balancer -> destination | |||
router --> load balancer 1 ---> | router --> load balancer 1 ---> | |||
\\--> load balancer 2 ---> load-balanced service | \\--> load balancer 2 ---> load-balanced service | |||
\--> load balancer N ---> | \--> load balancer N ---> | |||
Figure 1 | Figure 1 | |||
The router ECMP decision is used because it is part of the forwarding | The router ECMP decision is used because it is part of the forwarding | |||
architecture, can be performed at line rate, and does not depend on | architecture, can be performed at line rate, and does not depend on | |||
shared state or coordination across a distributed forwarding system | shared state or coordination across a distributed forwarding system | |||
which may include multiple linecards or routers. The ECMP routing | that may include multiple linecards or routers. The ECMP routing | |||
decision is deterministic with respect to packets having the same | decision is deterministic with respect to packets having the same | |||
computed hash. | computed hash. | |||
A typical case where ICMPv6 PTB messages are received at the load | A typical case in which ICMPv6 PTB messages are received at the load | |||
balancer is a case where the path MTU from the client to the load | balancer is where the path MTU from the client to the load balancer | |||
balancer is limited by a tunnel in which the client itself is not | is limited by a tunnel of which the client itself is not aware. | |||
aware of. | ||||
Direct experience says that the frequency of PTB messages is small | Direct experience says that the frequency of PTB messages is small | |||
compared to total flows. One possible conclusion being that tunneled | compared to total flows. One possible conclusion is that tunneled | |||
IPv6 deployments that cannot carry 1500 MTU packets are relatively | IPv6 deployments that cannot carry 1500 MTU packets are relatively | |||
rare. Techniques employed by clients such as happy-eyeballs may | rare. Techniques employed by clients (e.g., Happy Eyeballs | |||
actually contribute some amelioration to the IPv6 client experience | [RFC6555]) may actually contribute some amelioration to the IPv6 | |||
by preferring IPv4 in cases that might be identified as failures. | client experience by preferring IPv4 in cases that might be | |||
identified as failures. Still, the expectation of operators is that | ||||
Still, the expectation of operators is that PMTUD should work and | PMTUD should work and that unnecessary breakage of client traffic | |||
that unnecessary breakage of client traffic should be avoided. | should be avoided. | |||
A final observation regarding server tuning is that it is not always | A final observation regarding server tuning is that it is not always | |||
possible even if it is potentially desirable to be able to | possible, even if it is potentially desirable to be able to | |||
independently set the TCP MSS for different address families on some | independently set the TCP MSS (Maximum Segment Size) for different | |||
end-systems. On Linux platforms, advmss may be set on a per route | address families on some end systems. On Linux platforms, advmss | |||
basis for selected destinations in cases where discrimination by | (advertised mss) may be set on a per-route basis for selected | |||
route is possible. | destinations in cases where discrimination by route is possible. | |||
The problem as described does also impact IPv4; however | The problem as described does also impact IPv4; however, | |||
implementation of RFC 4821 [RFC4821] TCP MTU probing, the ability to | implementation of RFC 4821 [RFC4821] TCP MTU probing, the ability to | |||
fragment on wire at tunnel ingress points and the relative rarity of | fragment on the wire at tunnel ingress points, and the relative | |||
sub-1500 byte MTUs that are not coupled to changes in client behavior | rarity of sub-1500-byte MTUs that are not coupled to changes in | |||
(for example, endpoint VPN clients set the tunnel interface MTU | client behavior (for example, endpoint VPN clients set the tunnel | |||
accordingly to avoid fragmentation for performance reasons) makes the | interface MTU accordingly to avoid fragmentation for performance | |||
problem sufficiently rare that some existing deployments have choosen | reasons) makes the problem sufficiently rare that some existing | |||
to ignore it. | deployments have chosen to ignore it. | |||
3. Mitigation | 3. Mitigation | |||
Mitigation of the potential for PTB messages to be mis-delivered | Mitigation of the potential for PTB messages to be misdelivered | |||
involves ensuring that an ICMPv6 error message is distributed to the | involves ensuring that an ICMPv6 error message is distributed to the | |||
same anycast server responsible for the flow for which the error is | same anycast server responsible for the flow for which the error is | |||
generated. With apppropiate hardware support, mitigation could be | generated. With appropriate hardware support, flows could be | |||
done by the mechanism hosts use to identify the flow; by looking into | identified using the same technique as hosts by inspecting the | |||
the payload of the ICMPv6 message (to determine which TCP flow it was | payload of the ICMPv6 message. The ECMP hash calculation can then be | |||
associated with) before making a forwarding decision. Because the | performed using values identified from the inner TCP flow parameters | |||
encapsulated IP header occurs at a fixed offset in the ICMP message | of the ICMPv6 message. Because the encapsulated IP header occurs at | |||
it is not outside the realm of possibility that routers with | a fixed offset in the ICMP message, it is not outside the realm of | |||
sufficient header processing capability could parse that far into the | possibility that routers with sufficient header processing capability | |||
payload. Employing a mediation device that handles the parsing and | could parse that far into the payload. Employing a mediation device | |||
distribution of PTB messages after policy routing or on each load- | that handles the parsing and distribution of PTB messages after | |||
balancer/server is a possibility. | policy routing or on each load balancer / server is a possibility. | |||
Another mitigation approach is predicated upon distributing the PTB | Another mitigation approach is predicated upon distributing the PTB | |||
message to all anycast servers under the assumption that the one for | message to all anycast servers under the assumption that the one for | |||
which the message was intended will be able to match it to the flow | which the message was intended will be able to match it to the flow | |||
and update the route cache with the new MTU and that devices not able | and update the route cache with the new MTU and that devices not able | |||
to match the flow will discard these packets. Such distribution has | to match the flow will discard these packets. Such distribution has | |||
potentially significant implications for resource consumption and for | potentially significant implications for resource consumption and for | |||
self-inflicted denial-of-service if not carefully employed. | self-inflicted denial of service (DOS) if not carefully employed. | |||
Fortunately, in real-world deployments we have observed that the | Fortunately, we have observed that the number of flows for which this | |||
number of flows for which this problem occurs is relatively small | problem occurs is relatively small in real-world deployments (for | |||
(example, 10 or fewer pps on 1Gb/s or more worth of https traffic in | example, 10 or fewer pps on 1 Gbit/s or more worth of HTTPS); | |||
a real world deployment); sensible ingress rate limiters which will | sensible ingress rate limiters that will discard excessive message | |||
discard excessive message volume can be applied to protect even very | volume can be applied to protect even very large anycast server tiers | |||
large anycast server tiers with the potential for fallout limited to | with the potential for fallout limited to circumstances of deliberate | |||
circumstances of deliberate duress. | duress. | |||
3.1. Alternative Mitigations | 3.1. Alternative Mitigations | |||
As an alternative, it may be appropriate to lower the TCP MSS to 1220 | As an alternative, it may be appropriate to lower the TCP MSS to 1220 | |||
in order to accommodate 1280 byte MTU. We consider this undesirable | in order to accommodate 1280-byte MTU. We consider this undesirable, | |||
as hosts may not be able to independently set TCP MSS by address- | as hosts may not be able to independently set TCP MSS by address | |||
family thereby impacting IPv4, or alternatively that middle-boxes | family thereby impacting IPv4, or alternatively that middle-boxes | |||
need to be employed to clamp the MSS independently from the end- | need to be employed to clamp the MSS independently from the end | |||
systems. Potentially, extension headers might further alter the | systems. Potentially, extension headers might further alter the | |||
lower bound that the MSS would have to be set to, making clamping | lower bound that the MSS would have to be set to, making clamping | |||
still more undesirable. | even more undesirable. | |||
3.2. Implementation | 3.2. Implementation | |||
1. Filter-based-forwarding matches next-header ICMPv6 type-2 and | 1. Filter-based forwarding matches next-header ICMPv6 type 2 and | |||
matches a next-hop on a particular subnet directly attached to 1 | matches a next hop on a particular subnet directly attached to | |||
or more routers. The filter is policed to reasonable limits (we | one or more routers. The filter is policed to reasonable limits | |||
chose 1000pps, more conservative rates might be required in other | (we chose 1000 pps; more conservative rates might be required in | |||
implementations). | other implementations). | |||
2. Filter is applied on input side of all external (internet or | 2. The filter is applied on the input side of all external | |||
customer facing) interfaces. | (Internet- or customer-facing) interfaces. | |||
3. A proxy located at the next-hop forwards ICMPv6 type-2 packets | 3. A proxy located at the next hop forwards ICMPv6 type 2 packets it | |||
received at the next-hop to an Ethernet broadcast address | receives to an Ethernet broadcast address (example | |||
(example ff:ff:ff:ff:ff:ff) on all specified subnets. This was | ff:ff:ff:ff:ff:ff) on all specified subnets. This was | |||
necessitated by router inability (in IPv6) to forward the same | necessitated by router inability (in IPv6) to forward the same | |||
packet to multiple unicast next-hops. | packet to multiple unicast next hops. | |||
4. Anycasted servers receive the PTB error and process packet as | 4. Anycasted servers receive the PTB error and process the packet as | |||
needed. | needed. | |||
A simple Python scapy script that can perform the ICMPv6 proxy | A simple Python scapy [SCAPY] script that can perform the ICMPv6 | |||
reflection is included. | proxy reflection is included. | |||
#!/usr/bin/python | #!/usr/bin/python | |||
from scapy.all import * | from scapy.all import * | |||
IFACE_OUT = ["p2p1", "p2p2"] | IFACE_OUT = ["p2p1", "p2p2"] | |||
def icmp6_callback(pkt): | def icmp6_callback(pkt): | |||
if pkt.haslayer(IPv6) and (ICMPv6PacketTooBig in pkt) \ | if pkt.haslayer(IPv6) and (ICMPv6PacketTooBig in pkt) \ | |||
and pkt[Ether].dst != 'ff:ff:ff:ff:ff:ff': | and pkt[Ether].dst != 'ff:ff:ff:ff:ff:ff': | |||
skipping to change at page 6, line 28 | skipping to change at page 6, line 34 | |||
sendp(pkt, iface=iface) | sendp(pkt, iface=iface) | |||
def main(): | def main(): | |||
sniff(prn=icmp6_callback, filter="icmp6 \ | sniff(prn=icmp6_callback, filter="icmp6 \ | |||
and (ip6[40+0] == 2)", store=0) | and (ip6[40+0] == 2)", store=0) | |||
if __name__ == '__main__': | if __name__ == '__main__': | |||
main() | main() | |||
This example script listens on all interfaces for IPv6 PTB errors | This example script listens on all interfaces for IPv6 PTB errors | |||
being forwarded using filter-based-forwarding. It removes the | being forwarded using filter-based forwarding. It removes the | |||
existing Ethernet source and rewrites a new Ethernet destination of | existing Ethernet source and rewrites a new Ethernet destination of | |||
the Ethernet broadcast address. It then sends the resulting frame | the Ethernet broadcast address. It then sends the resulting frame | |||
out the p2p1 and p2p2 interfaces which attached to vlans where our | out the p2p1 and p2p2 interfaces that are attached to VLANs where our | |||
anycast servers reside. | anycast servers reside. | |||
3.2.1. Alternative Implementation | 3.2.1. Alternative Implementation | |||
Alternatively, network designs in which a common layer 2 network | Alternatively, network designs in which a common layer 2 network | |||
exists on the ECMP hop could distribute the proxy onto the end | exists on the ECMP hop could distribute the proxy onto the end | |||
systems, eliminating the need for policy routing. They could then | systems, eliminating the need for policy routing. They could then | |||
rewrite the destination -- for example, using iptables before | rewrite the destination -- for example, using iptables before | |||
forwarding the packet back to the network containing all of the | forwarding the packet back to the network containing all of the | |||
server or load balancer interfaces. This implmentation can be done | server or load-balancer interfaces. This implementation can be done | |||
entirely within the Linux iptables firewall. Because of the | entirely within the Linux iptables firewall. Because of the | |||
distributed nature of the filter, more conservative rate limits are | distributed nature of the filter, more conservative rate limits are | |||
required than when a global rate limit can be employed. | required than when a global rate limit can be employed. | |||
An example ip6tables / nftables rule to match icmp6 traffic, not | An example ip6tables/nftables rule to match icmp6 traffic, not match | |||
match broadcast traffic, impose a rate limit of 10 pps, and pass to a | broadcast traffic, impose a rate limit of 10 pps, and pass to a | |||
target destination would resemble: | target destination would resemble: | |||
ip6tables -I INPUT -i lo -p icmpv6 -m icmpv6 --icmpv6-type 2/0 \ | ip6tables -I INPUT -i lo -p icmpv6 -m icmpv6 --icmpv6-type 2/0 \ | |||
-m pkttype ! --pkt-type broadcast -m limit --limit 10/second \ | -m pkttype ! --pkt-type broadcast -m limit --limit 10/second \ | |||
-j TEE --gateway 2001:DB8::1 | -j TEE --gateway 2001:DB8::1 | |||
As with the scapy example, once the destination has been rewritten | As with the scapy example, once the destination has been rewritten | |||
from a hardcoded ND entry to an Ethernet broadcast address -- in this | from a hardcoded ND entry to an Ethernet broadcast address -- in this | |||
case to an IPv6 documentation address -- the traffic will be | case to an IPv6 documentation address -- the traffic will be | |||
reflected to all the hosts on the subnet. | reflected to all the hosts on the subnet. | |||
4. Improvements | 4. Improvements | |||
There are several ways that improvements could be made to the problem | There are several ways that improvements could be made to improve | |||
how to ECMP load balance of ICMPv6 PTB messages. little in the way of | handling ECMP load balancing of ICMPv6 PTB messages. Little in the | |||
Internet protocol specification change is required, rather we forsee | way of change to the Internet protocol specification is required; | |||
practical implemention change which insofar as we are aware does not | rather, we foresee practical implementation change, which, insofar as | |||
exist in current router switch or layer3/4 load balancers. | we are aware, does not exist in current router, switch, or layer 3/4 | |||
alternatively improved behavior on the part of client/server | load balancers. Alternatively, improved behavior on the part of | |||
detection of path mtu in band could render the behavior of devices in | client/server detection of path MTU in band could render the behavior | |||
the path irrelevant. | of devices in the path irrelevant. | |||
1. Routers with sufficient capacity within the lookup process could | 1. Routers with sufficient capacity within the lookup process could | |||
parse all the way through the L3 or L4 header in the ICMPv6 | parse all the way through the L3 or L4 header in the ICMPv6 | |||
payload beginning at bit offset 32 of the ICMP header. By | payload beginning at bit offset 32 of the ICMP header. By | |||
reordering the elements of the hash to match the inward direction | reordering the elements of the hash to match the inward direction | |||
of the flow, the PTB error could be directed to the same next-hop | of the flow, the PTB error could be directed to the same next hop | |||
as the incoming packets in the flow. | as the incoming packets in the flow. | |||
2. The FIB (Forwarding Information Base) on the router could be | 2. The FIB (Forwarding Information Base) on the router could be | |||
programmed with a multicast distribution tree that included all | programmed with a multicast distribution tree that includes all | |||
of the necessary next-hops, and unicast ICMPv6 packets could be | of the necessary next hops, and unicast ICMPv6 packets could be | |||
policy routed to these destinations. | policy routed to these destinations. | |||
3. Ubiquitous implementation of RFC 4821 [RFC4821] Packetization | 3. Ubiquitous implementation of RFC 4821 [RFC4821] Packetization | |||
Layer Path MTU Discovery would probably go a long way towards | Layer Path MTU Discovery would probably go a long way towards | |||
reducing dependence on ICMPv6 PTB by end systems. | reducing dependence on ICMPv6 PTB by end systems. | |||
5. Acknowledgements | 5. Security Considerations | |||
The authors would like to thank Marak Majkowsiki for contributing | ||||
text, examples, and a very close review. The authors would like to | ||||
thank Mark Andrews, Brian Carpenter, Nick Hilliard and Ray Hunter, | ||||
for review. | ||||
6. IANA Considerations | ||||
This memo includes no request to IANA. | ||||
7. Security Considerations | ||||
The employed mitigation has the potential to greatly amplify the | The employed mitigation has the potential to greatly amplify the | |||
impact of a deliberately malicious sending of ICMPv6 PTB messages. | impact of a deliberately malicious sending of ICMPv6 PTB messages. | |||
Sensible ingress rate limiting can reduce the potential for impact; | Sensible ingress rate limiting can reduce the potential for impact; | |||
however, legitimate PMTUD messages may be lost once the rate limit is | legitimate PMTUD messages may be lost once the rate limit is reached. | |||
reached; the scenario is analogous to other cases where DOS traffic | The scenario where drops of legitimate traffic occur is analogous to | |||
can crowd out legitimate traffic, however with a limited subset of | other cases where DOS traffic can crowd out legitimate traffic, | |||
overall traffic. | however only a limited subset of overall traffic is impacted. | |||
The proxy replication results in devices on the subnet not associated | The proxy replication results in all devices on the subnet receiving | |||
with the flow that generated the PTB, being recipients of the ICMPv6 | ICMPv6 PTB errors, even those not associated with the flow. This | |||
PTB message; which contains a large fragment of the packet that | could arguably result in information disclosure due to the wide | |||
exceeded the allowable MTU. This replication of the packet fragment | replication of the ICMPv6 PTB error on the subnet and the large | |||
could arguably result in information disclosure. Recipient machines | fragment of the offending IP packet embedded in the ICMPv6 error. | |||
should be in a common administrative domain. | Because of this, recipient machines should be in a common | |||
administrative domain. | ||||
8. Informative References | 6. Informative References | |||
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery | [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery | |||
for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August | for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August | |||
1996, <http://www.rfc-editor.org/info/rfc1981>. | 1996, <http://www.rfc-editor.org/info/rfc1981>. | |||
[RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and | [RFC2991] Thaler, D. and C. Hopps, "Multipath Issues in Unicast and | |||
Multicast Next-Hop Selection", RFC 2991, DOI 10.17487/ | Multicast Next-Hop Selection", RFC 2991, | |||
RFC2991, November 2000, | DOI 10.17487/RFC2991, November 2000, | |||
<http://www.rfc-editor.org/info/rfc2991>. | <http://www.rfc-editor.org/info/rfc2991>. | |||
[RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU | |||
Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, | |||
<http://www.rfc-editor.org/info/rfc4821>. | <http://www.rfc-editor.org/info/rfc4821>. | |||
[RFC6555] Wing, D. and A. Yourtchenko, "Happy Eyeballs: Success with | ||||
Dual-Stack Hosts", RFC 6555, DOI 10.17487/RFC6555, April | ||||
2012, <http://www.rfc-editor.org/info/rfc6555>. | ||||
[SCAPY] Scapy, <http://www.secdev.org/projects/scapy/>. | ||||
Acknowledgements | ||||
The authors thank Marak Majkowsiki for contributing text, examples, | ||||
and a very thorough review. The authors would like to thank Mark | ||||
Andrews, Brian Carpenter, Nick Hilliard, and Ray Hunter, for review. | ||||
Authors' Addresses | Authors' Addresses | |||
Matt Byerly | Matt Byerly | |||
Fastly | Fastly | |||
Kapolei, HI | Kapolei, HI | |||
US | United States | |||
Email: suckawha@gmail.com | Email: suckawha@gmail.com | |||
Matt Hite | Matt Hite | |||
Evernote | Evernote | |||
Redwood City, CA | Redwood City, CA | |||
US | United States | |||
Email: mhite@hotmail.com | Email: mhite@hotmail.com | |||
Joel Jaeggli | Joel Jaeggli | |||
Fastly | Fastly | |||
Mountain View, CA | Mountain View, CA | |||
US | United States | |||
Email: joelja@gmail.com | Email: joelja@gmail.com | |||
End of changes. 55 change blocks. | ||||
143 lines changed or deleted | 143 lines changed or added | |||
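
Section 2 (Problem) explains that the router hashes only the outer headers. A PTB addressed to the anycast service is therefore hashed independently of the TCP flow it refers to. The sketch below shows this with a stand-in hash: `zlib.crc32` over the source and destination addresses, protocol, and ports, reduced modulo the number of next hops. Several things here are assumptions for illustration: the hash function, the field set (flow label and ingress interface are omitted), the `HashKey`/`ecmp_next_hop` names, and the addresses, which come from the 2001:db8::/32 documentation prefix. Real routers use vendor-specific hashes.

```python
import ipaddress
import zlib
from typing import NamedTuple, Optional


class HashKey(NamedTuple):
    """Header fields fed to a hypothetical ECMP hash."""
    src: str
    dst: str
    proto: int                   # 6 = TCP, 17 = UDP, 58 = ICMPv6
    sport: Optional[int] = None  # None when the packet carries no ports
    dport: Optional[int] = None


def ecmp_next_hop(key: HashKey, n_next_hops: int) -> int:
    """Statelessly map a packet to one of n equal-cost next hops.

    zlib.crc32 stands in for a router's vendor-specific hash function.
    """
    material = b"".join([
        ipaddress.IPv6Address(key.src).packed,
        ipaddress.IPv6Address(key.dst).packed,
        key.proto.to_bytes(1, "big"),
        (key.sport or 0).to_bytes(2, "big"),
        (key.dport or 0).to_bytes(2, "big"),
    ])
    return zlib.crc32(material) % n_next_hops


if __name__ == "__main__":
    N = 4  # hypothetical number of load balancers behind the router
    # Client -> anycast service: the TCP flow the hash keeps "sticky".
    flow = HashKey("2001:db8:c::1", "2001:db8:a::80", 6, 51000, 443)
    # Tunnel router -> anycast service: only the PTB's outer header is
    # hashed, so the TCP ports quoted inside it never affect the choice.
    ptb = HashKey("2001:db8:b::1", "2001:db8:a::80", 58)
    print("flow ->", ecmp_next_hop(flow, N))
    print("PTB  ->", ecmp_next_hop(ptb, N))
```

The PTB's hash inputs are unrelated to the flow's. With a uniform hash, the PTB therefore reaches the server holding the flow with a probability of only roughly 1/N. This is why the mitigations in Section 3 either steer the PTB by its quoted contents or replicate it to every server.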
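
Section 3 and item 1 of Section 4 propose steering a PTB by the flow it quotes rather than by its outer header. Per RFC 4443, the quoted ("invoking") packet begins immediately after the 32-bit MTU field, 8 bytes into the ICMPv6 message. A device that can parse that far can recover the original addresses and ports and reverse them to the inbound direction. The following is a minimal sketch, assuming no extension headers in the quoted packet. `inbound_key_from_ptb` and the test helper `build_ptb` are hypothetical names, and a production version would live in forwarding hardware or a mediation device rather than in Python.

```python
import ipaddress
import struct
from typing import NamedTuple

ICMPV6_PACKET_TOO_BIG = 2
PTB_HEADER_LEN = 8    # type, code, checksum, 32-bit MTU (RFC 4443)
IPV6_HEADER_LEN = 40  # fixed IPv6 header; extension headers not handled


class FlowKey(NamedTuple):
    src: str
    dst: str
    proto: int
    sport: int
    dport: int


def inbound_key_from_ptb(icmp6: bytes) -> FlowKey:
    """Rebuild the client->server flow key from a PTB's quoted packet.

    `icmp6` starts at the ICMPv6 header (outer IPv6 header stripped).
    Assumes TCP or UDP follows the quoted packet's fixed IPv6 header and
    that at least the two port fields were quoted.
    """
    if len(icmp6) < PTB_HEADER_LEN + IPV6_HEADER_LEN + 4:
        raise ValueError("PTB too short to include the quoted ports")
    if icmp6[0] != ICMPV6_PACKET_TOO_BIG:
        raise ValueError("not an ICMPv6 Packet Too Big message")
    inner = bytes(icmp6[PTB_HEADER_LEN:])
    proto = inner[6]  # Next Header field of the quoted IPv6 packet
    if proto not in (6, 17):
        raise ValueError("quoted packet is not TCP/UDP (or has ext headers)")
    quoted_src = str(ipaddress.IPv6Address(inner[8:24]))
    quoted_dst = str(ipaddress.IPv6Address(inner[24:40]))
    sport, dport = struct.unpack("!HH", inner[40:44])
    # The quoted packet went server -> client; swap addresses and ports so
    # the key matches what the router hashed for inbound packets.
    return FlowKey(quoted_dst, quoted_src, proto, dport, sport)


def build_ptb(mtu: int, server: str, client: str, sport: int, dport: int) -> bytes:
    """Hypothetical test helper: a PTB quoting a server->client TCP header.

    The checksum is left as zero because nothing here verifies it.
    """
    quoted_ip = (struct.pack("!IHBB", 6 << 28, 20, 6, 64)
                 + ipaddress.IPv6Address(server).packed
                 + ipaddress.IPv6Address(client).packed)
    quoted_tcp = struct.pack("!HH", sport, dport) + bytes(16)
    return struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, mtu) + quoted_ip + quoted_tcp


if __name__ == "__main__":
    ptb = build_ptb(1280, "2001:db8:a::80", "2001:db8:c::1", 443, 51000)
    print(inbound_key_from_ptb(ptb))
    # FlowKey(src='2001:db8:c::1', dst='2001:db8:a::80', proto=6, sport=51000, dport=443)
```

Feeding the returned key into the router's inbound ECMP hash would select the same next hop as the flow's own packets, with no replication needed.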
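
The MSS of 1220 in Section 3.1 follows from subtracting two fixed sizes from the IPv6 minimum MTU of 1280 bytes: the 40-byte IPv6 header and a 20-byte TCP header without options. Any extension header lowers the value further, as Section 3.1 notes. The helper below makes that arithmetic explicit. Its name is hypothetical, and it assumes option-free TCP headers.

```python
IPV6_HEADER_BYTES = 40
TCP_HEADER_BYTES = 20  # base TCP header, no options


def ipv6_tcp_mss(path_mtu: int, extension_header_bytes: int = 0) -> int:
    """Largest TCP payload per segment that fits in `path_mtu` over IPv6.

    Assumes the fixed 40-byte IPv6 header, `extension_header_bytes` of
    extension headers, and a 20-byte TCP header without options.
    """
    mss = path_mtu - IPV6_HEADER_BYTES - extension_header_bytes - TCP_HEADER_BYTES
    if mss <= 0:
        raise ValueError("path MTU too small for IPv6 + TCP headers")
    return mss


if __name__ == "__main__":
    print(ipv6_tcp_mss(1280))     # 1220: the IPv6 minimum MTU case
    print(ipv6_tcp_mss(1500))     # 1440: a standard Ethernet path
    print(ipv6_tcp_mss(1280, 8))  # 1212: plus one 8-byte extension header
```

On Linux, Section 2 notes that such a value can be set per route rather than per address family. With iproute2 that is the `advmss` route attribute, for example `ip -6 route add 2001:db8::/32 via 2001:db8::1 advmss 1220` (prefix and gateway are illustrative).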
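
The scapy proxy in Section 3.2 performs the following steps:

1. It accepts only frames matching the BPF filter `icmp6 and (ip6[40+0] == 2)`. Byte 40 of the IPv6 header is the ICMPv6 type when no extension headers are present.
2. It skips frames already sent to the Ethernet broadcast address, so it never re-floods its own output.
3. It rewrites the destination MAC to ff:ff:ff:ff:ff:ff.

The sketch below reproduces that rewrite on raw bytes without scapy, so the logic can be inspected in isolation. It is not the RFC's script and makes these assumptions:

- Frames carry no 802.1Q tag.
- The new source is set to an egress-interface MAC supplied by the caller. The original script removes the existing source instead.
- Actually sending the frame (scapy `sendp`, or a raw packet socket) is left out.

```python
from typing import Optional

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_IPV6 = 0x86DD
ETH_HDR = 14
IPV6_HDR = 40
NEXT_HEADER_ICMPV6 = 58
ICMPV6_PACKET_TOO_BIG = 2


def reflect_ptb_frame(frame: bytes, egress_mac: bytes) -> Optional[bytes]:
    """Return a broadcast copy of an Ethernet frame carrying an ICMPv6 PTB,
    or None if the frame should not be reflected.

    Assumes an untagged frame and no IPv6 extension headers, so the ICMPv6
    type sits at IPv6 offset 40, the byte tested by ip6[40+0] == 2.
    """
    if len(frame) < ETH_HDR + IPV6_HDR + 1:
        return None
    if frame[0:6] == BROADCAST_MAC:
        return None  # already reflected: do not flood it again
    if int.from_bytes(frame[12:14], "big") != ETHERTYPE_IPV6:
        return None
    if frame[ETH_HDR + 6] != NEXT_HEADER_ICMPV6:
        return None
    if frame[ETH_HDR + IPV6_HDR] != ICMPV6_PACKET_TOO_BIG:
        return None
    # Broadcast destination, caller-supplied source, payload unchanged.
    return BROADCAST_MAC + egress_mac + frame[12:]


if __name__ == "__main__":
    # Hypothetical frame: unicast dst/src, IPv6 header (next header 58,
    # zeroed addresses), then an 8-byte ICMPv6 type 2 header.
    ipv6 = bytes([0x60, 0, 0, 0, 0, 8, NEXT_HEADER_ICMPV6, 64]) + bytes(32)
    frame = (bytes.fromhex("020000000001") + bytes.fromhex("020000000002")
             + ETHERTYPE_IPV6.to_bytes(2, "big") + ipv6 + bytes([2, 0]) + bytes(6))
    out = reflect_ptb_frame(frame, bytes.fromhex("020000000003"))
    print(out[:12].hex())  # ffffffffffff020000000003
```

In a deployment the returned frame would be transmitted once per server-facing interface, as the script does for p2p1 and p2p2.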
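
Sections 3, 3.2, and 3.2.1 depend on ingress rate limits: 1000 pps on the routers, and 10 pps per host via the ip6tables `limit` match. That match is documented as a token bucket filter whose `--limit-burst` defaults to 5. The sketch below is a generic token bucket, not any router's policer. The class name and the caller-supplied timestamps are assumptions chosen so the behavior is deterministic and testable.

```python
class TokenBucket:
    """Generic token-bucket policer: refills `rate_pps` tokens per second,
    holds at most `burst` tokens, and each admitted packet costs one.

    Timestamps (seconds) are passed in by the caller; a live filter would
    read a monotonic clock instead.
    """

    def __init__(self, rate_pps: float, burst: int, start: float = 0.0) -> None:
        self.rate = rate_pps
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full, permitting an initial burst
        self.last = start

    def allow(self, now: float) -> bool:
        """Return True if a PTB arriving at `now` passes, False to drop it."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # 10 pps with a burst of 5, mirroring the per-host ip6tables example.
    bucket = TokenBucket(rate_pps=10, burst=5)
    flood = [bucket.allow(0.0) for _ in range(20)]  # 20 PTBs at one instant
    print(sum(flood))         # 5: only the burst gets through
    print(bucket.allow(0.1))  # True: 0.1 s at 10 pps refills one token
```

The policer bounds how many PTBs each server receives per second. Every admitted PTB is broadcast to all servers on the subnet (see Security Considerations), however, so the total number of delivered copies still grows with the number of servers.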