--- 1/draft-ietf-tcpm-early-rexmt-02.txt 2009-11-18 21:12:14.000000000 +0100
+++ 2/draft-ietf-tcpm-early-rexmt-03.txt 2009-11-18 21:12:14.000000000 +0100
@@ -1,23 +1,23 @@

 Internet Engineering Task Force                              Mark Allman
 INTERNET DRAFT                                                      ICSI
-File: draft-ietf-tcpm-early-rexmt-02.txt          Konstantin Avrachenkov
+File: draft-ietf-tcpm-early-rexmt-03.txt          Konstantin Avrachenkov
 Intended Status: Experimental                                      INRIA
                                                             Urtzi Ayesta
                                                                LAAS-CNRS
                                                             Josh Blanton
                                                          Ohio University
                                                               Per Hurtig
                                                      Karlstad University
-                                                            October 2009
-                                                     Expires: April 2010
+                                                           November 2009
+                                                       Expires: May 2010

                    Early Retransmit for TCP and SCTP

 Status of this Memo

    This Internet-Draft is submitted to IETF in full conformance with the
    provisions of BCP 78 and BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
@@ -28,21 +28,21 @@
    months and may be updated, replaced, or obsoleted by other documents
    at any time. It is inappropriate to use Internet-Drafts as
    reference material or to cite them other than as "work in progress."

    The list of current Internet-Drafts can be accessed at
    http://www.ietf.org/ietf/1id-abstracts.txt.

    The list of Internet-Draft Shadow Directories can be accessed at
    http://www.ietf.org/shadow.html.

-   This Internet-Draft will expire on April 27, 2010.
+   This Internet-Draft will expire on May 18, 2010.

 Copyright Notice

    Copyright (c) 2009 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document. Please review these documents
@@ -74,21 +74,21 @@

 1 Introduction

    Many researchers have studied problems with TCP [RFC793,RFC5681] when
    the congestion window is small and have outlined possible mechanisms
    to mitigate these problems [Mor97,BPS+98,Bal98,LK98,RFC3150,AA02].
    SCTP's [RFC4960] loss recovery and congestion control mechanisms are
    based on TCP and therefore the same problems impact the performance
    of SCTP connections. When the transport detects a missing segment,
    the connection enters a loss recovery phase. There are several variants
-   of the loss recovery phase depending on the TCP implemention. TCP
+   of the loss recovery phase depending on the TCP implementation. TCP
    can use slow start based recovery or Fast Recovery [RFC5681], NewReno
    [RFC3782], and loss recovery based on selective acknowledgments
    (SACKs) [RFC2018,FF96,RFC3517]. SCTP's loss recovery is not as
    varied due to the built-in selective acknowledgments.

    All the above variants have two methods for invoking loss recovery.
    First, if an acknowledgment (ACK) for a given segment is not received
    in a certain amount of time a retransmission timer fires and the
    segment is resent [RFC2988,RFC4960]. Second, the "Fast
@@ -195,23 +195,23 @@
    state to be kept by the TCP sender. In both cases we describe
    SACK-based and non-SACK-based versions of the scheme (of course, the
    non-SACK version will not apply to SCTP). This document explicitly
    does not prefer one variant over the other, but leaves the choice to
    the implementer.

 2.1 Byte-based Early Retransmit

    A TCP or SCTP sender MAY use byte-based Early Retransmit.

-   A sender employing byte-based Early Retransmit MUST use the
-   following two conditions to determine when an Early Retransmit is
-   sent:
+   Upon the arrival of an ACK, a sender employing byte-based Early
+   Retransmit MUST use the following two conditions to determine when
+   an Early Retransmit is sent:

    (2.a) The amount of outstanding data (ownd)---data sent but not yet
          acknowledged---is less than 4*SMSS bytes.

          Note that in the byte-based variant of Early Retransmit 'ownd'
          is equivalent to 'FlightSize' defined in [RFC5681]. We use
          different notation because 'ownd' is not consistent with
          FlightSize through this document.

          Also note that in SCTP messages will have to be converted to
@@ -221,21 +221,21 @@
          sender or the advertised receive window does not permit new
          segments to be transmitted.

    When the above two conditions hold and a TCP connection does not
    support SACK the duplicate ACK threshold used to trigger a
    retransmission MUST be reduced to:

        ER_thresh = ceiling (ownd/SMSS) - 1                       (1)

    duplicate ACKs, where ownd is in terms of bytes. We call this
-   reduced ACK threshold enabling "Early Retransimission".
+   reduced ACK threshold enabling "Early Retransmission".

    When conditions (2.a) and (2.b) hold and a TCP connection does
    support SACK or SCTP is in use, Early Retransmit MUST be used only
    when "ownd - SMSS" bytes have been SACKed.

    When conditions (2.a) and (2.b) do not hold, the transport MUST NOT
    use Early Retransmit, but rather prefer the standard mechanisms,
    including Fast Retransmit and Limited Transmit.

    As noted above, the drawback of this byte-based variant is precision
@@ -255,41 +255,41 @@
    In this case ER_thresh will be two, per equation (1). Thus, even
    though there are enough segments outstanding to trigger Fast
    Retransmit with the standard duplicate ACK threshold Early
    Retransmit will be triggered. This could cause or exacerbate
    performance problems caused by segment reordering in the network.

 2.2 Segment-based Early Retransmit

    A TCP or SCTP sender MAY use segment-based Early Retransmit.

-   A sender employing segment-based Early Retransmit MUST use the
-   following two conditions to determine when an Early Retransmit is
-   sent:
+   Upon the arrival of an ACK, a sender employing segment-based Early
+   Retransmit MUST use the following two conditions to determine when
+   an Early Retransmit is sent:

    (3.a) The number of outstanding segments (oseg)---segments sent but
          not yet acknowledged---is less than four.

    (3.b) There is either no unsent data ready for transmission at the
          sender or the advertised receive window does not permit new
          segments to be transmitted.

    When the above two conditions hold and a TCP connection does not
    support SACK the duplicate ACK threshold used to trigger a
    retransmission MUST be reduced to:

        ER_thresh = oseg - 1                                      (2)

    duplicate ACKs, where oseg represents the number of outstanding
    segments. (We discuss tracking the number of outstanding segments
    below.) We call this reduced ACK threshold enabling "Early
-   Retransimission".
+   Retransmission".

    When conditions (3.a) and (3.b) hold and a TCP connection does
    support SACK or SCTP is in use, Early Retransmit MUST be used only
    when "oseg - 1" segments have been SACKed. A segment is considered
    to be SACKed when all its data bytes (TCP) or data chunks (SCTP)
    have been indicated as arrived by the receiver.

    When conditions (3.a) and (3.b) do not hold, the transport MUST NOT
    use Early Retransmit, but rather prefer the standard mechanisms,
    including Fast Retransmit and Limited Transmit.
@@ -300,21 +300,23 @@
    form an understanding as to how many actual segments have been
    transmitted, but not acknowledged. This can be done by the sender
    tracking the boundaries of the three segments on the right side of
    the current window (which involves tracking four sequence numbers in
    TCP). This could be done by keeping a circular list of the segment
    boundaries, for instance. Cumulative ACKs that do not fall within
    this region indicate that at least four segments are outstanding and
    therefore Early Retransmit MUST NOT be used. When the outstanding
    window becomes small enough that Early Retransmit can be invoked, a
    full understanding of the number of outstanding segments will be
-   available from the four sequence numbers retained.
+   available from the four sequence numbers retained. (Note: the
+   implicit sequence number consumed by the TCP FIN can also included
+   in the tracking of segment boundaries.)

 3 Discussion

    In this section we discuss a number of issues surrounding the Early
    Retransmit algorithm.

 3.1 SACK vs. non-SACK

    The SACK variant of the Early Retransmit algorithm is preferred to
    the non-SACK variant in TCP due to its robustness in the face of ACK
@@ -541,20 +544,24 @@
        Conservative Selective Acknowledgment (SACK)-based Loss Recovery
        Algorithm for TCP. RFC 3517, April 2003.

    [RFC3522] Reiner Ludwig, Michael Meyer. The Eifel Detection
        Algorithm for TCP. RFC 3522, April 2003.

    [RFC3782] Sally Floyd, Tom Henderson, Andrei Gurtov. The NewReno
        Modification to TCP's Fast Recovery Algorithm. RFC 3782, April
        2004.

+   [RFC4653] Sumitha Bhandarkar, A. L. Narasimha Reddy, Mark Allman,
+       Ethan Blanton. Improving the Robustness of TCP to
+       Non-Congestion Events, August 2006. RFC 4653.
+
 Author's Addresses:

    Mark Allman
    International Computer Science Institute
    1947 Center Street, Suite 600
    Berkeley, CA 94704-1198
    Phone: 440-235-1792
    mallman@icir.org
    http://www.icir.org/mallman/