--- 1/draft-ietf-tcpm-tcp-security-02.txt 2012-03-13 01:14:16.538671033 +0100 +++ 2/draft-ietf-tcpm-tcp-security-03.txt 2012-03-13 01:14:16.682671219 +0100 @@ -1,182 +1,183 @@ -TCP Maintenance and Minor F. Gont -Extensions (tcpm) UK CPNI -Internet-Draft January 21, 2011 -Intended status: BCP -Expires: July 25, 2011 +TCP Maintenance and Minor Extensions F. Gont +(tcpm) UK CPNI +Internet-Draft March 13, 2012 +Intended status: Informational +Expires: September 14, 2012 - Security Assessment of the Transmission Control Protocol (TCP) - draft-ietf-tcpm-tcp-security-02.txt + Survey of Security Hardening Methods for Transmission Control Protocol + (TCP) Implementations + draft-ietf-tcpm-tcp-security-03.txt Abstract - This document contains a security assessment of the specifications of - the Transmission Control Protocol (TCP), and of a number of - mechanisms and policies in use by popular TCP implementations. - Additionally, it contains best current practices for hardening a TCP - implementation. + This document surveys methods to harden Transmission Control Protocol + (TCP) implementations. It provides an overview of known attacks and + refers to the corresponding solutions in the TCP standards. Status of this Memo This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on July 25, 2011. + This Internet-Draft will expire on September 14, 2012. 
Copyright Notice - Copyright (c) 2011 IETF Trust and the persons identified as the + Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 - 1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . 5 - 1.2. Scope of this document . . . . . . . . . . . . . . . . . 6 - 1.3. Organization of this document . . . . . . . . . . . . . . 8 - 2. The Transmission Control Protocol . . . . . . . . . . . . . . 8 - 3. TCP header fields . . . . . . . . . . . . . . . . . . . . . . 9 - 3.1. Source Port and Destination Port . . . . . . . . . . . . 10 - 3.2. Sequence number . . . . . . . . . . . . . . . . . . . . . 12 - 3.3. Acknowledgement Number . . . . . . . . . . . . . . . . . 14 - 3.4. Data Offset . . . . . . . . . . . . . . . . . . . . . . . 15 - 3.5. Control bits . . . . . . . . . . . . . . . . . . . . . . 15 - 3.5.1. Reserved (four bits) . . . . . . . . . . . . . . . . 15 - 3.5.2. CWR (Congestion Window Reduced) . . . . . . . . . . . 16 - 3.5.3. ECE (ECN-Echo) . . . . . . . . . . . . . . . . . . . 16 - 3.5.4. URG . . . . . . . . . . . . . . . . . . . . . . . . . 17 - 3.5.5. ACK . . . . . . . . . . . . . . . . . . . . . . . . . 17 - 3.5.6. PSH . . . . . . . . . . . . . . . . . . . . . . . . . 17 - 3.5.7. RST . . . . . . . . . . . . . . . . . . . . . . . . . 19 - 3.5.8. SYN . . . . . . . . . . . . . . . . . . . . . . . 
. . 19 - 3.5.9. FIN . . . . . . . . . . . . . . . . . . . . . . . . . 20 - 3.6. Window . . . . . . . . . . . . . . . . . . . . . . . . . 20 - 3.7. Checksum . . . . . . . . . . . . . . . . . . . . . . . . 22 - 3.8. Urgent pointer . . . . . . . . . . . . . . . . . . . . . 23 - 3.9. Options . . . . . . . . . . . . . . . . . . . . . . . . . 24 - 3.10. Padding . . . . . . . . . . . . . . . . . . . . . . . . . 28 - 3.11. Data . . . . . . . . . . . . . . . . . . . . . . . . . . 28 - 4. Common TCP Options . . . . . . . . . . . . . . . . . . . . . 29 - 4.1. End of Option List (Kind = 0) . . . . . . . . . . . . . . 29 - 4.2. No Operation (Kind = 1) . . . . . . . . . . . . . . . . . 29 - 4.3. Maximum Segment Size (Kind = 2) . . . . . . . . . . . . . 29 - 4.4. Selective Acknowledgement Option . . . . . . . . . . . . 32 - 4.4.1. SACK-permitted Option (Kind = 4) . . . . . . . . . . 32 - 4.4.2. SACK Option (Kind = 5) . . . . . . . . . . . . . . . 33 - 4.5. MD5 Option (Kind=19) . . . . . . . . . . . . . . . . . . 35 - 4.6. Window scale option (Kind = 3) . . . . . . . . . . . . . 36 - 4.7. Timestamps option (Kind = 8) . . . . . . . . . . . . . . 37 - 4.7.1. Generation of timestamps . . . . . . . . . . . . . . 37 - 4.7.2. Vulnerabilities . . . . . . . . . . . . . . . . . . . 38 - 5. Connection-establishment mechanism . . . . . . . . . . . . . 39 - 5.1. SYN flood . . . . . . . . . . . . . . . . . . . . . . . . 40 - 5.2. Connection forgery . . . . . . . . . . . . . . . . . . . 44 - 5.3. Connection-flooding attack . . . . . . . . . . . . . . . 45 - 5.3.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 45 - 5.3.2. Countermeasures . . . . . . . . . . . . . . . . . . . 46 - 5.4. Firewall-bypassing techniques . . . . . . . . . . . . . . 48 - 6. Connection-termination mechanism . . . . . . . . . . . . . . 49 - 6.1. FIN-WAIT-2 flooding attack . . . . . . . . . . . . . . . 49 - 6.1.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 49 - 6.1.2. Countermeasures . . . . . . . . . . 
. . . . . . . . . 50 - 7. Buffer management . . . . . . . . . . . . . . . . . . . . . . 52 - 7.1. TCP retransmission buffer . . . . . . . . . . . . . . . . 52 - 7.1.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 52 - 7.1.2. Countermeasures . . . . . . . . . . . . . . . . . . . 53 - 7.2. TCP segment reassembly buffer . . . . . . . . . . . . . . 56 - 7.3. Automatic buffer tuning mechanisms . . . . . . . . . . . 59 - 7.3.1. Automatic send-buffer tuning mechanisms . . . . . . . 59 - 7.3.2. Automatic receive-buffer tuning mechanism . . . . . . 61 - 8. TCP segment reassembly algorithm . . . . . . . . . . . . . . 63 + 1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 5 + 1.2. Scope of this document . . . . . . . . . . . . . . . . . . 6 + 1.3. Organization of this document . . . . . . . . . . . . . . 7 + 2. The Transmission Control Protocol . . . . . . . . . . . . . . 7 + 3. TCP header fields . . . . . . . . . . . . . . . . . . . . . . 8 + 3.1. Source Port and Destination Port . . . . . . . . . . . . . 8 + 3.2. Sequence number . . . . . . . . . . . . . . . . . . . . . 9 + 3.3. Acknowledgement Number . . . . . . . . . . . . . . . . . . 10 + 3.4. Data Offset . . . . . . . . . . . . . . . . . . . . . . . 10 + 3.5. Control bits . . . . . . . . . . . . . . . . . . . . . . . 10 + 3.5.1. Reserved (four bits) . . . . . . . . . . . . . . . . . 10 + 3.5.2. CWR (Congestion Window Reduced) . . . . . . . . . . . 11 + 3.5.3. ECE (ECN-Echo) . . . . . . . . . . . . . . . . . . . . 11 + 3.5.4. URG . . . . . . . . . . . . . . . . . . . . . . . . . 11 + 3.5.5. ACK . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 3.5.6. PSH . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 3.5.7. RST . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 3.5.8. SYN . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 3.5.9. FIN . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 3.6. Window . . . . . . . . . . . . . . . . . . . . . . . . . . 13 + 3.6.1. 
Security implications arising from closed windows . . 14 + 3.7. Checksum . . . . . . . . . . . . . . . . . . . . . . . . . 14 + 3.8. Urgent pointer . . . . . . . . . . . . . . . . . . . . . . 16 + 3.9. Options . . . . . . . . . . . . . . . . . . . . . . . . . 16 + 3.10. Padding . . . . . . . . . . . . . . . . . . . . . . . . . 19 + 3.11. Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 + 4. Common TCP Options . . . . . . . . . . . . . . . . . . . . . . 19 + 4.1. End of Option List (Kind = 0) . . . . . . . . . . . . . . 19 + 4.2. No Operation (Kind = 1) . . . . . . . . . . . . . . . . . 19 + 4.3. Maximum Segment Size (Kind = 2) . . . . . . . . . . . . . 19 + 4.4. Selective Acknowledgement Option . . . . . . . . . . . . . 20 + 4.4.1. SACK-permitted Option (Kind = 4) . . . . . . . . . . . 20 + 4.4.2. SACK Option (Kind = 5) . . . . . . . . . . . . . . . . 20 + 4.5. MD5 Option (Kind=19) . . . . . . . . . . . . . . . . . . . 21 + 4.6. Window scale option (Kind = 3) . . . . . . . . . . . . . . 21 + 4.7. Timestamps option (Kind = 8) . . . . . . . . . . . . . . . 22 + 4.7.1. Generation of timestamps . . . . . . . . . . . . . . . 22 + 4.7.2. Vulnerabilities . . . . . . . . . . . . . . . . . . . 22 + 5. Connection-establishment mechanism . . . . . . . . . . . . . . 24 + 5.1. SYN flood . . . . . . . . . . . . . . . . . . . . . . . . 24 + 5.2. Connection forgery . . . . . . . . . . . . . . . . . . . . 28 + 5.3. Connection-flooding attack . . . . . . . . . . . . . . . . 29 + 5.3.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 29 + 5.3.2. Countermeasures . . . . . . . . . . . . . . . . . . . 30 + 5.4. Firewall-bypassing techniques . . . . . . . . . . . . . . 32 + + 6. Connection-termination mechanism . . . . . . . . . . . . . . . 32 + 6.1. FIN-WAIT-2 flooding attack . . . . . . . . . . . . . . . . 32 + 6.1.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 32 + 6.1.2. Countermeasures . . . . . . . . . . . . . . . . . . . 33 + 7. Buffer management . . . 
. . . . . . . . . . . . . . . . . . . 35 + 7.1. TCP retransmission buffer . . . . . . . . . . . . . . . . 36 + 7.1.1. Vulnerability . . . . . . . . . . . . . . . . . . . . 36 + 7.1.2. Countermeasures . . . . . . . . . . . . . . . . . . . 37 + 7.2. TCP segment reassembly buffer . . . . . . . . . . . . . . 40 + 7.3. Automatic buffer tuning mechanisms . . . . . . . . . . . . 42 + 7.3.1. Automatic send-buffer tuning mechanisms . . . . . . . 43 + 7.3.2. Automatic receive-buffer tuning mechanism . . . . . . 45 + 8. TCP segment reassembly algorithm . . . . . . . . . . . . . . . 47 8.1. Problems that arise from ambiguity in the reassembly - process . . . . . . . . . . . . . . . . . . . . . . . . . 63 - 9. TCP Congestion Control . . . . . . . . . . . . . . . . . . . 64 - 9.1. Congestion control with misbehaving receivers . . . . . . 66 - 9.1.1. ACK division . . . . . . . . . . . . . . . . . . . . 66 - 9.1.2. DupACK forgery . . . . . . . . . . . . . . . . . . . 66 - 9.1.3. Optimistic ACKing . . . . . . . . . . . . . . . . . . 67 - 9.2. Blind DupACK triggering attacks against TCP . . . . . . . 68 - 9.2.1. Blind throughput-reduction attack . . . . . . . . . . 70 - 9.2.2. Blind flooding attack . . . . . . . . . . . . . . . . 70 - 9.2.3. Difficulty in performing the attacks . . . . . . . . 71 - 9.2.4. Modifications to TCP's loss recovery algorithms . . . 72 - 9.2.5. Countermeasures . . . . . . . . . . . . . . . . . . . 74 - 9.3. TCP Explicit Congestion Notification (ECN) . . . . . . . 79 - 9.3.1. Possible attacks by a compromised router . . . . . . 79 - 9.3.2. Possible attacks by a malicious TCP endpoint . . . . 80 - 10. TCP API . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 - 10.1. Passive opens and binding sockets . . . . . . . . . . . . 81 - 10.2. Active opens and binding sockets . . . . . . . . . . . . 82 - 11. Blind in-window attacks . . . . . . . . . . . . . . . . . . . 84 - 11.1. Blind TCP-based connection-reset attacks . . . . . . . . 84 - 11.1.1. RST flag . . 
. . . . . . . . . . . . . . . . . . . . 85 - 11.1.2. SYN flag . . . . . . . . . . . . . . . . . . . . . . 86 - 11.1.3. Security/Compartment . . . . . . . . . . . . . . . . 88 - 11.1.4. Precedence . . . . . . . . . . . . . . . . . . . . . 89 - 11.1.5. Illegal options . . . . . . . . . . . . . . . . . . . 90 - 11.2. Blind data-injection attacks . . . . . . . . . . . . . . 90 - 12. Information leaking . . . . . . . . . . . . . . . . . . . . . 91 + process . . . . . . . . . . . . . . . . . . . . . . . . . 47 + 9. TCP Congestion Control . . . . . . . . . . . . . . . . . . . . 48 + 9.1. Congestion control with misbehaving receivers . . . . . . 48 + 9.1.1. ACK division . . . . . . . . . . . . . . . . . . . . . 48 + 9.1.2. DupACK forgery . . . . . . . . . . . . . . . . . . . . 49 + 9.1.3. Optimistic ACKing . . . . . . . . . . . . . . . . . . 49 + 9.2. Blind DupACK triggering attacks against TCP . . . . . . . 50 + 9.2.1. Blind throughput-reduction attack . . . . . . . . . . 52 + 9.2.2. Blind flooding attack . . . . . . . . . . . . . . . . 53 + 9.2.3. Difficulty in performing the attacks . . . . . . . . . 53 + 9.2.4. Modifications to TCP's loss recovery algorithms . . . 54 + 9.2.5. Countermeasures . . . . . . . . . . . . . . . . . . . 55 + 9.3. TCP Explicit Congestion Notification (ECN) . . . . . . . . 55 + 10. TCP API . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 + 10.1. Passive opens and binding sockets . . . . . . . . . . . . 56 + 10.2. Active opens and binding sockets . . . . . . . . . . . . . 57 + 11. Blind in-window attacks . . . . . . . . . . . . . . . . . . . 59 + 11.1. Blind TCP-based connection-reset attacks . . . . . . . . . 59 + 11.1.1. RST flag . . . . . . . . . . . . . . . . . . . . . . . 60 + 11.1.2. SYN flag . . . . . . . . . . . . . . . . . . . . . . . 60 + 11.1.3. Security/Compartment . . . . . . . . . . . . . . . . . 60 + 11.1.4. Precedence . . . . . . . . . . . . . . . . . . . . . . 61 + 11.1.5. Illegal options . . . . . . . . . . . . . . . . . 
. . 61 + 11.2. Blind data-injection attacks . . . . . . . . . . . . . . . 61 + 12. Information leaking . . . . . . . . . . . . . . . . . . . . . 62 12.1. Remote Operating System detection via TCP/IP stack - fingerprinting . . . . . . . . . . . . . . . . . . . . . 91 - 12.1.1. FIN probe . . . . . . . . . . . . . . . . . . . . . . 91 - 12.1.2. Bogus flag test . . . . . . . . . . . . . . . . . . . 92 - 12.1.3. TCP ISN sampling . . . . . . . . . . . . . . . . . . 92 - 12.1.4. TCP initial window . . . . . . . . . . . . . . . . . 92 - 12.1.5. RST sampling . . . . . . . . . . . . . . . . . . . . 93 - 12.1.6. TCP options . . . . . . . . . . . . . . . . . . . . . 94 - 12.1.7. Retransmission Timeout (RTO) sampling . . . . . . . . 94 - 12.2. System uptime detection . . . . . . . . . . . . . . . . . 94 - 13. Covert channels . . . . . . . . . . . . . . . . . . . . . . . 95 - 14. TCP Port scanning . . . . . . . . . . . . . . . . . . . . . . 95 - 14.1. Traditional connect() scan . . . . . . . . . . . . . . . 96 - 14.2. SYN scan . . . . . . . . . . . . . . . . . . . . . . . . 96 - 14.3. FIN, NULL, and XMAS scans . . . . . . . . . . . . . . . . 96 - 14.4. Maimon scan . . . . . . . . . . . . . . . . . . . . . . . 98 - 14.5. Window scan . . . . . . . . . . . . . . . . . . . . . . . 98 - 14.6. ACK scan . . . . . . . . . . . . . . . . . . . . . . . . 99 - 15. Processing of ICMP error messages by TCP . . . . . . . . . . 99 - 16. TCP interaction with the Internet Protocol (IP) . . . . . . . 99 - 16.1. TCP-based traceroute . . . . . . . . . . . . . . . . . . 99 - 16.2. Blind TCP data injection through fragmented IP traffic . 100 - 16.3. Broadcast and multicast IP addresses . . . . . . . . . . 102 - 17. Security Considerations . . . . . . . . . . . . . . . . . . . 102 - 18. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 102 - 19. References . . . . . . . . . . . . . . . . . . . . . . . . . 103 - 20. References . . . . . . . . . . . . . . . . . . . . . . . . . 113 - 20.1. 
Normative References . . . . . . . . . . . . . . . . . . 113 - 20.2. Informative References . . . . . . . . . . . . . . . . . 113 - Appendix A. TODO list . . . . . . . . . . . . . . . . . . . . . 113 + fingerprinting . . . . . . . . . . . . . . . . . . . . . . 62 + 12.1.1. FIN probe . . . . . . . . . . . . . . . . . . . . . . 63 + 12.1.2. Bogus flag test . . . . . . . . . . . . . . . . . . . 63 + 12.1.3. TCP ISN sampling . . . . . . . . . . . . . . . . . . . 63 + 12.1.4. TCP initial window . . . . . . . . . . . . . . . . . . 63 + 12.1.5. RST sampling . . . . . . . . . . . . . . . . . . . . . 64 + 12.1.6. TCP options . . . . . . . . . . . . . . . . . . . . . 65 + 12.1.7. Retransmission Timeout (RTO) sampling . . . . . . . . 65 + + 12.2. System uptime detection . . . . . . . . . . . . . . . . . 66 + 13. Covert channels . . . . . . . . . . . . . . . . . . . . . . . 66 + 14. TCP Port scanning . . . . . . . . . . . . . . . . . . . . . . 66 + 14.1. Traditional connect() scan . . . . . . . . . . . . . . . . 67 + 14.2. SYN scan . . . . . . . . . . . . . . . . . . . . . . . . . 67 + 14.3. FIN, NULL, and XMAS scans . . . . . . . . . . . . . . . . 68 + 14.4. Maimon scan . . . . . . . . . . . . . . . . . . . . . . . 69 + 14.5. Window scan . . . . . . . . . . . . . . . . . . . . . . . 69 + 14.6. ACK scan . . . . . . . . . . . . . . . . . . . . . . . . . 70 + 15. Processing of ICMP error messages by TCP . . . . . . . . . . . 70 + 16. TCP interaction with the Internet Protocol (IP) . . . . . . . 70 + 16.1. TCP-based traceroute . . . . . . . . . . . . . . . . . . . 71 + 16.2. Blind TCP data injection through fragmented IP traffic . . 71 + 16.3. Broadcast and multicast IP addresses . . . . . . . . . . . 73 + 17. Security Considerations . . . . . . . . . . . . . . . . . . . 73 + 18. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 73 + 19. References (to be translated to xml) . . . . . . . . . . . . . 74 + 20. References . . . . . . . . . . . . . . . . . . . . . . . . 
. . 84 + 20.1. Normative References . . . . . . . . . . . . . . . . . . . 84 + 20.2. Informative References . . . . . . . . . . . . . . . . . . 84 + Appendix A. TODO list . . . . . . . . . . . . . . . . . . . . . . 85 Appendix B. Change log (to be removed by the RFC Editor - before publication of this document as an RFC) . . . 113 - B.1. Changes from draft-ietf-tcpm-tcp-security-01 . . . . . . 113 - Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 114 + before publication of this document as an RFC) . . . 85 + B.1. Changes from draft-ietf-tcpm-tcp-security-02 . . . . . . . 85 + B.2. Changes from draft-ietf-tcpm-tcp-security-01 . . . . . . . 86 + Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 86 1. Preface 1.1. Introduction The TCP/IP protocol suite was conceived in an environment that was quite different from the hostile environment they currently operate in. However, the effectiveness of the protocols led to their early adoption in production environments, to the point that, to some extent, the current world's economy depends on them. @@ -219,58 +220,42 @@ interoperability [Silbersack, 2005]. Producing a secure TCP/IP implementation nowadays is a very difficult task, in part because of the lack of a single document that serves as a security roadmap for the protocols. Implementers are faced with the hard task of identifying relevant documentation and differentiating between that which provides correct advice, and that which provides misleading advice based on inaccurate or wrong assumptions. - There is a clear need for a companion document to the IETF - specifications that discusses the security aspects and implications - of the protocols, identifies the existing vulnerabilities, discusses - the possible countermeasures, and analyzes their respective - effectiveness. - This document is the result of a security assessment of the IETF specifications of the Transmission Control Protocol (TCP), from a security point of view. 
Possible threats are identified and, where - possible, countermeasures are proposed. Additionally, many + possible, countermeasures are described. Additionally, many implementation flaws that have led to security vulnerabilities have been referenced in the hope that future implementations will not incur the same problems. - This document does not aim to be the final word on the security - aspects of TCP. On the contrary, it aims to raise awareness about a - number of TCP vulnerabilities that have been faced in the past, those - that are currently being faced, and some of those that we may still - have to deal with in the future. - - Feedback from the community is more than encouraged to help this - document be as accurate as possible and to keep it updated as new - vulnerabilities are discovered. - - This document is heavily based on the "Security Assessment of the + This document is based on the "Security Assessment of the Transmission Control Protocol (TCP)" released by the UK Centre for the Protection of National Infrastructure (CPNI), available at: http://www.cpni.gov.uk/Products/technicalnotes/Feb-09-security-assessment-TCP.aspx . 1.2. Scope of this document While there are a number of protocols that may affect the way TCP operates, this document focuses only on the specifications of the Transmission Control Protocol (TCP) itself. - The following IETF RFCs were selected for assessment as part of this - work: + The mechanisms described in the following documents were selected for + assessment as part of this work: o RFC 793, "Transmission Control Protocol. DARPA Internet Program. Protocol Specification" (91 pages) o RFC 1122, "Requirements for Internet Hosts -- Communication Layers" (116 pages) o RFC 1191, "Path MTU Discovery" (19 pages) o RFC 1323, "TCP Extensions for High Performance" (37 pages) @@ -318,97 +303,41 @@ their security implications, and discusses the possible countermeasures.
The second part contains an analysis of the security implications of the mechanisms and policies implemented by TCP, and of a number of implementation strategies in use by a number of popular TCP implementations. 2. The Transmission Control Protocol The Transmission Control Protocol (TCP) is a connection-oriented transport protocol that provides a reliable byte-stream data transfer - service. - - Very few assumptions are made about the reliability of underlying - data transfer services below the TCP layer. Basically, TCP assumes - it can obtain a simple, potentially unreliable datagram service from - the lower level protocols. Figure 1 illustrates where TCP fits in - the DARPA reference model. - - +---------------+ - | Application | - +---------------+ - | TCP | - +---------------+ - | IP | - +---------------+ - | Network | - +---------------+ - - Figure 1: TCP in the DARPA reference model - - TCP provides facilities in the following areas: - - o Basic Data Transfer - - o Reliability - - o Flow Control - - o Multiplexing - - o Connections - o Precedence and Security - - o Congestion Control + service. Very few assumptions are made about the reliability of + underlying data transfer services below the TCP layer. Basically, + TCP assumes it can obtain a simple, potentially unreliable datagram + service from the lower level protocols. - The core TCP specification, RFC 793 [Postel, 1981c], dates back to - 1981 and standardizes the basic mechanisms and policies of TCP. RFC - 1122 [Braden, 1989] provides clarifications and errata for the - original specification. RFC 2581 [Allman et al, 1999] specifies TCP - congestion control and avoidance mechanisms, not present in the - original specification. Other documents specify extensions and - improvements for TCP. + The core TCP specification, RFC 793 [RFC0793], dates back to 1981 and + standardizes the basic mechanisms and policies of TCP. 
RFC 1122 + [RFC1122] provides clarifications and errata for the original + specification. RFC 5681 [RFC5681] specifies TCP congestion control + and avoidance mechanisms, not present in the original specification. + Other documents specify extensions and improvements for TCP. The large number of documents that specify extensions, improvements, or modifications to existing TCP mechanisms has led the IETF to publish a roadmap for TCP, RFC 4614 [Duke et al, 2006], that clarifies the relevance of each of those documents. 3. TCP header fields - RFC 793 [Postel, 1981c] defines the syntax of a TCP segment, along - with the semantics of each of the header fields. Figure 2 - illustrates the syntax of a TCP segment. - - 0 1 2 3 - 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Source Port | Destination Port | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Sequence Number | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Acknowledgment Number | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Data | |C|E|U|A|P|R|S|F| | - | Offset|Resrved|W|C|R|C|S|S|Y|I| Window | - | | |R|E|G|K|H|T|N|N| | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Checksum | Urgent Pointer | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | Options | Padding | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - | data | - +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - - Note that one tick mark represents one bit position - - Figure 2: Transmission Control Protocol header format + RFC 793 [RFC0793] defines the syntax of a TCP segment, along with the + semantics of each of the header fields. The minimum TCP header size is 20 bytes, and corresponds to a TCP segment with no options and no data.
However, a TCP module might be handed an (illegitimate) "TCP segment" of less than 20 bytes. Therefore, before doing any processing of the TCP header fields, the following check should be performed by TCP on the segments handed by the internet layer: Segment.Size >= 20 @@ -420,685 +349,371 @@ 3.1. Source Port and Destination Port The Source Port field contains a 16-bit number that identifies the TCP end-point that originated this TCP segment. The TCP Destination Port contains a 16-bit number that identifies the destination TCP end-point of this segment. In most of the discussion we refer to client-side (or "ephemeral") port numbers and server-side port numbers, since that distinction is what usually affects the interpretation of a port number. - TCP SHOULD randomize its ephemeral (client-side) ports, to improve - its resistance to off-path attacks. For the purpose of ephemeral - port selection, the largest possible port range SHOULD be used - (ideally 1024-65535) [I-D.ietf-tsvwg-port-randomization]. - - DISCUSSION: - - [I-D.ietf-tsvwg-port-randomization] provides advice on port - randomization. - - TCP MUST NOT allocate port number 0, as its use could lead to - interoperability problems. If a segment is received with port 0 as - the Source Port or the Destination Port, an RST segment SHOULD be sent - in response (provided that the incoming segment does not have the - RST flag set). - - DISCUSSION: - - While port 0 is a legitimate port number, it has a special meaning - in the UNIX Sockets API. For example, when a TCP port number of 0 - is passed as an argument to the bind() function, rather than - binding port 0, an ephemeral port is selected for the - corresponding TCP end-point. As a result, the TCP port number 0 - is never actually used in TCP segments. - - Different implementations have been found to respond differently - to TCP segments that have a port number of 0 as the Source Port - and/or the Destination Port. 
As a result, TCP segments with a - port number of 0 are usually employed for remote OS detection via - TCP/IP stack fingerprinting [Jones, 2003]. - - Since in practice TCP port 0 is not used by any legitimate - application and is only used for fingerprinting purposes, a number - of host implementations already reject TCP segments that use 0 as - the Source Port and/or the Destination Port. Also, a number of - firewalls filter (by default) any TCP segments that contain a port - number of zero for the Source Port and/or the Destination Port. - - We therefore recommend that TCP implementations respond to - incoming TCP segments that have a Source Port or a Destination - Port of 0 with an RST (provided these incoming segments do not - have the RST bit set). - - Responding with an RST segment to incoming segments that have the - RST bit set would open the door to RST-war attacks. - - TCP MUST be able to gracefully handle the case where the source end- - point (IP Source Address, TCP Source Port) is the same as the - destination end-point (IP Destination Address, TCP Destination Port). - - DISCUSSION: - - Some systems have been found to be unable to process TCP segments - in which the source endpoint {Source Address, Source Port} is the - same as the destination end-point {Destination Address, - Destination Port}. Such TCP segments have been reported to cause - malfunction of a number of implementations [CERT, 1996], and have - been exploited in the past to perform Denial of Service (DoS) - attacks [Meltman, 1997]. While these packets are very - unlikely to exist in real and legitimate scenarios, TCP should - nevertheless be able to process them without the need of any - "extra" code. - - A SYN segment in which the source end-point {Source Address, - Source Port} is the same as the destination end-point {Destination - Address, Destination Port} will result in a "simultaneous open" - scenario, such as the one described on page 32 of RFC 793 [Postel, - 1981c]. 
Therefore, those TCP implementations that correctly - handle simultaneous opens should already be prepared to handle - these unusual TCP segments. - - TCP SHOULD NOT allocate port numbers that are in use by a TCP that - is in the LISTEN or CLOSED states for use as ephemeral ports, as this - could allow attackers on the local system to "steal" incoming TCP - connections. - - DISCUSSION: + Most active attacks against ongoing TCP connections require the + attacker to guess or know the four-tuple that identifies the + connection. As a result, randomization of the TCP ephemeral ports + provides a (partial) mitigation against off-path attacks. [RFC6056] + provides guidance in this area. - While the only requirement for a selected ephemeral port is that - the resulting four-tuple (connection-id) is unique (i.e., not - currently in use by any other TCP connection), in practice it may - be necessary to not allow the allocation of port numbers that are - in use by a TCP that is in the LISTEN or CLOSED states for use as - ephemeral ports, as this might allow an attacker to "steal" - incoming connections from a local server application. Therefore, - TCP SHOULD NOT allocate port numbers that are in use by a TCP in - the LISTEN or CLOSED states for use as ephemeral ports. Section - 10.2 of this document provides a detailed discussion of this - issue. + Some implementations have been known to crash when they receive a + TCP segment in which the source end-point (IP Source Address, TCP + Source Port) is the same as the destination end-point (IP Destination + Address, TCP Destination Port). [draft-gont-tcpm-tcp-mirrored-endpoints-00.txt] + describes this issue in detail and provides advice in this area. While some systems restrict use of the port numbers in the range - 0-1024 to privileged users, applications SHOULD NOT grant any trust + 0-1024 to privileged users, applications should not grant any trust based on the port numbers used for a TCP connection. 
- DISCUSSION: - Not all systems require superuser privileges to bind port numbers in that range. Besides, with desktop computers such "distinction" has generally become irrelevant. - Middle-boxes such as packet filters MUST NOT assume that clients use + Middle-boxes such as packet filters must not assume that clients use port numbers from only the Dynamic or Registered port ranges. - DISCUSSION: - It should also be noted that some clients, such as DNS resolvers, are known to use port numbers from the "Well Known Ports" range. Therefore, middle-boxes such as packet filters MUST NOT assume that clients use port numbers from only the Dynamic or Registered port ranges. 3.2. Sequence number - TCP SHOULD select its Initial Sequence Numbers (ISNs) with the - following expression: - - ISN = M + F(localhost, localport, remotehost, remoteport, secret_key) - - where M is a monotonically increasing counter maintained within TCP, - and F() is a Pseudo-Random Function (PRF). As it is vital that F() - not be computable from the outside, F() could be a PRF of the - connection-id and some secret data. HMAC-SHA-256 would be a good - choice for F(). - - DISCUSSION: - - The choice of the Initial Sequence Number of a connection is not - arbitrary, but aims to minimize the chances of a stale segment - being accepted by a new incarnation of a previous connection. - RFC 793 [Postel, 1981c] suggests the use of a global 32-bit ISN - generator, whose lower bit is incremented roughly every 4 - microseconds. - - However, use of such an ISN generator makes it trivial to predict - the ISN that a TCP will use for new connections, thus allowing a - variety of attacks against TCP, such as those described in Section - 5.2 and Section 11 of this document. This vulnerability was first - described in [Morris, 1985], and its exploitation was widely - publicized about 10 years later [Shimomura, 1995]. 
- - As a matter of fact, protection against old stale segments from a - previous incarnation of the connection comes from allowing the - creation of a new incarnation of a previous connection only after - 2*MSL have passed since a segment corresponding to the old - incarnation was last seen. This is accomplished by the TIME-WAIT - state, and TCP's "quiet time" concept. However, as discussed in - Section 3.1 and Section 11.1.2 of this document, the ISN can be - used to perform some heuristics meant to avoid an interoperability - problem that may arise when two systems establish connections at a - high rate. In order for such heuristics to work, the ISNs - generated by a TCP should be monotonically increasing. - - The ISN generation scheme recommended in this section was - originally proposed in RFC 1948 [Bellovin, 1996], such that the - chances of an attacker guessing the ISN of a TCP are reduced, - while still producing a monotonically-increasing sequence that - allows implementation of the optimization described in Section 3.1 - and Section 11.1.2 of this document. + Predictable sequence numbers allow a variety of attacks against TCP, + such as those described in Section 5.2 and Section 11 of this + document. This vulnerability was first described in [Morris1985], + and its exploitation was widely publicized about 10 years later + [Shimomura1995]. - [CERT, 2001] and [US-CERT, 2001] are advisories about the security - implications of weak ISN generators. [Zalewski, 2001a] and - [Zalewski, 2002] contain a detailed analysis of ISN generators, - and a survey of the algorithms in use by popular TCP - implementations. + In order to mitigate these vulnerabilities, some implementations set + the TCP ISN to the output of a PRNG. However, this has been known to cause + interoperability problems. [RFC6528] provides advice in this area.
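The expression given in the earlier revision, ISN = M + F(localhost, localport, remotehost, remoteport, secret_key), can be sketched as follows. This is only an illustration under stated assumptions, not code from any TCP implementation: the function names, the "|" separator used to encode the connection-id, the truncation of the HMAC-SHA-256 output to its first 32 bits, and the 4-microsecond tick derived from a monotonic clock are all choices made for this example.

```python
import hashlib
import hmac
import struct
import time

MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit values


def isn_offset(local_ip, local_port, remote_ip, remote_port, secret_key):
    """F(): keyed PRF of the connection-id, truncated to 32 bits (assumed encoding)."""
    conn_id = f"{local_ip}|{local_port}|{remote_ip}|{remote_port}".encode()
    digest = hmac.new(secret_key, conn_id, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def clock_counter(now=None):
    """M: counter that advances roughly once every 4 microseconds (assumed clock source)."""
    if now is None:
        now = time.monotonic()
    return int(now * 250_000) % MODULUS


def generate_isn(local_ip, local_port, remote_ip, remote_port, secret_key, now=None):
    """ISN = M + F(connection-id, secret_key), reduced modulo 2**32."""
    m = clock_counter(now)
    f = isn_offset(local_ip, local_port, remote_ip, remote_port, secret_key)
    return (m + f) % MODULUS
```

For a fixed four-tuple the offset F() is constant, so successive ISNs for that connection-id still increase with M, while an off-path attacker who does not know secret_key cannot derive the offset used for a different four-tuple.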
- Another security consideration that should be made about TCP - sequence numbers is that they might allow an attacker to count the - number of systems behind a Network Address Translator (NAT) - [Srisuresh and Egevang, 2001]. Depending on the ISN generators - implemented by each of the systems behind the NAT, an attacker - might be able to count the number of systems behind the NAT by - establishing a number of TCP connections (using the public address - of the NAT) and identifying the number of different sequence - number "spaces". This information leakage could be eliminated by - rewriting the contents of all those header fields and options that - make use of sequence numbers (such as the Sequence Number and the - Acknowledgement Number fields, and the SACK Option) at the NAT. - [Gont and Srisuresh, 2008] provides a detailed discussion of the - security implications of NATs and of the possible mitigations for - this and other issues. + Another security consideration that should be made about TCP sequence + numbers is that they might allow an attacker to count the number of + systems behind a Network Address Translator (NAT) [Srisuresh and + Egevang, 2001]. Depending on the ISN generators implemented by each + of the systems behind the NAT, an attacker might be able to count the + number of systems behind the NAT by establishing a number of TCP + connections (using the public address of the NAT) and identifying + the number of different sequence number "spaces". [Gont and + Srisuresh, 2008] provides a detailed discussion of the security + implications of NATs and of the possible mitigations for this and + other issues. 3.3. Acknowledgement Number - TCP SHOULD set the Acknowledgement Number to zero when sending a TCP - segment that does not have the ACK bit set (i.e., a SYN segment).
- - TCP MUST check that, on segments that have the ACK bit set, the - Acknowledgment Number satisfies the expression: - - SND.UNA - SND.MAX.WND <= SEG.ACK <= SND.NXT - - If a TCP segment does not pass this check, the segment MUST be - dropped, and an ACK segment SHOULD be sent in response. - - DISCUSSION: - - If the ACK bit is on, the Acknowledgement Number contains the - value of the next sequence number the sender of this segment is - expecting to receive. According to RFC 793, the Acknowledgement - Number is considered valid as long as it does not acknowledge the - receipt of data that has not yet been sent. + If the ACK bit is on, the Acknowledgement Number contains the value + of the next sequence number the sender of this segment is expecting + to receive. According to RFC 793, the Acknowledgement Number is + considered valid as long as it does not acknowledge the receipt of + data that has not yet been sent. However, as a result of recent concerns on forgery attacks against - TCP (see Section 11 of this document), ongoing work at the IETF - [Ramaiah et al, 2008] has proposed to enforce a more strict check - on the Acknowledgement Number of segments that have the ACK bit - set: - - SND.UNA - SND.MAX.WND <= SEG.ACK <= SND.NXT + TCP (see Section 11 of this document) [RFC5961] has proposed to + enforce a more strict check on the Acknowledgement Number of segments + that have the ACK bit set. See for more details. - If the ACK bit is off, the Acknowledgement Number field is not - valid. We recommend TCP implementations to set the - Acknowledgement Number to zero when sending a TCP segment that - does not have the ACK bit set (i.e., a SYN segment). Some TCP - implementations have been known to fail to set the Acknowledgement - Number to zero, thus leaking information. + If the ACK bit is off, the Acknowledgement Number field is not valid. 
+ We recommend TCP implementations to set the Acknowledgement Number to + zero when sending a TCP segment that does not have the ACK bit set + (i.e., a SYN segment). Some TCP implementations have been known to + fail to set the Acknowledgement Number to zero, thus leaking + information. TCP Acknowledgements are also used to perform heuristics for loss recovery and congestion control. Section 9 of this document describes a number of ways in which these mechanisms can be exploited. 3.4. Data Offset - TCP MUST enforce the following checks on the Data Offset field: - - Data Offset >= 5 - - Data Offset * 4 <= TCP segment length - - If a TCP segment does not pass these checks, it should be silently - dropped. - - The TCP segment length should be obtained from the IP layer, as - TCP does not include a TCP segment length field. - - DISCUSSION: - - The Data Offset field indicates the length of the TCP header in - 32-bit words. As the minimum TCP header size is 20 bytes, the - minimum legal value for this field is 5. - - For obvious reasons, the TCP header cannot be larger than the - whole TCP segment it is part of. + [draft-gont-tcpm-tcp-sanity-checks-00.txt] specifies a number of + sanity checks that should be performed on the Data Offset field. 3.5. Control bits The following subsections provide a discussion of the different control bits in the TCP header. TCP segments with unusual combinations of flags set have been known in the past to cause malfunction of some implementations, sometimes to the extent of - causing them to crash [Postel, 1987] [Braden, 1992]. These packets - are still usually employed for the purpose of TCP/IP stack - fingerprinting. Section 12.1 contains a discussion of TCP/IP stack - fingerprinting. + causing them to crash [RFC1025] [RFC1379]. These packets are still + usually employed for the purpose of TCP/IP stack fingerprinting. + Section 12.1 contains a discussion of TCP/IP stack fingerprinting. 3.5.1. 
Reserved (four bits) - TCP MUST ignore the Reserved field of incoming TCP segments. - - DISCUSSION: - These four bits are reserved for future use, and must be zero. As with virtually every field, the Reserved field could be used as a covert channel. While there exist intermediate devices such as protocol scrubbers that clear these bits, and firewalls that drop/ reject segments with any of these bits set, these devices should - consider the impact of these policies on TCP interoperability. - For example, as TCP continues to evolve, all or part of the bits - in the Reserved field could be used to implement some new - functionality. If some middle-box or end-system implementation - were to drop a TCP segment merely because some of these bits are - not set to zero, interoperability problems would arise. + consider the impact of these policies on TCP interoperability. For + example, as TCP continues to evolve, all or part of the bits in the + Reserved field could be used to implement some new functionality. If + some middle-box or end-system implementation were to drop a TCP + segment merely because some of these bits are not set to zero, + interoperability problems would arise. 3.5.2. CWR (Congestion Window Reduced) - DISCUSSION: - - The CWR flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is - used as part of the Explicit Congestion Notification (ECN) - mechanism. For connections in any of the synchronized states, - this flag indicates, when set, that the TCP sending this segment - has reduced its congestion window. + The CWR flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is used + as part of the Explicit Congestion Notification (ECN) mechanism. For + connections in any of the synchronized states, this flag indicates, + when set, that the TCP sending this segment has reduced its + congestion window. An analysis of the security implications of ECN can be found in Section 9.3 of this document. 3.5.3. 
ECE (ECN-Echo) - DISCUSSION: - - The ECE flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is - used as part of the Explicit Congestion Notification (ECN) - mechanism. - - Once a TCP connection has been established, an ACK segment with - the ECE bit set indicates that congestion was encountered in the - network on the path from the sender to the receiver. This - indication of congestion should be treated just as a congestion - loss in non-ECN-capable TCP [Ramakrishnan et al, 2001]. - Additionally, TCP should not increase the congestion window (cwnd) - in response to such an ACK segment that indicates congestion, and - should also not react to congestion indications more than once - every window of data (or once per round-trip time). + The ECE flag, defined in RFC 3168 [Ramakrishnan et al, 2001], is used + as part of the Explicit Congestion Notification (ECN) mechanism. An analysis of the security implications of ECN can be found in Section 9.3 of this document. 3.5.4. URG - DISCUSSION: - When the URG flag is set, the Urgent Pointer field contains the current value of the urgent pointer. Receipt of an "urgent" indication generates, in a number of implementations (such as those in UNIX-like systems), a software interrupt (signal) that is delivered to the corresponding process. + In UNIX-like systems, receipt of an urgent indication causes a SIGURG + signal to be delivered to the corresponding process. - In UNIX-like systems, receipt of an urgent indication causes a - SIGURG signal to be delivered to the corresponding process. - - A number of applications handle TCP urgent indications by - installing a signal handler for the corresponding signal (e.g., - SIGURG). As discussed in [Zalewski, 2001b], some signal handlers - can be maliciously exploited by an attacker, for example to gain - remote access to a system. 
While secure programming of signal - handlers is out of the scope of this document, we nevertheless - raise awareness that TCP urgent indications might be exploited to - abuse poorly-written signal handlers. + A number of applications handle TCP urgent indications by installing + a signal handler for the corresponding signal (e.g., SIGURG). As + discussed in [Zalewski, 2001b], some signal handlers can be + maliciously exploited by an attacker, for example to gain remote + access to a system. While secure programming of signal handlers is + out of the scope of this document, we nevertheless raise awareness + that TCP urgent indications might be exploited to abuse poorly- + written signal handlers. Section 3.9 discusses the security implications of the TCP urgent mechanism. 3.5.5. ACK - DISCUSSION: - - When the ACK bit is one, the Acknowledgment Number field contains - the next sequence number expected, cumulatively acknowledging the - receipt of all data up to the sequence number in the - Acknowledgement Number, minus one. Section 3.4 of this document - describes sanity checks that should be performed on the - Acknowledgement Number field. + When the ACK bit is one, the Acknowledgment Number field contains the + next sequence number expected, cumulatively acknowledging the receipt + of all data up to the sequence number in the Acknowledgement Number, + minus one. Section 3.4 of this document describes sanity checks that + should be performed on the Acknowledgement Number field. TCP Acknowledgements are also used to perform heuristics for loss recovery and congestion control. Section 9 of this document describes a number of ways in which these mechanisms can be exploited. 3.5.6. PSH - As a result of a SEND call, TCP SHOULD send all queued data (provided - that TCP's flow control and congestion control algorithms allow it). 
- - Received data SHOULD be immediately delivered to an application - calling the RECEIVE function, even if the data already available are - less than those requested by the application. - - DISCUSSION: - - RFC 793 [Postel, 1981c] contains (in pages 54-64) a functional - description of a TCP Application Programming Interface (API). One - of the parameters of the SEND function is the PUSH flag which, - when set, signals the local TCP that it must send all unsent data. - The TCP PSH (PUSH) flag will be set in the last outgoing segment, - to signal the push function to the receiving TCP. Upon receipt of - a segment with the PSH flag set, the receiving user's buffer is - returned to the user, without waiting for additional data to - arrive. - - There are two security considerations arising from the PUSH - function. On the sending side, an attacker could cause a large - amount of data to be queued for transmission without setting the - PUSH flag in the SEND call. This would prevent the local TCP from - sending the queued data, causing system memory to be tied to those - data for an unnecessarily long period of time. - - An analogous consideration should be made for the receiving TCP. - TCP is allowed to buffer incoming data until the receiving user's - buffer fills or a segment with the PSH bit set is received. If - the receiving TCP implements this policy, an attacker could send a - large amount of data, slightly less than the receiving user's - buffer size, to cause system memory to be tied to these data for - an unnecessarily long period of time. Both of these issues are - discussed in Section 4.2.2.2 of RFC 1122 [Braden, 1989]. - - In order to mitigate these potential vulnerabilities, we suggest - assuming an implicit "PUSH" in every SEND call. On the sending - side, this means that as a result of a SEND call TCP should try to - send all queued data (provided that TCP's flow control and - congestion control algorithms allow it). 
On the receiving side, - this means that the received data will be immediately delivered to - an application calling the RECEIVE function, even if the data - already available are less than those requested by the - application. - - It is interesting to note that popular TCP APIs (such as - "sockets") do not provide a PUSH flag in any of the interfaces - they define, but rather perform some kind of "heuristics" to set - the PSH bit in outgoing segments. As a result, the value of the - PSH bit in the received TCP segments is usually a policy of the - sending TCP, rather than a policy of the sending application. All - robust applications that make use of those APIs (such as the - sockets API) properly handle the case of a RECEIVE call returning - less data (e.g., zero) than requested, usually by performing - subsequent RECEIVE calls. - - Another potential malicious use of the PSH bit would be for an - attacker to send small TCP segments (probably with zero bytes of - data payload) to cause the receiving application to be - unnecessarily woken up (increasing the CPU load), or to cause - malfunction of poorly-written applications that may not handle - well the case of RECEIVE calls returning less data than requested. + [draft-gont-tcpm-tcp-push-semantics-00.txt] describes a number of + security issues that may arise as a result of the PUSH semantics, and + proposes a number of ways to mitigate these issues. 3.5.7. RST - TCP MUST process RST segments (i.e., segments with the RST bit set) - as follows: - - o If the Sequence Number of the RST segment is not valid (i.e., - falls outside of the receive window), silently drop the segment. - - o If the Sequence Number of the RST segment matches the next - expected sequence number (RCV.NXT), abort the corresponding - connection. - - o If the Sequence Number is valid (i.e., falls within the receive - window) but is not exactly RCV.NXT, send an ACK segment (a - "challenge ACK") of the form: . 
- TCP SHOULD rate-limit these challenge ACK segments. - - DISCUSSION: - - The RST bit is used to request the abortion (abnormal close) of a - TCP connection. RFC 793 [Postel, 1981c] suggests that an RST - segment should be considered valid if its Sequence Number is valid - (i.e., falls within the receive window). However, in response to - the security concerns raised by [Watson, 2004] and [NISCC, 2004], - [Ramaiah et al, 2008] proposed the aforementioned stricter - validity checks. + The RST bit is used to request the abortion (abnormal close) of a TCP + connection. RFC 793 [RFC0793] suggests that an RST segment should be + considered valid if its Sequence Number is valid (i.e., falls within + the receive window). However, in response to the security concerns + raised by [Watson, 2004] and [NISCC, 2004], [RFC5961] proposed + stricter validity checks. Please see [RFC5961] for additional + details. Section 11.1 of this document describes TCP-based connection-reset attacks, along with a number of countermeasures to mitigate their impact. 3.5.8. SYN - DISCUSSION: - The SYN bit is used during the connection-establishment phase, to request the synchronization of sequence numbers. - There are basically four different vulnerabilities that make use - of the SYN bit: SYN-flooding attacks, connection forgery attacks, - connection flooding attacks, and connection-reset attacks. They - are described in Section 5.1, Section 5.2, Section 5.3, and - Section 11.1.2, respectively, along with the possible - countermeasures. + There are basically four different vulnerabilities that make use of + the SYN bit: SYN-flooding attacks, connection forgery attacks, + connection flooding attacks, and connection-reset attacks. They are + described in Section 5.1, Section 5.2, Section 5.3, and Section + 11.1.2, respectively, along with the possible countermeasures. 3.5.9. FIN - DISCUSSION: - The FIN flag is used to signal the remote end-point the end of the data transfer in this direction.
Receipt of a valid FIN segment - (i.e., a TCP segment with the FIN flag set) causes the transition - in the connection state, as part of what is usually referred to as - the "connection termination phase". + (i.e., a TCP segment with the FIN flag set) causes the transition in + the connection state, as part of what is usually referred to as the + "connection termination phase". - The connection-termination phase can be exploited to perform a - number of resource-exhaustion attacks. Section 6 of this document - describes a number of attacks that exploit the connection- - termination phase along with the possible countermeasures. + The connection-termination phase can be exploited to perform a number + of resource-exhaustion attacks. Section 6 of this document describes + a number of attacks that exploit the connection-termination phase + along with the possible countermeasures. 3.6. Window - DISCUSSION: - The TCP Window field advertises how many bytes of data the remote peer is allowed to send before a new advertisement is made. - Theoretically, the maximum transfer rate that can be achieved by - TCP is limited to: + Theoretically, the maximum transfer rate that can be achieved by TCP + is limited to: Maximum Transfer Rate = Window / RTT This means that, under ideal network conditions (e.g., no packet loss), the TCP Window in use should be at least: Window = 2 * Bandwidth * Delay - Using a larger Window than that resulting from the previous - equation will not provide any improvements in terms of - performance. + Using a larger Window than that resulting from the previous equation + will not provide any improvements in terms of performance. In practice, selection of the most convenient Window size may also depend on a number of other parameters, such as: packet loss rate, loss recovery mechanisms in use, etc. - Security implications of the maximum TCP window size - An aspect of the TCP Window that is usually overlooked is the security implications of its size. 
Increasing the TCP window - increases the sequence number space that will be considered - "valid" for incoming segments. Thus, use of unnecessarily large - TCP Window sizes increases TCP's vulnerability to forgery attacks - unnecessarily. + increases the sequence number space that will be considered "valid" + for incoming segments. Thus, use of unnecessarily large TCP Window + sizes increases TCP's vulnerability to forgery attacks unnecessarily. - In those scenarios in which the network conditions are known - and/or can be easily predicted, it is recommended that the TCP - Window is never set to a value larger than that resulting from the - equations above. Additionally, the nature of the application - running on top of TCP should be considered when tuning the TCP - window. As an example, an H.245 signaling application certainly - does not have high requirements on throughput, and thus a window - size of around 4 KBytes will usually fulfill its needs, while - keeping TCP's resistance to off-path forgery attacks at a decent - level. Some rough measurements seem to indicate that a TCP window - of 4Kbytes is common practice for TCP connections servicing - applications such as BGP. + In those scenarios in which the network conditions are known and/or + can be easily predicted, it is recommended that the TCP Window is + never set to a value larger than that resulting from the equations + above. Additionally, the nature of the application running on top of + TCP should be considered when tuning the TCP window. As an example, + an H.245 signaling application certainly does not have high + requirements on throughput, and thus a window size of around 4 KBytes + will usually fulfill its needs, while keeping TCP's resistance to + off-path forgery attacks at a decent level. Some rough measurements + seem to indicate that a TCP window of 4Kbytes is common practice for + TCP connections servicing applications such as BGP. 
- In principle, a possible approach to avoid requiring - administrators to manually set the TCP window would be to - implement an automatic buffer tuning mechanism, such as that - described in [Heffner, 2002]. However, as discussed in Section - 7.3.2 of this document these mechanisms can be exploited to - perform other types of attacks. + In principle, a possible approach to avoid requiring administrators + to manually set the TCP window would be to implement an automatic + buffer tuning mechanism, such as that described in [Heffner, 2002]. + However, as discussed in Section 7.3.2 of this document these + mechanisms can be exploited to perform other types of attacks. - Security implications arising from closed windows +3.6.1. Security implications arising from closed windows - The TCP window is a flow-control mechanism that prevents a fast - data sender application from overwhelming a "slow" receiver. When - a TCP end-point is not willing to receive any more data (before + When a TCP end-point is not willing to receive any more data (before some of the data that have already been received are consumed), it will advertise a TCP window of zero bytes. This will effectively stop the sender from sending any new data to the TCP receiver. - Transmission of new data will resume when the TCP receiver - advertises a nonzero TCP window, usually with a TCP segment that - contains no data ("an ACK"). + Transmission of new data will resume when the TCP receiver advertises + a nonzero TCP window, usually with a TCP segment that contains no + data ("an ACK"). This segment is usually referred to as a "window update", as the only purpose of this segment is to update the server regarding the new window. - To accommodate those scenarios in which the ACK segment that - "opens" the window is lost, TCP implements a "persist timer" that - causes the TCP sender to query the TCP receiver periodically if - the last segment received advertised a window of zero bytes. 
This - probe simply consists of sending one byte of new data that will - force the TCP receiver to send an ACK segment back to the TCP - sender, containing the current TCP window. Similarly to the - retransmission timeout timer, an exponential back-off is used when - calculating the retransmission timer, so that the spacing between - probes increases exponentially. + To accommodate those scenarios in which the ACK segment that "opens" + the window is lost, TCP implements a "persist timer" that causes the + TCP sender to query the TCP receiver periodically if the last segment + received advertised a window of zero bytes. This probe simply + consists of sending one byte of new data that will force the TCP + receiver to send an ACK segment back to the TCP sender, containing + the current TCP window. Similarly to the retransmission timeout + timer, an exponential back-off is used when calculating the + retransmission timer, so that the spacing between probes increases + exponentially. A fundamental difference between the "persist timer" and the - retransmission timer is that there is no limit on the amount of - time during which a TCP can advertise a zero window. This means - that a TCP end-point could potentially advertise a zero window - forever, thus keeping kernel memory at the TCP sender tied to the - TCP retransmission buffer. This could clearly be exploited as a - vector for performing a Denial of Service (DoS) attack against - TCP, such as that described in Section 7.1 of this document. + retransmission timer is that there is no limit on the amount of time + during which a TCP can advertise a zero window. This means that a + TCP end-point could potentially advertise a zero window forever, thus + keeping kernel memory at the TCP sender tied to the TCP + retransmission buffer. This could clearly be exploited as a vector + for performing a Denial of Service (DoS) attack against TCP, such as + that described in Section 7.1 of this document. 
Section 7.1 of this document describes a Denial of Service attack that aims at exhausting the kernel memory used for the TCP retransmission buffer, along with possible countermeasures. 3.7. Checksum - Middleboxes that process TCP segments MUST validate the Checksum - field, and silently discard the TCP segment if such validation fails. - - DISCUSSION: - - The Checksum field is an error detection mechanism meant for the - contents of the TCP segment and a number of important fields of - the IP header. It is computed over the full TCP header pre-pended - with a pseudo header that includes the IP Source Address, the IP - Destination Address, the Protocol number, and the TCP segment - length. While in principle there should not be security - implications arising from this field, due to non-RFC-compliant - implementations, the Checksum can be exploited to detect - firewalls, evade network intrusion detection systems (NIDS), - and/or perform Denial of Service attacks. + While in principle there should not be security implications arising + from the Checksum field, due to non-RFC-compliant implementations, + the Checksum can be exploited to detect firewalls, evade network + intrusion detection systems (NIDS), and/or perform Denial of Service + attacks. If a stateful firewall does not check the TCP Checksum in the segments it processes, an attacker can exploit this situation to - perform a variety of attacks. For example, he could send a flood - of TCP segments with invalid checksums, which would nevertheless - create state information at the firewall. When each of these - segments is received at its intended destination, the TCP checksum - will be found to be incorrect, and the corresponding segment will be - silently discarded. As these segments will not elicit a response - (e.g., an RST segment) from the intended recipients, the - corresponding connection state entries at the firewall will not be - removed.
Therefore, an attacker may end up tying all the state - resources of the firewall to TCP connections that will never - complete or be terminated, probably leading to a Denial of Service - to legitimate users, or forcing the firewall to randomly drop - connection state entries. + perform a variety of attacks. For example, he could send a flood of + TCP segments with invalid checksums, which would nevertheless create + state information at the firewall. When each of these segments is + received at its intended destination, the TCP checksum will be found + to be incorrect, and the corresponding segment will be silently discarded. + As these segments will not elicit a response (e.g., an RST segment) + from the intended recipients, the corresponding connection state + entries at the firewall will not be removed. Therefore, an attacker + may end up tying all the state resources of the firewall to TCP + connections that will never complete or be terminated, probably + leading to a Denial of Service to legitimate users, or forcing the + firewall to randomly drop connection state entries. If a NIDS does not check the Checksum of TCP segments, an attacker - may send TCP segments with an invalid checksum to cause the NIDS - to obtain a TCP data stream different from that obtained by the - system being monitored. In order to "confuse" the NIDS, the - attacker would send TCP segments with an invalid Checksum and a - Sequence Number that would overlap the sequence number space being - used for his malicious activity. FTester [Barisani, 2006] is a - tool that can be used to assess NIDS on this issue. + may send TCP segments with an invalid checksum to cause the NIDS to + obtain a TCP data stream different from that obtained by the system + being monitored. In order to "confuse" the NIDS, the attacker would + send TCP segments with an invalid Checksum and a Sequence Number that + would overlap the sequence number space being used for his malicious + activity.
FTester [Barisani, 2006] is a tool that can be used to + assess NIDS on this issue. Finally, an attacker performing port-scanning could potentially exploit intermediate systems that do not check the TCP Checksum to - detect whether a given TCP port is being filtered by an - intermediate firewall, or the port is actually closed by the host - being port-scanned. If a given TCP port appeared to be closed, - the attacker would then send a SYN segment with an invalid - Checksum. If this segment elicited a response (either an ICMP - error message or a TCP RST segment) to this packet, then that - response should come from a system that does not check the TCP - checksum. Since normal host implementations of the TCP protocol - do check the TCP checksum, such a response would most likely come - from a firewall or some other middle-box. + detect whether a given TCP port is being filtered by an intermediate + firewall, or the port is actually closed by the host being port- + scanned. If a given TCP port appeared to be closed, the attacker + would then send a SYN segment with an invalid Checksum. If this + segment elicited a response (either an ICMP error message or a TCP + RST segment) to this packet, then that response should come from a + system that does not check the TCP checksum. Since normal host + implementations of the TCP protocol do check the TCP checksum, such a + response would most likely come from a firewall or some other middle- + box. [Ed3f, 2002] describes the exploitation of the TCP checksum for performing the above activities. [US-CERT, 2005d] provides an example of a TCP implementation that failed to check the TCP checksum. 3.8. Urgent pointer - Segment.Size - Data Offset * 4 > 0 - - If a TCP segment with the URG bit set does not pass this check, it - MUST be silently dropped. - - For TCP segments that have the URG bit set to zero, the sending TCP - SHOULD set the Urgent Pointer to zero.
- - A receiving TCP MUST ignore the Urgent Pointer field of TCP segments - for which the URG bit is zero. - - DISCUSSION: - - Section 3.7 of RFC 793 [Postel, 1981c] states (in page 42) that to - send an urgent indication the user must also send at least one - byte of data. - - If the URG bit is zero, the Urgent Pointer is not valid, and thus - should not be processed by the receiving TCP. Nevertheless, we - recommend TCP implementations to set the Urgent Pointer to zero - when sending a TCP segment that does not have the URG bit set, and - to ignore the Urgent Pointer (as required by RFC 793) when the URG - bit is zero. - - Some stacks have been known to fail to set the Urgent Pointer to - zero when the URG bit is zero, thus leaking out the corresponding - system memory contents. [Zalewski, 2008] provides further details - about this issue. - Some implementations have been found to be unable to process TCP - urgent indications correctly. [Myst, 1997] originally described - how TCP urgent indications could be exploited to perform a Denial - of Service (DoS) attack against some TCP/IP implementations, - usually leading to a system crash. + urgent indications correctly. [Myst, 1997] originally described how + TCP urgent indications could be exploited to perform a Denial of + Service (DoS) attack against some TCP/IP implementations, usually + leading to a system crash. + + [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes a number of + sanity checks to be enforced on TCP segments regarding urgent + indications. [RFC6093] deprecates the use of urgent indications in + new applications. 3.9. Options [IANA, 2007] contains the official list of the assigned option numbers. TCP Options have been specified in the past both within the IETF and by other groups. [Hnes, 2007] contains an un-official updated version of the IANA list of assigned option numbers. The following table contains a summary of the assigned TCP option numbers, which is based on [Hnes, 2007]. 
@@ -1197,548 +812,188 @@ o Case 2: An option-kind byte, followed by an option-length byte, and the actual option-data bytes. In options of the Case 2 above, the option-length byte counts the option-kind byte and the option-length byte, as well as the actual option-data bytes. All options except "End of Option List" (Kind = 0) and "No Operation" (Kind = 1), are of "Case 2". - For options that belong to the "Case 2" described above, the - following checks MUST be performed: - - option-length >= 2 - - option-offset + option-length <= Data Offset * 4 - - Where option-offset is the offset of the first byte of the option - within the TCP header, with the first byte of the TCP header being - assigned an offset of 0. - - If a TCP segment fails to pass any of these checks, it SHOULD be - silently dropped. - - TCP MUST ignore unknown TCP options, provided they pass the - validation checks specified above. In the same way, middle-boxes - such as packet filters SHOULD NOT reject TCP segments containing - "unknown" TCP options that pass the validation checks described - earlier in this Section. - - DISCUSSION: - - The value "2" in the first equation accounts for the option-kind - byte and the option-length byte, and assumes zero bytes of option- - data. This check prevents, among other things, loops in option - processing that may arise from incorrect option lengths. - - The second equation takes into account the limit on the legitimate - option length imposed by the syntax of the TCP header, and is - meant to detect forged option-length values that might make an - option overlap with the TCP payload, or even go past the actual - end of the TCP segment carrying the option. - - Middle-boxes such as packet filters should not reject TCP segments - containing unknown options solely because these options have not been - present in the SYN/SYN-ACK handshake. 
- - DISCUSSION: - - There is renewed interest in defining new TCP options for purposes - like improved connection management and maintenance, advanced - congestion control schemes, and security features. The evolution - of the TCP/IP protocol suite would be severely impacted by - obstacles to deploying such new protocol mechanisms. - - Middle-boxes such as packet filters SHOULD NOT reject TCP segments - containing unknown options solely because these options have not been - present in the SYN/SYN-ACK handshake. - - DISCUSSION: - - In the past, TCP enhancements based on TCP options regularly have - specified the exchange of a specific "enabling" option during the - initial SYN/SYN-ACK handshake. Due to the severely limited TCP - option space which has already become a concern, it should be - expected that future specifications might introduce new options - not negotiated or enabled in this way. Therefore, middle-boxes - such as packet filters should not reject TCP segments containing - unknown options solely because these options have not been present - in the SYN/SYN-ACK handshake. - - TCP MUST NOT "echo" in any way unknown TCP options received in - inbound TCP segments. - - DISCUSSION: - - Some TCP implementations have been known to "echo" unknown TCP - options received in incoming segments. Here we stress that TCP - must not "echo" in any way unknown TCP options received in inbound - TCP segments. This is at the foundation for the introduction of - new TCP options, ensuring unambiguous behavior of systems not - supporting a new specification. + [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes a number of + sanity checks that should be performed on TCP options. Section 4 discusses the security implications of common TCP options. 3.10. Padding The TCP header padding is used to ensure that the TCP header ends and data begins on a 32-bit boundary. The padding is composed of zeros. 3.11. 
Data The data field contains the upper-layer packet being transmitted by means of TCP. This payload is processed by the application process making use of the transport services of TCP. Therefore, the security implications of this field are out of the scope of this document. 4. Common TCP Options 4.1. End of Option List (Kind = 0) - TCP implementations MUST be able to gracefully handle those TCP - segments in which the End of Option List should have been present, - but is missing. - - DISCUSSION: - - This option is used to indicate the "end of options" in those - cases in which the end of options would not coincide with the end - of the TCP header. - - TCP implementations are required to ignore those options they do - not implement, and to be able to handle options with illegal - lengths. Therefore, TCP implementations should be able to - gracefully handle those TCP segments in which the End of Option - List should have been present, but is missing. - - It is interesting to note that some TCP implementations do not use - the "End of Option List" option for indicating the "end of - options", but simply pad the TCP header with several "No - Operation" (Kind = 1) options to meet the header length specified - by the Data Offset header field. + This option indicates the "End of Options". As noted in + [draft-gont-tcpm-tcp-sanity-checks-00.txt], some implementations pad + the end of options with "No Operation" options rather than including + an "End of Options List" option. 4.2. No Operation (Kind = 1) The no-operation option is basically used to allow the sending system to align subsequent options in, for example, 32-bit boundaries. This option does not have any known security implications. 4.3. Maximum Segment Size (Kind = 2) The Maximum Segment Size (MSS) option is used to indicate to the remote TCP endpoint the maximum segment size this TCP is willing to receive. 
- The following check MUST be performed on a TCP segment that carries a - MSS option: - - SYN == 1 - - If the segment does not pass this check, it MUST be silently dropped. - - DISCUSSION: - - As stated in Section 3.1 of RFC 793 [Postel, 1981c], this option - can only be sent in the initial connection request (i.e., in - segments with the SYN control bit set). - - TCP MUST check that the option length is 4. If the option does not - pass this check, it MUST be dropped. - - The received MSS SHOULD be sanitized as follows: - - Sanitized_MSS = max(MSS, 536) - - This "sanitized" MSS value SHOULD be used to compute the "effective - send MSS" by the expression included in Section 4.2.2.6 of RFC 1122 - [Braden, 1989], as follows: - - Eff.snd.MSS = min(Sanitized_MSS+20, MMS_S) - TCPhdrsize - IPoptionsize - - where: - - Sanitized_MSS: - sanitized MSS value (the value received in the MSS option, with an - enforced minimum value) - - MMS_S: - maximum size for a transport-layer message that TCP may send - - TCPhdrsize: - size of the TCP header, which typically was 20, but may be larger - if TCP options are to be sent. - - IPoptionsize - size of any IP options that TCP will pass to the IP layer with the - current message. - - DISCUSSION: - - The advertised maximum segment size may be the result of the - consideration of a number of factors. Firstly, if fragmentation - is employed, the size of the IP reassembly buffer may impose a - limit on the maximum TCP segment size that can be received. - Considering that the minimum IP reassembly buffer size is 576 - bytes, if an MSS option is not present included in the connection- - establishment phase, an MSS of 536 bytes should be assumed. 
- Secondly, if Path-MTU Discovery (specified in RFC 1191 [Mogul and - Deering, 1990] and RFC 1981 [McCann et al, 1996]) is expected to - be used for the connection, an artificial maximum segment size may - be enforced by a TCP to prevent the remote peer from sending TCP - segments which would be too large to be transmitted without - fragmentation. Finally, a system connected by a low-speed link - may choose to introduce an artificial maximum segment size to - enforce an upper limit on the network latency that would otherwise - negatively affect its interactive applications [Stevens, 1994]. - - The TCP specifications do not impose any requirements on the - maximum segment size value that is included in the MSS option. - However, there are a number of values that may cause undesirable - results. Firstly, an MSS of 0 could possible "freeze" the TCP - connection, as it would not allow data to be included in the - payload of the TCP segments. Secondly, low values other than 0 - would degrade the performance of the TCP connection (wasting more - bandwidth in protocol headers than in actual data), and could - potentially exhaust processing cycles at the sending TCP and/or - the receiving TCP by producing an increase in the interrupt rate - caused by the transmitted (or received) packets. - - The problems that might arise from low MSS values were first - described by [Reed, 2001]. However, the community did not reach - consensus on how to deal with these issues at that point. - - RFC 791 [Postel, 1981a] requires IP implementations to be able to - receive IP datagrams of at least 576 bytes. Assuming an IPv4 - header of 20 bytes, and a TCP header of 20 bytes, there should be - room in each IP packet for 536 application data bytes. - - There are two cases to analyze when considering the possible - interoperability impact of sanitizing the received MSS value: TCP - connections relying on IP fragmentation and TCP connections - implementing Path-MTU Discovery. 
In case the corresponding TCP - connection relies on IP fragmentation, given that the minimum - reassembly buffer size is required to be 576 bytes by RFC 791 - [Postel, 1981a], the adoption of 536 bytes as a lower limit is - safe. - - In case the TCP connection relies on Path-MTU Discovery, imposing - a lower limit on the adopted MSS may ignore the advice of the - remote TCP on the maximum segment size that can possibly be - transmitted without fragmentation. As a result, this could lead - to the first TCP data segment to be larger than the Path-MTU. - However, in such a scenario, the TCP segment should elicit an ICMP - Unreachable "fragmentation needed and DF bit set" error message - that would cause the "effective send MSS" (E_MSS) to be decreased - appropriately. Thus, imposing a lower limit on the accepted MSS - will not cause any interoperability problems. - - A possible scenario exists in which the proposed enforcement of a - lower limit in the received MSS might lead to an interoperability - problem. If a system was attached to the network by means of a - link with an MTU of less than 576 bytes, and there was some - intermediate system which either silently dropped (i.e., without - sending an ICMP error message) those packets equal to or larger - than that 576 bytes, or some intermediate system simply filtered - ICMP "fragmentation needed and DF bit set" error messages, the - proposed behavior would not lead to an interoperability problem, - when communication could have otherwise succeeded. However, the - interoperability problem would really be introduced by the network - setup (e.g., the middle-box silently dropping packets), rather - than by the mechanism proposed in this section. In any case, TCP - should nevertheless implement a mechanism such as that specified - by RFC 4821 [Mathis and Heffner, 2007] to deal with this type of - "network black-holes". 
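The MSS sanitization and the "effective send MSS" expression quoted from Section 4.2.2.6 of RFC 1122 can be illustrated with a minimal sketch. The function and parameter names below are hypothetical and chosen only for this illustration; the default header sizes are assumptions (a 20-byte TCP header and no IP options).

```python
# Illustrative sketch only; names are hypothetical, not taken from any stack.
MIN_MSS = 536  # 576-byte minimum reassembly buffer - 20 (IPv4 hdr) - 20 (TCP hdr)


def sanitize_mss(received_mss: int) -> int:
    """Enforce a lower limit on the MSS advertised by the remote TCP."""
    return max(received_mss, MIN_MSS)


def effective_send_mss(received_mss: int, mms_s: int,
                       tcp_hdr_size: int = 20, ip_option_size: int = 0) -> int:
    """Eff.snd.MSS = min(Sanitized_MSS+20, MMS_S) - TCPhdrsize - IPoptionsize."""
    sanitized = sanitize_mss(received_mss)
    return min(sanitized + 20, mms_s) - tcp_hdr_size - ip_option_size
```

Under these assumptions, a peer advertising an MSS of 0 over a path with MMS_S = 1480 would still be sent 536-byte segments (effective_send_mss(0, 1480) evaluates to 536), rather than "freezing" the connection.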
+ The MSS option has been employed for performing DoS attacks, by + advertising very small MSS values, thus greatly increasing the + packet rate used by the victim system. + [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes this issue, and + proposes sanity checks to mitigate it. 4.4. Selective Acknowledgement Option The Selective Acknowledgement option provides an extension to allow the acknowledgement of individual segments, to enhance TCP's loss recovery. Two options are involved in the SACK mechanism. The "SACK-permitted option" is sent during the connection-establishment phase, to advertise that SACK is supported. If both TCP peers agree to use selective acknowledgements, the actual selective acknowledgements are sent, if needed, by means of "SACK options". 4.4.1. SACK-permitted Option (Kind = 4) - The SACK-permitted option is meant to advertise that the TCP sending - this segment supports Selective Acknowledgements. - - The following check MUST be performed on a TCP segment that carries a - SACK-permitted option: - - SYN == 1 - - If a segment does not pass this check, it MUST be silently dropped. - - DISCUSSION: - - The SACK-permitted option can be sent only in SYN segments. - - TCP MUST check that the option length is 2. If the option does not - pass this check it MUST be silently dropped. + [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes sanity checks to be + performed on this option. 4.4.2. SACK Option (Kind = 5) - The SACK option is used to convey extended acknowledgment information - from the receiver to the sender over an established TCP connection. - The option consists of an option-kind byte (which must be 5), an - option-length byte, and a variable number of SACK blocks.
- - TCP MUST silently discard those TCP segments carrying a SACK option - that does not pass the following check: - - option-offset + option-length <= Data Offset * 4 - - TCP MUST silently discard those TCP segments carrying a SACK option - that does not pass the following check: - - option-length >= 10 - - DISCUSSION: - - A SACK Option with zero SACK blocks is nonsensical. The value - "10" accounts for the option-kind byte, the option-length byte, a - 4-byte left-edge field, and a 4-byte right-edge field. - - TCP MUST silently discard those TCP segments carrying a SACK option - that does not pass the following check: - - (option-length - 2) % 8 == 0 - - DISCUSSION: - - As stated in Section 3 of RFC 2018 [Mathis et al, 1996], a SACK - option that specifies n blocks will have a length of 8*n+2. - - TCP MUST silently discard those TCP segments carrying a SACK option - that contains a SACK block that does not pass the following check: - - Left Edge of Block < Right Edge of Block - - As in all the other occurrences in this document, all comparisons - between sequence numbers should be performed using sequence number - arithmetic. - - DISCUSSION: - - Each block included in a SACK option represents a number of - received data bytes that are contiguous and isolated; that is, the - bytes just below the block, (Left Edge of Block - 1), and just - above the block, (Right Edge of Block), have not yet been - received. - - TCP MUST enforce a limit on the number of SACK blocks that a TCP will - store in memory for each connection at any time. - - DISCUSSION: - The TCP receiving a SACK option is expected to keep track of the - selectively-acknowledged blocks. Even when space in the TCP - header is limited (and thus each TCP segment can selectively- - acknowledge at most four blocks of data), an attacker could try to - perform a buffer overflow or a resource-exhaustion attack by - sending a large number of SACK options. + selectively-acknowledged blocks. 
Even when space in the TCP header + is limited (and thus each TCP segment can selectively-acknowledge at + most four blocks of data), an attacker could try to perform a buffer + overflow or a resource-exhaustion attack by sending a large number of + SACK options. - For example, an attacker could send a large number of SACK - options, each of them acknowledging one byte of data. - Additionally, for the purpose of wasting resources on the attacked - system, each of these blocks would be separated from each other by - one byte, to prevent the attacked system from coalescing two (or - more) contiguous SACK blocks into a single SACK block. If the - attacked system kept track of each SACKed block by storing both - the Left Edge and the Right Edge of the block, then for each - window of data, the attacker could waste up to 4 * Window bytes of - memory at the attacked TCP. + For example, an attacker could send a large number of SACK options, + each of them acknowledging one byte of data. Additionally, for the + purpose of wasting resources on the attacked system, each of these + blocks would be separated from each other by one byte, to prevent the + attacked system from coalescing two (or more) contiguous SACK blocks + into a single SACK block. If the attacked system kept track of each + SACKed block by storing both the Left Edge and the Right Edge of the + block, then for each window of data, the attacker could waste up to 4 + * Window bytes of memory at the attacked TCP. The value "4 * Window" results from the expression "(Window / 2) * 8", in which the value "2" accounts for the 1-byte block selectively-acknowledged by each SACK block plus the 1 byte that would be used to separate the SACK blocks from each other, and the value "8" accounts for the 8 bytes needed to store the Left Edge and the Right Edge of each SACKed block.
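The "(Window / 2) * 8" expression can be checked with a short worked example. This is only an arithmetic sketch: the function name and its parameters are hypothetical, and it assumes that each stored block costs exactly 8 bytes (two 4-byte edges), ignoring any per-entry bookkeeping overhead a real implementation would add.

```python
def sack_state_bytes(window: int, block_len: int = 1, gap_len: int = 1,
                     bytes_per_block: int = 8) -> int:
    """Upper bound on memory tied up by non-coalescable SACK blocks.

    Each SACKed block covers block_len bytes and is followed by a gap of
    gap_len bytes, so one window of data holds window // (block_len + gap_len)
    isolated blocks, each stored as a Left Edge and a Right Edge.
    """
    blocks_per_window = window // (block_len + gap_len)
    return blocks_per_window * bytes_per_block


# A 65536-byte window holds 32768 isolated 1-byte blocks, i.e. 4 * Window bytes.
print(sack_state_bytes(65536))
```

With a 65536-byte window the function returns 262144 bytes, which matches 4 * Window.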
- Therefore, it is clear that a limit should be imposed on the - number of SACK blocks that a TCP will store in memory for each - connection at any time. Measurements in [Dharmapurikar and - Paxson, 2005] indicate that in the vast majority of cases - connections have a single hole in the data stream at any given - time. Thus, a limit of 16 SACK blocks for each connection would - handle even most of the more unusual cases in which there is more - than one simultaneous hole at a time. + [draft-gont-tcpm-tcp-sanity-checks-00.txt] describes sanity checks to + be performed on this option such that this and other possible issues + are mitigated. 4.5. MD5 Option (Kind=19) The TCP MD5 option provides a mechanism for authenticating TCP segments with a 16-byte digest produced by the MD5 algorithm. The option consists of an option-kind byte (which must be 19), an option- length byte (which must be 18), and a 16-byte MD5 digest. - TCP MUST silently drop a TCP segment that carries a TCP MD5 option - that does not pass the following checks: - - option-offset + option-length <= Data Offset * 4 - - option-length == 18 - - DISCUSSION: - - The TCP MD5 option is of "Case 2", and has a fixed length. - - DISCUSSION: - A basic weakness of the TCP MD5 option is that the MD5 algorithm - itself has been known (for a long time) to be vulnerable to - collision search attacks. + itself has been known (for a long time) to be vulnerable to collision + search attacks. - [Bellovin, 2006] argues that it has two other weaknesses, namely - that it does not provide a key identifier, and that it has no - provision for automated key management. However, it is generally - accepted that while a Key-ID field can be a good approach for - providing smooth key rollover, it is not actually a requirement. - For instance, most systems implementing the TCP MD5 option include - a "keychain" mechanism that fully supports smooth key rollover.
- Additionally, with some further work, ISAKMP/IKE could be used to - configure the MD5 keys. + [Bellovin, 2006] argues that it has two other weaknesses, namely that + it does not provide a key identifier, and that it has no provision + for automated key management. However, it is generally accepted that + while a Key-ID field can be a good approach for providing smooth key + rollover, it is not actually a requirement. For instance, most + systems implementing the TCP MD5 option include a "keychain" + mechanism that fully supports smooth key rollover. Additionally, + with some further work, ISAKMP/IKE could be used to configure the MD5 + keys. - It is interesting to note that while the TCP MD5 option, as - specified by RFC 2385 [Heffernan, 1998], addresses the TCP-based - forgery attacks against TCP discussed in Section 11, it does not - address the ICMP-based connection-reset attacks discussed in - Section 15. As a result, while a TCP connection may be protected - from TCP-based forgery attacks by means of the MD5 option, an - attacker might still be able to successfully perform the ICMP- - based counter-part. + It is interesting to note that while the TCP MD5 option, as specified + by RFC 2385 [Heffernan, 1998], addresses the TCP-based forgery + attacks against TCP discussed in Section 11, it does not address the + ICMP-based connection-reset attacks discussed in Section 15. As a + result, while a TCP connection may be protected from TCP-based + forgery attacks by means of the MD5 option, an attacker might still + be able to successfully perform the ICMP-based counter-part. The TCP MD5 option has been obsoleted by the TCP-AO. 4.6. Window scale option (Kind = 3) The window scale option provides a mechanism to expand the definition of the TCP window to 32 bits, such that the performance of TCP can be improved in some network scenarios. 
The Window scale option consists of an option-kind byte (which must be 3), followed by an option- length byte (which must be 3), and a shift count (shift.cnt) byte (the actual option-data). - The option may be sent only in the initial SYN segment, but may also - be sent in a SYN/ACK segment if the option was received in the - initial SYN segment. If the option is received in any other segment, - it MUST be silently dropped. - - TCP MUST silently discard TCP segments that contain a Window scale - option whose option-length is not 3. - - DISCUSSION: - - This option has a fixed length. - - TCP MUST silently discard TCP segments that contain a Window scale - option that does not pass the following check: - - shift.cnt <= 14 - - DISCUSSION: - - As discussed in Section 2.3 of RFC 1323 [Jacobson et al, 1992], in - order to prevent new data from being mistakenly considered as old - and vice versa, the resulting window should be equal to or smaller - than 2^32. - - DISCUSSION: - - [Welzl, 2008] describes major problems with the use of the Window - scale option in the Internet due to faulty equipment. - While there are no known security implications arising from the window scale mechanism itself, the size of the TCP window has a number of security implications. In general, larger window sizes increase the chances of an attacker successfully performing - forgery attacks against TCP, such as those described in Section 11 - of this document. Additionally, large windows can exacerbate the - impact of resource exhaustion attacks such as those described in - Section 7 of this document. + forgery attacks against TCP, such as those described in Section 11 of + this document. Additionally, large windows can exacerbate the impact + of resource exhaustion attacks such as those described in Section 7 + of this document. Section 3.7 provides a general discussion of the security implications of the TCP window size.
Section 7.3.2 discusses the - security implications of Automatic receive-buffer tuning - mechanisms. + security implications of Automatic receive-buffer tuning mechanisms. 4.7. Timestamps option (Kind = 8) The Timestamps option, specified in RFC 1323 [Jacobson et al, 1992], is used to perform two functions: Round-Trip Time Measurement (RTTM), and Protection Against Wrapped Sequence Numbers (PAWS). - TCP MUST silently discard TCP segments that contain a Timestamps - option that does not pass the following check: - - option-length == 10 - - DISCUSSION: - - As specified by RFC 1323, the option-length must be 10. - 4.7.1. Generation of timestamps - TCP SHOULD generate timestamps with the following expression: - - timestamp = T() + F(localhost, localport, remotehost, remoteport, secret_key) - - where the result of T() is a global system clock that complies with - the requirements of Section 4.2.2 of RFC 1323 [Jacobson et al, 1992], - and F() is a function that should not be computable from the outside. - Therefore, we suggest F() to be a cryptographic hash function of the - connection-id and some secret data. - - DISCUSSION: - For the purpose of PAWS, the timestamps sent on a connection are required to be monotonically increasing. While there is no - requirement that timestamps are monotonically increasing across - TCP connections, the generation of timestamps such that they are + requirement that timestamps are monotonically increasing across TCP + connections, the generation of timestamps such that they are monotonically increasing across connections between the same two - endpoints allows the use of timestamps for improving the handling - of SYN segments that are received while the corresponding four- - tuple is in the TIME-WAIT state. This is discussed in Section - 11.1.2 of this document. 
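The expression "timestamp = T() + F(localhost, localport, remotehost, remoteport, secret_key)" might be sketched as follows. This is a hedged illustration only: the text merely suggests that F() be a cryptographic hash of the connection-id and some secret data, so the use of HMAC-SHA256, the truncation to 32 bits, and all function names here are assumptions made for the example. The value of T() is passed in as clock_ticks so that the sketch does not depend on any particular system clock.

```python
import hashlib
import hmac


def timestamp_offset(local_host: str, local_port: int,
                     remote_host: str, remote_port: int,
                     secret_key: bytes) -> int:
    """F(): per-four-tuple offset that cannot be computed without secret_key."""
    conn_id = f"{local_host}|{local_port}|{remote_host}|{remote_port}".encode()
    digest = hmac.new(secret_key, conn_id, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")  # truncate to 32 bits


def generate_timestamp(clock_ticks: int, local_host: str, local_port: int,
                       remote_host: str, remote_port: int,
                       secret_key: bytes) -> int:
    """timestamp = T() + F(...), reduced modulo 2^32 to fit the TSval field."""
    offset = timestamp_offset(local_host, local_port,
                              remote_host, remote_port, secret_key)
    return (clock_ticks + offset) & 0xFFFFFFFF
```

Because the offset is constant for a given four-tuple and key, successive timestamps for the same four-tuple advance exactly as T() does (modulo 2^32), while the absolute value does not reveal the underlying clock.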
- - F() provides an offset that will be the same for all incarnations - of a connection between the same two endpoints, while T() provides - the monotonically increasing values that are needed for PAWS. - - Further discussion about this algorithm is available in - [I-D.gont-timestamps-generation]. - - TCP SHOULD NOT initialize a global timestamp counter to a fixed value - when the system is bootstrapped. - - DISCUSSION: - - Some implementations are known to initialize their global - timestamp clock to zero when the system is bootstrapped. This is - undesirable, as the timestamp clock would disclose the system - uptime. - - TCP SHOULD set the Timestamp Echo Reply (TSecr) field to zero when - sending a TCP segment that does not have the ACK bit set (i.e., a SYN - segment). - - DISCUSSION: + endpoints allows the use of timestamps for improving the handling of + SYN segments that are received while the corresponding four-tuple is + in the TIME-WAIT state. This is discussed in Section 11.1.2 of this + document. - Some TCP implementations have been found to fail to set the - Timestamp Echo Reply field (TSecr) to zero in TCP segments that do - not have the ACK bit set, thus potentially leaking information. + Some implementations are known to initialize their global timestamp + clock to zero when the system is bootstrapped. This is undesirable, + as the timestamp clock would disclose the system uptime. + [I-D.gont-timestamps-generation] discusses the generation of TCP + timestamps in detail. 4.7.2. Vulnerabilities Blind In-Window Attacks Segments that contain a timestamp option smaller than the last timestamp option recorded by TCP are silently dropped. This allows for a subtle attack against TCP that would allow an attacker to cause one direction of data transfer of the attacked connection to freeze [US-CERT, 2005c]. An attacker could forge a TCP segment that @@ -1800,235 +1055,222 @@ proposes mitigations for this and other issues. 5. 
Connection-establishment mechanism The following subsections describe a number of attacks that can be performed against TCP by exploiting its connection-establishment mechanism. 5.1. SYN flood - TCP SHOULD implement (and enable by default) a syn-cache [Lemon, - 2002]. - - TCP SHOULD implement syn-cookies, and SHOULD enable them only after a - specified number of TCBs has been allocated for connections in the - SYN-RECEIVED state. - - DISCUSSION: - TCP uses a mechanism known as the "three-way handshake" for the establishment of a connection between two TCP peers. RFC 793 - [Postel, 1981c] states that when a TCP that is in the LISTEN state - receives a SYN segment (i.e., a TCP segment with the SYN flag - set), it must transition to the SYN-RECEIVED state, record the - control information (e.g., the ISN) contained in the SYN segment - in a Transmission Control Block (TCB), and respond with a SYN/ACK - segment. + [RFC0793] states that when a TCP that is in the LISTEN state receives + a SYN segment (i.e., a TCP segment with the SYN flag set), it must + transition to the SYN-RECEIVED state, record the control information + (e.g., the ISN) contained in the SYN segment in a Transmission + Control Block (TCB), and respond with a SYN/ACK segment. A Transmission Control Block is the data structure used to store (usually within the kernel) all the information relevant to a TCP connection. The concept of "TCB" is introduced in the core TCP - specification RFC 793 [Postel, 1981c]. + specification RFC 793 [RFC0793]. - In practice, virtually all existing implementations do not modify - the state of the TCP that was in the LISTEN state, but rather - create a new TCP (i.e., a new "protocol machine"), and perform all - the state transitions on this newly-created TCP. This allows the - application running on top of TCP to service to more than one - client at the same time. 
As a result, each connection request - results in the allocation of system memory to store the TCB - associated with the newly created TCP. + In practice, virtually all existing implementations do not modify the + state of the TCP that was in the LISTEN state, but rather create a + new TCP (i.e., a new "protocol machine"), and perform all the state + transitions on this newly-created TCP. This allows the application + running on top of TCP to service more than one client at the same + time. As a result, each connection request results in the allocation + of system memory to store the TCB associated with the newly created + TCP. If TCP was implemented strictly as described in RFC 793, the - application running on top of TCP would have to finish servicing - the current client before being able to service the next one in - line, or should instead be able to perform some kind of connection - hand-off. + application running on top of TCP would have to finish servicing the + current client before being able to service the next one in line, or + should instead be able to perform some kind of connection hand-off. - An attacker could exploit TCP's connection-establishment mechanism - to perform a Denial of Service (DoS) attack, by sending a large - number of connection requests to the target system, with the - intent of exhausting the system memory destined for storing TCBs - (or related kernel data structures), thus preventing the attacked - system from establishing new connections with legitimate users. - This attack is widely known as "SYN flood", and has received a lot - of attention during the late 90's [CERT, 1996].
+ An attacker could exploit TCP's connection-establishment mechanism to + perform a Denial of Service (DoS) attack, by sending a large number + of connection requests to the target system, with the intent of + exhausting the system memory destined for storing TCBs (or related + kernel data structures), thus preventing the attacked system from + establishing new connections with legitimate users. This attack is + widely known as "SYN flood", and has received a lot of attention + during the late 90's [CERT, 1996]. Given that the attacker does not need to complete the three-way handshake for the attacked system to tie system resources to the - newly created TCBs, he will typically forge the source IP address - of the malicious SYN segments he sends, thus concealing his own IP + newly created TCBs, he will typically forge the source IP address of + the malicious SYN segments he sends, thus concealing his own IP address. - If the forged IP addresses corresponded to some reachable system, - the impersonated system would receive the SYN/ACK segment sent by - the attacked host (in response to the forged SYN segment), which - would elicit an RST segment. This RST segment would be delivered - to the attacked system, causing the corresponding connection to be - aborted, and the corresponding TCB to be removed. + If the forged IP addresses corresponded to some reachable system, the + impersonated system would receive the SYN/ACK segment sent by the + attacked host (in response to the forged SYN segment), which would + elicit an RST segment. This RST segment would be delivered to the + attacked system, causing the corresponding connection to be aborted, + and the corresponding TCB to be removed. - As the impersonated host would not have any state information for - the TCP connection being referred to by the SYN/ACK segment, it - would respond with a RST segment, as specified by the TCP segment - processing rules of RFC 793 [Postel, 1981c]. 
+ As the impersonated host would not have any state information for the + TCP connection being referred to by the SYN/ACK segment, it would + respond with a RST segment, as specified by the TCP segment + processing rules of RFC 793 [RFC0793]. However, if the forged IP source addresses were unreachable, the attacked TCP would continue retransmitting the SYN/ACK segment corresponding to each connection request, until timing out and aborting the connection. For this reason, a number of widely available attack tools first check whether each of the (forged) IP - addresses are reachable by sending an ICMP echo request to them. - The receipt of an ICMP echo response is considered an indication - of the IP address being reachable (and thus results in the - corresponding IP address not being used for performing the - attack), while the receipt of an ICMP unreachable error message is - considered an indication of the IP address being unreachable (and - thus results in the corresponding IP address being used for - performing the attack). + addresses are reachable by sending an ICMP echo request to them. The + receipt of an ICMP echo response is considered an indication of the + IP address being reachable (and thus results in the corresponding IP + address not being used for performing the attack), while the receipt + of an ICMP unreachable error message is considered an indication of + the IP address being unreachable (and thus results in the + corresponding IP address being used for performing the attack). - [Gont, 2008b] describes how the so-called ICMP soft errors could - be used by TCP to abort connections in any of the non-synchronized + [Gont, 2008b] describes how the so-called ICMP soft errors could be + used by TCP to abort connections in any of the non-synchronized states. 
While implementation of the mechanism described in that document would certainly not eliminate the vulnerability of TCP to SYN flood attacks (as the attacker could use addresses that are simply "black-holed"), it provides an example of how signaling - information such as that provided by means of ICMP error messages - can provide valuable information that a transport protocol could - use to perform heuristics. + information such as that provided by means of ICMP error messages can + provide valuable information that a transport protocol could use to + perform heuristics. In order to mitigate the impact of this attack, the amount of - information stored for non-established connections should be - reduced (ideally, non-synchronized connections should not require - any state information to be maintained at the TCP performing the - passive OPEN). There are basically two mitigation techniques for - this vulnerability: a syn-cache and syn-cookies. + information stored for non-established connections should be reduced + (ideally, non-synchronized connections should not require any state + information to be maintained at the TCP performing the passive OPEN). + There are basically two mitigation techniques for this vulnerability: + a syn-cache and syn-cookies. - [Borman, 1997] and RFC 4987 [Eddy, 2007] contain a general - discussion of SYN-flooding attacks and common mitigation - approaches. + [Borman, 1997] and RFC 4987 [Eddy, 2007] contain a general discussion + of SYN-flooding attacks and common mitigation approaches. - The syn-cache [Lemon, 2002] approach aims at reducing the amount - of state information that is maintained for connections in the - SYN-RECEIVED state, and allocates a full TCB only after the - connection has transited to the ESTABLISHED state. 
+ The syn-cache [Lemon, 2002] approach aims at reducing the amount of + state information that is maintained for connections in the SYN- + RECEIVED state, and allocates a full TCB only after the connection + has transited to the ESTABLISHED state. The syn-cookie [Bernstein, 1996] approach aims at completely eliminating the need to maintain state information at the TCP performing the passive OPEN, by encoding the most elementary information required to complete the three-way handshake in the Sequence Number of the SYN/ACK segment that is sent in response to - the received SYN segment. Thus, TCP is relieved from keeping - state for connections in the SYN-RECEIVED state. + the received SYN segment. Thus, TCP is relieved from keeping state + for connections in the SYN-RECEIVED state. The syn-cookie approach has a number of drawbacks: - * Firstly, given the limited space in the Sequence Number field, - it is not possible to encode all the information included in - the initial segment, such as, for example, support of Selective + o Firstly, given the limited space in the Sequence Number field, it + is not possible to encode all the information included in the + initial segment, such as, for example, support of Selective Acknowledgements (SACK). - * Secondly, in the event that the Acknowledgement segment sent in - response to the SYN/ACK sent by the TCP that performed the - passive OPEN (i.e., the TCP server) were lost, the connection - would end up in the ESTABLISHED state on the client-side, but - in the CLOSED state on the server side. This scenario is - normally handled in TCP by having the TCP server retransmit its - SYN/ACK. However, if syn-cookies are enabled, there would be - no connection state information on the server side, and thus - the SYN/ACK would never be retransmitted. This could lead to a - scenario in which the connection could remain in the - ESTABLISHED state on the client side, but in the CLOSED state - at the server side, indefinitely. 
If the application protocol - was such that it required the client to wait for some data from - the server (e.g., a greeting message) before sending any data - to the server, a deadlock would take place, with the client - application waiting for such server data, and the server - waiting for the TCP three-way handshake to complete. + o Secondly, in the event that the Acknowledgement segment sent in + response to the SYN/ACK sent by the TCP that performed the passive + OPEN (i.e., the TCP server) were lost, the connection would end up + in the ESTABLISHED state on the client-side, but in the CLOSED + state on the server side. This scenario is normally handled in + TCP by having the TCP server retransmit its SYN/ACK. However, if + syn-cookies are enabled, there would be no connection state + information on the server side, and thus the SYN/ACK would never + be retransmitted. This could lead to a scenario in which the + connection could remain in the ESTABLISHED state on the client + side, but in the CLOSED state at the server side, indefinitely. + If the application protocol was such that it required the client + to wait for some data from the server (e.g., a greeting message) + before sending any data to the server, a deadlock would take + place, with the client application waiting for such server data, + and the server waiting for the TCP three-way handshake to + complete. - * Thirdly, unless the function used to encode information in the + o Thirdly, unless the function used to encode information in the SYN/ACK packet is cryptographically strong, an attacker could forge TCP connections in the ESTABLISHED state by forging ACK - segments that would be considered as "legitimate" by the - receiving TCP. + segments that would be considered as "legitimate" by the receiving + TCP. 
- * Fourthly, in those scenarios in which establishment of new + o Fourthly, in those scenarios in which establishment of new connections is blocked by simply dropping segments with the SYN - bit set, use of SYN cookies could allow an attacker to bypass - the firewall rules, as a connection could be established by - forging an ACK segment with the correct values, without the - need to set the SYN bit. + bit set, use of SYN cookies could allow an attacker to bypass the + firewall rules, as a connection could be established by forging an + ACK segment with the correct values, without the need to set + the SYN bit. - As a result, syn-cookies are usually not employed as a first line - of defense against SYN-flood attacks, but are used only as a last - resort to cope with them. For example, some TCP implementations - enable syn-cookies only after a certain number of TCBs has been - allocated for connections in the SYN-RECEIVED state. We recommend - this implementation technique, with a syn-cache enabled by - default, and use of syn-cookies triggered, for example, when the - limit of TCBs for non-synchronized connections with a given port - number has been reached. + As a result, syn-cookies are usually not employed as a first line of + defense against SYN-flood attacks, but are used only as a last resort to + cope with them. For example, some TCP implementations enable syn- + cookies only after a certain number of TCBs has been allocated for + connections in the SYN-RECEIVED state. We recommend this + implementation technique, with a syn-cache enabled by default, and + use of syn-cookies triggered, for example, when the limit of TCBs for + non-synchronized connections with a given port number has been + reached. - It is interesting to note that a SYN-flood attack should only - affect the establishment of new connections.
A number of books - and online documents seem to assume that TCP will not be able to - respond to any TCP segment that is meant for a TCP port that is - being SYN-flooded (e.g., respond with an RST segment upon receipt - of a TCP segment that refers to a non-existent TCP connection). - While SYN-flooding attacks have been successfully exploited in the - past for achieving such a goal [Shimomura, 1995], as clarified by - RFC 1948 [Bellovin, 1996] the effectiveness of SYN flood attacks - to silence a TCP implementation arose as a result of a bug in the - 4.4BSD TCP implementation [Wright and Stevens, 1994], rather than - from a theoretical property of SYN-flood attacks themselves. - Therefore, those TCP implementations that do not suffer from such - a bug should not be silenced as a result of a SYN-flood attack. + It is interesting to note that a SYN-flood attack should only affect + the establishment of new connections. A number of books and online + documents seem to assume that TCP will not be able to respond to any + TCP segment that is meant for a TCP port that is being SYN-flooded + (e.g., respond with an RST segment upon receipt of a TCP segment that + refers to a non-existent TCP connection). While SYN-flooding attacks + have been successfully exploited in the past for achieving such a + goal [Shimomura, 1995], as clarified by RFC 1948 [Bellovin, 1996] the + effectiveness of SYN flood attacks to silence a TCP implementation + arose as a result of a bug in the 4.4BSD TCP implementation [Wright + and Stevens, 1994], rather than from a theoretical property of SYN- + flood attacks themselves. Therefore, those TCP implementations that + do not suffer from such a bug should not be silenced as a result of a + SYN-flood attack. - [Zquete, 2002] describes a mechanism that could theoretically - improve the functionality of SYN cookies. It exploits the TCP - "simultaneous open" mechanism, as illustrated in Figure 5. 
+ [Zquete, 2002] describes a mechanism that could theoretically improve + the functionality of SYN cookies. It exploits the TCP "simultaneous + open" mechanism, as illustrated in Figure 5. See Figure 5, in page 46 of the UK CPNI document. Use of TCP simultaneous open for handling SYN floods In line 1, TCP A initiates the connection-establishment phase by - sending a SYN segment to TCP B. In line 2, TCP B creates a SYN - cookie as described by [Bernstein, 1996], but does not set the ACK - bit of the segment it sends (thus really sending a SYN segment, - rather than a SYN/ACK). This "fools" TCP A into thinking that - both SYN segments "have crossed each other in the network" as if a - "simultaneous open" scenario had taken place. As a result, in - line 3 TCP A sends a SYN/ACK segment containing the same options - that were contained in the original SYN segment. In line 4, upon - receipt of this segment, TCP processes the cookie encoded in the - ACK field as if it had been the result of a traditional SYN cookie - scenario, and moves the connection into the ESTABLISHED state. In - line 5, TCP B sends a SYN/ACK segment, which causes the connection - at TCP A to move into the ESTABLISHED state. In line 6, TCP A - sends a data segment on the connection. + sending a SYN segment to TCP B. In line 2, TCP B creates a SYN cookie + as described by [Bernstein, 1996], but does not set the ACK bit of + the segment it sends (thus really sending a SYN segment, rather than + a SYN/ACK). This "fools" TCP A into thinking that both SYN segments + "have crossed each other in the network" as if a "simultaneous open" + scenario had taken place. As a result, in line 3 TCP A sends a SYN/ + ACK segment containing the same options that were contained in the + original SYN segment. In line 4, upon receipt of this segment, TCP + processes the cookie encoded in the ACK field as if it had been the + result of a traditional SYN cookie scenario, and moves the connection + into the ESTABLISHED state. 
In line 5, TCP B sends a SYN/ACK + segment, which causes the connection at TCP A to move into the + ESTABLISHED state. In line 6, TCP A sends a data segment on the + connection. - While this mechanism would work in theory, unfortunately there are - a number of factors that prevent it from being usable in real - network environments: + While this mechanism would work in theory, unfortunately there are a + number of factors that prevent it from being usable in real network + environments: - * Some systems are not able to perform the "simultaneous open" + o Some systems are not able to perform the "simultaneous open" operation specified in RFC 793, and thus the connection establishment will fail. - * Some firewalls might prevent the establishment of TCP - connections that rely on the "simultaneous open" mechanism - (e.g., a given firewall might be allowing incoming SYN/ACK - segments, but not outgoing SYN/ACK segments). + o Some firewalls might prevent the establishment of TCP connections + that rely on the "simultaneous open" mechanism (e.g., a given + firewall might be allowing incoming SYN/ACK segments, but not + outgoing SYN/ACK segments). - Therefore, we do not recommend implementation of this mechanism - for mitigating SYN-flood attacks. + Therefore, we do not recommend implementation of this mechanism for + mitigating SYN-flood attacks. 5.2. Connection forgery The process of causing a TCP connection to be illegitimately established between two arbitrary remote peers is usually referred to as "connection spoofing" or "connection forgery". This can have a great negative impact when systems establish some sort of trust relationships based on the IP addresses used to establish a TCP connection [daemon9 et al, 1996]. @@ -2060,20 +1302,23 @@ recommended that systems disable IP Source Routing by default, or at the very least, they disable source routing for IP packets that encapsulate TCP segments. 
The IPv6 Routing Header Type 0, which provides a similar functionality to that provided by IPv4 source routing, has been officially deprecated by RFC 5095 [Abley et al, 2007]. 5.3. Connection-flooding attack + NOTE: THIS SECTION IS BEING EDITED. RFC2119-LANGUAGE IS BEING + REMOVED. + 5.3.1. Vulnerability The creation and maintenance of a TCP connection requires system memory to maintain shared state between the local and the remote TCP. As system memory is a finite resource, there is a limit on the number of TCP connections that a system can maintain at any time. When the TCP API is employed to create a TCP connection with a remote peer, it allocates system memory for maintaining shared state with the remote TCP peer, and thus the resulting connection would tie a similar amount of resources at the remote host as at the local host. @@ -2216,40 +1461,23 @@ Some firewalls can be configured to limit the number of simultaneous connections that any system can maintain with a specific system and/or service at any given time. Limiting the number of simultaneous connections that each system can establish with a specific system and service would effectively limit the possibility of an attacker that controls a single IP address to exhaust system resources at the attacker system/service. 5.4. Firewall-bypassing techniques - TCP MUST silently drop those TCP segments that have both the SYN and - the RST flags set. - - DISCUSSION: - - Some firewalls block incoming TCP connections by blocking only - incoming SYN segments. However, there are inconsistencies in how - different TCP implementations handle SYN segments that have - additional flags set, which may allow an attacker to bypass - firewall rules [US-CERT, 2003b]. - - For example, some firewalls have been known to mistakenly allow - incoming SYN segments if they also have the RST bit set. 
As some - TCP implementations will create a new connection in response to a - TCP segment with both the SYN and RST bits set, an attacker could - bypass the firewall rules and establish a connection with a - "protected" system by setting the RST bit in his SYN segments. - - Here we advise TCP implementations to silently drop those TCP - segments that have both the SYN and the RST flags set. + [draft-gont-tcpm-tcp-sanity-checks-00.txt] discusses how packets with + both the SYN and RST bits set have been employed in the wild to + bypass firewall rules, and provides advice in this area. 6. Connection-termination mechanism 6.1. FIN-WAIT-2 flooding attack 6.1.1. Vulnerability TCP implements a connection-termination mechanism that is employed for the graceful termination of a TCP connection. This mechanism usually consists of the exchange of four segments. Figure 6 @@ -2268,23 +1496,23 @@ As a result, an attacker could establish a large number of connections with the target system, and cause it to close each of them. For each connection, once the target system has sent its FIN segment, the attacker would acknowledge the receipt of this segment, but would send no further segments on that connection. As a result, an attacker could tie up the corresponding system resources (e.g., the system memory used for storing the TCB) without the need to send any further packets.
- While the CLOSE command described in RFC 793 [Postel, 1981c] simply - signals the remote TCP end-point that this TCP has finished sending - data (i.e., it closes only one direction of the data transfer), the + While the CLOSE command described in RFC 793 [RFC0793] simply signals + the remote TCP end-point that this TCP has finished sending data + (i.e., it closes only one direction of the data transfer), the close() system-call available in most operating systems has different semantics: it marks the corresponding file descriptor as closed (and thus it is no longer usable), and assigns the operating system the responsibility to deliver any queued data to the remote TCP peer and to terminate the TCP connection. This makes the FIN-WAIT-2 state particularly attractive for performing memory exhaustion attacks, as even if the application running on top of TCP were imposing limits on the maximum number of ongoing connections, and/or time limits on the function calls performed on TCP connections, that application would be unable to enforce these limits on the FIN-WAIT-2 state. @@ -2598,148 +1827,143 @@ window to cause the target system to tie system memory to the TCP retransmission buffer, it is hard to perform any useful statistics from the advertised window. While it is tempting to enforce a limit on the length of the persist state (see Section 3.7.2 of this document), an attacker could simply open the window (i.e., advertise a TCP window larger than zero) from time to time to prevent this enforced limit from causing his malicious connections to be aborted. 7.2. TCP segment reassembly buffer - TCP MAY discard out-of-order data when system-memory exhaustion is - imminent. - - DISCUSSION: - TCP buffers out-of-order segments to more efficiently handle the - occurrence of packet reordering and segment loss. 
When out-of- - order data are received, a "hole" momentarily exists in the data - stream which must be filled before the received data can be - delivered to the application making use of TCP's services. This - situation can be exploited by an attacker, which could - intentionally create a hole in the data stream by sending a number - of segments with a sequence number larger than the next sequence - number expected (RCV.NXT) by the attacked TCP. Thus, the attacked - TCP would tie system memory to buffer the out-of-order segments, - without being able to hand the received data to the corresponding - application. + occurrence of packet reordering and segment loss. When out-of-order + data are received, a "hole" momentarily exists in the data stream + which must be filled before the received data can be delivered to the + application making use of TCP's services. This situation can be + exploited by an attacker, which could intentionally create a hole in + the data stream by sending a number of segments with a sequence + number larger than the next sequence number expected (RCV.NXT) by the + attacked TCP. Thus, the attacked TCP would tie system memory to + buffer the out-of-order segments, without being able to hand the + received data to the corresponding application. If a large number of such connections were created, system memory could be exhausted, precluding the attacked TCP from servicing new connections and/or continue servicing TCP connections previously established. - Fortunately, these attacks can be easily mitigated, at the expense - of degrading the performance of possibly legitimate connections. - When out-of-order data is received, an Acknowledgement segment is - sent with the next sequence number expected (RCV.NXT). This means - that receipt of the out-of-order data will not be actually - acknowledged by the TCP's cumulative Acknowledgement Number. 
As a - result, a TCP is free to discard any data that have been received - out-of-order, without affecting the reliability of the data - transfer. Given the performance implications of discarding out- - of-order segments for legitimate connections, this pruning policy - should be applied only if memory exhaustion is imminent. + Fortunately, these attacks can be easily mitigated, at the expense of + degrading the performance of possibly legitimate connections. When + out-of-order data is received, an Acknowledgement segment is sent + with the next sequence number expected (RCV.NXT). This means that + receipt of the out-of-order data will not be actually acknowledged by + the TCP's cumulative Acknowledgement Number. As a result, a TCP is + free to discard any data that have been received out-of-order, + without affecting the reliability of the data transfer. Given the + performance implications of discarding out-of-order segments for + legitimate connections, this pruning policy should be applied only if + memory exhaustion is imminent. - As a result of discarding the out-of-order data, these data will - need to be unnecessarily retransmitted. Additionally, a loss - event will be detected by the sending TCP, and thus the slow start - phase of TCP's congestion control will be entered, thus reducing - the data transfer rate of the connection. + As a result of discarding the out-of-order data, these data will need + to be unnecessarily retransmitted. Additionally, a loss event will + be detected by the sending TCP, and thus the slow start phase of + TCP's congestion control will be entered, thus reducing the data + transfer rate of the connection. 
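Editorial illustration (not part of the draft): the pruning policy described above can be sketched as follows. This is a minimal sketch under stated assumptions: the names OooBuffer, prune_if_memory_low, pool_limit and high_watermark are hypothetical, sequence-number wrap-around is ignored, and "memory exhaustion is imminent" is approximated by a simple high-watermark on the bytes held in out-of-order buffers.

```python
class OooBuffer:
    """Out-of-order segments held for one connection (hypothetical structure)."""

    def __init__(self, rcv_nxt):
        self.rcv_nxt = rcv_nxt   # next in-order sequence number expected
        self.segments = {}       # seq -> payload, every seq is > rcv_nxt

    def store(self, seq, payload):
        # Data beyond RCV.NXT is never covered by the cumulative ACK,
        # so it can be dropped later without breaking reliability.
        if seq > self.rcv_nxt and seq not in self.segments:
            self.segments[seq] = payload

    def buffered_bytes(self):
        return sum(len(p) for p in self.segments.values())

    def prune(self):
        """Discard all out-of-order data; return the number of bytes freed."""
        freed = self.buffered_bytes()
        self.segments.clear()
        return freed


def prune_if_memory_low(connections, pool_limit, high_watermark=0.9):
    """Prune the largest consumers first until usage is back at or
    below high_watermark * pool_limit. Returns the bytes freed."""
    used = sum(c.buffered_bytes() for c in connections)
    threshold = high_watermark * pool_limit
    freed = 0
    for conn in sorted(connections, key=OooBuffer.buffered_bytes, reverse=True):
        if used - freed <= threshold:
            break
        freed += conn.prune()
    return freed
```

Nothing is discarded while usage stays below the watermark, which reflects the text's point that pruning degrades legitimate connections and should be a last resort.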
- It is interesting to note that this pruning policy could be - applied even if Selective Acknowledgements (SACK) (specified in - RFC 2018 [Mathis et al, 1996]) are in use, as SACK provides only - advisory information, and does not preclude the receiving TCP from - discarding data that have been previously selectively-acknowledged - by means of TCP's SACK option, but not acknowledged by TCP's - cumulative Acknowledgement Number. + It is interesting to note that this pruning policy could be applied + even if Selective Acknowledgements (SACK) (specified in RFC 2018 + [Mathis et al, 1996]) are in use, as SACK provides only advisory + information, and does not preclude the receiving TCP from discarding + data that have been previously selectively-acknowledged by means of + TCP's SACK option, but not acknowledged by TCP's cumulative + Acknowledgement Number. There are a number of ways in which the pruning policy could be - triggered. For example, when out of order data are received, a - timer could be set, and the sequence number of the out-of-order - data could be recorded. If the hole were filled before the timer - expires, the timer would be turned off. However, if the timer - expired before the hole were filled, all the out-of-order segments - of the corresponding connection would be discarded. This would be - a proactive counter-measure for attacks that aim at exhausting the - receive buffers. + triggered. For example, when out of order data are received, a timer + could be set, and the sequence number of the out-of-order data could + be recorded. If the hole were filled before the timer expires, the + timer would be turned off. However, if the timer expired before the + hole were filled, all the out-of-order segments of the corresponding + connection would be discarded. This would be a proactive counter- + measure for attacks that aim at exhausting the receive buffers. 
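Editorial illustration (not part of the draft): the timer-based trigger described above might look like the sketch below. The class name HoleTimer and its methods are hypothetical, and the current time is passed in explicitly as `now` so that the behaviour does not depend on a real clock.

```python
class HoleTimer:
    """Per-connection timer that bounds how long a hole may stay open."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None   # None means no hole is currently outstanding
        self.hole_seq = None   # sequence number recorded when the timer was set
        self.ooo = {}          # seq -> payload of out-of-order segments

    def on_out_of_order(self, seq, payload, now):
        self.ooo[seq] = payload
        if self.deadline is None:
            # First out-of-order segment: arm the timer, record the sequence.
            self.deadline = now + self.timeout
            self.hole_seq = seq

    def on_hole_filled(self):
        """Hole filled in time: turn the timer off and hand back the data."""
        data = [self.ooo[s] for s in sorted(self.ooo)]
        self.ooo.clear()
        self.deadline = None
        self.hole_seq = None
        return data

    def on_tick(self, now):
        """Timer expired with the hole still open: discard everything.
        Returns the number of segments dropped."""
        if self.deadline is not None and now >= self.deadline:
            dropped = len(self.ooo)
            self.ooo.clear()
            self.deadline = None
            self.hole_seq = None
            return dropped
        return 0
```

A caller would invoke on_tick() from its existing timer wheel; the choice of timeout trades resistance to the attack against tolerance for slow but legitimate retransmissions.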
- In addition, an implementation could incorporate reactive - mechanisms for more carefully controlling buffer allocation when - some predefined buffer allocation threshold was reached. At such - point, pruning policies would be applied. + In addition, an implementation could incorporate reactive mechanisms + for more carefully controlling buffer allocation when some predefined + buffer allocation threshold was reached. At such point, pruning + policies would be applied. A number of mechanisms can aid in the process of freeing system - resources. For example, a table of network prefixes corresponding - to the IP addresses of TCP peers that have ongoing TCP connections - could record the aggregate amount of out-of-order data currently - buffered for those connections. When the pruning policy was - triggered, TCP connections with hosts that have network prefixes - with large aggregate out-of-order buffered data could be selected - first for pruning the out-of-order segments. + resources. For example, a table of network prefixes corresponding to + the IP addresses of TCP peers that have ongoing TCP connections could + record the aggregate amount of out-of-order data currently buffered + for those connections. When the pruning policy was triggered, TCP + connections with hosts that have network prefixes with large + aggregate out-of-order buffered data could be selected first for + pruning the out-of-order segments. - Alternatively, if TCP segments were de-multiplexed by means of a - hash table (as it is currently the case in many TCP - implementations), a counter could be held at each entry of the - hash table that would record the aggregate out-of-order data - currently buffered for those connections belonging to that hash - table entry. When the pruning policy is triggered, the out-of- - order data corresponding to those connections linked by the hash - table entry with largest amount of aggregate out-of-order data - could be pruned first. 
It is important that this hash is not - computable by an attacker, as this would allow him to maliciously - cause the performance of specific connections to be degraded. - That is, given a four-tuple that identifies a connection, an - attacker should not be able to compute the corresponding hash - value used by the target system to de-multiplex incoming TCP - segments to that connection. + Alternatively, if TCP segments were de-multiplexed by means of a hash + table (as it is currently the case in many TCP implementations), a + counter could be held at each entry of the hash table that would + record the aggregate out-of-order data currently buffered for those + connections belonging to that hash table entry. When the pruning + policy is triggered, the out-of-order data corresponding to those + connections linked by the hash table entry with largest amount of + aggregate out-of-order data could be pruned first. It is important + that this hash is not computable by an attacker, as this would allow + him to maliciously cause the performance of specific connections to + be degraded. That is, given a four-tuple that identifies a + connection, an attacker should not be able to compute the + corresponding hash value used by the target system to de-multiplex + incoming TCP segments to that connection. - Another variant of a resource exhaustion attack against TCP's - segment reassembly mechanism would target the data structures used - to link the different holes in a data stream. For example, an - attacker could send a burst of 1 byte segments, leaving a one-byte - hole between each of the data bytes sent. Depending on the data - structures used for holding and linking together each of the data - segments, such an attack might waste a large amount of system - memory by exploiting the overhead needed store and link together - each of these one-byte segments. 
+ Another variant of a resource exhaustion attack against TCP's segment + reassembly mechanism would target the data structures used to link + the different holes in a data stream. For example, an attacker could + send a burst of one-byte segments, leaving a one-byte hole between each + of the data bytes sent. Depending on the data structures used for + holding and linking together each of the data segments, such an + attack might waste a large amount of system memory by exploiting the + overhead needed to store and link together each of these one-byte + segments. - For example, if a linked-list is used for holding and linking each - of the data segments, each of the involved data structures could - involve one byte of kernel memory for storing the received data - byte (the TCP payload), plus 4 bytes (32 bits) for storing a - pointer to the next node in the linked-list. Additionally, while - such a data structure would require only a few bytes of kernel - memory, it could result in the allocation of a whole memory page, - thus consuming much more memory than expected. + For example, if a linked-list is used for holding and linking each of + the data segments, each of the involved data structures could involve + one byte of kernel memory for storing the received data byte (the TCP + payload), plus 4 bytes (32 bits) for storing a pointer to the next + node in the linked-list. Additionally, while such a data structure + would require only a few bytes of kernel memory, it could result in + the allocation of a whole memory page, thus consuming much more + memory than expected. Therefore, implementations should enforce a limit on the number of - holes that are allowed in the received data stream at any given - time. When such a limit is reached, incoming TCP segments which - would create new holes would be silently dropped. Measurements in - [Dharmapurikar and Paxson, 2005] indicate that the vast - majority of TCP connections have at most a single hole at any - given time.
A limit of 16 holes for each connection would - accommodate even most of the very unusual cases in which there can - be more than one hole in the data stream at a given time. + holes that are allowed in the received data stream at any given time. + When such a limit is reached, incoming TCP segments which would + create new holes would be silently dropped. Measurements in + [Dharmapurikar and Paxson, 2005] indicate that the vast majority + of TCP connections have at most a single hole at any given time. A + limit of 16 holes for each connection would accommodate even most of + the very unusual cases in which there can be more than one hole in the + data stream at a given time. [US-CERT, 2004a] is a security advisory about a Denial of Service vulnerability resulting from a TCP implementation that did not - enforce limits on the number of segments stored in the TCP - reassembly buffer. + enforce limits on the number of segments stored in the TCP reassembly + buffer. - Section 8 of this document describes the security implications of - the TCP segment reassembly algorithm. + Section 8 of this document describes the security implications of the + TCP segment reassembly algorithm. 7.3. Automatic buffer tuning mechanisms + NOTE: THIS SECTION IS BEING EDITED. PLEASE DISREGARD THE RFC2119- + LANGUAGE RECOMMENDATIONS. + 7.3.1. Automatic send-buffer tuning mechanisms A TCP implementing an automatic send-buffer tuning mechanism SHOULD enforce the following limit on the size of the send buffer of each TCP connection: send_buffer_size <= send_buffer_pool / (min_buffer_size * max_connections) where @@ -2934,204 +2159,122 @@ It is worth noting that TCP Selective Acknowledgements (SACK) are advisory, in the sense that a TCP that has SACKed (but not ACKed) a block of data is free to discard that block, and expect the TCP sender to retransmit it when the retransmission timer of the peer TCP expires. 8. TCP segment reassembly algorithm 8.1.
Problems that arise from ambiguity in the reassembly process - If a TCP segment is received containing some data bytes that had - already been received, the first copy of those data SHOULD be used - for reassembling the application data stream. - - DISCUSSION: - A security consideration that should be made for the TCP segment - reassembly algorithm is that of data stream consistency between - the host performing the TCP segment reassembly, and a Network - Intrusion Detection System (NIDS) being employed to monitor the - host in question. + reassembly algorithm is that of data stream consistency between the + host performing the TCP segment reassembly, and a Network Intrusion + Detection System (NIDS) being employed to monitor the host in + question. - In the event a TCP segment was unnecessarily retransmitted, or - there was packet duplication in any of the intervening networks, a - TCP might get more than one copy of the same data. Also, as TCP - segments can be re-packetized when they are retransmitted, a given - TCP segment might partially overlap data already received in - earlier segments. In all these cases, the question arises about - which of the copies of the received data should be used when - reassembling the data stream. In legitimate and normal - circumstances, all copies would be identical, and the same data - stream would be obtained regardless of which copy of the data was - used. However, an attacker could maliciously send overlapping - segments containing different data, with the intent of evading a - Network Intrusion Detection Systems (NIDS), which might reassemble - the received TCP segments differently than the monitored system. - [Ptacek and Newsham, 1998] provides a detailed discussion of these - issues. + In the event a TCP segment was unnecessarily retransmitted, or there + was packet duplication in any of the intervening networks, a TCP + might get more than one copy of the same data. 
Also, as TCP segments + can be re-packetized when they are retransmitted, a given TCP segment + might partially overlap data already received in earlier segments. + In all these cases, the question arises about which of the copies of + the received data should be used when reassembling the data stream. + In legitimate and normal circumstances, all copies would be + identical, and the same data stream would be obtained regardless of + which copy of the data was used. However, an attacker could + maliciously send overlapping segments containing different data, with + the intent of evading a Network Intrusion Detection Systems (NIDS), + which might reassemble the received TCP segments differently than the + monitored system. [Ptacek and Newsham, 1998] provides a detailed + discussion of these issues. - As suggested in Section 3.9 of RFC 793 [Postel, 1981c], if a TCP - segment arrives containing some data bytes that have already been - received, the first copy of those data should be used for - reassembling the application data stream. It should be noted that - while convergence to this policy might prevent some cases of - ambiguity in the reassembly process, there are a number of other - techniques that an attacker could still exploit to evade a NIDS - [CPNI, 2008]. These techniques can generally be defeated if the - NIDS is placed in-line with the monitored system, thus allowing - the NIDS to normalize the network traffic or apply some other - policy that could ensure consistency between the result of the - segment reassembly process obtained by the monitored host and that - obtained by the NIDS. + As suggested in Section 3.9 of RFC 793 [RFC0793], if a TCP segment + arrives containing some data bytes that have already been received, + the first copy of those data should be used for reassembling the + application data stream. 
It should be noted that while convergence + to this policy might prevent some cases of ambiguity in the + reassembly process, there are a number of other techniques that an + attacker could still exploit to evade a NIDS [CPNI, 2008]. These + techniques can generally be defeated if the NIDS is placed in-line + with the monitored system, thus allowing the NIDS to normalize the + network traffic or apply some other policy that could ensure + consistency between the result of the segment reassembly process + obtained by the monitored host and that obtained by the NIDS. [CERT, 2003] and [CORE, 2003] are advisories about a heap buffer overflow in a popular Network Intrusion Detection System resulting from incorrect sequence number calculations in its TCP stream- reassembly module. 9. TCP Congestion Control + NOTE: THIS SECTION IS BEING EDITED. + TCP implements two algorithms, "slow start" and "congestion avoidance", for controlling the rate at which data is transmitted on - a TCP connection [Allman et al, 1999]. These algorithms require the - addition of two variables as part of TCP per-connection state: cwnd - and ssthresh. - - The congestion window (cwnd) is a sender-side limit on the amount of - outstanding data that the sender can have at any time, while the - receiver's advertised window (rwnd) is a receiver-side limit on the - amount of outstanding data. The minimum of cwnd and rwnd governs - data transmission. - - Another state variable, the slow-start threshold (ssthresh), is used - to determine whether it is the slow start or the congestion avoidance - algorithm that should control data transmission. When cwnd < - ssthresh, "slow start" governs data transmission, and the congestion - window (cwnd) is exponentially increased. When cwnd > ssthresh, - "congestion avoidance" governs data transmission, and the congestion - window (cwnd) is only linearly increased. 
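The cwnd growth rules described above can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not text from either revision of the draft: the function name on_new_ack is hypothetical, the per-ACK congestion avoidance increment SMSS*SMSS/cwnd is the approximation given in RFC 5681, and the use of congestion avoidance when cwnd equals ssthresh is an arbitrary choice (either algorithm is permitted in that case).

```python
def on_new_ack(cwnd: int, ssthresh: int, smss: int) -> int:
    """Return the updated cwnd (in bytes) after an ACK for new data."""
    if cwnd < ssthresh:
        # Slow start: grow by at most SMSS per ACK, which roughly
        # doubles cwnd every round-trip time.
        return cwnd + smss
    # Congestion avoidance (also chosen here when cwnd == ssthresh):
    # grow by about one SMSS per RTT, using the per-ACK
    # approximation SMSS*SMSS/cwnd, rounded up to at least one byte.
    return cwnd + max(1, (smss * smss) // cwnd)


# One round trip in each phase, with one ACK per full-sized segment.
smss = 1460
cwnd = 4 * smss
for _ in range(4):                      # slow start: ssthresh is far away
    cwnd = on_new_ack(cwnd, 64 * smss, smss)
slow_start_result = cwnd                # cwnd has doubled to 8 * SMSS

cwnd = 10 * smss
for _ in range(10):                     # congestion avoidance
    cwnd = on_new_ack(cwnd, 10 * smss, smss)
cong_avoid_result = cwnd                # grew by less than one SMSS
```

The contrast between the two loops is the point of the sketch: the same number of ACKs doubles cwnd in slow start but adds less than one SMSS in congestion avoidance, which is why attacks that inflate the ACK count are more damaging while a connection is in slow start.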
- - As specified in RFC 2581 [Allman et al, 1999], when cwnd and ssthresh - are equal the sender may use either slow start or congestion - avoidance. - - During slow start, TCP increments cwnd by at most SMSS bytes for each - ACK received that acknowledges new data. During congestion - avoidance, cwnd is incremented by 1 full-sized segment per round-trip - time (RTT), until congestion is detected. - - Additionally, TCP uses two algorithms, Fast Retransmit and Fast - Recovery, to mitigate the effects of packet loss. The "Fast - Retransmit" algorithm infers packet loss when three Duplicate - Acknowledgements (DupACKs) are received. - - The value "three" is meant to allow for fast-retransmission of - "missing" data, while avoiding network packet reordering from - triggering loss recovery. - - Once packet loss is detected by the receipt of three duplicate-ACKs, - the "Fast Recovery" algorithm governs the transfer of new data until - a non-duplicate ACK is received that acknowledges the receipt of new - data. The Fast Retransmit and Fast Recovery algorithms are usually - implemented together, as follows (from RFC 2581): - - o When the third duplicate ACK is received, set ssthresh to no more - than the value given in the equation: ssthresh = max (FlightSize / - 2, 2*SMSS) - - o Retransmit the lost segment and set cwnd to ssthresh plus 3*SMSS. - This artificially "inflates" the congestion window by the number - of segments (three) that have left the network and which the - receiver has buffered. - - o For each additional duplicate ACK received, increment cwnd by - SMSS. This artificially inflates the congestion window in order - to reflect the additional segment that has left the network. - - o Transmit a segment, if allowed by the new value of cwnd and the - receiver's advertised window. - - o When the next ACK arrives that acknowledges new data, set cwnd to - ssthresh (the value set in step 1). This is termed "deflating" - the window. + a TCP connection [RFC5681]. 9.1. 
Congestion control with misbehaving receivers [Savage et al, 1999] describes a number of ways in which TCP's congestion control mechanisms can be exploited by a misbehaving TCP receiver to obtain more than its fair share of bandwidth. The following subsections provide a brief discussion of these vulnerabilities, along with the possible countermeasures. 9.1.1. ACK division - TCP SHOULD increase cwnd by one SMSS only when a valid ACK covers the - entire data segment sent - - (note: or should we recommend the other counter-measure (i.e., - implementation of ABC?) - - DISCUSSION: - - Given that TCP updates cwnd based on the number of duplicate ACKs - it receives, rather than on the amount of data that each ACK is - actually acknowledging, a malicious TCP receiver could cause the - TCP sender to illegitimately increase its congestion window by - acknowledging a data segment with a number of separate - Acknowledgements, each covering a distinct piece of the received - data segment. + Given that TCP updates cwnd based on the number of duplicate ACKs it + receives, rather than on the amount of data that each ACK is actually + acknowledging, a malicious TCP receiver could cause the TCP sender to + illegitimately increase its congestion window by acknowledging a data + segment with a number of separate Acknowledgements, each covering a + distinct piece of the received data segment. See Figure 7, in page 64 of the UK CPNI document. ACK division attack - [Savage et al, 1999] describes two possible countermeasures for - this vulnerability. One of them is to increment cwnd not by a - full SMSS, but proportionally to the amount of data being - acknowledged by the received ACK, similarly to the policy - described in RFC 3465 [Allman, 2003]. Another alternative is to - increase cwnd by one SMSS only when a valid ACK covers the entire - data segment sent. + [Savage et al, 1999] describes two possible countermeasures for this + vulnerability. 
One of them is to increment cwnd not by a full SMSS, + but proportionally to the amount of data being acknowledged by the + received ACK, similarly to the policy described in RFC 3465 [Allman, + 2003]. Another alternative is to increase cwnd by one SMSS only when + a valid ACK covers the entire data segment sent. 9.1.2. DupACK forgery - TCP SHOULD keep track of the number of outstanding segments (o_seg), - and accept only up to (o_seg -1) duplicate Acknowledgements. - - DISCUSSION: - - The second vulnerability discussed in [Savage et al, 1999] allows - an attacker to cause the TCP sender to illegitimately increase its - congestion window by forging a number of duplicate - Acknowledgements (DupACKs). Figure 8 shows a sample scenario. - The first three DupACKs trigger the Fast Recovery mechanism, while - the rest of them cause the congestion window at the TCP sender to - be illegitimately inflated. Thus, the attacker is able to - illegitimately cause the TCP sender to increase its data - transmission rate. + The second vulnerability discussed in [Savage et al, 1999] allows an + attacker to cause the TCP sender to illegitimately increase its + congestion window by forging a number of duplicate Acknowledgements + (DupACKs). Figure 8 shows a sample scenario. The first three + DupACKs trigger the Fast Recovery mechanism, while the rest of them + cause the congestion window at the TCP sender to be illegitimately + inflated. Thus, the attacker is able to illegitimately cause the TCP + sender to increase its data transmission rate. See Figure 8, in page 65 of the UK CPNI document. DupACK forgery attack - Fortunately, a number of sender-side heuristics can be implemented - to mitigate this vulnerability. First, the TCP sender could keep - track of the number of outstanding segment (o_seg), and accept - only up to (o_seg -1) DupACKs. Secondly, a TCP sender might, for - example, refuse to enter Fast Recovery multiple times in some - period of time (e.g., one RTT). 
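The two sender-side heuristics described above (accepting at most o_seg - 1 DupACKs, and refusing to enter Fast Recovery more than once in a given period such as one RTT) could be sketched as follows. This is an illustrative Python sketch only: the class DupAckGuard and its method names are hypothetical, and counting o_seg in whole segments rather than bytes is an assumption made for brevity.

```python
from typing import Optional


class DupAckGuard:
    """Hypothetical sender-side bookkeeping for DupACK acceptance."""

    def __init__(self) -> None:
        self.o_seg = 0                      # outstanding segments
        self.dupacks_accepted = 0
        self.last_recovery: Optional[float] = None

    def on_segment_sent(self) -> None:
        self.o_seg += 1

    def on_new_ack(self, segments_acked: int) -> None:
        # A cumulative ACK for new data shrinks the flight and
        # restarts the DupACK count.
        self.o_seg = max(self.o_seg - segments_acked, 0)
        self.dupacks_accepted = 0

    def accept_dupack(self) -> bool:
        # With o_seg segments in flight, at most o_seg - 1 of them can
        # have arrived out of order behind a single lost segment.
        if self.dupacks_accepted >= max(self.o_seg - 1, 0):
            return False                    # excess DupACK: ignore it
        self.dupacks_accepted += 1
        return True

    def may_enter_fast_recovery(self, now: float, rtt: float) -> bool:
        # Second heuristic: at most one Fast Recovery episode per RTT.
        if self.last_recovery is not None and now - self.last_recovery < rtt:
            return False
        self.last_recovery = now
        return True


guard = DupAckGuard()
for _ in range(5):
    guard.on_segment_sent()
# Six DupACKs arrive while only five segments are outstanding.
decisions = [guard.accept_dupack() for _ in range(6)]
```

With five segments outstanding, the first four DupACKs are accepted and the remaining two are ignored, so forged DupACKs beyond what the flight could legitimately produce no longer inflate the congestion window.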
+ Fortunately, a number of sender-side heuristics can be implemented to + mitigate this vulnerability. First, the TCP sender could keep track + of the number of outstanding segment (o_seg), and accept only up to + (o_seg -1) DupACKs. Secondly, a TCP sender might, for example, + refuse to enter Fast Recovery multiple times in some period of time + (e.g., one RTT). [Savage et al, 1999] also describes a modification to TCP to - implement a nonce protocol that would eliminate this - vulnerability. However, this would require modification of all - implementations, which makes this counter-measure hard to deploy. + implement a nonce protocol that would eliminate this vulnerability. + However, this would require modification of all implementations, + which makes this counter-measure hard to deploy. 9.1.3. Optimistic ACKing Another alternative for an attacker to exploit TCP's congestion control mechanisms is to acknowledge data that has not yet been received, thus causing the congestion window at the TCP sender to be incremented faster than it should. See Figure 9, in page 66 of the UK CPNI document. @@ -3168,43 +2311,43 @@ TCP", the third duplicate-ACK will cause the "lost" segment to be retransmitted, and each subsequent duplicate-ACK will cause cwnd to be artificially inflated. Thus, the "sending TCP" might end up injecting more packets into the network than it really should, with the potential of causing network congestion. This is a potential consequence of the "Duplicate-ACK spoofing attack" described in [Savage et al, 1999]. Secondly, if bursts of three duplicate ACKs are sent to the TCP sender, the attacked system would infer packet loss, and ssthresh and - cwnd would be reduced. As noted in RFC 2581 [Allman et al, 1999], - causing two congestion control events back-to-back will often cut - ssthresh and cwnd to their minimum value of 2*SMSS, with the - connection immediately entering the slower-performing congestion - avoidance phase. 
While it would not be attractive for an attacker to - perform this attack against one of his TCP connections, the attack - might be attractive when the TCP connection to be attacked is - established between two other parties. + cwnd would be reduced. As noted in RFC 5681 [RFC5681], causing two + congestion control events back-to-back will often cut ssthresh and + cwnd to their minimum value of 2*SMSS, with the connection + immediately entering the slower-performing congestion avoidance + phase. While it would not be attractive for an attacker to perform + this attack against one of his TCP connections, the attack might be + attractive when the TCP connection to be attacked is established + between two other parties. It is usually assumed that in order for an off-path attacker to perform attacks against a third-party TCP connection, he should be able to guess a number of values, including a valid TCP Sequence Number and a valid TCP Acknowledgement Number. While this is true if the attacker tries to "inject" valid packets into the connection by himself, a feature of TCP can be exploited to fool one of the TCP endpoints to transmit valid duplicate Acknowledgements on behalf of the attacker, hence relieving the attacker of the hard task of forging valid values for the Sequence Number and Acknowledgement Number TCP header fields. - Section 3.9 of RFC 793 [Postel, 1981c] describes the processing of - incoming TCP segments as a function of the connection state and the - contents of the various header fields of the received segment. For + Section 3.9 of RFC 793 [RFC0793] describes the processing of incoming + TCP segments as a function of the connection state and the contents + of the various header fields of the received segment. For connections in the ESTABLISHED state, the first check that is performed on incoming segments is that they contain "in window" data. 
That is, RCV.NXT <= SEG.SEQ <= RCV.NXT+RCV.WND, or RCV.NXT <= SEG.SEQ+SEG.LEN-1 < RCV.NXT+RCV.WND If a segment does not pass this check, it is dropped, and an Acknowledgement is sent in response: @@ -3302,40 +2445,20 @@ segments (in red) sent by the attacker causes the TCP sender to enter the loss recovery phase and illegitimately inflate the congestion window, leading to an increase in the data transmission rate. Once a segment that acknowledges new data is received by the TCP sender, the loss recovery phase ends, and the data transmission rate is reduced. See Figure 12, in page 70 of the UK CPNI document. Blind flooding attack (time-line graph) - Figure 13 is a time-sequence graph produced from packet logs obtained - from tests of the described attack in a real network. A burst of - segments is sent upon receipt of the burst of Duplicate - Acknowledgements illegitimately elicited by the attacker. Figure 14 - is an averaged-throughput graphic for the same time frame, which - clearly shows the effect of the attack in terms of throughput. - - See Figure 13, in page 71 of the UK CPNI document. - - Blind flooding attack (time sequence graph) - - See Figure 14, in page 71 of the UK CPNI document. - - Blind flooding attack (averaged throughput graph) - - These graphics were produced with Shawn Ostermann's tcptrace tool - [Ostermann, 2008]. An explanation of the format of the graphics can - be found in tcptrace's manual (available at the project's web site: - http://www.tcptrace.org). - 9.2.3. Difficulty in performing the attacks In order to exploit the technique described in Section 9.2 of this document, an attacker would need to know the four-tuple {IP Source Address, TCP Source Port, IP Destination Address, TCP Destination Port} that identifies the connection to be attacked. As discussed by [Watson, 2004] and RFC 4953 [Touch, 2007], there are a number of scenarios in which these values may be known or easily guessed. 
It is interesting to note that the attacks described in Section 9.2 @@ -3385,296 +2508,57 @@ interesting in the case of the blind-flooding attack, as the attack would elicit even more packets from the TCP sender. Whether a full-window or just half a window of data is retransmitted depends on the Acknowledgement policy at the TCP receiver. If the TCP receiver sends an Acknowledgement (ACK) for every segment, a full-window of data will be retransmitted. If the TCP receiver sends an Acknowledgement (ACK) for every other segment, then only half a window of data will be retransmitted. - Figure 15 is a time-sequence graph produced from packet logs obtained - from tests performed in a real network. Once loss recovery is - illegitimately triggered by the duplicate-ACKs elicited by the - attacker, an entire flight of data is unnecessarily retransmitted. - Figure 16 is an averaged-throughput graphic for the same time-frame, - which shows an increase in the throughput of the connection resulting - from the retransmission of segments governed by NewReno's loss - recovery. - - See Figure 15, in page 73 of the UK CPNI document. - - NewReno loss recovery (time-sequence graph) - - See Figure 16, in page 74 of the UK CPNI document. - - NewReno loss recovery (averaged throughput graph) - Limited Transmit RFC 3042 [Allman et al, 2001] proposes an enhancement to TCP to more effectively recover lost segments when a connection's congestion window is small, or when a large number of segments are lost in a single transmission window. The "Limited Transmit" algorithm calls for sending a new data segment in response to each of the first two Duplicate Acknowledgements that arrive at the TCP sender. This would provide two additional transmitted packets that may be useful for the attacker in the case of the blind flooding attack described in Section 9.2.2 is performed. 
SACK-based loss recovery - RFC 3517 [Blanton et al, 2003] specifies a conservative loss-recovery + [I-D.ietf-tcpm-3517bis] specifies a conservative loss-recovery algorithm that is based on the use of the selective acknowledgement (SACK) TCP option. The algorithm uses DupACKs as an indication of - congestion, as specified in RFC 2581 [Allman et al, 1999]. However, - a difference between this algorithm and the basic algorithm described + congestion, as specified in RFC 2581 [RFC5681]. However, a + difference between this algorithm and the basic algorithm described in RFC 2581 is that it clocks out segments only with the SACK information included in the DupACKs. That is, during the loss recovery phase, segments will be injected in the network only if the SACK information included in the received DupACKs indicates that one or more segments have left the network. As a result, those systems that implement SACK-based loss recovery will not be vulnerable to the - blind flooding attack described in Section 9.2.2. However, as RFC - 3517 does not actually require DupACKs to include new SACK + blind flooding attack described in Section 9.2.2. Additionally, as + [I-D.ietf-tcpm-3517bis] requires DupACKs to include new SACK information (corresponding to data that has not yet been acknowledged by TCP's cumulative Acknowledgement), systems that implement SACK- - based loss-recovery may still remain vulnerable to the blind - throughput-reduction attack described in Section 9.2.1. SACK-based - loss recovery implementations should be updated to implement the - countermeasure ("Use of SACK information to validate DupACKs") - described in Section 9.2.5. + based loss-recovery will not be vulnerable to the blind throughput- + reduction attack described in Section 9.2.1. 9.2.5. 
Countermeasures - TCP SHOULD validate the Sequence Number of an incoming TCP segment - as follows: - - RCV.NXT - MAX.RCV.WND <= SEG.SEQ <= RCV.NXT + RCV.WND - - where MAX.RCV.WND is the largest TCP window that has so far been - advertised to the remote endpoint. - - If a segment passes this check, the processing rules specified in RFC - 793 [Postel, 1981c] MUST be applied. Otherwise, TCP SHOULD send an ACK - (as specified by the processing rules in RFC 793 [Postel, 1981c]), - applying rate-limiting to the Acknowledgement segments sent in - response to out-of-window segments. - - DISCUSSION: - - As discussed in Section 9.2, TCP responds with an ACK when an out- - of-window segment is received, to accommodate those scenarios in - which the Acknowledgement segments that correspond to some - received data are lost in the network, and to help discover half- - open TCP connections. - - However, it is possible to restrict the sequence numbers that are - considered acceptable, and have TCP respond with ACKs only when it - is strictly necessary. - - A feature of TCP is that, in some scenarios, it can detect half- - open connections. If an implementation chose to silently drop - those TCP segments that do not pass the check enforced by the - equation above, it could prevent TCP from detecting half-open - connections. Figure 17 shows a scenario in which, provided that - "TCP B" behaves as specified in RFC 793, a half-open connection - would be discovered and aborted. - - An established connection is said to be "half open" if one of the - TCPs has closed or aborted the connection at its end without the - knowledge of the other, or if the two ends of the connection have - become desynchronized owing to a crash that resulted in loss of - memory. - - See Figure 17, in page 76 of the UK CPNI document. - - Half-Open Connection Discovery - - In the scenario illustrated by Figure 17, TCP A crashes, losing the - connection-state information of the TCP connection with TCP B.
In - line 3, TCP A tries to establish a new connection with TCP B, - using the same four-tuple {IP Source Address, TCP source port, IP - Destination Address, TCP destination port}. In line 4, as the SYN - segment is out of window, TCP B responds with an ACK. This ACK - elicits an RST segment from TCP A, which causes the half-open - connection at TCP B to be aborted. - - If the SYN segment had been "in window", TCP B would have sent an - RST segment instead, which would have closed the half-open - connection. Ongoing work at the TCPM WG of the IETF proposes to - change this behavior, and make TCP respond to a SYN segment - received for any of the synchronized states with an ACK segment, - to avoid in-window SYN segments from being used to perform - connection-reset attacks [Ramaiah et al, 2008]. - - However, in case the out-of-window segment was silently dropped, - the scenario in Figure 17 would change into that in Figure 18. - - See Figure 18, in page 76 of the UK CPNI document. - - Half-Open Connection Discovery with the proposed counter-measure - - In line 3, the SYN segment sent by TCP A is silently dropped by - TCP B because it does not pass the check enforced by the equation - above (i.e., it contains an out-of-window sequence number). As a - result, some time later (an RTO) TCP A retransmits its SYN - segment. Even after TCP A times out, the half-open connection at - TCP B will remain in the same state. - - Thus, a conservative reaction to those segments that do not pass - the check enforced by the equation above would be to respond with - an Acknowledgement segment (as specified by RFC 793), applying - rate-limiting to those Acknowledgement segments sent in response - to segments that do not pass the check enforced by that equation. - An implementation might choose to enforce a rate-limit of, e.g., - one ACK per five seconds, as a single ACK segment is needed for - the Half-Open Connection Discovery mechanism to work. 
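The relaxed sequence-number check and the rate-limited ACK response discussed above could be sketched as follows. This is a hedged Python illustration, not an implementation taken from the draft: all function and class names are hypothetical, the five-second interval is simply the example value mentioned in the text, the bounds are inclusive as written in the equation in this section, and comparing sequence numbers modulo 2^32 is an assumption about how an implementation would handle wraparound.

```python
from typing import Optional

MOD = 1 << 32   # TCP sequence numbers wrap at 2**32


def passes_relaxed_check(seg_seq: int, rcv_nxt: int,
                         rcv_wnd: int, max_rcv_wnd: int) -> bool:
    """RCV.NXT - MAX.RCV.WND <= SEG.SEQ <= RCV.NXT + RCV.WND (mod 2**32)."""
    lower = (rcv_nxt - max_rcv_wnd) % MOD
    span = max_rcv_wnd + rcv_wnd
    # Distance of SEG.SEQ from the lower edge, measured modulo 2**32.
    return (seg_seq - lower) % MOD <= span


class AckRateLimiter:
    """Allow at most one ACK per min_interval seconds."""

    def __init__(self, min_interval: float = 5.0) -> None:
        self.min_interval = min_interval
        self.last_ack: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.last_ack is not None and now - self.last_ack < self.min_interval:
            return False
        self.last_ack = now
        return True


def react_to_segment(seg_seq: int, rcv_nxt: int, rcv_wnd: int,
                     max_rcv_wnd: int, limiter: AckRateLimiter,
                     now: float) -> str:
    if passes_relaxed_check(seg_seq, rcv_nxt, rcv_wnd, max_rcv_wnd):
        return "process"        # continue with the RFC 793 rules
    # Out-of-window: still answer with an ACK so that half-open
    # connections can be discovered, but only at a limited rate.
    return "ack" if limiter.allow(now) else "drop"


limiter = AckRateLimiter()
# A burst of three forged out-of-window segments within one second.
burst = [react_to_segment(9000, 1000, 500, 800, limiter, t)
         for t in (0.0, 0.3, 0.6)]
```

In this run the burst elicits a single ACK rather than three, so the forged segments cannot produce the three duplicate ACKs needed to trigger loss recovery at the peer, while one ACK is still available for half-open connection discovery.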
- - As the only reason to respond with an ACK to those segments that - do not pass the check enforced by the equation above is to allow - TCP to discover half-open connections, an aggressive rate-limit - can be enforced. As long as the rate-limit prevents out-of-window - segments from eliciting three Acknowledgment segments in a Round- - trip Time (RTT), an attacker would not be able to trigger TCP's - loss-recovery, and thus would not be able to perform the attacks - described in the previous sections. - - It is interesting to note that RFC 793 [Postel, 1981c] itself - states that half-open connections are expected to be unusual. - Additionally, given that in many scenarios it may be unlikely for - a TCP connection request to be issued with the same four-tuple as - that of the half-open connection, a complete solution for the - discovery of half-open connections cannot rely on the mechanism - illustrated by Figure 17, either. Therefore, some implementations - might choose to sacrifice TCP's ability to detect half-open - connections, and have a more aggressive reaction to those segments - that do not pass the check enforced by the equation above by - silently dropping them. - - This validation check can also help to avoid ACK wars in some - scenarios that may arise from the use of transparent proxies. In - those scenarios, when the transparent proxy fails to wire (i.e., - is disabled), the sequence numbers of the two end-points of the - TCP connection become desynchronized, and both TCPs begin to send - duplicate Acknowledgements to each other, with the intention of - re-synchronizing them. As the sequence numbers never get re- - synchronized, the ACK war can only be stopped by an external - agent. - - TCP SHOULD limit the number of duplicate acknowledgements it will - honour to: - - Max_DupACKs = (FlightSize / SMSS) - 1 - - Where FlightSize and SMSS are the values defined in RFC 2581 [Allman - et al, 1999]. 
When more than Max_DupACKs duplicate acknowledgements - are received, the excess DupACKs should be silently dropped. - - DISCUSSION: - - Note that duplicate acknowledgements should be elicited by out-of- - order segments. - - In the case of TCP connections that have agreed to employ SACK, TCP - SHOULD validate duplicate ACKs with the following criteria: Valid - Duplicate ACKs MUST contain new SACK information. The SACK - information MUST refer to data that has already been sent, but that - has not yet been acknowledged by TCP's cumulative Acknowledgement. A - TCP segment that does not pass this check SHOULD NOT be considered a - "duplicate Acknowledgement". - - DISCUSSION: - - SACK, specified in RFC 2018 [Mathis et al, 1996], provides a mechanism - for TCP to be able to acknowledge the receipt of out-of-order TCP - segments. For connections that have agreed to use SACK, each - legitimate DupACK will contain new SACK information that reflects - the data bytes contained in the out-of-order data segment that - elicited the DupACK. - - RFC 3517 [Blanton et al, 2003] specifies a SACK-based loss - recovery algorithm for TCP. However, it does not recommend that TCP - implementations validate DupACKs by requiring that they contain - new SACK information. Results obtained from auditing a number of - TCP implementations seem to indicate that most TCP implementations - do not enforce this validation check on incoming DupACKs, either. - - In the case of TCP connections that have agreed to use SACK, a - validation check should be performed on incoming ACK segments to - completely eliminate the attacks described in Section 9.2.1 and - Section 9.2.2 of this document: "Duplicate ACKs should contain new - SACK information. The SACK information should refer to data that - has already been sent, but that has not yet been acknowledged by - TCP's cumulative Acknowledgement".
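The SACK-based validation rule quoted above could be sketched as follows. This is a minimal Python illustration under stated assumptions: the names Block, add_block and has_new_sack_info are hypothetical, SACK blocks are modelled as (left, right) pairs with an exclusive right edge, the scoreboard is a plain list of disjoint blocks already reported, and sequence-number wraparound is deliberately ignored to keep the example short.

```python
from typing import List, Tuple

Block = Tuple[int, int]   # (left edge, right edge), right edge exclusive


def add_block(scoreboard: List[Block], block: Block) -> List[Block]:
    """Merge a SACK block into a sorted list of disjoint blocks."""
    left, right = block
    merged = []
    for l, r in scoreboard:
        if r < left or right < l:
            merged.append((l, r))           # no overlap: keep as is
        else:
            left, right = min(left, l), max(right, r)
    merged.append((left, right))
    return sorted(merged)


def has_new_sack_info(blocks: List[Block], snd_una: int, snd_nxt: int,
                      scoreboard: List[Block]) -> bool:
    """True if an ACK carries SACK information that validates it as a DupACK."""
    for left, right in blocks:
        # The block must refer to data already sent but not yet covered
        # by the cumulative Acknowledgement (wraparound ignored here).
        if not (snd_una <= left < right <= snd_nxt):
            continue
        # A block fully contained in what was already reported is not new.
        if any(l <= left and right <= r for l, r in scoreboard):
            continue
        return True
    return False


snd_una, snd_nxt = 1000, 5000
scoreboard: List[Block] = []

first = has_new_sack_info([(2000, 3000)], snd_una, snd_nxt, scoreboard)
scoreboard = add_block(scoreboard, (2000, 3000))
repeat = has_new_sack_info([(2000, 3000)], snd_una, snd_nxt, scoreboard)
pure_ack = has_new_sack_info([], snd_una, snd_nxt, scoreboard)
below_una = has_new_sack_info([(500, 900)], snd_una, snd_nxt, scoreboard)
```

Here only the first ACK counts as a duplicate ACK. The repeated block, the pure ACK without SACK information (as elicited by an out-of-window segment), and the block below the cumulative ACK point (as a D-SACK report would be) are all rejected and would not trigger loss recovery.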
- - Those ACK segments that do not comply with this validation check - should not be considered "duplicate ACKs", and thus should not - trigger the loss-recovery phase. - - In case at least one segment in a window of data has been lost, - the successive segments will elicit the generation of Duplicate - ACKs containing new SACK information. This SACK information will - indicate the receipt of these successive segments by the TCP - receiver. - - In the case of pure ACKs illegitimately elicited by out-of-window - segments, however, the ACKs will not contain any SACK information. - - If DSACK (specified in RFC 2883 [Floyd et al, 2000]) were implemented - by the TCP receiver, then the illegitimately elicited DupACKs - might contain out-of-window SACK information if the sequence - number of the forged TCP segment (SEG.SEQ) is lower than the next - expected sequence number (RCV.NXT) at the TCP receiver. Such - segments should be considered to indicate the receipt of duplicate - data, rather than lost data, and therefore should - not trigger loss recovery. - - Other possible general mitigations are discussed in the following - paragraphs: - - TCP port number randomization - - Since the attacker needs to know the TCP port numbers in use by the - connection to be attacked in order to perform the blind attacks - described in Section 9.2.1 and Section 9.2.2, obfuscating the TCP source port - used for outgoing TCP connections will increase the number of packets - required to successfully perform these attacks. Section 3.1 of this - document discusses the use of port randomization. - - It must be noted that, given that these blind DupACK triggering - attacks do not require the attacker to forge valid TCP Sequence - numbers and TCP Acknowledgement numbers, port randomization should - not be relied upon as a first line of defense.
- - Ingress and Egress filtering - - Ingress and Egress filtering reduces the number of systems in the - global Internet that can perform attacks that rely on forged source - IP addresses. While protection from the blind attacks discussed in - Section 9.2 should not rely only on Ingress and Egress filtering, its - deployment is recommended to help prevent all attacks that rely on - forged IP addresses. RFC 3704 [Baker and Savola, 2004], RFC 2827 - [Ferguson and Senie, 2000], and [NISCC, 2006] provide advice on - Ingress and Egress filtering. - - Generalized TTL Security Mechanism (GTSM) - - RFC 5082 [Gill et al, 2007] proposes a check on the TTL field of the - IP packets that correspond to a given TCP connection to reduce the - number of systems that could successfully attack the protected TCP - connection. It provides for the attacks discussed in this document - the same level of protection than for the attacks described in - [Watson, 2004] and RFC 4953 [Touch, 2007]. While implementation of - this mechanism may be useful in some scenarios, it should be clear - that countermeasures discussed in the previous sections provide a - more effective and simpler solution than that provided by the GTSM. + [draft-gont-tcpm-limiting-aow-segments-00.txt] proposes to rate-limit + the reaction to out-of-window segments. This would mitigate the + attacks described earlier in this section. 9.3. TCP Explicit Congestion Notification (ECN) ECN (Explicit Congestion Notification) provides a mechanism for intermediate systems to signal congestion to the communicating endpoints that in some scenarios can be used as an alternative to dropping packets. RFC 3168 [Ramakrishnan et al, 2001] contains a detailed discussion of the possible ways and scenarios in which ECN could be exploited by an @@ -3684,125 +2568,53 @@ on nonces, that protects against accidental or malicious concealment of marked packets from the TCP sender. 
The specified mechanism defines a "NS" ("Nonce Sum") field in the TCP header that makes use of one bit from the Reserved field, and requires a modification in both of the endpoints of a TCP connection to process this new field. This mechanism is still in "Experimental" status, and since it might suffer from the behavior of some middle-boxes such as firewalls or packet-scrubbers, we defer a recommendation of this mechanism until more experience is gained. - There also is ongoing work in the research community and the IETF to - define alternate semantics for the ECN field of the IP header (e.g., - see [PCNWG, 2009]). - - The following subsections try to summarize the security implications - of ECN. - -9.3.1. Possible attacks by a compromised router - - Firstly, a router controlled by a malicious user could erase the CE - codepoint (by replacing it with the ECT(0), ECT(1), or non-ECT - codepoints), effectively eliminating the congestion indication. As a - result, the corresponding TCP sender would not reduce its data - transmission rate, possibly leading to network congestion. This - could also lead to unfairness, as this flow could experience better - performance than other flows for which the congestion indication is - not erased (and thus their transmission rate is reduced). - - Secondly, a router controlled by a malicious user could - illegitimately set the CE codepoint, falsely indicating congestion, - to cause the TCP sender to reduce its data transmission rate. - However, this particular attack is no worse than the malicious router - simply dropping the packets rather than setting their CE codepoint. - - Thirdly, a malicious router could turn off the ECT codepoint of a - packet, thus disabling ECN support. As a result, if the packet later - arrives at a router that is experiencing congestion, it may be - dropped rather than marked. As with the previous scenario, though, - this is no worse than the malicious router simply dropping the - corresponding packet.
- - It should be noted that a compromised on-path IP router could engage - in a much broader range of attacks, with broader impacts, and at much - lower attacker cost than the ones described here. Such a compromised - router is extremely unlikely to engage in the attack vectors - discussed in this section, given the existence of more effective - attack vectors that have lower attacker cost. - -9.3.2. Possible attacks by a malicious TCP endpoint - - If a packet with the ECT codepoint set arrives at an ECN-capable - router that is experiencing moderate congestion, the router may - decide to set its CE codepoint instead of dropping it. If either of - the TCP endpoints does not honour the congestion indication provided by - an ECN-capable router, this would result in unfairness, as other - (legitimate) ECN-capable flows would still reduce their sending rate - in response to the ECN marking of packets. Furthermore, under - moderate congestion, non-ECN-capable flows would be subject to packet - drops by the same router. As a result, the flow with a malicious TCP - end-point would obtain better service than the legitimate flows. - - As noted in RFC 3168 [Ramakrishnan et al, 2001], a TCP endpoint - falsely indicating ECN capability could lead to unfairness, allowing - the misbehaving flow to get more than its fair share of the - bandwidth. This could be the result of the misbehavior of either of - the TCP endpoints. For example, the sending TCP could indicate ECN - capability, but then send a CWR in response to an ECE without - actually reducing its congestion window. Alternatively (or in - addition), the receiving TCP could simply ignore those packets with - the CE codepoint set, thus preventing the sending TCP from receiving - the congestion indication. - - In the case of the sending TCP ignoring the ECN congestion - indication, this would be no worse than the sending TCP ignoring the - congestion indication provided by a lost segment.
However, the case - of a TCP receiver ignoring the CE codepoint allows the TCP receiver - to get more than its fair share of bandwidth in a way that was - previously unavailable. If congestion was kept "moderate", then the - malicious TCP receiver could maintain the unfairness, as the router - experiencing congestion would mark the offending packets of the - misbehaving flow rather than dropping them. At the same time, - legitimate ECN-capable flows would respond to the congestion - indication provided by the CE codepoint, while legitimate non-ECN- - capable flows would be subject of packet dropping. However, if - congestion turned to sufficiently heavy, the router experiencing - congestion would switch from marking packets to dropping packets, and - at that point the attack vector provided by ECN could no longer be - exploited (until congestion returns to moderate state). + There also is ongoing work in the research community and the IETF + to define alternate semantics for the ECN field of the IP header + (e.g., see [PCNWG, 2009]). - RFC 3168 [Ramakrishnan et al, 2001] describes the use of "penalty - boxes" which would act on flows that do not respond appropriately to - congestion indications. Section 10 of RFC 3168 suggests that a first - action taken at a penalty box for an ECN-capable flow would be to - switch to dropping packets (instead of marking them), and, if the - flow does not respond appropriately to the congestion indication, the - penalty box could reset the misbehaving connection. Here we - discourage implementation of such a policy, as it would create a - vector for connection-reset attacks. For example, an attacker could - forge TCP segments with the same four-tuple as the targeted - connection and cause them to transit the penalty box. The penalty - box would first switch from marking to dropping packets. However, - the attacker would continue sending forged segments, at a steady - rate. 
As a result, if the penalty box implemented such a severe - policy of resetting connections for flows that still do not respond - to end-to-end congestion control after switching from marking to - dropping, the attacked connection would be reset. + RFC 3168 [RFC3168] provides a very thorough security assessment of + ECN. Among the possible mitigations, it describes the use of + "penalty boxes" which would act on flows that do not respond + appropriately to congestion indications. Section 10 of RFC 3168 + suggests that a first action taken at a penalty box for an ECN- + capable flow would be to switch to dropping packets (instead of + marking them), and, if the flow does not respond appropriately to the + congestion indication, the penalty box could reset the misbehaving + connection. Here we discourage implementation of such a policy, as + it would create a vector for connection-reset attacks. For example, + an attacker could forge TCP segments with the same four-tuple as the + targeted connection and cause them to transit the penalty box. The + penalty box would first switch from marking to dropping packets. + However, the attacker would continue sending forged segments, at a + steady rate. As a result, if the penalty box implemented such a + severe policy of resetting connections for flows that still do not + respond to end-to-end congestion control after switching from marking + to dropping, the attacked connection would be reset. 10. TCP API - Section 3.8 of RFC 793 [Postel, 1981c] describes the minimum set of - TCP User Commands required of all TCP Implementations. Most - operating systems provide an Application Programming Interface (API) - that allows applications to make use of the services provided by TCP. - One of the most popular APIs is the Sockets API, originally - introduced in the BSD networking package [McKusick et al, 1996]. + NOTE: THIS SECTION IS BEING EDITED.
+ + Section 3.8 of RFC 793 [RFC0793] describes the minimum set of TCP + User Commands required of all TCP Implementations. Most operating + systems provide an Application Programming Interface (API) that + allows applications to make use of the services provided by TCP. One + of the most popular APIs is the Sockets API, originally introduced in + the BSD networking package [McKusick et al, 1996]. 10.1. Passive opens and binding sockets When there is already a pending passive OPEN for some local port number, TCP SHOULD NOT allow processes that do not belong to the same user to "reuse" the local port for another passive OPEN. Additionally, reuse of a local port SHOULD default to "off", and be enabled only by an explicit command (e.g., the setsockopt() function of the Sockets API). @@ -3814,27 +2626,27 @@ OPEN (local port, foreign socket, active/passive [, timeout] [, precedence] [, security/compartment] [, options]) -> local connection name When this command is used to perform a passive open (i.e., the active/passive flag is set to passive), the foreign socket parameter may be either fully-specified (to wait for a particular connection) or unspecified (to wait for any call). - As discussed in Section 2.7 of RFC 793 [Postel, 1981c], if there - are several passive OPENs with the same local socket (recorded in - the corresponding TCB), an incoming connection will be matched to - the TCB with the more specific foreign socket. This means that - when the foreign socket of a passive OPEN matches that of the - incoming connection request, that passive OPEN takes precedence - over those passive OPENs with an unspecified foreign socket. + As discussed in Section 2.7 of RFC 793 [RFC0793], if there are + several passive OPENs with the same local socket (recorded in the + corresponding TCB), an incoming connection will be matched to the + TCB with the more specific foreign socket. 
This means that when + the foreign socket of a passive OPEN matches that of the incoming + connection request, that passive OPEN takes precedence over those + passive OPENs with an unspecified foreign socket. Popular implementations such as the Sockets API let the user specify the local socket as a fully-specified {local IP address, local TCP port} pair, or as just the local TCP port (leaving the local IP address unspecified). In the former case, only those connection requests sent to {local port, local IP address} will be accepted. In the latter case, connection requests sent to any of the system's IP addresses will be accepted. In a similar fashion to the generic API described in Section 2.7 of RFC 793, if there is a pending passive OPEN with a fully-specified local socket that @@ -3853,38 +2665,37 @@ port" argument of the "OPEN" command. An implementation MAY relax the aforementioned restriction when the process or system user requesting allocation of such a port number is the same as the process or system user controlling the TCP in the CLOSED or LISTEN states with the same port number. DISCUSSION: As discussed in Section 10.1, the "OPEN" command specified in - Section 3.8 of RFC 793 [Postel, 1981c] can be used to perform - active opens. In case of active opens, the parameter "local port" - will contain a so-called "ephemeral port". While the only - requirement for such an ephemeral port is that the resulting - connection-id is unique, port numbers that are currently in use by - a TCP in the LISTEN state should not be allowed for use as - ephemeral ports. If this rule is not complied with, an attacker could - potentially "steal" an incoming connection to a local server - application by issuing a connection request to the victim client - at roughly the same time the client tries to connect to the victim - server application.
If the SYN segment corresponding to the - attacker's connection request and the SYN segment corresponding to - the victim client "cross each other in the network", and provided - the attacker is able to know or guess the ephemeral port used by - the client, a TCP simultaneous open scenario would take place, and - the incoming connection request sent by the client would be - matched with the attacker's socket rather than with the victim - server application's socket. + Section 3.8 of RFC 793 [RFC0793] can be used to perform active + opens. In case of active opens, the parameter "local port" will + contain a so-called "ephemeral port". While the only requirement + for such an ephemeral port is that the resulting connection-id is + unique, port numbers that are currently in use by a TCP in the + LISTEN state should not be allowed for use as ephemeral ports. If + this rule is not complied with, an attacker could potentially "steal" + an incoming connection to a local server application by issuing a + connection request to the victim client at roughly the same time + the client tries to connect to the victim server application. If + the SYN segment corresponding to the attacker's connection request + and the SYN segment corresponding to the victim client "cross each + other in the network", and provided the attacker is able to know + or guess the ephemeral port used by the client, a TCP simultaneous + open scenario would take place, and the incoming connection + request sent by the client would be matched with the attacker's + socket rather than with the victim server application's socket. As already noted, in order for this attack to succeed, the attacker should be able to guess or know (in advance) the ephemeral port selected by the victim client, and be able to know the right moment to issue a connection request to the victim client.
While in many scenarios this may prove to be a difficult task, some factors such as an inadequate ephemeral port selection policy at the victim client could make this attack feasible. It should be noted that most applications based on popular @@ -3908,20 +2719,22 @@ ports. An implementation might choose to relax the aforementioned restriction when the process or system user requesting allocation of such a port number is the same that the process or system user controlling the TCP in the CLOSED or LISTEN states with the same port number. 11. Blind in-window attacks + NOTE: THIS SECTION IS BEING EDITED. + In the last few years awareness has been raised about a number of "blind" attacks that can be performed against TCP by forging TCP segments that fall within the receive window [NISCC, 2004] [Watson, 2004]. The term "blind" refers to the fact that the attacker does not have access to the packets that belong to the attacked connection. The effects of these attacks range from connection resets to data injection. While these attacks were known in the research community, @@ -3947,306 +2760,121 @@ reset attacks against TCP. [Watson, 2004] and [NISCC, 2004] raised awareness about connection-reset attacks that exploit the RST flag of TCP segments. [Ramaiah et al, 2008] noted that carefully crafted SYN segments could also be used to perform connection-reset attacks. This document describes yet two previously undocumented vectors for performing connection-reset attacks: the Precedence field of IP packets that encapsulate TCP segments, and illegal TCP options. 11.1.1. RST flag - TCP SHOULD implement the mitigation for RST-based attacks specified - in [Ramaiah et al, 2008]. - - DISCUSSION: - The RST flag signals a TCP peer that the connection should be aborted. In contrast with the FIN handshake (which gracefully - terminates a TCP connection), an RST segment causes the connection - to be abnormally closed. 
- - As stated in Section 3.4 of RFC 793 [Postel, 1981c], all reset - segments are validated by checking their Sequence Numbers, with - the Sequence Number considered valid if it is within the receive - window. In the SYN-SENT state, however, an RST is valid if the - Acknowledgement Number acknowledges the SYN segment that - supposedly elicited the reset. - - [Ramaiah et al, 2008] proposes a modification to TCP's transition - diagram to address this attack vector. The counter-measure is a - combination of enforcing a more strict validation check on the - sequence number of reset segments, and the addition of a - "challenge" mechanism. With the implementation of the proposed - mechanism, TCP would behave as follows: - - If the Sequence Number of an RST segment is outside the receive - window, the segment is silently dropped (as stated by RFC 793). - That is, a reset segment is discarded unless it passes the - following check: - - RCV.NXT <= Sequence Number < RCV.NXT+RCV.WND - - If the sequence number falls exactly on the left-edge of the - receive window, the reset is honoured. That is, the connection is - reset if the following condition is true: - - Sequence Number == RCV.NXT - - If an RST segment passes the first check (i.e., it is within the - receive window) but does not pass the second check (i.e., it does - not fall exactly on the left edge of the receive window), an - Acknowledgement segment ("challenge ACK") is sent in response: + terminates a TCP connection), an RST segment causes the connection to + be abnormally closed. - + As stated in Section 3.4 of RFC 793 [RFC0793], all reset segments are + validated by checking their Sequence Numbers, with the Sequence + Number considered valid if it is within the receive window. In the + SYN-SENT state, however, an RST is valid if the Acknowledgement + Number acknowledges the SYN segment that supposedly elicited the + reset.
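The stricter RST validation described above (the in-window check, the exact-match check on the left edge of the receive window, and the challenge ACK) can be sketched in Python as follows. This is a minimal illustration for connections in the synchronized states only; the function name and the string return values are hypothetical, and a real TCP would act on the result (abort the connection, emit an ACK, or do nothing) rather than return a label.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def classify_rst(seg_seq, rcv_nxt, rcv_wnd):
    """Classify an incoming RST segment (illustrative helper, not a real API).

    seg_seq -- Sequence Number of the RST segment
    rcv_nxt -- next sequence number expected by the receiver (RCV.NXT)
    rcv_wnd -- current receive window size (RCV.WND)
    """
    # Distance of the segment from the left edge of the window, computed
    # modulo 2**32 so that the comparison survives wraparound.
    offset = (seg_seq - rcv_nxt) % MOD
    if offset == 0:
        return "reset"          # Sequence Number == RCV.NXT: honour the RST
    if offset < rcv_wnd:
        return "challenge_ack"  # in window but not exact: send challenge ACK
    return "drop"               # out of window: silently discard
```

A legitimate peer that receives the challenge ACK would answer with an RST whose Sequence Number falls exactly on RCV.NXT, which this function then classifies as "reset".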
- This Acknowledgement segment is referred to as a "challenge ACK" - as, in the event the RST segment that elicited it had been - legitimate (but silently dropped as a result of enforcing the - above checks), the challenge ACK would elicit a new reset segment - that would fall exactly on the left edge of the window and would - thus pass all the above checks, finally resetting the connection. + [RFC5961] proposes a modification to TCP's transition diagram to + address this attack vector. The counter-measure is a combination of + enforcing a more strict validation check on the sequence number of + reset segments, and the addition of a "challenge" mechanism. - We recommend the implementation of this countermeasure. However, - we are aware of patent claims on this counter-measure, and suggest - that vendors research the consequences of the possible patents that - may apply. + We note that we are aware of patent claims on this counter- + measure, and suggest that vendors research the consequences of the + possible patents that may apply. - [US-CERT, 2003a] is an advisory about a firewall system that was - found particularly vulnerable to reset attacks because it did not - validate the TCP Sequence Number of RST segments. Clearly, all - TCPs (including those in middle-boxes) should validate RST - segments as discussed in this section. + [US-CERT, 2003a] is an advisory about a firewall system that was found + particularly vulnerable to reset attacks because it did not validate + the TCP Sequence Number of RST segments. Clearly, all TCPs + (including those in middle-boxes) should validate RST segments as + discussed in this section. 11.1.2. SYN flag - Processing of SYN segments received for connections in the - synchronized states SHOULD occur as follows: - - o If a SYN segment is received for a connection in any synchronized - state other than TIME-WAIT, respond with an ACK, applying rate- - throttling.
[Ramaiah et al, 2008] - - o If the corresponding connection is in the TIME-WAIT state, then - process the incoming SYN as specified in - [I-D.ietf-tcpm-tcp-timestamps]. - - DISCUSSION: - - Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if a - SYN segment is received with a valid (i.e., "in window") Sequence - Number, an RST segment should be sent in response, and the - connection should be aborted. - - The IETF has published an RFC, "Improving TCP's Resistance to - Blind In-Window Attacks" [Ramaiah et al, 2008] which addresses, - among others, this variant of TCP-based connection-reset attack. - This section describes the counter-measure proposed by the IETF, a - problem that may arise from the implementation of that solution, - and a workaround for it. - - In order to mitigate this attack vector, [Ramaiah et al, 2008] - proposes to change TCP's reaction to SYN segments as follows. - When a SYN segment is received for a connection in any of the - synchronized states, an Acknowledgement (ACK) segment is sent in - response. - - As discussed in [Ramaiah et al, 2008], there is a corner-case that - would not be properly handled by this mechanism. If a host (TCP - A) establishes a TCP connection with a remote peer (TCP B), and - then crashes, reboots and tries to initiate a new incarnation of - the same connection (i.e., a connection with the same four-tuple - as the previous connection) using an Initial Sequence Number equal - to the RCV.NXT value at the remote peer (TCP B), the ACK segment - sent by TCP B in response to the SYN segment would contain an - Acknowledgement number that would be considered valid by TCP A, - and thus an RST segment would not be sent in response to the - Acknowledgement (ACK) segment. As this ACK would not have the SYN - bit set, TCP A (being in the SYN-SENT state) would silently drop - it (as stated on page 68 of RFC 793).
After a Retransmission - Timeout (RTO), TCP A would retransmit its SYN segment, which would - lead to the same sequence of events as before. Eventually, TCP A - would timeout, and the connection would be aborted. This is a - corner case in which the introduced change would lead to a non- - desirable behavior. However, we consider this scenario to be - extremely unlikely and, in the event it ever took place, the - connection would nevertheless be aborted after retrying for a - period of USER TIMEOUT seconds. - - However, when this change is implemented exactly as described in - [Ramaiah et al, 2008], the potential of interoperability problems - is introduced, as a heuristic widely incorporated in many TCP - implementations is disabled. - - In a number of scenarios a socket pair may need to be reused while - the corresponding four-tuple is still in the TIME-WAIT state in a - remote TCP peer. For example, a client accessing some service on - a host may try to create a new incarnation of a previous - connection, while the corresponding four-tuple is still in the - TIME-WAIT state at the remote TCP peer (the server). This may - happen if the ephemeral port numbers are being reused too quickly, - either because of a bad policy of selection of ephemeral ports, or - simply because of a high connection rate to the corresponding - service. In such scenarios, the establishment of new connections - that reuse a four-tuple that is in the TIME-WAIT state would fail. - In order to avoid this problem, RFC 1122 [Braden, 1989] states (in - Section 4.2.2.13) that when a connection request is received with - a four-tuple that is in the TIME-WAIT state, the connection - request could be accepted if the sequence number of the incoming - SYN segment is greater than the last sequence number seen on the - previous incarnation of the connection (for that direction of the - data transfer). 
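The TIME-WAIT heuristic from Section 4.2.2.13 of RFC 1122 described above can be sketched as a single modular comparison of sequence numbers. The Python below is only an illustration: the function names are hypothetical, and the sketch assumes the implementation has recorded the last sequence number seen from the peer on the previous incarnation of the connection.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_gt(a, b):
    """True if sequence number a is 'after' b in 32-bit modular arithmetic."""
    return a != b and (a - b) % MOD < 2 ** 31


def accept_syn_in_time_wait(syn_seq, last_seq_seen):
    """Decide whether a SYN may reopen a connection that is in TIME-WAIT.

    syn_seq       -- Sequence Number (ISN) of the incoming SYN segment
    last_seq_seen -- last sequence number seen from the peer on the
                     previous incarnation (assumed to be stored in the TCB)
    """
    # Accept only if the new ISN lies beyond the old sequence space, so that
    # stale segments from the old incarnation cannot be mistaken for new data.
    return seq_gt(syn_seq, last_seq_seen)
```

For example, a SYN with Sequence Number 2000 would be accepted when the last sequence number seen was 1000, while a SYN with Sequence Number 1000 would not be accepted when the last one seen was 2000.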
- - This requirement aims at avoiding the sequence number space of the - new and old incarnations of the connection to overlap, thus - avoiding old segments from the previous incarnation of the - connection to be accepted as valid by the new connection. - - The requirement in [Ramaiah et al, 2008] to disregard SYN segments - received for connections in any of the synchronized states forbids - the implementation of the heuristic described above. As a result, - we argue that the processing of SYN segments proposed in [Ramaiah - et al, 2008] should apply only for connections in any of the - synchronized states other than the TIME-WAIT state. + Section 3.9 (page 71) of RFC 793 [RFC0793] states that if a SYN + segment is received with a valid (i.e., "in window") Sequence Number, + an RST segment should be sent in response, and the connection should + be aborted. This could be leveraged to perform a blind connection- + reset attack. [RFC5961] proposes a change in TCP's state diagram to + mitigate this attack vector. 11.1.3. Security/Compartment - If the security/compartment field of an incoming TCP segment does not - match the value recorded in the corresponding TCB, TCP SHOULD NOT - abort the connection, but simply discard the corresponding packet. - Additionally, this whole event SHOULD be logged as a security - violation. - - DISCUSSION: - - Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if - the IP security/compartment of an incoming segment does not - exactly match the security/compartment in the TCB, a RST segment - should be sent, and the connection should be aborted. - - A discussion of the IP security options relevant to this section - can be found in Section 3.13.2.12, Section 3.13.2.13, and Section - 3.13.2.14 of [CPNI, 2008]. 
- - This certainly provides another attack vector for performing - connection-reset attacks, as an attacker could forge TCP segments - with a security/compartment that is different from that recorded - in the corresponding TCB and, as a result, the attacked connection - would be reset. - - It is interesting to note that for connections in the ESTABLISHED - state, this check is performed after validating the TCP Sequence - Number and checking the RST bit, but before validating the - Acknowledgement field. Therefore, even if the stricter validation - of the Acknowledgement field (described in Section 3.4) were - implemented, it would not help to mitigate this attack vector. + Section 3.9 (page 71) of RFC 793 [RFC0793] states that if the IP + security/compartment of an incoming segment does not exactly match + the security/compartment in the TCB, a RST segment should be sent, + and the connection should be aborted. This certainly provides + another attack vector for performing connection-reset attacks, as an + attacker could forge TCP segments with a security/compartment that is + different from that recorded in the corresponding TCB and, as a + result, the attacked connection would be reset. - This attack vector can be easily mitigated by relaxing the - reaction to TCP segments with "incorrect" security/compartment - values as specified in this section. + [draft-gont-tcpm-tcp-seccomp-prec-00.txt] aims to update RFC 793 such + that this issue is eliminated. 11.1.4. Precedence - If the Precedence field of an incoming TCP segment does not match - the value recorded in the corresponding TCB, TCP MUST NOT abort the - connection, and MUST instead continue processing the segment as - specified by RFC 793. - - DISCUSSION: - - Section 3.9 (page 71) of RFC 793 [Postel, 1981c] states that if - the IP Precedence of an incoming segment does not exactly match - the Precedence recorded in the TCB, a RST segment should be sent, - and the connection should be aborted.
- - This certainly provides another attack vector for performing - connection-reset attacks, as an attacker could forge TCP segments - with a IP Precedence that is different from that recorded in the - corresponding TCB and, as a result, the attacked connection would - be reset. - - It is interesting to note that for connections in the ESTABLISHED - state, this check is performed after validating the TCP Sequence - Number and checking the RST bit, but before validating the - Acknowledgement field. Therefore, even if the stricter validation - of the Acknowledgement field (described in Section 3.4) were - implemented, it would not help to mitigate this attack vector. - - This attack vector can be easily mitigated by relaxing the - reaction to TCP segments with "incorrect" IP Precedence values. - That is, even if the Precedence field does not match the value - recorded in the corresponding TCB, TCP should not abort the - connection, and should instead continue processing the segment as - specified by RFC 793. - - It is interesting to note that resetting a connection due to a - change in the Precedence value might have a negative impact on - interoperability. For example, the packets that correspond to the - connection could temporarily take a different internet path, in - which some middle-box could re-mark the Precedence field (due to - administration policies at the network to be transited). In such - a scenario, an implementation following the advice in RFC 793 - would abort the connection, when the connection would have - probably survived. + Section 3.9 (page 71) of RFC 793 [RFC0793] states that if the IP + precedence of an incoming segment does not exactly match the + precedence in the TCB, a RST segment should be sent, and the + connection should be aborted. 
This certainly provides another attack + vector for performing connection-reset attacks, as an attacker could + forge TCP segments with a precedence that is different from that + recorded in the corresponding TCB and, as a result, the attacked + connection would be reset. - While the IPv4 Type of Service field (and hence the Precedence - field) has been redefined by the Differentiated Services (DS) - field specified in RFC 2474 [Nichols et al, 1998], RFC 793 - [Postel, 1981c] was never formally updated in this respect. We - note that both legacy systems that have not been upgraded to - implement the differentiated services architecture described in - RFC 2475 [Blake et al, 1998] and current implementations that have - extrapolated the discussion of the Precedence field to the - Differentiated Services field may still be vulnerable to the - connection reset vector discussed in this section. + [draft-gont-tcpm-tcp-seccomp-prec-00.txt] aims to update RFC 793 such + that this issue is eliminated. 11.1.5. Illegal options - TCP MUST silently drop those TCP segments that contain TCP options - with illegal option lengths. - - DISCUSSION: + Section 4.2.2.5 of RFC 1122 [RFC1122] discusses the processing of TCP + options. It states that TCP should be prepared to handle an illegal + option length (e.g., zero) without crashing, and suggests handling + such illegal options by resetting the corresponding connection and + logging the reason. However, this suggested behavior could be + exploited to perform connection-reset attacks. - Section 4.2.2.5 of RFC 1122 [Braden, 1989] discusses the - processing of TCP options. It states that TCP must be able to - receive a TCP option in any segment, and must ignore without error - any option it does not implement. 
Additionally, it states that - TCP should be prepared to handle an illegal option length (e.g., - zero) without crashing, and suggests handling such illegal options - by resetting the corresponding connection and logging the reason. - However, this suggested behavior could be exploited to perform - connection-reset attacks. Therefore, as discussed in Section 3.10 - of this document, we advise TCP implementations to silently drop - those TCP segments that contain illegal option lengths. + [draft-gont-tcpm-tcp-illegal-option-lengths-00] aims at formally + updating RFC 1122, such that this issue is eliminated. 11.2. Blind data-injection attacks An attacker could try to inject data in the stream of data being transferred on the connection. As with the other attacks described in Section 11 of this document, in order to perform a blind data injection attack the attacker would need to know or guess the four- tuple that identifies the TCP connection to be attacked. Additionally, he should be able to guess a valid ("in window") TCP Sequence Number, and a valid Acknowledgement Number. As discussed in Section 3.4 of this document, [Ramaiah et al, 2008] proposes to enforce a more strict check on the Acknowledgement Number - of incoming segments than that specified in RFC 793 [Postel, 1981c]. + of incoming segments than that specified in RFC 793 [RFC0793]. Implementation of the proposed check requires more packets on the side of the attacker to successfully perform a blind data-injection attack. However, it should be noted that applications concerned with any of the attacks discussed in Section 11 of this document should make use of proper authentication techniques, such as those specified for IPsec in RFC 4301 [Kent and Seo, 2005]. 12. Information leaking + NOTE: THIS SECTION IS BEING EDITED. + 12.1. Remote Operating System detection via TCP/IP stack fingerprinting Clearly, remote Operating System (OS) detection is a useful tool for attackers. 
Tools such as nmap [Fyodor, 2006b] can usually detect the operating system type and version of a remote system with remarkable accuracy. This information can in turn be used by attackers to tailor their exploits to the identified operating system type and version. Evasion of OS fingerprinting can prove to be a very difficult task. @@ -4281,24 +2909,24 @@ 12.1.1. FIN probe TCP MUST silently drop any TCP segments received for a connection in the LISTEN state that do not have the SYN, RST, or ACK flags set. In the rest of the cases, the processing rules in RFC 793 MUST be applied. DISCUSSION: The attacker sends a FIN (or any packet without the SYN or the ACK - flags set) to an open port. RFC 793 [Postel, 1981c] leaves the - reaction to such segments unspecified. As a result, some - implementations silently drop the received segment, while others - respond with a RST. + flags set) to an open port. RFC 793 [RFC0793] leaves the reaction + to such segments unspecified. As a result, some implementations + silently drop the received segment, while others respond with a + RST. 12.1.2. Bogus flag test TCP MUST ignore any flags not supported, and MUST NOT reflect them if a TCP segment is sent in response to the one just received. DISCUSSION: The attacker sends a TCP segment setting at least one bit of the Reserved field. Some implementations ignore this field, while @@ -4364,27 +2992,27 @@ DISCUSSION: [Fyodor, 1998] reports that many implementations differ in the Acknowledgement Number they use in response to segments received for connections in the CLOSED state. In particular, these implementations differ in the way they construct the RST segment that is sent in response to those TCP segments received for connections in the CLOSED state. - RFC 793 [Postel, 1981c] describes (in pages 36-37) how RST - segments are to be generated.
According to this RFC, the ACK bit - (and the Acknowledgment Number) is set in a RST only if the - incoming segment that elicited the RST did not have the ACK bit - set (and thus the Sequence Number of the outgoing RST segment must - be set to zero). However, we recommend TCP implementations to set - the ACK bit (and the Acknowledgement Number) in all outgoing RST + RFC 793 [RFC0793] describes (in pages 36-37) how RST segments are + to be generated. According to this RFC, the ACK bit (and the + Acknowledgment Number) is set in a RST only if the incoming + segment that elicited the RST did not have the ACK bit set (and + thus the Sequence Number of the outgoing RST segment must be set + to zero). However, we recommend TCP implementations to set the + ACK bit (and the Acknowledgement Number) in all outgoing RST segments, as it allows for additional validation checks to be enforced at the system receiving the segment. 12.1.6. TCP options Different implementations differ in the TCP options they enable by default. Additionally, they differ in the actual contents of the options, and in the order in which the options are included in a TCP segment. There is currently no recommendation on the order in which to include TCP options in TCP segments. @@ -4454,20 +3082,22 @@ [Rowland, 1996] contains a discussion of covert channels in the TCP/IP protocol suite, with some TCP-based examples. [Giffin et al, 2002] describes the use of TCP timestamps for the establishment of covert channels. [Zander, 2008] contains an extensive bibliography of papers on covert channels, and a list of freely-available tools that implement covert channels with the TCP/IP protocol suite. 14. TCP Port scanning + NOTE: THIS SECTION IS BEING EDITED. + TCP port scanning aims at identifying TCP port numbers on which there is a process listening for incoming connections. That is, it aims at identifying TCPs at the target system that are in the LISTEN state. 
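The difference between a TCP in the LISTEN state and one in the fictional CLOSED state, which is what a port scan tries to observe, can be illustrated with a minimal sketch. It models only the RFC 793 rules quoted in this document (pages 36-37 for RST generation, pages 65-66 for segment arrival); the names State, Segment, make_rst, and respond are hypothetical and do not come from any real TCP implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    CLOSED = "CLOSED"  # fictional state: no TCB, the port is "closed"
    LISTEN = "LISTEN"  # a process is listening, the port is "open"


@dataclass(frozen=True)
class Segment:
    """Hypothetical, heavily simplified view of a TCP segment."""
    seq: int = 0
    ack_num: int = 0
    payload_len: int = 0
    syn: bool = False
    ack: bool = False
    fin: bool = False
    rst: bool = False

    @property
    def seg_len(self) -> int:
        # SYN and FIN each occupy one sequence number.
        return self.payload_len + int(self.syn) + int(self.fin)


def make_rst(seg: Segment) -> Segment:
    """Build the RST that RFC 793 prescribes in response to `seg`."""
    if seg.ack:
        # Incoming ACK bit set: RST takes its Sequence Number from SEG.ACK
        # and carries no ACK bit.
        return Segment(seq=seg.ack_num, rst=True)
    # Otherwise: Sequence Number zero, ACK bit set, ACK = SEG.SEQ + SEG.LEN.
    return Segment(seq=0, ack_num=(seg.seq + seg.seg_len) % 2**32,
                   ack=True, rst=True)


def respond(state: State, seg: Segment) -> Optional[str]:
    """Return "RST", "SYN/ACK", or None (segment silently dropped)."""
    if seg.rst:
        return None          # an incoming RST never elicits a response
    if state is State.CLOSED:
        return "RST"         # anything else sent to a closed port gets an RST
    if seg.ack:
        return "RST"         # LISTEN: any segment carrying an ACK is reset
    if seg.syn:
        return "SYN/ACK"     # LISTEN: normal connection establishment
    return None              # LISTEN: no SYN, no ACK, no RST -> dropped


fin_probe = Segment(fin=True)
print(respond(State.LISTEN, fin_probe), respond(State.CLOSED, fin_probe))
```

Under these assumptions a probe with neither SYN, ACK, nor RST set is dropped by a listening TCP but answered with an RST by a closed one, while a probe with the ACK bit set elicits an RST in both states and therefore reveals nothing about the port.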
The following subsections describe different TCP port scanning techniques that have been implemented in freely-available tools. These subsections focus only on those port scanning techniques that exploit features of TCP itself, and not of other communication protocols. For example, the following subsections do not discuss the @@ -4517,25 +3147,25 @@ scanning tool. 14.3. FIN, NULL, and XMAS scans TCP SHOULD respond with an RST when a TCP segment is received for a connection in the LISTEN state, and the incoming segment has neither the SYN bit nor the RST bit set. DISCUSSION: - RFC 793 [Postel, 1981c] states, on page 65, that an incoming - segment that does not have the RST bit set and that is received - for a connection in the fictional state CLOSED causes an RST to be - sent in response. Pages 65-66 of RFC 793 describe the processing - of incoming segments for connections in the state LISTEN, and + RFC 793 [RFC0793] states, on page 65, that an incoming segment + that does not have the RST bit set and that is received for a + connection in the fictional state CLOSED causes an RST to be sent + in response. Pages 65-66 of RFC 793 describe the processing of + incoming segments for connections in the state LISTEN, and implicitly states that an incoming segment that does not have the ACK bit set (and is not a SYN or an RST) should be silently dropped. As a result, an attacker can exploit this situation to perform a port scan by sending TCP segments that do not have the ACK bit set to the target system. When a port is "open" (i.e., there is a TCP in the LISTEN state on the corresponding port), the target system will silently drop the probe segment. On the other hand, if the port is "closed" (i.e., there is a TCP in the fictional state CLOSED) @@ -4557,43 +3187,43 @@ It should be clear that while the aforementioned control-bits combinations are the most popular ones, other combinations could be used to exploit this port-scanning vector.
For example, the CWR, ECE, and/or any of the Reserved bits could be set in the probe segments. The advantage of this port-scanning technique is that it can bypass some stateless firewalls. However, the downside is that a number of implementations do not comply strictly with RFC 793 - [Postel, 1981c], and thus always respond to the probe segments - with an RST, regardless of whether the port is open or closed. + [RFC0793], and thus always respond to the probe segments with an + RST, regardless of whether the port is open or closed. This port-scanning vector can be easily defeated by responding with an RST when a TCP segment is received for a connection in the LISTEN state, and the incoming segment has neither the SYN bit nor the RST bit set. 14.4. Maimon scan If a TCP that is in the CLOSED or LISTEN state receives a TCP segment with both the FIN and ACK bits set, it MUST respond with a RST. DISCUSSION: This port scanning technique was introduced in [Maimon, 1996] with the name "StealthScan" (method #1), and was later incorporated into the nmap tool [Fyodor, 2006b] as the "Maimon scan". This port scanning technique employs TCP segments that have both the FIN and ACK bits set as the probe segments. While according - to RFC 793 [Postel, 1981c] these segments should elicit an RST + to RFC 793 [RFC0793] these segments should elicit an RST regardless of whether the corresponding port is open or closed, a programming flaw found in a number of TCP implementations has caused some systems to silently drop the probe segment if the corresponding port was open (i.e., there was a TCP in the LISTEN state), and respond with an RST only if the port was closed. Therefore, an RST would indicate that the scanned port is closed, while the absence of a response from the target system would indicate that the scanned port is open. @@ -4627,47 +3257,36 @@ implement this policy. 14.6.
ACK scan The so-called "ACK scan" is not really a port-scanning technique (i.e., it does not aim at determining whether a specific port is open or closed), but rather aims at determining whether some intermediate system is filtering TCP segments sent to that specific port number. The probe packet is a TCP segment with the ACK bit set which, - according to RFC 793 [Postel, 1981c], should elicit an RST from the - target system regardless of whether the corresponding TCP port is - open or closed. If no response is received from the target system, - it is assumed that some intermediate system is filtering the probe - packets sent to the target system. + according to RFC 793 [RFC0793], should elicit an RST from the target + system regardless of whether the corresponding TCP port is open or + closed. If no response is received from the target system, it is + assumed that some intermediate system is filtering the probe packets + sent to the target system. It should be noted that this "port scanning" technique exploits basic TCP processing rules, and therefore cannot be defeated at an end-system. 15. Processing of ICMP error messages by TCP - TCP SHOULD silently ignore received ICMP Source Quench messages. - - TCP SHOULD process ICMP "hard errors" as "soft errors" when they are - received for connections that are in any of the synchronized states. - - TCP SHOULD process ICMP "fragmentation needed and DF bit set" and - ICMPv6 "Packet Too Big" error messages as described in [RFC5927]. - - DISCUSSION: - - [RFC5927] analyzes a number of vulnerabilities based on crafted - ICMP messages, along with possible counter-measures. + [RFC5927] analyzes a number of vulnerabilities based on crafted ICMP + messages, along with possible counter-measures. 16. TCP interaction with the Internet Protocol (IP) - 16.1. TCP-based traceroute The traceroute tool is used to identify the intermediate systems between the local system and the destination system.
It is usually implemented by sending "probe" packets with increasing IP Time to Live values (starting from 0), without maintaining any state with the final destination. Some traceroute implementations use ICMP "echo request" messages as the probe packets, while others use UDP packets or TCP SYN segments. @@ -4781,22 +3400,22 @@ This document provides a thorough security assessment of the Transmission Control Protocol (TCP), identifies a number of vulnerabilities, and specifies possible counter-measures. Additionally, it provides implementation guidance such that the resilience of TCP implementations is improved. 18. Acknowledgements The author would like to thank (in alphabetical order) David Borman, - Wesley Eddy, and Alfred Hoenes, for providing valuable feedback on - earlier versions of this document. + Wesley Eddy, Alfred Hoenes, and Michael Scharf, for providing + valuable feedback on earlier versions of this document. This document is heavily based on the document "Security Assessment of the Transmission Control Protocol (TCP)" [CPNI, 2009] written by Fernando Gont on behalf of CPNI (Centre for the Protection of National Infrastructure). The author would like to thank (in alphabetical order) Randall Atkinson, Guillermo Gont, Alfred Hoenes, Jamshid Mahdavi, Stanislav Shalunov, Michael Welzl, Dan Wing, Andrew Yourtchenko, Michal Zalewski, and Christos Zoulas, for providing valuable feedback on @@ -4805,21 +3424,21 @@ Additionally, the author would like to thank (in alphabetical order) Mark Allman, David Black, Ethan Blanton, David Borman, James Chacon, John Heffner, Jerrold Leichter, Jamshid Mahdavi, Keith Scott, Bill Squier, and David White, who generously answered a number of questions that arose while the aforementioned document was being written. Finally, the author would like to thank CPNI (formerly NISCC) for their continued support. -19. References +19. References (to be translated to xml) Abley, J., Savola, P., Neville-Neil, G. 2007.
Deprecation of Type 0 Routing Headers in IPv6. RFC 5095. Allman, M. 2003. TCP Congestion Control with Appropriate Byte Counting (ABC). RFC 3465. Allman, M. 2008. Comments On Selecting Ephemeral Ports. Available at: http://www.icir.org/mallman/share/ports-dec08.pdf @@ -5050,23 +3671,20 @@ Protocol. RFC 4301. Klensin, J. 2008. Simple Mail Transfer Protocol. RFC 5321. Ko, Y., Ko, S., and Ko, M. 2001. NIDS Evasion Method named SeolMa. Phrack Magazine, Volume 0x0b, Issue 0x39, phile #0x03 of 0x12. Available at: http://www.phrack.org/issues.html?issue=57&id=3#article Lahey, K. 2000. TCP Problems with Path MTU Discovery. RFC 2923. - Larsen, M., Gont, F. 2008. Port Randomization. IETF Internet-Draft - (draft-ietf-tsvwg-port-randomization-02), work in progress. - Lemon, 2002. Resisting SYN flood DoS attacks with a SYN cache. Proceedings of the BSDCon 2002 Conference, pp 89-98. Maimon, U. 1996. Port Scanning without the SYN flag. Phrack Magazine, Volume Seven, Issue Fourty-Nine, phile #0x0f of 0x10. Available at: http://www.phrack.org/issues.html?issue=49&id=15#article Mathis, M., Mahdavi, J., Floyd, S. Romanow, A. 1996. TCP Selective Acknowledgment Options. RFC 2018. @@ -5287,59 +3903,113 @@ IFIP Communications and Multimedia Security Conference (CMS 2002). Available at: http://www.ieeta.pt/~avz/pubs/CMS02.html Zweig, J., Partridge, C. 1990. TCP Alternate Checksum Options. RFC 1146. 20. References 20.1. Normative References - [I-D.ietf-tcpm-tcp-timestamps] - Gont, F., "Reducing the TIME-WAIT state using TCP - timestamps", draft-ietf-tcpm-tcp-timestamps-03 (work in - progress), December 2010. + [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, + RFC 793, September 1981. - [I-D.ietf-tsvwg-port-randomization] - Larsen, M. and F. Gont, "Transport Protocol Port - Randomization Recommendations", - draft-ietf-tsvwg-port-randomization-09 (work in progress), + [RFC1122] Braden, R., "Requirements for Internet Hosts - + Communication Layers", STD 3, RFC 1122, October 1989. 
+ + [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition + of Explicit Congestion Notification (ECN) to IP", + RFC 3168, September 2001. + + [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion + Control", RFC 5681, September 2009. + + [RFC5961] Ramaiah, A., Stewart, R., and M. Dalal, "Improving TCP's + Robustness to Blind In-Window Attacks", RFC 5961, August 2010. + [RFC6056] Larsen, M. and F. Gont, "Recommendations for Transport- + Protocol Port Randomization", BCP 156, RFC 6056, + January 2011. + [RFC6093] Gont, F. and A. Yourtchenko, "On the Implementation of the TCP Urgent Mechanism", RFC 6093, January 2011. + [RFC6191] Gont, F., "Reducing the TIME-WAIT State Using TCP + Timestamps", BCP 159, RFC 6191, April 2011. + + [RFC6528] Gont, F. and S. Bellovin, "Defending against Sequence + Number Attacks", RFC 6528, February 2012. + 20.2. Informative References [I-D.gont-timestamps-generation] Gont, F. and A. Oppermann, "On the generation of TCP timestamps", draft-gont-timestamps-generation-00 (work in progress), June 2010. + [I-D.ietf-tcpm-3517bis] + Blanton, E., Jarvinen, I., Wang, L., Allman, M., Kojo, M., + and Y. Nishida, "A Conservative Selective Acknowledgment + (SACK)-based Loss Recovery Algorithm for TCP", + draft-ietf-tcpm-3517bis-01 (work in progress), + January 2012. + + [Morris1985] + Morris, R., "A Weakness in the 4.2BSD UNIX TCP/IP + Software", CSTR 117, AT&T Bell Laboratories, Murray Hill, + NJ, 1985. + + [RFC1025] Postel, J., "TCP and IP bake off", RFC 1025, + September 1987. + + [RFC1379] Braden, B., "Extending TCP for Transactions -- Concepts", + RFC 1379, November 1992. + [RFC5927] Gont, F., "ICMP Attacks against TCP", RFC 5927, July 2010. + [RFC6429] Bashyam, M., Jethanandani, M., and A. Ramaiah, "TCP Sender + Clarification for Persist Condition", RFC 6429, + December 2011. 
+ + [Shimomura1995] + Shimomura, T., "Technical details of the attack described + by Markoff in NYT", + http://www.gont.com.ar/docs/post-shimomura-usenet.txt, + Message posted in USENET's comp.security.misc newsgroup, + Message-ID: <3g5gkl$5j1@ariel.sdsc.edu>, 1995. + Appendix A. TODO list A number of formatting issues still have to be fixed in this document. Among others are: o The ASCII-art corresponding to some figures is still missing. We still have to convert the nice JPGs of the UK CPNI document into ugly ASCII-art. o The references have not yet been converted to xml, but are hardcoded, instead. That's why they may not look as expected. Appendix B. Change log (to be removed by the RFC Editor before publication of this document as an RFC) -B.1. Changes from draft-ietf-tcpm-tcp-security-01 +B.1. Changes from draft-ietf-tcpm-tcp-security-02 + o Lots of text has been removed from the document. + + o The document track has been changed from BCP to Informational + (RFC2119-language recommendations have been removed). + + o Where necessary, stand-alone standards-track documents have been + produced. + +B.2. Changes from draft-ietf-tcpm-tcp-security-01 A number of formatting issues still have to be fixed in this document. Among others are: o The whole document was reformatted with RFC 1122 style. Author's Address Fernando Gont UK Centre for the Protection of National Infrastructure