--- 1/draft-ietf-dnsop-no-response-issue-12.txt 2019-02-25 15:13:32.416509052 -0800 +++ 2/draft-ietf-dnsop-no-response-issue-13.txt 2019-02-25 15:13:32.468510321 -0800 @@ -1,181 +1,183 @@ Network Working Group M. Andrews Internet-Draft R. Bellis Intended status: Best Current Practice ISC -Expires: May 8, 2019 November 4, 2018 +Expires: August 29, 2019 February 25, 2019 - A Common Operational Problem in DNS Servers - Failure To Respond. - draft-ietf-dnsop-no-response-issue-12 + A Common Operational Problem in DNS Servers - Failure To Communicate. + draft-ietf-dnsop-no-response-issue-13 Abstract The DNS is a query / response protocol. Failing to respond to queries, or responding incorrectly, causes both immediate operational problems and long term problems with protocol development. This document identifies a number of common kinds of queries to which some servers either fail to respond or else respond incorrectly. This document also suggests procedures for TLD and other zone - operators to apply to help reduce / eliminate the problem. + operators to apply to mitigate the problem. The document does not look at the DNS data itself, just the structure of the responses. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on May 8, 2019. + This Internet-Draft will expire on August 29, 2019. 
Copyright Notice - Copyright (c) 2018 IETF Trust and the persons identified as the + Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Consequences . . . . . . . . . . . . . . . . . . . . . . . . 4 - 3. Common queries kinds that result in non responses. . . . . . 5 + 3. Common queries kinds that result in no or bad responses. . . 5 3.1. Basic DNS Queries . . . . . . . . . . . . . . . . . . . . 5 3.1.1. Zone Existence . . . . . . . . . . . . . . . . . . . 5 3.1.2. Unknown / Unsupported Type Queries . . . . . . . . . 5 3.1.3. DNS Flags . . . . . . . . . . . . . . . . . . . . . . 6 3.1.4. Unknown DNS opcodes . . . . . . . . . . . . . . . . . 6 - 3.1.5. Recursive Queries . . . . . . . . . . . . . . . . . . 6 - 3.1.6. TCP Queries . . . . . . . . . . . . . . . . . . . . . 6 + 3.1.5. TCP Queries . . . . . . . . . . . . . . . . . . . . . 6 3.2. EDNS Queries . . . . . . . . . . . . . . . . . . . . . . 6 3.2.1. EDNS Queries - Version Independent . . . . . . . . . 7 3.2.2. EDNS Queries - Version Specific . . . . . . . . . . . 7 3.2.3. EDNS Options . . . . . . . . . . . . . . . . . . . . 7 3.2.4. EDNS Flags . . . . . . . . . . . . . . . . . . . . . 7 3.2.5. Truncated EDNS Responses . . . . . . . . . . . . . . 8 3.2.6. DO=1 Handling . . . . . . . . . . . . . . . . . . . . 8 3.2.7. EDNS over TCP . . . . . . . . . . . 
. . . . . . . . . 8 4. Firewalls and Load Balancers . . . . . . . . . . . . . . . . 8 5. Scrubbing Services . . . . . . . . . . . . . . . . . . . . . 9 6. Whole Answer Caches . . . . . . . . . . . . . . . . . . . . . 10 7. Response Code Selection . . . . . . . . . . . . . . . . . . . 10 - 8. Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 + 8. Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 8.1. Testing - Basic DNS . . . . . . . . . . . . . . . . . . . 11 8.1.1. Is The Server Configured For The Zone? . . . . . . . 11 - 8.1.2. Testing Unknown Types . . . . . . . . . . . . . . . . 11 + 8.1.2. Testing Unknown Types . . . . . . . . . . . . . . . . 12 8.1.3. Testing Header Bits . . . . . . . . . . . . . . . . . 12 8.1.4. Testing Unknown Opcodes . . . . . . . . . . . . . . . 14 - 8.1.5. Testing Recursive Queries . . . . . . . . . . . . . . 14 - 8.1.6. Testing TCP . . . . . . . . . . . . . . . . . . . . . 14 + 8.1.5. Testing TCP . . . . . . . . . . . . . . . . . . . . . 15 8.2. Testing - Extended DNS . . . . . . . . . . . . . . . . . 15 - 8.2.1. Testing Minimal EDNS . . . . . . . . . . . . . . . . 15 + 8.2.1. Testing Minimal EDNS . . . . . . . . . . . . . . . . 16 8.2.2. Testing EDNS Version Negotiation . . . . . . . . . . 16 - 8.2.3. Testing Unknown EDNS Options . . . . . . . . . . . . 16 - 8.2.4. Testing Unknown EDNS Flags . . . . . . . . . . . . . 17 + 8.2.3. Testing Unknown EDNS Options . . . . . . . . . . . . 17 + 8.2.4. Testing Unknown EDNS Flags . . . . . . . . . . . . . 18 8.2.5. Testing EDNS Version Negotiation With Unknown EDNS Flags . . . . . . . . . . . . . . . . . . . . . . . . 18 8.2.6. Testing EDNS Version Negotiation With Unknown EDNS Options . . . . . . . . . . . . . . . . . . . . . . . 19 - - 8.2.7. Testing Truncated Responses . . . . . . . . . . . . . 19 + 8.2.7. Testing Truncated Responses . . . . . . . . . . . . . 20 8.2.8. Testing DO=1 Handling . . . . . . . . . . . . . . . . 20 - 8.2.9. 
Testing EDNS Version Negotiation With DO=1 . . . . . 20 + 8.2.9. Testing EDNS Version Negotiation With DO=1 . . . . . 21 8.2.10. Testing With Multiple Defined EDNS Options . . . . . 21 - 8.3. When EDNS Is Not Supported . . . . . . . . . . . . . . . 21 + 8.3. When EDNS Is Not Supported . . . . . . . . . . . . . . . 22 9. Remediation . . . . . . . . . . . . . . . . . . . . . . . . . 22 10. Security Considerations . . . . . . . . . . . . . . . . . . . 23 - 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 23 - 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 23 - 12.1. Normative References . . . . . . . . . . . . . . . . . . 23 - 12.2. Informative References . . . . . . . . . . . . . . . . . 24 + 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 + 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 24 + 12.1. Normative References . . . . . . . . . . . . . . . . . . 24 + 12.2. Informative References . . . . . . . . . . . . . . . . . 25 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 25 1. Introduction The DNS [RFC1034], [RFC1035] is a query / response protocol. Failing to respond to queries, or responding incorrectly, causes both immediate operational problems and long term problems with protocol development. Failure to respond to a query is indistinguishable from packet loss without doing an analysis of query-response patterns. Additionally failure to respond results in unnecessary queries being made by DNS clients, and introduces delays to the resolution process. Due to the inability to distinguish between packet loss and nameservers dropping EDNS [RFC6891] queries, packet loss is sometimes misclassified as lack of EDNS support which can lead to DNSSEC validation failures. - The existance of servers which fail to respond to queries results in + The existence of servers which fail to respond to queries results in developers being hesitant to deploy new standards. 
Such servers need to be identified and remediated. The DNS has response codes that cover almost any conceivable query response. A nameserver should be able to respond to any conceivable query using them. There should be no need to drop queries because a nameserver does not understand them. - Unless a nameserver is under attack, it should respond to all queries - directed to it. When a nameserver is under attack it may wish to - drop packets. A common attack is to use a nameserver as a amplifier - by sending spoofed packets. This is done because response packets - are bigger than the queries and big amplification factors are - available especially if EDNS is supported. Limiting the rate of + Unless a nameserver is under attack, it should respond to all DNS + requests directed to it. When a nameserver is under attack it may + wish to drop packets. A common attack is to use a nameserver as an + amplifier by sending spoofed packets. This is done because response + packets are bigger than the queries and large amplification factors + are available especially if EDNS is supported. Limiting the rate of responses is reasonable when this is occurring and the client should retry. This however only works if legitimate clients are not being forced to guess whether EDNS queries are accepted or not. While there is still a pool of servers that don't respond to EDNS requests, clients have no way to know if the lack of response is due to packet - loss, EDNS packets not being supported, or rate limiting due to the - server being under attack. Misclassification of server behaviour is - unavoidable when rate limiting is used until the population of - servers which fail to respond to well formed queries drops to near + loss, or EDNS packets not being supported, or rate limiting due to + the server being under attack. 
Misclassification of server behaviour + is unavoidable when rate limiting is used until the population of + servers which fail to respond to well-formed queries drops to near zero. - A nameserver should not assume that there isn't a delegation to the - server even if it is not configured to serve the zone. Misconfigured + Nameservers should respond to queries even if the queried name is not + one the server is configured to answer for. Misconfigured nameservers are a common occurrence in the DNS and receiving queries for zones that the server is not configured for is not necessarily an indication that the server is under attack. Parent zone operators are advised to regularly check that the delegating NS records are consistent with those of the delegated zone and to correct them when they are not [RFC1034]. Doing this regularly should reduce the instances of broken delegations. + This document does not try to identify all possible errors nor does + it supply an exhaustive list of tests. + 2. Consequences Failure to follow the relevant DNS RFCs has multiple adverse consequences. Some are caused directly by the non-compliant behaviour and others as a result of work-arounds forced on recursive servers. Addressing known issues now will reduce future interoperability issues as the DNS protocol continues to evolve and - clients make use of newly-introduced DNS features. + clients make use of newly-introduced DNS features. In particular the + base DNS specification [RFC1034], [RFC1035] and the EDNS + specification [RFC6891], when implemented, need to be followed. Some examples of known consequences include: o The AD flag bit in a response cannot be trusted to mean anything as some servers incorrectly copy the flag bit from the request to the response [RFC1035], [RFC4035]. 
o Widespread non-response to EDNS queries has led to recursive servers having to assume that EDNS is not supported and that fallback to plain DNS is required, potentially causing DNSSEC @@ -186,44 +188,42 @@ EDNS option or just EDNS that is causing the non response. In the limited amount of time required to resolve a query before the client times out this is not possible. o Incorrectly returning FORMERR to an EDNS option being present leads to the recursive server not being able to determine if the server is just broken in the handling of the EDNS option or doesn't support EDNS at all. o Mishandling of unknown query types has contributed to the - abandoning of the transition of the SPF type. + abandonment of the transition of the SPF type. o Mishandling of unknown query types has slowed the development of DANE and resulted in additional rules being specified to reduce the probability of interacting with a broken server when making TLSA queries. The consequences of servers not following the RFCs will only grow if measures are not put in place to remove non-compliant servers from the ecosystem. Working around issues due to non-compliance with RFCs is not sustainable. Most (if not all) of these consequences could have been avoided if action had been taken to remove non-compliant servers as soon as people were aware of them, i.e. to actively seek out broken implementations and servers and inform their developers and operators that they need to fix their servers. -3. Common queries kinds that result in non responses. +3. Common queries kinds that result in no or bad responses. - There are a number common query kinds that fail to respond today. - They are: EDNS queries with and without extensions; queries for - unknown (unallocated) or unsupported types; and filtering of TCP - queries. + This section is broken down into Basic DNS requests and EDNS + requests. 3.1. Basic DNS Queries 3.1.1. 
Zone Existence Initially, to test existence of the zone, an SOA query should be made. If the SOA record is not returned but some other response is returned, this is an indication of a bad delegation. 3.1.2. Unknown / Unsupported Type Queries @@ -240,38 +240,38 @@ the likelihood of a false positive due to packet loss. 3.1.3. DNS Flags Some servers fail to respond to DNS queries with various DNS flags set, regardless of whether they are defined or still reserved. At the time of writing there are servers that fail to respond to queries with the AD bit set to 1 and servers that fail to respond to queries with the last reserved flag bit set. +3.1.3.1. Recursive Queries + + A non-recursive server is supposed to respond to recursive queries as + if the RD bit is not set [RFC1034]. + 3.1.4. Unknown DNS opcodes The use of previously undefined opcodes is to be expected. Since the DNS was first defined two new opcodes have been added, UPDATE and NOTIFY. NOTIMP is the expected rcode to an unknown or unimplemented opcode. Note: while new opcodes will most probably use the current layout structure for the rest of the message there is no requirement that anything other than the DNS header match. -3.1.5. Recursive Queries - - A non-recursive server is supposed to respond to recursive queries as - if the RD bit is not set [RFC1034]. - -3.1.6. TCP Queries +3.1.5. TCP Queries All DNS servers are supposed to respond to queries over TCP [RFC7766]. While firewalls should not block TCP connection attempts if they do they should cleanly terminate the connection by sending TCP RESET or sending ICMP/ICMPv6 Administratively Prohibited messages. Dropping TCP connections introduces excessive delays to the resolution process. Whether a server accepts TCP connections can be tested by first checking that it responds to UDP queries to confirm that it is up and @@ -312,37 +312,37 @@ version numbers that they do not support. 
Some servers respond correctly to EDNS version 0 queries but fail to set QR=1 when responding to EDNS versions they do not support. Such answers are discarded or treated as requests. 3.2.3. EDNS Options Some servers fail to respond to EDNS queries with EDNS options set. Unknown EDNS options are supposed to be ignored by the server - [RFC6891], the original EDNS specifion left this behaviour undefined - [RFC2671]. + [RFC6891]; the original EDNS specification left this behaviour + undefined [RFC2671]. 3.2.4. EDNS Flags Some servers fail to respond to EDNS queries with EDNS flags set. - Server should ignore EDNS flags they do not understand and should not + Servers should ignore EDNS flags they do not understand and must not add them to the response [RFC6891]. 3.2.5. Truncated EDNS Responses Some EDNS aware servers fail to include an OPT record when a truncated response is sent. An OPT record is supposed to be included in a truncated response [RFC6891]. Some EDNS aware servers fail to honour the advertised EDNS buffer size - and send over-sized responses. + and send over-sized responses [RFC6891]. 3.2.6. DO=1 Handling Some nameservers incorrectly only return an EDNS response when the DO bit [RFC3225] is 1 in the query. Additionally some nameservers fail to copy the DO bit to the response despite clearly supporting DNSSEC by returning RRSIG records to EDNS queries with DO=1. 3.2.7. EDNS over TCP @@ -438,36 +438,39 @@ Choosing the correct response code when responding to DNS queries is important. Response codes should be chosen considering how clients will handle them. For unimplemented opcodes NOTIMP is the expected response code. For example, a new opcode could change the message format by extending the header or changing the structure of the records etc. For unimplemented type codes, and in the absence of other errors, the only valid response is NoError if the qname exists, and NameError - (NXDOMAIN) otherwise. For Meta-RRs NOTIMP may be returned - instead.
 + (NXDOMAIN) otherwise. For Meta-RRs NOTIMP may be returned instead. If a zone cannot be loaded because it contains unimplemented type codes that are not encoded as unknown record types according to - [RFC3597] then the expected response is SERVFAIL. + [RFC3597] then the expected response is SERVFAIL as the whole zone + should be rejected (Section 5.2 of [RFC1035]). If a zone loads then + Section 4.3.2 of [RFC1034] applies. If the server supports EDNS and receives a query with an unsupported EDNS version, the correct response is BADVERS [RFC6891]. - If the server does not support EDNS at all, FORMERR and NOTIMP are - the expected error codes. That said a minimal EDNS server - implementation requires parsing the OPT records and responding with - an empty OPT record. There is no need to interpret any EDNS options - present in the request as unsupported EDNS options are expected to be - ignored [RFC6891]. + If the server does not support EDNS at all, FORMERR is the expected + error code. That said a minimal EDNS server implementation requires + parsing the OPT records and responding with an empty OPT record in + the additional section in most cases. There is no need to interpret + any EDNS options present in the request as unsupported EDNS options + are expected to be ignored [RFC6891]. Additionally EDNS flags can be + ignored. The only part of the OPT record that needs to be examined + is the version field to determine if BADVERS needs to be sent or not. 8. Testing Testing is divided into two sections. "Basic DNS", which all servers should meet, and "Extended DNS", which should be met by all servers that support EDNS (a server is deemed to support EDNS if it gives a valid EDNS response to any EDNS query). If a server does not support EDNS it should still respond to all the tests. 
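As a rough illustration only, the "expect:" checks that accompany the dig commands in this section could be applied mechanically to captured dig output. The sketch below is not part of the draft: the helper names (parse_dig_header, check_expectations) and the SAMPLE text are hypothetical, and it assumes dig's usual text output, in which the header appears on a ";; ->>HEADER<<-" line, the header flags on a ";; flags:" line, and an OPT record is announced by an "OPT PSEUDOSECTION" line.

```python
import re

# Hypothetical sample of dig text output for a plain (non-EDNS) SOA query.
# The layout is assumed to match dig's usual header and flags lines.
SAMPLE = """\
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4242
;; flags: qr aa; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
"""


def parse_dig_header(output):
    """Extract opcode, status, header flags and OPT presence from dig text."""
    header = re.search(r"opcode: (\w+), status: (\w+)", output)
    flags = re.search(r"^;; flags:([^;]*);", output, re.MULTILINE)
    return {
        "opcode": header.group(1) if header else None,
        "status": header.group(2) if header else None,
        "flags": set(flags.group(1).split()) if flags else set(),
        "opt": "OPT PSEUDOSECTION" in output,
    }


def check_expectations(output, status, present=(), absent=(), opt=False):
    """Return a list of failed expectations; an empty list means all passed."""
    parsed = parse_dig_header(output)
    failures = []
    if parsed["status"] != status:
        failures.append(f"status: expected {status}, got {parsed['status']}")
    for flag in present:
        if flag not in parsed["flags"]:
            failures.append(f"flag: {flag} expected but not present")
    for flag in absent:
        if flag in parsed["flags"]:
            failures.append(f"flag: {flag} present but not expected")
    if parsed["opt"] != opt:
        failures.append(f"OPT record: expected present={opt}")
    return failures


# Basic DNS style check: NOERROR, aa set, rd and ad clear, no OPT record.
failures = check_expectations(SAMPLE, "NOERROR",
                              present=("aa",), absent=("rd", "ad"))
print("ok" if not failures else failures)
```

A helper of this kind only inspects the header-level expectations; checks on section contents, such as the SOA record being present in the answer section, would still need to be made separately.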
These tests query for records at the apex of a zone that the server @@ -503,25 +506,26 @@ expect: status: NOERROR expect: the SOA record to be present in the answer section expect: flag: aa to be present expect: flag: rd to NOT be present expect: flag: ad to NOT be present expect: the OPT record to NOT be present 8.1.2. Testing Unknown Types - Ask for the TYPE1000 record at the configured zone's name. This - query is made with no DNS flag bits set and without EDNS. TYPE1000 - has been chosen for this purpose as IANA is unlikely to allocate this + Ask for the TYPE1000 RRset at the configured zone's name. This query + is made with no DNS flag bits set and without EDNS. TYPE1000 has + been chosen for this purpose as IANA is unlikely to allocate this type in the near future and it is not in a range reserved for private - use [RFC6895]. + use [RFC6895]. Any unallocated type code could be chosen for this + test. We expect no records to be returned in the answer section with the rcode set to NOERROR and the AA and QR bits to be set in the response; RA may also be set [RFC1034]. We do not expect an OPT record to be returned [RFC6891]. Check that queries for an unknown type work: dig +noedns +noad +norec type1000 $zone @$server @@ -604,60 +608,60 @@ expect: MBZ to NOT be in the response (see below) expect: flag: aa to be present expect: flag: rd to NOT be present expect: flag: ad to NOT be present expect: the OPT record to NOT be present MBZ (Must Be Zero) is a dig-specific indication that the flag bit has been incorrectly copied. See Section 4.1.1, [RFC1035] "Z Reserved for future use. Must be zero in all queries and responses." -8.1.4. Testing Unknown Opcodes - - Construct a DNS message that consists of only a DNS header with - opcode set to 15 (currently not allocated), no DNS header bits set - and empty question, answer, authority and additional sections. 
- - Check that new opcodes are handled: - - dig +noedns +noad +opcode=15 +norec +header-only @$server - - expect: status: NOTIMP - expect: opcode: 15 - expect: all sections to be empty - expect: flag: aa to NOT be present - expect: flag: rd to NOT be present - expect: flag: ad to NOT be present - expect: the OPT record to NOT be present - -8.1.5. Testing Recursive Queries +8.1.3.4. Testing Recursive Queries - Ask for the SOA record of the confgured zone. This query is made + Ask for the SOA record of the configured zone. This query is made with only the RD DNS flag bit set and without EDNS. We expect the SOA record for the zone to be returned in the answer section with the rcode set to NOERROR and the AA, QR and RD bits to be set in the response; RA may also be set [RFC1034]. We do not expect an OPT record to be returned [RFC6891]. Check that recursive queries work: dig +noedns +noad +rec soa $zone @$server expect: status: NOERROR expect: the SOA record to be present in the answer section expect: flag: aa to be present expect: flag: rd to be present expect: flag: ad to NOT be present expect: the OPT record to NOT be present -8.1.6. Testing TCP +8.1.4. Testing Unknown Opcodes + + Construct a DNS message that consists of only a DNS header with + opcode set to 15 (currently not allocated), no DNS header bits set + and empty question, answer, authority and additional sections. + + Check that new opcodes are handled: + + dig +noedns +noad +opcode=15 +norec +header-only @$server + + expect: status: NOTIMP + expect: opcode: 15 + expect: all sections to be empty + expect: flag: aa to NOT be present + expect: flag: rd to NOT be present + expect: flag: ad to NOT be present + expect: the OPT record to NOT be present + +8.1.5. Testing TCP Ask for the SOA record of the configured zone. This query is made with no DNS flag bits set and without EDNS. This query is to be sent using TCP. 
We expect the SOA record for the zone to be returned in the answer section with the rcode set to NOERROR and the AA and QR bits to be set in the response; RA may also be set [RFC1034]. We do not expect an OPT record to be returned [RFC6891]. @@ -734,21 +738,23 @@ expect: flag: ad to NOT be present +noednsneg has been set as dig supports EDNS version negotiation and we want to see only the response to the initial EDNS version 1 query. 8.2.3. Testing Unknown EDNS Options Ask for the SOA record of the configured zone. This query is made with no DNS flag bits set. EDNS version 0 is used without any EDNS flags. An EDNS option is present with a value that has not yet been - assigned by IANA. We have picked 100 for the example below. + assigned by IANA. We have picked an unassigned code of 100 for the + example below. Any unassigned EDNS option code could have been chosen + for this test. We expect the SOA record for the zone to be returned in the answer section with the rcode set to NOERROR and the AA and QR bits to be set in the response; RA may also be set [RFC1034]. We expect an OPT record to be returned. There should be no EDNS flags present in the response. The EDNS version field should be 0 as EDNS versions other than 0 are yet to be specified and there should be no EDNS options present as unknown EDNS options are supposed to be ignored by the server [RFC6891] Section 6.1.2. @@ -819,21 +825,23 @@ expect: an OPT record to be present in the additional section expect: MBZ not to be present expect: EDNS Version 0 in response expect: flag: aa to NOT be present expect: flag: ad to NOT be present 8.2.6. Testing EDNS Version Negotiation With Unknown EDNS Options Ask for the SOA record of the configured zone. This query is made with no DNS flag bits set. EDNS version 1 is used. An unknown EDNS - option is present. We have picked 100 for the example below. + option is present. We have picked an unassigned code of 100 for the + example below. 
Any unassigned EDNS option code could be chosen for + this test. We expect the SOA record for the zone to NOT be returned in the answer section with the extended rcode set to BADVERS and the QR bit to be set in the response; RA may also be set [RFC1034]. We expect an OPT record to be returned. There should be no EDNS flags present in the response. The EDNS version field should be 0 as EDNS versions other than 0 are yet to be specified and there should be no EDNS options present [RFC6891]. Check that EDNS version 1 queries with unknown options work (EDNS @@ -1011,37 +1019,39 @@ notification or remediation depending on whether they have a direct relationship with the child operator. Many TLD registries, for example, cannot directly contact their registrants and may instead need to communicate through the relevant registrar. In such cases it may be most efficient for registrars to take on the responsibility for testing the name servers of their registrants, since they have a direct relationship. When notification is not effective at correcting problems with a misbehaving name server, parent operators can choose to remove NS - record sets (and glue records below) that refer to the faulty server. - This should only be done as a last resort and with due consideration, - as removal of a delegation can have unanticipated side effects. For - example, other parts of the DNS tree may depend on names below the - removed zone cut, and the parent operator may find themselves - responsible for causing new DNS failures to occur. + record sets (and glue records below) that refer to the faulty server + until the servers are fixed. This should only be done as a last + resort and with due consideration, as removal of a delegation can + have unanticipated side effects. For example, other parts of the DNS + tree may depend on names below the removed zone cut, and the parent + operator may find themselves responsible for causing new DNS failures + to occur. 10. 
Security Considerations Testing protocol compliance can potentially result in false reports of attempts to break services from Intrusion Detection Services and - firewalls. All of the tests are well formed (though not necessarily + firewalls. All of the tests are well-formed (though not necessarily common) DNS queries. None of the tests listed above should cause any harm to a protocol-compliant server. Relaxing firewall settings to ensure EDNS compliance could potentially expose a critical implementation flaw in the nameserver. + Nameservers should be tested for conformance before relaxing firewall settings. When removing delegations for non-compliant servers there can be a knock-on effect on other zones that require these zones to be operational for the nameservers' addresses to be resolved. 11. IANA Considerations There are no actions for IANA.