--- 1/draft-ietf-dime-doic-rate-control-10.txt 2019-02-11 09:13:23.488755169 -0800 +++ 2/draft-ietf-dime-doic-rate-control-11.txt 2019-02-11 09:13:23.528756125 -0800 @@ -1,19 +1,19 @@ Diameter Maintenance and Extensions (DIME) S. Donovan, Ed. Internet-Draft Oracle Intended status: Standards Track E. Noel -Expires: April 6, 2019 AT&T Labs - October 3, 2018 +Expires: August 15, 2019 AT&T Labs + February 11, 2019 Diameter Overload Rate Control - draft-ietf-dime-doic-rate-control-10 + draft-ietf-dime-doic-rate-control-11 Abstract This specification documents an extension to the Diameter Overload Indication Conveyance (DOIC) [RFC7683] base solution. This extension adds a new overload control abatement algorithm. This abatement algorithm allows for a DOIC reporting node to specify a maximum rate at which a DOIC reacting node sends Diameter requests to the DOIC reporting node. @@ -33,25 +33,25 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on April 6, 2019. + This Internet-Draft will expire on August 15, 2019. Copyright Notice - Copyright (c) 2018 IETF Trust and the persons identified as the + Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. 
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as @@ -69,56 +69,56 @@ 5.3. Reporting Node Maintenance of Overload Control State . . 7 5.4. Reacting Node Maintenance of Overload Control State . . . 8 5.5. Reporting Node Behavior for Rate Abatement Algorithm . . 8 5.6. Reacting Node Behavior for Rate Abatement Algorithm . . . 9 6. Rate Abatement Algorithm AVPs . . . . . . . . . . . . . . . . 9 6.1. OC-Supported-Features AVP . . . . . . . . . . . . . . . . 9 6.1.1. OC-Feature-Vector AVP . . . . . . . . . . . . . . . . 9 6.2. OC-OLR AVP . . . . . . . . . . . . . . . . . . . . . . . 9 6.2.1. OC-Maximum-Rate AVP . . . . . . . . . . . . . . . . . 10 6.3. Attribute Value Pair Flag Rules . . . . . . . . . . . . . 10 - 7. Rate Based Abatement Algorithm . . . . . . . . . . . . . . . 10 + 7. Rate-Based Abatement Algorithm . . . . . . . . . . . . . . . 10 7.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 11 7.2. Reporting Node Behavior . . . . . . . . . . . . . . . . . 11 7.3. Reacting Node Behavior . . . . . . . . . . . . . . . . . 12 7.3.1. Default Algorithm for Rate-based Control . . . . . . 12 7.3.2. Priority Treatment . . . . . . . . . . . . . . . . . 15 7.3.3. Optional Enhancement: Avoidance of Resonance . . . . 17 8. IANA Consideration . . . . . . . . . . . . . . . . . . . . . 18 8.1. AVP Codes . . . . . . . . . . . . . . . . . . . . . . . . 18 - 8.2. New Registries . . . . . . . . . . . . . . . . . . . . . 18 - 8.3. New DOIC report types . . . . . . . . . . . . . . . . . . 18 + 8.2. OC-Supported-Features . . . . . . . . . . . . . . . . . . 18 + 8.3. New DOIC report types . . . . . . . . . . . . . . . . . . 19 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 10. Acknowledgements . . . . . . . . . 
. . . . . . . . . . . . . 19 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 19 11.1. Normative References . . . . . . . . . . . . . . . . . . 19 - 11.2. Informative References . . . . . . . . . . . . . . . . . 19 + 11.2. Informative References . . . . . . . . . . . . . . . . . 20 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 20 1. Introduction This document defines a new Diameter overload control abatement algorithm, the "rate" algorithm. The base Diameter overload specification [RFC7683] defines the "loss" algorithm as the default Diameter overload abatement algorithm. The loss algorithm allows a reporting node (see Section 2) to instruct a reacting node (see Section 2) to reduce the amount of traffic sent to the reporting node by abating (diverting or throttling) a percentage of requests sent to the server. While this can effectively decrease the load handled by the server, it does not directly address cases where the rate of arrival of service requests changes quickly. For instance, if the service requests that result in Diameter transactions increase quickly then the loss algorithm cannot guarantee the load presented to the server remains below a specific rate level. The loss algorithm can be slow to ensure the stability - of reporting nodes when subjected to rapidly changing loads. The + of reporting nodes when subjected to rapidly-changing loads. The "loss" algorithm errs both in throttling too much when there is a dip in offered load, and throttling not enough when there is a spike in offered load. Consider the case where a reacting node is handling 100 service requests per second, where each of these service requests results in one Diameter transaction being sent to a reporting node. 
If the reporting node is approaching an overload state, or is already in an overload state, it will send a Diameter overload report requesting a percentage reduction in traffic sent when the loss algorithm is used @@ -141,53 +141,53 @@ report requesting that the reacting node abate 91% of requests to get back to the desired 90 transactions per second. However, once the spike has abated and the reacting node handled service requests returns to 100 per second, this will result in just 9 transactions per second being sent to the reporting node, requiring a new overload report setting the reduction percentage back to 10%. This control feedback loop has the potential to make the situation worse by causing wide fluctuations in traffic on multiple nodes in the Diameter network. - One of the benefits of a rate based algorithm over the loss algorithm + One of the benefits of a rate-based algorithm over the loss algorithm is that it better handles spikes in traffic. Instead of sending a request to reduce traffic by a percentage, the rate approach allows the reporting node to specify the maximum number of Diameter requests per second that can be sent to the reporting node. For instance, in this example, the reporting node could send a rate-based request specifying the maximum transactions per second to be 90. The reacting node will send the 90 regardless of whether it is receiving 100 or 1000 service requests per second. - It should be noted that one of the implications of the rate based + It should be noted that one of the implications of the rate-based algorithm is that the reporting node needs to determine how it wants - to distribute it's load over the set of reacting nodes from which it + to distribute its load over the set of reacting nodes from which it is receiving traffic. 
For instance, if the reporting node is receiving Diameter traffic from 10 reacting nodes and has a capacity of 100 transactions per second then the reporting node could choose to set the rate for each of the reacting nodes to 10 transactions per second. This, of course, is assuming that each of the reacting nodes has equal performance characteristics. The reporting node could also choose to have a high capacity reacting node send 55 transactions per second and the remaining 9 low capacity reacting nodes send 5 transactions per second. The ability of the reporting node to - specify the amount of traffic on a per reacting node basis implies + specify the amount of traffic on a per-reacting-node basis implies that the reporting node must maintain state for each of the reacting nodes. This state includes the current allocation of Diameter - traffic to that reacting node. If the number of reacting node + traffic to that reacting node. If the number of reacting nodes changes, either because new nodes are added, nodes are removed from service or nodes fail, then the reporting node will need to redistribute the maximum Diameter transactions over the new set of reacting nodes. This document extends the base Diameter Overload Indication - Conveyance (DOIC) solution [RFC7683] to add support for the rate + Conveyance (DOIC) solution [RFC7683] to add support for the rate- based overload abatement algorithm. This document draws heavily on work in the SIP Overload Control working group. The definition of the rate abatement algorithm is copied almost verbatim from the SIP Overload Control (SOC) document [RFC7415], with changes focused on making the wording consistent with the DOIC solution and the Diameter protocol. 2. Terminology @@ -236,29 +236,30 @@ 4. Capability Announcement This document defines the rate abatement algorithm (referred to as rate in this document) feature. 
Support for the rate feature by a DOIC node will be indicated by a new value of the OC-Feature-Vector AVP, as described in Section 6.1.1, per the rules defined in [RFC7683]. Since all nodes that support DOIC are required to support the loss algorithm, DOIC nodes supporting the rate feature will support both - the loss and rate based abatement algorithms. + the loss and rate-based abatement algorithms. DOIC reacting nodes supporting the rate feature MUST indicate support - for both the loss and rate algorithms in the OC-Feature-Vector AVP. + for both the loss and rate algorithms in the OC-Feature-Vector AVP + and MAY indicate support for other algorithms. As defined in [RFC7683], a DOIC reporting node supporting the rate - feature MUST select a single abatement algorithm in the OC-Feature- - Vector AVP and OC-Peer-Algo AVP in the answer message sent to the - DOIC reacting nodes. + feature selects a single abatement algorithm in the OC-Feature-Vector + AVP and OC-Peer-Algo AVP in the answer message sent to the DOIC + reacting nodes. A reporting node can select one abatement algorithm to apply to host and realm reports and a different algorithm to apply to peer reports. For host or realm reports the selected algorithm is reflected in the OC-Feature-Vector AVP sent as part of the OC-Supported- Features AVP included in answer messages for transactions where the request contained an OC-Supported-Features AVP. This is per the procedures defined in [RFC7683]. @@ -274,21 +275,22 @@ [RFC7683] for handling of overload reports when the rate overload abatement algorithm is used. 5.1. Reporting Node Overload Control State A reporting node that uses the rate abatement algorithm SHOULD maintain reporting node Overload Control State (OCS) for each reacting node to which it sends a rate Overload Report (OLR). This is different from the behavior defined in [RFC7683] where a - single loss percentage sent to all reacting nodes. 
+ reporting node sends a single loss percentage to all reacting + nodes. A reporting node SHOULD maintain OCS entries when using the rate abatement algorithm per supported Diameter application, per targeted reacting node and per report type. A rate OCS entry is identified by the tuple of Application-Id, report type and DiameterIdentity of the target of the rate OLR. The rate OCS entry SHOULD include the rate allocated to the reacting node. @@ -303,35 +305,32 @@ 5.2. Reacting Node Overload Control State A reacting node that supports the rate abatement algorithm MUST indicate rate as the selected abatement algorithm in the reacting node OCS based on the OC-Feature-Vector AVP or the OC-Peer-Algo AVP in the received OC-Supported-Features AVP. A reacting node that supports the rate abatement algorithm MUST include the rate specified in the OC-Maximum-Rate AVP included in the - OC-OLR AVP as an element of the abatement algorithm specific portion + OC-OLR AVP as an element of the abatement-algorithm-specific portion of reacting node OCS entries. All other elements for the OCS defined in [RFC7683] and [I-D.ietf-dime-agent-overload] also apply to the reacting node's OCS when using the rate abatement algorithm. 5.3. Reporting Node Maintenance of Overload Control State A reporting node that has selected the rate overload abatement algorithm and enters an overload condition MUST indicate rate as the - abatement algorithm in the resulting reporting node OCS entries. - - A reporting node that has selected the rate abatement algorithm and - enters an overload condition MUST indicate the selected rate in the + abatement algorithm and MUST indicate the selected rate in the resulting reporting node OCS entries. 
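As an editorial illustration only (it is not part of either draft revision), the reporting node OCS described in Sections 5.1 and 5.3 might be pictured as a map keyed by the tuple of Application-Id, report type, and DiameterIdentity, with each entry recording the selected algorithm and the allocated rate. Every name in this sketch (RateOcsEntry, ReportingNodeOcs, ensure_entry, rebalance_evenly) is hypothetical, and the equal-share policy is only one assumed way a reporting node could redistribute its capacity.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# All names here are hypothetical; the draft defines behavior, not data structures.
OcsKey = Tuple[int, str, str]  # (Application-Id, report type, DiameterIdentity)


@dataclass
class RateOcsEntry:
    algorithm: str  # abatement algorithm recorded in the entry ("rate")
    max_rate: int   # requests per second, as carried in OC-Maximum-Rate


class ReportingNodeOcs:
    def __init__(self) -> None:
        self.entries: Dict[OcsKey, RateOcsEntry] = {}

    def ensure_entry(self, app_id: int, report_type: str, target: str,
                     max_rate: int) -> RateOcsEntry:
        """Return the entry for the tuple, creating it if it does not exist."""
        key = (app_id, report_type, target)
        if key not in self.entries:
            self.entries[key] = RateOcsEntry(algorithm="rate", max_rate=max_rate)
        return self.entries[key]

    def rebalance_evenly(self, capacity: int) -> None:
        """Split a total capacity equally over all entries (assumed policy)."""
        if not self.entries:
            return
        share = capacity // len(self.entries)
        for entry in self.entries.values():
            entry.max_rate = share
```

With ten reacting nodes and a capacity of 100 transactions per second, rebalance_evenly(100) allocates 10 transactions per second to each entry, which matches the equal-distribution example in the Introduction. A real implementation would likely weight the shares by reacting node capacity and rebalance whenever the set of reacting nodes changes.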
When selecting the rate algorithm in the response to a request that contained an OC-Supported-Features AVP with an OC-Feature-Vector AVP indicating support for the rate feature, a reporting node MUST ensure that a reporting node OCS entry exists for the target of the overload report. The target is defined as follows: o For Host reports, the target is the DiameterIdentity contained in the Origin-Host AVP received in the request. @@ -363,39 +362,39 @@ 5.5. Reporting Node Behavior for Rate Abatement Algorithm When in an overload condition with rate selected as the overload abatement algorithm and when handling a request that contained an OC- Supported-Features AVP that indicated support for the rate abatement algorithm, a reporting node SHOULD include an OC-OLR AVP for the rate algorithm using the parameters stored in the reporting node OCS for the target of the overload report. Note: It is also possible for the reporting node to send overload - reports with the rate algorithm indicated when the reporting node - is not in an overloaded state. This could be a strategy to + reports with the rate algorithm indicated even when the reporting + node is not in an overloaded state. This could be a strategy to proactively avoid entering into an overloaded state. Whether to do so is up to local policy. When sending an overload report for the rate algorithm, the OC- Maximum-Rate AVP MUST be included in the OC-OLR AVP and the OC- Reduction-Percentage AVP MUST NOT be included. 5.6. Reacting Node Behavior for Rate Abatement Algorithm When determining if abatement treatment should be applied to a request being sent to a reporting node that has selected the rate - overload abatement algorithm, the reacting node MAY use the algorithm - detailed in Section 7. + overload abatement algorithm, the reacting node can choose to use the + algorithm detailed in Section 7. - Other algorithms for controlling the rate MAY be implemented by - the reacting node. 
Any algorithm implemented MUST result in the - correct rate of traffic being sent to the reporting node. + Other algorithms for controlling the rate MAY be implemented by the + reacting node. Any algorithm implemented MUST correctly limit the + maximum rate of traffic being sent to the reporting node. Once a determination is made by the reacting node that an individual Diameter request is to be subjected to abatement treatment then the procedures for throttling and diversion defined in [RFC7683] and [I-D.ietf-dime-agent-overload] apply. 6. Rate Abatement Algorithm AVPs 6.1. OC-Supported-Features AVP @@ -452,21 +451,21 @@ +---------+ |AVP flag | |rules | +----+----+ AVP Section | |MUST| Attribute Name Code Defined Value Type |MUST| NOT| +---------------------------------------------------------+----+----+ |OC-Maximum-Rate TBD1 6.2 Unsigned32 | | V | +---------------------------------------------------------+----+----+ -7. Rate Based Abatement Algorithm +7. Rate-Based Abatement Algorithm This section is pulled from [RFC7415], with minor changes needed to make it apply to the Diameter protocol. 7.1. Overview The reporting node is the one protected by the overload control algorithm defined here. The reacting node is the one that abates traffic towards the server. @@ -477,21 +476,21 @@ (e.g. CPU utilization or queuing delay) to evaluate its overload state and estimate a target maximum Diameter request rate in number of requests per second (as opposed to target percent reduction in the case of loss-based abatement). When in an overloaded state, the reporting node uses the OC-OLR AVP to inform reacting nodes of its overload state and of the target Diameter request rate. Upon receiving the overload report with a target maximum Diameter - request rate, each reacting node applies abatement treatment for new + request rate, each reacting node applies overload abatement for new Diameter requests towards the reporting node. 7.2. 
Reporting Node Behavior The actual algorithm used by the reporting node to determine its overload state and estimate a target maximum Diameter request rate is beyond the scope of this document. However, the reporting node MUST periodically evaluate its overload state and estimate a target Diameter request rate beyond which it @@ -509,22 +508,22 @@ When setting the maximum rate for a particular reacting node, the reporting node may need to take into account the workload (e.g., CPU load per request) of the distribution of message types from that reacting node. Furthermore, because the reacting node may prioritize the specific types of messages it sends while under overload restriction, this distribution of message types may be different from the message distribution for that reacting node under non-overload conditions (e.g., either higher or lower CPU load). Note that the value of OC-Maximum-Rate AVP (in request messages per - second) for the rate algorithm provides an upper bound on the traffic - sent by the reacting node to the reporting node. + second) for the rate algorithm provides a loose upper bound on the + traffic sent by the reacting node to the reporting node. In other words, when multiple reacting nodes are being controlled by an overloaded reporting node, at any given time, some reacting nodes may send requests at a rate below their target maximum Diameter request rate while others send above that target rate. But the resulting request rate presented to the overloaded reporting node will converge towards the target Diameter request rate or a lower rate. Upon detection of overload, and the determination to invoke overload controls, the reporting node follows the specifications in [RFC7683] @@ -535,20 +534,22 @@ The reporting node uses the OC-Maximum-Rate AVP defined in this specification to communicate a target maximum Diameter request rate to each of its clients. 7.3. Reacting Node Behavior 7.3.1. 
Default Algorithm for Rate-based Control A reference algorithm is shown below. + Note that use of // below indicates a comment. + No priority case: // T: inter-transmission interval, set to 1 / OC-Maximum-Rate // TAU: tolerance parameter // ta: arrival time of the most recent arrival // LCT: arrival time of last Diameter request that // was sent to the server // (initialized to the first arrival time) // X: current value of the leaky bucket counter (initialized to // TAU0) @@ -813,38 +814,44 @@ 'phasing' of the buckets remains. 8. IANA Consideration 8.1. AVP Codes New AVPs defined by this specification are listed in Section 6. All AVP codes are allocated from the 'Authentication, Authorization, and Accounting (AAA) Parameters' AVP Codes registry. -8.2. New Registries +8.2. OC-Supported-Features - There are no new IANA registries introduced by this document. + As indicated in Section 6.1.1, a new allocation is required in the + OC-Feature-Vector AVP. 8.3. New DOIC report types All DOIC report types defined in the future MUST indicate whether or not the rate algorithm can be used with that report type. 9. Security Considerations The rate overload abatement mechanism is an extension to the base Diameter overload mechanism. As such, all of the security considerations outlined in [RFC7683] apply to the rate overload abatement mechanism. + In addition, the rate algorithm could be used to handle DoS attacks + more effectively than the loss algorithm. + 10. Acknowledgements + Lionel Morand for his contributions to the document. + 11. References 11.1. Normative References [I-D.ietf-dime-agent-overload] Donovan, S., "Diameter Agent Overload", draft-ietf-dime- agent-overload-00 (work in progress), December 2014. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119,