--- 1/draft-ietf-rtcweb-rtp-usage-11.txt 2014-02-14 12:14:35.841417874 -0800 +++ 2/draft-ietf-rtcweb-rtp-usage-12.txt 2014-02-14 12:14:35.929420026 -0800 @@ -1,21 +1,21 @@ -RTCWEB Working Group C. Perkins +RTCWEB Working Group C. S. Perkins Internet-Draft University of Glasgow Intended status: Standards Track M. Westerlund -Expires: June 19, 2014 Ericsson +Expires: August 18, 2014 Ericsson J. Ott Aalto University - December 16, 2013 + February 14, 2014 Web Real-Time Communication (WebRTC): Media Transport and Use of RTP - draft-ietf-rtcweb-rtp-usage-11 + draft-ietf-rtcweb-rtp-usage-12 Abstract The Web Real-Time Communication (WebRTC) framework provides support for direct interactive rich communication using audio, video, text, collaboration, games, etc. between two peers' web-browsers. This memo describes the media transport aspects of the WebRTC framework. It specifies how the Real-time Transport Protocol (RTP) is used in the WebRTC context, and gives requirements for which RTP features, profiles, and extensions need to be supported. @@ -28,25 +28,25 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on June 19, 2014. + This Internet-Draft will expire on August 18, 2014. Copyright Notice - Copyright (c) 2013 IETF Trust and the persons identified as the + Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as @@ -65,56 +65,54 @@ 4.5. RTP and RTCP Multiplexing . . . . . . . . . . . . . . . . 9 4.6. Reduced Size RTCP . . . . . . . . . . . . . . . . . . . . 10 4.7. Symmetric RTP/RTCP . . . . . . . . . . . . . . . . . . . 10 4.8. Choice of RTP Synchronisation Source (SSRC) . . . . . . . 10 4.9. Generation of the RTCP Canonical Name (CNAME) . . . . . . 11 5. WebRTC Use of RTP: Extensions . . . . . . . . . . . . . . . . 12 5.1. Conferencing Extensions . . . . . . . . . . . . . . . . . 12 5.1.1. Full Intra Request (FIR) . . . . . . . . . . . . . . 13 5.1.2. Picture Loss Indication (PLI) . . . . . . . . . . . . 13 5.1.3. Slice Loss Indication (SLI) . . . . . . . . . . . . . 13 - 5.1.4. Reference Picture Selection Indication (RPSI) . . . . 13 + 5.1.4. Reference Picture Selection Indication (RPSI) . . . . 14 5.1.5. Temporal-Spatial Trade-off Request (TSTR) . . . . . . 14 5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR) . . . . . . . . . . . . . . . . . . . . . . . 14 5.2. Header Extensions . . . . . . . . . . . . . . . . . . . . 14 5.2.1. Rapid Synchronisation . . . . . . . . . . . . . . . . 15 5.2.2. Client-to-Mixer Audio Level . . . . . . . . . . . . . 15 5.2.3. 
Mixer-to-Client Audio Level . . . . . . . . . . . . . 15 - 5.2.4. Associating RTP Media Streams and Signalling Contexts 15 6. WebRTC Use of RTP: Improving Transport Robustness . . . . . . 16 6.1. Negative Acknowledgements and RTP Retransmission . . . . 16 6.2. Forward Error Correction (FEC) . . . . . . . . . . . . . 17 7. WebRTC Use of RTP: Rate Control and Media Adaptation . . . . 17 7.1. Boundary Conditions and Circuit Breakers . . . . . . . . 18 7.2. RTCP Limitations for Congestion Control . . . . . . . . . 19 - 7.3. Congestion Control Interoperability and Legacy Systems . 19 + 7.3. Congestion Control Interoperability and Legacy Systems . 20 8. WebRTC Use of RTP: Performance Monitoring . . . . . . . . . . 20 9. WebRTC Use of RTP: Future Extensions . . . . . . . . . . . . 21 10. Signalling Considerations . . . . . . . . . . . . . . . . . . 21 11. WebRTC API Considerations . . . . . . . . . . . . . . . . . . 23 12. RTP Implementation Considerations . . . . . . . . . . . . . . 25 12.1. Configuration and Use of RTP Sessions . . . . . . . . . 25 12.1.1. Use of Multiple Media Flows Within an RTP Session . 25 - 12.1.2. Use of Multiple RTP Sessions . . . . . . . . . . . . 27 + 12.1.2. Use of Multiple RTP Sessions . . . . . . . . . . . . 26 12.1.3. Differentiated Treatment of Flows . . . . . . . . . 31 12.2. Source, Flow, and Participant Identification . . . . . . 32 12.2.1. Media Streams . . . . . . . . . . . . . . . . . . . 33 12.2.2. Media Streams: SSRC Collision Detection . . . . . . 33 12.2.3. Media Synchronisation Context . . . . . . . . . . . 34 13. Security Considerations . . . . . . . . . . . . . . . . . . . 35 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 35 - 15. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 36 - 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 36 - 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 36 - 17.1. Normative References . . . . . . . . . . . . . . . . . . 36 - 17.2. Informative References . . . . . . . . . . . . . . . . . 39 + 15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 36 + 16. References . . . . . . . . . . . . . . . . . . . . . . . . . 36 + 16.1. Normative References . . . . . . . . . . . . . . . . . . 36 + 16.2. Informative References . . . . . . . . . . . . . . . . . 39 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 41 1. Introduction The Real-time Transport Protocol (RTP) [RFC3550] provides a framework for delivery of audio and video teleconferencing data and other real- time media applications. Previous work has defined the RTP protocol, along with numerous profiles, payload formats, and other extensions. When combined with appropriate signalling, these form the basis for many teleconferencing systems. @@ -199,32 +197,32 @@ same RTP Session are those that share a single SSRC space. That is, those endpoints can see an SSRC identifier transmitted by any one of the other endpoints. An endpoint can see an SSRC either directly in RTP and RTCP packets, or as a contributing source (CSRC) in RTP packets from a mixer. The RTP Session scope is hence decided by the endpoints' network interconnection topology, in combination with RTP and RTCP forwarding strategies deployed by endpoints and any interconnecting middle nodes. WebRTC MediaStream: The MediaStream concept defined by the W3C in - the API. + the API [W3C.WD-mediacapture-streams-20130903]. Other terms are used according to their definitions from the RTP Specification [RFC3550]. 4. 
WebRTC Use of RTP: Core Protocols The following sections describe the core features of RTP and RTCP - that need to be implemented, along with the mandated RTP profiles and - payload formats. Also described are the core extensions providing - essential features that all WebRTC implementations need to implement - to function effectively on today's networks. + that need to be implemented, along with the mandated RTP profiles. + Also described are the core extensions providing essential features + that all WebRTC implementations need to implement to function + effectively on today's networks. 4.1. RTP and RTCP The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be implemented as the media transport protocol for WebRTC. RTP itself comprises two parts: the RTP data transfer protocol, and the RTP control protocol (RTCP). RTCP is a fundamental and integral part of RTP, and MUST be implemented in all WebRTC applications. The following RTP and RTCP features are sometimes omitted in limited @@ -252,21 +250,22 @@ o Support for multiple synchronisation contexts. Participants that send multiple simultaneous RTP media streams MAY do so as part of a single synchronisation context, using a single RTCP CNAME for all streams and allowing receivers to play the streams out in a synchronised manner, or they MAY use different synchronisation contexts, and hence different RTCP CNAMEs, for some or all of the streams. Receivers MUST support reception of multiple RTCP CNAMEs from each participant in an RTP session. See also Section 4.9. o Support for sending and receiving RTCP SR, RR, SDES, and BYE - packet types, with OPTIONAL support for other RTCP packet types; + packet types, with OPTIONAL support for other RTCP packet types + unless mandated by other parts of this specification; implementations MUST ignore unknown RTCP packet types. Note that additional RTCP Packet types are needed by the RTP/SAVPF Profile (Section 4.2) and the other RTCP extensions (Section 5). o Support for multiple end-points in a single RTP session, and for scaling the RTCP transmission interval according to the number of participants in the session; support for randomised RTCP transmission intervals to avoid synchronisation of RTCP reports; support for RTCP timer reconsideration. @@ -289,57 +288,57 @@ Secure RTP Profile for RTCP-Based Feedback (RTP/SAVPF) [RFC5124], as extended by [RFC7007], MUST be implemented. This builds on the basic RTP/AVP profile [RFC3551], the RTP profile for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure RTP profile (RTP/SAVP) [RFC3711]. The RTCP-based feedback extensions [RFC4585] are needed for the improved RTCP timer model, that allows more flexible transmission of RTCP packets in response to events, rather than strictly according to bandwidth. This is vital for being able to report congestion events. - These extensions also save RTCP bandwidth, and will commonly only use - the full RTCP bandwidth allocation if there are many events that - require feedback. They are also needed to make use of the RTP - conferencing extensions discussed in Section 5.1. + These extensions also allow saving RTCP bandwidth, and an endpoint + will commonly only use the full RTCP bandwidth allocation if there + are many events that require feedback. The timer rules are also + needed to make use of the RTP conferencing extensions discussed in + Section 5.1. 
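As a rough illustration of the timing rules referred to above, the following sketch (Python, added purely for illustration; the function and parameter names are invented) computes the baseline randomised RTCP reporting interval from Section 6.3.1 of [RFC3550], on top of which the RTP/AVPF feedback scheduling operates. It assumes the default 5% RTCP bandwidth fraction and deliberately omits the sender/receiver bandwidth split, reverse reconsideration, and the AVPF "trr-int" handling.

      import random

      def rtcp_interval(members: int, avg_rtcp_size: float,
                        session_bw: float, initial: bool) -> float:
          """Baseline RTCP reporting interval in seconds (RFC 3550).

          members       -- current estimate of the number of session members
          avg_rtcp_size -- moving average RTCP compound packet size, octets
          session_bw    -- session bandwidth, octets per second (assumed)
          initial       -- True before this endpoint's first RTCP packet
          """
          rtcp_bw = 0.05 * session_bw      # default 5% RTCP fraction
          t_min = 2.5 if initial else 5.0  # halved minimum before first packet
          t_det = max(t_min, members * avg_rtcp_size / rtcp_bw)
          # Randomise over [0.5, 1.5] of the deterministic interval to avoid
          # synchronised reports, then divide by e - 3/2 to compensate for
          # the bias this introduces (RFC 3550, Section 6.3.1).
          return t_det * random.uniform(0.5, 1.5) / 1.21828

Randomising each interval in this way is what prevents the report synchronisation mentioned in the list above, and the "trr-int" parameter discussed in the note below places a lower bound on the resulting regular reporting interval when interworking with RTP/(S)AVP endpoints.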
Note: The enhanced RTCP timer model defined in the RTP/AVPF profile is backwards compatible with legacy systems that implement - only the base RTP/AVP profile, given some constraints on parameter - configuration such as the RTCP bandwidth value and "trr-int" (the - most important factor for interworking with RTP/AVP end-points via - a gateway is to set the trr-int parameter to a value representing - 4 seconds). + only the RTP/AVP or RTP/SAVP profile, given some constraints on + parameter configuration such as the RTCP bandwidth value and "trr- + int" (the most important factor for interworking with RTP/(S)AVP + end-points via a gateway is to set the trr-int parameter to a + value representing 4 seconds). - The secure RTP profile [RFC3711] is needed to provide media + The secure RTP (SRTP) profile [RFC3711] is needed to provide media encryption, integrity protection, replay protection and a limited form of source authentication. WebRTC implementations MUST NOT send packets using the basic RTP/AVP profile or the RTP/AVPF profile; they MUST employ the full RTP/SAVPF profile to protect all RTP and RTCP - packets that are generated. The default and mandatory to implement - transforms listed in Section 5 of [RFC3711] SHALL apply. - - The keying mechanism(s) to be used with the RTP/SAVPF profile are - defined in Section 5.5 of [I-D.ietf-rtcweb-security-arch] or its - replacement. + packets that are generated (i.e., implementations MUST use SRTP and + SRTCP). The RTP/SAVPF profile MUST be configured using the cipher + suites, DTLS-SRTP protection profiles, keying mechanisms, and other + parameters described in [I-D.ietf-rtcweb-security-arch]. 4.3. Choice of RTP Payload Formats The set of mandatory to implement codecs and RTP payload formats for - WebRTC is not specified in this memo. Implementations can support - any codec for which an RTP payload format and associated signalling - is defined. Implementation cannot assume that the other participants - in an RTP session understand any RTP payload format, no matter how - common; the mapping between RTP payload type numbers and specific - configurations of particular RTP payload formats MUST be agreed - before those payload types/formats can be used. In an SDP context, - this can be done using the "a=rtpmap:" and "a=fmtp:" attributes - associated with an "m=" line. + WebRTC is not specified in this memo, instead they are defined in + separate specifications, such as [I-D.ietf-rtcweb-audio]. + Implementations can support any codec for which an RTP payload format + and associated signalling is defined. Implementation cannot assume + that the other participants in an RTP session understand any RTP + payload format, no matter how common; the mapping between RTP payload + type numbers and specific configurations of particular RTP payload + formats MUST be agreed before those payload types/formats can be + used. In an SDP context, this can be done using the "a=rtpmap:" and + "a=fmtp:" attributes associated with an "m=" line. Endpoints can signal support for multiple RTP payload formats, or multiple configurations of a single RTP payload format, as long as each unique RTP payload format configuration uses a different RTP payload type number. As outlined in Section 4.8, the RTP payload type number is sometimes used to associate an RTP media stream with a signalling context. This association is possible provided unique RTP payload type numbers are used in each context. 
For example, an RTP media stream can be associated with an SDP "m=" line by comparing the RTP payload type numbers used by the media stream with payload types @@ -395,29 +394,20 @@ session, which will comprise a single transport-layer flow (this will prevent the use of some quality-of-service mechanisms, as discussed in Section 12.1.3). Implementations are REQUIRED to support transport of all RTP media streams, independent of media type, in a single RTP session according to [I-D.ietf-avtcore-multi-media-rtp-session]. If multiple types of media are to be used in a single RTP session, all participants in that session MUST agree to this usage. In an SDP context, [I-D.ietf-mmusic-sdp-bundle-negotiation] can be used to signal this. - It is also possible to use a shim-based approach to run multiple RTP - sessions on a single transport-layer flow. This gives advantages in - some gateway scenarios, and makes it easy to distinguish groups of - RTP media streams that might need distinct processing. One way of - doing this is described in - [I-D.westerlund-avtcore-transport-multiplexing]. At the time of this - writing, there is no consensus to use a shim-based approach in WebRTC - implementations. - Further discussion about when different RTP session structures and multiplexing methods are suitable can be found in [I-D.ietf-avtcore-multiplex-guidelines]. 4.5. RTP and RTCP Multiplexing Historically, RTP and RTCP have been run on separate transport layer addresses (e.g., two UDP ports for each RTP session, one port for RTP and one port for RTCP). With the increased use of Network Address/ Port Translation (NAPT) this has become problematic, since @@ -480,31 +470,33 @@ Use of the "a=ssrc:" attribute to signal SSRC identifiers in an RTP session is OPTIONAL. Implementations MUST be prepared to accept RTP and RTCP packets using SSRCs that have not been explicitly signalled ahead of time. Implementations MUST support random SSRC assignment, and MUST support SSRC collision detection and resolution, according to [RFC3550]. When using signalled SSRC values, collision detection MUST be performed as described in Section 5 of [RFC5576]. It is often desirable to associate an RTP media stream with a non-RTP - context (e.g., to associate an RTP media stream with an "m=" line in - a session description formatted using SDP). If SSRCs are signalled - this is straightforward (in SDP the "a=ssrc:" line will be at the - media level, allowing a direct association with an "m=" line). If - SSRCs are not signalled, the RTP payload type numbers used in an RTP - media stream are often sufficient to associate that media stream with - a signalling context (e.g., if RTP payload type numbers are assigned - as described in Section 4.3 of this memo, the RTP payload types used - by an RTP media stream can be compared with values in SDP "a=rtpmap:" - lines, which are at the media level in SDP, and so map to an "m=" - line). + context. For users of the WebRTC API a mapping between SSRCs and + MediaStreamTracks are provided per Section 11. For gateways or other + usages it is possible to associate an RTP media stream with an "m=" + line in a session description formatted using SDP. If SSRCs are + signalled this is straightforward (in SDP the "a=ssrc:" line will be + at the media level, allowing a direct association with an "m=" line). 
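To illustrate the straightforward case just described, the sketch below (Python, invented function name, not a complete SDP parser) maps SSRC values declared with media-level "a=ssrc:" attributes to the index of the enclosing "m=" line; the example session description is fabricated and heavily abbreviated. The case where SSRCs are not signalled is covered next.

      def ssrc_to_mline(sdp: str) -> dict:
          """Map signalled SSRCs to the index of their "m=" line."""
          mapping = {}
          mline_index = -1              # -1 while still in the session part
          for raw in sdp.splitlines():
              line = raw.strip()
              if line.startswith("m="):
                  mline_index += 1
              elif line.startswith("a=ssrc:") and mline_index >= 0:
                  ssrc = int(line[len("a=ssrc:"):].split()[0])
                  mapping[ssrc] = mline_index
          return mapping

      example_sdp = "\n".join([
          "m=audio 49170 UDP/TLS/RTP/SAVPF 109",
          "a=ssrc:314159 cname:user@example.com",
          "m=video 49172 UDP/TLS/RTP/SAVPF 120",
          "a=ssrc:271828 cname:user@example.com",
      ])
      assert ssrc_to_mline(example_sdp) == {314159: 0, 271828: 1}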
+ If SSRCs are not signalled, the RTP payload type numbers used in an + RTP media stream are often sufficient to associate that media stream + with a signalling context (e.g., if RTP payload type numbers are + assigned as described in Section 4.3 of this memo, the RTP payload + types used by an RTP media stream can be compared with values in SDP + "a=rtpmap:" lines, which are at the media level in SDP, and so map to + an "m=" line). 4.9. Generation of the RTCP Canonical Name (CNAME) The RTCP Canonical Name (CNAME) provides a persistent transport-level identifier for an RTP endpoint. While the Synchronisation Source (SSRC) identifier for an RTP endpoint can change if a collision is detected, or when the RTP application is restarted, its RTCP CNAME is meant to stay unchanged, so that RTP endpoints can be uniquely identified and associated with their RTP media streams within a set of related RTP sessions. For proper functionality, each RTP endpoint @@ -570,22 +562,22 @@ mandated in this memo are implemented). The RTP extensions described in Section 5.1.1 to Section 5.1.6 are designed to be used with centralised conferencing, where an RTP middlebox (e.g., a conference bridge) receives a participant's RTP media streams and distributes them to the other participants. These extensions are not necessary for interoperability; an RTP endpoint that does not implement these extensions will work correctly, but might offer poor performance. Support for the listed extensions will greatly improve the quality of experience and, to provide a - reasonable baseline quality, some these extensions are mandatory to - be supported by WebRTC end-points. + reasonable baseline quality, some of these extensions are mandatory + to be supported by WebRTC end-points. The RTCP conferencing extensions are defined in Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/ AVPF) [RFC4585] and the "Codec Control Messages in the RTP Audio- Visual Profile with Feedback (AVPF)" (CCM) [RFC5104] and are fully usable by the Secure variant of this profile (RTP/SAVPF) [RFC5124]. 5.1.1. Full Intra Request (FIR) The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of the @@ -606,32 +598,41 @@ repaired somehow. This is semantically different from the Full Intra Request above as there could be multiple ways to fulfil the request. WebRTC senders MUST understand and react to this feedback message as a loss tolerance mechanism; receivers MAY send PLI messages. 5.1.3. Slice Loss Indication (SLI) The Slice Loss Indicator is defined in Section 6.3.2 of the RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the encoder that it has detected the loss or corruption of one or more consecutive - macro blocks, and would like to have these repaired somehow. Support - for this feedback message is OPTIONAL as a loss tolerance mechanism. + macro blocks, and would like to have these repaired somehow. It is + RECOMMENDED that receivers generate SLI feedback messages if slices + are lost when using a codec that supports the concept of macro + blocks. A sender that receives an SLI feedback message SHOULD + attempt to repair the lost slice(s). 5.1.4. Reference Picture Selection Indication (RPSI) - Reference Picture Selection Indication (RPSI) is defined in - Section 6.3.3 of the RTP/AVPF profile [RFC4585]. Some video coding + + Reference Picture Selection Indication (RPSI) messages are defined in + Section 6.3.3 of the RTP/AVPF profile [RFC4585]. 
Some video encoding standards allow the use of older reference pictures than the most - recent one for predictive coding. If such a codec is in used, and if - the encoder has learned about a loss of encoder-decoder - synchronisation, a known-as-correct reference picture can be used for - future coding. The RPSI message allows this to be signalled. - Support for RPSI messages is OPTIONAL. + recent one for predictive coding. If such a codec is in use, and if + the encoder has learnt that encoder-decoder synchronisation has been + lost, then a known-as-correct reference picture can be used as a base + for future coding. The RPSI message allows this to be signalled. + Receivers that detect that encoder-decoder synchronisation has been + lost SHOULD generate an RPSI feedback message if the codec being used + supports reference picture selection. An RTP media stream sender that + receives such an RPSI message SHOULD act on that message to change + the reference picture, if it is possible to do so within the + available bandwidth constraints. 5.1.5. Temporal-Spatial Trade-off Request (TSTR) The temporal-spatial trade-off request and notification are defined in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used to ask the video encoder to change the trade-off it makes between temporal and spatial resolution, for example to prefer high spatial image quality but low frame rate. Support for TSTR requests and notifications is OPTIONAL. @@ -703,29 +704,20 @@ The Mixer to Client Audio Level header extension [RFC6465] provides the client with the audio level of the different sources mixed into a common mix by a RTP mixer. This enables a user interface to indicate the relative activity level of each session participant, rather than just being included or not based on the CSRC field. This is a pure optimisations of non critical functions, and is hence OPTIONAL to implement. If it is implemented, it is REQUIRED that the header extensions are encrypted according to [RFC6904] since the information contained in these header extensions can be considered sensitive. -5.2.4. Associating RTP Media Streams and Signalling Contexts - (tbd: it seems likely that we need a mechanism to associate RTP media - streams with signalling contexts. The mechanism by which this is - done will likely be some combination of an RTP header extension, - periodic transmission of a new RTCP SDES item, and some signalling - extension. The semantics of those items are not yet settled; see - draft-westerlund-avtext-rtcp-sdes-srcname, draft-ietf-mmusic-msid, - and draft-even-mmusic-application-token for discussion). 6. WebRTC Use of RTP: Improving Transport Robustness There are tools that can make RTP media streams robust against packet loss and reduce the impact of loss on media quality. However, they all add extra bits compared to a non-robust stream. The overhead of these extra bits needs to be considered, and the aggregate bit-rate MUST be rate controlled to avoid causing network congestion (see Section 7). As a result, improving robustness might require a lower base encoding quality, but has the potential to deliver that quality with fewer errors. The mechanisms described in the following sub- @@ -734,29 +726,29 @@ 6.1. Negative Acknowledgements and RTP Retransmission As a consequence of supporting the RTP/SAVPF profile, implementations can support negative acknowledgements (NACKs) for RTP data packets [RFC4585].
This feedback can be used to inform a sender of the loss of particular RTP packets, subject to the capacity limitations of the RTCP feedback channel. A sender can use this information to optimise the user experience by adapting the media encoding to compensate for known lost packets, for example. - Senders are REQUIRED to understand the Generic NACK message defined - in Section 6.2.1 of [RFC4585], but MAY choose to ignore this feedback - (following Section 4.2 of [RFC4585]). Receivers MAY send NACKs for - missing RTP packets; [RFC4585] provides some guidelines on when to - send NACKs. It is not expected that a receiver will send a NACK for - every lost RTP packet, rather it needs to consider the cost of - sending NACK feedback, and the importance of the lost packet, to make - an informed decision on whether it is worth telling the sender about - a packet loss event. + RTP Media Stream Senders are REQUIRED to understand the Generic NACK + message defined in Section 6.2.1 of [RFC4585], but MAY choose to + ignore this feedback (following Section 4.2 of [RFC4585]). Receivers + MAY send NACKs for missing RTP packets; [RFC4585] provides some + guidelines on when to send NACKs. It is not expected that a receiver + will send a NACK for every lost RTP packet, rather it needs to + consider the cost of sending NACK feedback, and the importance of the + lost packet, to make an informed decision on whether it is worth + telling the sender about a packet loss event. The RTP Retransmission Payload Format [RFC4588] offers the ability to retransmit lost packets based on NACK feedback. Retransmission needs to be used with care in interactive real-time applications to ensure that the retransmitted packet arrives in time to be useful, but can be effective in environments with relatively low network RTT (an RTP sender can estimate the RTT to the receivers using the information in RTCP SR and RR packets, as described at the end of Section 6.4.1 of [RFC3550]). The use of retransmissions can also increase the forward RTP bandwidth, and can potentially worsen the problem if the packet @@ -825,21 +818,21 @@ limiting factor on the capacity of the network path might be the link bandwidth, or it might be competition with other traffic on the link (this can be non-WebRTC traffic, traffic due to other WebRTC flows, or even competition with other WebRTC flows in the same session). An effective media congestion control algorithm is therefore an essential part of the WebRTC framework. However, at the time of this writing, there is no standard congestion control algorithm that can be used for interactive media applications such as WebRTC flows. Some requirements for congestion control algorithms for WebRTC - sessions are discussed in [I-D.jesup-rtp-congestion-reqs], and it is + sessions are discussed in [I-D.ietf-rmcat-cc-requirements], and it is expected that a future version of this memo will mandate the use of a congestion control algorithm that satisfies these requirements. 7.1. Boundary Conditions and Circuit Breakers In the absence of a concrete congestion control algorithm, all WebRTC implementations MUST implement the RTP circuit breaker algorithm that is in described [I-D.ietf-avtcore-rtp-circuit-breakers]. The RTP circuit breaker is designed to enable applications to recognise and react to situations of extreme network congestion. 
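The round-trip time estimate mentioned above, which informs both the decision whether a retransmission can arrive in time and the congestion checks made by the RTP circuit breaker, is derived from the LSR and DLSR fields of an RTCP report block as described in Section 6.4.1 of [RFC3550]. A minimal sketch follows (Python, invented function name), assuming the report block has been parsed and the arrival time has already been converted to the middle 32 bits of the NTP timestamp format.

      def rtt_from_report_block(arrival_ntp32: int, lsr: int,
                                dlsr: int) -> float:
          """Round-trip time in seconds from one RTCP report block.

          arrival_ntp32 -- arrival time of the report, middle 32 bits of NTP
          lsr           -- "last SR" field of the report block
          dlsr          -- "delay since last SR" field, 1/65536 second units
          """
          if lsr == 0:                  # the peer has not yet received an SR
              return float("nan")
          # All three values are 16.16 fixed-point seconds, so the
          # subtraction can be done modulo 2**32 and scaled at the end.
          return ((arrival_ntp32 - lsr - dlsr) & 0xFFFFFFFF) / 65536.0

A single such sample is noisy; the circuit breaker works on the statistics gathered over several RTCP reporting intervals rather than on one report.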
However, since @@ -866,25 +859,25 @@ Experience with the congestion control algorithms of TCP [RFC5681], TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown that feedback on packet arrivals needs to be sent roughly once per round trip time. We note that the real-time media traffic might not have to adapt to changing path conditions as rapidly as needed for the elastic applications TCP was designed for, but frequent feedback is still needed to allow the congestion control algorithm to track the path dynamics. - The total RTCP bandwidth is limited in its transmission rate to a - fraction of the RTP traffic (by default 5%). RTCP packets are larger - than, e.g., TCP ACKs (even when non-compound RTCP packets are used). - The RTP media stream bit rate thus limits the maximum feedback rate - as a function of the mean RTCP packet size. + The total RTCP bandwidth is normally limited in its transmission rate + to a fraction of the nominal RTP traffic (by default 5%). RTCP + packets are larger than, e.g., TCP ACKs (even when non-compound RTCP + packets are used). The RTP media stream bit rate thus limits the + maximum feedback rate as a function of the mean RTCP packet size. Interactive communication might not be able to afford waiting for packet losses to occur to indicate congestion, because an increase in play out delay due to queuing (most prominent in wireless networks) can easily lead to packets being dropped due to late arrival at the receiver. Therefore, more sophisticated cues might need to be reported -- to be defined in a suitable congestion control framework as noted above -- which, in turn, increase the report size again. For example, different RTCP XR report blocks (jointly) provide the necessary details to implement a variety of congestion control @@ -963,21 +956,22 @@ packets, whether or not they were signalled. There is no requirement that the data contained in such reports be used, or exposed to the Javascript application, however. 9. WebRTC Use of RTP: Future Extensions It is possible that the core set of RTP protocols and RTP extensions specified in this memo will prove insufficient for the future needs of WebRTC applications. In this case, future updates to this memo MUST be made following the Guidelines for Writers of RTP Payload - Format Specifications [RFC2736] and Guidelines for Extending the RTP + Format Specifications [RFC2736], How to Write an RTP Payload Format + [I-D.ietf-payload-rtp-howto] and Guidelines for Extending the RTP Control Protocol [RFC5968], and SHOULD take into account any future guidelines for extending RTP and related protocols that have been developed. Authors of future extensions are urged to consider the wide range of environments in which RTP is used when recommending extensions, since extensions that are applicable in some scenarios can be problematic in others. Where possible, the WebRTC framework will adopt RTP extensions that are of general utility, to enable easy implementation of a gateway to other applications using RTP, rather than adopt @@ -1003,23 +997,21 @@ indicating to the WebRTC end-point that the RTP/SAVPF is used, and limiting the usage of the "a=rtcp:" attribute to indicate a trr- int value of 4 seconds. Transport Information: Source and destination IP address(s) and ports for RTP and RTCP MUST be signalled for each RTP session. In WebRTC these transport addresses will be provided by ICE that signals candidates and arrives at nominated candidate address pairs. 
If RTP and RTCP multiplexing [RFC5761] is to be used, such that a single port is used for RTP and RTCP flows, this MUST be - signalled (see Section 4.5). If several RTP sessions are to be - multiplexed onto a single transport layer flow, this MUST also be - signalled (see Section 4.4). + signalled (see Section 4.5). RTP Payload Types, media formats, and format parameters: The mapping between media type names (and hence the RTP payload formats to be used), and the RTP payload type numbers MUST be signalled. Each media type MAY also have a number of media type parameters that MUST also be signalled to configure the codec and RTP payload format (the "a=fmtp:" line from SDP). Section 4.3 of this memo discusses requirements for uniqueness of payload types. RTP Extensions: The RTP extensions to be used SHOULD be agreed upon, @@ -1048,58 +1040,49 @@ 11. WebRTC API Considerations The WebRTC API [W3C.WD-webrtc-20130910] and the Media Capture and Streams API [W3C.WD-mediacapture-streams-20130903] defines and uses the concept of a MediaStream that consists of zero or more MediaStreamTracks. A MediaStreamTrack is an individual stream of media from any type of media source like a microphone or a camera, but also conceptual sources, like a audio mix or a video composition, are possible. The MediaStreamTracks within a MediaStream need to be - possible to play out synchronised. + possible to play out synchronised. The below text uses the + terminology from [I-D.ietf-avtext-rtp-grouping-taxonomy]. A MediaStreamTrack's realisation in RTP in the context of an RTCPeerConnection consists of a source packet stream identified with an SSRC within an RTP session part of the RTCPeerConnection. The MediaStreamTrack can also result in additional packet streams, and thus SSRCs, in the same RTP session. These can be dependent packet streams from scalable encoding of the source stream associated with the MediaStreamTrack, if such a media encoder is used. They can also be redundancy packet streams, these are created when applying Forward Error Correction (Section 6.2) or RTP retransmission (Section 6.1) to the source packet stream. - Note: It is quite likely that a simulcast specification will - result in multiple source packet streams, and thus SSRCs, based on - the same source stream associated with the MediaStreamTrack being - simulcasted. Each such source packet stream can have dependent - and redundant packet streams associated with them. However, the - final conclusion on this awaits the specification of simulcast. - Simulcast will also require signalling to correctly separate and - associate the source packet streams with their sets of dependent - and/or redundant streams. - It is important to note that the same media source can be feeding multiple MediaStreamTracks. As different sets of constraints or other parameters can be applied to the MediaStreamTrack, each MediaStreamTrack instance added to a RTCPeerConnection SHALL result in an independent source packet stream, with its own set of associated packet streams, and thus different SSRC(s). It will depend on applied constraints and parameters if the source stream and the encoding configuration will be identical between different MediaStreamTracks sharing the same media source. Thus it is possible for multiple source packet streams to share encoded streams (but not packet streams), but this is an implementation choice to try to utilise such optimisations. 
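The relationship just described, an independent source packet stream and hence a distinct SSRC for every MediaStreamTrack instance added to an RTCPeerConnection, while the endpoint keeps a single persistent CNAME (Section 4.9), can be pictured with the following sketch. The class and method names are invented for illustration; signalled SSRCs, collision handling with remote SSRCs, and dependent or redundancy packet streams are all glossed over, and the 96-bit random CNAME only loosely follows the per-session generation recommended by [RFC7022].

      import secrets
      from dataclasses import dataclass, field

      @dataclass
      class LocalRtpEndpoint:
          """Per-track SSRC allocation with one CNAME for the endpoint."""
          cname: str = field(default_factory=lambda: secrets.token_urlsafe(12))
          used_ssrcs: set = field(default_factory=set)

          def add_track(self) -> int:
              """Allocate a fresh random SSRC for a newly added track."""
              while True:
                  ssrc = secrets.randbits(32)
                  if ssrc not in self.used_ssrcs:
                      self.used_ssrcs.add(ssrc)
                      return ssrc

      endpoint = LocalRtpEndpoint()
      first = endpoint.add_track()    # adding the same media source twice
      second = endpoint.add_track()   # still yields two independent streams
      assert first != second          # distinct SSRCs, shared endpoint.cname

Whether two such tracks share an encoded stream underneath, as discussed above, is invisible at this level: the packet streams and their SSRCs remain separate in any case.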
Note that such optimizations would need to take into account that the constraints for one of the MediaStreamTracks can at any moment change, meaning that the encoding - configurations should no longer be identical. + configurations might no longer be identical. The same MediaStreamTrack can also be included in multiple MediaStreams, thus multiple sets of MediaStreams can implicitly need to use the same synchronisation base. To ensure that this works in all cases, and don't forces a endpoint to change synchronisation base and CNAME in the middle of a ongoing delivery of any packet streams, which would cause media disruption; all MediaStreamTracks and their associated SSRCs originating from the same endpoint MUST be sent using the same CNAME within one RTCPeerConnection as well as across all RTCPeerConnections part of the same communication session @@ -1151,20 +1134,26 @@ the SSRC is done as specified in "Cross Session Stream Identification in the Session Description Protocol" [I-D.ietf-mmusic-msid]. This document [I-D.ietf-mmusic-msid] also defines, in section 4.1, how to map unknown source packet stream SSRCs to MediaStreamTracks and MediaStreams. Commonly the RTP Payload Type of any incoming packets will reveal if the packet stream is a source stream or a redundancy or dependent packet stream. The association to the correct source packet stream depends on the payload format in use for the packet stream. + Finally this specification puts a requirement on the WebRTC API to + realize a method for determining the CSRC list (Section 4.1) as well + as the Mixer-to-Client audio levels (Section 5.2.3) (when supported) + and the basic requirements for this is further discussed in + Section 12.2.1. + 12. RTP Implementation Considerations The following discussion provides some guidance on the implementation of the RTP features described in this memo. The focus is on a WebRTC end-point implementation perspective, and while some mention is made of the behaviour of middleboxes, that is not the focus of this memo. 12.1. Configuration and Use of RTP Sessions A WebRTC end-point will be a simultaneous participant in one or more @@ -1253,32 +1242,32 @@ To separate media with different purposes: An end-point might want to send media streams that have different purposes on different RTP sessions, to make it easy for the peer device to distinguish them. For example, some centralised multiparty conferencing systems display the active speaker in high resolution, but show low resolution "thumbnails" of other participants. Such systems might configure the end-points to send simulcast high- and low- resolution versions of their video using separate RTP sessions, to simplify the operation of the central mixer. In the WebRTC - context this appears to be most easily accomplished by - establishing multiple RTCPeerConnection all being feed the same - set of WebRTC MediaStreams. Each RTCPeerConnection is then + context this is currently possible to accomplished by establishing + multiple WebRTC MediaStreamTracks that have the same media source + in one (or more) RTCPeerConnection. Each MediaStreamTrack is then configured to deliver a particular media quality and thus media bit-rate, and will produce an independently encoded version with the codec parameters agreed specifically in the context of that - RTCPeerConnection. The central mixer can always distinguish - packets corresponding to the low- and high-resolution streams by + RTCPeerConnection. 
The central mixer can distinguish packets + corresponding to the low- and high-resolution streams by inspecting their SSRC, RTP payload type, or some other information - contained in RTP header extensions or RTCP packets, but it can be - easier to distinguish the flows if they arrive on separate RTP - sessions on separate UDP ports. + contained in RTP payload, RTP header extension or RTCP packets, + but it can be easier to distinguish the flows if they arrive on + separate RTP sessions on separate UDP ports. To directly connect with multiple peers: A multi-party conference does not need to use a central mixer. Rather, a multi-unicast mesh can be created, comprising several distinct RTP sessions, with each participant sending RTP traffic over a separate RTP session (that is, using an independent RTCPeerConnection object) to every other participant, as shown in Figure 1. This topology has the benefit of not requiring a central mixer node that is trusted to access and manipulate the media data. The downside is that it increases the used bandwidth at each sender by requiring @@ -1431,32 +1420,32 @@ 12.1.3. Differentiated Treatment of Flows There are use cases for differentiated treatment of RTP media streams. Such differentiation can happen at several places in the system. First of all is the prioritization within the end-point sending the media, which controls, both which RTP media streams that will be sent, and their allocation of bit-rate out of the current available aggregate as determined by the congestion control. - It is expected that the WebRTC API will allow the application to - indicate relative priorities for different MediaStreamTracks. These - priorities can then be used to influence the local RTP processing, - especially when it comes to congestion control response in how to - divide the available bandwidth between the RTP flows. Any changes in - relative priority will also need to be considered for RTP flows that - are associated with the main RTP flows, such as RTP retransmission - streams and FEC. The importance of such associated RTP traffic flows - is dependent on the media type and codec used, in regards to how - robust that codec is to packet loss. However, a default policy might - to be to use the same priority for associated RTP flows as for the - primary RTP flow. + It is expected that the WebRTC API [W3C.WD-webrtc-20130910] will + allow the application to indicate relative priorities for different + MediaStreamTracks. These priorities can then be used to influence + the local RTP processing, especially when it comes to congestion + control response in how to divide the available bandwidth between the + RTP flows. Any changes in relative priority will also need to be + considered for RTP flows that are associated with the main RTP flows, + such as RTP retransmission streams and FEC. The importance of such + associated RTP traffic flows is dependent on the media type and codec + used, in regards to how robust that codec is to packet loss. + However, a default policy might to be to use the same priority for + associated RTP flows as for the primary RTP flow. Secondly, the network can prioritize packet flows, including RTP media streams. Typically, differential treatment includes two steps, the first being identifying whether an IP packet belongs to a class that has to be treated differently, the second the actual mechanism to prioritize packets. 
This is done according to three methods: DiffServ: The end-point marks a packet with a DiffServ code point to indicate to the network that the packet belongs to a particular class. @@ -1487,24 +1476,28 @@ use them on some set of RTP media streams. 2) The information needs to be propagated to the operating system when transmitting the packet. Details of this process are outside the scope of this memo and are further discussed in "DSCP and other packet markings for RTCWeb QoS" [I-D.dhesikan-tsvwg-rtcweb-qos]. For packet based marking schemes it might be possible to mark individual RTP packets differently based on the relative priority of the RTP payload. For example video codecs that have I, P, and B pictures could prioritise any payloads carrying only B frames less, - as these are less damaging to loose. As default policy all RTP - packets related to a media stream ought to be provided with the same - prioritization; per-packet prioritization is outside the scope of - this memo, but might be specified elsewhere in future. + as these are less damaging to loose. However, depending on the QoS + mechanism and what markings that are applied, this can result in not + only different packet drop probabilities but also packet reordering, + see [I-D.dhesikan-tsvwg-rtcweb-qos] for further discussion. As + default policy all RTP packets related to a media stream ought to be + provided with the same prioritization; per-packet prioritization is + outside the scope of this memo, but might be specified elsewhere in + future. It is also important to consider how RTCP packets associated with a particular RTP media flow need to be marked. RTCP compound packets with Sender Reports (SR), ought to be marked with the same priority as the RTP media flow itself, so the RTCP-based round-trip time (RTT) measurements are done using the same flow priority as the media flow experiences. RTCP compound packets containing RR packet ought to be sent with the priority used by the majority of the RTP media flows reported on. RTCP packets containing time-critical feedback packets can use higher priority to improve the timeliness and likelihood of @@ -1649,84 +1642,68 @@ extensions (Section 5.2.2) or the mixer-to-client audio level header extensions (Section 5.2.3). 14. IANA Considerations This memo makes no request of IANA. Note to RFC Editor: this section is to be removed on publication as an RFC. -15. Open Issues - - This section contains a summary of the open issues or to be done - things noted in the document: - - 1. tbd: The discussion at IETF 88 confirmed that there is broad - agreement to support simulcast, however the method for achieving - simulcast of a media source has to be decided. - -16. Acknowledgements +15. Acknowledgements The authors would like to thank Bernard Aboba, Harald Alvestrand, Cary Bran, Charles Eckel, Cullen Jennings, Dan Romascanu, and the other members of the IETF RTCWEB working group for their valuable feedback. -17. References +16. References -17.1. Normative References +16.1. Normative References [I-D.ietf-avtcore-multi-media-rtp-session] Westerlund, M., Perkins, C., and J. Lennox, "Sending Multiple Types of Media in a Single RTP Session", draft- - ietf-avtcore-multi-media-rtp-session-03 (work in - progress), July 2013. + ietf-avtcore-multi-media-rtp-session-04 (work in + progress), January 2014. [I-D.ietf-avtcore-rtp-circuit-breakers] Perkins, C. and V. 
Singh, "Multimedia Congestion Control: Circuit Breakers for Unicast RTP Sessions", draft-ietf- - avtcore-rtp-circuit-breakers-03 (work in progress), July - 2013. + avtcore-rtp-circuit-breakers-04 (work in progress), + January 2014. [I-D.ietf-avtcore-rtp-multi-stream-optimisation] - Lennox, J., Westerlund, M., Wu, W., and C. Perkins, + Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, "Sending Multiple Media Streams in a Single RTP Session: Grouping RTCP Reception Statistics and Other Feedback", draft-ietf-avtcore-rtp-multi-stream-optimisation-00 (work in progress), July 2013. [I-D.ietf-avtcore-rtp-multi-stream] Lennox, J., Westerlund, M., Wu, W., and C. Perkins, "Sending Multiple Media Streams in a Single RTP Session", - draft-ietf-avtcore-rtp-multi-stream-01 (work in progress), - July 2013. + draft-ietf-avtcore-rtp-multi-stream-02 (work in progress), + January 2014. [I-D.ietf-avtext-multiple-clock-rates] Petit-Huguenin, M. and G. Zorn, "Support for Multiple Clock Rates in an RTP Session", draft-ietf-avtext- - multiple-clock-rates-11 (work in progress), November - 2013. - - [I-D.ietf-mmusic-sdp-bundle-negotiation] - Holmberg, C., Alvestrand, H., and C. Jennings, - "Multiplexing Negotiation Using Session Description - Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- - bundle-negotiation-05 (work in progress), October 2013. + multiple-clock-rates-11 (work in progress), November 2013. [I-D.ietf-rtcweb-security-arch] Rescorla, E., "WebRTC Security Architecture", draft-ietf- - rtcweb-security-arch-07 (work in progress), July 2013. + rtcweb-security-arch-08 (work in progress), January 2014. [I-D.ietf-rtcweb-security] Rescorla, E., "Security Considerations for WebRTC", draft- - ietf-rtcweb-security-05 (work in progress), July 2013. + ietf-rtcweb-security-06 (work in progress), January 2014. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP Payload Format Specifications", BCP 36, RFC 2736, December 1999. [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time @@ -1803,74 +1780,91 @@ August 2013. [RFC7022] Begen, A., Perkins, C., Wing, D., and E. Rescorla, "Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)", RFC 7022, September 2013. [W3C.WD-mediacapture-streams-20130903] Burnett, D., Bergkvist, A., Jennings, C., and A. Narayanan, "Media Capture and Streams", World Wide Web Consortium WD WD-mediacapture-streams-20130903, September - 2013, . + 2013, . [W3C.WD-webrtc-20130910] Bergkvist, A., Burnett, D., Jennings, C., and A. Narayanan, "WebRTC 1.0: Real-time Communication Between Browsers", World Wide Web Consortium WD WD- webrtc-20130910, September 2013, . -17.2. Informative References +16.2. Informative References [I-D.dhesikan-tsvwg-rtcweb-qos] Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP and other packet markings for RTCWeb QoS", draft-dhesikan- - tsvwg-rtcweb-qos-03 (work in progress), December 2013. + tsvwg-rtcweb-qos-04 (work in progress), January 2014. [I-D.ietf-avtcore-multiplex-guidelines] Westerlund, M., Perkins, C., and H. Alvestrand, "Guidelines for using the Multiplexing Features of RTP to Support Multiple Media Streams", draft-ietf-avtcore- - multiplex-guidelines-01 (work in progress), July 2013. + multiplex-guidelines-02 (work in progress), January 2014. [I-D.ietf-avtcore-rtp-topologies-update] Westerlund, M. and S. 
Wenger, "RTP Topologies", draft- ietf-avtcore-rtp-topologies-update-01 (work in progress), October 2013. + [I-D.ietf-avtext-rtp-grouping-taxonomy] + Lennox, J., Gross, K., Nandakumar, S., and G. Salgueiro, + "A Taxonomy of Grouping Semantics and Mechanisms for Real- + Time Transport Protocol (RTP) Sources", draft-ietf-avtext- + rtp-grouping-taxonomy-00 (work in progress), November + 2013. + [I-D.ietf-mmusic-msid] - Alvestrand, H., "Cross Session Stream Identification in - the Session Description Protocol", draft-ietf-mmusic- - msid-02 (work in progress), November 2013. + Alvestrand, H., "WebRTC MediaStream Identification in the + Session Description Protocol", draft-ietf-mmusic-msid-04 + (work in progress), February 2014. + + [I-D.ietf-mmusic-sdp-bundle-negotiation] + Holmberg, C., Alvestrand, H., and C. Jennings, + "Multiplexing Negotiation Using Session Description + Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- + bundle-negotiation-05 (work in progress), October 2013. + + [I-D.ietf-payload-rtp-howto] + Westerlund, M., "How to Write an RTP Payload Format", + draft-ietf-payload-rtp-howto-13 (work in progress), + January 2014. + + [I-D.ietf-rmcat-cc-requirements] + Jesup, R., "Congestion Control Requirements For RMCAT", + draft-ietf-rmcat-cc-requirements-02 (work in progress), + February 2014. + + [I-D.ietf-rtcweb-audio] + Valin, J. and C. Bran, "WebRTC Audio Codec and Processing + Requirements", draft-ietf-rtcweb-audio-05 (work in + progress), February 2014. [I-D.ietf-rtcweb-overview] Alvestrand, H., "Overview: Real Time Protocols for Brower- based Applications", draft-ietf-rtcweb-overview-08 (work in progress), September 2013. [I-D.ietf-rtcweb-use-cases-and-requirements] Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- Time Communication Use-cases and Requirements", draft- - ietf-rtcweb-use-cases-and-requirements-12 (work in - progress), October 2013. - - [I-D.jesup-rtp-congestion-reqs] - Jesup, R. and H. Alvestrand, "Congestion Control - Requirements For Real Time Media", draft-jesup-rtp- - congestion-reqs-00 (work in progress), March 2012. - - [I-D.westerlund-avtcore-transport-multiplexing] - Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP - Sessions onto a Single Lower-Layer Transport", draft- - westerlund-avtcore-transport-multiplexing-07 (work in - progress), October 2013. + ietf-rtcweb-use-cases-and-requirements-14 (work in + progress), February 2014. [RFC3611] Friedman, T., Caceres, R., and A. Clark, "RTP Control Protocol Extended Reports (RTCP XR)", RFC 3611, November 2003. [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 2: TCP-like Congestion Control", RFC 4341, March 2006. [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for