--- 1/draft-ietf-rtcweb-jsep-20.txt 2017-07-03 17:13:27.067353331 -0700 +++ 2/draft-ietf-rtcweb-jsep-21.txt 2017-07-03 17:13:27.279358413 -0700 @@ -1,186 +1,186 @@ Network Working Group J. Uberti Internet-Draft Google Intended status: Standards Track C. Jennings -Expires: September 30, 2017 Cisco +Expires: January 4, 2018 Cisco E. Rescorla, Ed. Mozilla - March 29, 2017 + July 3, 2017 - Javascript Session Establishment Protocol - draft-ietf-rtcweb-jsep-20 + JavaScript Session Establishment Protocol + draft-ietf-rtcweb-jsep-21 Abstract - This document describes the mechanisms for allowing a Javascript + This document describes the mechanisms for allowing a JavaScript application to control the signaling plane of a multimedia session via the interface specified in the W3C RTCPeerConnection API, and discusses how this relates to existing signaling protocols. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on September 30, 2017. + This Internet-Draft will expire on January 4, 2018. Copyright Notice Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. 
Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
 Table of Contents
   1. Introduction
     1.1. General Design of JSEP
     1.2. Other Approaches Considered
   2. Terminology
   3. Semantics and Syntax
     3.1. Signaling Model
     3.2. Session Descriptions and State Machine
     3.3. Session Description Format
     3.4. Session Description Control
       3.4.1. RtpTransceivers
       3.4.2. RtpSenders
       3.4.3. RtpReceivers
     3.5. ICE
       3.5.1. ICE Gathering Overview
       3.5.2. ICE Candidate Trickling
         3.5.2.1. ICE Candidate Format
       3.5.3. ICE Candidate Policy
       3.5.4. ICE Candidate Pool
     3.6. Video Size Negotiation
       3.6.1. Creating an imageattr Attribute
       3.6.2. Interpreting an imageattr Attribute
     3.7. Simulcast
     3.8. Interactions With Forking
       3.8.1. Sequential Forking
       3.8.2. Parallel Forking
   4. Interface
     4.1. PeerConnection
       4.1.1. Constructor
       4.1.2. addTrack
       4.1.3. removeTrack
       4.1.4. addTransceiver
       4.1.5. createDataChannel
       4.1.6. createOffer
       4.1.7. createAnswer
       4.1.8. SessionDescriptionType
         4.1.8.1. Use of Provisional Answers
         4.1.8.2. Rollback
       4.1.9. setLocalDescription
       4.1.10. setRemoteDescription
       4.1.11. currentLocalDescription
       4.1.12. pendingLocalDescription
       4.1.13. currentRemoteDescription
       4.1.14. pendingRemoteDescription
       4.1.15. canTrickleIceCandidates
       4.1.16. setConfiguration
       4.1.17. addIceCandidate
     4.2. RtpTransceiver
       4.2.1. stop
       4.2.2. stopped
       4.2.3. setDirection
       4.2.4. direction
       4.2.5. currentDirection
       4.2.6. setCodecPreferences
   5. SDP Interaction Procedures
     5.1. Requirements Overview
       5.1.1. Usage Requirements
       5.1.2. Profile Names and Interoperability
     5.2. Constructing an Offer
       5.2.1. Initial Offers
       5.2.2. Subsequent Offers
       5.2.3. Options Handling
         5.2.3.1. IceRestart
         5.2.3.2. VoiceActivityDetection
     5.3. Generating an Answer
       5.3.1. Initial Answers
       5.3.2. Subsequent Answers
       5.3.3. Options Handling
         5.3.3.1. VoiceActivityDetection
     5.4. Modifying an Offer or Answer
     5.5. Processing a Local Description
     5.6. Processing a Remote Description
     5.7. Parsing a Session Description
       5.7.1. Session-Level Parsing
       5.7.2. Media Section Parsing
       5.7.3. Semantics Verification
     5.8. Applying a Local Description
     5.9. Applying a Remote Description
     5.10. Applying an Answer
   6. Processing RTP/RTCP
   7. Examples
     7.1. Simple Example
     7.2. Detailed Example
     7.3. Early Transport Warmup Example
   8. Security Considerations
   9. IANA Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A. Appendix A
   Appendix B. Change log
   Authors' Addresses
1. Introduction This document describes how the W3C WEBRTC RTCPeerConnection - interface [W3C.WD-webrtc-20140617] is used to control the setup, - management and teardown of a multimedia session. + interface [W3C.webrtc] is used to control the setup, management and + teardown of a multimedia session. 1.1. General Design of JSEP The thinking behind WebRTC call setup has been to fully specify and control the media plane, but to leave the signaling plane up to the application as much as possible. The rationale is that different applications may prefer to use different protocols, such as the - existing SIP or Jingle call signaling protocols, or something custom - to the particular application, perhaps for a novel use case. In this + existing SIP call signaling protocol, or something custom to the + particular application, perhaps for a novel use case. In this approach, the key information that needs to be exchanged is the multimedia session description, which specifies the necessary transport and media configuration information necessary to establish the media plane. With these considerations in mind, this document describes the - Javascript Session Establishment Protocol (JSEP) that allows for full - control of the signaling state machine from Javascript. As described - above, JSEP assumes a model in which a Javascript application + JavaScript Session Establishment Protocol (JSEP) that allows for full + control of the signaling state machine from JavaScript. As described + above, JSEP assumes a model in which a JavaScript application executes inside a runtime containing WebRTC APIs (the "JSEP implementation"). 
The JSEP implementation is almost entirely divorced from the core signaling flow, which is instead handled by - the Javascript making use of two interfaces: (1) passing in local and + the JavaScript making use of two interfaces: (1) passing in local and remote session descriptions and (2) interacting with the ICE state machine. The combination of the JSEP implementation and the - Javascript application is referred to throughout this document as a + JavaScript application is referred to throughout this document as a "JSEP endpoint". In this document, the use of JSEP is described as if it always occurs between two JSEP endpoints. Note though in many cases it will actually be between a JSEP endpoint and some kind of server, such as a gateway or MCU. This distinction is invisible to the JSEP endpoint; it just follows the instructions it is given via the API. JSEP's handling of session descriptions is simple and straightforward. Whenever an offer/answer exchange is needed, the @@ -195,58 +195,62 @@ createAnswer() API to generate an appropriate answer, applies it using the setLocalDescription() API, and sends the answer back to the initiator over the signaling channel. When the initiator gets that answer, it installs it using the setRemoteDescription() API, and initial setup is complete. This process can be repeated for additional offer/answer exchanges. Regarding ICE [RFC5245], JSEP decouples the ICE state machine from the overall signaling state machine, as the ICE state machine must remain in the JSEP implementation, because only the implementation - has the necessary knowledge of candidates and other transport info. - Performing this separation also provides additional flexibility; in - protocols that decouple session descriptions from transport, such as - Jingle, the session description can be sent immediately and the - transport information can be sent when available. In protocols that - don't, such as SIP, the information can be used in the aggregated - form. 
Sending transport information separately can allow for faster - ICE and DTLS startup, since ICE checks can start as soon as any - transport information is available rather than waiting for all of it. + has the necessary knowledge of candidates and other transport + information. Performing this separation provides additional + flexibility in protocols that decouple session descriptions from + transport. For instance, in traditional SIP, each offer or answer is + self-contained, including both the session descriptions and the + transport information. However, [I-D.ietf-mmusic-trickle-ice-sip] + allows SIP to be used with trickle ICE [I-D.ietf-ice-trickle], in + which the session description can be sent immediately and the + transport information can be sent when available. Sending transport + information separately can allow for faster ICE and DTLS startup, + since ICE checks can start as soon as any transport information is + available rather than waiting for all of it. JSEP's decoupling of + the ICE and signaling state machines allows it to accommodate either + model. Through its abstraction of signaling, the JSEP approach does require the application to be aware of the signaling process. While the application does not need to understand the contents of session descriptions to set up a call, the application must call the right APIs at the right times, convert the session descriptions and ICE information into the defined messages of its chosen signaling protocol, and perform the reverse conversion on the messages it receives from the other side. - One way to mitigate this is to provide a Javascript library that + One way to mitigate this is to provide a JavaScript library that hides this complexity from the developer; said library would implement a given signaling protocol along with its state machine and serialization code, presenting a higher level call-oriented interface to the application developer. 
For example, libraries exist to adapt the JSEP API into an API suitable for SIP or XMPP. Thus, JSEP provides greater control for the experienced developer without forcing any additional complexity on the novice developer. 1.2. Other Approaches Considered One approach that was considered instead of JSEP was to include a lightweight signaling protocol. Instead of providing session descriptions to the API, the API would produce and consume messages from this protocol. While providing a more high-level API, this put more control of signaling within the JSEP implementation, forcing it - to have to understand and handle concepts like signaling glare. In - addition, it prevented the application from driving the state machine - to a desired state, as is needed in the page reload case. + to have to understand and handle concepts like signaling glare (see + [RFC3264], Section 4). A second approach that was considered but not chosen was to decouple the management of the media control objects from session descriptions, instead offering APIs that would control each component directly. This was rejected based on a feeling that requiring exposure of this level of complexity to the application programmer would not be beneficial; it would result in an API where even a simple example would require a significant amount of code to orchestrate all the needed interactions, as well as creating a large API surface that needed to be agreed upon and documented. In
@@ -301,53 +305,54 @@
              V                                            V
        +-----------+                                +-----------+
        |   JSEP    |<----------- Media ------------>|   JSEP    |
        |   Impl.   |                                |   Impl.   |
        +-----------+                                +-----------+

                       Figure 1: JSEP Signaling Model

3.2. Session Descriptions and State Machine - In order to establish the media plane, the user agent needs specific - parameters to indicate what to transmit to the remote side, as well - as how to handle the media that is received. 
These parameters are - determined by the exchange of session descriptions in offers and - answers, and there are certain details to this process that must be - handled in the JSEP APIs. + In order to establish the media plane, the JSEP implementation needs + specific parameters to indicate what to transmit to the remote side, + as well as how to handle the media that is received. These + parameters are determined by the exchange of session descriptions in + offers and answers, and there are certain details to this process + that must be handled in the JSEP APIs. Whether a session description applies to the local side or the remote side affects the meaning of that description. For example, the list of codecs sent to a remote party indicates what the local side is willing to receive, which, when intersected with the set of codecs the remote side supports, specifies what the remote side should send. However, not all parameters follow this rule; for example, the - fingerprints [I-D.ietf-mmusic-4572-update] sent to a remote party are - calculated based on the local certificate(s) offered; the remote - party MUST either accept these parameters or reject them altogether, - with no option to choose different values. + fingerprints [RFC8122] sent to a remote party are calculated based on + the local certificate(s) offered; the remote party MUST either accept + these parameters or reject them altogether, with no option to choose + different values. In addition, various RFCs put different conditions on the format of offers versus answers. For example, an offer may propose an arbitrary number of m= sections (i.e., media descriptions as described in [RFC4566], Section 5.14), but an answer must contain the exact same number as the offer. Lastly, while the exact media parameters are known only after an - offer and an answer have been exchanged, it is possible for the - offerer to receive media after they have sent an offer and before - they have received an answer. 
To properly process incoming media in - this case, the offerer's media handler must be aware of the details - of the offer before the answer arrives. + offer and an answer have been exchanged, the offerer may receive ICE + checks, and possibly media (e.g., in the case of a re-offer after a + connection has been established) before it receives an answer. To + properly process incoming media in this case, the offerer's media + handler must be aware of the details of the offer before the answer + arrives. - Therefore, in order to handle session descriptions properly, the user - agent needs: + Therefore, in order to handle session descriptions properly, the JSEP + implementation needs: 1. To know if a session description pertains to the local or remote side. 2. To know if a session description is an offer or an answer. 3. To allow the offer to be specified independently of the answer. JSEP addresses this by adding both setLocalDescription and setRemoteDescription methods and having session description objects @@ -373,125 +378,123 @@ during call setup. Note that the final answer itself may be different than any received provisional answers. In [RFC3264], the constraint at the signaling level is that only one offer can be outstanding for a given session, but at the media stack level, a new offer can be generated at any point. For example, when using SIP for signaling, if one offer is sent, then cancelled using a SIP CANCEL, another offer can be generated even though no answer was received for the first offer. To support this, the JSEP media layer can provide an offer via the createOffer() method whenever the - Javascript application needs one for the signaling. The answerer can + JavaScript application needs one for the signaling. The answerer can send back zero or more provisional answers, and finally end the offer-answer exchange by sending a final answer. 
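The offer, provisional answer, and final answer transitions described here (and drawn in Figure 2) can be read as a lookup table. The following is only a minimal TypeScript sketch under stated assumptions: the names `TRANSITIONS` and `nextState` are hypothetical and not part of JSEP, the state names follow the revised labels in Figure 2, and rollback is not modeled.

```typescript
// State names as they appear in the revised Figure 2.
type SignalingState =
  | "stable"
  | "have-local-offer"
  | "have-remote-offer"
  | "have-local-pranswer"
  | "have-remote-pranswer";

type Side = "local" | "remote";
type DescriptionType = "offer" | "pranswer" | "answer";

// Hypothetical table: current state -> "side:type" -> next state.
// Any combination that is not listed is treated as an invalid transition.
const TRANSITIONS: Record<SignalingState, { [key: string]: SignalingState }> = {
  "stable": {
    "local:offer": "have-local-offer",
    "remote:offer": "have-remote-offer",
  },
  "have-local-offer": {
    "local:offer": "have-local-offer",
    "remote:pranswer": "have-remote-pranswer",
    "remote:answer": "stable",
  },
  "have-remote-offer": {
    "remote:offer": "have-remote-offer",
    "local:pranswer": "have-local-pranswer",
    "local:answer": "stable",
  },
  "have-local-pranswer": {
    "local:pranswer": "have-local-pranswer",
    "local:answer": "stable",
  },
  "have-remote-pranswer": {
    "remote:pranswer": "have-remote-pranswer",
    "remote:answer": "stable",
  },
};

// Returns the state reached by applying a description of the given type on the
// given side, or throws if Figure 2 has no such edge.
function nextState(state: SignalingState, side: Side, type: DescriptionType): SignalingState {
  const next: SignalingState | undefined = TRANSITIONS[state][side + ":" + type];
  if (next === undefined) {
    throw new Error("invalid transition: " + side + " " + type + " in state " + state);
  }
  return next;
}
```

For instance, an offerer that receives a provisional answer and then a final answer would move from have-local-offer to have-remote-pranswer and back to stable.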
The state machine for this is as follows:

                   setRemote(OFFER)                     setLocal(PRANSWER)
                        /-----\                               /-----\
                        |     |                               |     |
                        v     |                               v     |
         +---------------+    |                +---------------+    |
         |               |----/                |               |----/
-        |               | setLocal(PRANSWER)  |               |
-        | Remote-Offer  |------------------- >| Local-Pranswer|
+        |  have-        | setLocal(PRANSWER)  | have-         |
+        |  remote-offer |------------------- >| local-pranswer|
         |               |                     |               |
         |               |                     |               |
         +---------------+                     +---------------+
               ^   |                                  |
               |   | setLocal(ANSWER)                 |
 setRemote(OFFER)  |                                  |
               |   V                 setLocal(ANSWER) |
         +---------------+                            |
         |               |                            |
         |               |<---------------------------+
-        |    Stable     |
+        |    stable     |
         |               |<---------------------------+
         |               |                            |
         +---------------+          setRemote(ANSWER) |
               ^   |                                  |
               |   | setLocal(OFFER)                  |
 setRemote(ANSWER) |                                  |
               |   V                                  |
         +---------------+                     +---------------+
         |               |                     |               |
-        |               | setRemote(PRANSWER) |               |
-        | Local-Offer   |------------------- >|Remote-Pranswer|
+        |  have-        | setRemote(PRANSWER) |have-          |
+        |  local-offer  |------------------- >|remote-pranswer|
         |               |                     |               |
         |               |----\                |               |----\
         +---------------+    |                +---------------+    |
                        ^     |                               ^     |
                        |     |                               |     |
                        \-----/                               \-----/
                    setLocal(OFFER)                     setRemote(PRANSWER)

                       Figure 2: JSEP State Machine

Aside from these state transitions there is no other difference between the handling of provisional ("pranswer") and final ("answer") answers. 3.3. Session Description Format JSEP's session descriptions use SDP syntax for their internal representation. While this format is not optimal for manipulation - from Javascript, it is widely accepted, and frequently updated with + from JavaScript, it is widely accepted, and frequently updated with new features; any alternate encoding of session descriptions would have to keep pace with the changes to SDP, at least until the time that this new encoding eclipsed SDP in popularity. - However, to simplify Javascript processing, and provide for future - flexibility, the SDP syntax is encapsulated within a - SessionDescription object, which can be constructed from SDP, and be - serialized out to SDP. 
If future specifications agree on a JSON - format for session descriptions, we could easily enable this object - to generate and consume that JSON. - - Other methods may be added to SessionDescription in the future to - simplify handling of SessionDescriptions from Javascript. In the - meantime, Javascript libraries can be used to perform these - manipulations. + However, to provide for future flexibility, the SDP syntax is + encapsulated within a SessionDescription object, which can be + constructed from SDP, and be serialized out to SDP. If future + specifications agree on a JSON format for session descriptions, we + could easily enable this object to generate and consume that JSON. - Note that most applications should be able to treat the + As detailed below, most applications should be able to treat the SessionDescriptions produced and consumed by these various API calls as opaque blobs; that is, the application will not need to read or change them. 3.4. Session Description Control In order to give the application control over various common session parameters, JSEP provides control surfaces which tell the JSEP implementation how to generate session descriptions. This avoids the - need for Javascript to modify session descriptions in most cases. + need for JavaScript to modify session descriptions in most cases. Changes to these objects result in changes to the session descriptions generated by subsequent createOffer/Answer calls. 3.4.1. RtpTransceivers RtpTransceivers allow the application to control the RTP media associated with one m= section. Each RtpTransceiver has an RtpSender and an RtpReceiver, which an application can use to control the sending and receiving of RTP media. The application may also modify the RtpTransceiver directly, for instance, by stopping it. 
RtpTransceivers generally have a 1:1 mapping with m= sections, although there may be more RtpTransceivers than m= sections when RtpTransceivers are created but not yet associated with a m= section, or if RtpTransceivers have been stopped and disassociated from m= sections. An RtpTransceiver is said to be associated with an m= section if its mid property is non-null; otherwise it is said to be disassociated. The associated m= section is determined using a mapping between transceivers and m= section indices, formed when - creating an offer or applying a remote offer. An RtpTransceiver is - never associated with more than one m= section, and once a session - description is applied, a m= section is always associated with - exactly one RtpTransceiver. + creating an offer or applying a remote offer. + + An RtpTransceiver is never associated with more than one m= section, + and once a session description is applied, a m= section is always + associated with exactly one RtpTransceiver. However, in certain + cases where a m= section has been rejected, as discussed in + Section 5.2.2 below, that m= section will be "recycled" and + associated with a new RtpTransceiver with a new mid value. RtpTransceivers can be created explicitly by the application or implicitly by calling setRemoteDescription with an offer that adds new m= sections. 3.4.2. RtpSenders RtpSenders allow the application to control how RTP media is sent. An RtpSender is conceptually responsible for the outgoing RTP stream(s) described by an m= section. This includes encoding the @@ -530,23 +533,24 @@ application that gathering is occurring through an event. Then, when each new ICE candidate becomes available, the ICE agent will supply it to the application via an additional event; these candidates will also automatically be added to the current and/or pending local session description. Finally, when all candidates have been gathered, an event will be dispatched to signal that the gathering process is complete. 
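The event sequence for a gathering phase (start, one event per candidate, then completion) can be illustrated from the application's side. This is a rough TypeScript sketch only: the event and message shapes, the `LocalGatheringTracker` class, and the idea of forwarding an "end-of-candidates" message over signaling are assumptions made for illustration, and a real JSEP implementation delivers these notifications through its own event API.

```typescript
// Hypothetical event shapes standing in for the implementation's events.
type GatheringEvent =
  | { kind: "gathering-started" }
  | { kind: "candidate"; candidate: string }
  | { kind: "gathering-complete" };

// Hypothetical messages an application might send over its signaling channel.
type SignalingMessage =
  | { type: "candidate"; candidate: string }
  | { type: "end-of-candidates" };

class LocalGatheringTracker {
  // True between the start and completion events of a gathering phase.
  gathering = false;
  // Mirrors the candidates that get added to the local description.
  localCandidates: string[] = [];

  // Consumes one gathering event and returns the messages to signal.
  handle(event: GatheringEvent): SignalingMessage[] {
    if (event.kind === "gathering-started") {
      this.gathering = true;
      return [];
    }
    if (event.kind === "candidate") {
      this.localCandidates.push(event.candidate);
      return [{ type: "candidate", candidate: event.candidate }];
    }
    // Remaining case: "gathering-complete".
    this.gathering = false;
    return [{ type: "end-of-candidates" }];
  }
}
```

An application that does not trickle could instead ignore the per-candidate messages and send the full local description once the completion event arrives.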
Note that gathering phases only gather the candidates needed by new/recycled/restarting m= sections; other m= sections continue to - use their existing candidates. Also, when bundling is active, - candidates are only gathered (and exchanged) for the m= sections - referenced in BUNDLE-tags, as described in + use their existing candidates. Also, if an m= section is bundled + (either by a successful bundle negotiation or by being marked as + bundle-only), then candidates will be gathered and exchanged for that + m= section if and only if its MID is a BUNDLE-tag, as described in [I-D.ietf-mmusic-sdp-bundle-negotiation]. 3.5.2. ICE Candidate Trickling Candidate trickling is a technique through which a caller may incrementally provide candidates to the callee after the initial offer has been dispatched; the semantics of "Trickle ICE" are defined in [I-D.ietf-ice-trickle]. This process allows the callee to begin acting upon the call and setting up the ICE (and perhaps DTLS) connections immediately, without having to wait for the caller to @@ -568,21 +572,22 @@ using the new remote candidates for connectivity checks. 3.5.2.1. ICE Candidate Format In JSEP, ICE candidates are abstracted by an IceCandidate object, and as with session descriptions, SDP syntax is used for the internal representation. The candidate details are specified in an IceCandidate field, using the same SDP syntax as the "candidate-attribute" field defined in - [RFC5245], Section 15.1. For example: + [RFC5245], Section 15.1. Note that this field does not contain an + "a=" prefix, as indicated in the following example: candidate:1 1 UDP 1694498815 192.0.2.33 10000 typ host The IceCandidate object contains a field to indicate which ICE ufrag it is associated with, as defined in [RFC5245], Section 15.4. This value is used to determine which session description (and thereby which gathering phase) this IceCandidate belongs to, which helps resolve ambiguities during ICE restarts. 
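To make the candidate string format concrete, the example above can be split into its leading fields. This is a minimal TypeScript sketch, assuming the field order of the "candidate-attribute" grammar in [RFC5245], Section 15.1; the `ParsedCandidate` type and `parseCandidate` function are hypothetical names, and any related-address or extension fields after the candidate type are ignored here.

```typescript
// Hypothetical container for the mandatory fields of a candidate string.
interface ParsedCandidate {
  foundation: string;
  componentId: number;
  transport: string;
  priority: number;
  address: string;
  port: number;
  type: string;
}

// Parses a string such as
//   candidate:1 1 UDP 1694498815 192.0.2.33 10000 typ host
// Note that the input carries no "a=" prefix.
function parseCandidate(line: string): ParsedCandidate {
  const prefix = "candidate:";
  if (line.indexOf(prefix) !== 0) {
    throw new Error("expected a string starting with candidate: (no a= prefix)");
  }
  const fields = line.slice(prefix.length).split(" ");
  const [
    foundation = "", component = "", transport = "", priority = "",
    address = "", port = "", typKeyword = "", candType = "",
  ] = fields;
  if (typKeyword !== "typ" || candType === "") {
    throw new Error("malformed candidate: " + line);
  }
  const componentId = parseInt(component, 10);
  const priorityValue = parseInt(priority, 10);
  const portValue = parseInt(port, 10);
  if (isNaN(componentId) || isNaN(priorityValue) || isNaN(portValue)) {
    throw new Error("non-numeric field in candidate: " + line);
  }
  return {
    foundation: foundation,
    componentId: componentId,
    transport: transport,
    priority: priorityValue,
    address: address,
    port: portValue,
    type: candType,
  };
}
```

Rejecting input that begins with "a=" is a design choice for this sketch; an application relaying candidates from a non-JSEP endpoint might prefer to strip the prefix instead.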
If this field is absent in a received IceCandidate (perhaps when communicating with a non-JSEP endpoint), the most recently received session description is assumed. @@ -637,23 +642,23 @@ them as the source of connectivity checks, or indirectly expose them via other fields, such as the raddr/rport attributes for other ICE candidates. Later, if a different policy is specified by the application, the application can apply it by kicking off a new gathering phase via an ICE restart. 3.5.4. ICE Candidate Pool JSEP applications typically inform the JSEP implementation to begin ICE gathering via the information supplied to setLocalDescription, as - this is where the app specifies the number of media streams, and - thereby ICE components, for which to gather candidates. However, to - accelerate cases where the application knows the number of ICE + the local description indicates the number of ICE components which + will be needed and for which candidates must be gathered. However, + to accelerate cases where the application knows the number of ICE components to use ahead of time, it may ask the implementation to gather a pool of potential ICE candidates to help ensure rapid media setup. When setLocalDescription is eventually called, and the JSEP implementation goes to gather the needed ICE candidates, it SHOULD start by checking if any candidates are available in the pool. If there are candidates in the pool, they SHOULD be handed to the application immediately via the ICE candidate event. If the pool becomes depleted, either because a larger-than-expected number of ICE @@ -671,43 +676,51 @@ though that by holding on to these pre-gathered candidates, which will be kept alive as long as they may be needed, the application will consume resources on the STUN/TURN servers it is using. 3.6. 
Video Size Negotiation Video size negotiation is the process through which a receiver can use the "a=imageattr" SDP attribute [RFC6236] to indicate what video frame sizes it is capable of receiving. A receiver may have hard limits on what its video decoder can process, or it may have some - maximum set by policy. + maximum set by policy. By specifying these limits in an + "a=imageattr" attribute, JSEP endpoints can attempt to ensure that + the remote sender transmits video at an acceptable resolution. + However, when communicating with a non-JSEP endpoint that does not + understand this attribute, any signaled limits may be exceeded, and + the JSEP implementation MUST handle this gracefully, e.g., by + discarding the video. Note that certain codecs support transmission of samples with aspect ratios other than 1.0 (i.e., non-square pixels). JSEP implementations will not transmit non-square pixels, but SHOULD receive and render such video with the correct aspect ratio. However, sample aspect ratio has no impact on the size negotiation described below; all dimensions are measured in pixels, whether square or not. 3.6.1. Creating an imageattr Attribute The receiver will first intersect any known local limits (e.g., hardware decoder capabilities, local policy) to determine the absolute minimum and maximum sizes it can receive. If there are no known local limits, the "a=imageattr" attribute SHOULD be omitted. + If these local limits preclude receiving any video, i.e., the + degenerate case of no permitted resolutions, the "a=imageattr" + attribute MUST be omitted, and the m= section MUST be marked as + sendonly/inactive, as appropriate. Otherwise, an "a=imageattr" attribute is created with "recv" direction, and the resulting resolution space formed from the aforementioned intersection is used to specify its minimum and - maximum x= and y= values.
If the intersection is the null set, i.e., - the degenerate case of no permitted resolutions, this MUST be - represented by x=0 and y=0 values. + maximum x= and y= values. The rules here express a single set of preferences, and therefore, the "a=imageattr" q= value is not important. It SHOULD be set to 1.0. The "a=imageattr" field is payload type specific. When all video codecs supported have the same capabilities, use of a single attribute, with the wildcard payload type (*), is RECOMMENDED. However, when the supported video codecs have different limitations, specific "a=imageattr" attributes MUST be inserted for each payload @@ -721,74 +734,84 @@ This declaration indicates that the receiver is capable of decoding any image resolution from 48x48 up to 1280x720 pixels. 3.6.2. Interpreting an imageattr Attribute [RFC6236] defines "a=imageattr" to be an advisory field. This means that it does not absolutely constrain the video formats that the sender can use, but gives an indication of the preferred values. - This specification prescribes more specific behavior. When a sender - of a given MediaStreamTrack, which is producing video of a certain - resolution, receives an "a=imageattr recv" attribute, it MUST check - to see if the original resolution meets the size criteria specified - in the attribute, and adapt the resolution accordingly by scaling (if - appropriate). Note that when considering a MediaStreamTrack that is - producing rotated video, the unrotated resolution MUST be used. This - is required regardless of whether the receiver supports performing - receive-side rotation (e.g., through CVO [TS26.114]), as it - significantly simplifies the matching logic. - - For the purposes of resolution negotiation, only size limits are - considered. Any other values, e.g. picture or sample aspect ratio, - MUST be ignored. - - When communicating with a non-JSEP endpoint, multiple relevant - "a=imageattr recv" attributes may be present in a received m= - section. 
If this occurs, attributes other than the one with the - highest "q=" value MUST be ignored. If multiple attributes have the - same "q=" value, those that appear after the first such attribute in - the m= section MUST be ignored. + This specification prescribes more specific behavior. When a + MediaStreamTrack, which is producing video of a certain resolution + (the "track resolution"), is attached to a RtpSender, which is + encoding the track video at the same or lower resolution(s) (the + "encoder resolutions"), and a remote description is applied that + references the sender and contains valid "a=imageattr recv" + attributes, it MUST follow the rules below to ensure the sender does + not transmit a resolution that would exceed the size criteria + specified in the attributes. These rules MUST be followed as long as + the attributes remain present in the remote description, including + cases in which the track changes its resolution, or is replaced with + a different track. - If an "a=imageattr recv" attribute references a different video - payload type than what has been selected for sending the - MediaStreamTrack, it MUST be ignored. + Depending on how the RtpSender is configured, it may be producing a + single encoding at a certain resolution, or, if simulcast Section 3.7 + has been negotiated, multiple encodings, each at their own specific + resolution. In addition, depending on the configuration, each + encoding may have the flexibility to reduce resolution when needed, + or may be locked to a specific output resolution. - If the original resolution matches the size limits in the attribute, - the track MUST be transmitted untouched. + For each encoding being produced by the RtpSender, the following + rules are applied to determine what should be transmitted: - If the original resolution exceeds the size limits in the attribute, - the sender SHOULD apply downscaling to the output of the - MediaStreamTrack in order to satisfy the limits. 
Downscaling MUST - NOT change the track aspect ratio. + o First, the most suitable "a=imageattr recv" attribute is selected. + This is performed by taking the attribute with the highest "q=" + value from the set of attributes that reference the media format + that has been selected for the specified encoding. If multiple + attributes have the same "q=" value, the one that appears first in + the m= section is used. Note that while JSEP endpoints will + include at most one "a=imageattr recv" attribute per media format, + JSEP endpoints may receive session descriptions from non-JSEP + endpoints with m= sections that contain multiple such attributes. - If the original resolution is less than the size limits in the - attribute, upscaling is needed, but this may not be appropriate in - all cases. To address this concern, the application can set an - upscaling policy for each sent track. For this case, if upscaling is - permitted by policy, the sender SHOULD apply upscaling in order to - provide the desired resolution. Otherwise, the sender MUST NOT apply - upscaling. The sender SHOULD NOT upscale in other cases, even if the - policy permits it. Upscaling MUST NOT change the track aspect ratio. + o If there is an applicable "a=imageattr recv" attribute for the + encoding, the limits from the attribute are then compared to the + encoder resolution. Only the specific limits mentioned below are + considered; any other values, such as picture aspect ratio, MUST + be ignored. Note that when considering a MediaStreamTrack that is + producing rotated video, the unrotated resolution MUST be used for + the checks. This is required regardless of whether the receiver + supports performing receive-side rotation (e.g., through CVO + [TS26.114]), as it significantly simplifies the matching logic. - If there is no appropriate and permitted scaling mechanism that - allows the received size limits to be satisfied, the sender MUST NOT - transmit the track. 
+ o If the attribute includes a "sar=" (sample aspect ratio) value set + to something other than "1.0", indicating the receiver wants to + receive non-square pixels, this cannot be satisfied and the sender + MUST NOT transmit the encoding. - If the attribute includes a "sar=" (sample aspect ratio) value set to - something other than "1.0", indicating the receiver wants to receive - non-square pixels, this cannot be satisfied and the sender MUST NOT - transmit the track. + o If the encoder resolution exceeds the maximum size permitted by + the attribute, and the encoder is allowed to adjust its + resolution, the encoder SHOULD apply downscaling in order to + satisfy the limits, although the downscaling MUST NOT change the + picture aspect ratio of the encoding. For example, if the encoder + resolution is 1280x720, and the attribute specified a maximum of + 640x480, the expected output resolution would be 640x360. If + downscaling cannot be applied, the encoding MUST NOT be + transmitted, and an error SHOULD be surfaced to the application. - In the special case of receiving a maximum resolution of [0, 0], as - described above, the sender MUST NOT transmit the track. + o If the encoder resolution is less than the minimum size permitted + by the attribute, the encoding MUST NOT be transmitted, and an + error SHOULD be surfaced to the application; the encoder MUST NOT + apply upscaling. JSEP implementations SHOULD avoid this situation + by allowing receipt of arbitrarily small resolutions, perhaps via + fallback to a software decoder. 3.7. Simulcast JSEP supports simulcast transmission of a MediaStreamTrack, where multiple encodings of the source media can be transmitted within the context of a single m= section. The current JSEP API is designed to allow applications to send simulcasted media but only to receive a single encoding. 
This allows for multi-user scenarios where each sending client sends multiple encodings to a server, which then, for each receiving client, chooses the appropriate encoding to forward. @@ -834,24 +857,25 @@ "a=rid" attribute for each encoding, as specified in [I-D.ietf-mmusic-rid], Section 4; the use of RID identifiers allows the individual encodings to be disambiguated even though they are all part of the same m= section. 3.8. Interactions With Forking Some call signaling systems allow various types of forking where an SDP Offer may be provided to more than one device. For example, SIP [RFC3261] defines both a "Parallel Search" and "Sequential Search". + Although these are primarily signaling level issues that are outside the scope of JSEP, they do have some impact on the configuration of the media plane that is relevant. When forking happens at the - signaling layer, the Javascript application responsible for the + signaling layer, the JavaScript application responsible for the signaling needs to make the decisions about what media should be sent or received at any point of time, as well as which remote endpoint it should communicate with; JSEP is used to make sure the media engine can make the RTP and media perform as required by the application. The basic operations that the applications can have the media engine do are: o Start exchanging media with a given remote peer, but keep all the resources reserved in the offer. @@ -881,34 +905,34 @@ At some point, the application will end the setup process, perhaps with a timer; at this point, the application could reapply the pending remote description as a final answer. 3.8.2. Parallel Forking Parallel forking involves a call being dispatched to multiple remote callees, where each callee can accept the call, and multiple simultaneous active signaling sessions can be established as a result. 
If multiple callees send media at the same time, the - possibilities for handling this are described in Section 3.1 of - [RFC3960]. Most SIP devices today only support exchanging media with - a single device at a time, and do not try to mix multiple early media - audio sources, as that could result in a confusing situation. For - example, consider having a European ringback tone mixed together with - the North American ringback tone - the resulting sound would not be - like either tone, and would confuse the user. If the signaling + possibilities for handling this are described in [RFC3960], + Section 3.1. Most SIP devices today only support exchanging media + with a single device at a time, and do not try to mix multiple early + media audio sources, as that could result in a confusing situation. + For example, consider having a European ringback tone mixed together + with the North American ringback tone - the resulting sound would not + be like either tone, and would confuse the user. If the signaling application wishes to only exchange media with one of the remote endpoints at a time, then from a media engine point of view, this is exactly like the sequential forking case. - In the parallel forking case where the Javascript application wishes + In the parallel forking case where the JavaScript application wishes to simultaneously exchange media with multiple peers, the flow is - slightly more complex, but the Javascript application can follow the + slightly more complex, but the JavaScript application can follow the strategy that [RFC3960] describes using UPDATE. The UPDATE approach allows the signaling to set up a separate media flow for each peer that it wishes to exchange media with. In JSEP, this offer used in the UPDATE would be formed by simply creating a new PeerConnection and making sure that the same local media streams have been added into this new PeerConnection. 
Then the new PeerConnection object would produce a SDP offer that could be used by the signaling to perform the UPDATE strategy discussed in [RFC3960]. As a result of sharing the media streams, the application will end up @@ -1281,34 +1305,34 @@ "rollback" with empty contents to either setLocalDescription or setRemoteDescription, depending on which was most recently used (i.e. if the new offer was supplied to setLocalDescription, the rollback should be done using setLocalDescription as well). 4.1.9. setLocalDescription The setLocalDescription method instructs the PeerConnection to apply the supplied session description as its local configuration. The type field indicates whether the description should be processed as - an offer, provisional answer, or final answer; offers and answers are - checked differently, using the various rules that exist for each SDP - line. + an offer, provisional answer, final answer, or rollback; offers and + answers are checked differently, using the various rules that exist + for each SDP line. This API changes the local media state; among other things, it sets up local resources for receiving and decoding media. In order to successfully handle scenarios where the application wants to offer to change from one media format to a different, incompatible format, the PeerConnection must be able to simultaneously support use of both the current and pending local descriptions (e.g., support the codecs that exist in either description). 
This dual processing begins when the - PeerConnection enters the have-local-offer state, and continues until - setRemoteDescription is called with either a final answer, at which - point the PeerConnection can fully adopt the pending local + PeerConnection enters the "have-local-offer" state, and continues + until setRemoteDescription is called with either a final answer, at + which point the PeerConnection can fully adopt the pending local description, or a rollback, which results in a revert to the current local description. This API indirectly controls the candidate gathering process. When a local description is supplied, and the number of transports currently in use does not match the number of transports needed by the local description, the PeerConnection will create transports as needed and begin gathering candidates for each transport, using ones from the candidate pool if available. @@ -1492,23 +1516,23 @@ A stopped RtpTransceiver does not send any outgoing RTP or RTCP or process any incoming RTP or RTCP. It cannot be restarted. 4.2.3. setDirection The setDirection method sets the direction of a transceiver, which affects the direction property of the associated m= section on future calls to createOffer and createAnswer. When creating offers, the transceiver direction is directly reflected - in the output, even for reoffers. When creating answers, the + in the output, even for re-offers. When creating answers, the transceiver direction is intersected with the offered direction, as - explained in the Section 5.3 section below. + explained in Section 5.3 below. Note that while setDirection sets the direction property of the transceiver immediately (Section 4.2.4), this property does not immediately affect whether the transceiver's RtpSender will send or its RtpReceiver will receive. The direction in effect is represented by the currentDirection property, which is only updated when an answer is applied. 4.2.4. 
direction @@ -1539,23 +1563,21 @@ which codec the implementation decides to send. It only affects which codecs the implementation indicates that it prefers to receive, via the offer or answer. Even when a codec is excluded by setCodecPreferences, it still may be used to send until the next offer/answer exchange discards it. The codec preferences of an RtpTransceiver can cause codecs to be excluded by subsequent calls to createOffer and createAnswer, in which case the corresponding media formats in the associated m= section will be excluded. The codec preferences cannot add media - formats that would otherwise not be present. This includes codecs - that were not negotiated in a previous offer/answer exchange that - included the transceiver. + formats that would otherwise not be present. The codec preferences of an RtpTransceiver can also determine the order of codecs in subsequent calls to createOffer and createAnswer, in which case the order of the media formats in the associated m= section will follow the specified preferences. 5. SDP Interaction Procedures This section describes the specific procedures to be followed when creating and parsing SDP objects. @@ -1592,30 +1614,46 @@ profile and MUST indicate this profile for each data m= line they produce in an offer. Because ICE can select either UDP [RFC5245] or TCP [RFC6544] transport depending on network conditions, this advertisement is consistent with ICE eventually selecting either UDP or TCP. Unfortunately, in an attempt at compatibility, some endpoints generate other profile strings even when they mean to support one of these profiles. For instance, an endpoint might generate "RTP/AVP" but supply "a=fingerprint" and "a=rtcp-fb" attributes, indicating its - willingness to support "(UDP,TCP)/TLS/RTP/SAVPF".
In order to - simplify compatibility with such endpoints, JSEP implementations MUST - follow the following rules when processing the media m= sections in - an offer: + willingness to support "UDP/TLS/RTP/SAVPF" or "TCP/TLS/RTP/SAVPF". + In order to simplify compatibility with such endpoints, JSEP + implementations MUST follow the following rules when processing the + media m= sections in a received offer: - o The profile in any "m=" line in any answer MUST exactly match the - profile provided in the offer. + o Any profile in the offer matching one of the following MUST be + accepted: - o Any profile matching the following patterns MUST be accepted: - "RTP/[S]AVP[F]" and "(UDP/TCP)/TLS/RTP/SAVP[F]" + * "RTP/AVP" (Defined in [RFC4566], Section 8.2.2) + + * "RTP/AVPF" (Defined in [RFC4585], Section 9) + + * "RTP/SAVP" (Defined in [RFC3711], Section 12) + + * "RTP/SAVPF" (Defined in [RFC5124], Section 6) + + * "TCP/DTLS/RTP/SAVP" (Defined in [RFC7850], Section 3.4) + + * "TCP/DTLS/RTP/SAVPF" (Defined in [RFC7850], Section 3.5) + + * "UDP/TLS/RTP/SAVP" (Defined in [RFC5764], Section 9) + + * "UDP/TLS/RTP/SAVPF" (Defined in [RFC5764], Section 9) + + o The profile in any "m=" line in any generated answer MUST exactly + match the profile provided in the offer. o Because DTLS-SRTP is REQUIRED, the choice of SAVP or AVP has no effect; support for DTLS-SRTP is determined by the presence of one or more "a=fingerprint" attribute. Note that lack of an "a=fingerprint" attribute will lead to negotiation failure. o The use of AVPF or AVP simply controls the timing rules used for RTCP feedback. If AVPF is provided, or an "a=rtcp-fb" attribute is present, assume AVPF timing, i.e., a default value of "trr- int=0". Otherwise, assume that AVPF is being used in an AVP @@ -1657,96 +1695,94 @@ The value of the <nettype> <addrtype> <unicast-address> tuple SHOULD be set to a non-meaningful address, such as IN IP4 0.0.0.0, to prevent leaking the local address in this field.
As mentioned in [RFC4566], the entire o= line needs to be unique, but selecting a random number for <sess-id> is sufficient to accomplish this. o The third SDP line MUST be a "s=" line, as specified in [RFC4566], Section 5.3; to match the "o=" line, a single dash SHOULD be used as the session name, e.g. "s=-". Note that this differs from the advice in [RFC4566] which proposes a single space, but as both - "o=" and "s=" are meaningless, having the same meaningless value - seems clearer. + "o=" and "s=" are meaningless in JSEP, having the same meaningless + value seems clearer. o Session Information ("i="), URI ("u="), Email Address ("e="), Phone Number ("p="), Repeat Times ("r="), and Time Zones ("z=") lines are not useful in this context and SHOULD NOT be included. o Encryption Keys ("k=") lines do not provide sufficient security and MUST NOT be included. o A "t=" line MUST be added, as specified in [RFC4566], Section 5.9; both <start-time> and <stop-time> SHOULD be set to zero, e.g. "t=0 0". o An "a=ice-options" line with the "trickle" option MUST be added, as specified in [I-D.ietf-ice-trickle], Section 4. o If WebRTC identity is being used, an "a=identity" line as described in [I-D.ietf-rtcweb-security-arch], Section 5. - The next step is to generate m= sections, as specified in [RFC4566] + The next step is to generate m= sections, as specified in [RFC4566], Section 5.14. An m= section is generated for each RtpTransceiver that has been added to the PeerConnection, excluding any stopped RtpTransceivers. This is done in the order the RtpTransceivers were added to the PeerConnection. For each m= section generated for an RtpTransceiver, establish a mapping between the transceiver and the index of the generated m= section. Each m= section, provided it is not marked as bundle-only, MUST generate a unique set of ICE credentials and gather its own unique set of ICE candidates. Bundle-only m= sections MUST NOT contain any ICE credentials and MUST NOT gather any candidates.
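The m= section generation rules above (skip stopped transceivers, keep insertion order, record a transceiver-to-index mapping, and give ICE credentials only to sections that are not bundle-only) can be sketched as follows. This is a minimal sketch under stated assumptions: Transceiver, MSection and plan_initial_offer are hypothetical names, transceivers are identified by a plain string, and the credential lengths are merely chosen to satisfy the minimums in [RFC5245] (at least 4 characters for the ufrag, at least 22 for the password).

```python
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Transceiver:  # hypothetical stand-in for an RtpTransceiver
    name: str
    stopped: bool = False
    bundle_only: bool = False


@dataclass
class MSection:  # hypothetical record for one generated m= section
    index: int
    transceiver: str
    ice_ufrag: Optional[str]
    ice_pwd: Optional[str]


def plan_initial_offer(
    transceivers: List[Transceiver],
) -> Tuple[List[MSection], Dict[str, int]]:
    """Lay out m= sections for an initial offer.

    The input list order is assumed to be the order in which the
    transceivers were added to the PeerConnection.
    """
    sections: List[MSection] = []
    mapping: Dict[str, int] = {}
    for t in transceivers:
        if t.stopped:
            continue  # stopped transceivers get no m= section
        index = len(sections)
        if t.bundle_only:
            # Bundle-only sections carry no ICE credentials.
            ufrag = pwd = None
        else:
            ufrag = secrets.token_hex(4)   # 8 hex characters
            pwd = secrets.token_hex(12)    # 24 hex characters
        sections.append(MSection(index, t.name, ufrag, pwd))
        mapping[t.name] = index
    return sections, mapping
```

Generating the credentials from a cryptographic random source is one straightforward way to keep each set unique; the text above does not mandate any particular generator.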
For DTLS, all m= sections MUST use all the certificate(s) that have been specified for the PeerConnection; as a result, they MUST all - have the same [I-D.ietf-mmusic-4572-update] fingerprint value(s), or - these value(s) MUST be session-level attributes. + have the same [RFC8122] fingerprint value(s), or these value(s) MUST + be session-level attributes. Each m= section should be generated as specified in [RFC4566], Section 5.14. For the m= line itself, the following rules MUST be followed: - o The port value is set to the port of the default ICE candidate for - this m= section, but given that no candidates are available yet, - the "dummy" port value of 9 (Discard) MUST be used, as indicated - in [I-D.ietf-ice-trickle], Section 5.1. + o If the m= section is marked as bundle-only, then the port value + MUST be set to 0. Otherwise, the port value is set to the port of + the default ICE candidate for this m= section, but given that no + candidates are available yet, the "dummy" port value of 9 + (Discard) MUST be used, as indicated in [I-D.ietf-ice-trickle], + Section 5.1. o To properly indicate use of DTLS, the <proto> field MUST be set to "UDP/TLS/RTP/SAVPF", as specified in [RFC5764], Section 8. o If codec preferences have been set for the associated transceiver, media formats MUST be generated in the corresponding order, and MUST exclude any codecs not present in the codec preferences. - o The media formats in the answer MAY include codecs present in the - offer that were discarded in a previous offer/answer exchange. - This is necessary for compatibility with third- party call control - and SIP use cases. - o Unless excluded by the above restrictions, the media formats MUST include the mandatory audio/video codecs as specified in - [I-D.ietf-rtcweb-audio] (see Section 3) and - [I-D.ietf-rtcweb-video] (see Section 5). + [RFC7874], Section 3, and [RFC7742], Section 5. The m= line MUST be followed immediately by a "c=" line, as specified in [RFC4566], Section 5.7.
Again, as no candidates are available yet, the "c=" line must contain the "dummy" value "IN IP4 0.0.0.0", as defined in [I-D.ietf-ice-trickle], Section 5.1. [I-D.ietf-mmusic-sdp-mux-attributes] groups SDP attributes into different categories. To avoid unnecessary duplication when - bundling, Section 8.1 of [I-D.ietf-mmusic-sdp-bundle-negotiation] - specifies that attributes of category IDENTICAL or TRANSPORT should - not be repeated in bundled m= sections. + bundling, attributes of category IDENTICAL or TRANSPORT MUST NOT be + repeated in bundled m= sections, repeating the guidance from + [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 8.1. This includes + m= sections for which bundling has been negotiated and is still + desired, as well as m= sections marked as bundle-only. The following attributes, which are of a category other than IDENTICAL or TRANSPORT, MUST be included in each m= section: o An "a=mid" line, as specified in [RFC5888], Section 4. All MID values MUST be generated in a fashion that does not leak user information, e.g., randomly or using a per-PeerConnection counter, and SHOULD be 3 bytes or less, to allow them to efficiently fit into the RTP header extension defined in [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 14. Note that @@ -1784,24 +1820,24 @@ limitations on the size of images which can be decoded, an "a=imageattr" line, as specified in Section 3.6. o For each supported RTP header extension, an "a=extmap" line, as specified in [RFC5285], Section 5. The list of header extensions that SHOULD/MUST be supported is specified in [I-D.ietf-rtcweb-rtp-usage], Section 5.2. Any header extensions that require encryption MUST be specified as indicated in [RFC6904], Section 4. - o For each supported RTCP feedback mechanism, an "a=rtcp-fb" - mechanism, as specified in [RFC4585], Section 4.2. The list of - RTCP feedback mechanisms that SHOULD/MUST be supported is - specified in [I-D.ietf-rtcweb-rtp-usage], Section 5.1. 
+ o For each supported RTCP feedback mechanism, an "a=rtcp-fb" line, + as specified in [RFC4585], Section 4.2. The list of RTCP feedback + mechanisms that SHOULD/MUST be supported is specified in + [I-D.ietf-rtcweb-rtp-usage], Section 5.1. o If the RtpTransceiver has a sendrecv or sendonly direction: * For each MediaStream that was associated with the transceiver when it was created via addTrack or addTransceiver, an "a=msid" line, as specified in [I-D.ietf-mmusic-msid], Section 2. If a MediaStreamTrack is attached to the transceiver's RtpSender, the "a=msid" lines MUST use that track's ID. If no MediaStreamTrack is attached, a valid ID MUST be generated, in the same way that the implementation generates IDs for local @@ -1837,33 +1873,33 @@ o If the bundle policy for this PeerConnection is set to "max- bundle", and this is not the first m= section, or the bundle policy is set to "balanced", and this is not the first m= section for this media type, an "a=bundle-only" line. The following attributes, which are of category IDENTICAL or TRANSPORT, MUST appear only in "m=" sections which either have a unique address or which are associated with the bundle-tag. (In initial offers, this means those "m=" sections which do not contain an "a=bundle-only" attribute.) + o "a=ice-ufrag" and "a=ice-pwd" lines, as specified in [RFC5245], Section 15.4. - o An "a=fingerprint" line for each of the endpoint's certificates, - as specified in [RFC4572], Section 5; the digest algorithm used - for the fingerprint MUST match that used in the certificate - signature. + o For each desired digest algorithm, one or more "a=fingerprint" + lines for each of the endpoint's certificates, as specified in + [RFC8122], Section 5. o An "a=setup" line, as specified in [RFC4145], Section 4, and clarified for use in DTLS-SRTP scenarios in [RFC5763], Section 5. The role value in the offer MUST be "actpass". 
- o An "a=dtls-id" line, as specified in [I-D.ietf-mmusic-dtls-sdp] + o An "a=tls-id" line, as specified in [I-D.ietf-mmusic-dtls-sdp], Section 5.2. o An "a=rtcp" line, as specified in [RFC3605], Section 2.1, containing the dummy value "9 IN IP4 0.0.0.0", because no candidates have yet been gathered. o An "a=rtcp-mux" line, as specified in [RFC5761], Section 5.1.3. o If the RTP/RTCP multiplexing policy is "require", an "a=rtcp-mux- only" line, as specified in [I-D.ietf-mmusic-mux-exclusive], @@ -1885,43 +1922,48 @@ Section 6.1. As discussed above, the following attributes of category IDENTICAL or TRANSPORT are included only if the data m= section either has a unique address or is associated with the bundle-tag (e.g., if it is the only m= section): o "a=ice-ufrag" o "a=ice-pwd" + o "a=fingerprint" o "a=setup" - o "a=dtls-id" + o "a=tls-id" Once all m= sections have been generated, a session-level "a=group" attribute MUST be added as specified in [RFC5888]. This attribute MUST have semantics "BUNDLE", and MUST include the mid identifiers of each m= section. The effect of this is that the JSEP implementation offers all m= sections as one bundle group. However, whether the m= sections are bundle-only or not depends on the bundle policy. The next step is to generate session-level lip sync groups as defined in [RFC5888], Section 7. For each MediaStream referenced by more than one RtpTransceiver (by passing those MediaStreams as arguments to the addTrack and addTransceiver methods), a group of type "LS" MUST be added that contains the mid values for each RtpTransceiver. Attributes which SDP permits to either be at the session level or the media level SHOULD generally be at the media level even if they are - identical. This promotes readability, especially if one of a set of - initially identical attributes is subsequently changed. + identical. 
This assists development and debugging by making it + easier to understand individual media sections, especially if one of + a set of initially identical attributes is subsequently changed. + However, implementations MAY choose to aggregate attributes at the + session level and JSEP implementations MUST be prepared to receive + attributes in either location. Attributes other than the ones specified above MAY be included, except for the following attributes which are specifically incompatible with the requirements of [I-D.ietf-rtcweb-rtp-usage], and MUST NOT be included: o "a=crypto" o "a=key-mgmt" @@ -1935,21 +1977,21 @@ SDP. Implementations MUST be prepared to accept compliant SDP even if it would not conform to the requirements for generating SDP in this specification. 5.2.2. Subsequent Offers When createOffer is called a second (or later) time, or is called after a local description has already been installed, the processing is somewhat different than for an initial offer. - If the initial offer was not applied using setLocalDescription, + If the previous offer was not applied using setLocalDescription, meaning the PeerConnection is still in the "stable" state, the steps for generating an initial offer should be followed, subject to the following restriction: o The fields of the "o=" line MUST stay the same except for the <session-version> field, which MUST increment by one on each call to createOffer if the offer might differ from the output of the previous call to createOffer; implementations MAY opt to increment <session-version> on every call. The value of the generated <session-version> is independent of the <session-version> of the @@ -1961,34 +2003,35 @@ Note that if the application creates an offer by reading currentLocalDescription instead of calling createOffer, the returned SDP may be different than when setLocalDescription was originally called, due to the addition of gathered ICE candidates, but the <session-version> will not have changed.
There are no known scenarios in which this causes problems, but if this is a concern, the solution is simply to use createOffer to ensure a unique <session-version>. - If the initial offer was applied using setLocalDescription, but an - answer from the remote side has not yet been applied, meaning the - PeerConnection is still in the "local-offer" state, an offer is - generated by following the steps in the "stable" state above, along - with these exceptions: + If the previous offer was applied using setLocalDescription, but a + corresponding answer from the remote side has not yet been applied, + meaning the PeerConnection is still in the "have-local-offer" state, + an offer is generated by following the steps in the "stable" state + above, along with these exceptions: o The "s=" and "t=" lines MUST stay the same. o If any RtpTransceiver has been added, and there exists an m= section with a zero port in the current local description or the current remote description, that m= section MUST be recycled by generating an m= section for the added RtpTransceiver as if the m= - section were being added to the session description, placed at the - same index as the m= section with a zero port. + section were being added to the session description (including a + new MID value), and placing it at the same index as the m= section + with a zero port. o If an RtpTransceiver is stopped and is not associated with an m= section, an m= section MUST NOT be generated for it. This prevents adding back RtpTransceivers whose m= sections were recycled and used for a new RtpTransceiver in a previous offer/ answer exchange, as described above. o If an RtpTransceiver has been stopped and is associated with an m= section, and the m= section is not being recycled as described above, an m= section MUST be generated for it with the port set to @@ -2036,73 +2079,63 @@ Section 9.3. If the m= section is bundled into another m= section, both "a=candidate" and "a=end-of-candidates" MUST be omitted.
o For RtpTransceivers that are still present, the "a=rid" lines MUST stay the same. o For RtpTransceivers that are still present, any "a=simulcast" line MUST stay the same. - o If any RtpTransceiver has been stopped, the port MUST be set to - zero and all "a=msid" lines MUST be removed. - - o If any RtpTransceiver has been added, and there exists a m= - section with a zero port in the current local description or the - current remote description, that m= section MUST be recycled by - generating a m= section for the added RtpTransceiver as if the m= - section were being added to session description, except that - instead of adding it, the generated m= section replaces the m= - section with a zero port. The new m= section MUST contain a new - MID. - - If the initial offer was applied using setLocalDescription, and an - answer from the remote side has been applied using - setRemoteDescription, meaning the PeerConnection is in the "remote- - pranswer" or "stable" states, an offer is generated based on the - negotiated session descriptions by following the steps mentioned for - the "local-offer" state above. + If the previous offer was applied using setLocalDescription, and a + corresponding answer from the remote side has been applied using + setRemoteDescription, meaning the PeerConnection is in the "have- + remote-pranswer" or "stable" states, an offer is generated based on + the negotiated session descriptions by following the steps mentioned + for the "have-local-offer" state above. 
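The m= section recycling step that these cases share can be sketched as follows. This is a minimal illustration, not part of the draft: the function name place_added_transceivers, the dictionary layout, and the use of the dummy port 9 for the new section are all assumptions made for the example, and a single list of sections stands in for the combined view of the current local and remote descriptions.

```python
def place_added_transceivers(sections, added_mids):
    """Place newly added transceivers into a list of m= sections.

    sections:   list of dicts such as {"mid": "0", "port": 9}; a port of 0
                marks a rejected (zero-port) m= section (hypothetical layout).
    added_mids: new MID values, one per added RtpTransceiver.
    Returns a new list; the input list is left untouched.
    """
    result = [dict(section) for section in sections]
    for mid in added_mids:
        # Assumption: the new section carries a new MID and a non-zero port.
        new_section = {"mid": mid, "port": 9}
        for index, section in enumerate(result):
            if section["port"] == 0:
                # Recycle: the new section takes the index of the zero-port one.
                result[index] = new_section
                break
        else:
            # No zero-port section is left, so the new section is appended.
            result.append(new_section)
    return result
```

With sections for MIDs "0", "1" (zero port), and "2", adding MIDs "3" and "4" places "3" at index 1 and appends "4" at the end.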
- In addition, for each non-recycled, non-rejected m= section in the - new offer, the following adjustments are made based on the contents - of the corresponding m= section in the current remote description, if - any: + In addition, for each existing, non-recycled, non-rejected m= section + in the new offer, the following adjustments are made based on the + contents of the corresponding m= section in the current local or + remote description, as appropriate: o The m= line and corresponding "a=rtpmap" and "a=fmtp" lines MUST - only include codecs present in the most recent answer which have - not been excluded by the codec preferences of the associated - transceiver. Note that non-JSEP endpoints are not subject to - these restrictions, and might offer media formats that were not - present in the most recent answer, as specified in [RFC3264], - Section 8. Therefore, JSEP implementations MUST be prepared to - receive such offers. + only include media formats which have not been excluded by the + codec preferences of the associated transceiver, and MUST include + all currently available formats. Media formats that were + previously offered but are no longer available (e.g., a shared + hardware codec) MAY be excluded. o Unless codec preferences have been set for the associated transceiver, the media formats on the m= line MUST be generated in - the same order as in the current local description. + the same order as in the most recent answer. Any media formats + that were not present in the most recent answer MUST be added + after all existing formats. o The RTP header extensions MUST only include those that are present in the most recent answer. - o The RTCP feedback extensions MUST only include those that are - present in the most recent answer. + o The RTCP feedback mechanisms MUST only include those that are + present in the most recent answer, except for the case of format- + specific mechanisms that are referencing a newly-added media + format. 
- o The "a=rtcp" line MUST only be added if the most recent answer did - not include an "a=rtcp-mux" line. + o The "a=rtcp" line MUST NOT be added if the most recent answer + included an "a=rtcp-mux" line. - o The "a=rtcp-mux" line MUST only be added if present in the most - recent answer. + o The "a=rtcp-mux" line MUST be the same as that in the most recent + answer. o The "a=rtcp-mux-only" line MUST NOT be added. - o The "a=rtcp-rsize" line MUST only be added if present in the most - recent answer. + o The "a=rtcp-rsize" line MUST NOT be added unless present in the + most recent answer. o An "a=bundle-only" line MUST NOT be added, as indicated in [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 6. Instead, JSEP implementations MUST simply omit parameters in the IDENTICAL and TRANSPORT categories for bundled m= sections, as described in [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 8.1. o Note that if media m= sections are bundled into a data m= section, then certain TRANSPORT and IDENTICAL attributes may appear in the data m= section even if they would otherwise only be appropriate @@ -2143,46 +2176,49 @@ pwd attributes, as specified in [RFC5245], Section 9.1.1.1. If this option is specified on an initial offer, it has no effect (since a new ICE ufrag and pwd are already generated). Similarly, if the ICE configuration has changed, this option has no effect, since new ufrag and pwd attributes will be generated automatically. This option is primarily useful for reestablishing connectivity in cases where failures are detected by the application. 5.2.3.2. VoiceActivityDetection + Silence suppression, also known as discontinuous transmission + ("DTX"), can reduce the bandwidth used for audio by switching to a + special encoding when voice activity is not detected, at the cost of + some fidelity. 
+ If the "VoiceActivityDetection" option is specified, with a value of "true", the offer MUST indicate support for silence suppression in the audio it receives by including comfort noise ("CN") codecs for each offered audio codec, as specified in [RFC3389], Section 5.1, except for codecs that have their own internal silence suppression support. For codecs that have their own internal silence suppression support, the appropriate fmtp parameters for that codec MUST be specified to indicate that silence suppression for received audio is desired. For example, when using the Opus codec [RFC6716], the "usedtx=1" parameter, specified in [RFC7587], would be used in the - offer. This option allows the endpoint to significantly reduce the - amount of audio bandwidth it receives, at the cost of some fidelity, - depending on the quality of the remote VAD algorithm. + offer. If the "VoiceActivityDetection" option is specified, with a value of "false", the JSEP implementation MUST NOT emit "CN" codecs. For codecs that have their own internal silence suppression support, the appropriate fmtp parameters for that codec MUST be specified to indicate that silence suppression for received audio is not desired. For example, when using the Opus codec, the "usedtx=0" parameter - would be specified in the offer. - - Note that setting the "VoiceActivityDetection" parameter when - generating an offer is a request to receive audio with silence - suppression. It has no impact on whether the local endpoint does - silence suppression for the audio it sends. + would be specified in the offer. In addition, the implementation + MUST NOT use silence suppression for media it generates, regardless + of whether the "CN" codecs or related fmtp parameters appear in the + peer's description. The impact of these rules is that silence + suppression in JSEP depends on mutual agreement of both sides, which + ensures consistent handling regardless of which codec is used. 
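The effect of the two option values on an offer's audio codecs can be sketched as below. This is an illustrative model only: the AudioCodec type, the apply_vad_option function, and the INTERNAL_DTX_PARAM table are hypothetical names, only the Opus "usedtx" parameter comes from the text above, and emitting one "CN" entry per distinct clock rate is an assumption about how "CN" codecs are paired with the offered codecs.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: codecs with internal silence suppression,
# mapped to the fmtp parameter that toggles it. Only Opus is named in the text.
INTERNAL_DTX_PARAM = {"opus": "usedtx"}


@dataclass
class AudioCodec:
    name: str
    clock_rate: int
    fmtp: dict = field(default_factory=dict)


def apply_vad_option(codecs, vad_enabled):
    """Return a new codec list adjusted for the VoiceActivityDetection option."""
    result = []
    cn_rates = []
    for codec in codecs:
        fmtp = dict(codec.fmtp)
        param = INTERNAL_DTX_PARAM.get(codec.name.lower())
        if param is not None:
            # Codec has its own silence suppression: signal it through fmtp.
            fmtp[param] = "1" if vad_enabled else "0"
        elif vad_enabled and codec.clock_rate not in cn_rates:
            # Codec relies on comfort noise: remember its clock rate.
            cn_rates.append(codec.clock_rate)
        result.append(AudioCodec(codec.name, codec.clock_rate, fmtp))
    # "CN" codecs are only emitted when the option is true.
    result.extend(AudioCodec("CN", rate) for rate in cn_rates)
    return result
```

For an offer with Opus at 48000 Hz and PCMU at 8000 Hz, a value of "true" yields "usedtx=1" on Opus plus a "CN" entry at 8000 Hz, while "false" yields "usedtx=0" and no "CN" entries.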
The "VoiceActivityDetection" option does not have any impact on the setting of the "vad" value in the signaling of the client to mixer audio level header extension described in [RFC6464], Section 4. 5.3. Generating an Answer When createAnswer is called, a new SDP description must be created that is compatible with the supplied remote description as well as the requirements specified in [I-D.ietf-rtcweb-rtp-usage]. The exact @@ -2284,26 +2320,27 @@ The next step is to go through each offered m= section. Each offered m= section will have an associated RtpTransceiver, as described in Section 5.9. If there are more RtpTransceivers than there are m= sections, the unmatched RtpTransceivers will need to be associated in a subsequent offer. For each offered m= section, if any of the following conditions are true, the corresponding m= section in the answer MUST be marked as rejected by setting the port in the m= line to zero, as indicated in - [RFC3264], Section 6., and further processing for this m= section can + [RFC3264], Section 6, and further processing for this m= section can be skipped: o The associated RtpTransceiver has been stopped. - o No supported codec is present in the offer. + o None of the offered media formats are supported and, if + applicable, allowed by codec preferences. o The bundle policy is "max-bundle", and this is not the first m= section or in the same bundle group as the first m= section. o The bundle policy is "balanced", and this is not the first m= section for this media type or in the same bundle group as the first m= section for this media type. Otherwise, each m= section in the answer should then be generated as specified in [RFC3264], Section 6.1. 
For the m= line itself, the @@ -2311,32 +2348,35 @@ o The port value would normally be set to the port of the default ICE candidate for this m= section, but given that no candidates are available yet, the "dummy" port value of 9 (Discard) MUST be used, as indicated in [I-D.ietf-ice-trickle], Section 5.1. o The <proto> field MUST be set to exactly match the <proto> field for the corresponding m= line in the offer. o If codec preferences have been set for the associated transceiver, - media formats MUST be generated in the corresponding order, and - MUST exclude any codecs not present in the codec preferences or - not present in the offer. Note that non-JSEP endpoints are not - subject to this restriction, and might add media formats in the - answer that are not present in the offer, as specified in - [RFC3264], Section 6.1. Therefore, JSEP implementations MUST be - prepared to receive such answers. + media formats MUST be generated in the corresponding order, + regardless of what was offered, and MUST exclude any codecs not + present in the codec preferences. - o Unless excluded by the above restrictions, the media formats MUST - include the mandatory audio/video codecs as specified in - [I-D.ietf-rtcweb-audio] (see Section 3) and - [I-D.ietf-rtcweb-video] (see Section 5). + o Otherwise, the media formats on the m= line MUST be generated in + the same order as those offered in the current remote description, + excluding any currently unsupported formats. Any currently + available media formats that are not present in the current remote + description MUST be added after all existing formats. + + o In either case, the media formats in the answer MUST include at + least one format that is present in the offer, but MAY include + formats that are locally supported but not present in the offer, + as mentioned in [RFC3264], Section 6.1. If no common format + exists, the m= section is rejected as described above.
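The media format rules above can be sketched as a small selection function. This is a hedged illustration rather than a normative algorithm: the name answer_media_formats, the use of plain strings for formats, and filtering the codec preferences against the supported list are assumptions of the example.

```python
def answer_media_formats(offered, supported, preferences=None):
    """Return the ordered media formats for an answer m= line.

    offered:     formats in the current remote description, in offered order.
    supported:   formats currently available locally, in local order.
    preferences: codec preferences of the transceiver, or None if unset.
    Returns None when the m= section has to be rejected.
    """
    if preferences is not None:
        # Preference order wins, regardless of what was offered.
        formats = [f for f in preferences if f in supported]
    else:
        # Offered order first, minus anything currently unsupported ...
        formats = [f for f in offered if f in supported]
        # ... then locally available formats that were not offered.
        formats += [f for f in supported if f not in formats]
    # At least one format has to be common with the offer.
    if not any(f in offered for f in formats):
        return None
    return formats
```

For an offer of VP8, H264, VP9 and local support for H264, VP8, AV1, the sketch returns VP8, H264, AV1 without preferences, and H264, VP8 when those two are set as preferences in that order. An offer containing only VP9 returns None, which corresponds to rejecting the m= section.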
The m= line MUST be followed immediately by a "c=" line, as specified in [RFC4566], Section 5.7. Again, as no candidates are available yet, the "c=" line must contain the "dummy" value "IN IP4 0.0.0.0", as defined in [I-D.ietf-ice-trickle], Section 5.1. If the offer supports bundle, all m= sections to be bundled must use the same ICE credentials and candidates; all m= sections not being bundled must use unique ICE credentials and candidates. Each m= section MUST contain the following attributes (which are of attribute @@ -2378,21 +2418,21 @@ "a=imageattr" line, as specified in Section 3.6. o For each supported RTP header extension that is present in the offer, an "a=extmap" line, as specified in [RFC5285], Section 5. The list of header extensions that SHOULD/MUST be supported is specified in [I-D.ietf-rtcweb-rtp-usage], Section 5.2. Any header extensions that require encryption MUST be specified as indicated in [RFC6904], Section 4. o For each supported RTCP feedback mechanism that is present in the - offer, an "a=rtcp-fb" mechanism, as specified in [RFC4585], + offer, an "a=rtcp-fb" line, as specified in [RFC4585], Section 4.2. The list of RTCP feedback mechanisms that SHOULD/ MUST be supported is specified in [I-D.ietf-rtcweb-rtp-usage], Section 5.1. o If the RtpTransceiver has a sendrecv or sendonly direction: * For each MediaStream that was associated with the transceiver when it was created via addTrack or addTransceiver, an "a=msid" line, as specified in [I-D.ietf-mmusic-msid], Section 2. If a MediaStreamTrack is attached to the transceiver's RtpSender, @@ -2406,31 +2446,34 @@ MediaStream ID, as specified in [I-D.ietf-mmusic-msid], Section 3. The track ID MUST be selected as described above. Each m= section which is not bundled into another m= section, MUST contain the following attributes (which are of category IDENTICAL or TRANSPORT): o "a=ice-ufrag" and "a=ice-pwd" lines, as specified in [RFC5245], Section 15.4. 
- o An "a=fingerprint" line for each of the endpoint's certificates, - as specified in [RFC4572], Section 5; the digest algorithm used - for the fingerprint MUST match that used in the certificate - signature. + o For each desired digest algorithm, one or more "a=fingerprint" + lines for each of the endpoint's certificates, as specified in + [RFC8122], Section 5. o An "a=setup" line, as specified in [RFC4145], Section 4, and clarified for use in DTLS-SRTP scenarios in [RFC5763], Section 5. - The role value in the answer MUST be "active" or "passive"; the - "active" role is RECOMMENDED. + The role value in the answer MUST be "active" or "passive". When + the offer contains the "actpass" value, as will always be the case + with JSEP endpoints, the answerer SHOULD use the "active" role. + Offers from non-JSEP endpoints MAY send other values for + "a=setup", in which case the answer MUST use a value consistent + with the value in the offer. - o An "a=dtls-id" line, as specified in [I-D.ietf-mmusic-dtls-sdp] + o An "a=tls-id" line, as specified in [I-D.ietf-mmusic-dtls-sdp], Section 5.3. o If present in the offer, an "a=rtcp-mux" line, as specified in [RFC5761], Section 5.1.3. Otherwise, an "a=rtcp" line, as specified in [RFC3605], Section 2.1, containing the dummy value "9 IN IP4 0.0.0.0" (because no candidates have yet been gathered). o If present in the offer, an "a=rtcp-rsize" line, as specified in [RFC5506], Section 5. @@ -2446,25 +2489,26 @@ "a=max-message-size" line, as defined in [I-D.ietf-mmusic-sctp-sdp], Section 6.1. 
As discussed above, the following attributes of category IDENTICAL or TRANSPORT are included only if the data m= section is not bundled into another m= section: o "a=ice-ufrag" o "a=ice-pwd" + o "a=fingerprint" o "a=setup" - o "a=dtls-id" + o "a=tls-id" Note that if media m= sections are bundled into a data m= section, then certain TRANSPORT and IDENTICAL attributes may also appear in the data m= section even if they would otherwise only be appropriate for a media m= section (e.g., "a=rtcp-mux"). If "a=group" attributes with semantics of "BUNDLE" are offered, corresponding session-level "a=group" attributes MUST be added as specified in [RFC5888]. These attributes MUST have semantics "BUNDLE", and MUST include all the mid identifiers from the offered @@ -2477,21 +2521,21 @@ The attributes prohibited in the creation of offers are also prohibited in the creation of answers. 5.3.2. Subsequent Answers When createAnswer is called a second (or later) time, or is called after a local description has already been installed, the processing is somewhat different than for an initial answer. - If the initial answer was not applied using setLocalDescription, + If the previous answer was not applied using setLocalDescription, meaning the PeerConnection is still in the "have-remote-offer" state, the steps for generating an initial answer should be followed, subject to the following restriction: o The fields of the "o=" line MUST stay the same except for the <session-version> field, which MUST increment if the session description changes in any way from the previously generated answer. If any session description was previously supplied to @@ -2500,34 +2544,33 @@ o The "s=" and "t=" lines MUST stay the same. o Each "m=" and "c=" line MUST be filled in with the port and address of the default candidate for the m= section, as described in [RFC5245], Section 4.3.
Note, however, that the m= line protocol need not match the default candidate, because this protocol value must instead match what was supplied in the offer, as described above. - o Unless codec preferences have been set for the associated - transceiver, the media formats on the m= line MUST be generated in - the same order as in the current local description. - o Each "a=ice-ufrag" and "a=ice-pwd" line MUST stay the same, unless the m= section is restarting, in which case new ICE credentials must be created as specified in [RFC5245], Section 9.2.1.1. If the m= section is bundled into another m= section, it still MUST NOT contain any ICE credentials. o Each "a=setup" line MUST use an "active" or "passive" role value consistent with the existing DTLS association, if the association is being continued by the offerer. + o RTCP multiplexing must be used, and an "a=rtcp-mux" line inserted + if and only if the m= section previously used RTCP multiplexing. + o If the m= section is not bundled into another m= section and RTCP multiplexing is not active, an "a=rtcp" attribute line MUST be filled in with the port and address of the default RTCP candidate. If no RTCP candidates have yet been gathered, dummy values MUST be used, as described in the initial answer section above. o If the m= section is not bundled into another m= section, for each candidate that has been gathered during the most recent gathering phase (see Section 3.5.1), an "a=candidate" line MUST be added, as defined in [RFC5245], Section 4.3., paragraph 3. If candidate @@ -2556,21 +2599,24 @@ 5.3.3.1. VoiceActivityDetection Silence suppression in the answer is handled as described in Section 5.2.3.2, with one exception: if support for silence suppression was not indicated in the offer, the VoiceActivityDetection parameter has no effect, and the answer should be generated as if VoiceActivityDetection was set to false. 
This is done on a per-codec basis (e.g., if the offerer somehow offered support for CN but set "usedtx=0" for Opus, setting VoiceActivityDetection to true would result in an answer with CN - codecs and "usedtx=0"). + codecs and "usedtx=0"). The impact of this rule is that an answerer + will not try to use silence suppression with any endpoint that does + not offer it, making silence suppression support bilateral even with + non-JSEP endpoints. 5.4. Modifying an Offer or Answer The SDP returned from createOffer or createAnswer MUST NOT be changed before passing it to setLocalDescription. If precise control over the SDP is needed, the aforementioned createOffer/createAnswer options or RtpTransceiver APIs MUST be used. Note that the application MAY modify the SDP to reduce the capabilities in the offer it sends to the far side (post- @@ -2591,76 +2637,78 @@ assume that all SDP is well-formed; however, one should be able to assume that any implementation of this specification will be able to process, as a remote offer or answer, unmodified SDP coming from any other implementation of this specification. 5.5. Processing a Local Description When a SessionDescription is supplied to setLocalDescription, the following steps MUST be performed: - o First, the type of the SessionDescription is checked against the - current state of the PeerConnection: + o If the description is of type "rollback", follow the processing + defined in Section 4.1.8.2 and skip the processing described in + the rest of this section. + + o Otherwise, the type of the SessionDescription is checked against + the current state of the PeerConnection: * If the type is "offer", the PeerConnection state MUST be either "stable" or "have-local-offer". * If the type is "pranswer" or "answer", the PeerConnection state MUST be either "have-remote-offer" or "have-local-pranswer". o If the type is not correct for the current state, processing MUST stop and an error MUST be returned. 
o The SessionDescription is then checked to ensure that its contents are identical to those generated in the last call to createOffer/ createAnswer, and thus have not been altered, as discussed in Section 5.4; otherwise, processing MUST stop and an error MUST be returned. o Next, the SessionDescription is parsed into a data structure, as - described in the Section 5.7 section below. If parsing fails for - any reason, processing MUST stop and an error MUST be returned. + described in Section 5.7 below. o Finally, the parsed SessionDescription is applied as described in - the Section 5.8 section below. + Section 5.8 below. 5.6. Processing a Remote Description When a SessionDescription is supplied to setRemoteDescription, the following steps MUST be performed: - o First, the type of the SessionDescription is checked against the - current state of the PeerConnection: + o If the description is of type "rollback", follow the processing + defined in Section 4.1.8.2 and skip the processing described in + the rest of this section. + + o Otherwise, the type of the SessionDescription is checked against + the current state of the PeerConnection: * If the type is "offer", the PeerConnection state MUST be either "stable" or "have-remote-offer". * If the type is "pranswer" or "answer", the PeerConnection state MUST be either "have-local-offer" or "have-remote-pranswer". o If the type is not correct for the current state, processing MUST stop and an error MUST be returned. o Next, the SessionDescription is parsed into a data structure, as - described in the Section 5.7 section below. If parsing fails for - any reason, processing MUST stop and an error MUST be returned. + described in Section 5.7 below. If parsing fails for any reason, + processing MUST stop and an error MUST be returned. o Finally, the parsed SessionDescription is applied as described in - the Section 5.9 section below. + Section 5.9 below. 5.7. 
Parsing a Session Description - When a SessionDescription of any type is supplied to setLocal/ - RemoteDescription, the implementation must parse it and reject it if - it is invalid. The exact details of this process are explained - below. - The SDP contained in the session description object consists of a sequence of text lines, each containing a key-value expression, as described in [RFC4566], Section 5. The SDP is read, line-by-line, and converted to a data structure that contains the deserialized information. However, SDP allows many types of lines, not all of which are relevant to JSEP applications. For each line, the implementation will first ensure it is syntactically correct according to its defining ABNF, check that it conforms to [RFC4566] and [RFC3264] semantics, and then either parse and store or discard the provided value, as described below. @@ -2669,31 +2717,29 @@ parser MUST stop with an error and reject the session description, even if the value is to be discarded. This ensures that implementations do not accidentally misinterpret ambiguous SDP. 5.7.1. Session-Level Parsing First, the session-level lines are checked and parsed. These lines MUST occur in a specific order, and with a specific syntax, as defined in [RFC4566], Section 5. Note that while the specific line types (e.g. "v=", "c=") MUST occur in the defined order, lines of the - same type (typically "a=") can occur in any order, and their ordering - is not meaningful. + same type (typically "a=") can occur in any order. The following non-attribute lines are not meaningful in the JSEP context and MAY be discarded once they have been checked. - The "c=" line MUST be checked for syntax but its value is not - used. This supersedes the guidance in [RFC5245], Section 6.1, to - use "ice-mismatch" to indicate mismatches between "c=" and the - candidate lines; because JSEP always uses ICE, "ice-mismatch" is - not useful in this context. 
+ The "c=" line MUST be checked for syntax but its value is only + used for ICE mismatch detection, as defined in [RFC5245], + Section 6.1. Note that JSEP implementations should never + encounter this condition because ICE is required for WebRTC. The "i=", "u=", "e=", "p=", "t=", "r=", "z=", and "k=" lines are not used by this specification; they MUST be checked for syntax but their values are not used. The remaining non-attribute lines are processed as follows: The "v=" line MUST have a version of 0, as specified in [RFC4566], Section 5.1. @@ -2717,40 +2763,42 @@ o If present, a single "a=ice-ufrag" line is parsed as specified in [RFC5245], Section 15.4, and the ufrag value is stored. o If present, a single "a=ice-pwd" line is parsed as specified in [RFC5245], Section 15.4, and the password value is stored. o If present, a single "a=ice-options" line is parsed as specified in [RFC5245], Section 15.5, and the set of specified options is stored. - o Any "a=fingerprint" lines are parsed as specified in [RFC4572], + o Any "a=fingerprint" lines are parsed as specified in [RFC8122], Section 5, and the set of fingerprint and algorithm values is stored. o If present, a single "a=setup" line is parsed as specified in [RFC4145], Section 4, and the setup value is stored. - o If present, a single "a=dtls-id" line is parsed as specified in - [I-D.ietf-mmusic-dtls-sdp] Section 5, and the dtls-id value is + o If present, a single "a=tls-id" line is parsed as specified in + [I-D.ietf-mmusic-dtls-sdp], Section 5, and the tls-id value is stored. o Any "a=identity" lines are parsed and the identity values stored for subsequent verification, as specified in [I-D.ietf-rtcweb-security-arch], Section 5. o Any "a=extmap" lines are parsed as specified in [RFC5285], Section 5, and their values are stored. - As required by [RFC4566], Section 5.13, unknown attribute lines MUST - be ignored.
+ Other attributes that are not relevant to JSEP may also be present, + and implementations SHOULD process any that they recognize. As + required by [RFC4566], Section 5.13, unknown attribute lines MUST be + ignored. Once all the session-level lines have been parsed, processing continues with the lines in m= sections. 5.7.2. Media Section Parsing Like the session-level lines, the media section lines MUST occur in the specific order and with the specific syntax defined in [RFC4566], Section 5. @@ -2784,46 +2832,45 @@ o Any "a=candidate" attributes MUST be parsed as specified in [RFC5245], Section 15.1, and their values stored. o Any "a=remote-candidates" attributes MUST be parsed as specified in [RFC5245], Section 15.2, but their values are ignored. o If present, a single "a=end-of-candidates" attribute MUST be parsed as specified in [I-D.ietf-ice-trickle], Section 8.2, and its presence or absence flagged and stored. - o Any "a=fingerprint" lines are parsed as specified in [RFC4572], + o Any "a=fingerprint" lines are parsed as specified in [RFC8122], Section 5, and the set of fingerprint and algorithm values is stored. - If the "m=" proto value indicates use of RTP, as described in the - Section 5.1.2 section above, the following attribute lines MUST be - processed: + If the "m=" proto value indicates use of RTP, as described in + Section 5.1.2 above, the following attribute lines MUST be processed: o The "m=" fmt value MUST be parsed as specified in [RFC4566], Section 5.14, and the individual values stored. o Any "a=rtpmap" or "a=fmtp" lines MUST be parsed as specified in [RFC4566], Section 6, and their values stored. o If present, a single "a=ptime" line MUST be parsed as described in [RFC4566], Section 6, and its value stored. o If present, a single "a=maxptime" line MUST be parsed as described in [RFC4566], Section 6, and its value stored. o If present, a single direction attribute line (e.g. 
"a=sendrecv") MUST be parsed as described in [RFC4566], Section 6, and its value stored. - o Any "a=ssrc" or "a=ssrc-group" attributes MUST be parsed as - specified in [RFC5576], Sections 4.1-4.2, and their values stored. + o Any "a=ssrc" attributes MUST be parsed as specified in [RFC5576], + Section 4.1, and their values stored. o Any "a=extmap" attributes MUST be parsed as specified in [RFC5285], Section 5, and their values stored. o Any "a=rtcp-fb" attributes MUST be parsed as specified in [RFC4585], Section 4.2., and their values stored. o If present, a single "a=rtcp-mux" attribute MUST be parsed as specified in [RFC5761], Section 5.1.3, and its presence or absence flagged and stored. @@ -2861,65 +2908,70 @@ protocol value stored. o An "a=sctp-port" attribute MUST be present, and it MUST be parsed as specified in [I-D.ietf-mmusic-sctp-sdp], Section 5.2, and the value stored. o If present, a single "a=max-message-size" attribute MUST be parsed as specified in [I-D.ietf-mmusic-sctp-sdp], Section 6, and the value stored. Otherwise, use the specified default. - As required by [RFC4566], Section 5.13, unknown attribute lines MUST - be ignored. + Other attributes that are not relevant to JSEP may also be present, + and implementations SHOULD process any that they recognize. As + required by [RFC4566], Section 5.13, unknown attribute lines MUST be + ignored. 5.7.3. Semantics Verification Assuming parsing completes successfully, the parsed description is then evaluated to ensure internal consistency as well as proper support for mandatory features. Specifically, the following checks are performed: o For each m= section, valid values for each of the mandatory-to-use features enumerated in Section 5.1.1 MUST be present. These values MAY either be present at the media level, or inherited from the session level. * ICE ufrag and password values, which MUST comply with the size limits specified in [RFC5245], Section 15.4. 
- * dtls-id value, which MUST be set according to - [I-D.ietf-mmusic-dtls-sdp] Section 5. If this is a re-offer - and the dtls-id value is different from that presently in use, + * tls-id value, which MUST be set according to + [I-D.ietf-mmusic-dtls-sdp], Section 5. If this is a re-offer + and the tls-id value is different from that presently in use, the DTLS connection is not being continued and the remote description MUST be part of an ICE restart, together with new - ufrag and password values. If this is an answer, the dtls-id + ufrag and password values. If this is an answer, the tls-id value, if present, MUST be the same as in the offer. * DTLS setup value, which MUST be set according to the rules specified in [RFC5763], Section 5 and MUST be consistent with the selected role of the current DTLS connection, if one exists and is being continued. * DTLS fingerprint values, where at least one fingerprint MUST be present. o All RID values referenced in an "a=simulcast" line MUST exist as "a=rid" lines. o Each m= section is also checked to ensure prohibited features are - not used. If this is a local description, the "ice-lite" - attribute MUST NOT be specified. + not used. o If the RTP/RTCP multiplexing policy is "require", each m= section - MUST contain an "a=rtcp-mux" attribute. If an "m=" section - contains an "a=rtcp-mux-only" attribute then that section MUST - also contain an "a=rtcp-mux" attribute. + MUST contain an "a=rtcp-mux" attribute. If an m= section contains + an "a=rtcp-mux-only" attribute then that section MUST also contain + an "a=rtcp-mux" attribute. + + o If this m= section was present in the previous answer then the + state of RTP/RTCP multiplexing MUST match what was previously + negotiated. 
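As an illustration only, the a=simulcast/a=rid and RTCP mux checks listed above could be sketched as follows. This is a minimal sketch and not part of the draft: the function name check_media_section, its arguments, and the error strings are hypothetical, the m= section is assumed to arrive as a list of already-split attribute lines, and the a=simulcast parsing is deliberately simplified.

```python
import re


def check_media_section(attr_lines, rtcp_mux_policy="require"):
    """Return a list of error strings for one m= section (empty if it passes).

    attr_lines is a hypothetical input shape: the section's attribute
    lines, e.g. ["a=rtcp-mux", "a=rid:1 send"]. Only the a=rid/a=simulcast
    and RTCP mux checks are covered here.
    """
    errors = []
    names = set()
    rids = set()
    simulcast_values = []
    for line in attr_lines:
        if not line.startswith("a="):
            continue
        name, _, value = line[2:].partition(":")
        names.add(name)
        if name == "rid":
            rids.add(value.split(" ", 1)[0])
        elif name == "simulcast":
            simulcast_values.append(value)

    # Every RID referenced by a=simulcast must be declared by an a=rid line.
    for value in simulcast_values:
        tokens = value.split()
        # Simplified: tokens alternate between a direction and a RID list.
        for rid_list in tokens[1::2]:
            for rid in re.split(r"[;,]", rid_list):
                rid = rid.lstrip("~")  # "~" marks an initially paused stream
                if rid and rid not in rids:
                    errors.append(f"a=simulcast references undeclared RID {rid!r}")

    # RTCP mux checks: policy "require", and a=rtcp-mux-only implies a=rtcp-mux.
    if rtcp_mux_policy == "require" and "rtcp-mux" not in names:
        errors.append("policy is 'require' but a=rtcp-mux is missing")
    if "rtcp-mux-only" in names and "rtcp-mux" not in names:
        errors.append("a=rtcp-mux-only present without a=rtcp-mux")
    return errors
```

A section that declares every referenced RID and carries a=rtcp-mux yields an empty list; a real implementation would also have to cover the ICE, tls-id, setup, and fingerprint checks, including inheritance from the session level.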
If this session description is of type "pranswer" or "answer", the following additional checks are applied: o The session description must follow the rules defined in [RFC3264], Section 6, including the requirement that the number of m= sections MUST exactly match the number of m= sections in the associated offer. o For each m= section, the media type and protocol values MUST @@ -2934,137 +2986,137 @@ The following steps are performed at the media engine level to apply a local description. If an error is returned, the session MUST be restored to the state it was in before performing these steps. Next, m= sections are processed. For each m= section, the following steps MUST be performed; if any parameters are out of bounds, or cannot be applied, processing MUST stop and an error MUST be returned. o If this m= section is new, begin gathering candidates for it, as - defined in [RFC5245], Section 4.1.1, unless it has been marked as - bundle-only. + defined in [RFC5245], Section 4.1.1, unless it is definitively + being bundled (either this is an offer and the m= section is + marked bundle-only, or it is an answer and the m= section is + bundled into another m= section). - o Or, if the ICE ufrag and password values have changed, and it has - not been marked as bundle-only, trigger the ICE agent to start an - ICE restart, and begin gathering new candidates for the m= section - as described in [RFC5245], Section 9.1.1.1. If this description - is an answer, also start checks on that media section as defined - in [RFC5245], Section 9.3.1.1. + o Or, if the ICE ufrag and password values have changed, trigger the + ICE agent to start an ICE restart, and begin gathering new + candidates for the m= section as described in [RFC5245], + Section 9.1.1.1. If this description is an answer, also start + checks on that media section as defined in [RFC5245], + Section 9.3.1.1.
o If the m= section proto value indicates use of RTP: - * If there is no RtpTransceiver associated with this m= section - (which will only happen when applying an offer), find one and - associate it with this m= section according to the following - steps: + * If there is no RtpTransceiver associated with this m= section, + find one and associate it with this m= section according to the + following steps. Note that this situation will only occur when + applying an offer. + Find the RtpTransceiver that corresponds to this m= section, using the mapping between transceivers and m= section indices established when creating the offer. + Set the value of this RtpTransceiver's mid property to the MID of the m= section. * If RTCP mux is indicated, prepare to demux RTP and RTCP from the RTP ICE component, as specified in [RFC5761], - Section 5.1.3. If RTCP mux is not indicated, but was - previously negotiated, i.e., the RTCP ICE component no longer - exists, this MUST result in an error. + Section 5.1.3. * For each specified RTP header extension, establish a mapping - between the extension ID and URI, as described in section 6 of - [RFC5285]. If any indicated RTP header extension is not - supported, this MUST result in an error. + between the extension ID and URI, as described in [RFC5285], + Section 6. * If the MID header extension is supported, prepare to demux RTP streams intended for this m= section based on the MID header extension, as described in - [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 14. + [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 15. * For each specified media format, establish a mapping between the payload type and the actual media format, as described in - [RFC3264], Section 6.1. If any indicated media format is not - supported, this MUST result in an error. + [RFC3264], Section 6.1. 
In addition, prepare to demux RTP + streams intended for this m= section based on the media formats + supported by this m= section, as described in + [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 10.2. * For each specified "rtx" media format, establish a mapping between the RTX payload type and its associated primary payload - type, as described in [RFC4588], Sections 8.6 and 8.7. If any - referenced primary payload types are not present, this MUST - result in an error. + type, as described in [RFC4588], Sections 8.6 and 8.7. * If the directional attribute is of type "sendrecv" or "recvonly", enable receipt and decoding of media. Finally, if this description is of type "pranswer" or "answer", - follow the processing defined in the Section 5.10 section below. + follow the processing defined in Section 5.10 below. 5.9. Applying a Remote Description The following steps are performed to apply a remote description. If an error is returned, the session MUST be restored to the state it was in before performing these steps. If the answer contains any "a=ice-options" attributes where "trickle" is listed as an attribute, update the PeerConnection canTrickle property to be true. Otherwise, set this property to false. The following steps MUST be performed for attributes at the session level; if any parameters are out of bounds, or cannot be applied, processing MUST stop and an error MUST be returned. o For any specified "CT" bandwidth value, set this as the limit for the maximum total bitrate for all m= sections, as specified in - Section 5.8 of [RFC4566]. Within this overall limit, the + [RFC4566], Section 5.8. Within this overall limit, the implementation can dynamically decide how to best allocate the available bandwidth between m= sections, respecting any specific limits that have been specified for individual m= sections. o For any specified "RR" or "RS" bandwidth values, handle as specified in [RFC3556], Section 2. 
o Any "AS" bandwidth value MUST be ignored, as the meaning of this construct at the session level is not well defined. For each m= section, the following steps MUST be performed; if any parameters are out of bounds, or cannot be applied, processing MUST stop and an error MUST be returned. - o If the PeerConnection state is "have-local-offer", and the ICE - ufrag or password changed from the previous remote description, - then an ICE restart is needed, as described in Section 9.1.1.1 of - [RFC5245]. If the description is of type "offer", note that an - ICE restart is needed. If the description is of type "answer" or - "pranswer" and the current local description is also an ICE - restart, then signal the ICE agent to begin checks as described in - Section 9.3.1.1 of [RFC5245]. An answerer MUST change the ufrag - and password in an answer if and only if ICE is restarting, as - described in Section 9.2.1.1 of [RFC5245]. + o If the ICE ufrag or password changed from the previous remote + description: [RFC5245]. - o If the PeerConnection state is "have-remote-pranswer", and the ICE - ufrag or password changed from the previous provisional answer, - then signal the ICE agent to discard any previous ICE check list - state for the m= section and begin checks as if this were the - first answer. However, such an answer MAY only change the ICE - ufrag or password if the local offer is starting or restarting ICE - for the m= section. + * If the description is of type "offer", note that an ICE restart + is needed, as described in [RFC5245], Section 9.1.1.1. + + * If the description is of type "answer" or "pranswer", then + check to see if the current local description is an ICE + restart, and if not, generate an error. If the PeerConnection + state is "have-remote-pranswer", and the ICE ufrag or password + changed from the previous provisional answer, then signal the + ICE agent to discard any previous ICE check list state for the + m= section.
Finally, signal the ICE agent to begin checks as + described in [RFC5245], Section 9.3.1.1. + + o If the current local description indicates an ICE restart, and + either the ICE ufrag or password has not changed from the previous + remote description, as prescribed by [RFC5245], Section 9.2.1.1, + generate an error. o Configure the ICE components associated with this media section to use the supplied ICE remote ufrag and password for their connectivity checks. o Pair any supplied ICE candidates with any gathered local - candidates, as described in Section 5.7 of [RFC5245] and start + candidates, as described in [RFC5245], Section 5.7, and start connectivity checks with the appropriate credentials. o If an "a=end-of-candidates" attribute is present, process the end- - of-candidates indication as described in [I-D.ietf-ice-trickle] + of-candidates indication as described in [I-D.ietf-ice-trickle], Section 11. o If the m= section proto value indicates use of RTP: * If the m= section is being recycled (see Section 5.2.2), dissociate the currently associated RtpTransceiver by setting its mid property to null, and discard the mapping between the transceiver and its m= section index. * If the m= section is not associated with any RtpTransceiver @@ -3099,152 +3151,172 @@ records the payload type to be used in outgoing RTP packets when sending each specified media format, as well as the relative preference for each format that is indicated in their ordering. If any indicated media format is not supported by the local implementation, it MUST be ignored. * For each specified "rtx" media format, establish a mapping between the RTX payload type and its associated primary payload type, as described in [RFC4588], Section 4. If any referenced primary payload types are not present, this MUST result in an - error. + error. 
Note that RTX payload types may refer to primary + payload types which are not supported by the local media + implementation, in which case, the RTX payload type MUST also + be ignored. * For each specified fmtp parameter that is supported by the local implementation, enable them on the associated media formats. + * For each specified SSRC that is signaled in the m= section, + prepare to demux RTP streams intended for this m= section using + that SSRC, as described in + [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 10.2. + * For each specified RTP header extension that is also supported by the local implementation, establish a mapping between the extension ID and URI, as described in [RFC5285], Section 5. Specifically, this means that the implementation records the extension ID to be used in outgoing RTP packets when sending each specified header extension. If any indicated RTP header extension is not supported by the local implementation, it MUST be ignored. * For each specified RTCP feedback mechanism that is supported by the local implementation, enable them on the associated media formats. * For any specified "TIAS" bandwidth value, set this value as a constraint on the maximum RTP bitrate to be used when sending media, as specified in [RFC3890]. If a "TIAS" value is not present, but an "AS" value is specified, generate a "TIAS" value using this formula: - TIAS = AS * 1000 * 0.95 - 50 * 40 * 8 - + TIAS = AS * 1000 * 0.95 - (50 * 40 * 8) The 50 is based on 50 packets per second, the 40 is based on an estimate of total header size, the 1000 changes the unit from kbps to bps (as required by TIAS), and the 0.95 is to allocate 5% to RTCP. "TIAS" is used in preference to "AS" because it provides more accurate control of bandwidth. * For any "RR" or "RS" bandwidth values, handle as specified in [RFC3556], Section 2. * Any specified "CT" bandwidth value MUST be ignored, as the meaning of this construct at the media level is not well defined. 
* If the m= section is of type audio: - + For each specified "CN" media format, enable DTX for all - supported media formats with the same clockrate, as - described in [RFC3389], Section 5, except for formats that - have their own internal DTX mechanisms. DTX for such - formats (e.g., Opus) is controlled via fmtp parameters, as - discussed in Section 5.2.3.2. + + For each specified "CN" media format, configure silence + suppression for all supported media formats with the same + clockrate, as described in [RFC3389], Section 5, except for + formats that have their own internal silence suppression + mechanisms. Silence suppression for such formats (e.g., + Opus) is controlled via fmtp parameters, as discussed in + Section 5.2.3.2. + For each specified "telephone-event" media format, enable DTMF transmission for all supported media formats with the same clockrate, as described in [RFC4733], Section 2.5.1.2. If the application attempts to transmit DTMF when using a media format that does not have a corresponding telephone- event format, this MUST result in an error. + For any specified "ptime" value, configure the available - media formats to use the specified packet size. If the - specified size is not supported for a media format, use the - next closest value instead. + media formats to use the specified packet size when sending. + If the specified size is not supported for a media format, + use the next closest value instead. Finally, if this description is of type "pranswer" or "answer", - follow the processing defined in the Section 5.10 section below. + follow the processing defined in Section 5.10 below. 5.10. Applying an Answer In addition to the steps mentioned above for processing a local or remote description, the following steps are performed when processing a description of type "pranswer" or "answer". For each m= section, the following steps MUST be performed: o If the m= section has been rejected (i.e. 
port is set to zero in the answer), stop any reception or transmission of media for this section, and, unless a non-rejected m= section is bundled with this m= section, discard any associated ICE components, as - described in Section 9.2.1.3 of [RFC5245]. + described in [RFC5245], Section 9.2.1.3. - o If the remote DTLS fingerprint has been changed or the dtls-id has + o If the remote DTLS fingerprint has been changed or the tls-id has changed, tear down the DTLS connection. This includes the case when the PeerConnection state is "have-remote-pranswer". If a DTLS connection needs to be torn down but the answer does not indicate an ICE restart or, in the case of "have-remote-pranswer", new ICE credentials, an error MUST be generated. If an ICE - restart is performed without a change in dtls-id or fingerprint, + restart is performed without a change in tls-id or fingerprint, then the same DTLS connection is continued over the new ICE channel. o If no valid DTLS connection exists, prepare to start a DTLS connection, using the specified roles and fingerprints, on any underlying ICE components, once they are active. o If the m= section proto value indicates use of RTP: - * If the m= section references any media formats, RTP header - extensions, or RTCP feedback mechanisms that were not present - in the corresponding m= section in the offer, this indicates a - negotiation problem and MUST result in an error. + * If the m= section references RTCP feedback mechanisms that were + not present in the corresponding m= section in the offer, this + indicates a negotiation problem and MUST result in an error. + However, new media formats and new RTP header extension values + are permitted in the answer, as described in [RFC3264], + Section 7, and [RFC5285], Section 6. * If the m= section has RTCP mux enabled, discard the RTCP ICE component, if one exists, and begin or continue muxing RTCP over the RTP ICE component, as specified in [RFC5761], Section 5.1.3. 
Otherwise, prepare to transmit RTCP over the RTCP ICE component; if no RTCP ICE component exists, because RTCP mux was previously enabled, this MUST result in an error. * If the m= section has reduced-size RTCP enabled, configure the RTCP transmission for this m= section to use reduced-size RTCP, as specified in [RFC5506]. - * If the directional attribute in the answer is of type - "sendrecv" or "sendonly", choose the media format to send as - the most preferred media format from the remote description - that is also present in the answer, as described in [RFC3264], - Sections 6.1 and 7, and start transmitting RTP media once the - underlying transport layers have been established. If an SSRC - has not already been chosen for this outgoing RTP stream, + * If the directional attribute in the answer indicates that the + JSEP implementation should be sending media ("sendonly" for + local answers, "recvonly" for remote answers, or "sendrecv" for + either type of answer), choose the media format to send as the + most preferred media format from the remote description that is + also locally supported, as discussed in [RFC3264], Sections 6.1 + and 7, and start transmitting RTP media using that format once + the underlying transport layers have been established. If an + SSRC has not already been chosen for this outgoing RTP stream, choose a random one. If media is already being transmitted, the same SSRC SHOULD be used unless the clockrate of the new codec is different, in which case a new SSRC MUST be chosen, as specified in [RFC7160], Section 3.1. * The payload type mapping from the remote description is used to determine payload types for the outgoing RTP streams, including the payload type for the send media format chosen above. 
Any RTP header extensions that were negotiated should be included in the outgoing RTP streams, using the extension mapping from the remote description; if the RID header extension has been negotiated, and RID values are specified, include the RID header extension in the outgoing RTP streams, as indicated in [I-D.ietf-mmusic-rid], Section 4. + * If the m= section is of type audio, and silence suppression was + configured for the send media format as a result of processing + the remote description, and is also enabled for that format in + the local description, use silence suppression for outgoing + media, in accordance with the guidance in Section 5.2.3.2. If + these conditions are not met, silence suppression MUST NOT be + used for outgoing media. + * If simulcast has been negotiated, send the number of Source RTP Streams as specified in [I-D.ietf-mmusic-sdp-simulcast], Section 6.2.2. * If the send media format chosen above has a corresponding "rtx" media format, or a FEC mechanism has been negotiated, establish a Redundancy RTP Stream with a random SSRC for each Source RTP Stream, and start or continue transmitting RTX/FEC packets as needed. @@ -3254,27 +3326,28 @@ discussed in [I-D.ietf-rtcweb-fec], Section 3.2. Note that unlike RTX or FEC media formats, the "red" format is transmitted on the Source RTP Stream, not the Redundancy RTP Stream. * Enable the RTCP feedback mechanisms referenced in the media section for all Source RTP Streams using the specified media formats. Specifically, begin or continue sending the requested feedback types and reacting to received feedback, as specified in [RFC4585], Section 4.2. When sending RTCP feedback, follow - the rules and recommendations from - [I-D.ietf-avtcore-rtp-multi-stream], Section 5.4.1 to select - which SSRC to use. + the rules and recommendations from [RFC8108], Section 5.4.1, to + select which SSRC to use.
- * If the directional attribute is of type "recvonly" or - "inactive", stop transmitting all RTP media, but continue - sending RTCP, as described in [RFC3264], Section 5.1. + * If the directional attribute in the answer indicates that the + JSEP implementation should not be sending media ("recvonly" for + local answers, "sendonly" for remote answers, or "inactive" for + either type of answer) stop transmitting all RTP media, but + continue sending RTCP, as described in [RFC3264], Section 5.1. o If the m= section proto value indicates use of SCTP: * If an SCTP association exists, and the remote SCTP port has changed, discard the existing SCTP association. This includes the case when the PeerConnection state is "have-remote- pranswer". * If no valid SCTP association exists, prepare to initiate a SCTP association over the associated ICE component and DTLS @@ -3288,57 +3361,60 @@ ICE components in each bundle, and begin muxing these m= sections accordingly, as described in [I-D.ietf-mmusic-sdp-bundle-negotiation], Section 8.2. If the description is of type "answer", and there are still remaining candidates in the ICE candidate pool, discard them. 6. Processing RTP/RTCP When bundling, associating incoming RTP/RTCP with the proper m= - section is defined in [I-D.ietf-mmusic-sdp-bundle-negotiation]. When - not bundling, the proper m= section is clear from the ICE component - over which the RTP/RTCP is received. + section is defined in [I-D.ietf-mmusic-sdp-bundle-negotiation], + Section 10.2. When not bundling, the proper m= section is clear from + the ICE component over which the RTP/RTCP is received. Once the proper m= section(s) are known, RTP/RTCP is delivered to the RtpTransceiver(s) associated with the m= section(s) and further processing of the RTP/RTCP is done at the RtpTransceiver level. 
This includes using RID [I-D.ietf-mmusic-rid] to distinguish between multiple Encoded Streams, as well as determine which Source RTP stream should be repaired by a given Redundancy RTP stream. 7. Examples Note that this example section shows several SDP fragments. To format in 72 columns, some of the lines in SDP have been split into multiple lines, where leading whitespace indicates that a line is a continuation of the previous line. In addition, some blank lines have been added to improve readability but are not valid in SDP. - More examples of SDP for WebRTC call flows can be found in - [I-D.nandakumar-rtcweb-sdp]. + More examples of SDP for WebRTC call flows, including examples with + IPv6 addresses, can be found in [I-D.ietf-rtcweb-sdp]. 7.1. Simple Example This section shows a very simple example that sets up a minimal audio / video call between two JSEP endpoints without using trickle ICE. The example in the following section provides a more detailed example of what could happen in a JSEP session. The code flow below shows Alice's endpoint initiating the session to - Bob's endpoint. The messages from Alice's JS to Bob's JS are assumed - to flow over some signaling protocol via a web server. The JS on - both Alice's side and Bob's side waits for all candidates before - sending the offer or answer, so the offers and answers are complete; - trickle ICE is not used. Both Alice and Bob are using the default - bundle policy of "balanced", and the default RTCP mux policy of - "require". + Bob's endpoint. The messages from the JavaScript application in + Alice's browser to the JavaScript in Bob's browser, abbreviated as + AliceJS and BobJS respectively, are assumed to flow over some + signaling protocol via a web server. The JavaScript on both Alice's + side and Bob's side waits for all candidates before sending the offer + or answer, so the offers and answers are complete; trickle ICE is not + used. 
The user agents (JSEP implementations) in Alice and Bob's + browsers, abbreviated as AliceUA and BobUA respectively, are using + the default bundle policy of "balanced", and the default RTCP mux + policy of "require". // set up local media state AliceJS->AliceUA: create new PeerConnection AliceJS->AliceUA: addTrack with two tracks: audio and video AliceJS->AliceUA: createOffer to get offer AliceJS->AliceUA: setLocalDescription with offer AliceUA->AliceJS: multiple onicecandidate events with candidates // wait for ICE gathering to complete AliceUA->AliceJS: onicecandidate event with null candidate @@ -3387,59 +3463,66 @@ m=audio 10100 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 203.0.113.100 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:47017fee-b6c1-4162-929c-a25110252400 f83006c5-a0ff-4e0a-9ed9-d3e6747be7d9 a=ice-ufrag:ETEn a=ice-pwd:OtSK0WpNtpUjkY4+86js7ZQl a=fingerprint:sha-256 19:E2:1C:3B:4B:9F:81:E6:B8:5C:F4:A5:A8:D8:73:04: BB:05:2F:70:9F:04:A9:0E:05:E9:26:33:E8:70:88:A2 a=setup:actpass - a=dtls-id:1 + a=tls-id:1 a=rtcp:10101 IN IP4 203.0.113.100 a=rtcp-mux a=rtcp-rsize a=candidate:1 1 udp 2113929471 203.0.113.100 10100 typ host a=candidate:1 2 udp 2113929470 203.0.113.100 10101 typ host a=end-of-candidates - m=video 10102 UDP/TLS/RTP/SAVPF 100 101 + m=video 10102 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 203.0.113.100 a=mid:v1 a=sendrecv a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm
fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=msid:47017fee-b6c1-4162-929c-a25110252400 f30bdb4a-5db8-49b5-bcdc-e0c9a23172e0 a=ice-ufrag:BGKk a=ice-pwd:mqyWsAjvtKwTGnvhPztQ9mIf a=fingerprint:sha-256 19:E2:1C:3B:4B:9F:81:E6:B8:5C:F4:A5:A8:D8:73:04: BB:05:2F:70:9F:04:A9:0E:05:E9:26:33:E8:70:88:A2 a=setup:actpass - a=dtls-id:1 + a=tls-id:1 a=rtcp:10103 IN IP4 203.0.113.100 a=rtcp-mux a=rtcp-rsize a=candidate:1 1 udp 2113929471 203.0.113.100 10102 typ host a=candidate:1 2 udp 2113929470 203.0.113.100 10103 typ host a=end-of-candidates The SDP for |answer-A1| looks like: v=0 @@ -3452,45 +3535,53 @@ m=audio 10200 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 203.0.113.200 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:61317484-2ed4-49d7-9eb7-1414322a7aae 5a7b57b8-f043-4bd1-a45d-09d4dfa31226 + a=ice-ufrag:6sFv a=ice-pwd:cOTZKZNVlO9RSGsEGM63JXT2 a=fingerprint:sha-256 6B:8B:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35: DC:B8:5F:64:1A:24:C2:43:F0:A1:58:D0:A1:2C:19:08 a=setup:active - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-rsize a=candidate:1 1 udp 2113929471 203.0.113.200 10200 typ host a=end-of-candidates - m=video 10200 UDP/TLS/RTP/SAVPF 100 101 + m=video 10200 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 203.0.113.200 a=mid:v1 a=sendrecv a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=msid:61317484-2ed4-49d7-9eb7-1414322a7aae
4ea4d4a1-2fda-4511-a9cc-1b32c2e59552 7.2. Detailed Example This section shows a more involved example of a session between two JSEP endpoints. Trickle ICE is used in full trickle mode, with a @@ -3607,32 +3697,34 @@ m=audio 9 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 0.0.0.0 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:57017fee-b6c1-4162-929c-a25110252400 e83006c5-a0ff-4e0a-9ed9-d3e6747be7d9 a=ice-ufrag:ATEn a=ice-pwd:AtSK0WpNtpUjkY4+86js7ZQl a=fingerprint:sha-256 29:E2:1C:3B:4B:9F:81:E6:B8:5C:F4:A5:A8:D8:73:04: BB:05:2F:70:9F:04:A9:0E:05:E9:26:33:E8:70:88:A2 a=setup:actpass - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize m=application 0 UDP/DTLS/SCTP webrtc-datachannel c=IN IP4 0.0.0.0 a=mid:d1 a=sctp-port:5000 a=max-message-size:65536 a=bundle-only @@ -3670,32 +3763,34 @@ m=audio 9 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 0.0.0.0 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:71317484-2ed4-49d7-9eb7-1414322a7aae 6a7b57b8-f043-4bd1-a45d-09d4dfa31226 a=ice-ufrag:7sFv a=ice-pwd:dOTZKZNVlO9RSGsEGM63JXT2 a=fingerprint:sha-256 7B:8B:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35: DC:B8:5F:64:1A:24:C2:43:F0:A1:58:D0:A1:2C:19:08 a=setup:active - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize m=application 9 UDP/DTLS/SCTP webrtc-datachannel c=IN IP4 0.0.0.0 a=mid:d1 a=sctp-port:5000 a=max-message-size:65536 @@ -3738,76 +3834,88 @@ m=audio 12200 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 192.0.2.200 a=mid:a1 a=sendrecv 
a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:71317484-2ed4-49d7-9eb7-1414322a7aae 6a7b57b8-f043-4bd1-a45d-09d4dfa31226 a=ice-ufrag:7sFv a=ice-pwd:dOTZKZNVlO9RSGsEGM63JXT2 a=fingerprint:sha-256 7B:8B:F0:65:5F:78:E2:51:3B:AC:6F:F3:3F:46:1B:35: DC:B8:5F:64:1A:24:C2:43:F0:A1:58:D0:A1:2C:19:08 a=setup:actpass - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize a=candidate:1 1 udp 2113929471 203.0.113.200 10200 typ host a=candidate:1 1 udp 1845494015 198.51.100.200 11200 typ srflx raddr 203.0.113.200 rport 10200 a=candidate:1 1 udp 255 192.0.2.200 12200 typ relay raddr 198.51.100.200 rport 11200 a=end-of-candidates m=application 12200 UDP/DTLS/SCTP webrtc-datachannel c=IN IP4 192.0.2.200 a=mid:d1 a=sctp-port:5000 a=max-message-size:65536 - m=video 12200 UDP/TLS/RTP/SAVPF 100 101 102 + m=video 12200 UDP/TLS/RTP/SAVPF 100 101 102 103 104 c=IN IP4 192.0.2.200 a=mid:v1 a=sendrecv a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 - a=rtpmap:102 flexfec/90000 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 + a=rtpmap:104 flexfec/90000 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=msid:71317484-2ed4-49d7-9eb7-1414322a7aae 5ea4d4a1-2fda-4511-a9cc-1b32c2e59552 a=rid:1 send a=rid:2 send a=rid:3 send a=simulcast:send 1;2;3 - m=video 12200 UDP/TLS/RTP/SAVPF 100 101 102 + m=video 12200 UDP/TLS/RTP/SAVPF 100 101 102 103 104 c=IN IP4 192.0.2.200 a=mid:v2 a=sendrecv a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 - a=rtpmap:102 flexfec/90000
+ a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 + a=rtpmap:104 flexfec/90000 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=msid:81317484-2ed4-49d7-9eb7-1414322a7aae 6ea4d4a1-2fda-4511-a9cc-1b32c2e59552 The SDP for |answer-B2| is shown below. In addition to the acceptance of the video m= sections, the use of a=recvonly to indicate one-way video, and the use of a=imageattr to limit the received resolution, note the use of setup:passive to maintain the @@ -3823,70 +3931,82 @@ m=audio 12100 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 192.0.2.100 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:57017fee-b6c1-4162-929c-a25110252400 e83006c5-a0ff-4e0a-9ed9-d3e6747be7d9 a=ice-ufrag:ATEn a=ice-pwd:AtSK0WpNtpUjkY4+86js7ZQl a=fingerprint:sha-256 29:E2:1C:3B:4B:9F:81:E6:B8:5C:F4:A5:A8:D8:73:04: BB:05:2F:70:9F:04:A9:0E:05:E9:26:33:E8:70:88:A2 a=setup:passive - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize a=candidate:1 1 udp 2113929471 203.0.113.100 10100 typ host a=candidate:1 1 udp 1845494015 198.51.100.100 11100 typ srflx raddr 203.0.113.100 rport 10100 a=candidate:1 1 udp 255 192.0.2.100 12100 typ relay raddr 198.51.100.100 rport 11100 a=end-of-candidates m=application 12100 UDP/DTLS/SCTP webrtc-datachannel c=IN IP4 192.0.2.100 a=mid:d1 a=sctp-port:5000 a=max-message-size:65536 - m=video 12100 UDP/TLS/RTP/SAVPF 100 101 + m=video 12100 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 192.0.2.100 a=mid:v1 a=recvonly a=rtpmap:100 VP8/90000 - a=rtpmap:101 
rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=imageattr:100 recv [x=[48:1920],y=[48:1080],q=1.0] a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli - m=video 12100 UDP/TLS/RTP/SAVPF 100 101 + m=video 12100 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 192.0.2.100 a=mid:v2 a=recvonly a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=imageattr:100 recv [x=[48:1920],y=[48:1080],q=1.0] a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli 7.3. Early Transport Warmup Example This example demonstrates the early warmup technique described in Section 4.1.8.1. Here, Alice's endpoint sends an offer to Bob's endpoint to start an audio/video call. 
Bob immediately responds with an answer that accepts the audio/video m= sections, but marks them as @@ -3987,50 +4108,59 @@ m=audio 9 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 0.0.0.0 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:bbce3ba6-abfc-ac63-d00a-e15b286f8fce e80098db-7159-3c06-229a-00df2a9b3dbc + a=ice-ufrag:4ZcD a=ice-pwd:ZaaG6OG7tCn4J/lehAGz+HHD a=fingerprint:sha-256 C4:68:F8:77:6A:44:F1:98:6D:7C:9F:47:EB:E3:34:A4: 0A:AA:2D:49:08:28:70:2E:1F:AE:18:7D:4E:3E:66:BF a=setup:actpass - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize - m=video 0 UDP/TLS/RTP/SAVPF 100 101 + m=video 0 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 0.0.0.0 a=mid:v1 a=sendrecv a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=msid:bbce3ba6-abfc-ac63-d00a-e15b286f8fce ac701365-eb06-42df-cc93-7f22bc308789 a=bundle-only + |offer-C1-candidate-1| looks like: ufrag 4ZcD index 0 mid a1 attr candidate:1 1 udp 255 192.0.2.100 12100 typ relay raddr 0.0.0.0 rport 0 The SDP for |answer-C1| looks like: @@ -4044,44 +4174,51 @@ m=audio 9 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 0.0.0.0 a=mid:a1 a=sendonly a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 
urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:751f239e-4ae0-c549-aa3d-890de772998b 04b5a445-82cc-c9e8-9ffe-c24d0ef4b0ff a=ice-ufrag:TpaA a=ice-pwd:t2Ouhc67y8JcCaYZxUUTgKw/ a=fingerprint:sha-256 A2:F3:A5:6D:4C:8C:1E:B2:62:10:4A:F6:70:61:C4:FC: 3C:E0:01:D6:F3:24:80:74:DA:7C:3E:50:18:7B:CE:4D a=setup:active - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize - m=video 9 UDP/TLS/RTP/SAVPF 100 101 + m=video 9 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 0.0.0.0 a=mid:v1 a=sendonly a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=msid:751f239e-4ae0-c549-aa3d-890de772998b 39292672-c102-d075-f580-5826f31ca958 |answer-C1-candidate-1| looks like: ufrag TpaA index 0 @@ -4101,46 +4238,54 @@ m=audio 12200 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 192.0.2.200 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:751f239e-4ae0-c549-aa3d-890de772998b 04b5a445-82cc-c9e8-9ffe-c24d0ef4b0ff a=ice-ufrag:TpaA a=ice-pwd:t2Ouhc67y8JcCaYZxUUTgKw/ a=fingerprint:sha-256 A2:F3:A5:6D:4C:8C:1E:B2:62:10:4A:F6:70:61:C4:FC: 3C:E0:01:D6:F3:24:80:74:DA:7C:3E:50:18:7B:CE:4D a=setup:actpass - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize a=candidate:1 1 udp 255 192.0.2.200 12200 typ relay raddr 0.0.0.0 rport 0 a=end-of-candidates - m=video 12200 UDP/TLS/RTP/SAVPF 100 101 + + m=video 12200 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 192.0.2.200 a=mid:v1 
a=sendrecv a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=msid:751f239e-4ae0-c549-aa3d-890de772998b 39292672-c102-d075-f580-5826f31ca958 The SDP for |answer-C2| looks like: v=0 o=- 1070771854436052752 2 IN IP4 0.0.0.0 @@ -4152,47 +4297,54 @@ m=audio 12100 UDP/TLS/RTP/SAVPF 96 0 8 97 98 c=IN IP4 192.0.2.100 a=mid:a1 a=sendrecv a=rtpmap:96 opus/48000/2 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:97 telephone-event/8000 a=rtpmap:98 telephone-event/48000 + a=fmtp:97 0-15 + a=fmtp:98 0-15 a=maxptime:120 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid a=extmap:2 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=msid:bbce3ba6-abfc-ac63-d00a-e15b286f8fce e80098db-7159-3c06-229a-00df2a9b3dbc a=ice-ufrag:4ZcD a=ice-pwd:ZaaG6OG7tCn4J/lehAGz+HHD a=fingerprint:sha-256 C4:68:F8:77:6A:44:F1:98:6D:7C:9F:47:EB:E3:34:A4: 0A:AA:2D:49:08:28:70:2E:1F:AE:18:7D:4E:3E:66:BF a=setup:passive - a=dtls-id:1 + a=tls-id:1 a=rtcp-mux a=rtcp-mux-only a=rtcp-rsize a=candidate:1 1 udp 255 192.0.2.100 12100 typ relay raddr 0.0.0.0 rport 0 a=end-of-candidates - m=video 12100 UDP/TLS/RTP/SAVPF 100 101 + m=video 12100 UDP/TLS/RTP/SAVPF 100 101 102 103 c=IN IP4 192.0.2.100 a=mid:v1 a=sendrecv a=rtpmap:100 VP8/90000 - a=rtpmap:101 rtx/90000 - a=fmtp:101 apt=100 + a=rtpmap:101 H264/90000 + a=fmtp:101 packetization-mode=1;profile-level-id=42e01f + a=rtpmap:102 rtx/90000 + a=fmtp:102 apt=100 + a=rtpmap:103 rtx/90000 + a=fmtp:103 apt=101 a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid + a=extmap:3 urn:ietf:params:rtp-hdrext:sdes:rtp-stream-id a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli 
a=msid:bbce3ba6-abfc-ac63-d00a-e15b286f8fce ac701365-eb06-42df-cc93-7f22bc308789 8. Security Considerations The IETF has published separate documents [I-D.ietf-rtcweb-security-arch] [I-D.ietf-rtcweb-security] describing @@ -4233,307 +4384,317 @@ Adam Bergkvist, Dan Burnett, Ben Campbell, Alissa Cooper, Richard Ejzak, Stefan Hakansson, Ted Hardie, Christer Holmberg, Andrew Hutton, Randell Jesup, Matthew Kaufman, Anant Narayanan, Adam Roach, Neil Stratford, Martin Thomson, Sean Turner, and Magnus Westerlund all provided valuable feedback on this proposal. 11. References 11.1. Normative References - [I-D.ietf-avtcore-rtp-multi-stream] - Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, - "Sending Multiple RTP Streams in a Single RTP Session", - draft-ietf-avtcore-rtp-multi-stream-11 (work in progress), - December 2015. - [I-D.ietf-avtext-rid] Roach, A., Nandakumar, S., and P. Thatcher, "RTP Stream - Identifier (RID) Source Description (SDES)", draft-ietf- - avtext-rid-00 (work in progress), February 2016. + Identifier Source Description (SDES)", draft-ietf-avtext- + rid-09 (work in progress), October 2016. [I-D.ietf-ice-trickle] Ivov, E., Rescorla, E., Uberti, J., and P. Saint-Andre, "Trickle ICE: Incremental Provisioning of Candidates for the Interactive Connectivity Establishment (ICE) - Protocol". - - [I-D.ietf-mmusic-4572-update] - Holmberg, C., "Updates to RFC 4572", draft-ietf-mmusic- - 4572-update-05 (work in progress), June 2016. + Protocol", draft-ietf-ice-trickle-12 (work in progress), + June 2017. [I-D.ietf-mmusic-dtls-sdp] Holmberg, C. and R. Shpount, "Using the SDP Offer/Answer - Mechanism for DTLS", draft-ietf-mmusic-dtls-sdp-14 (work - in progress), July 2016. + Mechanism for DTLS", draft-ietf-mmusic-dtls-sdp-26 (work + in progress), June 2017. [I-D.ietf-mmusic-msid] - Alvestrand, H., "Cross Session Stream Identification in - the Session Description Protocol", draft-ietf-mmusic- - msid-01 (work in progress), August 2013. 
+ Alvestrand, H., "WebRTC MediaStream Identification in the + Session Description Protocol", draft-ietf-mmusic-msid-16 + (work in progress), February 2017. [I-D.ietf-mmusic-mux-exclusive] Holmberg, C., "Indicating Exclusive Support of RTP/RTCP Multiplexing using SDP", draft-ietf-mmusic-mux- - exclusive-08 (work in progress), June 2016. + exclusive-12 (work in progress), May 2017. [I-D.ietf-mmusic-rid] Thatcher, P., Zanaty, M., Nandakumar, S., Burman, B., Roach, A., and B. Campen, "RTP Payload Format - Constraints", draft-ietf-mmusic-rid-04 (work in progress), - February 2016. + Restrictions", draft-ietf-mmusic-rid-10 (work in + progress), March 2017. [I-D.ietf-mmusic-sctp-sdp] - Loreto, S. and G. Camarillo, "Stream Control Transmission - Protocol (SCTP)-Based Media Transport in the Session - Description Protocol (SDP)", draft-ietf-mmusic-sctp-sdp-04 - (work in progress), June 2013. + Holmberg, C., Shpount, R., Loreto, S., and G. Camarillo, + "Session Description Protocol (SDP) Offer/Answer + Procedures For Stream Control Transmission Protocol (SCTP) + over Datagram Transport Layer Security (DTLS) Transport.", + draft-ietf-mmusic-sctp-sdp-26 (work in progress), April + 2017. [I-D.ietf-mmusic-sdp-bundle-negotiation] Holmberg, C., Alvestrand, H., and C. Jennings, - "Multiplexing Negotiation Using Session Description - Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- - bundle-negotiation-04 (work in progress), June 2013. + "Negotiating Media Multiplexing Using the Session + Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle- + negotiation-38 (work in progress), April 2017. [I-D.ietf-mmusic-sdp-mux-attributes] Nandakumar, S., "A Framework for SDP Attributes when - Multiplexing", draft-ietf-mmusic-sdp-mux-attributes-01 - (work in progress), February 2014. + Multiplexing", draft-ietf-mmusic-sdp-mux-attributes-16 + (work in progress), December 2016. [I-D.ietf-mmusic-sdp-simulcast] Burman, B., Westerlund, M., Nandakumar, S., and M. 
Zanaty, "Using Simulcast in SDP and RTP Sessions", draft-ietf- - mmusic-sdp-simulcast-04 (work in progress), February 2016. - - [I-D.ietf-rtcweb-audio] - Valin, J. and C. Bran, "WebRTC Audio Codec and Processing - Requirements", draft-ietf-rtcweb-audio-02 (work in - progress), August 2013. + mmusic-sdp-simulcast-08 (work in progress), March 2017. [I-D.ietf-rtcweb-fec] Uberti, J., "WebRTC Forward Error Correction - Requirements", draft-ietf-rtcweb-fec-00 (work in - progress), February 2015. + Requirements", draft-ietf-rtcweb-fec-05 (work in + progress), May 2017. [I-D.ietf-rtcweb-rtp-usage] Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time Communication (WebRTC): Media Transport and Use of RTP", - draft-ietf-rtcweb-rtp-usage-09 (work in progress), - September 2013. + draft-ietf-rtcweb-rtp-usage-26 (work in progress), March + 2016. [I-D.ietf-rtcweb-security] Rescorla, E., "Security Considerations for WebRTC", draft- - ietf-rtcweb-security-06 (work in progress), January 2014. + ietf-rtcweb-security-08 (work in progress), February 2015. [I-D.ietf-rtcweb-security-arch] Rescorla, E., "WebRTC Security Architecture", draft-ietf- - rtcweb-security-arch-09 (work in progress), February 2014. - - [I-D.ietf-rtcweb-video] - Roach, A., "WebRTC Video Processing and Codec - Requirements", draft-ietf-rtcweb-video-00 (work in - progress), July 2014. + rtcweb-security-arch-12 (work in progress), June 2016. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate - Requirement Levels", BCP 14, RFC 2119, March 1997. + Requirement Levels", BCP 14, RFC 2119, + DOI 10.17487/RFC2119, March 1997, + . [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, - June 2002. + DOI 10.17487/RFC3261, June 2002, + . [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model - with Session Description Protocol (SDP)", RFC 3264, June - 2002. 
+ with Session Description Protocol (SDP)", RFC 3264, + DOI 10.17487/RFC3264, June 2002, + . [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC - Text on Security Considerations", BCP 72, RFC 3552, July - 2003. + Text on Security Considerations", BCP 72, RFC 3552, + DOI 10.17487/RFC3552, July 2003, + . [RFC3605] Huitema, C., "Real Time Control Protocol (RTCP) attribute - in Session Description Protocol (SDP)", RFC 3605, October - 2003. + in Session Description Protocol (SDP)", RFC 3605, + DOI 10.17487/RFC3605, October 2003, + . + + [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. + Norrman, "The Secure Real-time Transport Protocol (SRTP)", + RFC 3711, DOI 10.17487/RFC3711, March 2004, + . [RFC3890] Westerlund, M., "A Transport Independent Bandwidth - Modifier for the Session Description Protocol (SDP)", RFC - 3890, DOI 10.17487/RFC3890, September 2004, + Modifier for the Session Description Protocol (SDP)", + RFC 3890, DOI 10.17487/RFC3890, September 2004, . [RFC4145] Yon, D. and G. Camarillo, "TCP-Based Media Transport in the Session Description Protocol (SDP)", RFC 4145, - September 2005. + DOI 10.17487/RFC4145, September 2005, + . [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session - Description Protocol", RFC 4566, July 2006. - - [RFC4572] Lennox, J., "Connection-Oriented Media Transport over the - Transport Layer Security (TLS) Protocol in the Session - Description Protocol (SDP)", RFC 4572, July 2006. + Description Protocol", RFC 4566, DOI 10.17487/RFC4566, + July 2006, . [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, "Extended RTP Profile for Real-time Transport Control - Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July - 2006. + Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, + DOI 10.17487/RFC4585, July 2006, + . + + [RFC5124] Ott, J. and E. 
Carrara, "Extended Secure RTP Profile for + Real-time Transport Control Protocol (RTCP)-Based Feedback + (RTP/SAVPF)", RFC 5124, DOI 10.17487/RFC5124, February + 2008, . [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) - Traversal for Offer/Answer Protocols", RFC 5245, April - 2010. + Traversal for Offer/Answer Protocols", RFC 5245, + DOI 10.17487/RFC5245, April 2010, + . [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP - Header Extensions", RFC 5285, July 2008. + Header Extensions", RFC 5285, DOI 10.17487/RFC5285, July + 2008, . [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and - Control Packets on a Single Port", RFC 5761, April 2010. + Control Packets on a Single Port", RFC 5761, + DOI 10.17487/RFC5761, April 2010, + . [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description - Protocol (SDP) Grouping Framework", RFC 5888, June 2010. + Protocol (SDP) Grouping Framework", RFC 5888, + DOI 10.17487/RFC5888, June 2010, + . [RFC6236] Johansson, I. and K. Jung, "Negotiation of Generic Image - Attributes in the Session Description Protocol (SDP)", RFC - 6236, May 2011. + Attributes in the Session Description Protocol (SDP)", + RFC 6236, DOI 10.17487/RFC6236, May 2011, + . [RFC6347] Rescorla, E. and N. Modadugu, "Datagram Transport Layer - Security Version 1.2", RFC 6347, January 2012. + Security Version 1.2", RFC 6347, DOI 10.17487/RFC6347, + January 2012, . [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, DOI 10.17487/RFC6716, September 2012, . [RFC6904] Lennox, J., "Encryption of Header Extensions in the Secure - Real-time Transport Protocol (SRTP)", RFC 6904, April - 2013. + Real-time Transport Protocol (SRTP)", RFC 6904, + DOI 10.17487/RFC6904, April 2013, + . [RFC7160] Petit-Huguenin, M. and G. 
Zorn, Ed., "Support for Multiple - Clock Rates in an RTP Session", RFC 7160, DOI 10.17487/ - RFC7160, April 2014, + Clock Rates in an RTP Session", RFC 7160, + DOI 10.17487/RFC7160, April 2014, . [RFC7587] Spittka, J., Vos, K., and JM. Valin, "RTP Payload Format - for the Opus Speech and Audio Codec", RFC 7587, DOI - 10.17487/RFC7587, June 2015, + for the Opus Speech and Audio Codec", RFC 7587, + DOI 10.17487/RFC7587, June 2015, . + [RFC7742] Roach, A., "WebRTC Video Processing and Codec + Requirements", RFC 7742, DOI 10.17487/RFC7742, March 2016, + . + [RFC7850] Nandakumar, S., "Registering Values of the SDP 'proto' Field for Transporting RTP Media over TCP under Various RTP Profiles", RFC 7850, DOI 10.17487/RFC7850, April 2016, . - [RFC7941] Westerlund, M., Burman, B., Even, R., and M. Zanaty, "RTP - Header Extension for the RTP Control Protocol (RTCP) - Source Description Items", RFC 7941, DOI 10.17487/RFC7941, - August 2016, . + [RFC7874] Valin, JM. and C. Bran, "WebRTC Audio Codec and Processing + Requirements", RFC 7874, DOI 10.17487/RFC7874, May 2016, + . + + [RFC8108] Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, + "Sending Multiple RTP Streams in a Single RTP Session", + RFC 8108, DOI 10.17487/RFC8108, March 2017, + . + + [RFC8122] Lennox, J. and C. Holmberg, "Connection-Oriented Media + Transport over the Transport Layer Security (TLS) Protocol + in the Session Description Protocol (SDP)", RFC 8122, + DOI 10.17487/RFC8122, March 2017, + . 11.2. Informative References - [I-D.ietf-avtext-lrr] - Lennox, J., Hong, D., Uberti, J., Homer, S., and M. - Flodman, "The Layer Refresh Request (LRR) RTCP Feedback - Message", draft-ietf-avtext-lrr-03 (work in progress), - July 2016. + [I-D.ietf-mmusic-trickle-ice-sip] + Ivov, E., Stach, T., Marocco, E., and C. Holmberg, "A + Session Initiation Protocol (SIP) usage for Trickle ICE", + draft-ietf-mmusic-trickle-ice-sip-07 (work in progress), + March 2017. [I-D.ietf-rtcweb-ip-handling] Uberti, J. and G. 
Shieh, "WebRTC IP Address Handling - Recommendations", draft-ietf-rtcweb-ip-handling-01 (work - in progress), March 2016. + Requirements", draft-ietf-rtcweb-ip-handling-03 (work in + progress), January 2017. - [I-D.nandakumar-rtcweb-sdp] - Nandakumar, S. and C. Jennings, "SDP for the WebRTC", - draft-nandakumar-rtcweb-sdp-02 (work in progress), July - 2013. + [I-D.ietf-rtcweb-sdp] + Nandakumar, S. and C. Jennings, "Annotated Example SDP for + WebRTC", draft-ietf-rtcweb-sdp-06 (work in progress), + April 2017. [RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for - Comfort Noise (CN)", RFC 3389, September 2002. - - [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. - Jacobson, "RTP: A Transport Protocol for Real-Time - Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, - July 2003, . + Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389, + September 2002, . [RFC3556] Casner, S., "Session Description Protocol (SDP) Bandwidth - Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC - 3556, July 2003. - - [RFC3611] Friedman, T., Caceres, R., and A. Clark, "RTP Control - Protocol Extended Reports (RTCP XR)", RFC 3611, DOI - 10.17487/RFC3611, November 2003, - . + Modifiers for RTP Control Protocol (RTCP) Bandwidth", + RFC 3556, DOI 10.17487/RFC3556, July 2003, + . [RFC3960] Camarillo, G. and H. Schulzrinne, "Early Media and Ringing Tone Generation in the Session Initiation Protocol (SIP)", - RFC 3960, December 2004. + RFC 3960, DOI 10.17487/RFC3960, December 2004, + . [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session Description Protocol (SDP) Security Descriptions for Media - Streams", RFC 4568, July 2006. + Streams", RFC 4568, DOI 10.17487/RFC4568, July 2006, + . [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. Hakenberg, "RTP Retransmission Payload Format", RFC 4588, - July 2006. + DOI 10.17487/RFC4588, July 2006, + . [RFC4733] Schulzrinne, H. and T. 
Taylor, "RTP Payload for DTMF Digits, Telephony Tones, and Telephony Signals", RFC 4733, DOI 10.17487/RFC4733, December 2006, . - [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, - "Codec Control Messages in the RTP Audio-Visual Profile - with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, - February 2008, . - [RFC5506] Johansson, I. and M. Westerlund, "Support for Reduced-Size Real-Time Transport Control Protocol (RTCP): Opportunities - and Consequences", RFC 5506, April 2009. + and Consequences", RFC 5506, DOI 10.17487/RFC5506, April + 2009, . [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol - (SDP)", RFC 5576, June 2009. + (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009, + . [RFC5763] Fischl, J., Tschofenig, H., and E. Rescorla, "Framework for Establishing a Secure Real-time Transport Protocol (SRTP) Security Context Using Datagram Transport Layer - Security (DTLS)", RFC 5763, May 2010. + Security (DTLS)", RFC 5763, DOI 10.17487/RFC5763, May + 2010, . [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure - Real-time Transport Protocol (SRTP)", RFC 5764, May 2010. + Real-time Transport Protocol (SRTP)", RFC 5764, + DOI 10.17487/RFC5764, May 2010, + . [RFC6464] Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time Transport Protocol (RTP) Header Extension for Client-to- - Mixer Audio Level Indication", RFC 6464, DOI 10.17487/ - RFC6464, December 2011, + Mixer Audio Level Indication", RFC 6464, + DOI 10.17487/RFC6464, December 2011, . [RFC6544] Rosenberg, J., Keranen, A., Lowekamp, B., and A. Roach, "TCP Candidates with Interactive Connectivity Establishment (ICE)", RFC 6544, DOI 10.17487/RFC6544, March 2012, . - [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and - B. 
Burman, Ed., "A Taxonomy of Semantics and Mechanisms - for Real-Time Transport Protocol (RTP) Sources", RFC 7656, - DOI 10.17487/RFC7656, November 2015, - . - [TS26.114] 3GPP TS 26.114 V12.8.0, "3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; IP Multimedia Subsystem (IMS); Multimedia Telephony; Media handling and interaction (Release 12)", December 2014, . - [W3C.WD-webrtc-20140617] - Bergkvist, A., Burnett, D., Narayanan, A., and C. - Jennings, "WebRTC 1.0: Real-time Communication Between - Browsers", World Wide Web Consortium WD WD-webrtc- - 20140617, June 2014, - . + [W3C.webrtc] + Bergkvist, A., Burnett, D., Jennings, C., Narayanan, A., + Aboba, B., and T. Brandstetter, "WebRTC 1.0: Real-time + Communication Between Browsers", World Wide Web Consortium + WD WD-webrtc-20170515, May 2017, + . Appendix A. Appendix A For the syntax validation performed in Section 5.7, the following list of ABNF definitions is used: +------------------------+------------------------------------------+ | Attribute | Reference | +------------------------+------------------------------------------+ | ptime | [RFC4566] Section 9 | @@ -4542,47 +4703,88 @@ | recvonly | [RFC4566] Section 9 | | sendrecv | [RFC4566] Section 9 | | sendonly | [RFC4566] Section 9 | | inactive | [RFC4566] Section 9 | | framerate | [RFC4566] Section 9 | | fmtp | [RFC4566] Section 9 | | quality | [RFC4566] Section 9 | | rtcp | [RFC3605] Section 2.1 | | setup | [RFC4145] Sections 3, 4, and 5 | | connection | [RFC4145] Sections 3, 4, and 5 | - | fingerprint | [RFC4572] Section 5 | + | fingerprint | [RFC8122] Section 5 | | rtcp-fb | [RFC4585] Section 4.2 | | candidate | [RFC5245] Section 15.1 | | remote-candidates | [RFC5245] Section 15.2 | | ice-lite | [RFC5245] Section 15.3 | | ice-ufrag | [RFC5245] Section 15.4 | | ice-pwd | [RFC5245] Section 15.4 | | ice-options | [RFC5245] Section 15.5 | | extmap | [RFC5285] Section 7 | - | mid | [RFC5888] Section 4 and 5 | - | 
group | [RFC5888] Section 4 and 5 | + | mid | [RFC5888] Sections 4 and 5 | + | group | [RFC5888] Sections 4 and 5 | | imageattr | [RFC6236] Section 3.1 | | extmap (encrypt | [RFC6904] Section 4 | | option) | | | msid | [I-D.ietf-mmusic-msid] Section 2 | | rid | [I-D.ietf-mmusic-rid] Section 10 | | simulcast | [I-D.ietf-mmusic-sdp-simulcast] Section | | | 6.1 | - | dtls-id | [I-D.ietf-mmusic-dtls-sdp] Section 4 | + | tls-id | [I-D.ietf-mmusic-dtls-sdp] Section 4 | +------------------------+------------------------------------------+ Table 1: SDP ABNF References Appendix B. Change log Note: This section will be removed by RFC Editor before publication. + Changes in draft-21: + + o Change dtls-id to tls-id to match MMUSIC draft. + + o Replace regular expression for proto field with a list and clarify + that the answer must exactly match the offer. + + o Remove text about how to error check on setLocal because local + descriptions cannot be changed. + + o Rework silence suppression support to always require that both + sides agree to silence suppression or none is used. + + o Remove instructions to parse "a=ssrc-group". + + o Allow the addition of new codecs in answers and in subsequent + offers. + + o Clarify imageattr processing. Replace use of [x=0,y=0] with + direction indicators. + + o Document when early media can occur. + + o Fix ICE default port handling when bundle-only is used. + + o Forbid duplicating IDENTICAL/TRANSPORT attributes when you are + bundling. + + o Clarify the number of components to gather when bundle is + involved. + + o Explicitly state that PTs and SSRCs are to be used for demuxing. + + o Update guidance on "a=setup" line. This should now match the + MMUSIC draft. + + o Update guidance on certificate/digest matching to conform to + RFC8122. + + o Update examples. + Changes in draft-20: o Remove Appendix-B. Changes in draft-19: o Examples are now machine-generated for correctness, and use IETF- approved example IP addresses. 
o Add early transport warmup example, and add missing attributes to