draft-ietf-rtcweb-rtp-usage-07.txt | draft-ietf-rtcweb-rtp-usage-08.txt | |||
---|---|---|---|---|
RTCWEB Working Group C. S. Perkins | RTCWEB Working Group C. S. Perkins | |||
Internet-Draft University of Glasgow | Internet-Draft University of Glasgow | |||
Intended status: Standards Track M. Westerlund | Intended status: Standards Track M. Westerlund | |||
Expires: January 16, 2014 Ericsson | Expires: March 05, 2014 Ericsson | |||
J. Ott | J. Ott | |||
Aalto University | Aalto University | |||
July 15, 2013 | September 01, 2013 | |||
Web Real-Time Communication (WebRTC): Media Transport and Use of RTP | Web Real-Time Communication (WebRTC): Media Transport and Use of RTP | |||
draft-ietf-rtcweb-rtp-usage-07 | draft-ietf-rtcweb-rtp-usage-08 | |||
Abstract | Abstract | |||
The Web Real-Time Communication (WebRTC) framework provides support | The Web Real-Time Communication (WebRTC) framework provides support | |||
for direct interactive rich communication using audio, video, text, | for direct interactive rich communication using audio, video, text, | |||
collaboration, games, etc. between two peers' web-browsers. This | collaboration, games, etc. between two peers' web-browsers. This | |||
memo describes the media transport aspects of the WebRTC framework. | memo describes the media transport aspects of the WebRTC framework. | |||
It specifies how the Real-time Transport Protocol (RTP) is used in | It specifies how the Real-time Transport Protocol (RTP) is used in | |||
the WebRTC context, and gives requirements for which RTP features, | the WebRTC context, and gives requirements for which RTP features, | |||
profiles, and extensions need to be supported. | profiles, and extensions need to be supported. | |||
skipping to change at page 1, line 39 | skipping to change at page 1, line 39 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 16, 2014. | This Internet-Draft will expire on March 05, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
2. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
4. WebRTC Use of RTP: Core Protocols . . . . . . . . . . . . . . 5 | 4. WebRTC Use of RTP: Core Protocols . . . . . . . . . . . . . . 5 | |||
4.1. RTP and RTCP . . . . . . . . . . . . . . . . . . . . . . 6 | 4.1. RTP and RTCP . . . . . . . . . . . . . . . . . . . . . . 5 | |||
4.2. Choice of the RTP Profile . . . . . . . . . . . . . . . . 7 | 4.2. Choice of the RTP Profile . . . . . . . . . . . . . . . . 6 | |||
4.3. Choice of RTP Payload Formats . . . . . . . . . . . . . . 7 | 4.3. Choice of RTP Payload Formats . . . . . . . . . . . . . . 7 | |||
4.4. Use of RTP Sessions . . . . . . . . . . . . . . . . . . . 9 | 4.4. Use of RTP Sessions . . . . . . . . . . . . . . . . . . . 8 | |||
4.5. RTP and RTCP Multiplexing . . . . . . . . . . . . . . . . 9 | 4.5. RTP and RTCP Multiplexing . . . . . . . . . . . . . . . . 9 | |||
4.6. Reduced Size RTCP . . . . . . . . . . . . . . . . . . . . 10 | 4.6. Reduced Size RTCP . . . . . . . . . . . . . . . . . . . . 10 | |||
4.7. Symmetric RTP/RTCP . . . . . . . . . . . . . . . . . . . 10 | 4.7. Symmetric RTP/RTCP . . . . . . . . . . . . . . . . . . . 10 | |||
4.8. Choice of RTP Synchronisation Source (SSRC) . . . . . . . 10 | 4.8. Choice of RTP Synchronisation Source (SSRC) . . . . . . . 10 | |||
4.9. Generation of the RTCP Canonical Name (CNAME) . . . . . . 11 | 4.9. Generation of the RTCP Canonical Name (CNAME) . . . . . . 11 | |||
5. WebRTC Use of RTP: Extensions . . . . . . . . . . . . . . . . 12 | 5. WebRTC Use of RTP: Extensions . . . . . . . . . . . . . . . . 11 | |||
5.1. Conferencing Extensions . . . . . . . . . . . . . . . . . 12 | 5.1. Conferencing Extensions . . . . . . . . . . . . . . . . . 12 | |||
5.1.1. Full Intra Request (FIR) . . . . . . . . . . . . . . 13 | 5.1.1. Full Intra Request (FIR) . . . . . . . . . . . . . . 13 | |||
5.1.2. Picture Loss Indication (PLI) . . . . . . . . . . . . 13 | 5.1.2. Picture Loss Indication (PLI) . . . . . . . . . . . . 13 | |||
5.1.3. Slice Loss Indication (SLI) . . . . . . . . . . . . . 13 | 5.1.3. Slice Loss Indication (SLI) . . . . . . . . . . . . . 13 | |||
5.1.4. Reference Picture Selection Indication (RPSI) . . . . 13 | 5.1.4. Reference Picture Selection Indication (RPSI) . . . . 13 | |||
5.1.5. Temporal-Spatial Trade-off Request (TSTR) . . . . . . 14 | 5.1.5. Temporal-Spatial Trade-off Request (TSTR) . . . . . . 14 | |||
5.1.6. Temporary Maximum Media Stream Bit Rate Request | 5.1.6. Temporary Maximum Media Stream Bit Rate Request | |||
(TMMBR) . . . . . . . . . . . . . . . . . . . . . . . 14 | (TMMBR) . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
5.2. Header Extensions . . . . . . . . . . . . . . . . . . . . 14 | 5.2. Header Extensions . . . . . . . . . . . . . . . . . . . . 14 | |||
5.2.1. Rapid Synchronisation . . . . . . . . . . . . . . . . 15 | 5.2.1. Rapid Synchronisation . . . . . . . . . . . . . . . . 14 | |||
5.2.2. Client-to-Mixer Audio Level . . . . . . . . . . . . . 15 | 5.2.2. Client-to-Mixer Audio Level . . . . . . . . . . . . . 15 | |||
5.2.3. Mixer-to-Client Audio Level . . . . . . . . . . . . . 15 | 5.2.3. Mixer-to-Client Audio Level . . . . . . . . . . . . . 15 | |||
5.2.4. Associating RTP Media Streams and Signalling Contexts 15 | ||||
6. WebRTC Use of RTP: Improving Transport Robustness . . . . . . 16 | 6. WebRTC Use of RTP: Improving Transport Robustness . . . . . . 16 | |||
6.1. Negative Acknowledgements and RTP Retransmission . . . . 16 | 6.1. Negative Acknowledgements and RTP Retransmission . . . . 16 | |||
6.2. Forward Error Correction (FEC) . . . . . . . . . . . . . 17 | 6.2. Forward Error Correction (FEC) . . . . . . . . . . . . . 17 | |||
7. WebRTC Use of RTP: Rate Control and Media Adaptation . . . . 17 | 7. WebRTC Use of RTP: Rate Control and Media Adaptation . . . . 17 | |||
7.1. Boundary Conditions and Circuit Breakers . . . . . . . . 18 | 7.1. Boundary Conditions and Circuit Breakers . . . . . . . . 18 | |||
7.2. RTCP Limitations for Congestion Control . . . . . . . . . 19 | 7.2. RTCP Limitations for Congestion Control . . . . . . . . . 19 | |||
7.3. Congestion Control Interoperability and Legacy Systems . 19 | 7.3. Congestion Control Interoperability and Legacy Systems . 19 | |||
8. WebRTC Use of RTP: Performance Monitoring . . . . . . . . . . 20 | 8. WebRTC Use of RTP: Performance Monitoring . . . . . . . . . . 20 | |||
9. WebRTC Use of RTP: Future Extensions . . . . . . . . . . . . 21 | 9. WebRTC Use of RTP: Future Extensions . . . . . . . . . . . . 21 | |||
10. Signalling Considerations . . . . . . . . . . . . . . . . . . 21 | 10. Signalling Considerations . . . . . . . . . . . . . . . . . . 21 | |||
11. WebRTC API Considerations . . . . . . . . . . . . . . . . . . 23 | 11. WebRTC API Considerations . . . . . . . . . . . . . . . . . . 23 | |||
12. RTP Implementation Considerations . . . . . . . . . . . . . . 23 | 12. RTP Implementation Considerations . . . . . . . . . . . . . . 23 | |||
12.1. RTP Sessions and PeerConnections . . . . . . . . . . . . 24 | 12.1. Configuration and Use of RTP Sessions . . . . . . . . . 24 | |||
12.2. Multiple Sources . . . . . . . . . . . . . . . . . . . . 25 | 12.1.1. Use of Multiple Media Flows Within an RTP Session . 24 | |||
12.3. Multiparty . . . . . . . . . . . . . . . . . . . . . . . 25 | 12.1.2. Use of Multiple RTP Sessions . . . . . . . . . . . . 25 | |||
12.4. SSRC Collision Detection . . . . . . . . . . . . . . . . 27 | 12.1.3. Differentiated Treatment of Flows . . . . . . . . . 30 | |||
12.5. Contributing Sources and the CSRC List . . . . . . . . . 28 | 12.2. Source, Flow, and Participant Identification . . . . . . 31 | |||
12.6. Media Synchronization . . . . . . . . . . . . . . . . . 28 | 12.2.1. Media Streams . . . . . . . . . . . . . . . . . . . 31 | |||
12.7. Multiple RTP End-points . . . . . . . . . . . . . . . . 29 | 12.2.2. Media Streams: SSRC Collision Detection . . . . . . 32 | |||
12.8. Simulcast . . . . . . . . . . . . . . . . . . . . . . . 30 | 12.2.3. Media Synchronisation Context . . . . . . . . . . . 33 | |||
12.9. Differentiated Treatment of Flows . . . . . . . . . . . 30 | 12.2.4. Correlation of Media Streams . . . . . . . . . . . . 34 | |||
13. Security Considerations . . . . . . . . . . . . . . . . . . . 32 | 13. Security Considerations . . . . . . . . . . . . . . . . . . . 34 | |||
14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 33 | 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 | |||
15. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 33 | 15. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 33 | 16. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 35 | |||
17. References . . . . . . . . . . . . . . . . . . . . . . . . . 33 | 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
17.1. Normative References . . . . . . . . . . . . . . . . . . 34 | 17.1. Normative References . . . . . . . . . . . . . . . . . . 35 | |||
17.2. Informative References . . . . . . . . . . . . . . . . . 37 | 17.2. Informative References . . . . . . . . . . . . . . . . . 38 | |||
Appendix A. Supported RTP Topologies . . . . . . . . . . . . . . 38 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
A.1. Point to Point . . . . . . . . . . . . . . . . . . . . . 39 | ||||
A.2. Multi-Unicast (Mesh) . . . . . . . . . . . . . . . . . . 41 | ||||
A.3. Mixer Based . . . . . . . . . . . . . . . . . . . . . . . 44 | ||||
A.3.1. Media Mixing . . . . . . . . . . . . . . . . . . . . 45 | ||||
A.3.2. Media Switching . . . . . . . . . . . . . . . . . . . 47 | ||||
A.3.3. Media Projecting . . . . . . . . . . . . . . . . . . 50 | ||||
A.4. Translator Based . . . . . . . . . . . . . . . . . . . . 52 | ||||
A.4.1. Transcoder . . . . . . . . . . . . . . . . . . . . . 52 | ||||
A.4.2. Gateway / Protocol Translator . . . . . . . . . . . . 53 | ||||
A.4.3. Relay . . . . . . . . . . . . . . . . . . . . . . . . 55 | ||||
A.5. End-point Forwarding . . . . . . . . . . . . . . . . . . 58 | ||||
A.6. Simulcast . . . . . . . . . . . . . . . . . . . . . . . . 60 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 60 | ||||
1. Introduction | 1. Introduction | |||
The Real-time Transport Protocol (RTP) [RFC3550] provides a framework | The Real-time Transport Protocol (RTP) [RFC3550] provides a framework | |||
for delivery of audio and video teleconferencing data and other real- | for delivery of audio and video teleconferencing data and other real- | |||
time media applications. Previous work has defined the RTP protocol, | time media applications. Previous work has defined the RTP protocol, | |||
along with numerous profiles, payload formats, and other extensions. | along with numerous profiles, payload formats, and other extensions. | |||
When combined with appropriate signalling, these form the basis for | When combined with appropriate signalling, these form the basis for | |||
many teleconferencing systems. | many teleconferencing systems. | |||
The Web Real-Time Communication (WebRTC) framework provides the | The Web Real-Time Communication (WebRTC) framework provides the | |||
protocol building blocks to support direct, interactive, real-time | protocol building blocks to support direct, interactive, real-time | |||
communication using audio, video, collaboration, games, etc., between | communication using audio, video, collaboration, games, etc., between | |||
two peers' web-browsers. This memo describes how the RTP framework | two peers' web-browsers. This memo describes how the RTP framework | |||
is to be used in the WebRTC context. It proposes a baseline set of | is to be used in the WebRTC context. It proposes a baseline set of | |||
RTP features that are to be implemented by all WebRTC-aware end- | RTP features that are to be implemented by all WebRTC-aware end- | |||
points, along with suggested extensions for enhanced functionality. | points, along with suggested extensions for enhanced functionality. | |||
The WebRTC overview [I-D.ietf-rtcweb-overview] outlines the complete | This memo specifies a protocol intended for use within the WebRTC | |||
WebRTC framework, of which this memo is a part. | framework, but is not restricted to that context. An overview of the | |||
WebRTC framework is given in [I-D.ietf-rtcweb-overview]. | ||||
The structure of this memo is as follows. Section 2 outlines our | The structure of this memo is as follows. Section 2 outlines our | |||
rationale in preparing this memo and choosing these RTP features. | rationale in preparing this memo and choosing these RTP features. | |||
Section 3 defines terminology. Requirements for core RTP protocols | Section 3 defines terminology. Requirements for core RTP protocols | |||
are described in Section 4 and suggested RTP extensions are described | are described in Section 4 and suggested RTP extensions are described | |||
in Section 5. Section 6 outlines mechanisms that can increase | in Section 5. Section 6 outlines mechanisms that can increase | |||
robustness to network problems, while Section 7 describes congestion | robustness to network problems, while Section 7 describes congestion | |||
control and rate adaptation mechanisms. The discussion of mandated | control and rate adaptation mechanisms. The discussion of mandated | |||
RTP mechanisms concludes in Section 8 with a review of performance | RTP mechanisms concludes in Section 8 with a review of performance | |||
monitoring and network management tools that can be used in the | monitoring and network management tools that can be used in the | |||
skipping to change at page 5, line 43 | skipping to change at page 5, line 26 | |||
directly in RTP and RTCP packets, or as a contributing source | directly in RTP and RTCP packets, or as a contributing source | |||
(CSRC) in RTP packets from a mixer. The RTP Session scope is | (CSRC) in RTP packets from a mixer. The RTP Session scope is | |||
hence decided by the endpoints' network interconnection topology, | hence decided by the endpoints' network interconnection topology, | |||
in combination with RTP and RTCP forwarding strategies deployed by | in combination with RTP and RTCP forwarding strategies deployed by | |||
endpoints and any interconnecting middle nodes. | endpoints and any interconnecting middle nodes. | |||
WebRTC MediaStream: The MediaStream concept defined by the W3C in | WebRTC MediaStream: The MediaStream concept defined by the W3C in | |||
the API. | the API. | |||
Other terms are used according to their definitions from the RTP | Other terms are used according to their definitions from the RTP | |||
Specification [RFC3550] and WebRTC overview | Specification [RFC3550]. | |||
[I-D.ietf-rtcweb-overview] documents. | ||||
4. WebRTC Use of RTP: Core Protocols | 4. WebRTC Use of RTP: Core Protocols | |||
The following sections describe the core features of RTP and RTCP | The following sections describe the core features of RTP and RTCP | |||
that need to be implemented, along with the mandated RTP profiles and | that need to be implemented, along with the mandated RTP profiles and | |||
payload formats. Also described are the core extensions providing | payload formats. Also described are the core extensions providing | |||
essential features that all WebRTC implementations need to implement | essential features that all WebRTC implementations need to implement | |||
to function effectively on today's networks. | to function effectively on today's networks. | |||
4.1. RTP and RTCP | 4.1. RTP and RTCP | |||
skipping to change at page 6, line 24 | skipping to change at page 6, line 12 | |||
functionality implementations of RTP, but are REQUIRED in all WebRTC | functionality implementations of RTP, but are REQUIRED in all WebRTC | |||
implementations: | implementations: | |||
o Support for use of multiple simultaneous SSRC values in a single | o Support for use of multiple simultaneous SSRC values in a single | |||
RTP session, including support for RTP end-points that send many | RTP session, including support for RTP end-points that send many | |||
SSRC values simultaneously, following [RFC3550] and | SSRC values simultaneously, following [RFC3550] and | |||
[I-D.ietf-avtcore-rtp-multi-stream]. Support for the RTCP | [I-D.ietf-avtcore-rtp-multi-stream]. Support for the RTCP | |||
optimisations for multi-SSRC sessions defined in | optimisations for multi-SSRC sessions defined in | |||
[I-D.ietf-avtcore-rtp-multi-stream-optimisation] is RECOMMENDED. | [I-D.ietf-avtcore-rtp-multi-stream-optimisation] is RECOMMENDED. | |||
* (tbd: is draft-westerlund-mmusic-max-ssrc-01 needed?) | * (tbd: do endpoints need to signal the maximum number of SSRCs | |||
that they support (e.g., draft-westerlund-mmusic-max-ssrc-01) | ||||
and/or some constraint on the maximum number of simultaneous | ||||
streams of various kinds that can be decoded?) | ||||
o Random choice of SSRC on joining a session; collision detection | o Random choice of SSRC on joining a session; collision detection | |||
and resolution for SSRC values (see also Section 4.8). | and resolution for SSRC values (see also Section 4.8). | |||
o Support for reception of RTP data packets containing CSRC lists, | o Support for reception of RTP data packets containing CSRC lists, | |||
as generated by RTP mixers, and RTCP packets relating to CSRCs. | as generated by RTP mixers, and RTCP packets relating to CSRCs. | |||
o Support for sending correct synchronization information in the | o Support for sending correct synchronization information in the | |||
RTCP Sender Reports, to allow a receiver to implement lip-sync, | RTCP Sender Reports, to allow a receiver to implement lip-sync, | |||
with RECOMMENDED support for the rapid RTP synchronisation | with RECOMMENDED support for the rapid RTP synchronisation | |||
skipping to change at page 8, line 7 | skipping to change at page 7, line 47 | |||
Implementations MUST support DTLS-SRTP [RFC5764] for key-management. | Implementations MUST support DTLS-SRTP [RFC5764] for key-management. | |||
Other key management schemes MAY be supported. | Other key management schemes MAY be supported. | |||
4.3. Choice of RTP Payload Formats | 4.3. Choice of RTP Payload Formats | |||
The set of mandatory-to-implement codecs and RTP payload formats for | The set of mandatory-to-implement codecs and RTP payload formats for | |||
WebRTC is not specified in this memo. Implementations can support | WebRTC is not specified in this memo. Implementations can support | |||
any codec for which an RTP payload format and associated signalling | any codec for which an RTP payload format and associated signalling | |||
is defined. Implementations cannot assume that the other participants | is defined. Implementations cannot assume that the other participants | |||
in an RTP session understand any RTP payload format, no matter how | in an RTP session understand any RTP payload format, no matter how | |||
common; support for all RTP payload formats MUST be negotiated before | common; the mapping between RTP payload type numbers and specific | |||
they are used. | configurations of particular RTP payload formats MUST be agreed | |||
before those payload types/formats can be used. In an SDP context, | ||||
this can be done using the "a=rtpmap:" and "a=fmtp:" attributes | ||||
associated with an "m=" line. | ||||
Endpoints can signal support for multiple RTP payload formats, or | Endpoints can signal support for multiple RTP payload formats, or | |||
multiple configurations of a single RTP payload format, as long as | multiple configurations of a single RTP payload format, as long as | |||
each unique RTP payload format configuration uses a different RTP | each unique RTP payload format configuration uses a different RTP | |||
payload type number. As outlined in Section 4.8, the RTP payload | payload type number. As outlined in Section 4.8, the RTP payload | |||
type number is sometimes used to associate an RTP media stream with a | type number is sometimes used to associate an RTP media stream with a | |||
signalling context. This association is possible provided unique RTP | signalling context. This association is possible provided unique RTP | |||
payload type numbers are used in each context. For example, an RTP | payload type numbers are used in each context. For example, an RTP | |||
media stream can be associated with an SDP "m=" line by comparing the | media stream can be associated with an SDP "m=" line by comparing the | |||
RTP payload type numbers used by the media stream with payload types | RTP payload type numbers used by the media stream with payload types | |||
skipping to change at page 9, line 24 | skipping to change at page 9, line 16 | |||
this way, separating each session using different transport-layer | this way, separating each session using different transport-layer | |||
addresses (e.g., different UDP ports) for compatibility with legacy | addresses (e.g., different UDP ports) for compatibility with legacy | |||
systems. | systems. | |||
In modern-day networks, however, with the widespread use of network | In modern-day networks, however, with the widespread use of network | |||
address/port translators (NAT/NAPT) and firewalls, it is desirable to | address/port translators (NAT/NAPT) and firewalls, it is desirable to | |||
reduce the number of transport-layer flows used by RTP applications. | reduce the number of transport-layer flows used by RTP applications. | |||
This can be done by sending all the RTP media streams in a single RTP | This can be done by sending all the RTP media streams in a single RTP | |||
session, which will comprise a single transport-layer flow (this will | session, which will comprise a single transport-layer flow (this will | |||
prevent the use of some quality-of-service mechanisms, as discussed | prevent the use of some quality-of-service mechanisms, as discussed | |||
in Section 12.9). Implementations are REQUIRED to support transport | in Section 12.1.3). Implementations are REQUIRED to support | |||
of all RTP media streams, independent of media type, in a single RTP | transport of all RTP media streams, independent of media type, in a | |||
session according to [I-D.ietf-avtcore-multi-media-rtp-session]. If | single RTP session according to | |||
such RTP session set-up is to be used, this MUST be negotiated during | [I-D.ietf-avtcore-multi-media-rtp-session]. If multiple types of | |||
the signalling phase [I-D.ietf-mmusic-sdp-bundle-negotiation]. | media are to be used in a single RTP session, all participants in | |||
that session MUST agree to this usage. In an SDP context, | ||||
[I-D.ietf-mmusic-sdp-bundle-negotiation] can be used to signal this. | ||||
It is also possible to use a shim-based approach to run multiple RTP | It is also possible to use a shim-based approach to run multiple RTP | |||
sessions on a single transport-layer flow. This gives advantages in | sessions on a single transport-layer flow. This gives advantages in | |||
some gateway scenarios, and makes it easy to distinguish groups of | some gateway scenarios, and makes it easy to distinguish groups of | |||
RTP media streams that might need distinct processing. One way of | RTP media streams that might need distinct processing. One way of | |||
doing this is described in | doing this is described in | |||
[I-D.westerlund-avtcore-transport-multiplexing]. At the time of this | [I-D.westerlund-avtcore-transport-multiplexing]. At the time of this | |||
writing, there is no consensus to use a shim-based approach in WebRTC | writing, there is no consensus to use a shim-based approach in WebRTC | |||
implementations. | implementations. | |||
skipping to change at page 12, line 23 | skipping to change at page 12, line 18 | |||
5.1. Conferencing Extensions | 5.1. Conferencing Extensions | |||
RTP is inherently a group communication protocol. Groups can be | RTP is inherently a group communication protocol. Groups can be | |||
implemented using a centralised server, multi-unicast, or using IP | implemented using a centralised server, multi-unicast, or using IP | |||
multicast. While IP multicast is popular in IPTV systems, overlay- | multicast. While IP multicast is popular in IPTV systems, overlay- | |||
based topologies dominate in interactive conferencing environments. | based topologies dominate in interactive conferencing environments. | |||
Such overlay-based topologies typically use one or more central | Such overlay-based topologies typically use one or more central | |||
servers to connect end-points in a star or flat tree topology. These | servers to connect end-points in a star or flat tree topology. These | |||
central servers can be implemented in a number of ways as discussed | central servers can be implemented in a number of ways as discussed | |||
in Appendix A, and in the memo on RTP Topologies | in the memo on RTP Topologies | |||
[I-D.westerlund-avtcore-rtp-topologies-update]. | [I-D.ietf-avtcore-rtp-topologies-update]. | |||
Not all of the possible overlay-based topologies are suitable for | Not all of the possible overlay-based topologies are suitable for | |||
use in the WebRTC environment. Specifically: | use in the WebRTC environment. Specifically: | |||
o The use of video switching MCUs makes the use of RTCP for | o The use of video switching MCUs makes the use of RTCP for | |||
congestion control and quality of service reports problematic (see | congestion control and quality of service reports problematic (see | |||
Section 3.7 of [I-D.westerlund-avtcore-rtp-topologies-update]). | Section 3.6.2 of [I-D.ietf-avtcore-rtp-topologies-update]). | |||
o The use of content modifying MCUs with RTCP termination breaks RTP | o The use of content modifying MCUs with RTCP termination breaks RTP | |||
loop detection, and prevents receivers from identifying active | loop detection, and prevents receivers from identifying active | |||
senders (see Section 3.8 of | senders (see Section 3.8 of | |||
[I-D.westerlund-avtcore-rtp-topologies-update]). | [I-D.ietf-avtcore-rtp-topologies-update]). | |||
o RTP Transport Translators (Topo-Translator) are not of immediate | ||||
interest to WebRTC, although the main difference compared to point | ||||
to point is the possibility of seeing multiple different transport | ||||
paths in any RTCP feedback. | ||||
Accordingly, only Point to Point (Topo-Point-to-Point), Multiple | Accordingly, only Point to Point (Topo-Point-to-Point), Multiple | |||
concurrent Point to Point (Mesh) and RTP Mixers (Topo-Mixer) | concurrent Point to Point (Mesh) and RTP Mixers (Topo-Mixer) | |||
topologies are needed to achieve the use-cases to be supported in | topologies are needed to achieve the use-cases to be supported in | |||
WebRTC initially. These RECOMMENDED topologies are expected to be | WebRTC initially. These RECOMMENDED topologies are expected to be | |||
supported by all WebRTC end-points (these topologies require no | supported by all WebRTC end-points (these topologies require no | |||
special RTP-layer support in the end-point if the RTP features | special RTP-layer support in the end-point if the RTP features | |||
mandated in this memo are implemented). | mandated in this memo are implemented). | |||
The RTP extensions described in Section 5.1.1 to Section 5.1.6 are | The RTP extensions described in Section 5.1.1 to Section 5.1.6 are | |||
skipping to change at page 16, line 5 | skipping to change at page 15, line 42 | |||
the client with the audio level of the different sources mixed into a | the client with the audio level of the different sources mixed into a | |||
common mix by an RTP mixer. This enables a user interface to indicate | common mix by an RTP mixer. This enables a user interface to indicate | |||
the relative activity level of each session participant, rather than | the relative activity level of each session participant, rather than | |||
just being included or not based on the CSRC field. This is a pure | just being included or not based on the CSRC field. This is a pure | |||
optimisation of non-critical functions, and is hence OPTIONAL to | optimisation of non-critical functions, and is hence OPTIONAL to | |||
implement. If it is implemented, it is REQUIRED that the header | implement. If it is implemented, it is REQUIRED that the header | |||
extensions are encrypted according to | extensions are encrypted according to | |||
[I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | [I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information | |||
contained in these header extensions can be considered sensitive. | contained in these header extensions can be considered sensitive. | |||
5.2.4. Associating RTP Media Streams and Signalling Contexts | ||||
(tbd: it seems likely that we need a mechanism to associate RTP media | ||||
streams with signalling contexts. The mechanism by which this is | ||||
done will likely be some combination of an RTP header extension, | ||||
periodic transmission of a new RTCP SDES item, and some signalling | ||||
extension. The semantics of those items are not yet settled; see | ||||
draft-westerlund-avtext-rtcp-sdes-srcname, draft-ietf-mmusic-msid, | ||||
and draft-even-mmusic-application-token for discussion). | ||||
6. WebRTC Use of RTP: Improving Transport Robustness | 6. WebRTC Use of RTP: Improving Transport Robustness | |||
There are tools that can make RTP media streams robust against packet | There are tools that can make RTP media streams robust against packet | |||
loss and reduce the impact of loss on media quality. However, they | loss and reduce the impact of loss on media quality. However, they | |||
all add extra bits compared to a non-robust stream. The overhead of | all add extra bits compared to a non-robust stream. The overhead of | |||
these extra bits needs to be considered, and the aggregate bit-rate | these extra bits needs to be considered, and the aggregate bit-rate | |||
MUST be rate controlled to avoid causing network congestion (see | MUST be rate controlled to avoid causing network congestion (see | |||
Section 7). As a result, improving robustness might require a lower | Section 7). As a result, improving robustness might require a lower | |||
base encoding quality, but has the potential to deliver that quality | base encoding quality, but has the potential to deliver that quality | |||
with fewer errors. The mechanisms described in the following sub- | with fewer errors. The mechanisms described in the following sub- | |||
skipping to change at page 24, line 5 | skipping to change at page 24, line 5 | |||
(tbd: It is an open question whether these considerations are best | (tbd: It is an open question whether these considerations are best | |||
discussed in this draft, in the W3C WebRTC API spec, or elsewhere. | discussed in this draft, in the W3C WebRTC API spec, or elsewhere. | |||
12. RTP Implementation Considerations | 12. RTP Implementation Considerations | |||
The following discussion provides some guidance on the implementation | The following discussion provides some guidance on the implementation | |||
of the RTP features described in this memo. The focus is on a WebRTC | of the RTP features described in this memo. The focus is on a WebRTC | |||
end-point implementation perspective, and while some mention is made | end-point implementation perspective, and while some mention is made | |||
of the behaviour of middleboxes, that is not the focus of this memo. | of the behaviour of middleboxes, that is not the focus of this memo. | |||
12.1. RTP Sessions and PeerConnections | 12.1. Configuration and Use of RTP Sessions | |||
An RTP session is an association among RTP nodes, which have a single | ||||
shared SSRC space. An RTP session can include a large number of end- | ||||
points and nodes, each sourcing, sinking, manipulating, or reporting | ||||
on the RTP media streams being sent within the RTP session. | ||||
A PeerConnection is a point-to-point association between an end-point | ||||
and some other peer node. That peer node can be either an end-point | ||||
or a centralized processing node of some type. Hence, an RTP session | ||||
can terminate immediately at the far end of a PeerConnection, or it | ||||
might continue as further discussed below for multiparty sessions | ||||
(Section 12.3) and sessions with multiple end points (Section 12.7). | ||||
A PeerConnection can contain one or more RTP sessions, depending on | ||||
how it is set up, and how many UDP flows it uses. A common usage has | ||||
been to have one RTP session per media type, e.g. one for audio and | ||||
one for video, each sent over a different UDP flow. However, the | ||||
default usage in WebRTC will be to use one RTP session for all media | ||||
types, with RTP and RTCP multiplexing (Section 4.5) also mandated. | ||||
This RTP session then uses only one UDP flow. However, for legacy | ||||
interworking and flow-based network prioritization (Section 12.9), a | ||||
WebRTC end-point needs to support a mode of operation where one RTP | ||||
session per media type is used. Currently, each RTP session has to | ||||
use its own UDP flow in this case, however it might be possible to | ||||
multiplex several RTP sessions over a single UDP flow, see | ||||
Section 4.4. | ||||
The multi-unicast- or mesh-based multi-party topology (Figure 1) is a | ||||
good example for this section as it concerns the relation between RTP | ||||
sessions and PeerConnections. In this topology, each participant | ||||
sends individual unicast RTP/UDP/IP flows to each of the other | ||||
participants using independent PeerConnections in a full mesh. This | ||||
topology has the benefit of not requiring central nodes. The | ||||
downside is that it increases the used bandwidth at each sender by | ||||
requiring one copy of the RTP media streams for each participant that | ||||
is part of the same session beyond the sender itself. Hence, this | ||||
topology is limited to scenarios with few participants unless the | ||||
media is very low bandwidth. | ||||
+---+ +---+ | ||||
| A |<---->| B | | ||||
+---+ +---+ | ||||
^ ^ | ||||
\ / | ||||
\ / | ||||
v v | ||||
+---+ | ||||
| C | | ||||
+---+ | ||||
Figure 1: Multi-unicast | ||||
The multi-unicast topology could be implemented as a single RTP | ||||
session, spanning multiple peer-to-peer transport layer connections, | ||||
or as several pairwise RTP sessions, one between each pair of peers. | ||||
To maintain a coherent mapping between RTP | ||||
sessions and PeerConnections, we recommend implementing this as | ||||
individual RTP sessions. The only downside is that end-point A will | ||||
not learn of the quality of any transmission happening between B and | ||||
C based on RTCP. This has not been seen as a significant downside as | ||||
no one has yet seen a clear need for A to know about | ||||
B's and C's communication. An advantage of using separate RTP | ||||
sessions is that it enables using different media bit-rates to the | ||||
different peers, thus not forcing B to endure the same quality | ||||
reductions if there are limitations in the transport from A to C as C | ||||
will. | ||||
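As a rough illustration of the bandwidth cost of the multi-unicast topology discussed above, the following sketch computes the uplink load at one sender and the number of pairwise connections in a full mesh. The function names and the example bit-rate are illustrative assumptions and are not taken from this memo.

```python
def mesh_uplink_kbps(participants: int, stream_kbps: float) -> float:
    """Uplink bit-rate needed by one sender in a full mesh.

    Each sender transmits one copy of its media to every other
    participant, so the cost grows linearly with the group size.
    """
    if participants < 2:
        return 0.0
    return (participants - 1) * stream_kbps


def mesh_connection_count(participants: int) -> int:
    """Number of pairwise connections (one per pair of peers)."""
    if participants < 2:
        return 0
    return participants * (participants - 1) // 2


# Hypothetical example: the three-party mesh of Figure 1,
# with each participant sending a 500 kbit/s stream.
uplink = mesh_uplink_kbps(3, 500.0)
pairs = mesh_connection_count(3)
```

With three participants this gives two outgoing copies per sender and three pairwise connections (A-B, A-C, B-C), matching Figure 1.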
12.2. Multiple Sources | ||||
A WebRTC end-point might have multiple cameras, microphones or audio | ||||
inputs and thus a single end-point can source multiple RTP media | ||||
streams of the same media type concurrently. Even if an end-point | ||||
does not have multiple media sources of the same media type it has to | ||||
support transmission using multiple SSRCs concurrently in the same | ||||
RTP session. This is due to the requirement on a WebRTC end-point | ||||
to support multiple media types in one RTP session. For example, one | ||||
audio and one video source can result in the end-point sending with | ||||
two different SSRCs in the same RTP session. As multi-party | ||||
conferences are supported, as discussed below in Section 12.3, a | ||||
WebRTC end-point will need to be capable of receiving, decoding and | ||||
playing out multiple RTP media streams of the same type concurrently. | ||||
tbd: there needs to be a way of indicating how RTP streams relate when | ||||
there are multiple sources, possibly with simulcast or layered | ||||
coding, and different types of mixer or other middlebox. It is | ||||
possible that the various BUNDLE/Plan-X proposals will solve this, | ||||
but it might also need RTP-level stream identification. To be | ||||
resolved once the outcome of the BUNDLE and plan-X discussions is | ||||
known. | ||||
tbd: Is any mechanism needed to signal limitations in the number of | ||||
active SSRCs that an end-point can handle? | ||||
12.3. Multiparty | ||||
There are numerous situations and clear use cases for WebRTC | ||||
supporting multi-party RTP sessions. This can be realized | ||||
in a number of ways using a number of different implementation | ||||
strategies. In the following, the focus is on the different set of | ||||
WebRTC end-point requirements that arise from different sets of | ||||
multi-party topologies. | ||||
The multi-unicast mesh (Figure 1)-based multi-party topology | ||||
discussed above provides a non-centralized solution but can incur a | ||||
heavy tax on the end-points' outgoing paths. It can also consume | ||||
a large amount of encoding resources if each outgoing stream is | ||||
specifically encoded. If an encoding is transmitted to multiple | ||||
parties, as in some implementations of the mesh case, the end-point | ||||
needs to be able to create RTP media streams suitable for the | ||||
requirements of multiple destinations. These requirements can depend | ||||
both on the transport path and on the different end-points' | ||||
preferences related to play out of the media. | ||||
+---+ +------------+ +---+ | A WebRTC end-point will be a simultaneous participant in one or more | |||
| A |<---->| |<---->| B | | RTP sessions. Each RTP session can convey multiple media flows, and | |||
+---+ | | +---+ | can include media data from multiple end-points. In the following, | |||
| Mixer | | we outline some ways in which WebRTC end-points can configure and use | |||
+---+ | | +---+ | RTP sessions. | |||
| C |<---->| |<---->| D | | ||||
+---+ +------------+ +---+ | ||||
Figure 2: RTP Mixer with Only Unicast Paths | 12.1.1. Use of Multiple Media Flows Within an RTP Session | |||
A Mixer (Figure 2) is an RTP end-point that optimizes the | RTP is a group communication protocol, and in a WebRTC context every | |||
transmission of RTP media streams from certain perspectives, either | RTP session can potentially contain multiple media flows. There are | |||
by only sending some of the received RTP media stream to any given | several reasons why this might be desirable: | |||
receiver or by providing a combined RTP media stream out of a set of | ||||
contributing streams. There are various methods of implementation as | ||||
discussed in Appendix A.3. A common aspect is that these central | ||||
nodes can use a number of tools to control the media encoding | ||||
provided by a WebRTC end-point. This includes functions like | ||||
requesting that the encoder break the encoding chain and produce a | ||||
so-called Intra frame. Another is limiting the bit-rate of a given | ||||
stream to better suit the mixer's view of the multiple down-streams. | ||||
Others are controlling the most suitable frame-rate, picture | ||||
resolution, and the trade-off between frame-rate and spatial quality. | ||||
A mixer gets a significant responsibility to correctly perform | Multiple media types: Outside of WebRTC, it is common to use one RTP | |||
congestion control, source identification, manage synchronization | session for each type of media (e.g., one RTP session for audio | |||
while providing the application with suitable media optimizations. | and one for video, each sent on a different UDP port). However, | |||
to reduce the number of UDP ports used, the default in WebRTC is | ||||
to send all types of media in a single RTP session, as described | ||||
in Section 4.4, using RTP and RTCP multiplexing (Section 4.5) to | ||||
further reduce the number of UDP ports needed. This RTP session | ||||
then uses only one UDP flow, but will contain multiple RTP media | ||||
streams, each containing a different type of media. A common | ||||
example might be an end-point with a camera and microphone that | ||||
sends two RTP streams, one video and one audio, into a single RTP | ||||
session. | ||||
Mixers also need to be trusted nodes when it comes to security, as they | Multiple Capture Devices: A WebRTC end-point might have multiple | |||
manipulate either RTP or the media itself before sending it on | cameras, microphones, or other media capture devices, and so might | |||
towards the end-point(s), thus they need to be able to decrypt and | want to generate several RTP media streams of the same media type. | |||
then encrypt it before sending it out. | Alternatively, it might want to send media from a single capture | |||
device in several different formats or quality settings at once. | ||||
Both can result in a single end-point sending multiple RTP media | ||||
streams of the same media type into a single RTP session at the | ||||
same time. | ||||
12.4. SSRC Collision Detection | Associated Repair Data: An end-point might send a media stream that | |||
is somehow associated with another stream. For example, it might | ||||
send an RTP stream that contains FEC or retransmission data | ||||
relating to another stream. Some RTP payload formats send this | ||||
sort of associated repair data as part of the original media | ||||
stream, while others send it as a separate stream. | ||||
The RTP standard [RFC3550] requires any RTP implementation to have | Layered or Multiple Description Coding: An end-point can use a | |||
support for detecting and handling SSRC collisions, i.e., resolve the | layered media codec, for example H.264 SVC, or a multiple | |||
conflict when two different end-points use the same SSRC value. This | description codec, that generates multiple media flows, each with | |||
requirement also applies to WebRTC end-points. There are several | a distinct RTP SSRC, within a single RTP session. | |||
scenarios where SSRC collisions can occur. | ||||
In a point-to-point session where each SSRC is associated with either | RTP Mixers, Translators, and Other Middleboxes: An RTP session, in | |||
of the two end-points and where the main media carrying SSRC | the WebRTC context, is a point-to-point association between an | |||
identifier will be announced in the signalling channel, a collision | end-point and some other peer device, where those devices share a | |||
is less likely to occur due to the information about used SSRCs | common SSRC space. The peer device might be another WebRTC end- | |||
provided by Source-Specific SDP Attributes [RFC5576]. Still if both | point, or it might be an RTP mixer, translator, or some other form | |||
end-points start using a new SSRC identifier prior to having | of media processing middlebox. In the latter cases, the middlebox | |||
signalled it to the peer and received acknowledgement on the | might send mixed or relayed RTP streams from several participants, | |||
signalling message, there can be collisions. The Source-Specific SDP | that the WebRTC end-point will need to render. Thus, even though | |||
Attributes [RFC5576] contains no mechanism to resolve SSRC collisions | a WebRTC end-point might only be a member of a single RTP session, | |||
or reject an end-point's usage of an SSRC. | the peer device might be extending that RTP session to incorporate | |||
other end-points. WebRTC is a group communication environment and | ||||
end-points need to be capable of receiving, decoding, and playing | ||||
out multiple RTP media streams at once, even in a single RTP | ||||
session. | ||||
There could also appear SSRC values that are not signalled. This is | (tbd: Is any mechanism needed to signal limitations in the number | |||
more likely than it appears as certain RTP functions need extra SSRCs | of active SSRCs that an end-point can handle?) | |||
to provide functionality related to another (the "main") SSRC, for | ||||
example, SSRC multiplexed RTP retransmission [RFC4588]. In those | ||||
cases, an end-point can create a new SSRC that strictly doesn't need | ||||
to be announced over the signalling channel to function correctly on | ||||
both RTP and PeerConnection level. | ||||
The more likely case for SSRC collision is that multiple end-points | (tbd: need to discuss signalling for the above here, preferably by | |||
in a multiparty conference create new sources and signal those | referring to a separate document that describes SDP use for WebRTC) | |||
towards the central server. In cases where the SSRC/CSRC are | ||||
propagated between the different end-points from the central node | ||||
collisions can occur. | ||||
Another scenario is when the central node manages to connect an end- | 12.1.2. Use of Multiple RTP Sessions | |||
point's PeerConnection to another PeerConnection the end-point | ||||
already has, thus forming a loop where the end-point will receive its | ||||
own traffic. While this is clearly considered a bug, it is important | ||||
that the end-point is able to recognise and handle the case when it | ||||
occurs. This case becomes even more problematic when media mixers, | ||||
and so on, are involved, where the stream received is a different | ||||
stream but still contains this client's input. | ||||
These SSRC/CSRC collisions can only be handled on RTP level as long | In addition to sending and receiving multiple media streams within a | |||
as the same RTP session is extended across multiple PeerConnections | single RTP session, a WebRTC end-point might participate in multiple | |||
by an RTP middlebox. To resolve the more generic case where multiple | RTP sessions. There are several reasons why a WebRTC end-point might | |||
PeerConnections are interconnected, identification of the media | choose to do this: | |||
source(s) that are part of a MediaStreamTrack being propagated across multiple | ||||
interconnected PeerConnections needs to be preserved across these | ||||
interconnections. | ||||
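The basic end-point reaction to an SSRC collision can be sketched as follows. This is a simplified illustration only: the full procedure in [RFC3550] also covers loop detection based on source transport addresses and the sending of an RTCP BYE for the abandoned SSRC, both of which are omitted here. The function names are hypothetical.

```python
import secrets


def pick_new_ssrc(in_use: set) -> int:
    """Draw random 32-bit identifiers until one is unused."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate


def resolve_collision(local_ssrc: int, remote_ssrcs: set) -> int:
    """Return the SSRC the local source should use from now on.

    If no remote source uses our SSRC, it is kept. Otherwise a new
    random SSRC is chosen that avoids every identifier seen so far.
    (Assumption: sending RTCP BYE for the old SSRC is done elsewhere.)
    """
    if local_ssrc not in remote_ssrcs:
        return local_ssrc
    return pick_new_ssrc(remote_ssrcs | {local_ssrc})
```

Choosing the replacement at random, rather than incrementing the old value, keeps the probability of a repeated collision low when several end-points react at the same time.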
12.5. Contributing Sources and the CSRC List | To interoperate with legacy devices: The common practice in the non- | |||
WebRTC world is to send different types of media in separate RTP | ||||
sessions, for example using one RTP session for audio and another | ||||
RTP session, on a different UDP port, for video. All WebRTC end- | ||||
points need to support the option of sending different types of | ||||
media on different RTP sessions, so they can interwork with such | ||||
legacy devices. This is discussed further in Section 4.4. | ||||
RTP allows a mixer, or other RTP-layer middlebox, to combine media | To provide enhanced quality of service: Some network-based quality | |||
flows from multiple sources to form a new media flow. The RTP data | of service mechanisms operate on the granularity of UDP 5-tuples. | |||
packets in that new flow will include a Contributing Source (CSRC) | If it is desired to use these mechanisms to provide differentiated | |||
list, indicating which original SSRCs contributed to the combined | quality of service for some RTP flows, then those RTP flows need | |||
packet. As described in Section 4.1, implementations need to support | to be sent in a separate RTP session using a different UDP port | |||
reception of RTP data packets containing a CSRC list and RTCP packets | number, and with appropriate quality of service marking. This is | |||
that relate to sources present in the CSRC list. | discussed further in Section 12.1.3. | |||
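To make the reception requirement for CSRC lists concrete, the sketch below extracts the CSRC list from a raw RTP data packet. It relies on the fixed header layout of [RFC3550]: the CSRC count (CC) is the low four bits of the first octet, and the CSRC identifiers follow the 12-octet fixed header as 32-bit big-endian values. The function name is an assumption for illustration, and header extensions and payload are not examined.

```python
import struct


def parse_csrc_list(packet: bytes) -> list:
    """Return the CSRC identifiers carried in an RTP data packet."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = packet[0] & 0x0F          # CC field, 0..15
    end = 12 + 4 * csrc_count              # CSRC list follows the SSRC
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    return list(struct.unpack("!%dI" % csrc_count, packet[12:end]))


# Hypothetical packet: V=2, CC=2, PT=96, seq=1, ts=0, one SSRC, two CSRCs.
example = (bytes([0x82, 96]) + struct.pack("!HII", 1, 0, 0xAABBCCDD)
           + struct.pack("!II", 0x11, 0x22))
contributors = parse_csrc_list(example)
```

Since the list can change on a packet-by-packet basis, an application tracking active participants would compare the result for each packet against the previous one.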
The CSRC list can change on a packet-by-packet basis, depending on | To separate media with different purposes: An end-point might want | |||
the mixing operation being performed. Knowledge of what sources | to send media streams that have different purposes on different | |||
contributed to a particular RTP packet can be important if the user | RTP sessions, to make it easy for the peer device to distinguish | |||
interface indicates which participants are active in the session. | them. For example, some centralised multiparty conferencing | |||
Changes in the CSRC list included in packets need to be exposed to | systems display the active speaker in high resolution, but show | |||
the WebRTC application using some API, if the application is to be | low resolution "thumbnails" of other participants. Such systems | |||
able to track changes in session participation. It is desirable to | might configure the end-points to send simulcast high- and low- | |||
map CSRC values back into WebRTC MediaStream identities as they cross | resolution versions of their video using separate RTP sessions, to | |||
this API, to avoid exposing the SSRC/CSRC name space to JavaScript | simplify the operation of the central mixer. In the WebRTC context | |||
applications. | this appears to be most easily accomplished by establishing | |||
multiple PeerConnections, all being fed the same set of WebRTC | ||||
MediaStreams. Each PeerConnection is then configured to deliver a | ||||
particular media quality and thus media bit-rate, and will produce | ||||
an independently encoded version with the codec parameters agreed | ||||
specifically in the context of that PeerConnection. The central | ||||
mixer can always distinguish packets corresponding to the low- and | ||||
high-resolution streams by inspecting their SSRC, RTP payload | ||||
type, or some other information contained in RTP header extensions | ||||
or RTCP packets, but it can be easier to distinguish the flows if | ||||
they arrive on separate RTP sessions on separate UDP ports. | ||||
If the mixer-to-client audio level extension [RFC6465] is being used | To directly connect with multiple peers: A multi-party conference | |||
in the session (see Section 5.2.3), the information in the CSRC list | does not need to use a central mixer. Rather, a multi-unicast | |||
is augmented by audio level information for each contributing source. | mesh can be created, comprising several distinct RTP sessions, | |||
This information can usefully be exposed in the user interface. | with each participant sending RTP traffic over a separate RTP | |||
session (that is, using an independent PeerConnection object) | ||||
to every other participant, as shown in Figure 1. This topology | ||||
has the benefit of not requiring a central mixer node that is | ||||
trusted to access and manipulate the media data. The downside is | ||||
that it increases the used bandwidth at each sender by requiring | ||||
one copy of the RTP media streams for each participant that is | ||||
part of the same session beyond the sender itself. | ||||
This memo does not require implementations to be able to add a CSRC | The multi-unicast topology could also be implemented as a single | |||
list to outgoing RTP packets. It is expected that any CSRC list | RTP session, spanning multiple peer-to-peer transport layer | |||
will be added by a mixer or other middlebox that performs in-network | connections, or as several pairwise RTP sessions, one between each | |||
processing of RTP streams. If there is a desire to allow end-system | pair of peers. To maintain a coherent mapping between the | |||
mixing, the requirement in Section 4.1 will need to be updated to | relation between RTP sessions and PeerConnection objects we | |||
support setting the CSRC list in outgoing RTP data packets. | recommend that this is implemented as several individual RTP | |||
sessions. The only downside is that end-point A will not learn of | ||||
the quality of any transmission happening between B and C, since | ||||
it will not see RTCP reports for the RTP session between B and C, | ||||
whereas it would if all three participants were part of a single | ||||
RTP session. Experience with the Mbone tools (experimental RTP- | ||||
based multicast conferencing tools from the late 1990s) has shown | ||||
that RTCP reception quality reports for third parties can usefully | ||||
be presented to the users in a way that helps them understand | ||||
asymmetric network problems, and the approach of using separate | ||||
RTP sessions prevents this. However, an advantage of using | ||||
separate RTP sessions is that it enables using different media | ||||
bit-rates and RTP session configurations between the different | ||||
peers, thus not forcing B to endure the same quality reductions if | ||||
there are limitations in the transport from A to C as C will. It | ||||
is believed that these advantages outweigh the limitations in | ||||
debugging power. | ||||
12.6. Media Synchronization | To indirectly connect with multiple peers: A common scenario in | |||
multi-party conferencing is to create indirect connections to | ||||
multiple peers, using an RTP mixer, translator, or some other type | ||||
of RTP middlebox. Figure 2 outlines a simple topology that might | ||||
be used in a four-person centralised conference. The middlebox | ||||
acts to optimise the transmission of RTP media streams from | ||||
certain perspectives, either by only sending some of the received | ||||
RTP media stream to any given receiver, or by providing a combined | ||||
RTP media stream out of a set of contributing streams. | ||||
When an end-point sends media from more than one media source, it | There are various methods of implementation for the middlebox. If | |||
needs to consider if (and which of) these media sources are to be | implemented as a standard RTP mixer or translator, a single RTP | |||
synchronized. In RTP/RTCP, synchronisation is provided by having a | session will extend across the middlebox and encompass all the | |||
set of RTP media streams be indicated as coming from the same | end-points in one multi-party session. Other types of middlebox | |||
synchronisation context and logical end-point by using the same CNAME | might use separate RTP sessions between each end-point and the | |||
identifier. | middlebox. A common aspect is that these central nodes can use a | |||
number of tools to control the media encoding provided by a WebRTC | ||||
end-point. This includes functions like requesting that the encoder | ||||
break the encoding chain and produce a so-called Intra | ||||
frame. Another is limiting the bit-rate of a given stream to | ||||
better suit the mixer view of the multiple down-streams. Others | ||||
are controlling the most suitable frame-rate, picture resolution, | ||||
the trade-off between frame-rate and spatial quality. The | ||||
middlebox gets the significant responsibility to correctly perform | ||||
congestion control, source identification, manage synchronization | ||||
while providing the application with suitable media optimizations. | ||||
The middlebox also has to be a trusted node when it comes to | ||||
security, since it manipulates either the RTP header or the media | ||||
itself (or both) received from one end-point, before sending it on | ||||
towards the end-point(s), thus they need to be able to decrypt and | ||||
then encrypt it before sending it out. | ||||
The next provision is that the internal clocks of all media sources, | RTP Mixers can create a situation where an end-point experiences a | |||
i.e., what drives the RTP timestamp, can be correlated to a system | situation in-between a session with only two end-points and | |||
clock that is provided in RTCP Sender Reports encoded in an NTP | multiple RTP sessions. Mixers are expected to not forward RTCP | |||
format. By correlating all RTP timestamps to a common system clock | reports regarding RTP media streams across themselves. This is | |||
for all sources, the timing relation of the different RTP media | due to the difference in the RTP media streams provided to the | |||
streams, also across multiple RTP sessions can be derived at the | different end-points. The original media source lacks information | |||
receiver and, if desired, the streams can be synchronized. The | about a mixer's manipulations prior to sending it to the different | |||
requirement is for the media sender to provide the correlation | receivers. In this scenario, an end-point's | |||
information; it is up to the receiver to use it or not. | feedback or requests go to the mixer. When the mixer can't act | |||
on this by itself, it is forced to go to the original media source | ||||
to fulfil the receiver's request. This will not necessarily be | ||||
explicitly visible in any RTP and RTCP traffic, but the interactions | ||||
and the time to complete them will indicate such dependencies. | ||||
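The media synchronisation correlation described in this part of the memo, between RTP timestamps and the NTP-format clock carried in RTCP Sender Reports, can be sketched as below. A receiver takes the (NTP time, RTP timestamp) pair from the most recent Sender Report and maps any RTP timestamp of that stream onto the sender's system clock. The function names are illustrative assumptions, and clock skew between successive Sender Reports is ignored.

```python
def ntp_to_seconds(ntp_msw: int, ntp_lsw: int) -> float:
    """Convert a 64-bit NTP timestamp (two 32-bit words) to seconds."""
    return ntp_msw + ntp_lsw / 2 ** 32


def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_ntp_seconds: float, clock_rate: int) -> float:
    """Map an RTP timestamp to the sender's system clock.

    sr_rtp_ts and sr_ntp_seconds come from the same Sender Report.
    The 32-bit difference is treated as signed so that timestamp
    wrap-around and packets slightly older than the report both work.
    """
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


# Hypothetical example: a 90 kHz video stream, one second after the report.
video_time = rtp_to_wallclock(1000 + 90000, 1000, 100.0, 90000)
```

Streams that share a CNAME can then be aligned by comparing the wallclock values computed for each of them.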
12.7. Multiple RTP End-points | Providing source authentication in multi-party scenarios is a | |||
challenge. In the mixer-based topologies, end-points source | ||||
authentication is based on, firstly, verifying that media comes | ||||
from the mixer by cryptographic verification and, secondly, trust | ||||
in the mixer to correctly identify any source towards the end- | ||||
point. In RTP sessions where multiple end-points are directly | ||||
visible to an end-point, all end-points will have knowledge about | ||||
each others' master keys, and can thus inject packets claimed to | ||||
come from another end-point in the session. Any node performing | ||||
relay can perform non-cryptographic mitigation by preventing | ||||
forwarding of packets that have SSRC fields that came from other | ||||
end-points before. For cryptographic verification of the source | ||||
SRTP would require additional security mechanisms, for example | ||||
TESLA for SRTP [RFC4383], that are not part of the base WebRTC | ||||
standards. | ||||
Some usages of RTP beyond the recommended topologies mean that a | To forward media between multiple peers: It might be desirable for | |||
WebRTC end-point sending media in an RTP session out over a single | an end-point that receives an RTP media stream to be able to | |||
PeerConnection will receive receiver reports from multiple RTP | forward that media stream to a third party. There are obvious | |||
receivers. Note that receiving multiple receiver reports is expected | security and privacy implications in this, but also potential | |||
because any RTP node that has multiple SSRCs has to report to the | uses. If it is to be allowed, there are two implementation | |||
media sender. The difference here is that they are multiple nodes, | strategies: either the browser can relay the flow at the RTP | |||
and thus will likely have different path characteristics. | layer, or it can transcode and forward the media at the application | |||
layer. | ||||
RTP Mixers can create a situation where an end-point experiences a | A relay approach will result in the RTP session being extended beyond | |||
situation in-between a session with only two end-points and multiple | the PeerConnection, making both the original end-point and the | |||
end-points. Mixers are expected to not forward RTCP reports | destination to which the media is forwarded part of the RTP | |||
regarding RTP media streams across themselves. This is due to the | session. These end-points can have different path | |||
difference in the RTP media streams provided to the different end- | characteristics, and hence different reception quality. Thus the | |||
points. The original media source lacks information about a mixer's | sender's congestion control needs to be capable of handling this. | |||
manipulations prior to sending it to the different receivers. In this | The security solution can either support a mechanism by which the sender | |||
scenario, an end-point's feedback or requests | informs both receivers of the key; alternatively the end-point | |||
go to the mixer. When the mixer can't act on this by itself, it is | that is forwarding the media needs to decrypt and then re-encrypt | |||
forced to go to the original media source to fulfil the receiver's | using a new key. The relay-based approach has the advantage that | |||
request. This will not necessarily be explicitly visible in any RTP and | the forwarding end-point does not need to transcode the media, | |||
RTCP traffic, but the interactions and the time to complete them will | thus maintaining the quality of the encoding and reducing the | |||
indicate such dependencies. | computational complexity requirements. If the right security | |||
solutions are supported then the end-point that receives the | ||||
forwarded media will be able to verify the authenticity of the | ||||
media coming from the original sender. A downside is that the | ||||
original sender is forced to take both receivers into | ||||
consideration when delivering content. | ||||
The topologies in which an end-point receives receiver reports from | The media transcoder approach is similar to having the forwarding | |||
multiple other end-points are the centralized relay, multicast and an | end-point act as Mixer, terminating the RTP session, combined with | |||
end-point forwarding an RTP media stream. Having multiple RTP nodes | a transcoder. The original sender will only see a single receiver | |||
receive an RTP flow and send reports and feedback about it has | of its media. The receiving end-point will be responsible for producing | |||
several impacts. As previously discussed (Section 12.3) any codec | an RTP media stream suitable for onwards transmission. This might | |||
control and rate control needs to be capable of merging the | require media transcoding for congestion control purposes to | |||
requirements and preferences to provide a single best encoding | produce a suitable bit-rate, thus losing media quality in the | |||
according to the situation RTP media stream. Specifically, when it | transcoding and forcing the forwarding end-point to spend | |||
comes to congestion control it needs to be capable of identifying the | resources on the transcoding. The media transcoding does result in | |||
different end-points to form independent congestion state information | a separation of the two different legs removing almost all | |||
for each different path. | dependencies, and allowing the forwarding end-point to optimize | |||
its media transcoding operation. It also allows forwarding | ||||
without the original sender being aware of the forwarding. The | ||||
cost is greatly increased computational complexity on the | ||||
forwarding node. | ||||
Providing source authentication in multi-party scenarios is a | (tbd: ought media forwarding be allowed?) | |||
challenge. In the mixer-based topologies, end-point source | ||||
authentication is based on, firstly, verifying that media comes from | ||||
the mixer by cryptographic verification and, secondly, trust in the | ||||
mixer to correctly identify any source towards the end-point. In RTP | ||||
sessions where multiple end-points are directly visible to an end- | ||||
point, all end-points will have knowledge about each other's master | ||||
keys, and can thus inject packets claimed to come from another end- | ||||
point in the session. Any node performing relay can perform non- | ||||
cryptographic mitigation by preventing forwarding of packets that | ||||
have SSRC fields that came from other end-points before. For | ||||
cryptographic verification of the source, SRTP would require | ||||
additional security mechanisms, like TESLA for SRTP [RFC4383]. | ||||
12.8. Simulcast | +---+ +---+ | |||
| A |<--->| B | | ||||
+---+ +---+ | ||||
^ ^ | ||||
\ / | ||||
\ / | ||||
v v | ||||
+---+ | ||||
| C | | ||||
+---+ | ||||
This section discusses simulcast in the meaning of providing a node, | Figure 1: Multi-unicast using several RTP sessions | |||
for example a Mixer, with multiple different encoded versions of the | ||||
same media source. In the WebRTC context, this could be accomplished | ||||
in two ways. One is to establish multiple PeerConnections, all being | ||||
fed the same set of WebRTC MediaStreams. Another method is to use | ||||
multiple WebRTC MediaStreams that are differently configured when it | ||||
comes to the media parameters. This would result in multiple | ||||
different RTP Media Streams (SSRCs) being in use with different | ||||
encodings based on the same media source (camera, microphone). | ||||
When intending to use simulcast it is important that this is made | +---+ +-------------+ +---+ | |||
explicit so that the end-points don't automatically try to optimize | | A |<---->| |<---->| B | | |||
away the different encodings and provide a single common version. | +---+ | RTP mixer, | +---+ | |||
Thus, some explicit indication that the intent really is to have | | translator, | | |||
different media encodings is likely needed. It is to be noted that | | or other | | |||
it might be a central node, rather than a WebRTC end-point that | +---+ | middlebox | +---+ | |||
would benefit from receiving simulcast media sources. | | C |<---->| |<---->| D | | |||
+---+ +-------------+ +---+ | ||||
tbd: How to perform simulcast needs to be determined and the | Figure 2: RTP mixer with only unicast paths | |||
appropriate API or signalling for its usage needs to be defined. | ||||
12.9. Differentiated Treatment of Flows | 12.1.3. Differentiated Treatment of Flows | |||
There are use cases for differentiated treatment of RTP media | There are use cases for differentiated treatment of RTP media | |||
streams. Such differentiation can happen at several places in the | streams. Such differentiation can happen at several places in the | |||
system. First of all is the prioritization within the end-point | system. First of all is the prioritization within the end-point | |||
sending the media, which controls both which RTP media streams | sending the media, which controls both which RTP media streams | |||
will be sent and their allocation of bit-rate out of the current | will be sent and their allocation of bit-rate out of the current | |||
available aggregate as determined by the congestion control. | available aggregate as determined by the congestion control. | |||
It is expected that the WebRTC API will allow the application to | It is expected that the WebRTC API will allow the application to | |||
indicate relative priorities for different MediaStreamTracks. These | indicate relative priorities for different MediaStreamTracks. These | |||
skipping to change at page 31, line 17 | skipping to change at page 30, line 37 | |||
streams and FEC. The importance of such associated RTP traffic flows | streams and FEC. The importance of such associated RTP traffic flows | |||
is dependent on the media type and codec used, with regard to how | is dependent on the media type and codec used, with regard to how | |||
robust that codec is to packet loss. However, a default policy might | robust that codec is to packet loss. However, a default policy might | |||
be to use the same priority for associated RTP flows as for the | be to use the same priority for associated RTP flows as for the | |||
primary RTP flow. | primary RTP flow. | |||
Secondly, the network can prioritize packet flows, including RTP | Secondly, the network can prioritize packet flows, including RTP | |||
media streams. Typically, differential treatment includes two steps, | media streams. Typically, differential treatment includes two steps, | |||
the first being identifying whether an IP packet belongs to a class | the first being identifying whether an IP packet belongs to a class | |||
that has to be treated differently, the second the actual mechanism | that has to be treated differently, the second the actual mechanism | |||
to prioritize packets. This is done according to three methods; | to prioritize packets. This is done according to three methods: | |||
DiffServ: The end-point marks a packet with a DiffServ code point to | DiffServ: The end-point marks a packet with a DiffServ code point to | |||
indicate to the network that the packet belongs to a particular | indicate to the network that the packet belongs to a particular | |||
class. | class. | |||
Flow based: Packets that need to be given a particular treatment are | Flow based: Packets that need to be given a particular treatment are | |||
identified using a combination of IP address and port. | identified using a combination of IP address and port. | |||
Deep Packet Inspection: A network classifier (DPI) inspects the | Deep Packet Inspection: A network classifier (DPI) inspects the | |||
packet and tries to determine if the packet represents a | packet and tries to determine if the packet represents a | |||
skipping to change at page 32, line 24 | skipping to change at page 31, line 45 | |||
particular RTP media flow need to be marked. RTCP compound packets | particular RTP media flow need to be marked. RTCP compound packets | |||
with Sender Reports (SR) ought to be marked with the same priority | with Sender Reports (SR) ought to be marked with the same priority | |||
as the RTP media flow itself, so the RTCP-based round-trip time (RTT) | as the RTP media flow itself, so the RTCP-based round-trip time (RTT) | |||
measurements are done using the same flow priority as the media flow | measurements are done using the same flow priority as the media flow | |||
experiences. RTCP compound packets containing RR packets ought to be | experiences. RTCP compound packets containing RR packets ought to be | |||
sent with the priority used by the majority of the RTP media flows | sent with the priority used by the majority of the RTP media flows | |||
reported on. RTCP packets containing time-critical feedback packets | reported on. RTCP packets containing time-critical feedback packets | |||
can use higher priority to improve the timeliness and likelihood of | can use higher priority to improve the timeliness and likelihood of | |||
delivery of such feedback. | delivery of such feedback. | |||
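The marking rules above can be sketched as a small selection function. This is only an illustrative sketch, not part of either draft: the function name, its parameters, and the convention that a larger integer means a higher priority are all assumptions made for the example.

```python
from collections import Counter

def rtcp_compound_priority(has_sr, own_flow_priority, reported_flow_priorities,
                           time_critical_feedback=False, boost=1):
    """Pick a priority for an RTCP compound packet (hypothetical helper).

    Assumption: priorities are integers and a larger value is more important.
    """
    if has_sr:
        # Sender Reports follow the media flow they describe, so RTT
        # measurements see the same treatment as the media itself.
        base = own_flow_priority
    elif reported_flow_priorities:
        # Receiver Reports use the priority of the majority of reported flows.
        base = Counter(reported_flow_priorities).most_common(1)[0][0]
    else:
        base = own_flow_priority
    # Time-critical feedback can be raised above the base priority.
    return base + boost if time_critical_feedback else base
```

The size of the boost for time-critical feedback is left as a parameter because the text only says that such packets can use a higher priority, not how much higher.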
12.2. Source, Flow, and Participant Identification | ||||
12.2.1. Media Streams | ||||
Each RTP media stream is identified by a unique synchronisation | ||||
source (SSRC) identifier. The SSRC identifier is carried in the RTP | ||||
data packets comprising a media stream, and is also used to identify | ||||
that stream in the corresponding RTCP reports. The SSRC is chosen as | ||||
discussed in Section 4.8. The first stage in demultiplexing RTP and | ||||
RTCP packets received at a WebRTC end-point is to separate the media | ||||
streams based on their SSRC value; once that is done, additional | ||||
demultiplexing steps can determine how and where to render the media. | ||||
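As a minimal sketch of that first demultiplexing stage, the following groups RTP data packets by the SSRC field of the fixed RTP header (bytes 8 to 11, per RFC 3550). The function names are hypothetical, and RTCP packets, which carry their SSRC at a different offset, are deliberately left out.

```python
import struct
from collections import defaultdict

def rtp_ssrc(packet: bytes) -> int:
    """Return the SSRC from the fixed 12-byte RTP header."""
    if len(packet) < 12:
        raise ValueError("RTP header is at least 12 bytes")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    # The SSRC is the 32-bit big-endian word at byte offset 8.
    return struct.unpack_from("!I", packet, 8)[0]

def demux_by_ssrc(packets):
    """Group RTP data packets into per-SSRC lists, preserving arrival order."""
    streams = defaultdict(list)
    for packet in packets:
        streams[rtp_ssrc(packet)].append(packet)
    return dict(streams)
```

Later stages, such as deciding where to render each stream, would operate on the per-SSRC groups returned here.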
RTP allows a mixer, or other RTP-layer middlebox, to combine media | ||||
flows from multiple sources to form a new media flow. The RTP data | ||||
packets in that new flow can include a Contributing Source (CSRC) | ||||
list, indicating which original SSRCs contributed to the combined | ||||
packet. As described in Section 4.1, implementations need to support | ||||
reception of RTP data packets containing a CSRC list and RTCP packets | ||||
that relate to sources present in the CSRC list. The CSRC list can | ||||
change on a packet-by-packet basis, depending on the mixing operation | ||||
being performed. Knowledge of what sources contributed to a | ||||
particular RTP packet can be important if the user interface | ||||
indicates which participants are active in the session. Changes in | ||||
the CSRC list included in packets need to be exposed to the WebRTC | ||||
application using some API, if the application is to be able to track | ||||
changes in session participation. It is desirable to map CSRC values | ||||
back into WebRTC MediaStream identities as they cross this API, to | ||||
avoid exposing the SSRC/CSRC name space to JavaScript applications. | ||||
If the mixer-to-client audio level extension [RFC6465] is being used | ||||
in the session (see Section 5.2.3), the information in the CSRC list | ||||
is augmented by audio level information for each contributing source. | ||||
This information can usefully be exposed in the user interface. | ||||
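To illustrate how a receiver could track the packet-by-packet CSRC changes described above, here is a hedged sketch that reads the CSRC list from an RTP header (the CC field is the low four bits of the first byte, followed by that many 32-bit identifiers after the fixed header, per RFC 3550). The helper names are assumptions for the example, and the mapping from CSRC values to MediaStream identities is not shown.

```python
import struct

def csrc_list(packet: bytes):
    """Return the list of CSRC identifiers carried in an RTP data packet."""
    if len(packet) < 12:
        raise ValueError("RTP header is at least 12 bytes")
    count = packet[0] & 0x0F          # CC field: number of CSRC entries
    end = 12 + 4 * count
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC count")
    return list(struct.unpack_from("!%dI" % count, packet, 12))

def contributor_changes(previous, current):
    """Return (joined, left) sets between two consecutive CSRC lists."""
    prev, cur = set(previous), set(current)
    return cur - prev, prev - cur
```

An application-facing layer could call `contributor_changes` on successive packets and raise an event only when either set is non-empty.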
12.2.2. Media Streams: SSRC Collision Detection | ||||
The RTP standard [RFC3550] requires any RTP implementation to have | ||||
support for detecting and handling SSRC collisions, i.e., resolve the | ||||
conflict when two different end-points use the same SSRC value. This | ||||
requirement also applies to WebRTC end-points. There are several | ||||
scenarios where SSRC collisions can occur. | ||||
In a point-to-point session where each SSRC is associated with either | ||||
of the two end-points and where the main media carrying SSRC | ||||
identifier will be announced in the signalling channel, a collision | ||||
is less likely to occur due to the information about used SSRCs | ||||
provided by Source-Specific SDP Attributes [RFC5576]. Still, if both | ||||
end-points start using a new SSRC identifier prior to having | ||||
signalled it to the peer and received acknowledgement of the | ||||
signalling message, there can be collisions. The Source-Specific SDP | ||||
Attributes [RFC5576] contain no mechanism to resolve SSRC collisions | ||||
or reject an end-point's usage of an SSRC. | ||||
There could also appear SSRC values that are not signalled. This is | ||||
more likely than it appears as certain RTP functions need extra SSRCs | ||||
to provide functionality related to another (the "main") SSRC, for | ||||
example, SSRC multiplexed RTP retransmission [RFC4588]. In those | ||||
cases, an end-point can create a new SSRC that strictly doesn't need | ||||
to be announced over the signalling channel to function correctly on | ||||
both RTP and PeerConnection level. | ||||
The more likely case for SSRC collision is that multiple end-points | ||||
in a multiparty conference create new sources and signal those | ||||
towards the central server. In cases where the SSRC/CSRC values are | ||||
propagated between the different end-points from the central node, | ||||
collisions can occur. | ||||
Another scenario is when the central node manages to connect an end- | ||||
point's PeerConnection to another PeerConnection the end-point | ||||
already has, thus forming a loop where the end-point will receive its | ||||
own traffic. While this is clearly considered a bug, it is important | ||||
that the end-point is able to recognise and handle the case when it | ||||
occurs. This case becomes even more problematic when media mixers, | ||||
and so on, are involved, where the stream received is a different | ||||
stream but still contains this client's input. | ||||
These SSRC/CSRC collisions can only be handled at the RTP level as long | ||||
as the same RTP session is extended across multiple PeerConnections | ||||
by an RTP middlebox. To resolve the more generic case where multiple | ||||
PeerConnections are interconnected, identification of the media | ||||
source(s) that are part of a MediaStreamTrack being propagated across multiple | ||||
interconnected PeerConnections needs to be preserved across these | ||||
interconnections. | ||||
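A simplified sketch of RTP-level collision detection, loosely in the spirit of the collision handling in RFC 3550, is shown here. It only remembers the first source transport address seen for each SSRC and flags a mismatch; the class and method names are hypothetical, and sending RTCP BYE, distinguishing loops from genuine collisions, and timing out old entries are all omitted.

```python
import secrets

class SsrcTable:
    """Track which transport address each SSRC was first seen from."""

    def __init__(self, own_ssrc):
        self.own_ssrc = own_ssrc
        self.seen = {}  # ssrc -> source transport address

    def on_packet(self, ssrc, source_addr):
        if ssrc == self.own_ssrc:
            # Someone else uses our identifier, or our traffic is looped
            # back to us: stop using it and pick a fresh one.
            self.own_ssrc = self._new_ssrc()
            return "own-collision"
        known = self.seen.setdefault(ssrc, source_addr)
        if known != source_addr:
            # Same SSRC arriving from two different addresses.
            return "remote-collision"
        return "ok"

    def _new_ssrc(self):
        while True:
            candidate = secrets.randbits(32)
            if candidate != self.own_ssrc and candidate not in self.seen:
                return candidate
```

In the looped-PeerConnection case described above, the end-point would observe its own SSRC arriving from the network and take the "own-collision" branch.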
12.2.3. Media Synchronisation Context | ||||
When an end-point sends media from more than one media source, it | ||||
needs to consider if (and which of) these media sources are to be | ||||
synchronized. In RTP/RTCP, synchronisation is provided by having a | ||||
set of RTP media streams be indicated as coming from the same | ||||
synchronisation context and logical end-point by using the same RTCP | ||||
CNAME identifier. | ||||
The next provision is that the internal clocks of all media sources, | ||||
i.e., what drives the RTP timestamp, can be correlated to a system | ||||
clock that is provided in RTCP Sender Reports encoded in an NTP | ||||
format. By correlating all RTP timestamps to a common system clock | ||||
for all sources, the timing relation of the different RTP media | ||||
streams, also across multiple RTP sessions can be derived at the | ||||
receiver and, if desired, the streams can be synchronized. The | ||||
requirement is for the media sender to provide the correlation | ||||
information; it is up to the receiver to use it or not. | ||||
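The correlation step can be sketched as follows, assuming the receiver has the NTP timestamp and RTP timestamp pair from the most recent Sender Report of a stream together with that stream's clock rate. The function names are illustrative only; the NTP format is treated as a 64-bit fixed-point value (32 bits of seconds, 32 bits of fraction), and the RTP timestamp difference is interpreted modulo 2^32.

```python
def ntp_to_seconds(ntp64: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to seconds."""
    return (ntp64 >> 32) + (ntp64 & 0xFFFFFFFF) / 2**32

def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp64, clock_rate):
    """Map an RTP timestamp onto the sender's wallclock, in seconds."""
    # Signed difference modulo 2**32, so wrap-around and packets slightly
    # older than the Sender Report are both handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return ntp_to_seconds(sr_ntp64) + diff / clock_rate
```

Two streams that share an RTCP CNAME can then be aligned by comparing the wallclock values computed for their respective packets, if the receiver chooses to synchronise them.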
12.2.4. Correlation of Media Streams | ||||
(tbd: this needs to outline the approach to mapping media streams to | ||||
the signalling context defined in the unified plan) | ||||
(tbd: need to discuss correlation between associated RTP streams, for | ||||
example between a media stream and its associated FEC stream) | ||||
13. Security Considerations | 13. Security Considerations | |||
The overall security architecture for WebRTC is described in | The overall security architecture for WebRTC is described in | |||
[I-D.ietf-rtcweb-security-arch], and security considerations for the | [I-D.ietf-rtcweb-security-arch], and security considerations for the | |||
WebRTC framework are described in [I-D.ietf-rtcweb-security]. These | WebRTC framework are described in [I-D.ietf-rtcweb-security]. These | |||
considerations apply to this memo also. | considerations apply to this memo also. | |||
The security considerations of the RTP specification, the RTP/SAVPF | The security considerations of the RTP specification, the RTP/SAVPF | |||
profile, and the various RTP/RTCP extensions and RTP payload formats | profile, and the various RTP/RTCP extensions and RTP payload formats | |||
that form the complete protocol suite described in this memo apply. | that form the complete protocol suite described in this memo apply. | |||
skipping to change at page 33, line 26 | skipping to change at page 35, line 15 | |||
15. Open Issues | 15. Open Issues | |||
This section contains a summary of the open issues and to-do | This section contains a summary of the open issues and to-do | |||
items noted in the document: | items noted in the document: | |||
1. tbd: The API mapping to RTP level concepts has to be agreed and | 1. tbd: The API mapping to RTP level concepts has to be agreed and | |||
documented in Section 11. | documented in Section 11. | |||
2. tbd: An open question if any requirements are needed to agree and | 2. tbd: An open question if any requirements are needed to agree and | |||
limit the number of simultaneously used media sources (SSRCs) | limit the number of simultaneously used media sources (SSRCs) | |||
within an RTP session. See Section 12.2 and Section 4.1. | within an RTP session. See Section 4.1. | |||
3. tbd: The method for achieving simulcast of a media source has to | 3. tbd: The method for achieving simulcast of a media source has to | |||
be decided as discussed in Section 12.8. | be decided. | |||
4. tbd: Possible documentation of what support for differentiated | 4. tbd: Possible documentation of what support for differentiated | |||
treatment is needed at the RTP level as the API and the network | treatment is needed at the RTP level as the API and the network | |||
level specifications mature, as discussed in Section 12.9. | level specifications mature, as discussed in Section 12.1.3. | |||
5. tbd: Editing of Appendix A to remove redundancy between this and | ||||
the update of RTP Topologies | ||||
[I-D.westerlund-avtcore-rtp-topologies-update]. | ||||
16. Acknowledgements | 16. Acknowledgements | |||
The authors would like to thank Harald Alvestrand, Cary Bran, Charles | The authors would like to thank Harald Alvestrand, Cary Bran, Charles | |||
Eckel, Cullen Jennings, Bernard Aboba, and the other members of the | Eckel, Cullen Jennings, Bernard Aboba, and the other members of the | |||
IETF RTCWEB working group for their valuable feedback. | IETF RTCWEB working group for their valuable feedback. | |||
17. References | 17. References | |||
17.1. Normative References | 17.1. Normative References | |||
[I-D.ietf-avtcore-6222bis] | [I-D.ietf-avtcore-6222bis] | |||
Begen, A., Perkins, C., Wing, D., and E. Rescorla, | Begen, A., Perkins, C., Wing, D., and E. Rescorla, | |||
"Guidelines for Choosing RTP Control Protocol (RTCP) | "Guidelines for Choosing RTP Control Protocol (RTCP) | |||
Canonical Names (CNAMEs)", draft-ietf-avtcore-6222bis-06 | Canonical Names (CNAMEs)", draft-ietf-avtcore-6222bis-06 | |||
(work in progress), July 2013. | (work in progress), July 2013. | |||
[I-D.ietf-avtcore-avp-codecs] | [I-D.ietf-avtcore-avp-codecs] | |||
Terriberry, T., "Update to Remove DVI4 from the | Terriberry, T., "Update to Remove DVI4 from the | |||
skipping to change at page 34, line 27 | skipping to change at page 36, line 6 | |||
[I-D.ietf-avtcore-multi-media-rtp-session] | [I-D.ietf-avtcore-multi-media-rtp-session] | |||
Westerlund, M., Perkins, C., and J. Lennox, "Sending | Westerlund, M., Perkins, C., and J. Lennox, "Sending | |||
Multiple Types of Media in a Single RTP Session", draft- | Multiple Types of Media in a Single RTP Session", draft- | |||
ietf-avtcore-multi-media-rtp-session-03 (work in | ietf-avtcore-multi-media-rtp-session-03 (work in | |||
progress), July 2013. | progress), July 2013. | |||
[I-D.ietf-avtcore-rtp-circuit-breakers] | [I-D.ietf-avtcore-rtp-circuit-breakers] | |||
Perkins, C. and V. Singh, "Multimedia Congestion Control: | Perkins, C. and V. Singh, "Multimedia Congestion Control: | |||
Circuit Breakers for Unicast RTP Sessions", draft-ietf- | Circuit Breakers for Unicast RTP Sessions", draft-ietf- | |||
avtcore-rtp-circuit-breakers-02 (work in progress), | avtcore-rtp-circuit-breakers-03 (work in progress), July | |||
February 2013. | 2013. | |||
[I-D.ietf-avtcore-rtp-multi-stream-optimisation] | [I-D.ietf-avtcore-rtp-multi-stream-optimisation] | |||
Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, | Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, | |||
"Sending Multiple Media Streams in a Single RTP Session: | "Sending Multiple Media Streams in a Single RTP Session: | |||
Grouping RTCP Reception Statistics and Other Feedback ", | Grouping RTCP Reception Statistics and Other Feedback ", | |||
draft-ietf-avtcore-rtp-multi-stream-optimisation-00 (work | draft-ietf-avtcore-rtp-multi-stream-optimisation-00 (work | |||
in progress), July 2013. | in progress), July 2013. | |||
[I-D.ietf-avtcore-rtp-multi-stream] | [I-D.ietf-avtcore-rtp-multi-stream] | |||
Lennox, J., Westerlund, M., Wu, W., and C. Perkins, | Lennox, J., Westerlund, M., Wu, W., and C. Perkins, | |||
skipping to change at page 35, line 14 | skipping to change at page 36, line 39 | |||
Petit-Huguenin, M. and G. Zorn, "Support for Multiple | Petit-Huguenin, M. and G. Zorn, "Support for Multiple | |||
Clock Rates in an RTP Session", draft-ietf-avtext- | Clock Rates in an RTP Session", draft-ietf-avtext- | |||
multiple-clock-rates-09 (work in progress), April 2013. | multiple-clock-rates-09 (work in progress), April 2013. | |||
[I-D.ietf-mmusic-sdp-bundle-negotiation] | [I-D.ietf-mmusic-sdp-bundle-negotiation] | |||
Holmberg, C., Alvestrand, H., and C. Jennings, | Holmberg, C., Alvestrand, H., and C. Jennings, | |||
"Multiplexing Negotiation Using Session Description | "Multiplexing Negotiation Using Session Description | |||
Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- | Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp- | |||
bundle-negotiation-04 (work in progress), June 2013. | bundle-negotiation-04 (work in progress), June 2013. | |||
[I-D.ietf-rtcweb-overview] | ||||
Alvestrand, H., "Overview: Real Time Protocols for Brower- | ||||
based Applications", draft-ietf-rtcweb-overview-06 (work | ||||
in progress), February 2013. | ||||
[I-D.ietf-rtcweb-security-arch] | [I-D.ietf-rtcweb-security-arch] | |||
Rescorla, E., "WebRTC Security Architecture", draft-ietf- | Rescorla, E., "WebRTC Security Architecture", draft-ietf- | |||
rtcweb-security-arch-07 (work in progress), July 2013. | rtcweb-security-arch-07 (work in progress), July 2013. | |||
[I-D.ietf-rtcweb-security] | [I-D.ietf-rtcweb-security] | |||
Rescorla, E., "Security Considerations for WebRTC", draft- | Rescorla, E., "Security Considerations for WebRTC", draft- | |||
ietf-rtcweb-security-05 (work in progress), July 2013. | ietf-rtcweb-security-05 (work in progress), July 2013. | |||
[I-D.westerlund-avtcore-transport-multiplexing] | ||||
Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a | ||||
Single Lower-Layer Transport", draft-westerlund-avtcore- | ||||
transport-multiplexing-05 (work in progress), February | ||||
2013. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP | [RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP | |||
Payload Format Specifications", BCP 36, RFC 2736, December | Payload Format Specifications", BCP 36, RFC 2736, December | |||
1999. | 1999. | |||
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. | [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. | |||
Jacobson, "RTP: A Transport Protocol for Real-Time | Jacobson, "RTP: A Transport Protocol for Real-Time | |||
Applications", STD 64, RFC 3550, July 2003. | Applications", STD 64, RFC 3550, July 2003. | |||
skipping to change at page 37, line 21 | skipping to change at page 38, line 39 | |||
[I-D.alvestrand-rtcweb-msid] | [I-D.alvestrand-rtcweb-msid] | |||
Alvestrand, H., "Cross Session Stream Identification in | Alvestrand, H., "Cross Session Stream Identification in | |||
the Session Description Protocol", draft-alvestrand- | the Session Description Protocol", draft-alvestrand- | |||
rtcweb-msid-02 (work in progress), May 2012. | rtcweb-msid-02 (work in progress), May 2012. | |||
[I-D.ietf-avt-srtp-ekt] | [I-D.ietf-avt-srtp-ekt] | |||
Wing, D., McGrew, D., and K. Fischer, "Encrypted Key | Wing, D., McGrew, D., and K. Fischer, "Encrypted Key | |||
Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 | Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 | |||
(work in progress), October 2011. | (work in progress), October 2011. | |||
[I-D.ietf-avtcore-rtp-topologies-update] | ||||
Westerlund, M. and S. Wenger, "RTP Topologies", draft- | ||||
ietf-avtcore-rtp-topologies-update-00 (work in progress), | ||||
April 2013. | ||||
[I-D.ietf-rtcweb-overview] | ||||
Alvestrand, H., "Overview: Real Time Protocols for Brower- | ||||
based Applications", draft-ietf-rtcweb-overview-07 (work | ||||
in progress), August 2013. | ||||
[I-D.ietf-rtcweb-qos] | [I-D.ietf-rtcweb-qos] | |||
Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP and | Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP and | |||
other packet markings for RTCWeb QoS", draft-ietf-rtcweb- | other packet markings for RTCWeb QoS", draft-ietf-rtcweb- | |||
qos-00 (work in progress), October 2012. | qos-00 (work in progress), October 2012. | |||
[I-D.ietf-rtcweb-use-cases-and-requirements] | [I-D.ietf-rtcweb-use-cases-and-requirements] | |||
Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- | Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real- | |||
Time Communication Use-cases and Requirements", draft- | Time Communication Use-cases and Requirements", draft- | |||
ietf-rtcweb-use-cases-and-requirements-11 (work in | ietf-rtcweb-use-cases-and-requirements-11 (work in | |||
progress), June 2013. | progress), June 2013. | |||
skipping to change at page 37, line 43 | skipping to change at page 39, line 22 | |||
Jesup, R. and H. Alvestrand, "Congestion Control | Jesup, R. and H. Alvestrand, "Congestion Control | |||
Requirements For Real Time Media", draft-jesup-rtp- | Requirements For Real Time Media", draft-jesup-rtp- | |||
congestion-reqs-00 (work in progress), March 2012. | congestion-reqs-00 (work in progress), March 2012. | |||
[I-D.westerlund-avtcore-multiplex-architecture] | [I-D.westerlund-avtcore-multiplex-architecture] | |||
Westerlund, M., Perkins, C., and H. Alvestrand, | Westerlund, M., Perkins, C., and H. Alvestrand, | |||
"Guidelines for using the Multiplexing Features of RTP", | "Guidelines for using the Multiplexing Features of RTP", | |||
draft-westerlund-avtcore-multiplex-architecture-03 (work | draft-westerlund-avtcore-multiplex-architecture-03 (work | |||
in progress), February 2013. | in progress), February 2013. | |||
[I-D.westerlund-avtcore-rtp-topologies-update] | [I-D.westerlund-avtcore-transport-multiplexing] | |||
Westerlund, M. and S. Wenger, "RTP Topologies", draft- | Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a | |||
westerlund-avtcore-rtp-topologies-update-02 (work in | Single Lower-Layer Transport", draft-westerlund-avtcore- | |||
progress), February 2013. | transport-multiplexing-05 (work in progress), February | |||
2013. | ||||
[RFC3611] Friedman, T., Caceres, R., and A. Clark, "RTP Control | [RFC3611] Friedman, T., Caceres, R., and A. Clark, "RTP Control | |||
Protocol Extended Reports (RTCP XR)", RFC 3611, November | Protocol Extended Reports (RTCP XR)", RFC 3611, November | |||
2003. | 2003. | |||
[RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | |||
Control Protocol (DCCP) Congestion Control ID 2: TCP-like | Control Protocol (DCCP) Congestion Control ID 2: TCP-like | |||
Congestion Control", RFC 4341, March 2006. | Congestion Control", RFC 4341, March 2006. | |||
[RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | |||
skipping to change at page 38, line 41 | skipping to change at page 40, line 19 | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
[RFC5968] Ott, J. and C. Perkins, "Guidelines for Extending the RTP | [RFC5968] Ott, J. and C. Perkins, "Guidelines for Extending the RTP | |||
Control Protocol (RTCP)", RFC 5968, September 2010. | Control Protocol (RTCP)", RFC 5968, September 2010. | |||
[RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for | [RFC6263] Marjou, X. and A. Sollaud, "Application Mechanism for | |||
Keeping Alive the NAT Mappings Associated with RTP / RTP | Keeping Alive the NAT Mappings Associated with RTP / RTP | |||
Control Protocol (RTCP) Flows", RFC 6263, June 2011. | Control Protocol (RTCP) Flows", RFC 6263, June 2011. | |||
Appendix A. Supported RTP Topologies | ||||
RTP supports both unicast and group communication, with participants | ||||
being connected using a wide range of transport-layer topologies. Some | ||||
of these topologies involve only the end-points, while others use RTP | ||||
translators and mixers to provide in-network processing. Properties | ||||
of some RTP topologies are discussed in | ||||
[I-D.westerlund-avtcore-rtp-topologies-update], and we further | ||||
describe those expected to be useful for WebRTC in the following. We | ||||
also go into important RTP session aspects that the topology or | ||||
implementation variant can place on a WebRTC end-point. | ||||
This section includes RTP topologies beyond the RECOMMENDED ones. | ||||
This is an attempt to highlight the differences, and the in many cases | ||||
small differences in implementation needed to support a larger set of | ||||
possible topologies. | ||||
(tbd: This section needs reworking and clearer relation to | ||||
[I-D.westerlund-avtcore-rtp-topologies-update].) | ||||
A.1. Point to Point | ||||
The point-to-point RTP topology (Figure 3) is the simplest scenario | ||||
for WebRTC applications. This is going to be very common for user-to-user | ||||
calls. | ||||
+---+ +---+ | ||||
| A |<------->| B | | ||||
+---+ +---+ | ||||
Figure 3: Point to Point | ||||
As this is the most basic topology, we use it to highlight a couple | ||||
of details that are common to all RTP usage in the WebRTC context. | ||||
First is the intention to multiplex RTP and RTCP over the same UDP | ||||
flow. Second is the question of using only a single RTP session or | ||||
one per media type for legacy interoperability. Third is the | ||||
question of using multiple sender sources (SSRCs) per end-point. | ||||
Historically, RTP and RTCP have been run on separate UDP ports. With | ||||
the increased use of Network Address/Port Translation (NAPT) this has | ||||
become problematic, since maintaining multiple NAT bindings can be | ||||
costly. It also complicates firewall administration, since multiple | ||||
ports need to be opened to allow RTP traffic. To reduce these costs | ||||
and session set-up times, multiplexing of RTP data packets | ||||
and RTCP control packets on a single port [RFC5761] will be | ||||
supported. | ||||
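When RTP and RTCP share one port, the receiver has to tell the two apart from the packet contents. A minimal sketch of one common way to do this is given below, under the assumption that the session follows the RFC 5761 restriction that RTP payload types 64 to 95 are not used; the function name is hypothetical.

```python
def is_rtcp(packet: bytes) -> bool:
    """Classify a packet received on a port shared by RTP and RTCP.

    Assumption: RTP payload types 64-95 are never used in this session,
    so any packet whose second byte, with the top bit masked off, falls
    in that range is treated as RTCP.
    """
    if len(packet) < 2:
        raise ValueError("packet too short to classify")
    return 64 <= (packet[1] & 0x7F) <= 95
```

For example, an RTCP Sender Report has packet type 200 in its second byte, which masks to 72 and is classified as RTCP, while an RTP packet with dynamic payload type 96 is not, whether or not its marker bit is set.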
In cases where there is only one type of media (e.g., a voice-only | ||||
call) this topology will be implemented as a single RTP session, with | ||||
bidirectional flows of RTP and RTCP packets, all then multiplexed | ||||
onto a single 5-tuple. If multiple types of media are to be used | ||||
(e.g., audio and video), then each type of media can be sent as a
separate RTP session using a different 5-tuple, allowing for separate | ||||
transport level treatment of each type of media. Alternatively, all | ||||
types of media can be multiplexed onto a single 5-tuple as a single | ||||
RTP session, or as several RTP sessions if using a demultiplexing | ||||
shim. Multiplexing different types of media onto a single 5-tuple | ||||
places some limitations on how RTP is used, as described in "RTP | ||||
Multiplexing Architecture" | ||||
[I-D.westerlund-avtcore-multiplex-architecture]. It is not expected | ||||
that these limitations will significantly affect the scenarios | ||||
targeted by WebRTC, but they can impact interoperability with legacy | ||||
systems. | ||||
An RTP session has good support for simultaneously transporting
multiple media sources. Each media source uses a unique SSRC
identifier, and each SSRC has independent RTP sequence number and
timestamp spaces. This is used in WebRTC in several cases. One is
to enable multiple media sources of the same type: an end-point that
has two video cameras can potentially transmit video from both to its
peer(s). Another is when a single RTP session is used for multiple
media types, so that an end-point can transmit both audio and video
to the peer(s). A third is multi-party communication, where, as
discussed below, support for multiple SSRCs of the same media type is
needed.
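The independence of the per-SSRC sequence number and timestamp spaces
can be sketched as follows. This is an illustrative model only: the
class name RtpSource and its method are hypothetical, and the random
initial values simply follow the usual RTP practice of starting both
counters at unpredictable values.

```python
import random


class RtpSource:
    """One media source (SSRC) with its own sequence/timestamp space."""

    def __init__(self, ssrc, rng):
        self.ssrc = ssrc
        # Each source starts from its own random 16-bit sequence number
        # and 32-bit timestamp.
        self.seq = rng.getrandbits(16)
        self.timestamp = rng.getrandbits(32)

    def next_packet(self, ts_increment):
        """Return (ssrc, seq, timestamp) and advance this source only."""
        header = (self.ssrc, self.seq, self.timestamp)
        self.seq = (self.seq + 1) % 2**16            # 16-bit wrap-around
        self.timestamp = (self.timestamp + ts_increment) % 2**32
        return header
```

Two RtpSource instances in the same RTP session (for example, two
cameras) advance independently: sending a packet from one leaves the
sequence number and timestamp of the other untouched.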
The two alternative figures below introduce notation for a single
PeerConnection in a point-to-point set-up. The first depicts a
set-up where the PeerConnection has two different RTP sessions, one
for audio and one for video. The second uses a single RTP session.
In both cases A has two video streams and one audio stream to send,
while B has only one audio stream and one video stream. The figures
illustrate the relation between a PeerConnection, the UDP flow(s),
the RTP session(s) and the SSRCs, and the same notation is used in
the later cases. RTCP flows are not included in the figures; they
flow bi-directionally between any RTP session instances in the
different nodes.
+-A-------------+ +-B-------------+ | ||||
| +-PeerC1------| |-PeerC1------+ | | ||||
| | +-UDP1------| |-UDP1------+ | | | ||||
| | | +-RTP1----| |-RTP1----+ | | | | ||||
| | | | +-Audio-| |-Audio-+ | | | | | ||||
| | | | | AA1|---------------->| | | | | | | ||||
| | | | | |<----------------|BA1 | | | | | | ||||
| | | | +-------| |-------+ | | | | | ||||
| | | +---------| |---------+ | | | | ||||
| | +-----------| |-----------+ | | | ||||
| | | | | | | ||||
| | +-UDP2------| |-UDP2------+ | | | ||||
| | | +-RTP2----| |-RTP2----+ | | | | ||||
| | | | +-Video-| |-Video-+ | | | | | ||||
| | | | | AV1|---------------->| | | | | | | ||||
| | | | | AV2|---------------->| | | | | | | ||||
| | | | | |<----------------|BV1 | | | | | | ||||
| | | | +-------| |-------+ | | | | | ||||
| | | +---------| |---------+ | | | | ||||
| | +-----------| |-----------+ | | | ||||
| +-------------| |-------------+ | | ||||
+---------------+ +---------------+ | ||||
Figure 4: Point to Point: Multiple RTP sessions | ||||
As can be seen in Figure 4, the single PeerConnection contains two
RTP sessions over different UDP flows (UDP1 and UDP2), i.e., their
5-tuples will be different, normally in the source and destination
ports. The first RTP session (RTP1) carries audio, one stream in
each direction (AA1 and BA1). The second RTP session (RTP2) carries
two video streams from A to B (AV1 and AV2) and one from B to A
(BV1).
+-A-------------+ +-B-------------+ | ||||
| +-PeerC1------| |-PeerC1------+ | | ||||
| | +-UDP1------| |-UDP1------+ | | | ||||
| | | +-RTP1----| |-RTP1----+ | | | | ||||
| | | | +-Audio-| |-Audio-+ | | | | | ||||
| | | | | AA1|---------------->| | | | | | | ||||
| | | | | |<----------------|BA1 | | | | | | ||||
| | | | +-------| |-------+ | | | | | ||||
| | | | | | | | | | | ||||
| | | | +-Video-| |-Video-+ | | | | | ||||
| | | | | AV1|---------------->| | | | | | | ||||
| | | | | AV2|---------------->| | | | | | | ||||
| | | | | |<----------------|BV1 | | | | | | ||||
| | | | +-------| |-------+ | | | | | ||||
| | | +---------| |---------+ | | | | ||||
| | +-----------| |-----------+ | | | ||||
| +-------------| |-------------+ | | ||||
+---------------+ +---------------+ | ||||
Figure 5: Point to Point: Single RTP session. | ||||
In Figure 5 there is only a single UDP flow and a single RTP session
(RTP1). This RTP session carries a total of five (5) RTP media
streams (SSRCs). From A to B there is one audio stream (AA1) and two
video streams (AV1 and AV2). From B to A there is one audio stream
(BA1) and one video stream (BV1).
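The containment relation used in Figures 4 and 5 (PeerConnection, UDP
flow, RTP session, SSRC) can be written down as a small data model.
The sketch below is purely illustrative; the class names are
hypothetical and do not correspond to any WebRTC API objects.

```python
from dataclasses import dataclass, field


@dataclass
class RtpSession:
    name: str
    ssrcs: list = field(default_factory=list)


@dataclass
class UdpFlow:
    name: str
    sessions: list = field(default_factory=list)


@dataclass
class PeerConnection:
    name: str
    flows: list = field(default_factory=list)

    def ssrc_count(self):
        """Total number of RTP media streams (SSRCs) carried."""
        return sum(len(s.ssrcs) for f in self.flows for s in f.sessions)


# Figure 4: one RTP session per media type, each on its own UDP flow.
figure4 = PeerConnection("PeerC1", [
    UdpFlow("UDP1", [RtpSession("RTP1", ["AA1", "BA1"])]),
    UdpFlow("UDP2", [RtpSession("RTP2", ["AV1", "AV2", "BV1"])]),
])

# Figure 5: all streams in a single RTP session on a single UDP flow.
figure5 = PeerConnection("PeerC1", [
    UdpFlow("UDP1", [RtpSession("RTP1",
                                ["AA1", "BA1", "AV1", "AV2", "BV1"])]),
])
```

Both set-ups carry the same five SSRCs; they differ only in how the
SSRCs are grouped into RTP sessions and UDP flows.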
A.2. Multi-Unicast (Mesh) | ||||
For small multiparty calls, it is practical to set up a multi-unicast | ||||
topology (Figure 6). In this topology, each participant sends | ||||
individual unicast RTP/UDP/IP flows to each of the other participants | ||||
using independent PeerConnections in a full mesh. | ||||
+---+ +---+ | ||||
| A |<---->| B | | ||||
+---+ +---+ | ||||
^ ^ | ||||
\ / | ||||
\ / | ||||
v v | ||||
+---+ | ||||
| C | | ||||
+---+ | ||||
Figure 6: Multi-unicast | ||||
This topology has the benefit of not requiring central nodes. The
downside is that it increases the bandwidth used at each sender,
since one copy of each RTP media stream is needed for every
participant in the session other than the sender itself. Hence, this
topology is limited to scenarios with few participants unless the
media is very low bandwidth. The multi-unicast topology could be
implemented as a single RTP session, spanning multiple peer-to-peer
transport layer connections, or as several pairwise RTP sessions, one
between each pair of peers. To maintain a coherent mapping between
RTP sessions and PeerConnections, we recommend implementing this as
individual RTP sessions. The only downside is that end-point A will
not learn, based on RTCP, of the quality of any transmission
happening between B and C. This has not been seen as a significant
downside, as no one has yet identified a reason why A would need to
know about the communication between B and C. An advantage of using
separate RTP sessions is that it enables using different media
bit-rates towards the different peers, so that B is not forced to
endure the same quality reductions as C if there are limitations in
the transport from A to C.
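The bandwidth cost of the mesh can be made concrete with a small
calculation. The sketch below assumes, for simplicity, that a sender
transmits the same bit-rate to every peer; the function names are
hypothetical.

```python
def mesh_uplink_kbps(participants, stream_kbps):
    """Uplink needed at one sender: one copy per other participant."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    return (participants - 1) * stream_kbps


def mesh_connection_count(participants):
    """Number of pairwise PeerConnections in a full mesh."""
    return participants * (participants - 1) // 2
```

With three participants sending 500 kbps each, every sender needs
1000 kbps of uplink and the mesh consists of three PeerConnections.
With six participants the figures grow to 2500 kbps and fifteen
PeerConnections.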
+-A------------------------+ +-B-------------+ | ||||
|+---+ +-PeerC1------| |-PeerC1------+ | | ||||
||MIC| | +-UDP1------| |-UDP1------+ | | | ||||
|+---+ | | +-RTP1----| |-RTP1----+ | | | | ||||
| | +----+ | | | +-Audio-| |-Audio-+ | | | | | ||||
| +->|ENC1|--+-+-+-+--->AA1|------------->| | | | | | | ||||
| | +----+ | | | | |<-------------|BA1 | | | | | | ||||
| | | | | +-------| |-------+ | | | | | ||||
| | | | +---------| |---------+ | | | | ||||
| | | +-----------| |-----------+ | | | ||||
| | +-------------| |-------------+ | | ||||
| | | |---------------+ | ||||
| | | | ||||
| | | +-C-------------+ | ||||
| | +-PeerC2------| |-PeerC2------+ | | ||||
| | | +-UDP2------| |-UDP2------+ | | | ||||
| | | | +-RTP2----| |-RTP2----+ | | | | ||||
| | +----+ | | | +-Audio-| |-Audio-+ | | | | | ||||
| +->|ENC2|--+-+-+-+--->AA2|------------->| | | | | | | ||||
| +----+ | | | | |<-------------|CA1 | | | | | | ||||
| | | | +-------| |-------+ | | | | | ||||
| | | +---------| |---------+ | | | | ||||
| | +-----------| |-----------+ | | | ||||
| +-------------| |-------------+ | | ||||
+--------------------------+ +---------------+ | ||||
Figure 7: Session structure for Multi-Unicast Setup | ||||
Let us review how the RTP sessions look from A's perspective by
considering both how the media is handled and which PeerConnections
and RTP sessions are set up in Figure 7. A's microphone is captured
and the digital audio can then be fed into two different encoder
instances, each associated with a different PeerConnection (PeerC1
and PeerC2), each of which contains an independent RTP session (RTP1
and RTP2). The SSRCs in each RTP session will be completely
independent, and the media bit-rate produced by each encoder can be
tuned to address any congestion control requirements between A and B
differently than for the path from A to C.
For media encodings that are more resource consuming, like video,
one could expect it to be common that resource-constrained end-points
will use a different implementation strategy, where the encoder is
shared between the different PeerConnections, as shown in Figure 8.
+-A----------------------+ +-B-------------+ | ||||
|+---+ | | | | ||||
||CAM| +-PeerC1------| |-PeerC1------+ | | ||||
|+---+ | +-UDP1------| |-UDP1------+ | | | ||||
| | | | +-RTP1----| |-RTP1----+ | | | | ||||
| V | | | +-Video-| |-Video-+ | | | | | ||||
|+----+ | | | | |<----------------|BV1 | | | | | | ||||
||ENC |----+-+-+-+--->AV1|---------------->| | | | | | | ||||
|+----+ | | | +-------| |-------+ | | | | | ||||
| | | | +---------| |---------+ | | | | ||||
| | | +-----------| |-----------+ | | | ||||
| | +-------------| |-------------+ | | ||||
| | | |---------------+ | ||||
| | | | ||||
| | | +-C-------------+ | ||||
| | +-PeerC2------| |-PeerC2------+ | | ||||
| | | +-UDP2------| |-UDP2------+ | | | ||||
| | | | +-RTP2----| |-RTP2----+ | | | | ||||
| | | | | +-Video-| |-Video-+ | | | | | ||||
| +-------+-+-+-+--->AV2|---------------->| | | | | | | ||||
| | | | | |<----------------|CV1 | | | | | | ||||
| | | | +-------| |-------+ | | | | | ||||
| | | +---------| |---------+ | | | | ||||
| | +-----------| |-----------+ | | | ||||
| +-------------| |-------------+ | | ||||
+------------------------+ +---------------+ | ||||
Figure 8: Single Encoder Multi-Unicast Setup | ||||
This will clearly save resources consumed by encoding, but it does
introduce the need for end-point A to make decisions on how it
encodes the media so that it suits delivery to both B and C. This is
not limited to congestion control; the preferred resolution to
receive, based on the display area available, is another aspect
requiring consideration. The need for this type of decision logic
arises in several different topologies and implementations.
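One simple form such decision logic could take is sketched below.
Encoding for the most constrained peer is only one possible policy
and is an assumption of this example, not something this memo
mandates; the names PeerLimit and shared_encoder_settings are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeerLimit:
    max_kbps: int      # bit-rate allowed by congestion control on the path
    max_width: int     # preferred resolution, based on display area
    max_height: int


def shared_encoder_settings(limits):
    """Pick one encoder configuration that no peer's limits exceed."""
    if not limits:
        raise ValueError("at least one peer is required")
    return PeerLimit(
        max_kbps=min(l.max_kbps for l in limits),
        max_width=min(l.max_width for l in limits),
        max_height=min(l.max_height for l in limits),
    )
```

Note the cost of this policy: if the path to C only allows 300 kbps,
B also receives the 300 kbps encoding even when the path to B could
carry more. This is the trade-off against the per-peer encoders of
Figure 7.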
A.3. Mixer Based | ||||
A mixer (Figure 9) is a centralised point that selects or mixes
content in a conference to optimise the RTP session so that each
end-point only needs to connect to one entity, the mixer. The mixer
can also reduce the bit-rate needed from the mixer down to a
conference participant, as the media sent from the mixer to the
end-point can be optimised in different ways. These optimisations
include methods like only choosing media from the currently most
active speaker, or mixing together audio so that only one audio
stream is needed instead of three in the depicted scenario
(Figure 9).
+---+ +------------+ +---+ | ||||
| A |<---->| |<---->| B | | ||||
+---+ | | +---+ | ||||
| Mixer | | ||||
+---+ | | +---+ | ||||
| C |<---->| |<---->| D | | ||||
+---+ +------------+ +---+ | ||||
Figure 9: RTP Mixer with Only Unicast Paths | ||||
Mixers have two downsides. The first is that the mixer has to be a
trusted node, as it either performs media operations or at least
re-packetizes the media. When using SRTP, both types of operation
require that the mixer verifies integrity, decrypts the content,
performs its operation, forms new RTP packets, and then encrypts and
integrity protects them. This applies to all types of mixers
described below. The second downside is that all these operations
and optimizations of the session require processing. How much
depends on the implementation, as will become evident below.
The implementation of a mixer can take several different forms, and
we will discuss the main alternatives that do not break RTP. Please
note that a mixer could also contain translator functionalities, like
a media transcoder to adjust the media bit-rate or codec used on a
particular RTP media stream.
A.3.1. Media Mixing | ||||
This type of mixer is one that can clearly be called an RTP mixer,
and is likely the one that most people think of when they hear the
term mixer. Its basic pattern of operation is that it receives the
RTP media streams of the different participants, selects which of
them are to be included in a media domain mix, and then creates a
single outgoing stream from this mix.
Audio mixing is straightforward and commonly possible to do for a
number of participants. Let us assume that you want to mix N streams
from different participants. The mixer then needs to perform
decoding N times. It then needs to produce N or N+1 mixes. The
reason that different mixes are needed is so that each contributing
source gets a mix that does not contain itself, as this would result
in an echo. When N is lower than the total number of participants,
one can produce a mix of all N streams for the group of participants
that are currently not included in the mix, thus N+1 mixes. These
audio streams are then encoded again, RTP packetized and sent out.
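The "N or N+1 mixes" step can be sketched on already decoded audio.
The example below assumes equal-length lists of PCM sample values and
deliberately ignores gain control and clipping; the function name
mix_minus is hypothetical.

```python
def mix_minus(decoded, listeners):
    """Produce one mix per listener from N decoded streams.

    decoded:   dict mapping each contributing participant to its
               decoded samples (all lists of equal length).
    listeners: every participant that is to receive a mix.

    Contributors get the sum of all streams except their own; all other
    listeners share the full mix of all N streams.
    """
    if not decoded:
        return {}
    length = len(next(iter(decoded.values())))
    full = [sum(samples[i] for samples in decoded.values())
            for i in range(length)]
    mixes = {}
    for name in listeners:
        if name in decoded:
            own = decoded[name]
            mixes[name] = [f - o for f, o in zip(full, own)]
        else:
            mixes[name] = full
    return mixes
```

With contributors A and B and an additional silent listener C, this
yields three mixes (N+1): A hears only B, B hears only A, and C hears
both.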
Video can't really be "mixed" in a way that produces something
particularly useful for the users; however, creating a composition
out of the contributed video streams can be done. In fact, it can be
done in a number of ways: tiling the different streams to create a
chessboard is one, and selecting someone as more important and
showing them large, with a number of other sources shown smaller, is
another. Here too, one commonly needs to produce a number of
different compositions so that the contributing parties do not need
to see themselves. The mixer then re-encodes the created video
stream, RTP packetizes it and sends it out.
Media mixing has two problems. The first is that it consumes a
large amount of media processing and encoding resources. The second
is the quality degradation created by decoding and re-encoding the
RTP media stream. Its advantage is that it is quite simple for the
clients to handle, as they do not need to handle local mixing and
composition.
+-A-------------+ +-MIXER--------------------------+ | ||||
| +-PeerC1------| |-PeerC1--------+ | | ||||
| | +-UDP1------| |-UDP1--------+ | | | ||||
| | | +-RTP1----| |-RTP1------+ | | +-----+ | | ||||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | ||||
| | | | | AA1|------------>|---------+-+-+-+-|DEC|->| | | | ||||
| | | | | |<------------|MA1 <----+ | | | +---+ | | | | ||||
| | | | | | |(BA1+CA1)|\| | | +---+ | | | | ||||
| | | | +-------| |---------+ +-+-+-|ENC|<-| B+C | | | ||||
| | | +---------| |-----------+ | | +---+ | | | | ||||
| | +-----------| |-------------+ | | M | | | ||||
| +-------------| |---------------+ | E | | | ||||
+---------------+ | | D | | | ||||
| | I | | | ||||
+-B-------------+ | | A | | | ||||
| +-PeerC2------| |-PeerC2--------+ | | | | ||||
| | +-UDP2------| |-UDP2--------+ | | M | | | ||||
| | | +-RTP2----| |-RTP2------+ | | | I | | | ||||
| | | | +-Audio-| |-Audio---+ | | | +---+ | X | | | ||||
| | | | | BA1|------------>|---------+-+-+-+-|DEC|->| E | | | ||||
| | | | | |<------------|MA2 <----+ | | | +---+ | R | | | ||||
| | | | +-------| |(AA1+CA1)|\| | | +---+ | | | | ||||
| | | +---------| |---------+ +-+-+-|ENC|<-| A+C | | | ||||
| | +-----------| |-----------+ | | +---+ | | | | ||||
| +-------------| |-------------+ | | | | | ||||
+---------------+ |---------------+ | | | | ||||
| | | | | ||||
+-C-------------+ | | | | | ||||
| +-PeerC3------| |-PeerC3--------+ | | | | ||||
| | +-UDP3------| |-UDP3--------+ | | | | | ||||
| | | +-RTP3----| |-RTP3------+ | | | | | | ||||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | ||||
| | | | | CA1|------------>|---------+-+-+-+-|DEC|->| | | | ||||
| | | | | |<------------|MA3 <----+ | | | +---+ | | | | ||||
| | | | +-------| |(AA1+BA1)|\| | | +---+ | | | | ||||
| | | +---------| |---------+ +-+-+-|ENC|<-| A+B | | | ||||
| | +-----------| |-----------+ | | +---+ | | | | ||||
| +-------------| |-------------+ | +-----+ | | ||||
+---------------+ |---------------+ | | ||||
+--------------------------------+ | ||||
Figure 10: Session and SSRC details for Media Mixer | ||||
From an RTP perspective, media mixing can be very straightforward, as
can be seen in Figure 10. The mixer presents one SSRC towards each
peer client, e.g., MA1 to Peer A, which is the media mix of the other
participants. As each peer receives a different version produced by
the mixer, there is no actual relation between the different RTP
sessions in the actual media or the transport level information.
There is, however, one connection between RTP1-RTP3 in this figure.
It has to do with the SSRC space and the identity information. When
A receives the MA1 stream, which is a combination of the BA1 and CA1
streams in the other PeerConnections, RTP could enable the mixer to
include CSRC information in the MA1 stream to identify the
contributing sources BA1 and CA1.
The CSRC in turn has utility in RTP extensions, like the
Mixer-to-Client audio level RTP header extension [RFC6465] discussed
in Section 5.2.3. If the SSRCs from one PeerConnection are used as
CSRCs in another PeerConnection, then RTP1, RTP2 and RTP3 become one
joint session, as they have a common SSRC space. At this stage one
also needs to consider which RTCP information needs to be exposed in
the different legs. For the above situation, commonly nothing more
than the Source Description (SDES) information and RTCP BYE for the
CSRCs needs to be exposed. The main goal would be to enable the
correct binding against the application logic and other information
sources. This also enables loop detection in the RTP session.
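To show where the CSRC information lives, the following sketch builds
and parses the fixed RTP header with a CSRC list (version 2, no
padding, no header extension, marker bit clear). The helper names and
the simple loop check are hypothetical illustrations, not a complete
RTP implementation.

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, ssrc, csrcs=()):
    """Fixed 12-octet RTP header followed by up to 15 CSRC entries."""
    if len(csrcs) > 15:
        raise ValueError("the CC field is 4 bits: at most 15 CSRCs")
    first = (2 << 6) | len(csrcs)      # V=2, P=0, X=0, CC=len(csrcs)
    second = payload_type & 0x7F       # M=0
    header = struct.pack("!BBHII", first, second, seq, timestamp, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_csrcs(header):
    """Return the list of CSRCs carried in an RTP header."""
    count = header[0] & 0x0F
    return list(struct.unpack("!%dI" % count, header[12:12 + 4 * count]))


def looks_like_loop(header, own_ssrcs):
    """True if one of our own SSRCs shows up as a contributing source."""
    return any(c in own_ssrcs for c in parse_csrcs(header))
```

In the scenario of Figure 10, the MA1 packets sent to A would carry
the SSRCs of BA1 and CA1 as CSRCs. If A ever saw its own AA1 SSRC in
that list, it would indicate that its media is being looped back.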
A.3.1.1. RTP Session Termination | ||||
One possible implementation choice is to keep the RTP sessions
separated between the different legs of the multi-party
communication session, and only generate RTP media streams in each
without carrying any identity information about the contributing
sources at the RTP/RTCP level. This removes the functionality that
CSRC can provide, the possibility to use any extensions that build on
CSRC, and the loop detection. It might appear to be a simplification
if an SSRC collision occurs between two different end-points, as the
collision does not need to be resolved; instead, the SSRCs can be
remapped between the independent sessions, if they are exposed at
all. However, SSRC/CSRC remapping requires that SSRCs/CSRCs are
never exposed to the WebRTC JavaScript client for use as references.
This is because they only have local significance; if they were used
with multi-party session scope, the result would be mis-referencing.
Also, SSRC collision handling will still be needed, as a collision
can occur between the mixer and the end-point.
Session termination might appear to resolve some issues. It does,
however, create other issues that need resolving, like loop
detection, identification of contributing sources, and the need to
handle mapped identities and ensure that the right one is used
towards the right identities and never used directly between
multiple end-points.
A.3.2. Media Switching | ||||
An RTP mixer based on media switching avoids the media decoding and
encoding cycle in the mixer, but not the decryption and re-encryption
cycle, as it rewrites RTP headers. This both reduces the amount of
computational resources needed in the mixer and increases the media
quality per transmitted bit. This is achieved by letting the mixer
have a number of SSRCs that represent conceptual or functional
streams that the mixer produces. These streams are created by
selecting media from one of the RTP media streams received by the
mixer and forwarding the media using the mixer's own SSRCs. The
mixer can then switch between available sources if that is needed by
the concept for the source, like the currently active speaker.
To achieve a coherent RTP media stream from the mixer's SSRC, the
mixer is forced to rewrite the incoming RTP packet's header. First,
the SSRC field has to be set to the value of the mixer's SSRC.
Secondly, the sequence number is set to the next in the sequence of
outgoing packets the mixer has sent. Thirdly, the RTP timestamp
value needs to be adjusted using an offset that changes each time the
mixer switches media source. Finally, depending on the negotiation,
the RTP payload type value representing this particular RTP payload
configuration might have to be changed if the different
PeerConnections have not arrived at the same numbering for a given
configuration. This also requires that the different end-points
support a common set of codecs, otherwise media transcoding for codec
compatibility is still needed.
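The four rewriting steps can be sketched as follows for one outgoing
mixer SSRC. This is a simplified model under stated assumptions: the
classes RtpHeader and SwitchedStream are hypothetical, the payload
type mapping is supplied by the caller, and the timestamp gap
inserted at a switch (switch_gap) is an arbitrary value chosen by the
implementation.

```python
from dataclasses import dataclass


@dataclass
class RtpHeader:
    ssrc: int
    seq: int
    timestamp: int
    payload_type: int


class SwitchedStream:
    """Rewrites headers of forwarded packets for one mixer SSRC."""

    def __init__(self, mixer_ssrc, first_seq, first_timestamp, switch_gap):
        self.mixer_ssrc = mixer_ssrc
        self.next_seq = first_seq
        self.switch_gap = switch_gap
        # Chosen so that the very first packet maps to first_timestamp.
        self.last_out_ts = (first_timestamp - switch_gap) % 2**32
        self.current_source = None
        self.ts_offset = 0

    def rewrite(self, pkt, pt_map):
        if pkt.ssrc != self.current_source:
            # Source switch: pick a new timestamp offset so the output
            # timeline continues switch_gap ticks after the last packet.
            self.current_source = pkt.ssrc
            target = (self.last_out_ts + self.switch_gap) % 2**32
            self.ts_offset = (target - pkt.timestamp) % 2**32
        out_ts = (pkt.timestamp + self.ts_offset) % 2**32
        out = RtpHeader(
            ssrc=self.mixer_ssrc,                        # step 1: SSRC
            seq=self.next_seq,                           # step 2: sequence
            timestamp=out_ts,                            # step 3: timestamp
            payload_type=pt_map.get(pkt.payload_type,
                                    pkt.payload_type),   # step 4: PT
        )
        self.next_seq = (self.next_seq + 1) % 2**16
        self.last_out_ts = out_ts
        return out
```

A design point worth noting is that the outgoing sequence number is a
plain counter here, which hides any packet loss between the source
and the mixer from the receiver. An implementation that wants to
preserve such gaps would need to track the incoming sequence numbers
as well.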
Let us consider the operation of a media switching mixer that
supports a video conference with six participants (A-F), where the
two latest speakers in the conference are shown to each participant.
Thus the mixer has two SSRCs sending video to each peer.
+-A-------------+ +-MIXER--------------------------+ | ||||
| +-PeerC1------| |-PeerC1--------+ | | ||||
| | +-UDP1------| |-UDP1--------+ | | | ||||
| | | +-RTP1----| |-RTP1------+ | | +-----+ | | ||||
| | | | +-Video-| |-Video---+ | | | | | | | ||||
| | | | | AV1|------------>|---------+-+-+-+------->| | | | ||||
| | | | | |<------------|MV1 <----+-+-+-+-BV1----| | | | ||||
| | | | | |<------------|MV2 <----+-+-+-+-EV1----| | | | ||||
| | | | +-------| |---------+ | | | | | | | ||||
| | | +---------| |-----------+ | | | | | | ||||
| | +-----------| |-------------+ | | S | | | ||||
| +-------------| |---------------+ | W | | | ||||
+---------------+ | | I | | | ||||
| | T | | | ||||
+-B-------------+ | | C | | | ||||
| +-PeerC2------| |-PeerC2--------+ | H | | | ||||
| | +-UDP2------| |-UDP2--------+ | | | | | ||||
| | | +-RTP2----| |-RTP2------+ | | | M | | | ||||
| | | | +-Video-| |-Video---+ | | | | A | | | ||||
| | | | | BV1|------------>|---------+-+-+-+------->| T | | | ||||
| | | | | |<------------|MV3 <----+-+-+-+-AV1----| R | | | ||||
| | | | | |<------------|MV4 <----+-+-+-+-EV1----| I | | | ||||
| | | | +-------| |---------+ | | | | X | | | ||||
| | | +---------| |-----------+ | | | | | | ||||
| | +-----------| |-------------+ | | | | | ||||
| +-------------| |---------------+ | | | | ||||
+---------------+ | | | | | ||||
: : : : | ||||
: : : : | ||||
+-F-------------+ | | | | | ||||
| +-PeerC6------| |-PeerC6--------+ | | | | ||||
| | +-UDP6------| |-UDP6--------+ | | | | | ||||
| | | +-RTP6----| |-RTP6------+ | | | | | | ||||
| | | | +-Video-| |-Video---+ | | | | | | | ||||
| | | | | FV1|------------>|---------+-+-+-+------->| | | | ||||
| | | | | |<------------|MV11 <---+-+-+-+-AV1----| | | | ||||
| | | | | |<------------|MV12 <---+-+-+-+-EV1----| | | | ||||
| | | | +-------| |---------+ | | | | | | | ||||
| | | +---------| |-----------+ | | | | | | ||||
| | +-----------| |-------------+ | +-----+ | | ||||
| +-------------| |---------------+ | | ||||
+---------------+ +--------------------------------+ | ||||
Figure 11: Media Switching RTP Mixer | ||||
Similar to the media mixing one, the media switching RTP mixer can
reduce the bit-rate needed towards the different peers by selecting
and switching in a sub-set of the RTP media streams it receives from
the conference participants.
To ensure that a media receiver can correctly decode the RTP media
stream after a switch, it is necessary for state-saving codecs to
start from a default state at the point of switching. Thus one
common tool for video is to request that the encoder creates an intra
picture, something that is not dependent on earlier state. This can
be done using the Full Intra Request RTCP codec control message, as
discussed in Section 5.1.1.
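For illustration, the sketch below assembles a Full Intra Request as
a payload-specific RTCP feedback packet, following the layout defined
for FIR in the RTCP codec control messages specification (RFC 5104):
packet type 206, feedback message type 4, the media source SSRC field
set to zero, and one FCI entry holding the target SSRC and a command
sequence number. The function name build_fir is hypothetical, and a
real sender would place this inside a compound RTCP packet and
protect it with SRTCP.

```python
import struct


def build_fir(sender_ssrc, target_ssrc, command_seq):
    """Build a single-entry RTCP FIR feedback message (20 octets)."""
    first = (2 << 6) | 4          # V=2, P=0, FMT=4 (FIR)
    packet_type = 206             # payload-specific feedback (PSFB)
    length = 4                    # packet length in 32-bit words minus one
    header = struct.pack("!BBHII", first, packet_type, length,
                         sender_ssrc, 0)   # media source SSRC unused: 0
    # FCI entry: target SSRC, 8-bit command sequence number, 24 reserved bits.
    fci = struct.pack("!IB3x", target_ssrc, command_seq & 0xFF)
    return header + fci
```

A switching mixer would typically send such a request towards the
end-point whose stream is about to be switched in, incrementing the
command sequence number for each new request.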
In this type of mixer, too, one could consider fully terminating the
RTP sessions between the different PeerConnections. The same
arguments and considerations as discussed in Appendix A.3.1.1 apply
here.
A.3.3. Media Projecting | ||||
Another method for handling media in the RTP mixer is to project all
potential sources (SSRCs) into a per end-point independent RTP
session. The mixer can then select which of the potential sources
are currently actively transmitting media in that session, even
though the mixer continues to receive media from those end-points in
other RTP sessions. This is similar to the media switching mixer,
but has some important differences in RTP details.
+-A-------------+ +-MIXER--------------------------+ | ||||
| +-PeerC1------| |-PeerC1--------+ | | ||||
| | +-UDP1------| |-UDP1--------+ | | | ||||
| | | +-RTP1----| |-RTP1------+ | | +-----+ | | ||||
| | | | +-Video-| |-Video---+ | | | | | | | ||||
| | | | | AV1|------------>|---------+-+-+-+------->| | | | ||||
| | | | | |<------------|BV1 <----+-+-+-+--------| | | | ||||
| | | | | |<------------|CV1 <----+-+-+-+--------| | | | ||||
| | | | | |<------------|DV1 <----+-+-+-+--------| | | | ||||
| | | | | |<------------|EV1 <----+-+-+-+--------| | | | ||||
| | | | | |<------------|FV1 <----+-+-+-+--------| | | | ||||
| | | | +-------| |---------+ | | | | | | | ||||
| | | +---------| |-----------+ | | | | | | ||||
| | +-----------| |-------------+ | | S | | | ||||
| +-------------| |---------------+ | W | | | ||||
+---------------+ | | I | | | ||||
| | T | | | ||||
+-B-------------+ | | C | | | ||||
| +-PeerC2------| |-PeerC2--------+ | H | | | ||||
| | +-UDP2------| |-UDP2--------+ | | | | | ||||
| | | +-RTP2----| |-RTP2------+ | | | M | | | ||||
| | | | +-Video-| |-Video---+ | | | | A | | | ||||
| | | | | BV1|------------>|---------+-+-+-+------->| T | | | ||||
| | | | | |<------------|AV1 <----+-+-+-+--------| R | | | ||||
| | | | | |<------------|CV1 <----+-+-+-+--------| I | | | ||||
| | | | | | : : : |: : : : : : : : : : :| X | | | ||||
| | | | | |<------------|FV1 <----+-+-+-+--------| | | | ||||
| | | | +-------| |---------+ | | | | | | | ||||
| | | +---------| |-----------+ | | | | | | ||||
| | +-----------| |-------------+ | | | | | ||||
| +-------------| |---------------+ | | | | ||||
+---------------+ | | | | | ||||
: : : : | ||||
: : : : | ||||
+-F-------------+ | | | | | ||||
| +-PeerC6------| |-PeerC6--------+ | | | | ||||
| | +-UDP6------| |-UDP6--------+ | | | | | ||||
| | | +-RTP6----| |-RTP6------+ | | | | | | ||||
| | | | +-Video-| |-Video---+ | | | | | | | ||||
| | | | | FV1|------------>|---------+-+-+-+------->| | | | ||||
| | | | | |<------------|AV1 <----+-+-+-+--------| | | | ||||
| | | | | | : : : |: : : : : : : : : : :| | | | ||||
| | | | | |<------------|EV1 <----+-+-+-+--------| | | | ||||
| | | | +-------| |---------+ | | | | | | | ||||
| | | +---------| |-----------+ | | | | | | ||||
| | +-----------| |-------------+ | +-----+ | | ||||
| +-------------| |---------------+ | | ||||
+---------------+ +--------------------------------+ | ||||
Figure 12: Media Projecting Mixer | ||||
In the six-participant conference depicted in Figure 12, one can see
that end-point A will be aware of five incoming SSRCs, BV1-FV1. If
this mixer intends to have the same behavior as in Appendix A.3.2,
where the mixer provides the end-points with the two latest speaking
end-points, then only two out of these five SSRCs will concurrently
transmit media to A. As the mixer selects which sources in the
different RTP sessions transmit media to the end-points, each RTP
media stream will require some rewriting when being projected from
one session into another. The main thing is that the sequence number
will need to be consecutively incremented based on the packets
actually being transmitted in each RTP session. Thus the RTP
sequence number offset will change each time a source is turned on in
an RTP session.
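The changing sequence number offset can be sketched for one
projected SSRC in one outgoing RTP session. The class name
ProjectedSource is hypothetical. While the source is turned on, the
offset is kept constant, so that packets lost between the source and
the mixer remain visible as gaps to the receiver; this is a design
assumption of the example.

```python
class ProjectedSource:
    """Sequence number rewriting for one projected SSRC in one session."""

    def __init__(self, first_seq=0):
        self.next_out_seq = first_seq
        self.offset = 0
        self.active = False

    def turn_off(self):
        """Stop forwarding; the source may keep sending to the mixer."""
        self.active = False

    def forward(self, in_seq):
        """Return the outgoing sequence number for a forwarded packet."""
        if not self.active:
            # Source turned on: recompute the offset so that the outgoing
            # sequence continues without a gap.
            self.offset = (self.next_out_seq - in_seq) % 2**16
            self.active = True
        out_seq = (in_seq + self.offset) % 2**16
        self.next_out_seq = (out_seq + 1) % 2**16
        return out_seq
```

For example, if packets 500 and 501 are forwarded as 10 and 11, the
source is then turned off while it sends 502-599, and is turned on
again at 600, packet 600 is forwarded as 12.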
As the RTP sessions are independent, the SSRC numbers used can also
be handled independently, thus working around any SSRC collisions by
having remapping tables between the RTP sessions. However, the
related WebRTC MediaStream signalling needs to be changed
correspondingly to ensure consistent WebRTC MediaStream to SSRC
mappings between the different PeerConnections. The same comment
applies here as well: higher functions MUST NOT use SSRCs as
references to RTP media streams.
The mixer will also be responsible for acting on any RTCP codec
control requests coming from an end-point, and for deciding whether
it can act on a request locally or needs to translate the request
into the RTP session that contains the media source. Both end-points
and the mixer will need to implement conference-related codec control
functionalities to provide a good experience. These include Full
Intra Request, to request that the media source provides switching
points between the sources, and Temporary Maximum Media Bit-rate
Request (TMMBR), to enable the mixer to aggregate congestion control
responses towards the media source and have it adjust its bit-rate in
case the limitation is not in the source-to-mixer link.
This version of the mixer also puts different requirements on the
end-point when it comes to decoder instances and handling of the RTP
media streams providing media. As each projected SSRC can provide
media at any time, the end-point either needs to handle having that
many allocated decoder instances, or needs efficient switching of
decoder contexts within a more limited set of actual decoder
instances to cope with the switches. The WebRTC application also
gets more responsibility for updating how the media provided is to be
presented to the user.
A.4. Translator Based | ||||
There is also a variety of translators. The core commonality is that
they do not need to make themselves visible at the RTP level by
having an SSRC themselves. Instead they sit between one or more
end-points and perform translation at some level. This can be media
transcoding, protocol translation, covering for missing functionality
in a legacy end-point, or simply relaying packets between transport
domains or to realize multi-party communication. We go into the
details below.
A.4.1. Transcoder | ||||
A transcoder operates on the media level and is used for two
purposes. The first is to allow two end-points that do not have a
common set of media codecs to communicate, by translating from one
codec to another. The second is to change the bit-rate to a lower
one. For WebRTC end-points communicating with each other, only the
first one is relevant. In certain legacy deployments, a media
transcoder will be necessary to ensure that both codecs and bit-rate
fall within the envelope the legacy end-point supports.
As transcoding requires access to the media, the transcoder has to be | ||||
within the security context and have access to any media encryption | ||||
and integrity keys. On the RTP plane a media transcoder will in | ||||
practice fork the RTP session into two different domains that are | ||||
highly decoupled when it comes to media parameters and reporting, but | ||||
not identities. To maintain signalling bindings to SSRCs a | ||||
transcoder is likely to need to use the SSRC of one end-point to | ||||
represent the transcoded RTP media stream to the other end-point(s). | ||||
The congestion control loop can be terminated in the transcoder, as | ||||
the media bit-rate sent by the transcoder can be adjusted | ||||
independently of the incoming bit-rate. However, to optimize | ||||
performance and resource consumption the transcoder needs to | ||||
consider what signals or bit-rate reductions it needs to send towards | ||||
the source end-point. For example, receiving a 2.5 Mbps video stream | ||||
and then sending out a 250 kbps video stream after transcoding is a | ||||
waste of resources. In most cases a 500 kbps video stream from the | ||||
source at the right resolution is likely to provide the same quality | ||||
after transcoding as the 2.5 Mbps source stream. At the same time, | ||||
increasing the media bit-rate beyond what is needed to represent the | ||||
incoming quality accurately also wastes resources. | ||||
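The bit-rate trade-off above can be illustrated with a minimal
sketch. It assumes a hypothetical "headroom" factor (how much more
input bit-rate than output bit-rate the transcoder wants in order to
preserve quality); neither the factor nor the function name comes
from this memo. The result is the kind of value a transcoder might
signal towards the source, for example in a TMMBR message.

```python
def requested_source_bitrate(outgoing_bps, incoming_bps, headroom=2.0):
    """Bit-rate (bits/s) the transcoder could ask the source to use.

    headroom is a hypothetical tuning factor, not taken from the memo:
    the ratio of input to output bit-rate the transcoder would like.
    """
    if outgoing_bps <= 0 or incoming_bps <= 0:
        raise ValueError("bit-rates must be positive")
    wanted = int(outgoing_bps * headroom)
    # In this sketch the transcoder never asks for more than the
    # source already sends.
    return min(wanted, incoming_bps)


# The example from the text: 2.5 Mbps arrives, 250 kbps is sent on.
print(requested_source_bitrate(250_000, 2_500_000))  # 500000
```

With the default factor of 2.0 the sketch reproduces the 500 kbps
figure used in the example; a real transcoder would pick the factor
based on codec and resolution.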
+-A-------------+ +-Translator------------------+ | ||||
| +-PeerC1------| |-PeerC1--------+ | | ||||
| | +-UDP1------| |-UDP1--------+ | | | ||||
| | | +-RTP1----| |-RTP1------+ | | | | ||||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | ||||
| | | | | AA1|------------>|---------+-+-+-+-|DEC|----+ | | ||||
| | | | | |<------------|BA1 <----+ | | | +---+ | | | ||||
| | | | | | | |\| | | +---+ | | | ||||
| | | | +-------| |---------+ +-+-+-|ENC|<-+ | | | ||||
| | | +---------| |-----------+ | | +---+ | | | | ||||
| | +-----------| |-------------+ | | | | | ||||
| +-------------| |---------------+ | | | | ||||
+---------------+ | | | | | ||||
| | | | | ||||
+-B-------------+ | | | | | ||||
| +-PeerC2------| |-PeerC2--------+ | | | | ||||
| | +-UDP2------| |-UDP2--------+ | | | | | ||||
| | | +-RTP1----| |-RTP1------+ | | | | | | ||||
| | | | +-Audio-| |-Audio---+ | | | +---+ | | | | ||||
| | | | | BA1|------------>|---------+-+-+-+-|DEC|--+ | | | ||||
| | | | | |<------------|AA1 <----+ | | | +---+ | | | ||||
| | | | | | | |\| | | +---+ | | | ||||
| | | | +-------| |---------+ +-+-+-|ENC|<---+ | | ||||
| | | +---------| |-----------+ | | +---+ | | ||||
| | +-----------| |-------------+ | | | ||||
| +-------------| |---------------+ | | ||||
+---------------+ +-----------------------------+ | ||||
Figure 13: Media Transcoder | ||||
Figure 13 exposes some important details. First, the SSRC | ||||
identifiers used by the translator are those of the corresponding | ||||
end-points. Secondly, there is a relation between the RTP sessions | ||||
in the two different PeerConnections, which is represented by both | ||||
parts being identified by the same label, and they need to share | ||||
certain contexts. Certain types of RTCP messages will also need to | ||||
be bridged between the two parts. Certain RTCP feedback messages | ||||
are likely to need to be sourced by the translator in response to | ||||
actions by the translator and its media encoder. | ||||
A.4.2. Gateway / Protocol Translator | ||||
Gateways are used when some required protocol feature is not | ||||
supported by an end-point that wants to participate in a session. | ||||
The RTP translator in Figure 14 takes on the role of ensuring that, | ||||
from the perspective of participant A, participant B appears as a | ||||
fully compliant WebRTC end-point (that is, it is the combination of | ||||
the translator and participant B that looks like a WebRTC end-point). | ||||
+------------+ | ||||
| | | ||||
+---+ | Translator | +---+ | ||||
| A |<---->| to legacy |<---->| B | | ||||
+---+ | end-point | +---+ | ||||
WebRTC | | Legacy | ||||
+------------+ | ||||
Figure 14: Gateway (RTP translator) towards legacy end-point | ||||
For WebRTC there are a number of requirements that could force the | ||||
need for a gateway if a WebRTC end-point is to communicate with a | ||||
legacy end-point, such as support of ICE and of DTLS-SRTP for key | ||||
management. At the RTP level the main functions that might be | ||||
missing in a legacy implementation that otherwise supports RTP are | ||||
RTCP in general, an SRTP implementation, congestion control, and the | ||||
feedback messages needed to make it work. | ||||
+-A-------------+ +-Translator------------------+ | ||||
| +-PeerC1------| |-PeerC1------+ | | ||||
| | +-UDP1------| |-UDP1------+ | | | ||||
| | | +-RTP1----| |-RTP1-----------------------+| | ||||
| | | | +-Audio-| |-Audio---+ || | ||||
| | | | | AA1|------------>|---------+----------------+ || | ||||
| | | | | |<------------|BA1 <----+--------------+ | || | ||||
| | | | | |<---RTCP---->|<--------+----------+ | | || | ||||
| | | | +-------| |---------+ +---+-+ | | || | ||||
| | | +---------| |---------------+| T | | | || | ||||
| | +-----------| |-----------+ | || R | | | || | ||||
| +-------------| |-------------+ || A | | | || | ||||
+---------------+ | || N | | | || | ||||
| || S | | | || | ||||
+-B-(Legacy)----+ | || L | | | || | ||||
| | | || A | | | || | ||||
| +-UDP2------| |-UDP2------+ || T | | | || | ||||
| | +-RTP1----| |-RTP1----------+| E | | | || | ||||
| | | +-Audio-| |-Audio---+ +---+-+ | | || | ||||
| | | | |<---RTCP---->|<--------+----------+ | | || | ||||
| | | | BA1|------------>|---------+--------------+ | || | ||||
| | | | |<------------|AA1 <----+----------------+ || | ||||
| | | +-------| |---------+ || | ||||
| | +---------| |----------------------------+| | ||||
| +-----------| |-----------+ | | ||||
| | | | | ||||
+---------------+ +-----------------------------+ | ||||
Figure 15: RTP/RTCP Protocol Translator | ||||
The legacy gateway can be implemented in several ways, and what it | ||||
needs to change is highly dependent on what functions it needs to | ||||
proxy for the legacy end-point. One possibility is depicted in | ||||
Figure 15, where the RTP media streams are compatible and forwarded | ||||
without changes. However, their RTP header values are captured to | ||||
enable the RTCP translator to create RTCP reception information | ||||
related to the leg between the end-point and the translator. This | ||||
can then be combined with the more basic RTCP reports that the legacy | ||||
end-point (B) provides to give compatible and expected RTCP reporting | ||||
to A, thus enabling at least full congestion control on the path | ||||
between A and the translator. If B has limited possibilities for | ||||
congestion response for the media, then the translator might need | ||||
the capability to perform media transcoding to address cases where | ||||
it otherwise would need to terminate media transmission. | ||||
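As an illustration of how captured RTP header values can yield
reception information, the following sketch tracks sequence numbers
and derives a cumulative loss count in the style of the RTP
specification (expected packets minus received packets). The class
and helper names are hypothetical, and the wrap-around handling is
deliberately simplified compared to a production implementation.

```python
import struct


class ReceptionStats:
    """Loss accounting from the sequence numbers of forwarded packets."""

    def __init__(self):
        self.base_seq = None  # first sequence number seen
        self.max_seq = 0      # highest 16-bit sequence number seen
        self.cycles = 0       # number of sequence number wrap-arounds
        self.received = 0     # packets seen, including duplicates

    def on_rtp_packet(self, packet):
        if len(packet) < 12:
            raise ValueError("shorter than the fixed RTP header")
        # The sequence number is the 16-bit field at byte offset 2.
        seq = struct.unpack_from("!H", packet, 2)[0]
        self.received += 1
        if self.base_seq is None:
            self.base_seq = self.max_seq = seq
            return
        delta = (seq - self.max_seq) & 0xFFFF
        if 0 < delta < 0x8000:        # packet is newer than max_seq
            if seq < self.max_seq:    # 16-bit counter wrapped
                self.cycles += 1
            self.max_seq = seq
        # Otherwise: reordered or duplicate packet, only counted.

    def cumulative_lost(self):
        if self.base_seq is None:
            return 0
        extended_max = (self.cycles << 16) | self.max_seq
        expected = extended_max - self.base_seq + 1
        return expected - self.received  # duplicates can make this negative


def make_rtp_header(seq, ssrc=0x1234, pt=96, timestamp=0):
    """Hypothetical helper: a bare 12-byte RTP header for testing."""
    return struct.pack("!BBHII", 0x80, pt, seq, timestamp, ssrc)


stats = ReceptionStats()
for seq in (65534, 65535, 1, 2):      # sequence number 0 never arrives
    stats.on_rtp_packet(make_rtp_header(seq))
print(stats.cumulative_lost())        # 1
```

A translator could feed every packet it forwards on one leg into
such a tracker and use the result when building the receiver reports
it sends on behalf of the legacy end-point.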
As the translator is generating RTP/RTCP traffic on behalf of B to A, | ||||
it will need to be able to correctly protect the packets that it | ||||
translates or generates. Thus security context information is | ||||
needed in this type of translator if it operates on the RTP/RTCP | ||||
packet content or media. In fact, one of the more likely scenarios | ||||
is that the translator (gateway) will need to have two different | ||||
security contexts, one towards A and one towards B, and for each | ||||
RTP/RTCP packet perform authenticity verification and decryption | ||||
followed by encryption and integrity protection to resolve the | ||||
mismatch in security systems. | ||||
A.4.3. Relay | ||||
There exists a class of translators that operate at the transport | ||||
level below RTP and thus do not affect RTP/RTCP packets directly. | ||||
They come in two distinct flavours: the first is used to bridge | ||||
between two different transport or address domains and functions | ||||
more as a gateway, and the second provides a group communication | ||||
feature as depicted below in Figure 16. | ||||
+---+ +------------+ +---+ | ||||
| A |<---->| |<---->| B | | ||||
+---+ | | +---+ | ||||
| Translator | | ||||
+---+ | | +---+ | ||||
| C |<---->| |<---->| D | | ||||
+---+ +------------+ +---+ | ||||
Figure 16: RTP Translator (Relay) with Only Unicast Paths | ||||
The first kind is straightforward and is likely to exist in the | ||||
WebRTC context when a legacy end-point is compatible with the | ||||
exception of ICE, and thus needs a gateway that terminates ICE and | ||||
then forwards all the RTP/RTCP traffic and key management to the | ||||
end-point, only rewriting the IP/UDP headers to forward the packet | ||||
to the legacy node. | ||||
The second type is useful if one wants a less complex central node or | ||||
a central node that is outside of the security context and thus does | ||||
not have access to the media. This relay takes on the role of | ||||
forwarding the media (RTP and RTCP) packets to the other end-points | ||||
but doesn't perform any RTP or media processing. Such a device | ||||
simply forwards the media from each sender to all of the other | ||||
participants, and is sometimes called a transport-layer translator. | ||||
In Figure 16, participant A will only need to send a media stream | ||||
once to the relay, which will redistribute it by sending a copy of | ||||
the stream to participants B, C, and D. Participant A will still | ||||
receive three RTP streams with the media from B, C, and D if they | ||||
transmit simultaneously. From an RTP perspective this results in an | ||||
RTP session that behaves equivalently to one transported over IP Any | ||||
Source Multicast (ASM). | ||||
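The forwarding rule described above (copy each packet, unmodified,
to every participant except its sender) can be sketched as follows.
The function name and the send callback are hypothetical; a real
relay would write to per-participant UDP flows instead.

```python
def relay_packet(sender, packet, participants, send):
    """Forward one packet, unmodified, to every participant but the sender."""
    for destination in participants:
        if destination != sender:
            send(destination, packet)


# Example with the four participants of Figure 16.
delivered = []
relay_packet("A", b"rtp-packet-bytes", ["A", "B", "C", "D"],
             lambda dest, pkt: delivered.append(dest))
print(delivered)  # ['B', 'C', 'D']
```

Note that the relay never inspects or rewrites the packet bytes,
which is what allows it to sit outside the security context.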
This results in one common RTP session between all participants, | ||||
even though independent PeerConnections are created to the | ||||
translator, as depicted below in Figure 17. | ||||
+-A-------------+ +-RELAY--------------------------+ | ||||
| +-PeerC1------| |-PeerC1--------+ | | ||||
| | +-UDP1------| |-UDP1--------+ | | | ||||
| | | +-RTP1----| |-RTP1-------------------------+ | | ||||
| | | | +-Video-| |-Video---+ | | | ||||
| | | | | AV1|------------>|---------------------------+ | | | ||||
| | | | | |<------------|BV1 <--------------------+ | | | | ||||
| | | | | |<------------|CV1 <------------------+ | | | | | ||||
| | | | +-------| |---------+ | | | | | | ||||
| | | +---------| |-------------------+ ^ ^ V | | | ||||
| | +-----------| |-------------+ | | | | | | | | ||||
| +-------------| |---------------+ | | | | | | | ||||
+---------------+ | | | | | | | | ||||
| | | | | | | | ||||
+-B-------------+ | | | | | | | | ||||
| +-PeerC2------| |-PeerC2--------+ | | | | | | | ||||
| | +-UDP2------| |-UDP2--------+ | | | | | | | | ||||
| | | +-RTP2----| |-RTP1--------------+ | | | | | | ||||
| | | | +-Video-| |-Video---+ | | | | | | ||||
| | | | | BV1|------------>|-----------------------+ | | | | | ||||
| | | | | |<------------|AV1 <----------------------+ | | | ||||
| | | | | |<------------|CV1 <--------------------+ | | | | ||||
| | | | +-------| |---------+ | | | | | | ||||
| | | +---------| |-------------------+ | | | | | | ||||
| | +-----------| |-------------+ | | V ^ V | | | ||||
| +-------------| |---------------+ | | | | | | | ||||
+---------------+ | | | | | | | | ||||
: | | | | | | | ||||
: | | | | | | | ||||
+-C-------------+ | | | | | | | | ||||
| +-PeerC3------| |-PeerC3--------+ | | | | | | | ||||
| | +-UDP3------| |-UDP3--------+ | | | | | | | | ||||
| | | +-RTP3----| |-RTP1--------------+ | | | | | | ||||
| | | | +-Video-| |-Video---+ | | | | | | ||||
| | | | | CV1|------------>|-------------------------+ | | | | ||||
| | | | | |<------------|AV1 <----------------------+ | | | ||||
| | | | | |<------------|BV1 <------------------+ | | | ||||
| | | | +-------| |---------+ | | | ||||
| | | +---------| |------------------------------+ | | ||||
| | +-----------| |-------------+ | | | ||||
| +-------------| |---------------+ | | ||||
+---------------+ +--------------------------------+ | ||||
Figure 17: Transport Multi-party Relay | ||||
As the relay forwards RTP and RTCP packets between the UDP flows as | ||||
indicated by the arrows for the media flow, a given WebRTC end-point, | ||||
such as A, will see the remote sources BV1 and CV1. There will also | ||||
be two different network paths, one between A and B and one between A | ||||
and C. As a result, when determining congestion state, client A has | ||||
to be capable of handling that multiple destinations might exist on | ||||
the far side of a PeerConnection and that these paths have to be | ||||
treated differently. It also results in a requirement to combine | ||||
the different congestion states into a decision to transmit a | ||||
particular RTP media stream that is suitable for all participants. | ||||
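One simple way to combine per-path congestion states is sketched
below. It assumes that the sender keeps a separate bit-rate estimate
for each receiver reached through the relay and that sending at the
lowest estimate is acceptable; this memo does not mandate that
policy, and the names are hypothetical.

```python
def common_send_rate(path_estimates_bps):
    """Pick one send rate (bits/s) that every receiver path can sustain.

    path_estimates_bps maps a receiver name to the bit-rate that the
    sender's congestion controller currently estimates for that path.
    Taking the minimum is a conservative assumption of this sketch.
    """
    if not path_estimates_bps:
        raise ValueError("at least one receiver path is needed")
    return min(path_estimates_bps.values())


# A sends through the relay to B and C over two different paths.
print(common_send_rate({"B": 1_200_000, "C": 400_000}))  # 400000
```

The cost of this policy is visible in the example: B receives a
lower quality than its own path would allow.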
It is also important to note that the relay cannot perform selective | ||||
relaying of some sources and not others. The reason is that the RTCP | ||||
reporting in that case becomes inconsistent and, without explicit | ||||
information that a source is being blocked, has to be interpreted as | ||||
severe congestion. | ||||
In this usage it is also necessary that the session management has | ||||
configured a common RTP configuration, including RTP payload | ||||
formats: when A sends a packet with pt=97 it will arrive at both B | ||||
and C carrying pt=97 and having the same packetization and encoding, | ||||
as no entity will have manipulated the packet. | ||||
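The requirement for a common configuration can be checked with a
sketch like the following. It assumes that session management can
see each participant's payload type mapping as a dictionary; the
mappings shown are illustrative values, not taken from this memo.

```python
def has_common_payload_config(configs):
    """True if every participant uses the identical payload type mapping.

    configs maps a participant name to a dict that maps RTP payload
    type numbers to a payload format description.
    """
    mappings = iter(configs.values())
    first = next(mappings, None)
    return first is not None and all(m == first for m in mappings)


# Illustrative values: everyone must agree on what pt=97 means.
ok = {"A": {97: "VP8/90000"}, "B": {97: "VP8/90000"}, "C": {97: "VP8/90000"}}
bad = {"A": {97: "VP8/90000"}, "B": {97: "H264/90000"}}
print(has_common_payload_config(ok), has_common_payload_config(bad))  # True False
```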
When it comes to security there exist some additional requirements to | ||||
ensure that the property that the relay can't read the media traffic | ||||
is enforced. First of all, the key to be used has to be agreed in | ||||
such a way that the relay doesn't get it, e.g. there is no DTLS-SRTP | ||||
handshake with the relay; some other method needs to be used | ||||
instead. Secondly, the keying structure has to be capable of | ||||
handling multiple end-points in the same RTP session. | ||||
The second problem can basically be solved in two ways. The first | ||||
alternative is a common master key from which all end-points derive | ||||
their per-source keys for SRTP. The second alternative, which might | ||||
be more practical, is that each end-point has its own key that is | ||||
used to protect all RTP/RTCP packets it sends. Each participant's | ||||
key is then distributed to the other participants. This second | ||||
method could be implemented using DTLS-SRTP to a special key server | ||||
and then using Encrypted Key Transport [I-D.ietf-avt-srtp-ekt] to | ||||
distribute the key actually used to the other participants in the | ||||
RTP session, as shown in Figure 18. The first one could be achieved | ||||
using MIKEY messages in SDP. | ||||
+---+ +---+ | ||||
| | +-----------+ | | | ||||
| A |<------->| DTLS-SRTP |<------->| C | | ||||
| |<-- -->| HOST |<-- -->| | | ||||
+---+ \ / +-----------+ \ / +---+ | ||||
X X | ||||
+---+ / \ +-----------+ / \ +---+ | ||||
| |<-- -->| RTP |<-- -->| | | ||||
| B |<------->| RELAY |<------->| D | | ||||
| | +-----------+ | | | ||||
+---+ +---+ | ||||
Figure 18: DTLS-SRTP host and RTP Relay Separated | ||||
The relay can still verify that a given SSRC isn't used or spoofed by | ||||
another participant within the multi-party session by binding SSRCs | ||||
on their first usage to a given source address and port pair. | ||||
Packets carrying that source SSRC from other addresses can be | ||||
suppressed to prevent spoofing. This is possible as long as SRTP is | ||||
used, as it leaves the SSRC of the packet originator in RTP and RTCP | ||||
packets in the clear. If such a packet-level method for enforcing | ||||
source authentication within the group is not sufficient, then there | ||||
exist cryptographic methods such as TESLA [RFC4383] that could be | ||||
used for true source authentication. | ||||
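The SSRC binding described above can be sketched as a small table
keyed by SSRC. The class name is hypothetical, the sketch handles
RTP packets only (in RTCP packets the sender SSRC sits at a
different offset), and it does not handle legitimate address
changes, which a real relay would have to consider.

```python
import struct


class SsrcBindingTable:
    """Bind each SSRC to the address and port that first used it."""

    def __init__(self):
        self._bindings = {}  # SSRC -> (ip, port) of first use

    def accept(self, rtp_packet, source_addr):
        """Return True if the packet may be relayed, False to suppress it."""
        if len(rtp_packet) < 12:
            return False  # shorter than the fixed RTP header
        # The SSRC is the 32-bit field at byte offset 8 of the RTP
        # header; SRTP leaves it in the clear.
        ssrc = struct.unpack_from("!I", rtp_packet, 8)[0]
        bound_addr = self._bindings.setdefault(ssrc, source_addr)
        return bound_addr == source_addr


table = SsrcBindingTable()
packet = struct.pack("!BBHII", 0x80, 97, 1, 0, 0xCAFE)  # SSRC 0xCAFE
print(table.accept(packet, ("192.0.2.1", 5004)))   # True, first use binds
print(table.accept(packet, ("192.0.2.1", 5004)))   # True, same source
print(table.accept(packet, ("192.0.2.99", 6000)))  # False, other source
```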
A.5. End-point Forwarding | ||||
A WebRTC end-point (B in Figure 19) will receive a WebRTC | ||||
MediaStream (set of SSRCs) over a PeerConnection (from A). For the | ||||
moment it is not decided whether the end-point is allowed to send | ||||
that WebRTC MediaStream in turn over another PeerConnection to C. | ||||
This section discusses the RTP and end-point implications of allowing | ||||
such functionality, which at the API level is extremely simple to | ||||
perform. | ||||
+---+ +---+ +---+ | ||||
| A |--->| B |--->| C | | ||||
+---+ +---+ +---+ | ||||
Figure 19: MediaStream Forwarding | ||||
There exist two main approaches to how B forwards the media from A to | ||||
C. The first one is to simply relay the RTP media stream. The | ||||
second one is for B to act as a transcoder. Let's consider both | ||||
approaches. | ||||
A relay approach will require the WebRTC end-points to have the same | ||||
capabilities as discussed in Relay (Appendix A.4.3). Thus A will | ||||
see an RTP session that is extended beyond the PeerConnection and | ||||
see two different receiving end-points with different path | ||||
characteristics (B and C). Thus A's congestion control needs to be | ||||
capable of handling this. The security solution can either support | ||||
a mechanism that allows A to inform C about the key A is using, | ||||
despite B and C having agreed on another set of keys, or | ||||
alternatively B will decrypt and then re-encrypt using a new key. | ||||
The relay-based approach has the advantage that B does not need to | ||||
transcode the media, thus both maintaining the quality of the | ||||
encoding and reducing B's complexity requirements. If the right | ||||
security solutions are supported, then C will also be able to verify | ||||
the authenticity of the media coming from A. The downside is that A | ||||
is forced to take both B and C into consideration when delivering | ||||
content. | ||||
The media transcoder approach is similar to having B act as a Mixer | ||||
terminating the RTP session, combined with the transcoder as | ||||
discussed in Appendix A.4.1. A will only see B as a receiver of its | ||||
media. B will be responsible for producing an RTP media stream | ||||
suitable for the B to C PeerConnection. This might require media | ||||
transcoding for congestion control purposes to produce a suitable | ||||
bit-rate, thus losing media quality in the transcoding and forcing B | ||||
to spend resources on the transcoding. The media transcoding does | ||||
result in a separation of the two different legs, removing almost all | ||||
dependencies. B could choose to implement logic to optimize its | ||||
media transcoding operation, for example by requesting media | ||||
properties that are also suitable for C, thus trying to avoid having | ||||
to transcode the content and instead only forward the media payloads | ||||
between the two sides. For that optimization to be practical, | ||||
WebRTC end-points have to support sufficiently good tools for codec | ||||
control. | ||||
A.6. Simulcast | ||||
This section discusses simulcast in the sense of providing a node, | ||||
for example a stream-switching Mixer, with multiple differently | ||||
encoded versions of the same media source. In the WebRTC context | ||||
that appears to be most easily accomplished by establishing multiple | ||||
PeerConnections, all being fed the same set of WebRTC MediaStreams. | ||||
Each PeerConnection is then configured to deliver a particular media | ||||
quality and thus media bit-rate. This will work well as long as the | ||||
end-point implements media encoding according to Figure 7. Then each | ||||
PeerConnection will receive an independently encoded version and the | ||||
codec parameters can be agreed specifically in the context of that | ||||
PeerConnection. | ||||
For simulcast to work one needs to prevent the end-point from | ||||
delivering content encoded as depicted in Figure 8. If a single | ||||
encoder instance feeds multiple PeerConnections, the intention of | ||||
performing simulcast will fail. | ||||
Thus it needs to be considered whether to explicitly signal which of | ||||
the two implementation strategies is desired and which will be used, | ||||
at least to make the application, and possibly the central node | ||||
interested in receiving simulcast of an end-point's RTP media | ||||
streams, aware of whether it will function or not. | ||||
Authors' Addresses | Authors' Addresses | |||
Colin Perkins | Colin Perkins | |||
University of Glasgow | University of Glasgow | |||
School of Computing Science | School of Computing Science | |||
Glasgow G12 8QQ | Glasgow G12 8QQ | |||
United Kingdom | United Kingdom | |||
Email: csp@csperkins.org | Email: csp@csperkins.org | |||
End of changes. 63 change blocks. | ||||
1417 lines changed or deleted | 447 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |