draft-ietf-rtcweb-audio-03.txt | draft-ietf-rtcweb-audio-04.txt | |||
---|---|---|---|---|
Network Working Group JM. Valin | Network Working Group JM. Valin | |||
Internet-Draft Mozilla | Internet-Draft Mozilla | |||
Intended status: Standards Track C. Bran | Intended status: Standards Track C. Bran | |||
Expires: April 18, 2014 Plantronics | Expires: July 31, 2014 Plantronics | |||
October 15, 2013 | January 27, 2014 | |||
WebRTC Audio Codec and Processing Requirements | WebRTC Audio Codec and Processing Requirements | |||
draft-ietf-rtcweb-audio-03 | draft-ietf-rtcweb-audio-04 | |||
Abstract | Abstract | |||
This document outlines the audio codec and processing requirements | This document outlines the audio codec and processing requirements | |||
for WebRTC client application and endpoint devices. | for WebRTC client application and endpoint devices. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
skipping to change at page 1, line 32 | skipping to change at page 1, line 32 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 18, 2014. | This Internet-Draft will expire on July 31, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2014 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 2 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
3. Codec Requirements . . . . . . . . . . . . . . . . . . . . . 2 | 3. Codec Requirements . . . . . . . . . . . . . . . . . . . . . 2 | |||
4. Audio Level . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 4. Audio Level . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
5. Acoustic Echo Cancellation (AEC) . . . . . . . . . . . . . . 4 | 5. Acoustic Echo Cancellation (AEC) . . . . . . . . . . . . . . 4 | |||
6. Legacy VoIP Interoperability . . . . . . . . . . . . . . . . 4 | 6. Legacy VoIP Interoperability . . . . . . . . . . . . . . . . 4 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 4 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 5 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 5 | |||
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5 | 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5 | |||
10. Normative References . . . . . . . . . . . . . . . . . . . . 5 | 10. Normative References . . . . . . . . . . . . . . . . . . . . 5 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 5 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
1. Introduction | 1. Introduction | |||
An integral part of the success and adoption of the Web Real Time | An integral part of the success and adoption of the Web Real Time | |||
Communications (WebRTC) will be the voice and video interoperability | Communications (WebRTC) will be the voice and video interoperability | |||
between WebRTC applications. This specification will outline the | between WebRTC applications. This specification will outline the | |||
audio processing and codec requirements for WebRTC client | audio processing and codec requirements for WebRTC client | |||
implementations. | implementations. | |||
2. Terminology | 2. Terminology | |||
skipping to change at page 2, line 44 | skipping to change at page 2, line 44 | |||
other suitable audio codecs are available for the browser to use, it | other suitable audio codecs are available for the browser to use, it | |||
is RECOMMENDED that they also be included in the offer in order | is RECOMMENDED that they also be included in the offer in order | |||
to maximize the possibility to establish the session without the need | to maximize the possibility to establish the session without the need | |||
for audio transcoding. | for audio transcoding. | |||
WebRTC clients are REQUIRED to implement the following audio codecs. | WebRTC clients are REQUIRED to implement the following audio codecs. | |||
o Opus [RFC6716], with the payload format specified in [Opus-RTP] | o Opus [RFC6716], with the payload format specified in [Opus-RTP] | |||
and any ptime value up to 120 ms | and any ptime value up to 120 ms | |||
o G.711 PCMA and PCMU with one channel, a rate of 8000 Hz and a | o G.711 PCMA and PCMU with one channel, a rate of 8000 Hz and any | |||
ptime of 20 - see section 4.5.14 of [RFC3551] | ptime value up to 120 ms - see section 4.5.14 of [RFC3551] | |||
o Telephone Event - [RFC4733] | o The audio/telephone-event media format as specified in [RFC4733]. | |||
WebRTC clients are REQUIRED to be able to generate and consume the | ||||
following events: | ||||
+------------+--------------------------------+-----------+ | ||||
|Event Code | Event Name | Reference | | ||||
+------------+--------------------------------+-----------+ | ||||
| 0 | DTMF digit "0" | RFC4733 | | ||||
| 1 | DTMF digit "1" | RFC4733 | | ||||
| 2 | DTMF digit "2" | RFC4733 | | ||||
| 3 | DTMF digit "3" | RFC4733 | | ||||
| 4 | DTMF digit "4" | RFC4733 | | ||||
| 5 | DTMF digit "5" | RFC4733 | | ||||
| 6 | DTMF digit "6" | RFC4733 | | ||||
| 7 | DTMF digit "7" | RFC4733 | | ||||
| 8 | DTMF digit "8" | RFC4733 | | ||||
| 9 | DTMF digit "9" | RFC4733 | | ||||
| 10 | DTMF digit "*" | RFC4733 | | ||||
| 11 | DTMF digit "#" | RFC4733 | | ||||
+------------+--------------------------------+-----------+ | ||||
For all cases where the client is able to process audio at a sampling | For all cases where the client is able to process audio at a sampling | |||
rate higher than 8 kHz, it is RECOMMENDED that Opus be offered before | rate higher than 8 kHz, it is RECOMMENDED that Opus be offered before | |||
PCMA/PCMU. For Opus, all modes MUST be supported on the decoder | PCMA/PCMU. For Opus, all modes MUST be supported on the decoder | |||
side. The choice of encoder-side modes is left to the implementer. | side. The choice of encoder-side modes is left to the implementer. | |||
Clients MAY use the offer/answer mechanism to signal a preference for | Clients MAY use the offer/answer mechanism to signal a preference for | |||
a particular mode or ptime. | a particular mode or ptime. | |||
4. Audio Level | 4. Audio Level | |||
skipping to change at page 3, line 25 | skipping to change at page 3, line 45 | |||
and G.115, which recommend an active audio level of -19 dBm0. | and G.115, which recommend an active audio level of -19 dBm0. | |||
However, unlike G.169 and G.115, the audio for WebRTC is not | However, unlike G.169 and G.115, the audio for WebRTC is not | |||
constrained to have a passband specified by G.712 and can in fact be | constrained to have a passband specified by G.712 and can in fact be | |||
sampled at any sampling rate from 8 kHz to 48 kHz and up. For this | sampled at any sampling rate from 8 kHz to 48 kHz and up. For this | |||
reason, the level SHOULD be normalized by only considering | reason, the level SHOULD be normalized by only considering | |||
frequencies above 300 Hz, regardless of the sampling rate used. The | frequencies above 300 Hz, regardless of the sampling rate used. The | |||
level SHOULD also be adapted to avoid clipping, either by lowering | level SHOULD also be adapted to avoid clipping, either by lowering | |||
the gain to a level below -19 dBm0, or through the use of a | the gain to a level below -19 dBm0, or through the use of a | |||
compressor. | compressor. | |||
AUTHORS' NOTE: The idea of using the same level as what the ITU-T | ||||
recommends is that it should improve inter-operability while at the | ||||
same time maintaining sufficient dynamic range and reducing the risk | ||||
of clipping. The main drawbacks are that the resulting level is | ||||
about 12 dB lower than typical "commercial music" levels and it | ||||
leaves room for ill-behaved clients to be much louder than a normal | ||||
client. While using music-type levels is not really an option (it | ||||
would require using the same compressor-limiters that studios use), | ||||
it would be possible to have a level slightly higher (e.g. 3 dB) | ||||
than what is recommended above without causing interoperability | ||||
problems. | ||||
Assuming 16-bit PCM with a value of +/-32767, -19 dBm0 corresponds to | Assuming 16-bit PCM with a value of +/-32767, -19 dBm0 corresponds to | |||
a root mean square (RMS) level of 2600. Only active speech should be | a root mean square (RMS) level of 2600. Only active speech should be | |||
considered in the RMS calculation. If the client has control over | considered in the RMS calculation. If the client has control over | |||
the entire audio capture path, as is typically the case for a regular | the entire audio capture path, as is typically the case for a regular | |||
phone, then it is RECOMMENDED that the gain be adjusted in such a way | phone, then it is RECOMMENDED that the gain be adjusted in such a way | |||
that active speech have a level of 2600 (-19 dBm0) for an average | that active speech have a level of 2600 (-19 dBm0) for an average | |||
speaker. If the client does not have control over the entire audio | speaker. If the client does not have control over the entire audio | |||
capture, as is typically the case for a software client, then the | capture, as is typically the case for a software client, then the | |||
client SHOULD use automatic gain control (AGC) to dynamically adjust | client SHOULD use automatic gain control (AGC) to dynamically adjust | |||
the level to 2600 (-19 dBm0) +/- 6 dB. For music or desktop sharing | the level to 2600 (-19 dBm0) +/- 6 dB. For music or desktop sharing | |||
skipping to change at page 4, line 22 | skipping to change at page 4, line 30 | |||
It is plausible that the dominant near to mid-term WebRTC usage model | It is plausible that the dominant near to mid-term WebRTC usage model | |||
will be people using the interactive audio and video capabilities to | will be people using the interactive audio and video capabilities to | |||
communicate with each other via web browsers running on a notebook | communicate with each other via web browsers running on a notebook | |||
computer that has built-in microphone and speakers. The notebook-as- | computer that has built-in microphone and speakers. The notebook-as- | |||
communication-device paradigm presents challenging echo cancellation | communication-device paradigm presents challenging echo cancellation | |||
problems, the specific remedy of which will not be mandated here. | problems, the specific remedy of which will not be mandated here. | |||
However, while no specific algorithm or standard will be required by | However, while no specific algorithm or standard will be required by | |||
WebRTC compatible clients, echo cancellation will improve the user | WebRTC compatible clients, echo cancellation will improve the user | |||
experience and should be implemented by the endpoint device. | experience and should be implemented by the endpoint device. | |||
WebRTC clients SHOULD include an AEC and if that is not possible, the | WebRTC clients SHOULD include an AEC or some other form of echo | |||
clients SHOULD ensure that the speaker-to-microphone gain is below | control and if that is not possible, the clients SHOULD ensure that | |||
unity at all frequencies to avoid instability when none of the client | the speaker-to-microphone gain is below unity at all frequencies to | |||
has echo cancellation. For clients that do not control the audio | avoid instability when none of the client has echo control. For | |||
capture and playback devices directly, it is RECOMMENDED to support | clients that do not control the audio capture and playback hardware, | |||
echo cancellation between devices running at slight different | it is RECOMMENDED to support echo cancellation between devices | |||
sampling rates, such as when a webcam is used for microphone. | running at slightly different sampling rates, such as when a webcam | |||
is used for microphone. | ||||
The client SHOULD allow either the entire AEC or the non-linear | Clients SHOULD allow the entire AEC and/or the non-linear processing | |||
processing (NLP) to be turned off for applications, such as music, | (NLP) to be turned off for applications, such as music, that do not | |||
that do not behave well with the spectral attenuation methods | behave well with the spectral attenuation methods typically used in | |||
typically used in NLPs. It SHOULD have the ability to detect the | NLPs. Similarly, clients SHOULD have the ability to detect the | |||
presence of a headset and disable echo cancellation. | presence of a headset and disable echo cancellation. | |||
For some applications where the remote client may not have an echo | For some applications where the remote client may not have an echo | |||
canceller, the local client MAY include a far-end echo canceller, but | canceller, the local client MAY include a far-end echo canceller, but | |||
if that is the case, it SHOULD be disabled by default. | if that is the case, it SHOULD be disabled by default. | |||
6. Legacy VoIP Interoperability | 6. Legacy VoIP Interoperability | |||
The codec requirements above will ensure, at a minimum, voice | The codec requirements above will ensure, at a minimum, voice | |||
interoperability capabilities between WebRTC client applications and | interoperability capabilities between WebRTC client applications and | |||
legacy phone systems. | legacy phone systems. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document makes no request of IANA. | This document makes no request of IANA. | |||
Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section may be removed on publication as an | |||
RFC. | RFC. | |||
8. Security Considerations | 8. Security Considerations | |||
The codec requirements have no additional security considerations | Implementers should consider whether the use of VBR is appropriate | |||
other than those captured in | for their application based on [RFC6562]. Encryption and | |||
[I-D.ekr-security-considerations-for-rtc-web]. | authentication issues are beyond the scope of this document. | |||
9. Acknowledgements | 9. Acknowledgements | |||
This draft incorporates ideas and text from various other drafts. In | This draft incorporates ideas and text from various other drafts. In | |||
particular, we would like to acknowledge, and say thanks for, work | particular, we would like to acknowledge, and say thanks for, work | |||
we incorporated from Harald Alvestrand and Cullen Jennings. | we incorporated from Harald Alvestrand and Cullen Jennings. | |||
10. Normative References | 10. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
skipping to change at page 5, line 33 | skipping to change at page 5, line 43 | |||
Video Conferences with Minimal Control", STD 65, RFC 3551, | Video Conferences with Minimal Control", STD 65, RFC 3551, | |||
July 2003. | July 2003. | |||
[RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF | [RFC4733] Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF | |||
Digits, Telephony Tones, and Telephony Signals", RFC 4733, | Digits, Telephony Tones, and Telephony Signals", RFC 4733, | |||
December 2006. | December 2006. | |||
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | |||
Opus Audio Codec", RFC 6716, September 2012. | Opus Audio Codec", RFC 6716, September 2012. | |||
[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of | ||||
Variable Bit Rate Audio with Secure RTP", RFC 6562, March | ||||
2012. | ||||
[Opus-RTP] | [Opus-RTP] | |||
Spittka, J., Vos, K., and JM. Valin, "RTP Payload Format | Spittka, J., Vos, K., and JM. Valin, "RTP Payload Format | |||
for Opus Codec", August 2013. | for Opus Codec", August 2013. | |||
[I-D.ekr-security-considerations-for-rtc-web] | ||||
Rescorla, E.K., "Security Considerations for RTC-Web", May | ||||
2011. | ||||
Authors' Addresses | Authors' Addresses | |||
Jean-Marc Valin | Jean-Marc Valin | |||
Mozilla | Mozilla | |||
650 Castro Street | 650 Castro Street | |||
Mountain View, CA 94041 | Mountain View, CA 94041 | |||
USA | USA | |||
Email: jmvalin@jmvalin.ca | Email: jmvalin@jmvalin.ca | |||
Cary Bran | Cary Bran | |||
Plantronics | Plantronics | |||
345 Encinial Street | 345 Encinial Street | |||
Santa Cruz, CA 95060 | Santa Cruz, CA 95060 | |||
USA | USA | |||
Phone: +1 206 661-2398 | Phone: +1 206 661-2398 | |||
Email: cary.bran@plantronics.com | Email: cary.bran@plantronics.com | |||
End of changes. 16 change blocks. | ||||
41 lines changed or deleted | 49 lines changed or added | |||
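
The level figures quoted in Section 4 (-19 dBm0, an RMS of 2600 for 16-bit PCM, and the +/- 6 dB AGC window) can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of either draft. It assumes that 0 dBm0 corresponds to the RMS of a full-scale sine wave (peak 32767), which is the convention that reproduces the 2600 figure; other dBm0-to-digital mappings exist and would give different numbers. The function names are hypothetical, and the sketch omits the restriction to active speech and to frequencies above 300 Hz that the text calls for.

```python
import math

FULL_SCALE = 32767  # peak value of 16-bit PCM, as stated in Section 4


def dbm0_to_rms(level_dbm0, full_scale=FULL_SCALE):
    """Convert a dBm0 level to a linear RMS value.

    Assumption (not stated in the draft): 0 dBm0 is taken to be the RMS
    of a full-scale sine wave, i.e. full_scale / sqrt(2).
    """
    reference_rms = full_scale / math.sqrt(2)
    return reference_rms * 10 ** (level_dbm0 / 20)


def rms(samples):
    """Root mean square of a non-empty sequence of PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_to_target_db(samples, target_dbm0=-19.0):
    """Gain in dB that would bring the samples to the target level.

    The samples must be non-empty and non-silent; a real client would
    measure only active speech, which this sketch does not attempt.
    """
    return 20 * math.log10(dbm0_to_rms(target_dbm0) / rms(samples))


def within_agc_window(samples, target_dbm0=-19.0, tolerance_db=6.0):
    """True if the measured level is within +/- tolerance_db of the target."""
    return abs(gain_to_target_db(samples, target_dbm0)) <= tolerance_db


if __name__ == "__main__":
    # Target level and the edges of the +/- 6 dB window, as linear RMS values.
    print("target RMS:", dbm0_to_rms(-19.0))
    print("window RMS:", dbm0_to_rms(-25.0), "to", dbm0_to_rms(-13.0))
```

Under that assumption, `dbm0_to_rms(-19.0)` comes out at approximately 2600, matching the draft, and `gain_to_target_db` gives the correction an AGC would need to apply to a measured block of samples.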