draft-ietf-rtcweb-audio-07.txt | draft-ietf-rtcweb-audio-08.txt | |||
---|---|---|---|---|
Network Working Group JM. Valin | Network Working Group JM. Valin | |||
Internet-Draft Mozilla | Internet-Draft Mozilla | |||
Intended status: Standards Track C. Bran | Intended status: Standards Track C. Bran | |||
Expires: April 27, 2015 Plantronics | Expires: November 1, 2015 Plantronics | |||
October 24, 2014 | April 30, 2015 | |||
WebRTC Audio Codec and Processing Requirements | WebRTC Audio Codec and Processing Requirements | |||
draft-ietf-rtcweb-audio-07 | draft-ietf-rtcweb-audio-08 | |||
Abstract | Abstract | |||
This document outlines the audio codec and processing requirements | This document outlines the audio codec and processing requirements | |||
for WebRTC client application and endpoint devices. | for WebRTC endpoints. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 27, 2015. | This Internet-Draft will expire on November 1, 2015. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2014 IETF Trust and the persons identified as the | Copyright (c) 2015 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
skipping to change at page 2, line 17 | skipping to change at page 2, line 17 | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 2 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
3. Codec Requirements . . . . . . . . . . . . . . . . . . . . . 2 | 3. Codec Requirements . . . . . . . . . . . . . . . . . . . . . 2 | |||
4. Audio Level . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 4. Audio Level . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
5. Acoustic Echo Cancellation (AEC) . . . . . . . . . . . . . . 4 | 5. Acoustic Echo Cancellation (AEC) . . . . . . . . . . . . . . 4 | |||
6. Legacy VoIP Interoperability . . . . . . . . . . . . . . . . 5 | 6. Legacy VoIP Interoperability . . . . . . . . . . . . . . . . 5 | |||
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 | 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 5 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 5 | |||
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5 | 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . 5 | 10.1. Normative References . . . . . . . . . . . . . . . . . . 6 | |||
10.2. Informative References . . . . . . . . . . . . . . . . . 6 | 10.2. Informative References . . . . . . . . . . . . . . . . . 6 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 6 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
1. Introduction | 1. Introduction | |||
An integral part of the success and adoption of the Web Real Time | An integral part of the success and adoption of the Web Real Time | |||
Communications (WebRTC) will be the voice and video interoperability | Communications (WebRTC) will be the voice and video interoperability | |||
between WebRTC applications. This specification will outline the | between WebRTC applications. This specification will outline the | |||
audio processing and codec requirements for WebRTC client | audio processing and codec requirements for WebRTC endpoint | |||
implementations. | implementations. | |||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | "OPTIONAL" in this document are to be interpreted as described in RFC | |||
2119 [RFC2119]. | ||||
3. Codec Requirements | 3. Codec Requirements | |||
To ensure a baseline level of interoperability between WebRTC | To ensure a baseline level of interoperability between WebRTC | |||
clients, a minimum set of required codecs is specified below. If | endpoints, a minimum set of required codecs is specified below. If | |||
other suitable audio codecs are available for the browser to use, it | other suitable audio codecs are available for the browser to use, it | |||
is RECOMMENDED that they also be included in the offer in order | is RECOMMENDED that they also be included in the offer in order | |||
to maximize the possibility to establish the session without the need | to maximize the possibility to establish the session without the need | |||
for audio transcoding. | for audio transcoding. | |||
WebRTC clients are REQUIRED to implement the following audio codecs: | WebRTC endpoints are REQUIRED to implement the following audio | |||
codecs: | ||||
o Opus [RFC6716] with the payload format specified in | o Opus [RFC6716] with the payload format specified in | |||
[I-D.ietf-payload-rtp-opus]. | [I-D.ietf-payload-rtp-opus]. | |||
o G.711 PCMA and PCMU with the payload format specified in section | o G.711 PCMA and PCMU with the payload format specified in section | |||
4.5.14 of [RFC3551]. | 4.5.14 of [RFC3551]. | |||
o [RFC3389] comfort noise (CN). Receivers MUST support RFC3389 CN | o [RFC3389] comfort noise (CN). Receivers MUST support RFC3389 CN | |||
for streams encoded with G.711 or any other supported codec that | for streams encoded with G.711 or any other supported codec that | |||
does not provide its own CN. Since Opus provides its own CN | does not provide its own CN. Since Opus provides its own CN | |||
mechanism, the use of RFC3389 CN with Opus is NOT RECOMMENDED. | mechanism, the use of RFC3389 CN with Opus is NOT RECOMMENDED. | |||
Use of DTX/CN by senders is OPTIONAL. | Use of DTX/CN by senders is OPTIONAL. | |||
o The audio/telephone-event media format as specified in [RFC4733]. | o The audio/telephone-event media format as specified in [RFC4733]. | |||
WebRTC clients are REQUIRED to be able to generate and consume the | WebRTC endpoints are REQUIRED to be able to generate and consume | |||
following events: | the following events: | |||
+------------+--------------------------------+-----------+ | +------------+--------------------------------+-----------+ | |||
|Event Code | Event Name | Reference | | |Event Code | Event Name | Reference | | |||
+------------+--------------------------------+-----------+ | +------------+--------------------------------+-----------+ | |||
| 0 | DTMF digit "0" | RFC4733 | | | 0 | DTMF digit "0" | RFC4733 | | |||
| 1 | DTMF digit "1" | RFC4733 | | | 1 | DTMF digit "1" | RFC4733 | | |||
| 2 | DTMF digit "2" | RFC4733 | | | 2 | DTMF digit "2" | RFC4733 | | |||
| 3 | DTMF digit "3" | RFC4733 | | | 3 | DTMF digit "3" | RFC4733 | | |||
| 4 | DTMF digit "4" | RFC4733 | | | 4 | DTMF digit "4" | RFC4733 | | |||
| 5 | DTMF digit "5" | RFC4733 | | | 5 | DTMF digit "5" | RFC4733 | | |||
| 6 | DTMF digit "6" | RFC4733 | | | 6 | DTMF digit "6" | RFC4733 | | |||
| 7 | DTMF digit "7" | RFC4733 | | | 7 | DTMF digit "7" | RFC4733 | | |||
| 8 | DTMF digit "8" | RFC4733 | | | 8 | DTMF digit "8" | RFC4733 | | |||
| 9 | DTMF digit "9" | RFC4733 | | | 9 | DTMF digit "9" | RFC4733 | | |||
| 10 | DTMF digit "*" | RFC4733 | | | 10 | DTMF digit "*" | RFC4733 | | |||
| 11 | DTMF digit "#" | RFC4733 | | | 11 | DTMF digit "#" | RFC4733 | | |||
+------------+--------------------------------+-----------+ | +------------+--------------------------------+-----------+ | |||
For all cases where the client is able to process audio at a sampling | For all cases where the endpoint is able to process audio at a | |||
rate higher than 8 kHz, it is RECOMMENDED that Opus be offered before | sampling rate higher than 8 kHz, it is RECOMMENDED that Opus be | |||
PCMA/PCMU. For Opus, all modes MUST be supported on the decoder | offered before PCMA/PCMU. For Opus, all modes MUST be supported on | |||
side. The choice of encoder-side modes is left to the implementer. | the decoder side. The choice of encoder-side modes is left to the | |||
Clients MAY use the offer/answer mechanism to signal a preference for | implementer. Endpoints MAY use the offer/answer mechanism to signal | |||
a particular mode or ptime. | a preference for a particular mode or ptime. | |||
For additional information on implementing codecs other than the | For additional information on implementing codecs other than the | |||
mandatory-to-implement codecs listed above, refer to | mandatory-to-implement codecs listed above, refer to | |||
[I-D.ietf-rtcweb-audio-codecs-for-interop]. | [I-D.ietf-rtcweb-audio-codecs-for-interop]. | |||
4. Audio Level | 4. Audio Level | |||
It is desirable to standardize the "on the wire" audio level for | It is desirable to standardize the "on the wire" audio level for | |||
speech transmission to avoid users having to manually adjust the | speech transmission to avoid users having to manually adjust the | |||
playback and to facilitate mixing in conferencing applications. It | playback and to facilitate mixing in conferencing applications. It | |||
skipping to change at page 4, line 12 | skipping to change at page 4, line 12 | |||
constrained to have a passband specified by G.712 and can in fact be | constrained to have a passband specified by G.712 and can in fact be | |||
sampled at any sampling rate from 8 kHz to 48 kHz and up. For this | sampled at any sampling rate from 8 kHz to 48 kHz and up. For this | |||
reason, the level SHOULD be normalized by only considering | reason, the level SHOULD be normalized by only considering | |||
frequencies above 300 Hz, regardless of the sampling rate used. The | frequencies above 300 Hz, regardless of the sampling rate used. The | |||
level SHOULD also be adapted to avoid clipping, either by lowering | level SHOULD also be adapted to avoid clipping, either by lowering | |||
the gain to a level below -19 dBm0, or through the use of a | the gain to a level below -19 dBm0, or through the use of a | |||
compressor. | compressor. | |||
Assuming 16-bit PCM with a value of +/-32767, -19 dBm0 corresponds to | Assuming 16-bit PCM with a value of +/-32767, -19 dBm0 corresponds to | |||
a root mean square (RMS) level of 2600. Only active speech should be | a root mean square (RMS) level of 2600. Only active speech should be | |||
considered in the RMS calculation. If the client has control over | considered in the RMS calculation. If the endpoint has control over | |||
the entire audio capture path, as is typically the case for a regular | the entire audio capture path, as is typically the case for a regular | |||
phone, then it is RECOMMENDED that the gain be adjusted in such a way | phone, then it is RECOMMENDED that the gain be adjusted in such a way | |||
that active speech have a level of 2600 (-19 dBm0) for an average | that active speech have a level of 2600 (-19 dBm0) for an average | |||
speaker. If the client does not have control over the entire audio | speaker. If the endpoint does not have control over the entire audio | |||
capture, as is typically the case for a software client, then the | capture, as is typically the case for a software endpoint, then the | |||
client SHOULD use automatic gain control (AGC) to dynamically adjust | endpoint SHOULD use automatic gain control (AGC) to dynamically | |||
the level to 2600 (-19 dBm0) +/- 6 dB. For music or desktop sharing | adjust the level to 2600 (-19 dBm0) +/- 6 dB. For music or desktop | |||
applications, the level SHOULD NOT be automatically adjusted and the | sharing applications, the level SHOULD NOT be automatically adjusted | |||
client SHOULD allow the user to set the gain manually. | and the endpoint SHOULD allow the user to set the gain manually. | |||
The RECOMMENDED filter for normalizing the signal energy is a second- | The RECOMMENDED filter for normalizing the signal energy is a second- | |||
order Butterworth filter with a 300 Hz cutoff frequency. | order Butterworth filter with a 300 Hz cutoff frequency. | |||
It is common for the audio output on some devices to be "calibrated" | It is common for the audio output on some devices to be "calibrated" | |||
for playing back pre-recorded "commercial" music, which is typically | for playing back pre-recorded "commercial" music, which is typically | |||
around 12 dB louder than the level recommended in this section. | around 12 dB louder than the level recommended in this section. | |||
Because of this, clients MAY increase the gain before playback. | Because of this, endpoints MAY increase the gain before playback. | |||
5. Acoustic Echo Cancellation (AEC) | 5. Acoustic Echo Cancellation (AEC) | |||
It is plausible that the dominant near to mid-term WebRTC usage model | It is plausible that the dominant near to mid-term WebRTC usage model | |||
will be people using the interactive audio and video capabilities to | will be people using the interactive audio and video capabilities to | |||
communicate with each other via web browsers running on a notebook | communicate with each other via web browsers running on a notebook | |||
computer that has built-in microphone and speakers. The notebook-as- | computer that has built-in microphone and speakers. The notebook-as- | |||
communication-device paradigm presents challenging echo cancellation | communication-device paradigm presents challenging echo cancellation | |||
problems, the specific remedy of which will not be mandated here. | problems, the specific remedy of which will not be mandated here. | |||
However, while no specific algorithm or standard will be required by | However, while no specific algorithm or standard will be required by | |||
WebRTC compatible clients, echo cancellation will improve the user | WebRTC compatible endpoints, echo cancellation will improve the user | |||
experience and should be implemented by the endpoint device. | experience and should be implemented by the endpoint device. | |||
WebRTC clients SHOULD include an AEC or some other form of echo | WebRTC endpoints SHOULD include an AEC or some other form of echo | |||
control and if that is not possible, the clients SHOULD ensure that | control. On general purpose platforms (e.g. PC), it is common for | |||
the speaker-to-microphone gain is below unity at all frequencies to | the audio capture ADC and the audio playback DAC to use different | |||
avoid instability when none of the client has echo control. For | clocks. In these cases, such as when a webcam is used for capture | |||
clients that do not control the audio capture and playback hardware, | and a separate soundcard is used for playback, the sampling rates are | |||
it is RECOMMENDED to support echo cancellation between devices | likely to differ slightly. Endpoint AECs SHOULD be robust to such | |||
running at slightly different sampling rates, such as when a webcam | conditions, unless they are shipped along with hardware that | |||
is used for microphone. | guarantees capture and playback to be sampled from the same clock. | |||
Clients SHOULD allow the entire AEC and/or the non-linear processing | Endpoints SHOULD allow the entire AEC and/or the non-linear | |||
(NLP) to be turned off for applications, such as music, that do not | processing (NLP) to be turned off for applications, such as music, | |||
behave well with the spectral attenuation methods typically used in | that do not behave well with the spectral attenuation methods | |||
NLPs. Similarly, clients SHOULD have the ability to detect the | typically used in NLPs. Similarly, endpoints SHOULD have the ability | |||
presence of a headset and disable echo cancellation. | to detect the presence of a headset and disable echo cancellation. | |||
For some applications where the remote client may not have an echo | For some applications where the remote endpoint may not have an echo | |||
canceller, the local client MAY include a far-end echo canceller, but | canceller, the local endpoint MAY include a far-end echo canceller, | |||
if that is the case, it SHOULD be disabled by default. | but if that is the case, it SHOULD be disabled by default. | |||
6. Legacy VoIP Interoperability | 6. Legacy VoIP Interoperability | |||
The codec requirements above will ensure, at a minimum, voice | The codec requirements above will ensure, at a minimum, voice | |||
interoperability capabilities between WebRTC client applications and | interoperability capabilities between WebRTC endpoints | |||
legacy phone systems that support G.711. | and legacy phone systems that support G.711. | |||
7. IANA Considerations | 7. IANA Considerations | |||
This document makes no request of IANA. | This document makes no request of IANA. | |||
Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section may be removed on publication as an | |||
RFC. | RFC. | |||
8. Security Considerations | 8. Security Considerations | |||
For security considerations regarding the codecs themselves please | ||||
refer to their specifications, including [RFC6716], | ||||
[I-D.ietf-payload-rtp-opus], [RFC3551], [RFC3389], and [RFC4733]. | ||||
Likewise, consult the RTP base specification for RTP-based | ||||
security considerations. WebRTC security is further discussed in | ||||
[I-D.ietf-rtcweb-security] and [I-D.ietf-rtcweb-security-arch] and | ||||
[I-D.ietf-rtcweb-rtp-usage]. | ||||
Implementers should consider whether the use of VBR is appropriate | Implementers should consider whether the use of VBR is appropriate | |||
for their application based on [RFC6562]. Encryption and | for their application based on [RFC6562]. Encryption and | |||
authentication issues are beyond the scope of this document. | authentication issues are beyond the scope of this document. | |||
9. Acknowledgements | 9. Acknowledgements | |||
This draft incorporates ideas and text from various other drafts. In | This draft incorporates ideas and text from various other drafts. In | |||
particular, we would like to acknowledge, and say thanks for, work | particular, we would like to acknowledge, and say thanks for, work | |||
we incorporated from Harald Alvestrand and Cullen Jennings. | we incorporated from Harald Alvestrand and Cullen Jennings. | |||
skipping to change at page 6, line 18 | skipping to change at page 6, line 29 | |||
[RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the | |||
Opus Audio Codec", RFC 6716, September 2012. | Opus Audio Codec", RFC 6716, September 2012. | |||
[RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of | [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of | |||
Variable Bit Rate Audio with Secure RTP", RFC 6562, March | Variable Bit Rate Audio with Secure RTP", RFC 6562, March | |||
2012. | 2012. | |||
[I-D.ietf-payload-rtp-opus] | [I-D.ietf-payload-rtp-opus] | |||
Spittka, J., Vos, K., and J. Valin, "RTP Payload Format | Spittka, J., Vos, K., and J. Valin, "RTP Payload Format | |||
for Opus Speech and Audio Codec", draft-ietf-payload-rtp- | for the Opus Speech and Audio Codec", draft-ietf-payload- | |||
opus-03 (work in progress), July 2014. | rtp-opus-11 (work in progress), April 2015. | |||
10.2. Informative References | 10.2. Informative References | |||
[I-D.ietf-rtcweb-security] | ||||
Rescorla, E., "Security Considerations for WebRTC", draft- | ||||
ietf-rtcweb-security-08 (work in progress), February 2015. | ||||
[I-D.ietf-rtcweb-security-arch] | ||||
Rescorla, E., "WebRTC Security Architecture", draft-ietf- | ||||
rtcweb-security-arch-11 (work in progress), March 2015. | ||||
[I-D.ietf-rtcweb-rtp-usage] | ||||
Perkins, C., Westerlund, M., and J. Ott, "Web Real-Time | ||||
Communication (WebRTC): Media Transport and Use of RTP", | ||||
draft-ietf-rtcweb-rtp-usage-23 (work in progress), March | ||||
2015. | ||||
[I-D.ietf-rtcweb-audio-codecs-for-interop] | [I-D.ietf-rtcweb-audio-codecs-for-interop] | |||
Proust, S., Berger, E., Feiten, B., Bogineni, K., Lei, M., | Proust, S., Berger, E., Feiten, B., Bogineni, K., Lei, M., | |||
and E. Marocco, "Additional WebRTC audio codecs for | and E. Marocco, "Additional WebRTC audio codecs for | |||
interoperability with legacy networks.", draft-ietf- | interoperability.", draft-ietf-rtcweb-audio-codecs-for- | |||
rtcweb-audio-codecs-for-interop-00 (work in progress), | interop-01 (work in progress), January 2015. | |||
September 2014. | ||||
Authors' Addresses | Authors' Addresses | |||
Jean-Marc Valin | Jean-Marc Valin | |||
Mozilla | Mozilla | |||
331 E. Evelyn Avenue | 331 E. Evelyn Avenue | |||
Mountain View, CA 94041 | Mountain View, CA 94041 | |||
USA | USA | |||
Email: jmvalin@jmvalin.ca | Email: jmvalin@jmvalin.ca | |||
End of changes. 25 change blocks. | ||||
53 lines changed or deleted | 76 lines changed or added | |||
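
Section 4 of both draft versions recommends measuring the speech level only above 300 Hz, names a second-order Butterworth filter with a 300 Hz cutoff, and gives 2600 RMS (-19 dBm0) +/- 6 dB as the AGC target for 16-bit PCM. The following is a minimal sketch of that measurement, not text from the draft. It assumes the filter is a high-pass realized as a bilinear-transform biquad; the draft does not prescribe an implementation. All function names are hypothetical, and voice activity detection (the draft counts only active speech) is left out.

```python
import math

TARGET_RMS = 2600.0   # -19 dBm0 on 16-bit PCM, as stated in Section 4
TOLERANCE_DB = 6.0    # AGC window of +/- 6 dB from Section 4
CUTOFF_HZ = 300.0     # cutoff of the recommended Butterworth filter


def highpass_300hz(samples, sample_rate):
    """Second-order Butterworth high-pass (assumed biquad realization)."""
    w0 = 2.0 * math.pi * CUTOFF_HZ / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / math.sqrt(2.0)  # Q = 1/sqrt(2) gives Butterworth
    a0 = 1.0 + alpha
    b0 = (1.0 + cos_w0) / (2.0 * a0)
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        # Direct form I difference equation.
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(samples):
    """Root mean square of a sample sequence (0.0 for an empty one)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db_re_target(samples, sample_rate):
    """Level of the content above 300 Hz, in dB relative to 2600 RMS."""
    level = rms(highpass_300hz(samples, sample_rate))
    if level == 0.0:
        return float("-inf")
    return 20.0 * math.log10(level / TARGET_RMS)


def within_agc_window(samples, sample_rate):
    """True if the measured level is within +/- 6 dB of the target."""
    return abs(level_db_re_target(samples, sample_rate)) <= TOLERANCE_DB


def tone(freq_hz, amplitude, sample_rate, count):
    """Test helper: a sine wave, used here only to exercise the sketch."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [amplitude * math.sin(step * n) for n in range(count)]
```

A 1 kHz tone with an RMS of 2600 should measure close to 0 dB relative to the target, while a 50 Hz hum of the same amplitude is strongly attenuated by the filter and so contributes little to the measured level. An AGC built on this would apply the negated result as its gain correction.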
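
Section 3 of both draft versions requires endpoints to generate and consume the audio/telephone-event codes 0 through 11 from the DTMF table. As an illustration, the sketch below maps digits to those event codes and packs the 4-byte telephone-event payload defined in RFC 4733 (event, E bit, reserved bit, 6-bit volume, 16-bit duration). The function names and the default volume are assumptions made for this example; RTP header handling and retransmission of the final packet are not shown.

```python
import struct

# Event codes 0-11 from the table in Section 3 (DTMF events of RFC 4733).
DTMF_EVENT_CODES = {**{str(d): d for d in range(10)}, "*": 10, "#": 11}
DTMF_DIGITS = {code: digit for digit, code in DTMF_EVENT_CODES.items()}


def encode_telephone_event(digit, duration, volume=10, end=False):
    """Pack one telephone-event payload (4 bytes, network byte order).

    duration is in RTP timestamp units; volume is the power level in
    -dBm0 (0..63). The default of 10 is an arbitrary example value.
    """
    if digit not in DTMF_EVENT_CODES:
        raise ValueError("not a required DTMF digit: %r" % (digit,))
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    # Second byte: E bit (bit 7), reserved bit (bit 6, zero), volume.
    flags_volume = (int(end) << 7) | volume
    return struct.pack("!BBH", DTMF_EVENT_CODES[digit], flags_volume, duration)


def decode_telephone_event(payload):
    """Unpack a payload into (digit, end, volume, duration).

    digit is None for event codes outside the required 0-11 range.
    """
    event, flags_volume, duration = struct.unpack("!BBH", payload)
    end = bool(flags_volume & 0x80)
    volume = flags_volume & 0x3F
    return DTMF_DIGITS.get(event), end, volume, duration
```

For example, `encode_telephone_event("#", 800, volume=10, end=True)` yields the bytes `0b 8a 03 20`, and decoding them returns the same four fields.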