--- 1/draft-ietf-rtcweb-security-01.txt 2012-03-13 00:14:18.074671251 +0100 +++ 2/draft-ietf-rtcweb-security-02.txt 2012-03-13 00:14:18.126671748 +0100 @@ -1,18 +1,18 @@ RTC-Web E. Rescorla Internet-Draft RTFM, Inc. -Intended status: Standards Track October 30, 2011 -Expires: May 2, 2012 +Intended status: Standards Track March 12, 2012 +Expires: September 13, 2012 Security Considerations for RTC-Web - draft-ietf-rtcweb-security-01 + draft-ietf-rtcweb-security-02 Abstract The Real-Time Communications on the Web (RTC-Web) working group is tasked with standardizing protocols for real-time communications between Web browsers. The major use cases for RTC-Web technology are real-time audio and/or video calls, Web conferencing, and direct data transfer. Unlike most conventional real-time systems (e.g., SIP- based soft phones) RTC-Web communications are directly controlled by some Web server, which poses new security challenges. For instance, @@ -42,25 +42,25 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on May 2, 2012. + This Internet-Draft will expire on September 13, 2012. Copyright Notice - Copyright (c) 2011 IETF Trust and the persons identified as the + Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as @@ -73,73 +73,52 @@ modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English. Table of Contents - 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 - 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 - 3. The Browser Threat Model . . . . . . . . . . . . . . . . . . . 6 - 3.1. Access to Local Resources . . . . . . . . . . . . . . . . 7 - 3.2. Same Origin Policy . . . . . . . . . . . . . . . . . . . . 7 + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 + 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 + 3. The Browser Threat Model . . . . . . . . . . . . . . . . . . . 5 + 3.1. Access to Local Resources . . . . . . . . . . . . . . . . 6 + 3.2. Same Origin Policy . . . . . . . . . . . . . . . . . . . . 6 3.3. Bypassing SOP: CORS, WebSockets, and consent to - communicate . . . . . . . . . . . . . . . . . . . . . . . 8 - 4. 
Security for RTC-Web Applications . . . . . . . . . . . . . . 8 + communicate . . . . . . . . . . . . . . . . . . . . . . . 7 + 4. Security for RTC-Web Applications . . . . . . . . . . . . . . 7 4.1. Access to Local Devices . . . . . . . . . . . . . . . . . 8 - 4.1.1. Calling Scenarios and User Expectations . . . . . . . 9 - 4.1.1.1. Dedicated Calling Services . . . . . . . . . . . . 9 + 4.1.1. Calling Scenarios and User Expectations . . . . . . . 8 + 4.1.1.1. Dedicated Calling Services . . . . . . . . . . . . 8 4.1.1.2. Calling the Site You're On . . . . . . . . . . . . 9 - 4.1.1.3. Calling to an Ad Target . . . . . . . . . . . . . 10 + 4.1.1.3. Calling to an Ad Target . . . . . . . . . . . . . 9 4.1.2. Origin-Based Security . . . . . . . . . . . . . . . . 10 - 4.1.3. Security Properties of the Calling Page . . . . . . . 12 - 4.2. Communications Consent Verification . . . . . . . . . . . 13 - 4.2.1. ICE . . . . . . . . . . . . . . . . . . . . . . . . . 13 - 4.2.2. Masking . . . . . . . . . . . . . . . . . . . . . . . 14 - 4.2.3. Backward Compatibility . . . . . . . . . . . . . . . . 14 - 4.2.4. IP Location Privacy . . . . . . . . . . . . . . . . . 15 - 4.3. Communications Security . . . . . . . . . . . . . . . . . 15 - 4.3.1. Protecting Against Retrospective Compromise . . . . . 16 - 4.3.2. Protecting Against During-Call Attack . . . . . . . . 17 - 4.3.2.1. Key Continuity . . . . . . . . . . . . . . . . . . 17 - 4.3.2.2. Short Authentication Strings . . . . . . . . . . . 18 - 4.3.2.3. Recommendations . . . . . . . . . . . . . . . . . 19 - 5. Security Considerations . . . . . . . . . . . . . . . . . . . 19 - 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19 + 4.1.3. Security Properties of the Calling Page . . . . . . . 11 + 4.2. Communications Consent Verification . . . . . . . . . . . 12 + 4.2.1. ICE . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 4.2.2. Masking . . . . . . . . . . . . . . . . . . . . . . . 13 + 4.2.3. Backward Compatibility . . . . . . . . . . . . . . . . 13 + 4.2.4. IP Location Privacy . . . . . . . . . . . . . . . . . 14 + 4.3. Communications Security . . . . . . . . . . . . . . . . . 14 + 4.3.1. Protecting Against Retrospective Compromise . . . . . 15 + 4.3.2. Protecting Against During-Call Attack . . . . . . . . 16 + 4.3.2.1. Key Continuity . . . . . . . . . . . . . . . . . . 16 + 4.3.2.2. Short Authentication Strings . . . . . . . . . . . 17 + 4.3.2.3. Third Party Identity . . . . . . . . . . . . . . . 18 + 5. Security Considerations . . . . . . . . . . . . . . . . . . . 18 + 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19 7.1. Normative References . . . . . . . . . . . . . . . . . . . 19 - 7.2. Informative References . . . . . . . . . . . . . . . . . . 20 - Appendix A. A Proposed Security Architecture [No Consensus on - This] . . . . . . . . . . . . . . . . . . . . . . . . 22 - A.1. Trust Hierarchy . . . . . . . . . . . . . . . . . . . . . 22 - A.1.1. Authenticated Entities . . . . . . . . . . . . . . . . 22 - A.1.2. Unauthenticated Entities . . . . . . . . . . . . . . . 23 - A.2. Overview . . . . . . . . . . . . . . . . . . . . . . . . . 23 - A.2.1. Initial Signaling . . . . . . . . . . . . . . . . . . 24 - A.2.2. Media Consent Verification . . . . . . . . . . . . . . 26 - A.2.3. DTLS Handshake . . . . . . . . . . . . . . . . . . . . 26 - A.2.4. Communications and Consent Freshness . . . . . . . . . 27 - A.3. Detailed Technical Description . . . . . 
. . . . . . . . . 27 - A.3.1. Origin and Web Security Issues . . . . . . . . . . . . 27 - A.3.2. Device Permissions Model . . . . . . . . . . . . . . . 28 - A.3.3. Communications Consent . . . . . . . . . . . . . . . . 29 - A.3.4. IP Location Privacy . . . . . . . . . . . . . . . . . 29 - A.3.5. Communications Security . . . . . . . . . . . . . . . 30 - A.3.6. Web-Based Peer Authentication . . . . . . . . . . . . 31 - A.3.6.1. Generic Concepts . . . . . . . . . . . . . . . . . 31 - A.3.6.2. BrowserID . . . . . . . . . . . . . . . . . . . . 32 - A.3.6.3. OAuth . . . . . . . . . . . . . . . . . . . . . . 35 - A.3.6.4. Generic Identity Support . . . . . . . . . . . . . 36 - Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 36 + 7.2. Informative References . . . . . . . . . . . . . . . . . . 19 + Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 21 1. Introduction The Real-Time Communications on the Web (RTC-Web) working group is tasked with standardizing protocols for real-time communications between Web browsers. The major use cases for RTC-Web technology are real-time audio and/or video calls, Web conferencing, and direct data transfer. Unlike most conventional real-time systems, (e.g., SIP- based[RFC3261] soft phones) RTC-Web communications are directly controlled by some Web server. A simple case is shown below. @@ -182,20 +161,24 @@ particular, it needs to contend with malicious calling services. For example, if the calling service can cause the browser to make a call at any time to any callee of its choice, then this facility can be used to bug a user's computer without their knowledge, simply by placing a call to some recording service. More subtly, if the exposed APIs allow the server to instruct the browser to send arbitrary content, then they can be used to bypass firewalls or mount denial of service attacks. Any successful system will need to be resistant to this and other attacks. + A companion document [I-D.ietf-rtcweb-security-arch] describes a + security architecture intended to address the issues raised in this + document. + 2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119]. 3. The Browser Threat Model The security requirements for RTC-Web follow directly from the requirement that the browser's job is to protect the user. Huang et @@ -255,26 +238,26 @@ at all. For instance, there is no real way to run specific executables directly from a script (though the user can of course be induced to download executable files and run them). 3.2. Same Origin Policy Many other resources are accessible but isolated. For instance, while scripts are allowed to make HTTP requests via the XMLHttpRequest() API those requests are not allowed to be made to any server, but rather solely to the same ORIGIN from whence the script - came.[I-D.abarth-origin] (although CORS [CORS] and WebSockets - [I-D.ietf-hybi-thewebsocketprotocol] provides a escape hatch from - this restriction, as described below.) This SAME ORIGIN POLICY (SOP) - prevents server A from mounting attacks on server B via the user's - browser, which protects both the user (e.g., from misuse of his - credentials) and the server (e.g., from DoS attack). + came.[RFC6454] (although CORS [CORS] and WebSockets [RFC6455] + provides a escape hatch from this restriction, as described below.) 
+ This SAME ORIGIN POLICY (SOP) prevents server A from mounting attacks + on server B via the user's browser, which protects both the user + (e.g., from misuse of his credentials) and the server (e.g., from DoS + attack). More generally, SOP forces scripts from each site to run in their own, isolated, sandboxes. While there are techniques to allow them to interact, those interactions generally must be mutually consensual (by each site) and are limited to certain channels. For instance, multiple pages/browser panes from the same origin can read each other's JS variables, but pages from the different origins--or even iframes from different origins on the same page--cannot. 3.3. Bypassing SOP: CORS, WebSockets, and consent to communicate @@ -286,27 +269,27 @@ The W3C Cross-Origin Resource Sharing (CORS) spec [CORS] is a response to this demand. In CORS, when a script from origin A executes what would otherwise be a forbidden cross-origin request, the browser instead contacts the target server to determine whether it is willing to allow cross-origin requests from A. If it is so willing, the browser then allows the request. This consent verification process is designed to safely allow cross-origin requests. While CORS is designed to allow cross-origin HTTP requests, - WebSockets [I-D.ietf-hybi-thewebsocketprotocol] allows cross-origin - establishment of transparent channels. Once a WebSockets connection - has been established from a script to a site, the script can exchange - any traffic it likes without being required to frame it as a series - of HTTP request/response transactions. As with CORS, a WebSockets - transaction starts with a consent verification stage to avoid - allowing scripts to simply send arbitrary data to another origin. + WebSockets [RFC6455] allows cross-origin establishment of transparent + channels. Once a WebSockets connection has been established from a + script to a site, the script can exchange any traffic it likes + without being required to frame it as a series of HTTP request/ + response transactions. As with CORS, a WebSockets transaction starts + with a consent verification stage to avoid allowing scripts to simply + send arbitrary data to another origin. While consent verification is conceptually simple--just do a handshake before you start exchanging the real data--experience has shown that designing a correct consent verification system is difficult. In particular, Huang et al. [huang-w2sp] have shown vulnerabilities in the existing Java and Flash consent verification techniques and in a simplified version of the WebSockets handshake. In particular, it is important to be wary of CROSS-PROTOCOL attacks in which the attacking script generates traffic which is acceptable to some non-Web protocol state machine. In order to resist this form @@ -449,47 +432,42 @@ cases. As discussed above, individual consent puts the user's approval in the UI flow for every call. Not only does this quickly become annoying but it can train the user to simply click "OK", at which point the consent becomes useless. Thus, while it may be necessary to have individual consent in some case, this is not a suitable solution for (for instance) the calling service case. Where necessary, in-flow user interfaces must be carefully designed to avoid the risk of the user blindly clicking through. The other two options are designed to restrict calls to a given - target. 
Unfortunately, Callee-oriented consent does not work well - because a malicious site can claim that the user is calling any user - of his choice. One fix for this is to tie calls to a + target. Callee-oriented consent provided by the calling site does not + work well because a malicious site can claim that the user is calling + any user of his choice. One fix for this is to tie calls to a cryptographically established identity. While not suitable for all cases, this approach may be useful for some. If we consider the advertising case described in Section 4.1.1.3, it's not particularly convenient to require the advertiser to instantiate an iframe on the hosting site just to get permission; a more convenient approach is to cryptographically tie the advertiser's certificate to the communication directly. We're still tying permissions to origin here, but to the media origin (and/or destination) rather than to the - Web origin. + Web origin. [I-D.ietf-rtcweb-security-arch] and + [I-D.rescorla-rtcweb-generic-idp] describe mechanisms which + facilitate this sort of consent. Another case where media-level cryptographic identity makes sense is when a user really does not trust the calling site. For instance, I might be worried that the calling service will attempt to bug my computer, but I also want to be able to conveniently call my friends. If consent is tied to particular communications endpoints, then my - risk is limited. However, this is also not that convenient an - interface, since managing individual user permissions can be painful. - - While this is primarily a question not for IETF, it should be clear - that there is no really good answer. In general, if you cannot trust - the site which you have authorized for calling not to bug you then - your security situation is not really ideal. It is RECOMMENDED that - browsers have explicit (and obvious) indicators that they are in a - call in order to mitigate this risk. + risk is limited. Naturally, it is somewhat challenging to design UI + primitives which express this sort of policy. 4.1.3. Security Properties of the Calling Page Origin-based security is intended to secure against web attackers. However, we must also consider the case of network attackers. Consider the case where I have granted permission to a calling service by an origin that has the HTTP scheme, e.g., http://calling-service.example.com. If I ever use my computer on an unsecured network (e.g., a hotspot or if my own home wireless network is insecure), and browse any HTTP site, then an attacker can bug my @@ -514,25 +492,20 @@ Even if calls are only possible from HTTPS sites, the attack described above is still possible if the site embeds active content (e.g., JavaScript) that is fetched over HTTP or from an untrusted site, because that JavaScript is executed in the security context of the page [finer-grained]. Thus, it is also dangerous to allow RTC-Web functionality from HTTPS origins that embed mixed content. Note: this issue is not restricted to PAGES which contain mixed content. If a page from a given origin ever loads mixed content then it is possible for a network attacker to infect the browser's notion of that origin semi-permanently. - [[ OPEN ISSUE: What recommendation should IETF make about (a) - whether RTCWeb long-term consent should be available over HTTP pages - and (b) How to handle origins where the consent is to an HTTPS URL - but the page contains active mixed content? ]] -
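[Editor's illustration.] To make the origin-security reasoning of Section 4.1.3 concrete, the sketch below shows one way a browser might decide whether to honor a previously stored long-term device-access grant. The interfaces and the function are invented for this illustration; no such API is defined by this document or by any browser.

   // Illustrative sketch only: gate a persistent camera/microphone
   // grant on the security state of the requesting origin.  All names
   // here are assumptions made for the example.
   interface OriginState {
     origin: string;                     // e.g., "https://calling-service.example.com"
     scheme: "http" | "https";
     loadedMixedActiveContent: boolean;  // any active content fetched over HTTP
   }

   interface PermissionStore {
     hasPersistentGrant(origin: string): boolean;
   }

   function mayUsePersistentGrant(page: OriginState,
                                  store: PermissionStore): boolean {
     // A grant made to an HTTP origin can be replayed by any network
     // attacker who can inject script, so it is never honored silently.
     if (page.scheme !== "https") {
       return false;
     }
     // An HTTPS origin that has executed mixed active content is
     // treated as if it were the corresponding HTTP origin.
     if (page.loadedMixedActiveContent) {
       return false;
     }
     return store.hasPersistentGrant(page.origin);
   }
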
4.2. Communications Consent Verification As discussed in Section 3.3, allowing web applications unrestricted network access via the browser introduces the risk of using the browser as an attack platform against machines which would not otherwise be accessible to the malicious site, for instance because they are topologically restricted (e.g., behind a firewall or NAT). In order to prevent this form of attack as well as cross-protocol attacks it is important to require that the target of traffic explicitly consent to receiving the traffic in question. Until that @@ -574,21 +547,23 @@ As long as communication is limited to UDP, then this risk is probably limited, thus masking is not required for UDP. I.e., once communications consent has been verified, it is most likely safe to allow the implementation to send arbitrary UDP traffic to the chosen destination, provided that the STUN keepalives continue to succeed. In particular, this is true for the data channel if DTLS is used because DTLS (with the anti-chosen plaintext mechanisms required by TLS 1.1) does not allow the attacker to generate predictable ciphertext. However, with TCP the risk of transparent proxies becomes much more severe. If TCP is to be used, then WebSockets - style masking MUST be employed. + style masking MUST be employed. [Note: current thinking in the + RTCWEB WG is not to support TCP and to support SCTP over DTLS, thus + removing the need for masking.] 4.2.3. Backward Compatibility A requirement to use ICE limits compatibility with legacy non-ICE clients. It seems unsafe to completely remove the requirement for some check. All proposed checks have the common feature that the browser sends some message to the candidate traffic recipient and refuses to send other traffic until that message has been replied to. The message/reply pair must be generated in such a way that an attacker who controls the Web application cannot forge them, @@ -630,24 +605,20 @@ probably the best choice. Once initial consent is verified, we also need to verify continuing consent, in order to avoid attacks where two people briefly share an IP (e.g., behind a NAT in an Internet cafe) and the attacker arranges for a large, unstoppable, traffic flow to the network and then leaves. The appropriate technologies here are fairly similar to those for initial consent, though they are perhaps weaker since the threats are less severe. - [[ OPEN ISSUE: Exactly what should be the requirements here? - Proposals include ICE all the time or ICE but with allowing one of - these non-ICE things for legacy. ]] - 4.2.4. IP Location Privacy Note that as soon as the callee sends their ICE candidates, the caller learns the callee's IP addresses. The callee's server reflexive address reveals a lot of information about the callee's location. In order to avoid tracking, implementations may wish to suppress the start of ICE negotiation until the callee has answered. In addition, either side may wish to hide their location entirely by forcing all traffic through a TURN server. @@ -817,30 +788,36 @@ avoid them needing to check it on every call. However, this is problematic for reasons indicated in Section 4.3.2.1.
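[Editor's illustration.] As a concrete example of the SAS approach discussed in Section 4.3.2.2, the sketch below shows one way a client could derive a short string for the users to read to each other from the certificate fingerprints authenticated by the DTLS-SRTP handshake. The derivation (SHA-256 truncated to 20 bits and rendered in base32) is an assumption made for this example; it is not taken from ZRTP [RFC6189] or from this document.

   // Illustrative only: derive a ZRTP-style short authentication
   // string from the two fingerprints exchanged in signaling.
   import { createHash } from "crypto";

   function shortAuthString(offererFingerprint: string,
                            answererFingerprint: string): string {
     // Hash the two fingerprints in a fixed order so that both sides
     // compute the same value.
     const digest = createHash("sha256")
       .update(offererFingerprint)
       .update(answererFingerprint)
       .digest();
     // Render 20 bits of the digest as four base-32 characters, short
     // enough to compare verbally over the audio channel.
     const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
     const bits = (digest[0] << 12) | (digest[1] << 4) | (digest[2] >> 4);
     let sas = "";
     for (let shift = 15; shift >= 0; shift -= 5) {
       sas += alphabet[(bits >> shift) & 0x1f];
     }
     return sas;
   }
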
In principle it is of course possible to render a different UI element to indicate that calls are using an unauthenticated set of keying material (recall that the attacker can just present a slightly different name so that the attack shows the same UI as a call to a new device or to someone you haven't called before) but as a practical matter, users simply ignore such indicators even in the rather more dire case of mixed content warnings. -4.3.2.3. Recommendations - - [[ OPEN ISSUE: What are the best UI recommendations to make? - Proposal: take the text from [I-D.kaufman-rtcweb-security-ui] - Section 2]] +4.3.2.3. Third Party Identity - [[ OPEN ISSUE: Exactly what combination of media security primitives - should be specified and/or mandatory to implement? In particular, - should we allow DTLS-SRTP only, or both DTLS-SRTP and SDES. Should - we allow RTP for backward compatibility? ]] + The conventional approach to providing communications identity has of + course been to have some third party identity system (e.g., PKI) to + authenticate the endpoints. Such mechanisms have proven to be too + cumbersome for use by typical users (and nearly too cumbersome for + administrators). However, a new generation of Web-based identity + providers (BrowserID, Federated Google Login, Facebook Connect, + OAuth, OpenID, WebFinger) has recently been developed and uses Web + technologies to provide lightweight (from the user's perspective) + third-party authenticated transactions. It is possible (see + [I-D.rescorla-rtcweb-generic-idp]) to use systems of this type to + authenticate RTCWEB calls, linking them to existing user notions of + identity (e.g., Facebook adjacencies). Calls which are authenticated + in this fashion are naturally resistant even to active MITM attack by + the calling site. 5. Security Considerations This entire document is about security. 6. Acknowledgements Bernard Aboba, Harald Alvestrand, Cullen Jennings, Hadriel Kaplan (S 4.2.1), Matthew Kaufman, Magnus Westerland. @@ -838,44 +815,44 @@ 5. Security Considerations This entire document is about security. 6. Acknowledgements Bernard Aboba, Harald Alvestrand, Cullen Jennings, Hadriel Kaplan (S 4.2.1), Matthew Kaufman, Magnus Westerland. 7. References - 7.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 7.2. Informative References [CORS] van Kesteren, A., "Cross-Origin Resource Sharing". - [I-D.abarth-origin] - Barth, A., "The Web Origin Concept", - draft-abarth-origin-09 (work in progress), November 2010. - - [I-D.ietf-hybi-thewebsocketprotocol] - Fette, I. and A. Melnikov, "The WebSocket protocol", - draft-ietf-hybi-thewebsocketprotocol-17 (work in - progress), September 2011. + [I-D.ietf-rtcweb-security-arch] + Rescorla, E., "RTCWEB Security Architecture", + draft-ietf-rtcweb-security-arch-00 (work in progress), + January 2012. [I-D.kaufman-rtcweb-security-ui] Kaufman, M., "Client Security User Interface Requirements for RTCWEB", draft-kaufman-rtcweb-security-ui-00 (work in progress), June 2011. + [I-D.rescorla-rtcweb-generic-idp] + Rescorla, E., "RTCWEB Generic Identity Provider + Interface", draft-rescorla-rtcweb-generic-idp-00 (work in + progress), January 2012. + [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [RFC3552] Rescorla, E. and B.
Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003. @@ -905,20 +882,26 @@ [RFC5763] Fischl, J., Tschofenig, H., and E. Rescorla, "Framework for Establishing a Secure Real-time Transport Protocol (SRTP) Security Context Using Datagram Transport Layer Security (DTLS)", RFC 5763, May 2010. [RFC6189] Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media Path Key Agreement for Unicast Secure RTP", RFC 6189, April 2011. + [RFC6454] Barth, A., "The Web Origin Concept", RFC 6454, + December 2011. + + [RFC6455] Fette, I. and A. Melnikov, "The WebSocket Protocol", + RFC 6455, December 2011. + [abarth-rtcweb] Barth, A., "Prompting the user is security failure", RTC- Web Workshop. [cranor-wolf] Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and L. cranor, "Crying Wolf: An Empirical Study of SSL Warning Effectiveness", Proceedings of the 18th USENIX Security Symposium, 2009. @@ -939,688 +922,20 @@ Kain, A. and M. Macon, "Design and Evaluation of a Voice Conversion Algorithm based on Spectral Envelope Mapping and Residual Prediction", Proceedings of ICASSP, May 2001. [whitten-johnny] Whitten, A. and J. Tygar, "Why Johnny Can't Encrypt: A Usability Evaluation of PGP 5.0", Proceedings of the 8th USENIX Security Symposium, 1999. -Appendix A. A Proposed Security Architecture [No Consensus on This] - - This section contains a proposed security architecture, based on the - considerations discussed in the main body of this memo. This section - is currently the opinion of the author and does not have consensus - though some (many?) elements of this proposal do seem to have general - consensus. - -A.1. Trust Hierarchy - - The basic assumption of this proposal is that network resources exist - in a hierarchy of trust, rooted in the browser, which serves as the - user's TRUSTED COMPUTING BASE (TCB). Any security property which the - user wishes to have enforced must be ultimately guaranteed by the - browser (or transitively by some property the browser verifies). - Conversely, if the browser is compromised, then no security - guarantees are possible. Note that there are cases (e.g., Internet - kiosks) where the user can't really trust the browser that much. In - these cases, the level of security provided is limited by how much - they trust the browser. - - Optimally, we would not rely on trust in any entities other than the - browser. However, this is unfortunately not possible if we wish to - have a functional system. Other network elements fall into two - categories: those which can be authenticated by the browser and thus - are partly trusted--though to the minimum extent necessary--and those - which cannot be authenticated and thus are untrusted. This is a - natural extension of the end-to-end principle. - -A.1.1. Authenticated Entities - - There are two major classes of authenticated entities in the system: - - o Calling services: Web sites whose origin we can verify (optimally - via HTTPS). - o Other users: RTC-Web peers whose origin we can verify - cryptographically (optimally via DTLS-SRTP). - - Note that merely being authenticated does not make these entities - trusted. For instance, just because we can verify that - https://www.evil.org/ is owned by Dr. Evil does not mean that we can - trust Dr. Evil to access our camera an microphone. However, it gives - the user an opportunity to determine whether he wishes to trust Dr. - Evil or not; after all, if he desires to contact Dr. 
Evil, it's safe - to temporarily give him access to the camera and microphone for the - purpose of the call. The point here is that we must first identify - other elements before we can determine whether to trust them. - - It's also worth noting that there are settings where authentication - is non-cryptographic, such as other machines behind a firewall. - Naturally, the level of trust one can have in identities verified in - this way depends on how strong the topology enforcement is. - -A.1.2. Unauthenticated Entities - - Other than the above entities, we are not generally able to identify - other network elements, thus we cannot trust them. This does not - mean that it is not possible to have any interaction with them, but - it means that we must assume that they will behave maliciously and - design a system which is secure even if they do so. - -A.2. Overview - - This section describes a typical RTCWeb session and shows how the - various security elements interact and what guarantees are provided - to the user. The example in this section is a "best case" scenario - in which we provide the maximal amount of user authentication and - media privacy with the minimal level of trust in the calling service. - Simpler versions with lower levels of security are also possible and - are noted in the text where applicable. It's also important to - recognize the tension between security (or performance) and privacy. - The example shown here is aimed towards settings where we are more - concerned about secure calling than about privacy, but as we shall - see, there are settings where one might wish to make different - tradeoffs--this architecture is still compatible with those settings. - - For the purposes of this example, we assume the topology shown in the - figure below. This topology is derived from the topology shown in - Figure 1, but separates Alice and Bob's identities from the process - of signaling. Specifically, Alice and Bob have relationships with - some Identity Provider (IDP) that supports a protocol such OpenID or - BrowserID) that can be used to attest to their identity. This - separation isn't particularly important in "closed world" cases where - Alice and Bob are users on the same social network and have - identities based on that network. However, there are important - settings where that is not the case, such as federation (calls from - one network to another) and calling on untrusted sites, such as where - two users who have a relationship via a given social network want to - call each other on another, untrusted, site, such as a poker site. - - +----------------+ - | | - | Signaling | - | Server | - | | - +----------------+ - ^ ^ - / \ - HTTPS / \ HTTPS - / \ - / \ - v v - JS API JS API - +-----------+ +-----------+ - | | Media | | - Alice | Browser |<---------->| Browser | Bob - | | (DTLS-SRTP)| | - +-----------+ +-----------+ - ^ ^--+ +--^ ^ - | | | | - v | | v - +-----------+ | | +-----------+ - | |<--------+ | | - | IDP | | | IDP | - | | +------->| | - +-----------+ +-----------+ - - Figure 2: A call with IDP-based identity - -A.2.1. Initial Signaling - - Alice and Bob are both users of a common calling service; they both - have approved the calling service to make calls (we defer the - discussion of device access permissions till later). They are both - connected to the calling service via HTTPS and so know the origin - with some level of confidence. They also have accounts with some - identity provider. 
This sort of identity service is becoming - increasingly common in the Web environment in technologies such - (BrowserID, Federated Google Login, Facebook Connect, OAuth, OpenID, - WebFinger), and is often provided as a side effect service of your - ordinary accounts with some service. In this example, we show Alice - and Bob using a separate identity service, though they may actually - be using the same identity service as calling service or have no - identity service at all. - - Alice is logged onto the calling service and decides to call Bob. She - can see from the calling service that he is online and the calling - service presents a JS UI in the form of a button next to Bob's name - which says "Call". Alice clicks the button, which initiates a JS - callback that instantiates a PeerConnection object. This does not - require a security check: JS from any origin is allowed to get this - far. - - Once the PeerConnection is created, the calling service JS needs to - set up some media. Because this is an audio/video call, it creates - two MediaStreams, one connected to an audio input and one connected - to a video input. At this point the first security check is - required: untrusted origins are not allowed to access the camera and - microphone. In this case, because Alice is a long-term user of the - calling service, she has made a permissions grant (i.e., a setting in - the browser) to allow the calling service to access her camera and - microphone any time it wants. The browser checks this setting when - the camera and microphone requests are made and thus allows them. - - In the current W3C API, once some streams have been added, Alice's - browser + JS generates a signaling message The format of this data is - currently undefined. It may be a complete message as defined by ROAP - [REF] or may be assembled piecemeal by the JS. In either case, it - will contain: - - o Media channel information - o ICE candidates - o A fingerprint attribute binding the message to Alice's public key - [RFC5763] - - Prior to sending out the signaling message, the PeerConnection code - contacts the identity service and obtains an assertion binding - Alice's identity to her fingerprint. The exact details depend on the - identity service (though as discussed in Appendix A.3.6.4 I believe - PeerConnection can be agnostic to them), but for now it's easiest to - think of as a BrowserID assertion. - - This message is sent to the signaling server, e.g., by XMLHttpRequest - [REF] or by WebSockets [I-D.ietf-hybi-thewebsocketprotocol]. The - signaling server processes the message from Alice's browser, - determines that this is a call to Bob and sends a signaling message - to Bob's browser (again, the format is currently undefined). The JS - on Bob's browser processes it, and alerts Bob to the incoming call - and to Alice's identity. In this case, Alice has provided an - identity assertion and so Bob's browser contacts Alice's identity - provider (again, this is done in a generic way so the browser has no - specific knowledge of the IDP) to verity the assertion. This allows - the browser to display a trusted element indicating that a call is - coming in from Alice. If Alice is in Bob's address book, then this - interface might also include her real name, a picture, etc. The - calling site will also provide some user interface element (e.g., a - button) to allow Bob to answer the call, though this is most likely - not part of the trusted UI. 
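[Editor's illustration.] The appendix text above (removed in this revision) describes Bob's browser verifying Alice's identity assertion with her IDP before showing a trusted indication of her identity. A minimal sketch of that check follows; the field names and the verifyWithIdp callback are invented for illustration (the actual assertion and signaling formats were undefined at this stage), and verifyWithIdp stands in for whatever IDP-specific verification (e.g., BrowserID) is in use.

   // Sketch only: accept a caller identity only if the IDP assertion
   // verifies AND it covers the fingerprint that DTLS-SRTP will
   // actually authenticate.  All names are assumptions for the example.
   interface IdentityAssertion {
     identity: string;       // e.g., "alice@example.org"
     fingerprint: string;    // certificate fingerprint the IDP signed over
     idp: string;            // e.g., "https://example.org"
   }

   interface IncomingOffer {
     sdpFingerprint: string; // a=fingerprint value from the offer
     assertion: IdentityAssertion;
   }

   async function callerIdentity(
     offer: IncomingOffer,
     verifyWithIdp: (a: IdentityAssertion) => Promise<boolean>
   ): Promise<string | null> {
     if (!(await verifyWithIdp(offer.assertion))) {
       return null;          // assertion did not verify at the IDP
     }
     if (offer.assertion.fingerprint !== offer.sdpFingerprint) {
       return null;          // assertion does not bind this call's key
     }
     return offer.assertion.identity;  // safe to show in trusted UI
   }
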
- - If Bob agrees [I am ignoring early media for now], a PeerConnection - is instantiated with the message from Alice's side. Then, a similar - process occurs as on Alice's browser: Bob's browser verifies that - the calling service is approved, the media streams are created, and a - return signaling message containing media information, ICE - candidates, and a fingerprint is sent back to Alice via the signaling - service. If Bob has a relationship with an IDP, the message will - also come with an identity assertion. - - At this point, Alice and Bob each know that the other party wants to - have a secure call with them. Based purely on the interface provided - by the signaling server, they know that the signaling server claims - that the call is from Alice to Bob. Because the far end sent an - identity assertion along with their message, they know that this is - verifiable from the IDP as well. Of course, the call works perfectly - well if either Alice or Bob doesn't have a relationship with an IDP; - they just get a lower level of assurance. Moreover, Alice might wish - to make an anonymous call through an anonymous calling site, in which - case she would of course just not provide any identity assertion and - the calling site would mask her identity from Bob. - -A.2.2. Media Consent Verification - - As described in Section 4.2. This proposal specifies that that be - performed via ICE. Thus, Alice and Bob perform ICE checks with each - other. At the completion of these checks, they are ready to send - non-ICE data. - - At this point, Alice knows that (a) Bob (assuming he is verified via - his IDP) or someone else who the signaling service is claiming is Bob - is willing to exchange traffic with her and (b) that either Bob is at - the IP address which she has verified via ICE or there is an attacker - who is on-path to that IP address detouring the traffic. Note that - it is not possible for an attacker who is on-path but not attached to - the signaling service to spoof these checks because they do not have - the ICE credentials. Bob's security guarantees with respect to Alice - are the converse of this. - -A.2.3. DTLS Handshake - - Once the ICE checks have completed [more specifically, once some ICE - checks have completed], Alice and Bob can set up a secure channel. - This is performed via DTLS [RFC4347] (for the data channel) and DTLS- - SRTP [RFC5763] for the media channel. Specifically, Alice and Bob - perform a DTLS handshake on every channel which has been established - by ICE. The total number of channels depends on the amount of - muxing; in the most likely case we are using both RTP/RTCP mux and - muxing multiple media streams on the same channel, in which case - there is only one DTLS handshake. Once the DTLS handshake has - completed, the keys are extracted and used to key SRTP for the media - channels. - - At this point, Alice and Bob know that they share a set of secure - data and/or media channels with keys which are not known to any - third-party attacker. If Alice and Bob authenticated via their IDPs, - then they also know that the signaling service is not attacking them. - Even if they do not use an IDP, as long as they have minimal trust in - the signaling service not to perform a man-in-the-middle attack, they - know that their communications are secure against the signaling - service as well. - -A.2.4. 
Communications and Consent Freshness - - From a security perspective, everything from here on in is a little - anticlimactic: Alice and Bob exchange data protected by the keys - negotiated by DTLS. Because of the security guarantees discussed in - the previous sections, they know that the communications are - encrypted and authenticated. - - The one remaining security property we need to establish is "consent - freshness", i.e., allowing Alice to verify that Bob is still prepared - to receive her communications. ICE specifies periodic STUN - keepalizes but only if media is not flowing. Because the consent - issue is more difficult here, we require RTCWeb implementations to - periodically send keepalives. If a keepalive fails and no new ICE - channels can be established, then the session is terminated. - -A.3. Detailed Technical Description - -A.3.1. Origin and Web Security Issues - - The basic unit of permissions for RTC-Web is the origin - [I-D.abarth-origin]. Because the security of the origin depends on - being able to authenticate content from that origin, the origin can - only be securely established if data is transferred over HTTPS. - Thus, clients MUST treat HTTP and HTTPS origins as different - permissions domains and SHOULD NOT permit access to any RTC-Web - functionality from scripts fetched over non-secure (HTTP) origins. - If an HTTPS origin contains mixed active content (regardless of - whether it is present on the specific page attempting to access RTC- - Web functionality), any access MUST be treated as if it came from the - HTTP origin. For instance, if a https://www.example.com/example.html - loads https://www.example.com/example.js and - http://www.example.org/jquery.js, any attempt by example.js to access - RTCWeb functionality MUST be treated as if it came from - http://www.example.com/. Note that many browsers already track mixed - content and either forbid it by default or display a warning. - -A.3.2. Device Permissions Model - - Implementations MUST obtain explicit user consent prior to providing - access to the camera and/or microphone. Implementations MUST at - minimum support the following two permissions models: - - o Requests for one-time camera/microphone access. - o Requests for permanent access. - - In addition, they SHOULD support requests for access to a single - communicating peer. E.g., "Call customerservice@ford.com". Browsers - servicing such requests SHOULD clearly indicate that identity to the - user when asking for permission. - - API Requirement: The API MUST provide a mechanism for the requesting - JS to indicate which of these forms of permissions it is - requesting. This allows the client to know what sort of user - interface experience to provide. In particular, browsers might - display a non-invasive door hanger ("some features of this site - may not work..." when asking for long-term permissions) but a more - invasive UI ("here is your own video") for single-call - permissions. The API MAY grant weaker permissions than the JS - asked for if the user chooses to authorize only those permissions, - but if it intends to grant stronger ones SHOULD display the - appropriate UI for those permissions. - - API Requirement: The API MUST provide a mechanism for the requesting - JS to relinquish the ability to see or modify the media (e.g., via - MediaStream.record()). Combined with secure authentication of the - communicating peer, this allows a user to be sure that the calling - site is not accessing or modifying their conversion. 
- - UI Requirement: The UI MUST clearly indicate when the user's camera - and microphone are in use. This indication MUST NOT be - suppressable by the JS and MUST clearly indicate how to terminate - a call, and provide a UI means to immediately stop camera/ - microphone input without the JS being able to prevent it. - - UI Requirement: If the UI indication of camera/microphone use are - displayed in the browser such that minimizing the browser window - would hide the indication, or the JS creating an overlapping - window would hide the indication, then the browser SHOULD stop - camera and microphone input. - - Clients MAY permit the formation of data channels without any direct - user approval. Because sites can always tunnel data through the - server, further restrictions on the data channel do not provide any - additional security. (though see Appendix A.3.3 for a related issue). - - Implementations which support some form of direct user authentication - SHOULD also provide a policy by which a user can authorize calls only - to specific counterparties. Specifically, the implementation SHOULD - provide the following interfaces/controls: - - o Allow future calls to this verified user. - o Allow future calls to any verified user who is in my system - address book (this only works with address book integration, of - course). - - Implementations SHOULD also provide a different user interface - indication when calls are in progress to users whose identities are - directly verifiable. Appendix A.3.5 provides more on this. - -A.3.3. Communications Consent - - Browser client implementations of RTC-Web MUST implement ICE. Server - gateway implementations which operate only at public IP addresses may - implement ICE-Lite. - - Browser implementations MUST verify reachability via ICE prior to - sending any non-ICE packets to a given destination. Implementations - MUST NOT provide the ICE transaction ID to JavaScript. [Note: this - document takes no position on the split between ICE in JS and ICE in - the browser. The above text is written the way it is for editorial - convenience and will be modified appropriately if the WG decides on - ICE in the JS.] - - Implementations MUST send keepalives no less frequently than every 30 - seconds regardless of whether traffic is flowing or not. If a - keepalive fails then the implementation MUST either attempt to find a - new valid path via ICE or terminate media for that ICE component. - Note that ICE [RFC5245]; Section 10 keepalives use STUN Binding - Indications which are one-way and therefore not sufficient. We will - need to define a new mechanism for this. [OPEN ISSUE: what to do - here.] - -A.3.4. IP Location Privacy - - As mentioned in Section 4.2.4 above, a side effect of the default ICE - behavior is that the peer learns one's IP address, which leaks large - amounts of location information, especially for mobile devices. This - has negative privacy consequences in some circumstances. The - following two API requirements are intended to mitigate this issue: - - API Requirement: The API MUST provide a mechanism to suppress ICE - negotiation (though perhaps to allow candidate gathering) until - the user has decided to answer the call [note: determining when - the call has been answered is a question for the JS.] This - enables a user to prevent a peer from learning their IP address if - they elect not to answer a call. - - API Requirement: The API MUST provide a mechanism for the calling - application to indicate that only TURN candidates are to be used. 
- This prevents the peer from learning one's IP address at all. - -A.3.5. Communications Security - - Implementations MUST implement DTLS and DTLS-SRTP. All data channels - MUST be secured via DTLS. DTLS-SRTP MUST be offered for every media - channel and MUST be the default; i.e., if an implementation receives - an offer for DTLS-SRTP and SDES and/or plain RTP, DTLS-SRTP MUST be - selected. - - [OPEN ISSUE: What should the settings be here? MUST?] - Implementations MAY support SDES and RTP for media traffic for - backward compatibility purposes. - - API Requirement: The API MUST provide a mechanism to indicate that a - fresh DTLS key pair is to be generated for a specific call. This - is intended to allow for unlinkability. Note that there are also - settings where it is attractive to use the same keying material - repeatedly, especially those with key continuity-based - authentication. - - API Requirement: The API MUST provide a mechanism to indicate that a - fresh DTLS key pair is to be generated for a specific call. This - is intended to allow for unlinkability. - - API Requirement: When DTLS-SRTP is used, the API MUST NOT permit the - JS to obtain the negotiated keying material. This requirement - preserves the end-to-end security of the media. - - UI Requirements: A user-oriented client MUST provide an - "inspector" interface which allows the user to determine the - security characteristics of the media. [largely derived from - [I-D.kaufman-rtcweb-security-ui] - The following properties SHOULD be displayed "up-front" in the - browser chrome, i.e., without requiring the user to ask for them: - - * A client MUST provide a user interface through which a user may - determine the security characteristics for currently-displayed - audio and video stream(s) - * A client MUST provide a user interface through which a user may - determine the security characteristics for transmissions of - their microphone audio and camera video. - * The "security characteristics" MUST include an indication as to - whether or not the transmission is cryptographically protected - and whether that protection is based on a key that was - delivered out-of-band (from a server) or was generated as a - result of a pairwise negotiation. - * If the far endpoint was directly verified Appendix A.3.6 the - "security characteristics" MUST include the verified - information. - The following properties are more likely to require some "drill- - down" from the user: - - * If the transmission is cryptographically protected, the The - algorithms in use (For example: "AES-CBC" or "Null Cipher".) - * If the transmission is cryptographically protected, the - "security characteristics" MUST indicate whether PFS is - provided. - * If the transmission is cryptographically protected via an end- - to-end mechanism the "security characteristics" MUST include - some mechanism to allow an out-of-band verification of the - peer, such as a certificate fingerprint or an SAS. - -A.3.6. Web-Based Peer Authentication - -A.3.6.1. Generic Concepts - - In a number of cases, it is desirable for the endpoint (i.e., the - browser) to be able to directly identity the endpoint on the other - side without trusting only the signaling service to which they are - connected. For instance, users may be making a call via a federated - system where they wish to get direct authentication of the other - side. 
Alternately, they may be making a call on a site which they - minimally trust (such as a poker site) but to someone who has an - identity on a site they do trust (such as a social network.) - - Recently, a number of Web-based identity technologies (OAuth, - BrowserID, Facebook Connect), etc. have been developed. While the - details vary, what these technologies share is that they have a Web- - based (i.e., HTTP/HTTPS identity provider) which attests to your - identity. For instance, if I have an account at example.org, I could - use the example.org identity provider to prove to others that I was - alice@example.org. The development of these technologies allows us - to separate calling from identity provision: I could call you on - Poker Galaxy but identify myself as alice@example.org. - - Whatever the underlying technology, the general principle is that the - party which is being authenticated is NOT the signaling site but - rather the user (and their browser). Similarly, the relying party is - the browser and not the signaling site. This means that the - PeerConnection API MUST arrange to talk directly to the identity - provider in a way that cannot be impersonated by the calling site. - The following sections provide two examples of this. - -A.3.6.2. BrowserID - - BrowserID [https://browserid.org/] is a technology which allows a - user with a verified email address to generate an assertion - (authenticated by their identity provider) attesting to their - identity (phrased as an email address). The way that this is used in - practice is that the relying party embeds JS in their site which - talks to the BrowserID code (either hosted on a trusted intermediary - or embedded in the browser). That code generates the assertion which - is passed back to the relying party for verification. The assertion - can be verified directly or with a Web service provided by the - identity provider. It's relatively easy to extend this functionality - to authenticate RTC-Web calls, as shown below. - - +----------------------+ +----------------------+ - | | | | - | Alice's Browser | | Bob's Browser | - | | OFFER ------------> | | - | Calling JS Code | | Calling JS Code | - | ^ | | ^ | - | | | | | | - | v | | v | - | PeerConnection | | PeerConnection | - | | ^ | | | ^ | - | Finger| |Signed | |Signed | | | - | print | |Finger | |Finger | |"Alice"| - | | |print | |print | | | - | v | | | v | | - | +--------------+ | | +---------------+ | - | | BrowserID | | | | BrowserID | | - | | Signer | | | | Verifier | | - | +--------------+ | | +---------------+ | - | ^ | | ^ | - +-----------|----------+ +----------|-----------+ - | | - | Get certificate | - v | Check - +----------------------+ | certificate - | | | - | Identity |/-------------------------------+ - | Provider | - | | - +----------------------+ - - The way this mechanism works is as follows. On Alice's side, Alice - goes to initiate a call. - - 1. The calling JS instantiates a PeerConnection and tells it that it - is interested in having it authenticated via BrowserID. - 2. The PeerConnection instantiates the BrowserID signer in an - invisible IFRAME. The IFRAME is tagged with an origin that - indicates that it was generated by the PeerConnection (this - prevents ordinary JS from implementing it). The BrowserID signer - is provided with Alice's fingerprint. Note that the IFRAME here - does not render any UI. It is being used solely to allow the - browser to load the BrowserID signer in isolation, especially - from the calling site. - 3. 
The BrowserID signer contacts Alice's identity provider, - authenticating as Alice (likely via a cookie). - 4. The identity provider returns a short-term certificate attesting - to Alice's identity and her short-term public key. - - 5. The Browser-ID code signs the fingerprint and returns the signed - assertion + certificate to the PeerConnection. [Note: there are - well-understood Web mechanisms for this that I am excluding here - for simplicity.] - 6. The PeerConnection returns the signed information to the calling - JS code. - 7. The signed assertion gets sent over the wire to Bob's browser - (via the signaling service) as part of the call setup. - - Obviously, the format of the signed assertion varies depending on - what signaling style the WG ultimately adopts. However, for - concreteness, if something like ROAP were adopted, then the entire - message might look like: - - { - "messageType":"OFFER", - "callerSessionId":"13456789ABCDEF", - "seq": 1 - "sdp":" - v=0\n - o=- 2890844526 2890842807 IN IP4 192.0.2.1\n - s= \n - c=IN IP4 192.0.2.1\n - t=2873397496 2873404696\n - m=audio 49170 RTP/AVP 0\n - a=fingerprint: SHA-1 \ - 4A:AD:B9:B1:3F:82:18:3B:54:02:12:DF:3E:5D:49:6B:19:E5:7C:AB\n", - "identity":{ - "identityType":"browserid", - "assertion": { - "digest":"", - "audience": "[TBD]" - "valid-until": 1308859352261, - }, // signed using user's key - "certificate": { - "email": "rescorla@gmail.com", - "public-key": "", - "valid-until": 1308860561861, - } // certificate is signed by gmail.com - } - } - - Note that we only expect to sign the fingerprint values and the - session IDs, in order to allow the JS or calling service to modify - the rest of the SDP, while protecting the identity binding. [OPEN - ISSUE: should we sign seq too?] - - [TODO: NEed to talk about Audience a bit.] - On Bob's side, he receives the signed assertion as part of the call - setup message and a similar procedure happens to verify it. - - 1. The calling JS instantiates a PeerConnection and provides it the - relevant signaling information, including the signed assertion. - 2. The PeerConnection instantiates a BrowserID verifier in an IFRAME - and provides it the signed assertion. - 3. The BrowserID verifier contacts the identity provider to verify - the certificate and then uses the key to verify the signed - fingerprint. - 4. Alice's verified identity is returned to the PeerConnection (it - already has the fingerprint). - 5. At this point, Bob's browser can display a trusted UI indication - that Alice is on the other end of the call. - - When Bob returns his answer, he follows the converse procedure, which - provides Alice with a signed assertion of Bob's identity and keying - material. - -A.3.6.3. OAuth - - While OAuth is not directly designed for user-to-user authentication, - with a little lateral thinking it can be made to serve. We use the - following mapping of OAuth concepts to RTC-Web concepts: - - +----------------------+----------------------+ - | OAuth | RTCWeb | - +----------------------+----------------------+ - | Client | Relying party | - | Resource owner | Authenticating party | - | Authorization server | Identity service | - | Resource server | Identity service | - +----------------------+----------------------+ - - Table 1 - - The idea here is that when Alice wants to authenticate to Bob (i.e., - for Bob to be aware that she is calling). In order to do this, she - allows Bob to see a resource on the identity provider that is bound - to the call, her identity, and her public key. 
Then Bob retrieves - the resource from the identity provider, thus verifying the binding - between Alice and the call. - - Alice IDP Bob - --------------------------------------------------------- - Call-Id, Fingerprint -------> - <------------------- Auth Code - Auth Code ----------------------------------------------> - <----- Get Token + Auth Code - Token ---------------------> - <------------- Get call-info - Call-Id, Fingerprint ------> - - This is a modified version of a common OAuth flow, but omits the - redirects required to have the client point the resource owner to the - IDP, which is acting as both the resource server and the - authorization server, since Alice already has a handle to the IDP. - - Above, we have referred to "Alice", but really what we mean is the - PeerConnection. Specifically, the PeerConnection will instantiate an - IFRAME with JS from the IDP and will use that IFRAME to communicate - with the IDP, authenticating with Alice's identity (e.g., cookie). - Similarly, Bob's PeerConnection instantiates an IFRAME to talk to the - IDP. - -A.3.6.4. Generic Identity Support - - I believe it's possible to build a generic interface between the - PeerConnection and any identity sub-module so that the PeerConnection - just gets pointed to the IDP (which the relying party either trusts - or not) and JS from the IDP provides the concrete interfaces. - However, I need to work out the details, so I'm not specifying this - yet. If it works, the previous two sections will just be examples. - Author's Address Eric Rescorla RTFM, Inc. 2064 Edgewood Drive Palo Alto, CA 94303 USA Phone: +1 650 678 2350 Email: ekr@rtfm.com