--- 1/draft-ietf-rtcweb-security-02.txt 2012-06-05 19:14:10.193341774 +0200 +++ 2/draft-ietf-rtcweb-security-03.txt 2012-06-05 19:14:10.237341819 +0200 @@ -1,18 +1,18 @@ RTC-Web E. Rescorla Internet-Draft RTFM, Inc. -Intended status: Standards Track March 12, 2012 -Expires: September 13, 2012 +Intended status: Standards Track June 05, 2012 +Expires: December 7, 2012 Security Considerations for RTC-Web - draft-ietf-rtcweb-security-02 + draft-ietf-rtcweb-security-03 Abstract The Real-Time Communications on the Web (RTC-Web) working group is tasked with standardizing protocols for real-time communications between Web browsers. The major use cases for RTC-Web technology are real-time audio and/or video calls, Web conferencing, and direct data transfer. Unlike most conventional real-time systems (e.g., SIP- based soft phones) RTC-Web communications are directly controlled by some Web server, which poses new security challenges. For instance, @@ -42,21 +42,21 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on September 13, 2012. + This Internet-Draft will expire on December 7, 2012. Copyright Notice Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents @@ -81,44 +81,45 @@ Table of Contents 1. Introduction . . 
. . . . . . . . . . . . . . . . . . . . . . . 4 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 3. The Browser Threat Model . . . . . . . . . . . . . . . . . . . 5 3.1. Access to Local Resources . . . . . . . . . . . . . . . . 6 3.2. Same Origin Policy . . . . . . . . . . . . . . . . . . . . 6 3.3. Bypassing SOP: CORS, WebSockets, and consent to communicate . . . . . . . . . . . . . . . . . . . . . . . 7 4. Security for RTC-Web Applications . . . . . . . . . . . . . . 7 - 4.1. Access to Local Devices . . . . . . . . . . . . . . . . . 8 + 4.1. Access to Local Devices . . . . . . . . . . . . . . . . . 7 4.1.1. Calling Scenarios and User Expectations . . . . . . . 8 - 4.1.1.1. Dedicated Calling Services . . . . . . . . . . . . 8 + 4.1.1.1. Dedicated Calling Services . . . . . . . . . . . . 9 4.1.1.2. Calling the Site You're On . . . . . . . . . . . . 9 - 4.1.1.3. Calling to an Ad Target . . . . . . . . . . . . . 9 + 4.1.1.3. Calling to an Ad Target . . . . . . . . . . . . . 10 4.1.2. Origin-Based Security . . . . . . . . . . . . . . . . 10 - 4.1.3. Security Properties of the Calling Page . . . . . . . 11 + 4.1.3. Security Properties of the Calling Page . . . . . . . 12 4.2. Communications Consent Verification . . . . . . . . . . . 12 - 4.2.1. ICE . . . . . . . . . . . . . . . . . . . . . . . . . 12 + 4.2.1. ICE . . . . . . . . . . . . . . . . . . . . . . . . . 13 4.2.2. Masking . . . . . . . . . . . . . . . . . . . . . . . 13 - 4.2.3. Backward Compatibility . . . . . . . . . . . . . . . . 13 - 4.2.4. IP Location Privacy . . . . . . . . . . . . . . . . . 14 - 4.3. Communications Security . . . . . . . . . . . . . . . . . 14 - 4.3.1. Protecting Against Retrospective Compromise . . . . . 15 - 4.3.2. Protecting Against During-Call Attack . . . . . . . . 16 - 4.3.2.1. Key Continuity . . . . . . . . . . . . . . . . . . 16 - 4.3.2.2. Short Authentication Strings . . . . . . . . . . . 17 - 4.3.2.3. Third Party Identity . . . . . . . . . . . . . . . 18 - 5. 
Security Considerations . . . . . . . . . . . . . . . . . . . 18 - 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18 - 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 19 - 7.1. Normative References . . . . . . . . . . . . . . . . . . . 19 - 7.2. Informative References . . . . . . . . . . . . . . . . . . 19 - Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 21 + 4.2.3. Backward Compatibility . . . . . . . . . . . . . . . . 14 + 4.2.4. IP Location Privacy . . . . . . . . . . . . . . . . . 15 + 4.3. Communications Security . . . . . . . . . . . . . . . . . 15 + 4.3.1. Protecting Against Retrospective Compromise . . . . . 16 + 4.3.2. Protecting Against During-Call Attack . . . . . . . . 17 + 4.3.2.1. Key Continuity . . . . . . . . . . . . . . . . . . 17 + 4.3.2.2. Short Authentication Strings . . . . . . . . . . . 18 + 4.3.2.3. Third Party Identity . . . . . . . . . . . . . . . 19 + 4.3.2.4. Page Access to Media . . . . . . . . . . . . . . . 19 + 5. Security Considerations . . . . . . . . . . . . . . . . . . . 20 + 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 20 + 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20 + 7.1. Normative References . . . . . . . . . . . . . . . . . . . 20 + 7.2. Informative References . . . . . . . . . . . . . . . . . . 20 + Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 22 1. Introduction The Real-Time Communications on the Web (RTC-Web) working group is tasked with standardizing protocols for real-time communications between Web browsers. The major use cases for RTC-Web technology are real-time audio and/or video calls, Web conferencing, and direct data transfer. Unlike most conventional real-time systems, (e.g., SIP- based[RFC3261] soft phones) RTC-Web communications are directly controlled by some Web server. A simple case is shown below. 
@@ -139,29 +140,30 @@ | | Media | | | Browser |<---------->| Browser | | | | | +-----------+ +-----------+ Figure 1: A simple RTC-Web system In the system shown in Figure 1, Alice and Bob both have RTC-Web enabled browsers and they visit some Web server which operates a calling service. Each of their browsers exposes standardized - JavaScript calling APIs which are used by the Web server to set up a - call between Alice and Bob. While this system is topologically - similar to a conventional SIP-based system (with the Web server - acting as the signaling service and browsers acting as softphones), - control has moved to the central Web server; the browser simply - provides API points that are used by the calling service. As with - any Web application, the Web server can move logic between the server - and JavaScript in the browser, but regardless of where the code is - executing, it is ultimately under control of the server. + JavaScript calling APIs (implemented as browser built-ins) which + are used by the Web server to set up a call between Alice and Bob. + While this system is topologically similar to a conventional SIP- + based system (with the Web server acting as the signaling service and + browsers acting as softphones), control has moved to the central Web + server; the browser simply provides API points that are used by the + calling service. As with any Web application, the Web server can + move logic between the server and JavaScript in the browser, but + regardless of where the code is executing, it is ultimately under + control of the server. It should be immediately apparent that this type of system poses new security challenges beyond those of a conventional VoIP system. In particular, it needs to contend with malicious calling services. 
For example, if the calling service can cause the browser to make a call at any time to any callee of its choice, then this facility can be used to bug a user's computer without their knowledge, simply by placing a call to some recording service. More subtly, if the exposed APIs allow the server to instruct the browser to send arbitrary content, then they can be used to bypass firewalls or mount @@ -203,27 +205,25 @@ In this model, then, the browser acts as a TRUSTED COMPUTING BASE (TCB) both from the user's perspective and to some extent from the server's. While HTML and JS provided by the server can cause the browser to execute a variety of actions, those scripts operate in a sandbox that isolates them both from the user's computer and from each other, as detailed below. Conventionally, we refer to either WEB ATTACKERS, who are able to induce you to visit their sites but do not control the network, and NETWORK ATTACKERS, who are able to control your network. Network - attackers correspond to the [RFC3552] "Internet Threat Model". In - general, it is desirable to build a system which is secure against - both kinds of attackers, but realistically many sites do not run - HTTPS [RFC2818] and so our ability to defend against network - attackers is necessarily somewhat limited. Most of the rest of this - section is devoted to web attackers, with the assumption that - protection against network attackers is provided by running HTTPS. + attackers correspond to the [RFC3552] "Internet Threat Model". Note + that for HTTP traffic, a network attacker is also a Web attacker, + since it can inject traffic as if it were any non-HTTPS Web site. + Thus, when analyzing HTTP connections, we must assume that traffic is + going to the attacker. 3.1. Access to Local Resources While the browser has access to local resources such as keying material, files, the camera and the microphone, it strictly limits or forbids web servers from accessing those same resources. 
For instance, while it is possible to produce an HTML form which will allow file upload, a script cannot do so without user consent and in fact cannot even suggest a specific file (e.g., /etc/passwd); the user must explicitly select the file and consent to its upload. @@ -301,34 +302,56 @@ 4.1. Access to Local Devices As discussed in Section 1, allowing arbitrary sites to initiate calls violates the core Web security guarantee; without some access restrictions on local devices, any malicious site could simply bug a user. At minimum, then, it MUST NOT be possible for arbitrary sites to initiate calls to arbitrary locations without user consent. This immediately raises the question, however, of what should be the scope of user consent. - For the rest of this discussion we assume that the user is somehow - going to grant consent to some entity (e.g., a social networking - site) to initiate a call on his behalf. This consent may be limited - to a single call or may be a general consent. In order for the user - to make an intelligent decision about whether to allow a call (and - hence his camera and microphone input to be routed somewhere), he - must understand either who is requesting access, where the media is - going, or both. So, for instance, one might imagine that at the time - access to camera and microphone is requested, the user is shown a - dialog that says "site X has requested access to camera and - microphone, yes or no" (though note that this type of in-flow - interface violates one of the guidelines in Section 3). The user's - decision will of course be based on his opinion of Site X. However, - as discussed below, this is a complicated concept. + In order for the user to make an intelligent decision about whether + to allow a call (and hence his camera and microphone input to be + routed somewhere), he must understand either who is requesting + access, where the media is going, or both. 
As detailed below, there + are two basic conceptual models: + + You are sending your media to entity A because you want to talk to + Entity A (e.g., your mother). + Entity A (e.g., a calling service) asks to access the user's + devices with the assurance that it will transfer the media to + entity B (e.g., your mother). + + In either case, identity is at the heart of any consent decision. + Moreover, identity is all that the browser can meaningfully enforce; + if you are calling A, A can simply forward the media to C. Similarly, + if you authorize A to place a call to B, A can call C instead. In + either case, all the browser is able to do is verify and check + authorization for whoever is controlling where the media goes. The + target of the media can of course advertise a security/privacy + policy, but this is not something that the browser can enforce. Even + so, there are a variety of different consent scenarios that motivate + different technical consent mechanisms. We discuss these mechanisms + in the sections below. + + It's important to understand that consent to access local devices is + largely orthogonal to consent to transmit various kinds of data over + the network (see Section 4.2). Consent for device access is largely a + matter of protecting the user's privacy from malicious sites. By + contrast, consent to send network traffic is about preventing the + user's browser from being used to attack its local network. Thus, we + need to ensure communications consent even if the site is not able to + access the camera and microphone at all (hence WebSockets's consent + mechanism) and similarly we need to be concerned with the site + accessing the user's camera and microphone even if the data is to be + sent back to the site via conventional HTTP-based network mechanisms + such as HTTP POST. 4.1.1. 
Calling Scenarios and User Expectations While a large number of possible calling scenarios are possible, the scenarios discussed in this section illustrate many of the difficulties of identifying the relevant scope of consent. 4.1.1.1. Dedicated Calling Services The first scenario we consider is a dedicated calling service. In @@ -337,25 +360,31 @@ to give permission for each call that the user will want to give the calling service long-term access to the camera and microphone. This is a natural fit for a long-term consent mechanism (e.g., installing an app store "application" to indicate permission for the calling service.) A variant of the dedicated calling service is a gaming site (e.g., a poker site) which hosts a dedicated calling service to allow players to call each other. With any kind of service where the user may use the same service to talk to many different people, there is a question about whether the - user can know who they are talking to. In general, this is difficult - as most of the user interface is presented by the calling site. - - However, communications security mechanisms can be used to give some - assurance, as described in Section 4.3.2. + user can know who they are talking to. If I grant permission to + calling service A to make calls on my behalf, then I am implicitly + granting it permission to bug my computer whenever it wants. This + suggests another consent model in which a site is authorized to make + calls but only to certain target entities (identified via media-plane + cryptographic mechanisms as described in Section 4.3.2 and especially + Section 4.3.2.3.) Note that the question of consent here is related + to but distinct from the question of peer identity: I might be + willing to allow a calling site to in general initiate calls on my + behalf but still have some calls via that site where I can be sure + that the site is not listening in. 4.1.1.2. 
Calling the Site You're On Another simple scenario is calling the site you're actually visiting. The paradigmatic case here is the "click here to talk to a representative" windows that appear on many shopping sites. In this case, the user's expectation is that they are calling the site they're actually visiting. However, it is unlikely that they want to provide a general consent to such a site; just because I want some information on a car doesn't mean that I want the car manufacturer to @@ -387,25 +416,29 @@ At minimum, then, whatever consent dialog is shown needs to allow the user to have some idea of the organization that they are actually calling. However, because the user also has some relationship with the hosting site, it is also arguable that the hosting site should be allowed to express an opinion (e.g., to be able to allow or forbid a call) since a bad experience with an advertiser reflects negatively on the hosting site [this idea was suggested by Adam Barth]. However, this obviously presents a privacy challenge, as sites which host - advertisements often learn very little about whether individual users - clicked through to the ads, or even which ads were presented. + advertisements in IFRAMEs often learn very little about whether + individual users clicked through to the ads, or even which ads were + presented. 4.1.2. Origin-Based Security + Now that we have seen another use case, we can start to reason about + the security requirements. + As discussed in Section 3.2, the basic unit of Web sandboxing is the origin, and so it is natural to scope consent to origin. Specifically, a script from origin A MUST only be allowed to initiate communications (and hence to access camera and microphone) if the user has specifically authorized access for that origin. It is of course technically possible to have coarser-scoped permissions, but because the Web model is scoped to origin, this creates a difficult mismatch. Arguably, origin is not fine-grained enough. 
Consider the situation @@ -453,21 +486,22 @@ Web origin. [I-D.ietf-rtcweb-security-arch] and [I-D.rescorla-rtcweb-generic-idp] describe mechanisms which facilitate this sort of consent. Another case where media-level cryptographic identity makes sense is when a user really does not trust the calling site. For instance, I might be worried that the calling service will attempt to bug my computer, but I also want to be able to conveniently call my friends. If consent is tied to particular communications endpoints, then my risk is limited. Naturally, it is somewhat challenging to design UI - primitives which express this sort of policy. + primitives which express this sort of policy. The problem becomes + even more challenging in multi-user calling cases. 4.1.3. Security Properties of the Calling Page Origin-based security is intended to secure against web attackers. However, we must also consider the case of network attackers. Consider the case where I have granted permission to a calling service by an origin that has the HTTP scheme, e.g., http://calling-service.example.com. If I ever use my computer on an unsecured network (e.g., a hotspot or if my own home wireless network is insecure), and browse any HTTP site, then an attacker can bug my @@ -533,20 +567,23 @@ secret. Those credentials are known to the Web application, but would need to also be known and used by the STUN-receiving element to be useful. There also needs to be some mechanism for the browser to verify that the target of the traffic continues to wish to receive it. Obviously, some ICE-based mechanism will work here, but it has been observed that because ICE keepalives are indications, they will not work here, so some other mechanism is needed. + [[ OPEN ISSUE: Do we need some way of verifying the expected traffic + rate, not just consent to receive traffic at all?]] + 4.2.2. 
Masking Once consent is verified, there still is some concern about misinterpretation attacks as described by Huang et al.[huang-w2sp]. As long as communication is limited to UDP, then this risk is probably limited, thus masking is not required for UDP. I.e., once communications consent has been verified, it is most likely safe to allow the implementation to send arbitrary UDP traffic to the chosen destination, provided that the STUN keepalives continue to succeed. In particular, this is true for the data channel if DTLS is used @@ -608,21 +645,21 @@ consent, in order to avoid attacks where two people briefly share an IP (e.g., behind a NAT in an Internet cafe) and the attacker arranges for a large, unstoppable, traffic flow to the network and then leaves. The appropriate technologies here are fairly similar to those for initial consent, though are perhaps weaker since the threat is less severe. 4.2.4. IP Location Privacy Note that as soon as the callee sends their ICE candidates, the - callee learns the callee's IP addresses. The callee's server + caller learns the callee's IP addresses. The callee's server reflexive address reveals a lot of information about the callee's location. In order to avoid tracking, implementations may wish to suppress the start of ICE negotiation until the callee has answered. In addition, either side may wish to hide their location entirely by forcing all traffic through a TURN server. 4.3. Communications Security Finally, we consider a problem familiar from the SIP world: communications security. For obvious reasons, it MUST be possible @@ -788,47 +826,87 @@ avoid them needing to check it on every call. However, this is problematic for reasons indicated in Section 4.3.2.1. 
In principle it is of course possible to render a different UI element to indicate that calls are using an unauthenticated set of keying material (recall that the attacker can just present a slightly different name so that the attack shows the same UI as a call to a new device or to someone you haven't called before) but as a practical matter, users simply ignore such indicators even in the rather more dire case of mixed content warnings. + Despite these difficulties, users should be afforded an opportunity + to view an SAS or fingerprint where available, as it is the only + mechanism for the user to directly verify the peer's identity without + trusting any third party identity system (assuming, of course, that + they trust their own software). + 4.3.2.3. Third Party Identity The conventional approach to providing communications identity has of course been to have some third party identity system (e.g., PKI) to authenticate the endpoints. Such mechanisms have proven to be too cumbersome for use by typical users (and nearly too cumbersome for administrators). However, a new generation of Web-based identity providers (BrowserID, Federated Google Login, Facebook Connect, OAuth, OpenID, WebFinger), have recently been developed and use Web technologies to provide lightweight (from the user's perspective) third-party authenticated transactions. It is possible (see [I-D.rescorla-rtcweb-generic-idp]) to use systems of this type to authenticate RTCWEB calls, linking them to existing user notions of - identity (e.g., Facebook adjacencies). Calls which are authenticated - in this fashion are naturally resistant even to active MITM attack by - the calling site. + identity (e.g., Facebook adjacencies). Specifically, the third-party + identity system is used to bind the user's identity to cryptographic + keying material which is then used to authenticate the calling + endpoints. 
Calls which are authenticated in this fashion are + naturally resistant even to active MITM attack by the calling site. + + Note that there is one special case in which PKI-style certificates + do provide a practical solution: calls from end-users to large + sites. For instance, if you are making a call to Amazon.com, then + Amazon can easily get a certificate to authenticate their media + traffic, just as they get one to authenticate their Web traffic. + This does not provide additional security value in cases in which the + calling site and the media peer are one and the same, but might be + useful in cases in which third parties (e.g., ad networks or + retailers) arrange for calls but do not participate in them. + +4.3.2.4. Page Access to Media + + Verifying the identity of the far media endpoint is a necessary but + not sufficient condition for providing media security. In RTCWEB, + media flows are rendered into HTML5 MediaStreams which can be + manipulated by the calling site. Obviously, if the site can modify + or view the media, then the user is not getting the level of + assurance they would expect from being able to authenticate their + peer. In many cases, this is acceptable because the user values + site-based special effects over complete security from the site. + + However, there are also cases where users wish to know that the site + cannot interfere. In order to facilitate that, it will be necessary + to provide features whereby the site can verifiably give up access to + the media streams. This verification must be possible both from the + local side and the remote side. I.e., I must be able to verify that + the person I am calling has engaged a secure media mode. In order to + achieve this it will be necessary to cryptographically bind an + indication of the local media access policy into the cryptographic + authentication procedures detailed in the previous sections. 5. Security Considerations This entire document is about security. 6. 
Acknowledgements - Bernard Aboba, Harald Alvestrand, Cullen Jennings, Hadriel Kaplan (S - 4.2.1), Matthew Kaufman, Magnus Westerlund. + Bernard Aboba, Harald Alvestrand, Dan Druta, Cullen Jennings, Hadriel + Kaplan (S 4.2.1), Matthew Kaufman, Martin Thomson, Magnus Westerlund. 7. References + 7.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 7.2. Informative References [CORS] van Kesteren, A., "Cross-Origin Resource Sharing". [I-D.ietf-rtcweb-security-arch] @@ -826,32 +904,32 @@ [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 7.2. Informative References [CORS] van Kesteren, A., "Cross-Origin Resource Sharing". [I-D.ietf-rtcweb-security-arch] Rescorla, E., "RTCWEB Security Architecture", - draft-ietf-rtcweb-security-arch-00 (work in progress), - January 2012. + draft-ietf-rtcweb-security-arch-01 (work in progress), + March 2012. [I-D.kaufman-rtcweb-security-ui] Kaufman, M., "Client Security User Interface Requirements for RTCWEB", draft-kaufman-rtcweb-security-ui-00 (work in progress), June 2011. [I-D.rescorla-rtcweb-generic-idp] Rescorla, E., "RTCWEB Generic Identity Provider - Interface", draft-rescorla-rtcweb-generic-idp-00 (work in - progress), January 2012. + Interface", draft-rescorla-rtcweb-generic-idp-01 (work in + progress), March 2012. [RFC2818] Rescorla, E., "HTTP Over TLS", RFC 2818, May 2000. [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002. [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552,