Anatomy of a zero-knowledge web application

August 24, 2007

When we launched our online password manager, we dubbed it the first example of a zero-knowledge web application. We simply meant that Clipperz knows nothing about its users and their data. It was a simplistic and inaccurate definition: the zero-knowledge paradigm needs to be better defined. Our fault.

The original idea aimed to leverage the internet to manage personal information, especially sensitive information. And without disclosing any information to the server providing the service!

The browser is a ubiquitous and familiar tool, and we wanted to use it as a gateway to the online vault containing users’ most precious data. Giulio Cesare was rather skeptical: he had been developing web applications for over six years and he knew how much data it is possible to collect about users.

Nonetheless, we focused for months on designing a sound architecture for a new breed of “privacy aware” web applications. The basic idea was to deliver a no-trust-needed service, where users had the ability to inspect and verify anything running in their browser. We had to shift attention away from trusting us and let users focus on trusting the application.

It was fun and frustrating at the same time. Privacy and security constraints were popping up everywhere. Despite that, we grew convinced that many useful web applications can (and should) be developed by applying the following zero-knowledge methodology.

1. Host-proof hosting

In order to avoid storing readable data on the server, a zero-knowledge web application should encrypt and decrypt the data inside the browser. A neat idea, though not a new one. Richard Schwartz, Michael Mahemoff and others introduced this concept under the name of host-proof hosting in the first half of 2005, a few months before we started the Clipperz blog and project. Here is their definition from the AjaxPatterns wiki:

Host sensitive data in encrypted form, so that clients can only access and manipulate it by providing a passphrase which is never transmitted to the server. The server is limited to persisting and retrieving whatever encrypted data the browser sends it, and never actually accesses the sensitive data in its plain form. All encryption and decryption takes place inside the browser itself.

Eventually Ajax made pure browser-based cryptography a reality. Javascript implementations of crypto functions have been around for years, but Javascript alone can’t remember data between page loads. This is an annoying issue, since it forces the user to re-enter the passphrase on every page. An application developed with Ajax techniques, on the other hand, tends to avoid page transitions altogether, so the key stays in memory for the whole session and remains available for crypto operations.
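The host-proof idea can be sketched in a few lines. What follows is only an illustration, not the Clipperz implementation: it runs under Node.js with the built-in crypto module so that it can be tried standalone, and the choice of PBKDF2 and AES-256-GCM, the iteration count and the function names (deriveKey, encryptRecord, decryptRecord) are all assumptions made for the example. The point is that the passphrase and the derived key stay on the client, while the object returned by encryptRecord is all a server would ever be asked to store.

```javascript
const { pbkdf2Sync, randomBytes, createCipheriv, createDecipheriv } = require("node:crypto");

// Derive a 256-bit key from the passphrase. Neither the passphrase nor the key
// is ever sent to the server. (PBKDF2 and the iteration count are assumptions.)
function deriveKey(passphrase, salt) {
  return pbkdf2Sync(passphrase, salt, 100000, 32, "sha256");
}

// Encrypt one record. The returned object holds only the salt, the IV,
// the authentication tag and the ciphertext: this is what the server persists.
function encryptRecord(passphrase, plaintext) {
  const salt = randomBytes(16);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", deriveKey(passphrase, salt), iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("hex"),
    iv: iv.toString("hex"),
    tag: cipher.getAuthTag().toString("hex"),
    data: data.toString("hex"),
  };
}

// Decrypt a record fetched back from the server.
// A wrong passphrase makes the authentication check in final() throw.
function decryptRecord(passphrase, blob) {
  const key = deriveKey(passphrase, Buffer.from(blob.salt, "hex"));
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(blob.iv, "hex"));
  decipher.setAuthTag(Buffer.from(blob.tag, "hex"));
  return Buffer.concat([
    decipher.update(Buffer.from(blob.data, "hex")),
    decipher.final(),
  ]).toString("utf8");
}
```

In an Ajax application the passphrase would be typed once and kept in a variable for the life of the page, so encryptRecord and decryptRecord could be called repeatedly without asking for it again.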

2. Hide nothing

A zero-knowledge application should be trusted for itself and not because of the reputation of its developers. Therefore full access to the source code of the application is required.

This does not imply that a zero-knowledge application should be free or open source. As an example, Clipperz was originally released under a reference license meant to allow security code reviews while the core crypto libraries were released under a BSD license.

UPDATE 1
Clipperz code is now available under an AGPL license. See the Clipperz Community Edition project. Read more here.

UPDATE 2
The Clipperz Crypto Library has been renamed the Javascript Crypto Library.

2.1 Code inspection

Developers of zero-knowledge web applications must provide the same exact files that are loaded into the browser when accessing the application.

Usually these files are quite difficult, almost impossible, to work with: spaces and comments have been removed and variables have been renamed. To make life easier for code reviewers, it’s recommended to maintain the source files in their original form and provide instructions on how to derive the compressed and optimized versions (see the Clipperz build environment).

2.2 Code integrity

Performing a code security review is a complex matter, and it’s quite likely that most users will rely on reviews performed by others.

However, any zero-knowledge web application should provide an easy way to verify that the application downloaded by the browser is the same application built from the code available for inspection.

Ideally we envision a solution that is completely browser based and relies on a redundant and distributed network of servers not associated with the application provider. Each third party server hosts the fingerprint of the zero-knowledge web application, i.e. the checksum of its source code.
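One way such a check might look is sketched below. It is purely hypothetical: the post does not specify how the third party servers would be queried or how many of them must agree, so the quorum rule, the use of SHA-256 and the names fingerprint and isTrusted are assumptions made for illustration. The fingerprints in the reported list stand for values already fetched from independent servers.

```javascript
const { createHash } = require("node:crypto");

// Fingerprint of the downloaded application (SHA-256 is an assumption).
function fingerprint(source) {
  return createHash("sha256").update(source).digest("hex");
}

// reported: fingerprints obtained from independent third-party servers (hypothetical).
// quorum: how many of them must match the local fingerprint (hypothetical rule).
function isTrusted(source, reported, quorum) {
  const local = fingerprint(source);
  const matches = reported.filter((fp) => fp === local).length;
  return matches >= quorum;
}
```

With a rule like this, an application provider that silently changed the code served to a user would also have to compromise enough independent servers to reach the quorum.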

At the moment, Clipperz is providing a less than ideal solution.

  • The whole application is condensed into a single file containing all the resources needed to run the application in the browser: html, css, javascript and also the images (except on IE).

  • The Clipperz website hosts both MD5 and SHA1 checksums of the above file along with the instructions on how to compute the checksum on your local machine.

(Any proposal to improve the above scheme is welcome!)
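The local side of the checksum comparison described above can be sketched as follows. This is an illustration for Node.js, not the instructions published on the Clipperz website; the function names checksums and matchesPublished and the file name in the usage comment are made up for the example.

```javascript
const { createHash } = require("node:crypto");

// Compute the MD5 and SHA-1 checksums of the downloaded single-file application.
function checksums(bytes) {
  return {
    md5: createHash("md5").update(bytes).digest("hex"),
    sha1: createHash("sha1").update(bytes).digest("hex"),
  };
}

// Compare the local checksums with the ones published by the provider.
// Both must match; hex digests are compared case-insensitively.
function matchesPublished(bytes, published) {
  const local = checksums(bytes);
  return (
    local.md5 === published.md5.toLowerCase() &&
    local.sha1 === published.sha1.toLowerCase()
  );
}

// Usage (hypothetical file name):
//   const bytes = require("node:fs").readFileSync("application.html");
//   matchesPublished(bytes, { md5: "...", sha1: "..." });
```

Note that the published checksums come from the same website that serves the application, which is exactly why the scheme is less than ideal.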

3. Prevent code changes

Zero-knowledge applications are basically huge Javascript programs running in the browser. Therefore it’s of the utmost importance to implement the necessary measures to stop any attempt to modify the code executed by the browser.

3.1 Download before login

The whole source code must be downloaded to the browser before the user signs in.

This is an essential requirement! If additional chunks of source code were downloaded from the server after the login phase, the user wouldn’t have any chance to verify in advance the security of the web application. Therefore not a single line of Javascript code should be moved to the browser after a successful user authentication.

3.2 Avoid code injection

Since Javascript is a very powerful and dynamic language, the borders between data and code are quite blurred.

In order to reassure a user that the web application he logged into won’t morph into a malicious program, a true zero-knowledge application should adopt the following measures:

  • Never, ever, use the “eval” function on data loaded from the server
    The eval function offers great flexibility since it’s able to “run” any string. But if a web application uses it to process data provided by the server, then any kind of code could be easily injected, thus hijacking the original application.

  • Limit the use of the “document.write” function
    Keep its use to the bare minimum, allowing for closer inspection when it is really necessary to use it.

  • Never, ever, load any html content from the server
    Loading ‘html’ chunks from the server is another easy way to subvert the behavior of the application. Just imagine what would happen if the server could push this little ‘html’ snippet: <script src="/hijack.js"></script>

The scary part is that this snippet could be hidden anywhere, even attached to a legitimate response. For this reason, all the html elements used by a zero-knowledge application must be loaded together with the source code before the sign-in phase.
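The measures above boil down to treating everything the server sends as data, never as code or markup. A minimal sketch of that discipline follows; the function names parseServerData and escapeHtml are hypothetical, and the code is not taken from Clipperz. It relies on JSON.parse, which only builds values and never executes its input, and on escaping any server-provided string before it could end up inside markup.

```javascript
// Parse a server response as pure data. Unlike eval, JSON.parse never runs
// the string: anything that is not valid JSON throws a SyntaxError.
function parseServerData(responseText) {
  const value = JSON.parse(responseText);
  if (value === null || typeof value !== "object") {
    throw new TypeError("expected a JSON object or array");
  }
  return value;
}

// Escape a server-provided string so that it is displayed as text and never
// interpreted as html. In a browser, assigning to textContent has the same effect.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

With helpers like these, a response that carries script text either fails to parse or shows up on screen as harmless characters, and the page structure itself comes only from the files downloaded before sign-in.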

4. Learn nothing

There are countless design decisions that could disclose information to the server. Sometimes data leaks are easy to detect; sometimes they are very subtle and dangerous. A zero-knowledge application should take the utmost care to work with as little information as possible. It’s easy to fall for a fancy new feature that can destroy the whole security architecture …

Consider the protocol behind user authentication. The following paragraph clearly explains why a zero-knowledge application should adopt the SRP protocol or an equivalent verifier-based protocol.

While any reasonably secure authentication protocol is expected not to leak any information about the password to eavesdroppers, protocols classified as zero-knowledge do not even leak any information about the password to the legitimate host (except the fact that the party at the other end really does know it). This subset of verifier-based protocols is strong indeed, since the host never stores plaintext-equivalent information and is never given any such information during the course of authentication. (from srp.stanford.edu)

SRP is more complex and slower than traditional methods, but it’s perfect for achieving zero-knowledge! Moreover, it can be deployed without revealing either the password or the username to the host (as we do in the Clipperz password manager).
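To make the verifier idea concrete, here is a simplified sketch of an SRP-6a style exchange. It is not the Clipperz code and it is not secure as written: the group is a toy one (the prime 2^127 - 1 with g = 3) chosen only to keep the example short, where a real deployment would use a large standardized group such as those listed in RFC 5054. The hashing helper H, the way values are concatenated and the function names register and login are assumptions made for the example, and both sides are simulated in a single function.

```javascript
const { createHash, randomBytes } = require("node:crypto");

// Toy group, for illustration only: far too small for real use.
const N = (1n << 127n) - 1n;
const g = 3n;

// Hash any number of values into a BigInt (the concatenation format is an assumption).
const H = (...parts) =>
  BigInt("0x" + createHash("sha256").update(parts.map(String).join(":")).digest("hex"));

// Square-and-multiply modular exponentiation.
function modPow(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

const randomBigInt = () => BigInt("0x" + randomBytes(32).toString("hex"));
const k = H(N, g); // SRP-6a multiplier

// Registration, run by the client: the server receives only salt and verifier.
function register(username, password) {
  const salt = randomBytes(16).toString("hex");
  const x = H(salt, H(username, password)); // never leaves the client
  return { salt, verifier: modPow(g, x, N) };
}

// One login round, with client and server steps interleaved.
// Returns true when both sides derive the same session key.
function login(username, password, record) {
  const a = randomBigInt();                                   // client secret
  const A = modPow(g, a, N);                                  // client -> server
  const b = randomBigInt();                                   // server secret
  const B = (k * record.verifier + modPow(g, b, N)) % N;      // server -> client
  const u = H(A, B);                                          // computed by both

  // Client side: uses the password, which the server never sees.
  const x = H(record.salt, H(username, password));
  const base = (((B - k * modPow(g, x, N)) % N) + N) % N;
  const clientKey = H(modPow(base, a + u * x, N));

  // Server side: uses only the stored verifier.
  const serverKey = H(modPow((A * modPow(record.verifier, u, N)) % N, b, N));
  return clientKey === serverKey;
}
```

The server works only with the salt and the verifier, so a stolen database does not hand out anything that can be replayed as a password. A real implementation would also exchange proofs of the session key instead of comparing the keys directly.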

As a consequence of the “learn nothing” mantra, every zero-knowledge application should be completely anonymous, or at least it should make it impossible to relate the real name or email of a user to his data.