Security Usability Fundamentals
An important consideration when you’re building an application is the usability of the
security features that you’ll be employing. Security experts frequently lament that
security has been bolted onto applications as an afterthought; however, the security community has committed the exact same sin in reverse, placing usability considerations a distant second behind security, if they were considered at all. As a result, we spent the 1990s building and deploying security that wasn't really needed, and now that phishing attacks are widespread, viruses and worms are running rampant, and the security actually is needed, we're finding that no-one can use it.
To understand the problem, it’s necessary to go back to the basic definition of
functionality and security. An application exhibits functionality if things that are
supposed to happen, do happen. Similarly, an application exhibits security if things
that aren't supposed to happen, don't happen. Security developers are interested in the latter; marketers and management tend to be more interested in the former.
Ensuring that things that aren’t supposed to happen don’t happen can be approached
from both the application side and from the user side. From the application side, the
application should behave in a safe manner, defaulting to behaviour that protects the
user from harm. From the user side, the application should behave in a way that meets the user's expectations of a safe experience. The following sections look
at some of the issues that face developers trying to create a user interface for a
security application.
Security (Un-)Usability
Before you start thinking about potential features of your security user interface, you
first need to consider the environment into which it'll be deployed. Now that we have 10-15 years of experience in deploying (or at least trying to deploy) Internet security, we can see, both from hindsight and because in the last few years people have actually started testing the usability of security applications, that a number of mechanisms that were expected to Solve The Problem don't really work in practice [1]. The idea behind
security technology is to translate a hard problem (secure/safe communication and
storage) into a simpler problem, not just to shift the complexity from one layer to
another. This is an example of Fundamental Truth No.6 of the Twelve Networking
Truths, “It is easier to move a problem around than it is to solve it” [2]. Security user
interfaces are usually driven by the underlying technology, which means that they
often just shift the problem from the technical level to the human level. Some of the
most awkward technologies not only shift the complexity but add an extra level of
complexity of their own (IPsec and PKI spring to mind).
Figure 1: Blaming the user for security unusability
The major lesson that we’ve learned from the history of security (un-)usability is that
technical solutions like PKI and access control don’t align too well with usability
conceptual models. As a result, calling in the usability people after the framework of
the application's user interface has been set in concrete by purely technology-driven considerations is doomed to failure, since the user interface will be forced to conform to the straitjacket constraints imposed by the security technology
rather than being able to exploit the full benefits of years of usability research and
experience. Blaming security problems on the user when they’re actually caused by
the user interface design (Figure 1) is equally ineffective.
This chapter covers some of the issues that affect security user interfaces, and looks at
various problems that you’ll have to deal with if you want to create an effective user
interface for your security application.
Theoretical vs. Effective Security
There can be a significant difference between theoretical and effective security. In
theory, we should all be using smart cards and PKI for authentication. However,
these measures are so painful to deploy and use that they’re almost never employed,
making them far less
effectively secure than basic usernames and passwords. Security
experts tend to focus exclusively on the measures that provide the best (theoretical)
security, but often these measures provide very little effective security because they
end up being misused, or turned off, or bypassed.
Worse yet, when they focus only on the theoretically perfect measures, they don’t
even try to get lesser security measures right. For example, passwords are widely
decried as being insecure, but this is mostly because security protocol designers have
chosen to make them insecure. Both SSL and SSH, the two largest users of
passwords for authentication, will connect to anything claiming to be a server and
then hand over the password in plaintext after the handshake has completed. No
attempt is made to provide even the most trivial protection through some form of
challenge/response protocol, because everyone knows that passwords are insecure
and so it isn’t worth bothering to try and protect them.
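To make the missed opportunity concrete, here is a minimal sketch (in Python) of the kind of trivial challenge/response protection being talked about here. The function names and the choice of HMAC-SHA-256 are illustrative assumptions rather than anything SSL or SSH actually specifies; the point is simply that the client can prove knowledge of the password without ever transmitting it.

    import hashlib
    import hmac
    import os

    def make_challenge() -> bytes:
        # Server side: a fresh random challenge for this login attempt.
        return os.urandom(32)

    def prove_password(password: str, challenge: bytes) -> bytes:
        # Client side: answer the challenge without sending the password itself.
        return hmac.new(password.encode("utf-8"), challenge, hashlib.sha256).digest()

    def verify_proof(stored_password: str, challenge: bytes, proof: bytes) -> bool:
        # Server side: recompute the expected answer and compare in constant time.
        expected = hmac.new(stored_password.encode("utf-8"), challenge,
                            hashlib.sha256).digest()
        return hmac.compare_digest(expected, proof)

    # Usage: the server sends a challenge, the client returns a proof, the server checks it.
    challenge = make_challenge()
    proof = prove_password("hunter2", challenge)
    assert verify_proof("hunter2", challenge, proof)

Even this much means that a fake server learns only a single one-off MAC value rather than a reusable plaintext password (it can still mount an offline guessing attack against a weak password, which is why properly designed schemes go further, as discussed below).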
This problem is exemplified by the IPsec protocol, which after years of discussion
still doesn’t have any standardised way to authenticate users based on simple
mechanisms like one-time passwords or password-token cards. The IETF even
chartered a special working group, IPSRA (IPsec Remote Access), for this purpose.
The group’s milestone list calls for an IPsec “user access control mechanism
submitted for standards track” by March 2001, but six years later its sole output
remains a requirements document [3] and an expired draft. As the author of one
paper on effective engineering of authentication mechanisms points out, the design
assumption behind IPsec was “all password-based authentication is insecure; IPsec is
designed to be secure; therefore, you have to deploy a PKI for it” [4]. The result has
been a system so unworkable that both developers and users have resorted to doing
almost anything to bypass it, from using homebrew (and often insecure) “management tunnels” to communicate keys, to hand-carrying static keying material to IPsec endpoints, to avoiding IPsec altogether and using mechanisms like SSL-based
VPNs, which were never designed to be used for tunnelling IP traffic but are being
pressed into service because users have found that almost anything is preferable to
having to use IPsec (this has become so pressing that there’s now a standard for
transporting TLS over UDP to allow it to fill the gap that IPsec couldn’t, datagram
TLS or DTLS [5]).
More than ten years after SSL was introduced, support for a basic password-based
mutual authentication protocol was finally (reluctantly) added, although even then it was only under the guise of enabling use with low-powered devices that can't handle the preferred PKI-based authentication, and it led to prolonged arguments on the SSL developers list whenever the topic of allowing something other than certificates for user authentication came up [6]. SSH, a protocol specifically created to protect
passwords sent over the network, still leaves the recipient in possession of the plaintext password in its standard mode of authentication instead of performing some form of challenge-response authentication. This
practice, under the technical label of a tunnelled authentication protocol, is known to
be insecure [7][8][9] and is explicitly warned against in developer documentation like
Apple’s security user interface guidelines, which instruct developers to avoid
“handing [passwords] off to another program unless you can verify that the other
program will protect the data” [10], and yet both SSL and SSH persist in using it.
What’s required for proper password-based security for these types of protocols is a
cryptographic binding between the outer tunnel and the inner authentication protocol,
which TLS’ recently-added mutual authentication finally performs, but to date very
few TLS implementations support it.
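As a rough illustration of what such a binding looks like, the following sketch (a simplification under assumed names, not the actual TLS mechanism) mixes the tunnel's channel-binding value into the password proof, so that a proof captured or relayed by a man in the middle fails to verify over any other tunnel.

    import hashlib
    import hmac

    def bound_password_proof(password: str, username: str, tls_unique: bytes) -> bytes:
        # The MAC covers the channel-binding value of the outer tunnel, so the
        # proof only verifies over the very same TLS connection it was made on.
        msg = b"auth-binding|" + username.encode("utf-8") + b"|" + tls_unique
        return hmac.new(password.encode("utf-8"), msg, hashlib.sha256).digest()

    # In practice tls_unique would come from the TLS implementation, for example
    # Python's ssl.SSLSocket.get_channel_binding("tls-unique"); a man in the middle
    # relaying the authentication over his own TLS session to the real server sees
    # a different channel-binding value there, so the relayed proof doesn't verify.
    fake_tls_unique = b"\x01" * 12   # placeholder for the real channel-binding value
    proof = bound_password_proof("hunter2", "alice", fake_tls_unique)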
A nice example of the difference between theory and practice from the opposite point
of view is what its author describes as “the most ineffective CAPTCHA of all time”
[11]. Designed to protect his blog from comment spam, it requires submitters to type
the word “orange” into a text box when they provide a blog comment. This trivial
speed-bump, which would horrify any (non-pragmatist) security expert, has been
effective in stopping virtually all comment spam by changing the economic equation
for spammers, who can no longer auto-post blog spam as they can for unprotected or
monoculture-CAPTCHA protected blogs [12][13]. On paper it’s totally insecure, but
it works because spammers would have to expend manual effort to bypass it, and
keep expending effort when the author counters their move, which is exactly what
spam’s economic model doesn’t allow.
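For what it's worth, the entire mechanism amounts to something like the following hypothetical check (the field name and the rejection logic are assumptions used purely for illustration):

    def accept_comment(form: dict) -> bool:
        # Reject a blog comment unless the poster typed the site's magic word.
        # Trivially bypassed by a human, but generic spam bots don't special-case
        # one blog, which is what makes it economically effective.
        return form.get("magic_word", "").strip().lower() == "orange"

    assert accept_comment({"magic_word": "Orange", "body": "Nice post!"})
    assert not accept_comment({"body": "Buy cheap pills"})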
A lot of this problem arises from security’s origin in the government crypto
community. For cryptographers, the security must be perfect — anything less than
perfect security would be inconceivable. In the past this has led to all-or-nothing
attempts at implementing security such as the US DoD’s “C2 in ‘92” initiative (a
more modern form of this might be “PKI or Bust”), which resulted in nothing in ’92
or at any other date — the whole multilevel-secure (MLS) operating system push
could almost be regarded as a denial-of-service attack on security, since it largely
drained security funding in the 1980s and was a significant R&D distraction. As
security god Butler Lampson observed when he quoted Voltaire, “The best is the
enemy of the good” (“Le mieux est l’ennemi du bien”) — a product that offers
generally effective (but less than perfect) security will be panned by security experts,
who would prefer to see a theoretically perfect but practically unattainable or
unusable product instead [14].
Psychologists refer to this phenomenon as zero-risk bias, the fact that people would rather reduce one risk (no matter how small) to zero than achieve a proportionally much larger reduction in another risk that doesn't bring it down to zero [15]. Instead of reducing one risk from
90% to 10% they’ll concentrate on reducing another risk from 1% to 0%, yielding a
risk reduction of 1% instead of 80%. Zero-risk bias occurs because risk makes
people worry, and reducing it to zero means that they don’t have to worry about it any
more. Obviously this only works if you’re prepared to ignore other risks, which is
why the phenomenon counts as a psychological bias (philosophers, who see things in
more abstract terms, simply tag these things ‘fallacies’). An example of such a zero-
risk bias was the US’ total ban on carcinogenic food additives in the 1950s, which
increased the overall risk because (relatively) high-risk non-carcinogenic additives
were substituted for (relatively) low-risk carcinogenic ones. The bias ignored the fact
that many additives were potentially harmful and focused only on the single class of
carcinogenic additives.
The striving for impossibly perfect security comes about because usability has never
been a requirement put on those designing security protocols or setting security
policies. For example one analysis of a military cryptosystem design reports that “the
NSA designers focused almost exclusively on data confidentiality […] if that meant
that it was expensive, hard to use, and required extremely restrictive and awkward
policy, or if it might lock out legitimate users from time to time, then so be it” [16].
This type of approach to usability issues was summed up by an early paper on
security usability with the observation that “secure systems have a particularly rich
tradition of indifference to the user, whether the user is a security administrator, a
programmer, or an end user [...] Most research and development in secure systems
has strong roots in the military. People in the military are selected and trained to
follow rules and procedures precisely, no matter how onerous. This user training and
selection decreased the pressure on early systems to be user friendly” [17].
Systems such as this, designed and implemented in a vacuum, can fail
catastrophically when exposed to real-world considerations. As the report on the
military system discussed above goes on to say, “once the nascent system left the
NSA laboratories the emphasis on security above all changed dramatically. The
people who approved the final design were not security experts at all. They were the
Navy line officers who commanded the fleet. Their actions show that they were far
more concerned with data availability rather than data confidentiality [...] any ship or
station which became isolated by lack of key became an immediate, high-level issue
and prompted numerous and vigorous complaints. A key compromise, by contrast,
was a totally silent affair for the commander. Thus, commanders were prodded
toward approving very insecure systems”. A similar effect occurs with computer
security software that pushes critical security decisions into the user interface, where
users will find ways to work around the security because they don’t understand it and
it’s preventing them from doing their job.
The best security measures are ones that you can easily explain to users so that they
understand the risk and know how to respond appropriately. Don’t be afraid to use
simple but effective security measures, even if they’re not the theoretical best that’s
available. You should however be careful not to use effective (as opposed to
theoretically perfect) security as an excuse for weak security. Using weak or
homebrew encryption mechanisms when proven, industry-standard ones are available
isn’t effective security, it’s weak security. Using appropriately secured passwords
instead of PKI is justifiable, effective security (security researcher Simson Garfinkel
has termed this “The principle of good security now” [18]).
An example of the conflict between theoretical and effective security is illustrated by
what happens when we increase the usability of the security measures in an
application. Computer users are supported by a vast and mostly informal network of
friends, family, and neighbours (for home users) or office-mates and sysadmins (for
work users) who are frequently given passwords and access codes in order to help the
user with a problem. The theoretical security model says that once keys and similar
secrets are in the hands of the user they’ll take perfect care of them and protect them
in an appropriate manner. However in practice the application interface to the keys is
so hard to use that many users rely on help from others, who then need to be given
access to the keys to perform their intended task. Increasing the usability of the
security mechanisms helps close this gap between theory and practice by enabling
users to manage their own security without having to outsource it to others.
In some cases usability is a fundamental component of a system’s security. The Tor
anonymity service was specifically designed to maximise usability (and therefore to
maximise the number of users) because an unusable anonymity system that attracts
few users can’t provide much anonymity [19].
User Conditioning
It’s often claimed that the way to address security issues is through better user
education. As it turns out, we've been educating users for years about security,
although unfortunately it’s entirely the wrong kind of education. “Conditioning”
might be a better term for what’s been happening. Whenever users go online, they’re
subjected to a constant barrage of error messages, warnings, and popups: DNS errors,
transient network outages, ASP errors, Javascript problems, missing plugins,
temporary server outages, incorrect or expired certificates, problems connecting to
the MySQL backend (common on any slashdotted web site), and a whole host of
other issues. In one attack, covered in more detail in the section on usability testing
below, researchers actually took advantage of this to replace security-related web site
images with a message saying that they were being upgraded and would return at a
later date.
To see just how tolerant browsers are of errors, enable script debugging (Internet
Explorer), look at the error console (Firefox), or install Safari Enhancer and look at
the error log (Safari). No matter which detection method you use, you can barely
navigate to any Javascript-using page without getting errors, sometimes a whole
cascade of them from a single web page. Javascript errors are so pervasive that
browsers hide them by default because the web would be unusable if they even
displayed them, let alone reacted to them. The result is a web ecosystem that bends
over backwards to avoid exposing users to errors, and a user base that’s become
conditioned to ignoring anything that does leak through.
Figure 2: The user has clicked on a button, we’d better pop up a warning dialog
Sometimes the warnings don’t even correspond to real errors but seem to exist only
for their nuisance value. For example what is the warning in Figure 2 trying to
protect us from? Since we’re using a web browser, it’s quite obvious that we’re
about to send information over the Internet. Does a word-processor feel the need to
warn users that it’s about to perform a spell check, or a spreadsheet that it’s about to
recalculate a row? Since this warning is automatically displayed when anything at all
is sent, we have no idea what the significance of the message is. Are we sending an
online banking password, or just searching ebay for cheap dog food? (In this case the
browser was trying to protect us from sending a query for dog food to ebay).
This warning would actually be useful in the situation where a user is entering their
password on a US bank's insecure login page (discussed later on), but by then the
dialog has long since been disabled due to all the false alarms.
This dialog is a good example of the conventional wisdom that security user
interfaces are often added to applications merely so that developers can show off the
presence of security [20]. Since they’ve put a lot of effort into implementing their
encryption algorithms and security protocols, they want to show off this fact to users.
Unfortunately most users couldn't care less about the details; they just want to be assured that they're secure without needing to have the nitty-gritty details thrust in
their face all the time. This is an unfortunate clash between the goals of developers
and users: developers want to show off their work, but since it doesn’t provide any
direct benefit to users, users don’t want to see it. This type of user interface mostly
serves the needs of the developer rather than the user.
Figure 3: What the previous dialog is really saying
This dialog (and the many similarly pointless ones that web browsers and other applications pop up) is a prime example of conditioning users to ignore such messages: note
the enabled-by-default “Do not show this message again” checkbox, in which the
message’s creators admit that users will simply want it to go away and not come back
again. The creation of such dialogs is very deeply ingrained in the programmer
psyche. When Jeff Bezos came up with Amazon’s one-click shopping system, he had
to go back and tell his developers that “one-click” really did mean that the customer
only had to make one click, not one click plus a warning dialog plus another click
(this works fine in the Amazon case since their order fulfilment system gives you
several hours grace to change your mind).
Apple’s user interface design guidelines actually equate the appearance of frequent
alerts with a design flaw in the underlying application. OpenBSD, a BSD distribution
that concentrates specifically on security, has a policy of “no useless buttons”
(unfortunately documented only in developer folklore), meaning that if a particular
setting is secure and works for 95% of users then that’s what gets used. Microsoft
has also finally acknowledged this problem in their Vista user interface guidelines
with the design principle that Vista shouldn’t display error messages when users
aren’t likely to change their behaviour as a result of the message, preferring that the
message be suppressed if it’s not going to have any effect anyway (it remains to be
seen how closely this guideline will be adhered to in practice). In fact a general
guideline for dialogs is to avoid ones that aren’t created as a result of a deliberate user
action [20], since users tend to react rather poorly to software events that aren’t a
direct consequence of an action that they’ve taken.
Popups are a user interface instance of the Tragedy of the Commons. If they were
less frequent they’d be more effective, but since they’re all over the place anyway
there’s nothing to stop
my application from popping up a few more than everyone
else’s application in order to get the user’s attention. An economist would describe
this situation by saying that popups have declining marginal utility.
Usability designer Alan Cooper describes these error boxes as “Kafkaesque
interrogations with each successive choice leading to a yet blacker pit of retribution
and regret” [21]. They're a bit like the land mines that sometimes feature in old war movies: you put your foot down and hear the click and know that although you're
safe now, as soon as you take the next step you’re in for a world of hurt.
Unfortunately the war movie get-out-of-jail-free card of being the film’s leading
character and therefore indispensable to the plot doesn’t work in the real world —
you’re just another redshirt, and you’re not coming back from this mission.
The fix for all of these dialog-box problems is to click ‘Yes’, ‘OK’, or ‘Cancel’ as
appropriate if these options are available, or to try again later if they aren’t. Any user
who’s used the Internet for any amount of time has become deeply conditioned to
applying this solution to all Internet/network problems. These warning dialogs don’t
warn, they just hassle. This warning message overload has actually been exploited by
at least one piece of mobile malware, the Cabir virus, which reconnected to every
device within range again and again and again until users eventually clicked ‘OK’
just to get rid of the message [22] (the situation wasn’t helped by the fact that
Symbian OS pops up a warning for every application, even a signed one, that
originates from anywhere other than Symbian, training users to click ‘OK’
automatically).
Even when popups provide legitimate warnings of danger, user reactions to the
warning may not be what the developers of the application were expecting. The
developers of the TrustBar browser plugin, which warns users of phishing sites,
found in one evaluation of the system that almost all users disabled the popups or
even stopped using the plugin entirely because they found the popups disturbing and
felt less safe due to the warnings [23]. Although the whole point of security warnings
is to, well, warn of security issues, this makes users feel uneasy to the point where
they'll disable the warnings in order to feel better². As security researcher Amir Herzberg puts it, “Defend, don't ask”. Building something that relies on user education to be effective is a recipe for disaster: no-one has the time to learn how to use it, so it'll only be adopted by a small number of users, typically hard-core geeks and, in consumer electronics, gadget fanatics [24].
² Applying the ostrich algorithm is a natural human reaction to things that make us uneasy. When a security researcher demonstrated to his parents that the lock on the front door of their house could be picked in a matter of seconds, offering relatively easy unauthorised entry to their home, their reaction was to ask him not to inform them of this again. This extends beyond security and carries over to general life; if you'd like to read more about this, look up “cognitive dissonance”.
The best approach to the human-factors problem posed by warning dialogs is to
redesign the way that the application works so that they’re no longer needed. Since
users will invariably click ‘OK’ (or whatever’s needed to make the dialog disappear
so that they get on with their job), the best way to protect the user is to actually do the
right thing, rather than abdicating responsibility to the user. As Mr. Miyagi says in
Karate Kid II, “Best block, not be there”, or as rendered into a computing context by
Gordon Bell, “The cheapest, fastest, and most reliable components of a computer
system are those that aren’t there”. In a security user interface context, the best
warning dialog is one that isn’t there, with the application doing the right thing
without having to bother the user.
Certificates and Conditioned Users
When certificates are used to secure network communications, a genuine attack
displays symptoms that are identical to the dozens of other transient problems that
users have been conditioned to ignore. In other words we’re trying to detect attacks
using certificates when an astronomical false positive rate (endless dialogs and
warnings crying wolf) has conditioned users to ignore any warnings coming from the
certificate layer. For the warnings to have any impact on the user, the false positive rate must be close to zero.
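A back-of-the-envelope calculation shows why. The numbers below are invented purely for illustration, but with anything like a realistic base rate of real attacks, even a modest false-positive rate means that almost every warning the user ever sees is a false alarm.

    def prob_warning_is_real(attack_rate: float, detect_rate: float,
                             false_positive_rate: float) -> float:
        # P(real attack | warning shown), by Bayes' rule.
        true_alarms = attack_rate * detect_rate
        false_alarms = (1 - attack_rate) * false_positive_rate
        return true_alarms / (true_alarms + false_alarms)

    # Assume 1 connection in 10,000 is an attack, warnings catch every attack,
    # but 5% of legitimate connections also trigger a warning.
    print(prob_warning_is_real(1e-4, 1.0, 0.05))   # roughly 0.002

With these (assumed) figures only about one warning in five hundred is genuine, so treating the warning as noise is, from the user's point of view, almost always the right call.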
An example of the effect of this user conditioning was revealed in a recent case where
a large bank accidentally used an invalid certificate for its online banking services.
An analysis of site access logs indicated that of the approximately 300 users who
accessed the site, just one single user turned back when faced with the invalid
certificate [25]. Although privacy concerns prevented a full-scale study of users’
reactions from being carried out, an informal survey indicated that users were treating
this as yet another transient problem to be sidestepped. Psychologists call this
approach judgemental heuristics (non-psychologists call it “guessing”), a shortcut to
having to think that works reasonably well most of the time at the cost of an
occasional mistake, and the result of the use of these heuristics is termed an automatic or “click, whirr” response [26]. As an example of the use of judgemental heuristics,
one user commented that “Hotmail does this a lot, you just wait awhile and it works
again”. The Internet (and specifically the web and web browsers) has conditioned
users to act this way: Guessing is cheap, if you get it right it’s very quick, and if you
don't get it right you just click the back button and try again. This technique has been christened “information foraging” by HCI researchers [27], but is more commonly known as “maximum benefit for minimum effort”, or by the somewhat more negative label of “laziness” (in this case not in the usual negative sense, since it's merely optimising the expenditure of effort).
In a similar case, this time with a government site used to pay multi-thousand dollar
property taxes, users ignored the large red cross and warning text that the certificate
was invalid shown in Figure 4 for over two months before a security expert notified
the site administrators that they needed to fix the certificate. In yet another example,
a major US credit union’s certificate was invalid for over a year without anyone
noticing.
Figure 4: This certificate warning didn't stop users from making multi-thousand-dollar payments via the site
These real-life examples, taken from major banking sites and a large government site,
indicate that certificates, when deployed into a high-false-positive environment, are
completely ineffective in performing their intended task of preventing man-in-the-
middle attacks.
SSH fares little better than SSL, with the majority of users accepting SSH server keys
without checking them. This occurs because, although SSH users are in general more
security-aware than the typical web user, the SSH key verification mechanism
requires that the user stop whatever they’re trying to do and verify from memory a
long string of hex digits (the key fingerprint) displayed by the client software. A
relatively straightforward attack, for the exceptional occasion where the user is
actually verifying the fingerprint, is to generate random keys until one of them has a
fingerprint whose first few hex digits are close enough to the real thing to pass muster
[28].
There are even automated attack tools around that enable this subversion of the
fingerprint mechanism. The simplest attack, provided by a MITM tool called
ssharpd [29], uses ARP redirection to grab an SSH connect attempt and then reports a
different protocol version to the one that’s actually in use (it can get the protocol
version from the information passed in the SSH handshake). Since SSHv1 and
SSHv2 keys have different fingerprints, the victim doesn’t get the more serious key-
changed warning but merely the relatively benign new-key warning. Since many
users never check key fingerprints but simply assume that everything should be OK
on the first connect, the attack succeeds and the ssharp MITM has access to the session contents [30]³.
> ssh test@testbox
The authenticity of host 'testbox (192.168.1.38)' can't be
established.
RSA key fingerprint is
86:9c:cc:c7:59:e3:4d:0d:6f:58:3e:af:f6:fa:db:d7.
Are you sure you want to continue connecting (yes/no)?
> ssh test@testbox
The authenticity of host 'testbox (192.168.1.38)' can't be
established.
RSA key fingerprint is
86:9c:cc:d7:39:53:e2:07:df:3a:c6:2f:fa:ba:dd:d7.
Are you sure you want to continue connecting (yes/no)?
Figure 5: Real (top) and spoofed (bottom) SSH servers
³ Since ssharp is based on a modified, rather old, version of OpenSSH, it'd be amusing to use one of the assorted OpenSSH security holes to attack the MITM while the MITM is attacking you.
A much more interesting attack can be performed using Konrad Rieck’s concept of
fuzzy fingerprints, which are fingerprints that are close enough to the real thing to
pass muster. As with the standard SSH MITM attack, there’s a tool available to
automate this attack for you [31]. This attack, illustrated in Figure 5, takes a target
SSH server key and generates a new key for which the fingerprint is close enough to
fool all but a detailed, byte-for-byte comparison. Since few users are likely to
remember and check the full 40-hex-digit fingerprint for each server that they connect
to, this attack, combined with
ssharpd, is capable of defeating virtually any SSH
setup [32]. This is another instance where a TLS-PSK style mechanism would
protect the user far more than public-key authentication does.
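As an illustration of why prefix matching is so weak, the sketch below brute-forces random byte strings (standing in for freshly generated keys, to keep the example fast and self-contained) until one has an MD5 fingerprint that shares its first few hex digits with the target's, which is all that a hurried user is likely to compare. The helper names are assumptions; real tools work on actual key material and match a larger portion of the fingerprint, but the principle is the same.

    import hashlib
    import os

    def fingerprint(key_blob: bytes) -> str:
        # Colon-separated MD5 fingerprint, in the style classically shown by OpenSSH.
        digest = hashlib.md5(key_blob).hexdigest()
        return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

    def find_fuzzy_match(target_fp: str, prefix_digits: int = 4):
        # Keep generating random "keys" until one's fingerprint matches the
        # target's first few hex digits -- the part a user might actually check.
        want = target_fp.replace(":", "")[:prefix_digits]
        while True:
            blob = os.urandom(256)          # stand-in for a freshly generated key
            fp = fingerprint(blob)
            if fp.replace(":", "").startswith(want):
                return blob, fp

    target = fingerprint(b"the real server key")
    _, spoof = find_fuzzy_match(target)
    print(target)
    print(spoof)    # same leading digits, completely different key

Matching four hex digits takes only tens of thousands of attempts, and each additional matched digit multiplies the work by sixteen, which is why nothing short of a full byte-for-byte comparison is safe.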
SSL Certificates: Indistinguishable from Placebo
The security model used with SSL server certificates might be called honesty-box
security: In some countries newspapers and similar low-value items are sold on the
street by having a box full of newspapers next to a coin box (the honesty box) into
which people are trusted to put the correct coins before taking out a paper. Of course
they can also put in a coin and take out all the papers, or put in a washer and take out
a paper, but most people are honest and so most of the time it works. SSL’s
certificate usage is similar. If you use a $495 certificate, people will come to your
site. If you use a $9.95 certificate, people will come to your site. If you use a $0 self-
signed certificate, people will come to your site. If you use an expired or invalid
certificate, people will come to your site. If you’re a US financial institution and use
no certificate at all but put up a message reassuring users that everything is OK (see
Figure 6), people will come to your site. In medical terms, the effects of this
“security” are indistinguishable from placebo.
Figure 6: Who needs SSL when you can just use a disclaimer?
In fact the real situation is even worse than this. There has in the past been plenty of
anecdotal evidence of the ineffectiveness of SSL certificates, an example being the
annual SecuritySpace survey, which reported that 58% of all SSL server certificates
in use were invalid, without this having any apparent effect on users of the sites [33].
However, it wasn’t until mid-2005, ten years after their introduction, that a rigorous
study of their actual effectiveness was performed. This study, carried out with
computer-literate senior-year computer science students (who one would expect
would be more aware of the issues than the typical user) confirmed the anecdotal
evidence that invalid SSL certificates had no effect whatsoever on users visiting a
site. Security expert Perry Metzger has summed this up, tongue-in-cheek, as “PKI is
like real security, only without the security part”.
It gets worse though. In one part of the study, users were directed to a site that used
no SSL at all, at which point several of the users who had been quite happy to use the
site with an invalid certificate now refused to use it because of the lack of SSL. Users
assumed that the mere existence of a certificate (even if it was invalid) meant that it
was safe to use the site, while they were more reluctant to use a site that didn’t use
SSL or certificates. This is quite understandable — no-one worries about an expired
safety certificate in an elevator because all it signifies is that the owner forgot to get a
new one, not that the elevator will crash into the basement and kill its occupants the
next time it’s used. In fact for the vast majority of elevator users the most that they’ll
ever do is register that some form of framed paperwork is present. Whether it’s a
currently valid safety certificate or an old supermarket till printout doesn’t matter.
This real-world conditioning carries across to the virtual world. To quote the study,
“the actual security of existing browsers is appalling when the ‘human in the loop’ is
considered. Because most users dismiss certificate verification error messages, SSL
provides little real protection against man-in-the-middle attacks. Users actually
behaved less insecurely when interacting with the site that was
not SSL-secured”
[34]. The astonishing result of this research is that not only is the use of SSL
certificates in browsers indistinguishable from placebo, it’s actually
worse than
placebo because users are happy to hand over sensitive information to a site just
because it has a certificate. If a medicine were to act in this way, it would be
withdrawn from sale.
Another example of the clash of certificate theory with reality was reported by a
security appliance vendor. Their products ship with a pre-generated self-signed
certificate that ensures that they’re secure out of the box without the user having to
perform any additional certificate setup. Because it’s a self-signed certificate, the
user gets a certificate warning dialog from the browser each time they connect to the
appliance, which in effect lets them know that the security is active. However, if they
replace the self-signed certificate with an “official” CA-issued one, the browser
warning goes away. Having lost the comforting SSL browser warning dialog, users
were assuming that SSL was no longer in effect and complained to the vendor [35].
Again, users treated the (at least from a PKI theory point of view) less secure self-
signed certificate setup as being more secure than the official CA-issued one.
A similar problem occurred during an experiment into the use of S/MIME signed
email. When signed messaging was enabled, users experienced arcane PKI warning
dialogs, requests to insert crypto cards, X.509 certificate displays, and all manner of
other crypto complexity that they didn’t much understand. This caused much
apprehension among users, the exact opposite of the reassurance that signed email is
supposed to provide. The conclusion reached was to “sign your messages only to
people who understand the concept. Until more usable mechanisms are integrated
into popular email clients, signatures using S/MIME should remain in the domain of
‘power users’” [36]. Since a vanishingly small percentage of users really understand
signed email, the actual message of the study is “Don’t use signed email”.
This result is very disturbing to security people. I’ve experienced this shock effect a
number of times at conferences when I’ve mentioned the indistinguishable-from-
placebo nature of SSL’s PKI. Security people were stunned to hear that it basically
doesn't work, and didn't seem to know what to do with the information. A
similar phenomenon has occurred with researchers in other fields as well.
Inattentional blindness, which is covered later on, was filed away by psychologists
for over a quarter of a century after its discovery in 1970 because it was disturbing
enough that no-one quite knew how to deal with it [37].
Social scientists call this a “fundamental surprise”, a profound discrepancy between
your perception of the real world and reality [38]. This differs from the more usual
situational surprise, a localised event that requires the solution of a specific problem,
in that it requires a complete reappraisal of the situation in order to address it (there
isn’t much sign of this happening with PKI yet). Another term for the phenomenon is
an Outside Context Problem, from author Iain Banks’ novel
Excession, in which he
describes it as something that you encounter “in the same way that a sentence
encounters a full stop” [39].
This situation isn’t helped by the fact that even if PKI worked, obtaining bogus
certificates from legitimate CAs isn't that hard. For example, researcher David
Mazieres was able to obtain a $350 Verisign certificate for a nonexistent business by
providing a Doing Business As (DBA) license [40], which requires little more than
payment of the US$10-$50 filing fee. In case you’re wondering why a DBA (referred
to as a “trading as” license in the UK) has so little apparent security, it’s deliberately
designed this way to allow small-scale businesses such as a single person to operate
without the overhead of creating a full business entity. DBAs were never intended to
be a security measure, they were designed to make operating a small independent
business easier (their effectiveness is indicated by the fact that the US alone had more
than 20 million sole proprietorships and general partnerships recorded for the 2004
tax year). $9.95 certificates are even less rigorous, simply verifying the ability to
obtain a reply from an email address. How much checking do users expect the CA to
do for all of $9.95?
The User is Trusting… What?
CAs are often presented as “trusted third parties”, but as security researcher Scott Rea
has pointed out they’re really just plain “third parties” because the user has no basis
for trusting them [36], and for the large number of unknown CAs hardcoded into
common applications they’re explicitly untrusted third parties because the user
doesn’t even know who they are. Consider the dialog shown in Figure 7, in which
the user is being told that they’ve chosen to trust a certain CA. Most users have no
idea what a CA is, and they most certainly never chose to trust any of them. It’s not
even possible to determine who or what it is that they’re so blindly trusting. The
certificate, when the ‘View’ button is clicked, is issued by something claiming to be
“Digi-SSL Xp” (whatever that is), and that in turn is issued by “UTN-USERFirst-
Hardware” (ditto). In other words the user is being informed that they’re trusting an
unknown entity which is in turn being vouched for by another unknown entity. To
paraphrase Douglas Adams, “This must be some strange new use of the word ‘trust’
with which I wasn’t previously familiar”.
Figure 7: Who are these people and why am I trusting them?
This dialog is reminiscent of a fable about the car that Ken Thompson, one of the
creators of Unix, helped design. Whenever there’s a problem, a giant ‘?’ lights up on
the dashboard. When asked about this, Ken responds that “the experienced user will
usually know what’s wrong”. This dialog presents the same user interface as Ken’s
car, just a giant ‘?’ flashing in the middle of the screen.
A contributing factor in the SSL certificate problem is the fact that the security
warnings presented to the user that are produced by certificates often come with no
supporting context. Danish science writer Tor Nørretranders calls this shared context
between communicating parties “exformation” [41]. In the case of certificates there’s
no certificate-related exformation shared between the programmer and the user. Even
at the best of times users have little chance of effectively evaluating security risk [42]
(even experts find this extraordinarily difficult, which is why it’s almost impossible to
obtain computer security insurance), and the complete lack of context provided for
the warning makes this even more difficult. Since web browsers implicitly and
invisibly trust a large number of CAs, and by extension a vast number of certificates,
users have no exformation that allows them to reason about certificates when an error
message mentioning one appears. One user survey found that many users assumed that a certificate represented some form of notice on the wall of the establishment, like a health
inspection notice in a restaurant or a Better Business Bureau certificate, a piece of
paper that indicates nothing more than that the owner has paid for it (which is indeed
the case for most SSL certificates).
Similarly, the introduction of so-called high-assurance or extended validation (EV)
certificates that allow CAs to charge more for them than standard ones is simply a
case of rounding up twice the usual number of suspects — presumably somebody’s
going to be impressed by it, but the effect on phishing will be minimal since it’s not
fixing any problem that the phishers are exploiting. Indeed, cynics would say that
this was exactly the problem that certificates and CAs were supposed to solve in the
first place, and that “high-assurance” certificates are just a way of charging a second
time for an existing service. A few years ago certificates still cost several hundred
dollars, but now that you can get them for $9.95 the big commercial CAs have had to
reinvent themselves by defining a new standard and convincing the market to go back
to the prices paid in the good old days. When you consider certificates from a purely financial perspective, then from a large-company mindset (“cost is no object”) this may make some sort of sense, but from an Internet mindset (“anything that costs is bypassed”) it's simply not going to work. Not everyone can issue or afford these
extra-cost certificates, and not everyone is allowed to apply for them — the 20
million sole proprietorships and general partnerships mentioned earlier are
automatically excluded, for example. High-assurance certificates are a revenue
model rather than a solution for users’ problems, with the end result being the
creation of barriers to entry rather than the solution of any particular security
problem.
Predictably, when the effectiveness of EV certificates was tested once Internet
Explorer with its EV support had been around for a few months, they were found to
have no effect on security [43]. One usability researcher’s rather pithy summary of
the situation is that “the EV approach is to do more of what we have already
discovered doesn’t work” [44]. As with the 2005 study on the effectiveness of
browser SSL indicators which found that users actually behaved less insecurely when
SSL was absent, this study also produced a surprising result: Users who had received
training in browser EV security behaved less securely than ones who hadn’t! The
reason for this was that the browser documentation talked about the use of
(ineffective, see other parts of this section) phishing warnings, and users then relied
on these rather than the certificate security indicators to assess a site. As a result they
were far more likely to classify a fraudulent site as valid than users who had received
no security training. This unexpected result emphasises the importance of post-
release testing when you introduce new security features, which is covered in more
detail later in the section on security testing.
In order for a certificate-differentiation mechanism to work the user would need to
have a very deep understanding of CA brands (recall that the vast majority of users
don’t even know what a CA is, let alone knowing CA names and brands), and know
which of the 100-150 CA certificates hard-coded into web browsers are trustworthy
and which aren’t. No-one, not even the most knowledgeable security expert, knows
who most of these CAs really are. The CA brands are competing against multi-
million dollar advertising campaigns from established brands like Nike and Coke —
it’s no contest [45].
Security companies aren’t helping with this confusion by their handling of things like
trust marks and site security seals. Although these are basically worthless — anyone
can copy the graphic to their site, and by the time it’s finally discovered (if it’s ever
discovered) it’s too late to do much about it — providers of some seals like
Verisign’s Secure Site Seal compound the problem by tying it to their issuing of SSL
server certificates. As a result Verisign’s brand is being attached to a completely
insecure site-marking mechanism, with the unfortunate effect that a significant
proportion of users are more likely to trust sites that display the mark [46]. Phishers
can therefore increase the effectiveness of their phishing by copying the graphics of
any site seals they feel like using to their sites.
The problem with CA branding (and lack of brand recognition) was demonstrated in
the study of user recognition of CA brands discussed in the next section in which, of
the users who actually knew what a CA was (many didn’t), far more identified Visa
as a trusted CA than Verisign, despite the fact that Verisign is the world’s largest CA
and Visa isn’t a CA at all [18]. Combine this with the previously-described user
response to certificates and you have a situation where a bogus CA with a well-
known brand like Visa will be given more weight than a genuine CA like Verisign.
After all, what user would doubt
https://www.visa.com, certified by Visa’s
own CA?
In practice almost everything trumps certificate-based SSL security indicators. One
large-scale study found, for example, that if users were presented with two identical
pages of which one was SSL-protected and had a complex URL, https://www.accountonline.com/View?docId=Index&siteId=AC&langId=EN, and the other wasn't secured and had a simple URL, http://www.attuniversalcard.com, people rated the unprotected version with the simple
URL as considerably more trustworthy than the protected one with the complex URL
[47] (the unsecured page — note the different domains that the two are hosted in,
even though they’re the same page — has since been updated to redirect to the
secured page). Other factors that usability researchers have found will trump SSL
indicators include:
The complexity of the web page. Using fancy graphics and Flash animation exploits the watermark fallacy, in which users translate the complex features used in physical items like banknotes and cheques for anti-counterfeiting purposes into an indication of authenticity in the virtual world.
Pseudo-personalisation such as displaying the first four digits of the user’s credit
card number, for example 4828-****-****-****, to “prove” that you know
them. The first four digits are identical across large numbers of users and
therefore relatively easy to anticipate. For attacks targeting the user bases of
individual banks, it’s even easier because prefixes for all cards from that bank
will be identical. For example when phishers spammed (possible) customers of
the Mountain America credit union in Salt Lake City, they were able to display
the first five digits of the card as “proof” of legitimacy because all cards issued
by the bank have the same prefix [48] (in addition they used a legitimate CA-
issued certificate to authenticate their phishing site).
Providing an independent verification channel for information such as a phone
number to call. This exploits the “not-my-problem” fallacy: no-one actually calls
the number since they assume that someone else will. In addition phishers have
already set up their own interactive voice response (IVR) systems using VoIP
technology that mimic those of the target bank, so having a phone number to call
is no guarantee of authenticity [49][50].
The “not-my-problem” fallacy is particularly noteworthy here because it first gained
widespread attention in 1964 when a woman named “Kitty” Genovese was brutally
murdered next to her New York apartment building. As with the phone verification
channel for web pages, people who heard her cries for help during separate attacks
spread over about thirty minutes assumed that someone else had called the police and
so didn’t call themselves (although some of the details, and in particular the number
and apparent apathy of some of the bystanders, were exaggerated by journalists). This
event was later investigated in depth by psychologists, who termed the phenomenon
the “bystander effect”. They found that the more bystanders there are, the less likely
that any one is to come to a victim’s aid because the assumption is that someone else
must have already done so [51]. This effect, which arises due to diffusion of
responsibility, is so noticeable that it’s been quantified by experimental
psychologists. In one experiment, having one bystander present resulted in 85% of
subjects stepping in to help. With two bystanders this dropped to 62%, and with five
bystanders it had fallen to 32%, with each one thinking that it was someone else’s job
to intervene [52].
The bystander effect exists in many variations. For example in one experiment
subjects were shown a sample line X and three other lines A, B, and C, of which A
was shorter than X, B was the same length, and C was longer. The subjects were
placed in a room with a varying number of other people who reported that either the
shorter A or the longer C matched X rather than the equal-length B. With one other
person present, 3% of subjects agreed with the (incorrect) assessment. With two
others present this rose to 14%, and with three others it went to 32% (these figures
aren’t exactly identical to the ones from the previous experiment; the point is that
there’s a demonstrable effect that increases with the number of bystanders, not that
some hard-and-fast figure applies across all cases [53]).
When people were asked why they’d done this, they explained it away on the basis
that they wanted to fit in, or (more rationally, since the supposed reason for the
experiment was that it was a vision test) that they thought there might be something
wrong with their eyesight since the others couldn’t
all be wrong (although technically
this could be taken as a variation of wanting to fit in).
In the Internet the bystander effect is particularly pernicious. Recall that the effect
increases with the number of bystanders present. In the Genovese murder, the
presence of a relatively small group of people was enough to trigger the bystander
effect. On the Internet, the
entire world is potentially a bystander. This is the worst
possible situation into which you can deploy a mechanism that can fall prey to the
bystander effect, and although phishers probably aren’t psychology graduates they do
know how to take advantage of this.
Alongside these tricks, there are myriad other ways that are being actively exploited
by phishers. Any of these factors, or factors in combination, can trump SSL security
in the eyes of the users.
Password Mismanagement
The start of this section touched on the poor implementation of password security by
applications, pointing out that both SSH and SSL/TLS, protocols designed to secure
(among other things) user passwords, will connect to anything claiming to be a server
and then hand over the user’s password in plaintext form without attempting to apply
even the most basic protection mechanisms. However, the problem goes much
further than this. Applications (particularly web browsers) have conditioned users
into constantly entering passwords with no clear indication of who they’re handing
them over to. These password mechanisms are one of the many computer processes
that are training users to become victims of phishing attacks.
Figure 8: Gimme your password!
Consider the dialog in Figure 8, in this case a legitimate one generated by the Firefox
browser for its own use. This dialog is an example of geek-speak at its finest. In
order to understand what it’s asking, you need to know that Netscape-derived
browsers use the PKCS #11 crypto token interface internally to meet their
cryptographic security requirements. PKCS #11 is an object-oriented interface for
devices like smart cards, USB tokens, and PCMCIA crypto cards, but can also be
used as a pure software API. When there’s no hardware crypto token available, the
browser uses an internal software emulation, a so-called PKCS #11 soft-token. In
addition, the PKCS #11 device model works in terms of user sessions with the device.
The default session type is a public session, which only allows restricted (or even no)
access to objects on the device and to device functionality. In order to fully utilise the
device, it’s necessary to open a private session, which requires authenticating yourself
with a PIN or password. What the dialog is asking for is the password that’s required
to open a private session with the internal PKCS #11 soft-token in order to gain
access to the information needed to access a web site.
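In PKCS #11 terms, the password that the dialog is asking for is simply the PIN passed at C_Login time to turn the default public session into a private one. The sketch below performs the same operation explicitly using the third-party python-pkcs11 package against a software token; the module path, token label, and PIN are assumptions for illustration, not anything Firefox actually exposes.

    import pkcs11   # third-party package: python-pkcs11

    # Load a PKCS #11 module -- here SoftHSM2 standing in for the browser's
    # built-in soft-token (the library path is an assumption for this sketch).
    lib = pkcs11.lib("/usr/lib/softhsm/libsofthsm2.so")
    token = lib.get_token(token_label="soft-token")

    # Opening the session with user_pin is the C_Login step: it turns the default
    # public session into the private session that the dialog's password unlocks,
    # giving access to the private objects (keys, stored credentials) on the token.
    with token.open(user_pin="correct horse battery staple") as session:
        for obj in session.get_objects({pkcs11.Attribute.CLASS:
                                        pkcs11.ObjectClass.PRIVATE_KEY}):
            print(obj)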
Figure 9: Password dialog as the user sees it
Possibly as much as one hundredth of one percent of users exposed to this dialog will
understand that. For everyone else, it may as well be written in Klingon (see Figure
9). All they know is that whenever they fire up the browser and go to a web site that
requires authentication, this garbled request for a password pops up. After an initial
training period, their proficiency increases to the point where they’re barely aware of
what they’re doing when they type in their password — it’s become an automatic
process of the kind described in the next chapter.
Other poorly-thought-out password management systems can be similarly
problematic. The OpenID standard, a single-sign-on mechanism for web sites, goes
to a great deal of trouble to remain authentication-provider neutral. The unfortunate
result is what security practitioner Ben Laurie has termed “a standard that has to be
the worst I’ve ever seen from a phishing point of view” [54] because it allows any
web site to steal the credentials you use at any other web site. To do this, an attacker
sets up a joke-of-the-day or animated-dancing-pigs or kitten-photos web page or
some other site of the kind that people find absolutely critical for their daily lives, and
uses OpenID to authenticate users. Instead of using your chosen OpenID provider to
handle the authentication, the attacker sends you to an attacker-controlled provider
that proxies the authentication to the real provider. In this way the attacker can use
your credentials to empty your PayPal account while you’re reading the joke of the
day or looking at kitten pictures.
This is far worse than any standard phishing attack because instead of having to
convince you to go to a fake PayPal site, the attacker can use any site at all to get at
your PayPal credentials. What OpenID is doing is training users to follow links from
random sites and then enter their passwords, exactly the behaviour that phishers want
[55]. By declaring this problem “out of scope” for the specification [56], the
developers of the OpenID standard get to pass it on to someone else. Other federated
single-sign-on mechanisms like Internet2’s Shibboleth exhibit similar flaws.
Future developments have the potential to make this situation even worse. If the
biometrics vendors get their way, we’ll be replacing login passwords with
thumbprints. Instead of at least allowing for the possibility of one password per
account, there’ll be a single password (biometric trait) for all accounts, and once it’s
compromised there’s no way to change it. Far more damaging though is the fact that
biometrics makes it even easier to mindlessly authenticate yourself at every
opportunity, handing out your biometric “password” to anything that asks for it.
Articles proposing the use of biometrics as anti-phishing measures never even
consider these issues, choosing to focus instead on the technical aspects of fingerprint
scanning and related issues [57].
Abuse of Authority
Once applications have obtained additional authorisation for a particular action, they
often retain their extra privileges far beyond any sensible amount of time for which
they need them. Like a miser hanging onto money, they clutch the extra privileges to
their chest and refuse to let go for any reason. Consider the Firefox plugin-install
request shown in Figure 10. It takes only two mouse clicks in response to the install
request to give the browser the necessary permission to install the plugin, but
removing that permission again requires navigating to an obscure configuration dialog
buried four levels down in a succession of menus and other dialogs. What’s more, the
browser hasn’t just authorised the installation of that one plugin but of all other
plugins hosted at that domain! Consider the amount of content hosted on domains like
yahoo.com to see how dangerous such a blanket permission can be — the
groups.yahoo.com community alone has gigabytes of arbitrary untrusted content hosted
on it, all sitting ready to infect your PC.
Figure 10: Plugin install request
The abuse of authority can be exploited in arbitrarily creative ways. Consider the
following cross-pollination attack, which allows an impostor to set up a genuine
Verisign-certified fake banking site. This takes advantage of the way that browsers
currently handle certificates that they can’t verify. Instead of treating the security
failure as an absolute, they allow users to ignore it and continue anyway, which
virtually all users do. However, instead of allowing the certificate to be used once, they
allow it to be used either for the remainder of the browser session or forever (see
Figure 11). Since users tend to leave
browsers (along with applications like email and IM clients) open for extended
periods of time, and PCs powered on (or at least effectively on, for example in
hibernation) for equally long amounts of time, the time periods “for this session” and
“permanently” are more or less synonymous. So what we need to do is get a user to
accept a certificate for a non-valuable site (for which they’re quite likely to click
‘OK’ since there are no real consequences to this action), and then reuse it later to
certify any site that we want.
Figure 11: Permanent temporary certificate acceptance
To do this, we get them to come to the initial site via standard spam techniques
inviting them to read an e-postcard, view someone’s holiday photos, meet a long-lost
school friend, or some other standard (and innocuous) lure. The site is protected with
an SSL certificate that the browser can’t verify, so the user has to accept it either
permanently or for the current session. If the user accepts it permanently then there’s
nothing left to do. If they accept it for the current session then all that the phishing
site needs to do is determine that they’ve accepted the certificate in order to use it to
“authenticate” the phishing site.
Phishers have come up with several ways of doing this that don’t involve the obvious
(and easily-blocked) use of cookies. One is cache mining, which uses the load time
of potentially cached images from the target site to determine whether the browser
has recently visited it (and subsequently cached the images) or not [58]. A far more
effective means of doing this though involves the use of Cascading Style Sheets
(CSS). CSS has a :visited pseudo-class that allows the server to dynamically change
the appearance of a link <a href=…> based on whether the client has visited it in the
past or not (the browser default is to change the colour slightly, typically from blue
to purple). In programming terms, this CSS facility allows the server to execute the
command if url_visited then do_A else do_B on the client.
How does the server find out what the client has done? By having the actions loop
back to the server using CSS’ url() feature, applying two different URLs based on
whether do_A or do_B is triggered. So the pseudocode becomes if url_visited then
url('server_url_A') else url('server_url_B'). All of this is hidden from the user
through the use of an empty-appearing link, <a href="…"></a>. The server can now
tell whether the user has accepted the certificate from the innocuous site as soon as
the user visits the not-so-innocuous site [59].
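To make the mechanics a little more concrete, here’s a toy sketch of the server side of such a history-sniffing probe, using nothing more than the Python standard library. The page markup, URL paths, and port are invented for illustration, and current browsers have long since restricted what :visited styling is allowed to do, so this shows the technique as it worked at the time rather than a working attack today.

# Toy sketch of CSS :visited history sniffing (server side); paths and port
# are invented, and modern browsers block this particular trick.
from http.server import BaseHTTPRequestHandler, HTTPServer

PROBE_PAGE = b"""<html><head><style>
  /* The browser applies one rule or the other depending on its history,
     causing it to fetch one of two different URLs from this server. */
  #probe:visited { background: url('/visited'); }
  #probe:link    { background: url('/not-visited'); }
</style></head><body>
  <a id="probe" href="https://innocuous.example.com/"></a>
</body></html>"""

class ReconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A real probe would return an image for the two background URLs;
        # for this sketch we just log which one the browser asked for.
        if self.path == '/visited':
            print('Client has the innocuous site in its history')
        elif self.path == '/not-visited':
            print('Client has not visited the innocuous site')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.end_headers()
        self.wfile.write(PROBE_PAGE)

if __name__ == '__main__':
    HTTPServer(('localhost', 8080), ReconHandler).serve_forever()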
So how does this allow us to create a genuine Verisign-certified fake banking site?
By making the SSL certificate that the user accepts to get to the innocuous site a CA
certificate, we can now use it to issue our own certificates in any name we want.
Because of the universal implicit cross-certification that’s present in browsers, we can
issue a certificate in the name of Verisign, and our fake Verisign can then certify any
phishing site that it wants. When the user visits the phishing site, they’ll get no
warning from the browser, all the SSL security indicators will be present, and on the
very remote chance that they bother to check the certificate, they’ll see that it’s been
authorised by Verisign, the world’s largest CA. Not bad for a few fragments of CSS
and an extra flag set in a certificate!
Other Languages, Other Cultures
Up until about fifteen years ago, it was assumed that there were universal maxims
such as modes of conversation and politeness that crossed all cultural boundaries.
This turned out to be largely an illusion, contributed to at least to some extent by the
fact that most of the researchers who published on the subject came from an Anglo-
Saxon, or at least European, cultural background.
Since then, the ensuing field of cross-cultural pragmatics, the study of how people
interact across different cultures, has helped dispel this illusion. For example, the
once-popular assumption that the “principles of politeness” are the same everywhere
has been shown to be incorrect in ways ranging from minor variations such as
English vs. eastern European hospitality rituals through to major differences such as
cultures in which you don’t thank someone who performs a service for you because if
they didn’t want you to accept the service they wouldn’t have offered it, a practice
that would seem extremely rude to anyone coming from a European cultural
background.
Let’s look at a simple example of how a security user interface can be affected by
cross-cultural pragmatics issues. Imagine a fairly standard dialog that warns that
something has gone slightly wrong somewhere and that if the user continues, their
privacy may be compromised. Even the simple phrase “your privacy may be
compromised” is a communications minefield. Firstly, the English term “privacy”
has no equivalent in any other European language. In fact the very concept of
“privacy” reflects a very Anglo-Saxon cultural value of being able to create a wall
around yourself when and as required. Even in English, privacy is a rather fuzzy
concept, something that philosopher Isaiah Berlin calls a “negative liberty” which is
defined by an intrusion of it rather than any innate property. Like the US Supreme
Court’s (non-)definition of obscenity, people can’t explicitly define it, but know when
they’ve lost it [60]. So in this case warning of a
loss of privacy (rather than stating
that taking a certain measure will increase privacy) is the appropriate way to
communicate this concept to users — assuming that they come from an Anglo-Saxon
cultural background, that is.
Next we have the phrase “may be”, a uniquely English way of avoiding the use of an
imperative [61]. In English culture if you wanted to threaten someone, you might tell
them that if they don’t take the requested action they might have a nasty accident. On
the continent, you’d be more likely to inform them that they
will have a nasty
accident. Moving across to eastern Europe and Italy, you’d not only inform them of
the impending accident but describe it in considerable and occasionally graphic
detail.
The use of so-called whimperatives, extremely common in English culture, is almost
unheard-of in other European languages [62]. A request like “Would you mind
opening the window” (perhaps watered down even further with a side-order of “it’s a
bit cold in here”) would, if you attempted to render it into a language like Polish,
“Czy
miałabyś ochotę …”, sound quite bizarre — at best it would come across as an
inquiry as to whether the addressee is capable of opening the window, but certainly
not as a request.
Finally, we come to the word “compromise”, which in everyday English is mostly
neutral or slightly positive, referring to mutual concessions made in order to reach
agreement (there’s an old joke about a manager who wonders why security people are
always worrying about compromise when everyone knows that compromise is a
necessary requirement for running a business). In other languages the connotations
are more negative, denoting weakness or a sell-out of values; in English it’s only in
the specialised language of security-speak that compromise is an obviously negative term.
The fact that it’s taken four paragraphs just to explain the ramifications of the phrase
“your privacy may be compromised” is a yardstick of how tricky the effective
communication of security-relevant information can be. Even something as simple as
the much-maligned “Are you sure?” dialog box can be problematic. In some cultures,
particularly when offering hospitality, you never try to second-guess someone else’s
wishes. A host will assume that the addressee should always have more, and any
resistance by them can be safely disregarded (the authors of endless “Are you sure?”
dialogs should probably take this attitude to heart). The common English question
“Are you sure?” can thus sound quite odd in some cultures.
Japan has a cultural value called enryo, whose closest English approximation would
be “restraint” or “reserve”. The typical way to express enryo is to avoid giving
opinions and to sidestep choices. Again using the example of hospitality, the norm is
for the host to serve the guest a succession of food and drink and for the guest to
consume at least a part of every item, on the basis that to not do so would imply that
the host had miscalculated the guest’s wishes. The host doesn’t ask, and the guest
doesn’t request. When responding to a security-related dialog in which the user is
required to respond to an uninvited and difficult-to-answer request, the best way to
express enryo is to click ‘OK’. In a Japanese cultural context, the ‘OK’ button on
such dialogs should really be replaced with one that states ‘Nan-demo kamaimasen’,
“Anything will be all right with me”. (In practice it’s not quite that bad, since the fact
that the user is interacting with a machine rather than a human relaxes the enryo
etiquette requirements).
So going beyond the better-known problems of security applications being localised
for
xx-geek by their developers, even speaking in plain English can be quite difficult
when the message has to be accurately communicated across different languages and
cultures. Some time ago I was working on an internationalised security application
and the person doing the Polish translation told me that in situations like this in which
the correct interpretation of the application developer’s intent is critical, he preferred
to use the English version of the application (even though it wasn’t his native
language) because then he knew that he was talking directly with the developer, and
not witnessing an attempt to render the meaning across a language and cultural
barrier.
References
[1] “Security Absurdity: The Complete, Unquestionable, And Total Failure of Information Security”, Noam Eppel, http://www.securityabsurdity.com/failure.php.
[2] “The Twelve Networking Truths”, RFC 1925, Ross Callon, 1 April 1996.
[3] “Requirements for IPsec Remote Access Scenarios”, RFC 3457, Scott Kelly and Sankar Ramamoorthi, January 2003.
[4] “Authentication Components: Engineering Experiences and Guidelines”, Pasi Eronen and Jari Arkko, Proceedings of the 12th International Workshop on Security Protocols (Protocols’04), Springer-Verlag Lecture Notes in Computer Science No.3957, April 2004, p.68.
[5] “Datagram Transport Layer Security”, RFC 4347, Eric Rescorla and Nagendra Modadugu, April 2006.
[6] “Straw poll on TLS SRP status”, thread on ietf-tls mailing list, May-June 2007, http://www1.ietf.org/mail-archive/web/tls/current/msg01667.html.
[7] “Man-in-the-Middle in Tunnelled Authentication Protocols”, N. Asokan, Valtteri Niemi, and Kaisa Nyberg, Cryptology ePrint Archive, Report 2002/163, November 2002, http://eprint.iacr.org/2002/163.
[8] “Man-in-the-Middle in Tunnelled Authentication Protocols”, N. Asokan, Valtteri Niemi, and Kaisa Nyberg, Proceedings of the 11th Security Protocols Workshop (Protocols’03), Springer-Verlag Lecture Notes in Computer Science No.3364, April 2003, p.29.
[9] “The Compound Authentication Binding Problem”, IETF draft draft-puthenkulam-eap-binding-04, Jose Puthenkulam, Victor Lortz, Ashwin Palekar, and Dan Simon, 27 October 2003.
[10] “Application Interfaces That Enhance Security”, Apple Computer, 23 May 2006, http://developer.apple.com/documentation/Security/Conceptual/SecureCodingGuide/Articles/AppInterfaces.html.
[11] “CAPTCHA effectiveness”, Jeff Atwood, 25 October 2006, http://www.codinghorror.com/blog/archives/000712.html.
[12] “CAPTCHA CAPTCHA DECODER”, http://www.lafdc.com/captcha/.
[13] “OCR Research Team”, http://ocr-research.org.ua/.
[14] “Computer Security in the Real World”, Butler Lampson, keynote address at the 14th Usenix Security Symposium (Security’05), August 2005.
[15] “Prospect Theory: An Analysis of Decision under Risk”, Daniel Kahneman and Amos Tversky, Econometrica, Vol.47, No.2 (March 1979), p.263.
[16] “An Analysis of the System Security Weaknesses of the US Navy Fleet Broadcasting System, 1967-1974, as exploited by CWO John Walker”, Laura Heath, Master of Military Art and Science thesis, US Army Command and General Staff College, Ft.Leavenworth, Kansas, 2005.
[17] “User-Centered Security”, Mary Ellen Zurko, Proceedings of the 1996 New Security Paradigms Workshop (NSPW’96), September 1996, p.27.
[18] “Design Principles and Patterns for Computer Systems That Are Simultaneously Secure and Usable”, Simson Garfinkel, PhD thesis, Massachusetts Institute of Technology, May 2005.
[19] “Challenges in deploying low-latency anonymity (Draft)”, Roger Dingledine, Nick Mathewson, and Paul Syverson, 2005, http://tor.eff.org/svn/trunk/doc/design-paper/challenges.pdf.
[20] “Firefox and the Worry-free Web”, Blake Ross, in “Security and Usability: Designing Secure Systems That People Can Use”, O’Reilly, 2005, p.577.
[21] “About Face 2.0: The Essentials of Interaction Design”, Alan Cooper and Robert Reimann, John Wiley and Sons, 2003.
[22] “Cabirn Fever”, Peter Ferrie and Peter Szor, Virus Bulletin, August 2004, p.4.
[23] “Security and Identification Indicators for Browsers against Spoofing and Phishing Attacks”, Amir Herzberg and Ahmad Jbara, Cryptology ePrint Archive, http://eprint.iacr.org/2004/, 2004.
[24] “Why Features Don’t Matter Any More: The New Laws of Digital Technology”, Andreas Pfeiffer, ACM Ubiquity, Vol.7, Issue 7 (February 2006), http://www.acm.org/ubiquity/views/v7i07_pfeiffer.html.
[25] “Invalid banking cert spooks only one user in 300”, Stephen Bell, ComputerWorld New Zealand, 16 May 2005, http://www.computerworld.co.nz/news.nsf/NL/FCC8B6B48B24CDF2CC2570020018FF73.
[26] “The heuristic-systematic model in its broader context”, Serena Chen and Shelly Chaiken, Dual-Process Theories in Social Psychology, Guilford Press, 1999, p.73.
[27] “Information Foraging in Information Access Environments”, Peter Pirolli and Stuart Card, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (SIGCHI’95), May 1995, p.51.
[28] “pattern recognition”, Dan Kaminsky, invited talk at Black Ops 2006 at the 20th Large Installation System Administration Conference (LISA’06), December 2006.
[29] “SSH for fun and profit”, Sebastian Krahmer, 1 July 2002, http://www.shellcode.com.ar/docz/asm/ssharp.pdf.
[30] “Hacking: The Art of Exploitation”, Jon Erickson, No Starch Press, 2003.
[31] “THC Fuzzy Fingerprint”, 25 October 2003, http://www.thc.org/thc-ffp/.
[32] “Fuzzy Fingerprints: Attacking Vulnerabilities in the Human Brain”, “Plasmoid”, 25 October 2003, http://www.thc.org/papers/ffp.html.
[33] “Web Survey and Internet Research Reports by SecuritySpace”, http://www.securityspace.com/s_survey/data/, 2005. Note that the figures given in the survey results don’t add up to the value quoted in the text, since some certificates are invalid for multiple reasons and therefore appear in multiple categories.
[34] “Hardening Web Browsers Against Man-in-the-Middle and Eavesdropping Attacks”, Haidong Xia and José Brustuloni, Proceedings of the 14th International Conference on the World Wide Web (WWW’05), May 2005, p.489.
[35] Lucky Green, private communications, 10 December 2006.
[36] “A Case (Study) For Usability in Secure Email Communication”, Apu Kapadia, IEEE Security and Privacy, Vol.5, No.2 (March/April 2007), p.80.
[37] “Sights unseen”, Siri Carpenter, Monitor on Psychology, Vol.32, No.4 (April 2001), p.54.
[38] “Fundamental Surprises”, Zvi Lanir, Center for Strategic Studies, University of Tel Aviv, 1986.
[39] “Excession”, Iain Banks, Orbit, 1997.
[40] “Self-certifying File System”, David Mazieres, PhD thesis, MIT, May 2000.
[41] “The User Illusion: Cutting Consciousness Down to Size”, Tor Nørretranders, Penguin, 1999.
[42] “Why Johnny Can’t Evaluate Security Risk”, George Cybenko, IEEE Security and Privacy, Vol.4, No.1 (January/February 2006), p.5.
[43] “An Evaluation of Extended Validation and Picture-in-Picture Phishing Attacks”, Collin Jackson, Dan Simon, Desney Tan, and Adam Barth, Proceedings of the Usable Security 2007 Conference (USEC’07), February 2007.
[44] “Re: [hcisec] EV certificates and phishing”, James Donald <[email protected]>, posting to the hcisec@yahoogroups.com mailing list, message-ID 45DD1784.5010606@echeque.com, 22 February 2007.
[45] “Improving Authentication On The Internet”, Gervase Markham, http://www.gerv.net/security/improving-authentication/, 12 May 2005.
[46] “User Perceptions of Privacy and Security on the Web”, Scott Flinn and Joanna Lumsden, Proceedings of the Third Annual Conference on Privacy, Security and Trust (PST’05), October 2005, http://www.lib.unb.ca/Texts/PST/2005/pdf/flinn.pdf.
[47] “The Human Factor in Phishing”, Markus Jakobsson, Proceedings of the 6th National Forum on Privacy & Security of Consumer Information, January 2007.
[48] “The New Face of Phishing”, Brian Krebs, 13 February 2006, http://blog.washingtonpost.com/securityfix/2006/02/the_new_face_of_phishing_1.html.
[49] “Phishers try a phone hook”, Joris Evers, CNet News.com, 20 April 2006, http://news.com.com/Phishers+try+a+phone+hook/2100-7349_3-6066171.html.
[50] “Phishers come calling on VoIP”, Joris Evers, CNet News.com, 10 July 2006, http://news.com.com/Phishers+come+calling+on+VoIP/2100-7349_3-6092366.html.
[51] “The Unresponsive Bystander: Why Doesn’t He Help?”, Bibb Latané and John Darley, Prentice Hall, 1970.
[52] “Help in a Crisis: Bystander Response to an Emergency”, Bibb Latané and John Darley, General Learning Press, 1976.
[53] “Studies of independence and conformity: A minority of one against a unanimous majority”, Solomon Asch, Psychological Monographs, Vol.70, No.9 (1956).
[54] “OpenID: Phishing Heaven”, Ben Laurie, 19 January 2007, http://www.links.org/?p=187.
[55] “Phishing and OpenID: Bookmarks to the Rescue?”, Ka-Ping Yee, 20 January 2007, http://usablesecurity.com/2007/01/20/phishing-and-openid/.
[56] “OpenID Authentication 2.0”, http://openid.net/specs.bml.
[57] “A Touch of Money”, Anil Jain and Sharathchandra Pankanti, IEEE Spectrum (INT), Vol.43, No.7 (July 2006), p.14.
[58] “Timing Attacks on Web Privacy”, Ed Felten and Michael Schneider, Proceedings of the 7th ACM Conference on Computer and Communications Security (CCS’00), November 2000, p.23.
[59] “Case Study: Browser Recon Attacks”, Sid Stamm and Tom Jagatic, in “Phishing and Countermeasures”, Markus Jakobsson and Steven Myers (eds), 2007.
[60] “Privacy and Freedom”, Alan Westin, Atheneum, 1967.
[61] “Watching the English”, Kate Fox, Hodder & Stoughton Paperbacks, 2005.
[62] “Cross-Cultural Pragmatics: The Semantics of Human Interaction (2nd ed)”, Anna Wierzbicka, Walter de Gruyter, 2003.
The Psychology of Security Usability
Some of the problems mentioned in the previous chapter wouldn’t be too surprising
to cognitive psychologists, people who study the mental processes involved in how
people understand, analyse, and solve problems. The field of psychology provides a
great deal of insight into how people deal with security user interfaces, but this very
useful resource is rarely applied to the field of computer security. As the author of
one text on human rationality points out, “The heavenly laws of logic and probability
rule the realm of sound reasoning: psychology is assumed to be irrelevant. Only if
mistakes are made are psychologists called in to explain how wrong-wired human
minds deviate from these laws […] Many textbooks present first the laws of logic and
probability as the standard by which to measure human thinking, then data about how
people actually think. The discrepancy between the two makes people appear to be
irrational” [1]. Since the emphasis was on prescribing what people
should be doing
rather than describing what they
actually did, and the prescriptive approach was
defined to constitute rational behaviour, any deviation from the prescribed behaviour
was judged to be irrational [2].
This chapter looks at how some of the human mental processes that are relevant to
security work, and explores why security user interface elements often perform so
poorly in the real world.
How Users Make Decisions
To help understand how we’ve got into this mess, it’s useful to look at how the
human decision-making process actually works. The standard economic decision-
making model, also known as the Bayesian decision-making model, assumes that
someone making a decision will carefully take all relevant information into account
in order to come up with an optimal decision. As one observer put it, this model
“took its marching orders from standard American economics, which assumes that
people always know what they want and choose the optimal course of action for
getting it” [3]. This model, called Utility Theory, goes back to at least 1944 and John
von Neumann’s work on game theory [4], although some trace its origins (in
somewhat distant forms) as far back as the early 1700s [5].
The formalisation of the economic decision-making model, Subjective Expected
Utility Theory (SEU), makes the following assumptions about the decision-making
process [6][7][8][9]:
1. The decision-maker has a utility function that allows them to rank their
preferences based on future outcomes.
2. The decision-maker has a full and detailed overview of all possible
alternative strategies.
3. The decision-maker can estimate the probability of occurrence of outcomes
for each alternative strategy.
4. The decision-maker will choose between alternatives based on their
subjective expected utility.
To apply the SEU model to making a decision, you’re expected to execute the
following algorithm:
for each possible decision alternative
    x = all possible consequences of making a decision (which includes
        recursive evaluation of any carry-on effects);
    p(x) = quantitative probability for x;
    U(x) = subjective utility of each consequence;
    p(x) × U(x) = probability multiplied by subjective utility;
SEU total = Σ_{i=0..n} p(x_i) × U(x_i);
The certificate dialog in Figure 12 is a good example of something designed for the
SEU decision-making model (although this was done inadvertently rather than
deliberately). To decide whether to continue, all you need to do is follow the
algorithm given above. Taking one example mentioned in the dialog’s text, the
possibility of a server misconfiguration, you can evaluate the probability of this
based on an evaluation, in turn, of the competence of the remote system’s
administrators, the chances that they’ve made an error, the chances of a software
bug, and so on. Then you assign probabilities and utilities to each of
these, say 0.6 for the competence of the remote system’s administrators and 0.85 for
the subjective utility.
Figure 12: SEU decision-making sample scenario
Then you have to consider other factors such as the risk involved. If you enter your
credit card information there’s a certain risk that it’ll be phished and misused, or that
your identity will be stolen, or that some other negative outcome will ensue.
However, balancing this are positive factors such as various credit card consumer
protection measures. Finally, there are more intangible factors such as the emotional
satisfaction of making the purchase (or more pragmatically the emotional trauma of
not making the purchase if it’s something like a birthday present) to justify a certain
amount of risk in the purchase process. The process is rather lengthy and tedious, so
let’s just skip ahead and assume that you’ve worked out all of the values. You can
now evaluate the total sum to get the subjective expected utility of this particular
option. Then you repeat this for all of the other possible options. Finally, you pick
the one with the highest subjective expected utility value, and click on that option for
the dialog.
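Written out as code, the SEU procedure is almost trivially short, which only emphasises how unrealistic it is as a description of what users actually do when a warning dialog appears. The following sketch uses invented alternatives, probabilities, and utilities purely for illustration:

# Minimal sketch of a Subjective Expected Utility calculation; the
# alternatives, probabilities, and utilities are invented placeholders.

# Each alternative maps to a list of (probability, subjective utility) pairs,
# one pair per possible consequence of choosing that alternative.
alternatives = {
    'continue anyway': [(0.6, 0.85), (0.3, -0.4), (0.1, -5.0)],
    'cancel': [(1.0, -0.2)],
    'inspect the certificate first': [(0.9, 0.5), (0.1, -0.1)],
}

def seu(consequences):
    """Sum of p(x) * U(x) over all consequences of one alternative."""
    return sum(p * u for p, u in consequences)

# The SEU model says: evaluate every alternative, then pick the maximum.
for alt, consequences in alternatives.items():
    print(f'{alt}: SEU = {seu(consequences):.2f}')
print('SEU-optimal choice:', max(alternatives, key=lambda a: seu(alternatives[a])))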
As even the most cursory examination of this decision-making model will show, no
normal human ever makes decisions this way. Even if we assume that the long list of
precise requirements that psychologists tell us must be met in order to be able to
apply this approach have somehow been met [10], making a decision in this manner
requires total omniscience, a quality that’s generally lacking in humans.
An attempt to salvage this model involves introducing the concept of stopping rules, a
search optimisation that allows us to bail out when it’s obvious that there’s no (or at
least no cost-effective) benefit to be obtained by going any further [11][12]. How do
we know when the costs of searching further would outweigh any benefits? Simple,
we just apply the SEU model to tell us when to stop.
Oops.
So the stopping-rule patch to the SEU model attempts to model limited search by
assuming that we have unlimited knowledge (and time) at our disposal in order to be
able to figure out when we should stop. To put this another way, if stopping rules
were practical then you shouldn’t be sitting there reading this but should be in Las
Vegas busy applying the stopping rule “stop playing just before you start losing”. As
1994 Nobel prize in economics winner Reinhard Selten put it, “Modern mainstream
economic theory is largely based on an unrealistic picture of human decision making.
Economic agents are portrayed as fully rational Bayesian maximisers of subjective
utility. This view of economics is not based on empirical evidence, but rather on the
simultaneous axiomisation of utility and subjective probability [...] It has strong
intellectual appeal as a concept of ideal rationality. However, it is wrong to assume
that human beings conform to this ideal” [13].
(Coming from a psychology background it feels very strange to read an economics
text and see long, detailed discussions on the use of decision matrices, decision trees,
and expected value/utility models, complete with worked examples of how to use
them. In the field of economics this is a perfectly sensible way to approach decision
making, for example for a company to decide whether it makes sense to go ahead
with the development and distribution of a new product. On the other hand it doesn’t
really make much sense for the consumer sitting at the other end who’s deciding
whether they should buy the new product).
How Users Really Make Decisions
Now that we’ve looked at how things don’t work, how can we find out how they
actually do work? There are two ways to approach this: we can either use empirical
evaluation, examining and measuring what people do in practice, or we can use
conceptual modelling, taking a set of conceptual models (including, for reference, the
SEU model) and seeing which one best matches (or at least approximates) reality.
The first approach that we’ll look at is the empirical modelling one. Although there
was ongoing work in the 1970s to explore various problems in the SEU model [14], it
wasn’t until the 1980s that the US Department of Defence helped dispel the illusion
of the economic decision-making model when they sponsored research to try and find
techniques for helping battlefield commanders make more effective decisions.
This work showed that, contrary to expectations, people under pressure for a quick
decision didn’t weigh up the relative merits of a set of options and choose the most
optimal one. They didn’t even make a choice from a cut-down subset of options.
Instead, they followed a process that the researchers termed recognition-primed
decision making (RPD), in which they generate options one at a time (without ever
comparing any two), rejecting ones that don’t work and going with the first one that
does [15][16].
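Expressed as a hypothetical sketch, recognition-primed decision making is little more than a loop that takes the first option that works, never comparing two options against each other, in striking contrast to the exhaustive evaluation that the SEU model demands:

# Hypothetical sketch of recognition-primed decision making: options are
# generated one at a time and the first workable one is taken, with no
# comparison between options and no attempt at finding an optimum.
def rpd_choose(generate_options, is_workable):
    for option in generate_options():
        if is_workable(option):
            return option         # good enough, go with it
    return None                   # no workable option found

# Toy usage: a user facing a dialog runs through candidate actions in the
# order in which they come to mind.
def candidate_actions():
    yield 'click OK and get on with the job'
    yield 'read the dialog text carefully'
    yield 'phone the helpdesk'

print(rpd_choose(candidate_actions, lambda action: 'OK' in action))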
(The terminology can get a bit confusing at times: other researchers working
independently have, somewhat confusingly, called this the Take the Best (TTB)
heuristic, the general concept has been called the singular evaluation approach, and
the overall family is often termed the heuristic decision-making approach, as opposed
to SEU’s economic/Bayesian decision-making approach. The one good thing about
this naming muddle is that it demonstrates independent reproducibility, the fact that
many different researchers independently came up with the same (or at least very
similar) results).
Humans take the RPD approach to making a decision when they can’t hold all of the
necessary information in working memory, or can’t retrieve the information needed
to solve the problem from long-term memory, or can’t apply standard problem-
solving techniques within the given time limit. The probable evolutionary reason for
this means of decision-making is pointed out by the author of a book that examines
human irrationality: “Our ancestors in the animal kingdom for the most part had to
solve their problems in a hurry by fighting or fleeing. A monkey confronted by a lion
would be foolish to stand pondering which was the best tree to climb; it is better to be
wrong than eaten” [17].
This approach to making decisions, sometimes called the singular evaluation
approach, is used under the following circumstances [18][19]:
1. The decision-maker is under pressure (a computer user wanting to get on
with their job automatically falls into the under time-pressure category, even
if there’s no overt external time pressure).
2. The conditions are dynamic (the situation may change by the time you
perform a long detailed analysis).
3. The goals are ill-defined (most users have little grasp of the implications of
security mechanisms and the actions associated with them).
Now compare this with the conditions in the earlier SEU model to see how radical the
difference is between this and the economic model — the two are almost mirror
images!
Singular evaluation is something that you’ve probably encountered yourself in
various forms. For example if you move house into a new area and know that you’ll
eventually need a plumber to install a sink for you, you have the luxury of being able
to make a few inquiries about prices and work times, and perhaps look to neighbours
for recommendations before you make your decision. On the other hand if your
basement is under a metre of slowly-rising water, you’ll go with the first plumber
who answers their phone and can get there within 10 minutes. This is the singular
evaluation approach.
Moving from the purely empirical “How do humans act when making decisions
under pressure”, other researchers have examined the problem from the second,
conceptual modelling angle, “Which existing conceptual model best matches human
decision-making behaviour under pressure”. The best match was a heuristic called
Take the Best, which is just another (somewhat misleading) name for recognition-
primed decision making [20]. So both the empirical and theoretical modelling
approaches yielded the same result for human decision-making under pressure.
One contributing factor towards the popularity of simple heuristics is the fact that it’s
very hard to learn from the feedback arising from complicated decision making. If
there are a large number of variables and causes involved, the diffusive reinforcement
that’s provided isn’t sufficient to single out any one strategy as being particularly
effective. The resulting confusion of false correlations and biased attributions of
success and failure tends to lead to superstition-based decision support. Typical
examples of this are the complex “systems” used for gambling and stock trading,
which arise because following some sort of system makes the participants feel better
than not having any systematic approach at all [21]. Compare this lack of effective
feedback with what’s possible from basic recognition-based decision making: “I
bought brand X, I had nothing but trouble with it, therefore I won’t buy brand X
again”.
The game-theoretic/economic approach to decision making is particularly
problematic for security decisions because it treats a decision as a gamble, with some
equivalent of a coin toss or die roll followed by immediate feedback on the value of
the decision. Unfortunately security decisions don’t work this way: there’s no
immediate feedback, no obvious feedback, and (since security failures are usually
silent) there may never be any feedback at all, or at least not until it’s far too late (a
phantom withdrawal from your bank account a year later) to take any corrective
action.
This gambling/game-theoretic model does however perfectly model one aspect of
user behaviour, the portion of the decision-making process that leads to dismissing
warning dialogs. In this case the user gets immediate, strongly positive feedback on
their decision: they can go ahead with their task. Other possible outcomes are
unknown, and may never be known — was the phantom withdrawal the result of
clicking ‘Cancel’ on a dialog, or because a credit card processor lost a backup tape, or
because …?
A standard abstract model that psychologists use for the problem-solving process is
the problem-solving cycle [22][23]:
1. Recognise or identify the problem.
2. Define and represent the problem mentally.
3. Develop a solution strategy.
4. Organize his or her knowledge about the problem.
5. Allocate mental and physical resources for solving the problem.
6. Monitor his or her progress toward the goal.
7. Evaluate the solution for accuracy.
Consider how this standard model would be applied to the problem of the security
warning dialog in Figure 12:
1. Recognise or identify the problem.
“There is a dialog blocking my way”.
2. Define and represent the problem mentally.
“If I don’t get rid of the dialog I can’t continue doing my work”.
3. Develop a solution strategy.
“I need to get rid of the dialog”.
4. Organize his or her knowledge about the problem.
“With other dialogs like this if I clicked on the close box or the
‘OK’/’Cancel’ button (as appropriate) the problem went away”.
5. Allocate mental and physical resources for solving the problem.
“Hand, move!”.
6. Monitor his or her progress toward the goal.
“The dialog has gone away, allowing me to continue doing my job”.
7. Evaluate the solution for accuracy.
“Works just fine”.
The user has handled the warning dialog exactly as per the psychological model,
although hardly in the way that the developer would wish.
When there’s no immediately obvious choice, people’s decision-making abilities go
downhill rapidly. One frequently-used method is to look for some distinction, no
matter how trivial or arbitrary, to justify one choice over the other [24][25]. How
many people go into a store to buy something like a DVD player, can’t decide which
of several near-identical models to buy, and end up choosing one based on some
useless feature that they’ll never use like the fact that one player has a karaoke
function and the other doesn’t?
Other common strategies include procrastination [26][27] (which I’m sure you
already knew, but now you have psychological evidence to confirm it) and deciding
based on irrational emotions [28][29]. This is an appalling way to perform critical,
security-related decision making!
The reason why experts are better at singular-evaluation decision-making than the
average person is that they have access to large, well-organised knowledge structures
[30] and are more likely to come up with a good option as their first choice than the
typical person [31]. Research into how experts solve problems has also indicated that
they tend to spend a considerable amount of time thinking about different
representations of the problem and how to solve it based on underlying principles,
while novices simply go for the most obvious representation and use of surface
features [32][33][34][35]. The experts were able to both cover more solution
strategies and arrive at the final solution more quickly than the novices. In addition
experts are able to use their expertise to frame situations more rapidly and accurately
than novices, and can then use this framing to steer their decision-making. Novices
in contrast lack this depth of experience and have to use surface characteristics of the
situation to guide their actions. There are however situations in which experts don’t
perform nearly as well as expected, and that’s when they’re required to deal with
human behaviour rather than physical processes [36]. Processes are inherently
predictable while human behaviour is inherently unpredictable, causing even experts
to have problems with their decision-making.
Psychological studies have shown that in the presence of external stimuli such as
stress (or in this case the desire to get a job done, which is often the same as stress),
people will focus on the least possible amount of evidence to help them make a quick
decision. Specifically, the external stimuli don’t affect the way that we process
information, but reduce our ability to gather information and the ability to use our
working memory to sort out the information that we do have [37][38][39][40]. Thus
even an expert when flooded with external stimuli will eventually have their decision-
making ability reduced to that of a novice.
Stress can play a critical role in the decision-making process. As the author of the
book on irrationality that was mentioned earlier points out, “it has been found that
any high level of emotion is inimical to the careful consideration of different
alternatives [...] It is not merely strong emotions that cause inflexible thinking; any
form of stress produces it”. The scary effects of this were demonstrated in one
experiment carried out on soldiers who had been trained on how to safely exit a plane
in the event of an emergency. Through an intercom that had been “accidentally” left
on they overheard a (rehearsed) conversation among the pilots in which they
discussed the impending crash of the plane. The group had great difficulty in
recalling their instructions compared to a group who didn’t overhear the conversation
from the cockpit (this was done at a time when the requirements for human
experimentation were considerably more lax than they are now) [41].
A more common occurrence of this type of stress-induced performance failure occurs
when we lose something of value. Instead of carefully and rationally considering
where we last saw the item and where we’ve been since then, our initial reaction is to
frantically search the same place or small number of places over and over again on
the assumption that we somehow missed it the other five times that we looked there.
The inability to exhaustively enumerate all possibilities is actively exploited by stage
magicians, who anticipate how observers will reason and then choose a way of doing
things that falls outside our standard ability to think of possibilities. As a result, they
can make things appear, disappear, or transform in a manner that the majority of
observers can’t explain because it’s been deliberately designed to be outside their
normal reasoning ability [42]. You can try a (rather crude) form of this yourself:
create a description of an object disappearing, ask a bunch of people to list all of the
ways in which they’d explain it away, and then see if you can come up with a way of
doing it that doesn’t involve any of the standard expectations of how it could be done
(if the item that disappears is someone else’s money or valuables then you didn’t
learn this particular strategy here).
Stress-induced behavioural modification is of some concern in security user interface
design because any dialog that pops up and prevents the user from doing their job is
liable to induce the stress response. If you’re currently contemplating using these so-
called warn-and-continue dialogs in your security user interface, you should consider
alternatives that don’t lock users into a behaviour mode that leads to very poor
decision-making.
It’s not a Bug, it’s a Feature!
The ability to sort out the relevant details from the noise is what makes it possible for
humans to function. For example as you’ve been reading this you probably haven’t
noticed the sensation of your clothes on your skin until this sentence drew your
attention to them. The entire human sensory and information-processing system acts
as a series of filters to reduce the vast flow of incoming information to the small
amount that’s actually needed in order to function. Even the very early stages of
perception involve filtering light and sound sensations down to a manageable level.
Selective attention processes provide further filtering, giving us the ability to do
things like pick out a single conversation in a crowded room, the so-called cocktail
party phenomenon (or more formally the source separation problem) [43]. At the
other end of the chain, forgetting discards non-meaningful or non-useful information.
Imagine if, instead of using singular evaluation, humans had to work through the
implications of all possible facts at their disposal in order to come to a conclusion
about everything they did. They would never get anything done. There exists a
mental disorder called somatising catatonic conversion in which people do exactly
this, over-analysing each situation until, like the 1960s operating system that spent
100% of its time scheduling its own operational processes when running on low-end
hardware, they become paralysed by the overhead of the analysis. Artificial
intelligence researchers ran into exactly this problem, now called the frame problem,
when they tried to recreate singular evaluation (or to use its more usual name
“common sense”) using computer software [44]. The mechanistic approach resulted
in programs that had to grind through millions of implications, putting all of the
relevant ones in a list of facts to consider, and then applying each one to the problem
at hand to find an appropriate solution.
Framing the problem appropriately often plays a significant part in its solution.
Simply recognising what the problem to be solved actually is (rather than what it
apparently is) can be challenging. Most users will, for example, identify a
commonly-encountered security problem as “There is an annoying dialog box
blocking my way” rather than “There is a potential security issue with this web site”.
A long-standing complaint from employers (and to a lesser extent tertiary educators)
is that most current education systems do a very poor job of teaching problem
representation and problem solving, but simply prepare children to answer well-
defined, carefully presented problems, which doesn’t help much with solving less
well-defined ones. As a result, new hires are often unable to function effectively in
the workplace until they’ve picked up sufficient problem-solving skills so that it’s no
longer necessary for them to be provided with paint-by-numbers instructions for the
completion of non-obvious tasks. Even psychologists still lack detailed
understanding of the processes involved in problem recognition, problem definition,
and problem representation [45].
Even without going to such extremes of (in-)decision making as somatising catatonic
conversion, overattention to detail can lead to other psychological problems. One
(comparatively) milder manifestation of this is obsessive-compulsive disorder or
OCD. The overly reductionist brains of sufferers cause them to become lost in a
maze of detail, and they fall back to various rituals that can seem strange and
meaningless to outside observers in order to cope with the anxiety that this causes
[46]. Singular evaluation in humans isn’t a bug, it’s what makes it possible for us to
function.
Usability researchers have already run into this issue when evaluating browser
security indicators. When users were asked to carefully verify whether sites that they
were visiting were legitimate or not, the researchers had to abort the experiment after
finding that users spent “absurd amounts of time” trying to verify site legitimacy [47].
On top of this, making users switch off singular evaluation led to a false-positive
rate of 63%, because when the users tried hard enough they would eventually find
some reason somewhere to justify regarding the site as non-kosher. More
worryingly, even after spending these absurd amounts of time trying to detect
problem sites, the users still failed to detect 36% of false sites using standard browser
security indicators, no matter how much time they spent on the problem. As in the
non-computer world, the use of singular evaluation is a basic requirement for users to
be able to function, and a security user interface has to carefully take into account this
human approach to problem-solving.
Reasoning without the use of heuristic shortcuts may be even more error-prone than it
is with the shortcuts. If we accept that errors are going to be inevitable (which is
pretty much a given, particularly if we eschew shortcuts and go with a very
demanding cover-all-the-bases strategy) then the use of shortcuts (which amount to
being controlled errors) may be better than avoiding shortcuts and thereby falling
prey to uncontrolled errors [48].
A number of other evolutionary explanations for different types of human reasoning
have been proposed, and the field is still a topic of active debate
[49][50][51][52][53][54]. One interesting theory is that errors may be an
evolutionary survival mechanism, provided that at least some individuals survive the
consequences of the error [55]. Consider what would happen if no errors (deviations
from the norm) ever occurred. Imagine that during some arbitrary time period, say
about 2½ billion years ago, errors (deviations, mutations, whatever you want to label
them) stopped happening. At that point cyanobacteria were busy converting the
earth’s early reducing atmosphere into an oxidizing one, precipitating the oxygen
crisis that proved catastrophic to the anaerobic organisms that existed at the time.
The ecological catastrophe of changing the atmosphere from 0.02% oxygen to around
21% pretty much wiped out the existing anaerobic life (atmospheric change had been
done long before humans had a go at it). Without mutations (errors), there’d be
nothing left except a few minor life-forms that were immune to the poisonous effects
of the oxygen. So from an evolutionary perspective, error is a necessary part of
learning, adaptation, and survival. Without errors, there is no progress.
A more tongue-in-cheek evolutionary explanation for why we don’t use the SEU
model in practice is provided by psychologists Leda Cosmides and John Tooby: “In
the modern world we are awash in numerically expressed statistical information. But
our hominid ancestors did not have access to the modern accumulation which has
produced, for the first time in human history, reliable, numerically expressed
statistical information about the world beyond individual experience. Reliable
numerical statements about single event probabilities were rare or nonexistent in the
Pleistocene” [56].
Evaluating Heuristic Reasoning
Researchers have identified a wide range of heuristics that people use in choosing one
of a range of options, including very simple ones like ignorance-based decision
making (more politely called recognition-based decision making, if you’re given two
options then take the one that you’re familiar with) and one-reason decision making
(take a single dimension and choose the option/object that displays the greatest
magnitude in that dimension), or one of a range of variations on this theme [57][58].
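As a rough illustration of just how simple these heuristics are, the following sketch implements the two just described for the researchers’ standard task of deciding which of two objects scores higher on some criterion. The city and cue data are invented for illustration rather than being real recognition or cue values.

# Sketch of two simple decision heuristics; the city and cue data below are
# invented for illustration.

def recognition_choice(a, b, recognised):
    """Ignorance-based decision making: if only one option is recognised,
    pick it; otherwise this heuristic can't decide."""
    if recognised(a) and not recognised(b):
        return a
    if recognised(b) and not recognised(a):
        return b
    return None

def one_reason_choice(a, b, cues):
    """One-reason (Take the Best-style) decision making: run through the cues
    in order and decide on the first one that discriminates."""
    for cue in cues:
        if cue(a) != cue(b):
            return a if cue(a) > cue(b) else b
    return None

# Toy usage: which of two cities is larger?
known_cities = {'Munich'}
print(recognition_choice('Munich', 'Dortmund', lambda city: city in known_cities))

has_airport = {'San Diego': 1, 'San Jose': 1}           # invented cue values
has_top_league_team = {'San Diego': 1, 'San Jose': 0}   # invented cue values
print(one_reason_choice('San Diego', 'San Jose',
                        [has_airport.get, has_top_league_team.get]))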
To determine the effectiveness of the different decision-making heuristics,
researchers ran detailed simulations of their performance across a wide range of
scenarios. The tests involved applying the various decision-making strategies to the
problem of deciding which of two objects scored higher in a given category, with the
strategies ranging from simple recognition-primed decision making through to far
more complex ones like linear regression. The decisions as to which scored higher
covered such diverse fields as high school dropout rates, homelessness rates, city
populations, house prices, professors’ salaries, average fuel consumption per state,
obesity at age 18, fish fertility (!!), rainfall due to cloud seeding, and ozone in San
Francisco. The information available to guide the decisions included (to take the
example of home pricing) current property taxes, number of bathrooms, number of
bedrooms, property size, total living area, garage space, age of the house, and various
other factors, up to a maximum of eighteen factors [59]. In other words the
researchers really left no stone unturned in their evaluation process.
An example of a problem that can be solved through recognition-based decision-
making is the question of whether San Diego has more inhabitants than San Jose (if
you’re from outside the US), or Munich has more inhabitants than Dortmund. Most
people will pick San Diego or Munich respectively, using the simple heuristic that
since they’ve heard of one and not the other, the one that they’ve heard of must be
bigger and better-known (in practice people don’t go through this reasoning process,
they just pick the one that they’ve heard of and things work out) [60]. The
recognition data was taken from a survey of students at the University of Chicago for
the German cities (who rated Munich ahead of several larger German cities, including
its capital with three times the population — never underestimate the effect of beer
fests on the student psyche), and the University of Salzburg (which is actually in
Austria) for the US cities. The effectiveness of this heuristic was such that data
gathered from the German-speaking students actually served slightly better in
identifying relative populations of US cities than it did for German ones, a
phenomenon that’ll be explained more fully in a minute.
The amazing thing about these basic heuristics is that when researchers compared
them with far more complex ones like full-blown multiple regression analysis using
all 18 available factors, the accuracy of the more complex and heavyweight multiple-
regression analysis was only slightly better than that of the simple heuristic
techniques [61][62]. These results were so astonishing that the researchers had
trouble believing them themselves. To catch any possible errors, they hired two
separate teams of programmers in the US and Germany to independently reproduce
the results, and when they published them included all of their data so that others
could perform their own replications of the experiments, which many did [58].
One proposed explanation for this unlikely-seeming result is that strategies like
multiple linear regression, which make use of large numbers of free parameters,
assume that every possible parameter is relevant, a problem known as overfitting.
Overfitting is of particular concern in machine learning, where a learning mechanism
such as a neural network may concentrate on specific features of the training data that
have little or no causal relation to the target. Simple heuristics on the other hand
reduce overfitting by (hopefully) filtering out the noise and only using the most
important and relevant details, an unconscious mental application of Occam’s razor.
The result is a performance that approaches that of full-blown linear regression but at
a small fraction of the cost.
The overfitting problems of the more complex methods were demonstrated by an
investigation into how well the prediction methods generalised to making future
predictions. In other words when the model is fed data from a training set, how well
does it make predictions for new data based on the existing training-set data?
Generalisation to non-test data is the acid test for any prediction system, as any IDS
researcher who’s worked with the MIT Lincoln Labs test data can tell you.
The results confirmed the overfitting hypothesis: The performance of linear
regression dropped by 12%, leaving one of the simple heuristics as the overall
winner, at a fraction of the cost of the linear regression [59]. Another experiment
using a Bayesian network, the ultimate realisation of the economic decision-making
model, in place of linear regression, produced similar results, with the full-blown
Bayesian network performing only a few percent better than the simplest heuristic,
but at significantly higher cost [63].
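The flavour of this kind of comparison can be captured in a few lines of code. The following sketch, which uses entirely synthetic data rather than the researchers’ datasets, fits a multiple linear regression on a training set and then compares its pairwise-comparison accuracy on unseen data against a single-cue heuristic. It illustrates the test setup rather than reproducing the cited figures.

# Synthetic illustration of the fitting-vs-generalisation comparison: full
# multiple regression against a single-cue heuristic on pairwise choices.
# All data is randomly generated; no claim is made about the real studies.
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_cues = 200, 18
X = rng.normal(size=(n_objects, n_cues))
# The criterion depends mostly on the first cue, plus a lot of noise.
y = 2.0 * X[:, 0] + rng.normal(scale=2.0, size=n_objects)

train, test = slice(0, 100), slice(100, 200)
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)   # fit on training half

def pairwise_accuracy(score, X, y):
    """How often does the higher-scored of two random objects really have the
    higher criterion value?"""
    hits = trials = 0
    for _ in range(2000):
        i, j = rng.integers(0, len(y), size=2)
        if y[i] == y[j]:
            continue
        trials += 1
        hits += (score(X[i]) > score(X[j])) == (y[i] > y[j])
    return hits / trials

regression = lambda x: x @ coef      # uses all eighteen cues
single_cue = lambda x: x[0]          # one-reason heuristic: best cue only

for name, part in (('training', train), ('test', test)):
    print(name,
          'regression:', round(pairwise_accuracy(regression, X[part], y[part]), 2),
          'single cue:', round(pairwise_accuracy(single_cue, X[part], y[part]), 2))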
Note that this result doesn’t mean that people are locked into using a single heuristic
at all times, merely that their behaviour for certain types of problems is best modelled
by a particular heuristic. In practice for general problem solving people mix
strategies and switch from one to another in unpredictable ways [64]. The
nondeterminism of the human mind when applying problem-solving strategies is
something that’s currently still not too well understood.
Unfortunately while this heuristic strategy is generally quite effective, it can also be
turned against users by the unscrupulous, and not just attackers on the Internet. For
example recognition-based decision making is directly exploited by the phenomenon
of brand recognition, in which marketers go to great lengths to make their brands
visible to consumers because they know that consumers will choose the marketers’
products (brands) in preference to other, less- or even un-recognised brands. In
adopting this strategy they’ve performed an active penetration attack on the human
decision-making process (as with a number of other methods of exploiting human
behaviour, the marketers and fraudsters figured out through empirical means how to
exploit the phenomenon long before psychologists had explored it or determined how
or why it worked).
(Note that current ideas on heuristic reasoning almost certainly aren’t the last word on
human decision-making processes, but simply represent the best that we have at the
moment. In particular there’s no overall, unifying theory for human decision-making
yet, just a series of descriptive concepts and astute observations. So the material
that’s presented here represents the newest (or at least the more influential) thinking
by experts in the field, but isn’t necessarily the definitive answer to questions about
human decision-making. What’s missing in particular is more information on the
psychological mechanisms by which human decision-making processes operate).
Consequences of the Human Decision-making Process
Psychologists distinguish between the two types of action taken in response to a
situation as automatic vs. controlled processes [65][66]. Controlled processes are
slow and costly in terms of the amount of mental effort required, but in compensation
provide a great deal of flexibility in handling unexpected situations. Automatic
processes in contrast are quick and have little mental overhead. While controlled
processes are associated with deliberate action, automatic processes are essentially
acting on autopilot. Because of this, automatic processing is inherently parallel (it’s
possible to handle multiple tasks at the same time) while controlled processing is
strictly serial (you have to focus exclusively on the task at hand). Another way of
distinguishing the two types of processes is in the level of control that we have over
them: one is voluntary, the other is involuntary [67][68][69].
A good illustration of the difference between controlled and automatic actions is the
difference between a novice and an experienced driver. A novice driver has to
manually and consciously perform actions such as changing gears and checking the
rear-view mirror, while for an experienced driver these actions occur automatically
without any conscious effort. To cope with the inability to handle the driving process
via automatic actions, novice drivers load-shed by ignoring one of the two main
aspects of driving (speed control and steering), with the result that they crawl down
the road at an irritatingly slow speed while they concentrate on steering. It’s not until
both aspects of vehicle control have become automatic processes that the novices
progress to the level of more experienced drivers [70].
Another example of this occurs when you drive in to work (or along some other
familiar route) and you start thinking about something else. Suddenly you’re at your
destination and you can’t really recall any part of the process that got you there. This
isn’t something as simple as leaving the iron on, this is a long and very involved
process to which a lot of resources are allocated. The brain was paying attention, but
it was an automatic process so you’re not conscious of it [71].
You can see this effect yourself if you write something simple like your name or the
weekday repeatedly across a piece of paper and at some point start counting
backwards from 100 while you write. Look at what happens to either your writing
speed or writing quality when you do this, depending on which load-shedding
strategy you choose to adopt. Now try it again but this time sign your name (an
automatic process for most people) and see what happens.
This simple experiment in fact mirrors some of the early investigations into the
phenomenon of attention that were carried out in the 1950s, which involved seeing
whether and how much performing one task interfered with another [72][73]. The
initial theory was that there was some cognitive bottleneck in the human information-
processing system, but alongside this bottleneck metaphor there’s a whole list of
others including a filtering metaphor that assumes that humans have a limited
information-processing capacity and therefore use a kind of selective filter to protect
themselves from overload, a serial vs. parallel-processing metaphor in which parallel
processing has to switch to serial processing under load, an economic metaphor that
models attention as a limited resource that’s allocated as required, a performance-
oriented characteristic model that tries to measure the resources allocated to each
attention task, and numerous others as well. The debate over which model is the most
accurate one and exactly how and why the finite-attention effect occurs (and even
whether we should be calling this stuff “attention”, “effort”, “capacity”, or
“resources”) hasn’t stopped since the very first metaphors were proposed [74][75].
The nature of this unconscious processing has been explored by psychologists
working with hypnotised patients. If you’ve ever seen a stage hypnotist you’ll know
that a common trick is to make people perform some relatively unpleasant or unusual
act while making them think that they’re actually performing a benign act. One
example of this that’s been used by hypnotists is to get people to eat an onion while
thinking that it’s an apple. Going for something a bit less showy, psychologists
prefer to get subjects to stick their hands in very cold water, a harmless method of
inducing pain, which hypnotised subjects can ignore.
In the 1970s, psychology professor Ernest Hilgard looked into this a bit further by
telling hypnotised subjects that he was going to talk to their other selves, and these
other selves were very much aware of what was really going on. This phenomenon
occurs in an even more extreme form in people with dissociative identity disorder
(formerly known as multiple personality disorder), and despite a large number of
hypotheses covering this type of conscious vs. unconscious processing we really
don’t have much clue as to what’s really going on, or in some cases even whether it’s
really going on: because of the observer effect, the act of trying to observe something
may be changing what we’re observing.
One characteristic of an automatic process is that it’s triggered by a certain stimulus
and, once begun, is very hard to stop, since there’s no conscious effort involved in
performing it. This makes automatic processes very hard to control: Present the right
stimulus and the body reacts on its own. This is why people click away confirmation
dialogs without thinking about it or even being aware of what they’re doing (lack of
conscious awareness of an action is another characteristic of automatic processes).
Several of the theoretical models of attention proposed above have been used to try and analyse
the mechanisms involved in controlled vs. automatic processes. For example the
serial vs. parallel model of attention that was mentioned earlier treats an automatic
process as a parallel process that doesn’t draw on attentional capacity, while a
controlled process is a serial process that does [76][77]. Examinations of automatic
processes have been rendered quite difficult by the fact that most of the processing
takes place at a level below conscious awareness. For example, how would you
explain (or in psychological terminology, introspect) the processes involved in
coming up with the words to produce a sentence? You can be aware of the outcome
of the processing, but not the working of the underlying processes.
As with several of the other psychological phenomena that are covered here, the
assumption that forcing people into a particular way of doing things will fix whatever
problem you're trying to address isn't necessarily valid. Experimental
psychologists have found that trying to turn an automatic process back into a
controlled process can actually reduce performance [78]. The reason for this is that
directing attention at the process in order to have more control interferes with its
automatic execution, making the overall performance worse rather than better.
From a psychological perspective, judgemental heuristics in the form of automatic
processes work well in most cases by saving users time, energy, and mental capacity.
As psychology professor Arne Öhman puts it, “conscious mental activity is slow, and
therefore conscious deliberation before defensive action is likely to leave the genes of
the potential prey unrepresented in the next generation” [79]. Unfortunately while
automatic processes are a great relief when dealing with pointless popup dialogs, they
suffer from the problem that attackers can take advantage of the click, whirr
response to stimulate undesirable (or desirable, from the attacker’s point of view)
behaviour from the user, tricking them into overriding security features in
applications to make them vulnerable to attack. This aspect of user behaviour in
response to SSL certificates is being exploited by phishers through the technique of
“secure phishing”, in which attackers convince users to hand over passwords and
banking details to a site that must be OK because it has a certificate.
In 2005, the first year that records for this phenomenon were kept, over 450 such
secure phishing attacks were discovered. These used a variety of techniques ranging
from self-signed certificates and cross-site scripting and frame injection to insert
content into real financial web sites through to the use of genuine certificates issued
to sound-alike domains [80]. An example of the latter type of attack was the
visa-secure.com one aimed at Visa cardholders. Since Visa itself uses soundalike
domains such as verifiedbyvisa.com and visabuxx.com and the site was
otherwise indistinguishable from a real Visa site, the phish proved to be extremely
effective [81]. As Figure 13 shows, Visa isn’t the only company with this problem.
Figure 13: American Express certificate for a different site
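To see how little a naive check helps here, consider the following sketch (a
hypothetical check, not anything a real browser does): a simple brand-match accepts
the sound-alike, while a strict registered-domain match rejects not just the phish
but Visa's own legitimate sound-alike domains as well, which is exactly why users
can't be expected to tell them apart:

    # Hypothetical hostname checks, for illustration only.
    def naive_looks_like_visa(hostname):
        # The kind of check users implicitly apply: "it says visa somewhere".
        return "visa" in hostname.lower()

    def strict_visa_domain(hostname):
        # Stricter: the host must be visa.com itself or a subdomain of it.
        h = hostname.lower()
        return h == "visa.com" or h.endswith(".visa.com")

    for host in ("usa.visa.com", "visa-secure.com", "verifiedbyvisa.com"):
        print(host, naive_looks_like_visa(host), strict_visa_domain(host))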
The use of multiple completely unrelated domains is fairly common among financial
institutions. Citibank for example uses, alongside the obvious citibank.com, six
other unrelated domain names like accountonline.com, with Figure 14 being
one example of such a domain (with a wrong certificate certifying it to boot). The
domain citibank-verify.4t.com on the other hand is (or was) a phishing site,
complete with a legitimate CA-issued certificate. Other domains in the "citibank"
namespace alone include citibank-america.com, citibank-credicard.com,
citibank-credit-card.com, citibank-credit-cards.com,
citibank-account-updating.com, citibank-creditcard.com,
citibank-loans.com, citibank-login.com,
citibank-online-security.com, citibank-secure.com,
citibank-site.com, citibank-sucks.com, citibank-update.com,
citibank-updateinfo.com, citibank-updating.com,
citibankaccount.com, citibankaccountonline.com,
citibankaccounts.com, citibankaccountsonline.com, and
citibankbank.com, of which some are legitimate and some aren’t. For example
citibank-account-updating.com is owned by Ms.Evelyn Musa,
ezayoweezay_haloby[email protected]om.
Figure 14: One of Citibank's many aliases
Another example of unrelated domain name usage is the Hanscom Federal Credit
Union (serving the massive Hanscom air force base, a tempting target), which uses all
of www.hfcu.org, locator.hfcu.org, ask.hfcu.org,
calculators.hfcu.org, www.loans24.net,
hfcu.mortgagewebcenter.com, secure.andera.com,
secure.autofinancialgroup.com, hffo.cuna.org,
www.cudlautosmart.com, www.carsmart.com,
reorder.libertysite.com, www.ncua.gov, www.lpl.com,
anytime.cuna.org, usa.visa.com, and www.mycardsecure.com.
Although obfuscated names like hffo.cuna.org aren't (at least in this case) being
run by Evelyn Musa in Nigeria, it’s hard to see what relates something like
“libertysite” to “Hanscom Federal Credit Union”.
Figure 15: Bank of America training future phishing victims
Another example, a site registered to Douglas-Danielle and hosted at
Qwest.net, is shown in Figure 15. This site asks users for their home address,
account numbers, and Social Security Number. Despite repeated requests to Bank of
America to fix this (it’s a legitimate site, only it carries all the hallmarks of a phishing
site), the problem has been present since at least 2003 [82] and was still active as of
the time of writing.
The reason why people are so bad at spotting these phishing attacks is that they’re not
very good at either generating testable hypotheses or designing tests that falsify
hypotheses, a fact that con artists and salespeople have exploited for centuries, if not
millennia [83][84]. Scientists on the other hand, a subgroup whose lives revolve
around rationality and seeking the truth, know that good science consists of designing
an experiment to try and demonstrate that a theory is wrong. For example a standard
statistical technique consists of generating a null hypothesis as a sceptical reaction to
the research hypothesis that the research is designed to investigate, and then proving
it wrong. So if the research hypothesis postulates that “factor X is significant” then
the null hypothesis would be that “factor X is not significant”, and the study or
experiment would attempt to prove the null hypothesis wrong. This technique is used
in statistics because sampling variation means that we can never prove a hypothesised
value true (another set of samples might produce a slightly different result), but what
we can do is determine whether there is evidence to rule out a hypothesised value. To
put it more simply, it’s much easier to tell someone that they’re wrong than to tell
them what the correct answer is.
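As a small worked example (illustrative numbers only), suppose we flip a coin twenty
times and get fifteen heads. The sceptical null hypothesis is that the coin is fair;
rather than trying to prove the coin biased, we compute how likely a result at least
this lopsided would be if the null were true, and only reject the null if that
probability is small:

    # Two-sided p-value under the "fair coin" null hypothesis; stdlib only.
    from math import comb

    flips, heads = 20, 15
    extreme = [k for k in range(flips + 1)
               if abs(k - flips / 2) >= abs(heads - flips / 2)]
    p_value = sum(comb(flips, k) for k in extreme) / 2 ** flips
    print(f"p-value under the fair-coin null: {p_value:.3f}")   # about 0.041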
The US Navy has addressed this inability to generate testable hypotheses in the
reassessment of tactical decision making that occurred after the accidental shootdown
of an Iranian civilian airliner in July 1988. Part of this reassessment included the
introduction of the so-called STEP cycle, which involves creating a Story
(hypothesis), Testing the hypothesis, and then Evaluating the result [85]. In other
words it makes the creation and application of testable hypotheses an explicit part of
the tactical decision-making process.
An ability to focus on evidence that falsifies a hypothesis seems to be an innate aspect
of geek nature. There’s an interesting demonstration that’s performed by the
convenor of the New Security Paradigms Workshop to demonstrate that a group of
geeks, when given a large amount of correct information and one item of incorrect
information, will focus immediately on the one item that’s wrong rather than the
many items that are correct (a variation of this is the joke that the best way to solicit
information on Usenet isn’t to post a question but to post an incorrect answer). The
rest of the population however is far more likely to merely try to confirm their
hypotheses, a dangerous approach since any amount of confirmatory evidence can be
negated by a single piece of evidence that falsifies it.
Confirmation Bias and other Cognitive Biases
The practice of seeking out only evidence that confirms a hypothesis is known as
confirmation bias and has been recognised since (at least) the time of Francis Bacon,
who observed that “it is the peculiar and perpetual error of the human understanding
to be more moved and excited by affirmatives than negatives” [86]. One of the first
rigorous investigations into the confirmation bias phenomenon was carried out by
Peter Wason using a type of task that has since gained fame (at least among
psychologists) as the Wason selection task. In one common form, Wason’s 2-4-6
task, subjects were given a sequence of three numbers such as { 2, 4, 6 } and asked to
determine the rule that governed the sequence by generating a sequence of their own
that they thought followed the rule. When they’d done this, they checked the
accuracy of their prediction by asking the experimenter whether their estimation
followed the actual rule. While the actual rule was a very simple “any ascending
sequence”, the subjects tried to come up with far more complex rules (“even
numbers”, { 4, 6, 8 }) and never really tried to disconfirm them ({ 4, 5, 6 }) [87].
Although there have been repeated attempts to try and improve performance on this
task by re-framing it as a different problem, for example by turning it into an exercise
to determine whether someone is old enough to drink or not (the Drinking Age
problem) [88], researchers aren’t sure that this re-framing is actually valid since it’s
now created an entirely new problem to solve, one that engages quite different
reasoning processes than the Wason selection task. Work on this is ongoing, with
numerous explanations for why people perform better with the Drinking Age problem
than with the Wason selection task despite the fact that under the surface they’re
actually the same thing [89].
The human tendency towards confirmation bias has been extensively explored by
both philosophers and psychologists [90][91][92]. This bias is the reason why social
scientists use as a standard tool a 2 × 2 table (more generically a two-way table of
counts or contingency table) that requires them to enter statistics for and thereby
explore all four sets of probabilities pr( A and B ), pr( A and ¬B ), pr( ¬A and B ),
and pr( ¬A and ¬B ), rather than just the confirmation-biased pr( A and B ) option.
Psychologists have studied the phenomenon of humans cooking the facts to support
the conclusions that they want to reach in great detail. For example when people are
exposed to a test or evaluation that makes them look bad, they tend to seek out
information that questions the validity of the test; conversely, when it makes them
look good, they seek out information confirming its validity [93][94]. Psychologists
call the practice of seeking out material that supports your opinions and avoiding
material that challenges them dissonance-motivated selectivity. In one experiment to
investigate this effect in which participants were asked to evaluate the effectiveness
of capital punishment based on the outcomes of two studies, they chose whatever
study produced the conclusion that matched their own personal beliefs on capital
punishment and came up with detailed reasons for why the other study was flawed
[95]. Subsequent studies showed that even trained scientists fell into this trap [96].
Like our physical immune system, we have a psychological immune system that
allows us to feel good and cope with situations, and the above examples are instances
of our psychological immune system at work [97].
An example of this inability to generate testable hypotheses was the alarming practice
used by one user in a phishing study to determine whether a site was genuine or not:
She entered her user name and password, and assumed that if the site allowed her in
then it was the real thing, since only the genuine site would know her password (the
same practice has been reported among other users) [98]. Hearing of practices like
this makes security people's toes curl up. (Having said that though, if the security
people had implemented password-based authentication properly in the first place
then this would be a perfectly valid site-validity check).
The reverse-authorisation fallacy demonstrated by the user in the previous paragraph
has been exploited by fraudsters for decades, if not centuries. In one variation,
someone claiming to be a private investigator will enter a store to apprehend a
shoplifter. They inform the shop owner that they need to take the goods that were
being stolen away as evidence, and have the shop-owner fill out an impressive-
looking amount of paperwork to cover this. Once the owner has finished carefully
authenticating themselves to the “investigator”, he leaves with the goods and the
“shoplifter”. Because of the detailed authentication process that they’ve gone
through, many shop owners don’t see anything wrong with letting the “investigator”
walk out the door with the same goods that they’d have called the police for if the
“shoplifter” had done the same thing.
Hand-in-hand with the confirmation-bias problem is the disconfirmation bias
problem, the fact that people are more likely to accept an invalid but plausible
conclusion than a valid but implausible one [99]. In other words people will believe
what they want to believe, and the beliefs themselves are often based on invalid
reasoning. This is why people are ready to accept a site that looks and behaves
exactly like their bank’s home page, but that’s hosted in eastern Europe — there must
be a transient problem with the server, or the browser has got the URL wrong, or
something similar.
To cap it all off, there’s even a nasty catch-22 bias called blind-spot bias that blinds
us to our own cognitive biases [100][101]. Cognitive biases are of sufficient concern
to some organisations that those who can afford to do so (for example the CIA) have
published special in-house training manuals on dealing with them [102]. The CIA
was particularly concerned with something called projection bias (which they refer to
as the “everyone thinks like us” mind-set), the assumption that everyone else has
goals similar to those of the CIA, an assumption that has led to numerous problems
in the past [103]. The book (which you can read online, it’s a worthwhile read)
contains strategies used to teach intelligence analysts how to think more open-
mindedly about issues. Although the original goal of the work was to train
intelligence analysts, and it’s mentioned here as an example of an organisation
dealing with cognitive biases, some of the techniques are actually quite useful for
analysing security software. For example one technique, premortem analysis, is
covered in the chapter on security user interface testing as a means of performing
software failure analysis.
Projection bias has repeatedly hit security applications, which make the assumption
that once the peer has authenticated (or otherwise apparently proven) themselves,
their goals must match those of the user, programmer, or local system. In other
words an entity like a remote SSH client would never dream of first authenticating
itself and only then launching a buffer overflow or malformed-packet attack. As a
result applications perform rigorous (or in the case of some SSH implementations at
least half-hearted) data checking up until the authentication phase, but very little
checking afterwards, making themselves trivially vulnerable to any data formatting
attacks that follow.
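The fix is conceptually trivial, as the following sketch shows (a hypothetical server
loop, not any real SSH implementation): data validation has to be applied to every
message, not just the ones that arrive before the peer has authenticated:

    # Hypothetical protocol handler: validate everything, before and after auth.
    def validate(packet):
        # Stand-in sanity check; a real protocol would check types, lengths, etc.
        return 0 < len(packet) <= 4096

    def authenticate(packet):
        return packet == b"correct password"   # stand-in for a real auth exchange

    def process(packet):
        pass                                   # post-auth message handling

    def handle_session(packets):
        authenticated = False
        for packet in packets:
            if not validate(packet):                  # checked for every packet,
                raise ValueError("malformed packet")  # authenticated or not
            if not authenticated:
                authenticated = authenticate(packet)
            else:
                process(packet)

    handle_session([b"correct password", b"some post-auth data"])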
Variations of the projection bias problem include a blind trust in signed data (if the
executable or email is signed it has to be OK, an attacker might try and subvert the
signature but would never think of trying an attack by putting malformed XML inside
the signed container), Unix systems’ checking of security parameters only on first
access but not thereafter, and in general the whole firewall mentality where as long as
something appears on the correct port or with the correct header, it’s OK to let
everything that follows in or out.
The projection bias goes beyond humans and extends to other species as well. The
staphylinid beetle generates allomones (something like bug pheromones) specific to
ants and sprays its chemical passport at soldier ant guards. The guards now believe
that the beetle is an ant larva and carefully carry it into the colony. Once it’s past the
perimeter everyone simply assumes that the beetle is “one of us”, and the beetle-
shaped “larva” is free to eat other larva-shaped larvae without anyone interfering.
Geeks vs. Humans
Unlike the hapless phishing-study subject, techies are accustomed to living so far off
the end of the bell curve that they can’t see that entering their password to see if it’s
accepted is a perfectly sensible site-legitimacy test for many users. Dismissing this
issue with the comment that “well, computer programmers are just weird” (or at least
have thought processes that are very different from those of the typical user) provides
more than just a catchy sound bite though. If you look at the Myers-Briggs type
indicator profile (MBTI, a widely-used psychometric for personality types [Note 4]), you'll
find that a large proportion of programmers have
TJ traits (other traits in the profile
like introvert vs. extrovert aren’t relevant to this discussion). For example research
has shown that SF personality types obtain less than half the score of the
opposing-personality NT types in code reviewing (debugging) tasks [104]. NT types are
thought of as being “logical and ingenious”, with a particular aptitude for solving
problems within their field of expertise.
This same strong personality-type bias applies to the field of computer security as
well, with security exhibiting a predominance of INTJ types [105]. This means that
security people (and programmers in general) tend to have long attention spans,
construct mental models in order to make sense of things, take time to order and
process information before acting on it, and make decisions with their heads rather
than their hearts [106][107][108]. Why does this make programmers weird? Because
only 7% of the population have this particular personality profile [Note 5]. In other words
93% of the users of the software that they’re creating have minds that handle the
software very differently from the way that its creators do.
Here’s another instance of the difference between geeks and normal humans, using as
an example vendors’ exploitation of the recognition heuristic via the mechanism of
brand recognition [109]. Someone walks into a consumer electronics store wanting to
buy a DVD player and sees a Philips model and a Kamakuza model. Non-geeks will
simply take the one that they recognise (applying recognition-primed decision
making), unless there’s some significant differentiating factor like one being half the
price of the other (although in many cases the fact that the recognised brand is twice
the price of the other one will only reinforce the desire to take the brand-name
model). Geeks will look at the Kamakuza model and notice that it supports DivX,
XviD, WMA, and Ogg Vorbis and has an external USB input and SD card slot for
playing alternative media (applying the economic decision-making model), and buy
the Kamakuza player. For both sides it's a perfectly natural, sensible way to make a
decision, and yet they've come to completely opposite conclusions over what to buy.

[Note 4] Strictly speaking MBTI classifies personalities by Jungian personality types rather than being a
"personality test". In particular just because many geeks have MBTI trait X doesn't mean that anyone who has
trait X makes a good geek.

[Note 5] When I was a student, a sociologist gave one of the Computer Science years an MBTI test. The results
were a singularity way off the end of the bell curve. I have no idea what she did with the results, although I'm
sure the term "anomalous" appeared in her report.
If you’re still not convinced, here’s a third example of the difference between geek
and standard human mental processes, this time going back to the field of
psychology. Consider the following problem:
All of Anne’s children are blond.
Does it follow that some of Anne’s children are blond?
Is this statement, called a subalternation in Aristotelian logic, true or false (if you're a
logician, assume in addition that the set of Anne’s children is nonempty)? Most
geeks would agree that the inference from “all A are B” to “some A are B” is valid.
However, 70% of the general public consider it false [110], and this result is
consistent across a range of different cultures and with various re-phrasings of the
problem (in the experimenters’ jargon, the result is robust) [111][112][113]. The
reasons why people think this way are rather complex [114] and mostly
incomprehensible to geeks, for whom it’s a perfectly simple logical problem with a
straightforward solution.
User Conditioning
User conditioning into the adoption of bad habits presents a somewhat difficult
problem. Psychologists have performed numerous studies over the years that
examine people’s behaviour once they’ve become habituated into a particular type of
behaviour and found that, once acquired, an (incorrect) click, whirr response is
extremely difficult to change, with users resisting attempts to change their behaviour
even in the face of overwhelming evidence that what they’re doing is wrong [115].
Gestalt psychologists called this phenomenon “Einstellung”, which can be translated
as “set” or “fixity” but is better described using the more recent terminology of an
inability to think outside the box. Any number of brain-teaser puzzles take advantage
of the human tendency to become locked into a certain Einstellung/set from which
it’s very hard to break free. For example one party game that exploits this involves
having the organiser place a blanket or sheet over a participant and telling them that
they have something that they don’t need and should hand over to the organiser. The
participants typically hand over all manner of items (including, if it’s that sort of
party, their clothing) before they realise that the unneeded item is the blanket that’s
covering them — their Einstellung has blinded them to seeing this, since the blanket
functions as a cover and thus doesn’t come into consideration as a discardable item.
Software vendors have in the past tried to work around users’ Einstellung, the
tendency to stick with whatever works (even if it works really badly) by trying to
“cure” them of the habit with techniques like a tip-of-the-day popup and, notoriously,
the MS Office paperclip, but the success of these approaches has been mixed at best.
Once they adopt a particular belief, people are remarkably reluctant to change it even
in the face of strong disconfirmatory evidence. In one of the first studies into this
phenomenon, experimenters asked students to try and distinguish between fake and
authentic suicide notes as part of a “problem-solving exercise”. They then
“evaluated” the students and told them that their performance was below average,
average, or above average. Finally, they informed them that the ratings that they’d
been given were in fact entirely random and showed them the planning paperwork for
the experiment indicating how it was to play out and who’d be given what random
feedback.
Despite all of this evidence that their performance ratings were entirely arbitrary,
students continued to rate themselves as below average, average, or above average
even though they’d been told that the evidence they were basing this on was
completely fictitious [116].
The likely cause of this phenomenon is the fact that receiving a particular type of
feedback creates a search for further confirmatory evidence to support it. For
example if a subject is told that they’ve done well at the note-interpretation task then
they might recall that they’d also done well in a psychology paper that they took,
made friends easily, were empathic to the needs of others, and so on. Conversely,
someone getting feedback that they’d done poorly might recall that they’d had
particular problems with some aspects of their psychology paper, often felt lonely or
isolated, and had difficulty interpreting others’ feelings [117].
This is a variation of the Barnum effect, from P.T.Barnum’s other famous saying
“we’ve got something for everyone” (this is also known more formally as the
subjective validation effect). In the Barnum effect, people will take a generalisation
and interpret it as applying specifically to them. Generations of psychics, tarot
readers, and crystal-ball gazers have exploited the Barnum effect to their financial
and sometimes social advantage. In one experiment carried out with a professional
palm-reader, the experimenters asked him to tell his customers the exact opposite of
what his readings were indicating. His customers were equally satisfied with the
accuracy of either the literal outcome of the readings or their exact opposite [118]
(experimental psychologists like to mess with people’s minds).
The Barnum effect is also known as the Forer effect after the psychologist Bertram
Forer, who carried out an experiment in which he presented his students with
personality analyses assembled from horoscopes and asked them to rate the analyses
on a five-point Likert scale, with a score of five representing the best possible match
for their personality. The average accuracy score assigned by the subjects was 4.26.
The students had all been given exactly the same, very generic “personality analysis”
[119].
There are a huge number of variations of experiments that have been carried out to
investigate the Barnum/Forer effect [120], presumably because it’s so much fun to
take a poke at common fallacies and present the results. For example in one
experiment subjects were given a political statement and told that it was written by
either Lenin or Thomas Jefferson. Those who were told that it was by Jefferson
recalled it as advocating political debate. Those who were told that it was by Lenin
recalled it as advocating violent revolution [121].
The Barnum effect is in many cases so obvious that it’s even entered popular folklore.
One common example is the number of people who find it almost impossible to read
about a new medicine or therapy without immediately discovering that they suffer
from a large number of the symptoms of whatever it’s supposed to cure and require
immediate treatment, something that was already providing material for humorists in
the late 1800s [122].
A standard theme in the psychological literature is the recognition that humans are
primarily pattern recognisers (click, whirr) rather than analytical problem solvers, and
will attempt to solve any problem by repeatedly applying context-specific pattern
recognition to find a solution before they fall back to tedious analysis and
optimisation. The psychological model for this process is the generic error modelling
system (GEMS), in which someone faced with a problem to solve first tries repeated
applications of a rule-based approach (“if ( situation ) then ( action )”) before falling
back to a knowledge-based approach that requires analysing the problem space and
formulating an appropriate course of action. This fallback step is only performed
with extreme reluctance, with the user trying higher and higher levels of abstraction
in order to try and find some rule that fits before finally giving up and dropping back
to a knowledge-based approach [123].
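In rough pseudocode form (the rules here are hypothetical examples, not the GEMS
authors' formulation), the process looks something like this: try increasingly
abstract rule matches first, and only analyse the situation from first principles
when no rule fits:

    # Illustrative rule-based-first, knowledge-based-last decision process.
    RULES = [
        ({"dialog": "certificate warning", "site": "my bank"}, "click OK"),
        ({"dialog": "certificate warning"}, "click OK"),   # more abstract rule
    ]

    def gems_decide(situation):
        # Rule-based phase: take the first rule whose conditions all match,
        # trying specific rules before more abstract ones.
        for conditions, action in RULES:
            if all(situation.get(k) == v for k, v in conditions.items()):
                return action
        # Knowledge-based phase: only reached, reluctantly, when no rule fits.
        return analyse_from_first_principles(situation)

    def analyse_from_first_principles(situation):
        return "stop and actually read the dialog"

    print(gems_decide({"dialog": "certificate warning", "site": "unknown"}))  # "click OK"
    print(gems_decide({"dialog": "something never seen before"}))             # falls back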
It’s therefore important to not dismiss the click, whirr response as simply grumbling
about lazy users. It’s not even grumbling about lazy users with psychological
justification for the grumbling. This is a statement of fact, an immutable law of
nature that you can’t ignore, avoid, or “educate” users out of. Click, whirr is not the
exception, it’s the environment, and you need to design your user interface to work in
this environment if you want it to work at all.
Going beyond the genetically-acquired resistance to some types of security measures,
security policies often expect us to behave in ways that are contrary to deeply-
ingrained social conditioning. For example when we pass through a door, social
etiquette demands that we hold it open for anyone following us. Security demands
that we slam it in their face to prevent tailgating. Security policies at workplaces
often require that we behave in ways that are perceived to be, as one study of user’s
reactions puts it, “paranoid” and “anal” [124]. Above and beyond the usual
indifference to security measures, security requirements that conflict with social
norms can meet with active resistance from users, who are unlikely to want to aspire
to an image of being an anal-retentive paranoid.
This fact was emphasised by results in the study which found that the sharing of
passwords (or more generally logon credentials) was seen as a sign of trust among co-
workers, and people who didn’t allow others to use their password were seen as
having something to hide and not being team players. Two-factor authentication
tokens make this even worse because while giving someone access to a password-
protected resource typically entails having the password owner log on for you, with a
two-factor authentication token it’s easier to just hand over the token to the requestor
on the understanding that they’ll return it in good time.
Security and Conditioned Users
Microsoft has encountered habituation problems in its automatic security update
system for Windows, which automatically downloads security updates without
requiring the process to be initiated by the user, since most users never bother to do
so. However, before silently installing updates, Windows tells the user what’s about
to happen. Microsoft found that considerable numbers of users were simply clicking
‘Cancel’ or the window-close control whenever it popped up because all they wanted
was for the dialog to go away [125]. Once habituation had set in, this became an
automatic action for any popups that appeared. Apart from the usual problem of user
reactions to such dialogs, an extra contributing factor in this case would have been the
fact that many Windows machines are so riddled with adware popups that users
treated the security update dialog as just another piece of noise to be clicked away.
An unfortunate downside of this transformation into nagware is that when the reboot
dialog pops up, it steals the user’s input focus. If they’re in the middle of typing
something when the focus-stealing occurs, whatever they happen to be typing is used
as their response to the dialog. Since hitting the space bar is equivalent to clicking
whatever button has the input focus, there’s a good chance that being interrupted
while typing text will automatically activate whatever it is that the dialog is nagging
you about. In the case of Windows Automatic Update, the nag is about a reboot of the
machine with all your work on it. As a result users have resorted to such desperate
measures as disabling the Automatic Updates service in order to get rid of the
nagging [126].
Habituation overriding safety occurs outside the computer as well. When British Rail
introduced a safety feature to its trains in which the engine driver was required to
press a button within three seconds of passing a danger signal, after which a klaxon
sounded and then the brakes were automatically applied, they found that drivers
became habituated into pressing the button, thereby overriding the safety mechanism.
In 1989, a driver went through two successive danger signals in this manner,
eventually colliding with another train and killing five people [127].
Figure 16: Desktop noise to be clicked away
Another situation where the click, whirr response occurs is with the copy of the
Norton/Symantec security software that seems to come standard with any computer
purchased from a large vendor like Dell, Gateway, or HP (the software vendors pay
the computer companies up to US$3 per desktop icon to get their products into the
customer’s focus, helping to subsidise the cost of the computer). Since the software
is sold on a subscription basis it expires after a year leaving the computer
unprotected, doubly so because it deactivates the Windows firewall by its presence.
The results, as illustrated in Figure 16, are predictable: “a large proportion of these
[virus-infected] systems had some form of Norton AV installed, and EVERY
SINGLE ONE had a virus subscription which had lapsed. Entirely useless in
protecting those computers” [128]. Like Windows Update, the Symantec nag screen
habituates people into dismissing it without thinking, even more so because it’s
demanding time and money from the user rather than merely asking permission to
install. Although this is more a business-model issue than a security usability one,
it’s worth noting at this point that using the subscription model to sell security
software may be wonderful for the bottom line, but it’s terrible for security.
One minor aid in trying to fix this problem is to remove the window-close control on
the dialog box, providing a roadblock to muscle memory for users who have fallen
into the habit of automatically clicking close to get rid of any pop-ups (even without
this motivation, putting close boxes on dialogs counts as an interface design blooper
because it’s not clear whether clicking the close control represents an ‘OK’ or
‘Cancel’ action for the dialog). The additional step of making the dialog modal
forces the user to pay attention to it. For a while in the 90s, modal dialogs were
regarded as Evil, and so application developers went to great lengths to avoid them.
As a result, far too many applications allow users to pile up a stack of (ignored) non-
modal dialogs while ploughing ahead in an unsafe manner.
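For what it's worth, both measures amount to only a few lines of code in most GUI
toolkits. Here's a purely illustrative Tkinter sketch (not a recommendation to rely
on them, for the reasons discussed below):

    # Illustrative only: a dialog with the close control neutered and made modal.
    import tkinter as tk

    root = tk.Tk()
    dialog = tk.Toplevel(root)
    dialog.title("Security decision")
    tk.Label(dialog, text="Install the downloaded update?").pack(padx=20, pady=10)
    tk.Button(dialog, text="Install", command=dialog.destroy).pack(pady=5)

    dialog.protocol("WM_DELETE_WINDOW", lambda: None)  # ignore the close control
    dialog.transient(root)                             # keep it on top of its parent
    dialog.grab_set()                                  # modal: all input goes here
    root.mainloop()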
Unfortunately, this isn’t possible in all circumstances. For example in extensive
usability testing, Microsoft found that so many users were becoming trapped by
badly-designed wizards created by third-party vendors that they had to remove the
ability to disable the Cancel button and Close controls on wizards in order to protect
users against poorly-designed applications [129].
A better approach to the problem, used by Apple in OS X, is to launch a full-blown
application (in this case Software Update) in an attempt to garner more respect from
the user. Apple also distinguishes security updates from general software updates,
allowing users to apply only critical fixes and leave their system otherwise
untouched, since many users are reluctant to make changes for fear of “breaking
something”.
Even steps like resorting to the use of modal dialogs are hardly a proper solution. The
term “modal dialog” is geek-speak for what users call “a dialog that prevents me from
getting my work done until I get rid of it”. Like Pavlov’s dogs, users quickly learn
that clicking the close or cancel button allows them to continue doing what they want
to do. Every time you ask a user to make a choice that they don’t care about, you’ve
failed them in your interface design. Designing your application to minimise or even
avoid the use of question/confirmation dialogs (modal or non-modal) is far better than
trying to come up with ways of coercing users into paying attention to the problem
presented in the dialog.
If you redesign your application to get rid of unnecessary warning dialogs, you need
to be careful how the replacement functionality works. For example the Firefox
browser developers (and as a follow-on effect some developers of Firefox extensions)
made a conscious effort to deprecate warning dialogs in place of notification ribbons
that appear at the top or bottom border of the window to inform users that the browser
or extension has blocked some potentially malicious action. Unfortunately the
implementation of the ribbon sometimes fails to follow through on the optimised
design since it merely provides a shortcut to the usual dialog-based interface. For
example the ribbon that Firefox displays when it blocks a popup or prevents the
installation of an extension leads to the full edit-site-permissions dialog in the
browser’s options menu. As a result, if the user wants to allow a one-off install of a
component, they have to add the site as a trusted site, add the component, navigate
down through the browser menus to the trusted-site dialog (which they may not even
know exists, since it’s only presented in response to clicking on the ribbon),
remember which site they’ve just added, and remove it again.
Apart from being a pain to do (users will invariably leave a site permanently in the
trusted-sites list rather than go through the rigmarole of removing it again), this also
leads to a race-condition attack in which a site installs a harmless extension and then,
in the time it takes to turn site installs off again, installs a more malicious one.
Alternatively, a malicious site can simply rely on the fact that for most users it’ll be
too much bother to remove the site again once they’ve added it, leaving them open to
future malicious content from the site. A better approach would have been to allow
the site’s action on a one-off basis for just that action and no other, something that’s
already done by some of the many threat-blocking plugins that exist for Firefox.
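A sketch of the one-off approach (a hypothetical permission store, not Firefox's
actual code): the grant is consumed by the single action that it authorises, so
nothing lingers in the trusted-sites list afterwards:

    # Hypothetical one-off vs. permanent permission handling.
    class PermissionStore:
        def __init__(self):
            self.permanent = set()
            self.one_off = set()

        def allow_once(self, site):
            self.one_off.add(site)

        def allow_always(self, site):
            self.permanent.add(site)

        def may_install(self, site):
            if site in self.permanent:
                return True
            if site in self.one_off:
                self.one_off.discard(site)   # grant is used up by this single action
                return True
            return False

    perms = PermissionStore()
    perms.allow_once("extensions.example.org")
    print(perms.may_install("extensions.example.org"))  # True, one install allowed
    print(perms.may_install("extensions.example.org"))  # False, no lingering trust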
Security and Rationality
As psychologist James Alcock reminds us, our brains evolved to increase our chances
for survival and reproduction, not to automatically seek the truth [130][131]. Quick
and dirty techniques that more or less work serve evolutionary goals better than
purely rational ones that require more time and effort [132].
As a result humans have become very good at rationalising away inconsistencies, and
in general at making up explanations for almost anything. In one experiment subjects
were given a canned generic biography of a person and then told some fact about
them such as “he committed suicide”, “he became a politician”, “he joined the Peace
Corps”, or “he joined the Navy”. In every case they were able to explain the fact via
some item in the short bio, often using the same item to explain diametrically
opposite “facts” [133]. As with the earlier suicide-note interpretation experiment,
when they were later told that the information that they’d based their belief on was
pure fiction, they still drew inferences based on the false information!
In other words people will concoct arbitrary (but plausible) explanations for things
and then continue to believe them even when they’re told that the original
information is pure fiction. The need to maintain self-consistency even in the face of
evidence to the contrary is a known psychological phenomenon that’s received fairly
extensive study, and is another of the psychological self-defence mechanisms that
allows us to function (although it unfortunately plays right into the hands of those
who have learned to exploit it).
An example of how people can rationalise even totally random, unconnected events
was demonstrated in an experiment carried out at the University of Strathclyde in
which researchers created “inexplicable” situations by combining descriptions of
totally unrelated events like “Kenneth made his way to a shop that sold TV sets.
Celia had recently had her ears pierced”. Participants had ten seconds to create
plausible scenarios for these unrelated, totally random events, and managed to do so
in 71% of cases. When the sentences were slightly modified to contain a common
referent (so the previous example would become “Celia made her way to a shop that
sold TV sets. She had recently had her ears pierced”), this increased to 86%, with
typical explanations such as one that Celia was wearing new earrings and wanted to
see herself on closed-circuit TV or that she had won a bet by having her ears pierced
and was going to spend the money on a new TV set [134].
Nature may abhor a vacuum, but humans abhor disorder, and will quite readily see
apparent order in random patterns. The remarkable ability of humans to not only see
(apparent) order but to persist in this belief even when presented with proof that what
they’re seeing is just random noise has been illustrated by Cornell and Stanford
researchers in their study of “hot hands” in basketball. A “hot hand” is the belief that
basketball players, after making a good shot, will be able to stretch this performance
out to a series of further good shots, and conversely that after they’ve “gone cold”
they’ll have problems making their next few shots.
Based on a detailed analysis of players’ shooting records, they showed that hits and
misses in shots were more or less random, with no evidence of “hot hands” or any
other phenomenon [135][136]. What was remarkable about this study wasn’t so
much the actual results but people’s reactions to being presented with them. Their
initial response was that their beliefs were valid and the data wasn’t. Since the data
was the team’s own shooting records, the next explanation was that the researchers
idea of a “hot hand” was different from theirs. The authors of the study had however
accounted for this possibility by interviewing a large number of basketball fans
beforehand to ensure that their work reflected the consensus from fans on what
constituted a “hot hand”.
The next attempt was to claim that other factors were at play, for example that the hot
hand was being masked by phenomena that worked in the opposite direction (exactly
how humans were supposed to be able to see through these masking phenomena
when careful statistical analysis couldn’t was never explained).
The researchers ran further experiments to test the various objections and again found
that none were valid. After exploring all possible objections to their results, the
researchers took them to professional basketball coaches… who dismissed them out
of hand. For example the Boston Celtics coach’s assessment of the results was “Who
is this guy? So he makes a study. I couldn’t care less” [137]. No matter how much
disconfirmatory evidence was presented, people persisted in seeing order where there
was none. Imposing patterns and meaning on everything around us may be useful in
predicting important events in the social world [138] but it’s at best misleading and at
worst dangerous when we start imagining links between events that are actually
independent.
The ability to mentally create order out of chaos (which is quite contrary to what
humans do physically, particularly the “children” subclass of humans) is a known
cognitive bias, in this case something called the clustering illusion. The term
“clustering illusion” actually comes from statistics and describes the phenomenon
whereby random distributions appear to have too many clusters of consecutive
outcomes of the same type [139]. You can see this yourself by flipping a coin twenty
times and recording the outcome. Can you see any unusual-looking runs of heads or
tails? Unless you’ve got a very unusual coin (or flipping technique, something that a
number of stage magicians have managed to master), what you’re seeing is purely
random data. In any series of twenty coin flips, there’s a 50/50 chance of getting four
consecutive heads or tails in a row, a 25% chance of five in a row, and a 10% chance
of six in a row. There’s no need to invent a “hot hand” (or coin) phenomenon to
explain this, it’s just random chance. (As part of their experiment the researchers
showed basketball fans a series of such random coin flips and told them that they
represented a player’s shooting record. The majority of the fans indicated that this
was proof of the existence of hot hands).
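The coin-flipping exercise is also easy to simulate (a few illustrative lines only):
runs of four or more heads turn up in close to half of all twenty-flip sequences, no
hot hand or loaded coin required:

    # How often does a fair 20-flip sequence contain a run of 4+ heads?
    import random

    def longest_head_run(flips):
        best = run = 0
        for f in flips:
            run = run + 1 if f == "H" else 0
            best = max(best, run)
        return best

    random.seed(1)
    trials = 10_000
    hits = sum(longest_head_run(random.choices("HT", k=20)) >= 4
               for _ in range(trials))
    print(f"sequences with a run of 4+ heads: {hits / trials:.2f}")   # roughly 0.5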
If you don’t have a coin handy, here’s a simple gedanken experiment that you can
perform to illustrate a variation of this phenomenon called the gambler’s fallacy. The
gambler’s fallacy is the belief that a run of bad luck must be balanced out at some
point by an equivalent run of good luck [140][141]. Consider a series of coin flips
(say five), of which every single one has come up heads. The flips are performed
with (to use a statistical term) a fair coin, meaning that there’s an exact 50/50 chance
of it coming up heads or tails. After five heads in a row, there should be a higher
probability of tails appearing in order to even things up.
This is how most people think, and it’s known as the gambler’s fallacy. Since it’s a
fair (unbiased) coin, the chance of getting heads on the next flip is still exactly 50/50,
no matter how many heads (or tails) it’s preceded by. If you think about it
emotionally, it’s clear that after a series of heads there should be a much higher
chance of getting tails. If you stop and reason through it rationally, it’s just as clear
that there’s no more chance of getting heads than tails. Depending on which
approach you take, it’s possible to flip-flop between the two points of view, seeing
first one and then the other alternative as the obvious answer.
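A quick simulation (fair coin, purely illustrative) confirms the rational view:
conditioned on five heads having just occurred, the next flip still comes up heads
about half the time:

    # P(heads | the previous five flips were all heads), estimated by simulation.
    import random

    random.seed(2)
    flips = [random.random() < 0.5 for _ in range(1_000_000)]    # True means heads
    after_streak = [flips[i] for i in range(5, len(flips))
                    if all(flips[i - 5:i])]                      # five heads just occurred
    print(f"P(heads | five heads in a row) ~ {sum(after_streak) / len(after_streak):.3f}")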
The gambler’s fallacy is even more evident in the stock market, and provides an
ongoing source of material (and amusement) for psychologists. A huge number of
studies, far too many to go through here, have explored this effect in more detail
(applied psychology professor Keith Stanovich provides a good survey [89]), but
here’s a quick example of how you can turn this to your advantage.
There are large numbers of stock-market prediction newsletters that get sent out each
week or month, some free but most requiring payment for the stock tips (or at least
analysis) that they contain. In order to derive maximum revenue with minimum
effort, you need to convince people that your predictions are accurate and worth
paying for. The easiest way to do this is to buy a list of day traders from a spam
broker and spam out (say) 200,000 stock predictions, with half predicting that a
particular stock will rise and the other half predicting that it’ll fall. At the end of the
week (or whatever the prediction period is), send out your second newsletter to the
half of the 100,000 traders for which your prediction was accurate, again predicting a
rise for half and a fall for the other half. Once that’s done, repeat again for the 50,000
for which your prediction was accurate, and then for the next 25,000, and then for the
next 12,500. At this point you’ll have about 6,000 traders for which you’ve just made
five totally accurate, flawless stock predictions in a row.
Now charge them all $1,000 to read your next prediction.
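The arithmetic behind this is nothing more than repeated halving, as a few lines
show (the starting figure is the hypothetical 200,000 used above):

    # Each round, keep only the half that received the correct prediction.
    recipients = 200_000
    for week in range(1, 6):
        recipients //= 2
        print(f"after prediction {week}: {recipients} recipients with a perfect record")
    # after prediction 5: 6250 recipients -- the "about 6,000 traders" above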
It’s not just the (coincidental) “winners” in this process that this will work on.
There’s a bunch of psychological biases like confirmation bias (“he was right all the
other times, it must have been just a fluke”) and the endowment effect/sunk cost
fallacy (“we’ve come this far, no turning back now”) that will ensure that even the
losers will keep coming back for more, at least up to a point. In terms of return on
investment it’s a great way to make a return from the stock market for very little
money down, you’re merely offering no-strings-attached advice so it’s not fraud
(although you’d have to come up with some better method of drawing punters than
spamming them), and the people taking your advice probably aren’t that much worse
off than with any other prediction method they might have chosen to use.
Some cultures have evolved complex rituals that act to avoid problems like the
gambler’s fallacy. For example the Kantu farmers in Kalimantan, Borneo, use a
complex system of bird omens to select a location for a new garden. Since the
outcome of these omens is effectively random, it acts to diversify the crop and garden
types across members of the community, and provides some immunity against
periodic flooding by ensuring that garden locations aren’t fixed, for example because
“this location hasn’t flooded in the past five years”, or “this location flooded last year,
so it won’t be hit again this year” [142]. Even if the bird-omen selected site does get
flooded out that year, the amazing human ability to rationalise away anything means
that it’ll be blamed on the fact that the omen was read incorrectly and not because the
bird-omen system itself doesn’t work.
Rationalising Away Security Problems
The consequences of the human ability to rationalise almost anything were
demonstrated in a phishing study which found that users were able to explain away
almost any kind of site-misdirection with reasons like www.ssl-yahoo.com being a
“subdirectory” of Yahoo!, sign.travelocity.com.zaga-zaga.us being an
outsourcing site for travelocity.com, the company running the site having to
register a different name from its brand because the name was already in use by
someone else, other sites using IP addresses instead of domain names so this IP-
address-only site must be OK, other sites using redirection to a different site so this
one must be OK, and other similar rationalisations, many taken from real-world
experience with legitimate sites [143].
An extreme example of the ability to rationalise anything was demonstrated in
various experiments carried out on medical patients who had had the physical
connection between their brain hemispheres severed in order to treat severe epileptic
attacks. After undergoing this procedure (in medical terms a corpus callosotomy, in
layman's terms a split brain), the left brain hemisphere was able to rationalise away
behaviour initiated by the right hemisphere even though it had no idea what the other
half of the brain was doing or why it was doing it [144] (although there has been
claimed evidence of limited communication between the brain halves via subcortical
connections, this seems to cover mostly processing of sensory input rather than higher
cognitive functions [145]).
In another famous (at least among psychologists) experiment a split-brain patient had
a picture of a snowy scene shown to the left eye (for which information is processed
by the right brain hemisphere) and a chicken claw to the right eye (for which
information is processed by the left brain hemisphere). He was then asked to pick out
matching pictures from a selection, and with his left hand chose a shovel (for the
snow) and with his right a chicken. When he was asked to explain the choice (speech
is controlled by the left brain hemisphere) he responded “Oh that’s simple, the
chicken claw goes with the chicken and you need a shovel to clean out the chicken
shed” [146].
This is a specialised instance of a phenomenon called illusory correlation in which
people believe that two variables are statistically related even though there’s no real
connection. In one early experiment into the phenomenon, researchers took pictures
drawn by people in a psychiatric institution and matched them at random to various
psychiatric symptoms. Subjects who examined the randomly-labelled pictures
reported all manner of special features in the various drawings that were indicative of
the symptom [147].
This remarkable ability to fill in the gaps isn’t limited to cognitive processes but
occurs in a variety of other human processes [148]. One of these corrects the
problem that the human retina is wired up back-to-front, with the nerves and blood
vessels being in front of the light-sensitive cells that register an image rather than
behind them. Not only does this mess up the image quality, but it also leads to a
blind spot in the eye at the point where the nerves have to pass through the retina.
However, in practice we never notice the blind spot because our brains invent
something from the surrounding image details and use it to fill in the gap.
(If you want to annoy intelligent design advocates, you can mention to them that this
flaw in the human visual system doesn’t occur in cephalopods (creatures like octopi
and squid) in which the photoreceptors are located in the inner portion of the eye and
the optic nerves are located in the outer portion of the retina, meaning that either the
designer made a mistake or that humans aren’t the highest design form (an alternative
candidate to cephalopods would be birds, who have a structure called a pecten oculi
that eliminates most blood vessels from the retina, giving them the sharpest eyesight
of all. In either case though it’s not humans who have the best-designed visual
system)).
Another bugfix for flaws in the human vision system, bearing the marvellous name
“confabulation across saccades”, occurs when the brain smoothes over jerky eye
movements called saccades that our eyes are constantly performing even when we
think they’re focused steadily on a fixed point (the usual rate is about three per
second) [149]. Even when we’re simply shifting our gaze from one object to another,
the initial movement only brings us close to the target, and the brain has to produce a
corrective saccade to get us directly on target (the human body is in fact a huge mass
of kludges. If anyone ever decodes junk DNA it’ll probably turn out to be a pile of
TODO/FIXME/BUG comments). Since we’re effectively blind during a saccade, it’s
possible (using a carefully synchronised computer setup) to change portions of a
displayed image without the user noticing it, even if they’re warned of it in advance.
This also leads to a phenomenon called chronostasis that occurs when you look at a
clock and the second hand appears to be frozen for a moment before springing into
action. What’s happening there is that the mind is compensating for the saccade that
moved your vision to the clock by back-filling with the image that it had when the
saccade was over. If the saccade occurred while the second-hand was moving, the
movement is edited out by this backfilling process, and the second-hand appears to
have remained in the same position for longer than it actually did.
A similar effect occurs with hearing in a phenomenon called phonemic restoration.
Researchers have carried out experiments where they partially obliterated a word with
a cough and then used it in various sentences. Depending on the surrounding context,
participants “heard” completely different words because their minds had filled in the
obliterated detail so smoothly that they genuinely believed that they’d heard what
their minds were telling them [150][151]. A similar effect was achieved by replacing
the obliterated word not with a plausible human-derived covering noise but simply
with loud white noise [152]. The ability to edit together a coherent sentence from its
mangled form is so powerful that you can chop a recorded sentence into 25ms or
50ms slices and reverse each slice and people will still be able to understand it
(beyond 50ms it gets a bit more tricky).
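If you want to try the slice-reversal effect for yourself, the processing involved is
trivial. Here’s a minimal NumPy sketch, assuming you’ve already loaded a mono
recording into an array along with its sample rate (the function and variable names
are just placeholders):

  import numpy as np

  def reverse_slices(samples, sample_rate, slice_ms=50):
      # Chop the recording into slice_ms-long pieces and reverse each piece,
      # leaving the order of the pieces themselves unchanged.
      slice_len = max(1, int(sample_rate * slice_ms / 1000))
      out = samples.copy()
      for start in range(0, len(out), slice_len):
          out[start:start + slice_len] = out[start:start + slice_len][::-1]
      return out

  # e.g. garbled = reverse_slices(speech, 16000, slice_ms=50) for a 16kHz
  # recording; at 25-50ms the result is still intelligible, beyond that
  # comprehension starts to fall apart.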
Another example of the mind’s ability to transparently fix up problems occurs with a
synthesised speech form called sinewave speech, which is generated by using a
formant tracker to detect the formant frequencies in normal speech and then
generating sinewaves that track the centres of these formants [153]. The first time
that you hear this type of synthesised speech, it sounds like an alien language. If you
then listen to the same message as normal speech, the brain’s speech-recognition
circuits are activated in a process known as perceptual insight, and from then on you
can understand the previously unintelligible sinewave speech. In fact no matter how
hard you try, you can no longer “unhear” what were previously unintelligible, alien
sounds. In another variant of this, it’s possible under the right stimuli of chaotic
surrounding sounds for the brain to create words and even phrases that aren’t actually
there as it tries to extract meaning from the surrounding cacophony [154].
As with a number of the other phenomena discussed in this chapter, self-deception
isn’t a bug but a psychological defence mechanism that’s required in order for
humans to function [155]. Contrary to the belief, widely held at the time, that
depression and similar emotional disorders stem from irrational thinking,
experimental psychologists in the 1970s determined that depressives have a better
grasp of reality than non-depressives, a phenomenon known as depressive realism
[156][157][158]. In other words depressives suffer from a deficit in self-deception
rather than an excess. As a generalisation of this, high levels of self-deception are
strongly correlated with conventional notions of good mental health [159]. If the
self-deception is removed or undermined, various mental disorders may emerge.
Security through Rose-tinted Glasses
Emotions have a significant effect on human decision-making. Depressed people
tend to apply a fairly methodical, bottom-up data-driven strategy to problem solving,
while non-depressed people use a flexible top-down heuristic approach with a lot less
attention to detail, an effect that’s been demonstrated in numerous experiments
[160][161][162]. While negative emotions can reduce the effects of various cognitive
biases and lead to more realistic thinking, they also make it more difficult to retrieve
information relevant for solving the current problem and limit creativity [163][164].
In fact depressed people have been found to do a pretty good job of following the
decision-making process expected by SEU theory [165]. The neuropsychological
explanation for this is that increased dopamine levels (a neurotransmitter that, among
assorted other functions, is related to feelings of satisfaction and pleasure) improve
cognitive flexibility [166], and that entirely different brain structures are involved in
handling information when in a positive or negative mood [167].
When we’re in a positive mood we’re willing to take a few more risks (since we see a
benign situation that presents little danger to us) and apply new and unusual creative
solutions. Conversely, being in a negative mood triggers the primitive fight-or-flight
response (both depression and anxiety are linked to a deficiency in the
neurotransmitter serotonin), giving us a narrow focus of attention and discouraging
the use of potentially risky heuristics [168][169]. All of this doesn’t necessarily mean
that breaking up with your girlfriend just before you take your final exams is going to
turn you into a straight-A student. While the economic decision-making model may
help in solving some types of problems, it can significantly hinder in others
[170][171].
One of the portions of the brain that’s critical in the regulation of emotions is the
amygdala, a part of the primitive limbic system that’s involved in the processing of
emotions. People occasionally sustain brain injuries that affect the amygdala, either
disconnecting or otherwise disabling it so that emotions are severely curtailed. In
theory this disabling of the amygdala should lead to completely rational, logical
decision-making, a perfect execution of the economic decision model with all
emotional biases removed. In reality however people with this type of brain damage
tend to be very ineffective decision-makers because they’ve lost the emotion-driven
ability to make decisions based on what actually matters to them. In other words the
decision-makers don’t really know what they care about any more [172][173].
There’s a well-documented phenomenon in psychology in which people have an
unrealistically positive opinion of themselves, often totally unsupported by any actual
evidence. This phenomenon is sometimes known as the Lake Wobegon effect after
US humorist Garrison Keillor’s fictional community of the same name, in which “the
women are strong, the men are good-looking, and all the children are above average”.
In one survey of a million school students, all of them considered themselves above
average in terms of their ability to get along with others, sixty percent considered
themselves to be in the top ten percent, and fully a quarter considered themselves in
the top one percent [174]. Looking beyond students, people in general consider
themselves to be above-average drivers [175], more intelligent than others [176], less
prejudiced [177], more fair-minded [178], and, no doubt, better at assessing their own
performance than others.
(It’s theoretically possible to justify at least the above-average driver rating by
fiddling with statistics, for example if we’re using as our measure the average number
of accidents and the distribution is skewed to the right (a small number of bad drivers
push up the mean) then by this measure most drivers will indeed be above average,
but this is really just playing with statistics rather than getting to the root of the
problem — just because you haven’t ploughed into someone yet doesn’t make you a
good driver. In any case other factors like the IQ distribution are by definition
symmetric so it’s not possible to juggle the majority of people into the “above
average” category in this manner).
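A toy example, with made-up numbers, shows how the skew trick works:

  # Ten hypothetical drivers: nine with no accidents, one with ten.
  accidents = [0] * 9 + [10]

  mean = sum(accidents) / len(accidents)              # 1.0 accidents per driver
  below_mean = sum(1 for a in accidents if a < mean)  # 9 of the 10 drivers

  print(mean, below_mean)
  # With a right-skewed distribution, 90% of these drivers really are "above
  # average" (fewer accidents than the mean), which says more about the mean
  # than about their driving.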
This need to see ourselves and our decisions in a positive light is why people see wins
as clear wins, but then explain away losses as near-wins (“it was bad luck”/“the
quarterback got hurt during the game”/“the ground was too wet from the recent
rain”/...) rather than obvious losses. This self-delusion is perhaps generalised from a
psychological self-defence mechanism in which we take credit for our own successes,
but blame failures on others [179][180]. Students, athletes, academics, all credit
themselves for their successes, but blame failures on poor judging or refereeing
[181][182][183][184].
This again underlines the fact that (some level of) irrationality is a fundamental aspect
of human nature, and not something that you can “educate” users out of. In fact as
the psychology studies discussed above show, it can be quite detrimental to users to
suppress irrationality too far.
Mental Models of Security
Users have very poor mental models not only of security technology but also of
security threats. This is hardly surprising: they’re surrounded by myths and hoaxes
and have little means of distinguishing fact from fantasy. To give one widespread
example of this, nearly anything that goes wrong with a computer is caused by “a
virus”. If a program doesn’t run any more, if the user can’t find the file that they
saved last night, if the hard drive develops bad sectors, the cause is always the same:
“I think I’ve got a virus”. Since the commercial malware industry goes to great
lengths to make its product as undetectable as possible, if whatever’s happened is
significant enough that the user has noticed it then it’s almost certainly anything but
a virus [185].
Not only do users have a poor idea of what the threats are, they have an equally poor
idea of the value of their defences. In one nationwide survey carried out in the US,
94% of users had anti-virus software installed, but only 87% were aware of this. In
case this sounds like a great result, half of the software had never been updated since
it was installed, rendering it effectively useless [186]. In any case with a failure rate
of up to 80%, even the up-to-date software wasn’t doing much good [187].
Other software didn’t fare much better. Although three quarters of respondents had a
(software) firewall installed, only two thirds of those actually had it enabled
(unfortunately the survey didn’t check to see how many of the firewalls that had
actually been enabled were configured to allow anything that wanted in or out).
Nearly half of users had no idea what the firewall actually did, with a mere 4%
claiming to fully understand its function.
Phishing fared just as badly, with more than half of all respondents not being able to
explain what it was, and fully a quarter never having heard the term before. Consider
that when your security user interface tells users that it’s doing something in order to
protect them from phishing attacks, only a quarter of users will actually know what
it’s talking about!
The high level of disconnect between geeks and the general public is demonstrated by
awareness of topics like large-scale data breaches and the endless problems of
computer voting machines (which are covered elsewhere). Although the typical
Slashdot-reading geek is intimately familiar with an endless succession of data
breach horror stories, to the average user they simply don’t exist [188]. Conversely,
there’s a great deal of concern about online stalkers [189], something that rates way
down the geek threat scale compared to things like viruses, trojan horses, worms,
phishing, DDoS attacks, and the marketing department’s ideas on how to run a web
server.
A final problem caused by the lack of specific knowledge of the threats out there is an
almost fatalistic acceptance of the view that the hacker will always get through
[124][189]. Even here though, the threat model is unrealistic: hackers get in by
breaking “128-bit encryption”, not by using a phishing attack or exploiting a buffer
overflow in a web browser.
In addition people tend to associate primarily with others who share their beliefs and
values, so the opportunity for corrective feedback is minimised, or when it does occur
it’s quickly negated. Numerous geeks have experienced the trauma of finally
convincing family members or neighbours to engage in some form of safe computing
practice only to find that they’ve reverted back to their old ways the next time they
see them because “Ethel from next door does it this way and she’s never had a virus”.
Even concepts like “secure site” (in the context of web browsing) are hopelessly
fuzzy. While security geeks will retreat into muttering about SSL and certificate
verification and encrypted transport channels, the average computer-literate user has
about as much chance of coming up with a clear definition of “secure site” as they
have for coming up with a definition for “Web 2.0”, and for nontechnical users the
definition is mostly circular: a secure site is one where it’s safe to enter your credit
card details. Typical user comments about what constitutes a “secure site” include
“I’m under the impression that with secure websites any personal information that I
may enter is only accessible to the company that I intend to provide the information
to”, “I think it means that the information I give to the website can’t be accessed by
anyone else. I hope that’s what it means”, and “I think secure Web sites use
encryption when sending information. But I am not sure what encryption really
means, and if certain people can still intercept that information and make use of it”
[190].
It’s not that users are universally unmotivated, it’s that they’re unmotivated to comply
with security measures that they don’t understand — passwords and smart cards
provide the same function, so why use the more complex one when the simpler one
will do just as well? Most users are in fact reasonably security-conscious if they
understand the need for the security measures [191]. As the section on theoretical vs.
effective security pointed out, users need to be able to understand the need for a
security measure in order to apply it appropriately.
Consider the case of 802.11 (in)security. After a German court in Hamburg found the
owner of an open (insecure) 802.11 LAN responsible for its misuse by a third party
(this was in response to a music industry lawsuit, so the legal angle is somewhat
skewed), one user complained in a letter to a computer magazine that “My WLAN is
open [insecure] in order to make it useful. Everyone who’s used a WLAN knows this
[...] misuse of a WLAN requires a considerable amount of criminal energy against
which I can’t defend myself, even if I use encryption” [192]. The magazine editors
responded that one mouse click was all the criminal energy that it took to misuse an
open 802.11 access point. This comment, coming from a typical user, indicates that
they both have no idea how easy it is to compromise an open 802.11 LAN, and no
idea that using encryption (at least in the form of WPA, not the broken WEP) could
improve the situation. In other words they didn’t want to apply security measures
because they had no idea that they were either necessary or useful.
Figure 17: Conditioning users to become victims of phishing attacks
Problems in motivating people to think about security also occur at the service
provider side. Most US banking sites are still using completely insecure, unprotected
logins to online banking services because they want to put advertising on their home
pages (low-interest home loans, pre-approved credit cards, and so on) and using SSL
to secure them would make their pages load more slowly than their competitors’.
This practice has been widely decried by security experts for years and has even been
warned about by browser vendors [193] without having any noticeable effect on the
banks’ security practices.
Browsers will actually warn users of this problem, but since the warning pops up
whenever they enter any information of any kind into their browser, and includes an
enabled-by-default “Don’t display this warning again” setting (see the earlier
discussion of this issue), the warning is long since disabled by the time the user gets
to their banking page [194].
Even more frighteningly, US financial institutions are actively training users to
become future victims of phishing attacks through messages such as the ones shown
in Figure 17 and Figure 18 (this practice is depressingly common; these two examples
are representative of many more). More recently, they’ve come up with a
new twist on this by training users to ignore HTTPS indicators in favour of easily-
spoofed (and completely ineffective) site images, a practice covered in more detail in
the section on usability testing. This illustrates that not only end-users but also large
organisations like financial institutions completely misunderstand the nature of SSL’s
certificate-based security model and what it’s supposed to achieve. This is
particularly problematic because surveys of users have found that they are more
likely to trust banks about security than other organisations because of a belief that
banks are more concerned about this [189].
One very detailed book on phishing and phishing counter-measures even includes a
chapter with screenshots illustrating all of the ways in which financial institutions
break their own security rules [195]. Examples include Bank of America email with
clickable links leading to what looks like a spoofed phishing site (the domain is
bankofamerica1 instead of the expected bankofamerica), a Capital One
email directing users to an even phishier-sounding site
capitalone.bfi0.com, a
Network Solutions email containing a call to action to update account details (another
phishing staple), an American Express email telling readers to click on camouflaged
links (<a href="http://actualsite.com">http://expectedsite.com</a>) to
update their account information, and the
usual collection of banking sites running without SSL. The one for MoneyAccess is
particularly beautiful: It’s located at an (apparent) outsourcing site completely
unrelated to MoneyAccess, and the default login credentials that it asks for are your
credit card and social security number!
Figure 18: More user conditioning
In contrast, the use of un-secured online banking logins is almost unheard of outside
the US, where banks are more conscious of customer security. In some countries
there were concerted efforts by all banks to ensure that they had a single, consistent,
secure interface to all of their online banking services, although even there it
occasionally led to intense debate over whether security should be allowed to
override advertising potential. When you’re planning your security measures, you
should be aware of the conflicting requirements that business and politics will throw
up, often requiring solutions at the business or political rather than the technological
level.
Security at Layers 8 and 9
Users are in general unmotivated and will often choose the path of least resistance
even if they know that it’s less secure. In other words, security is very rarely the
user’s priority, and if it gets in the way they’ll avoid it. For example, students at
Dartmouth College in the US preferred using passwords on public PCs even though
far more (theoretically) secure USB tokens had been made freely available by the
college, because passwords were more convenient [196]. This aversion to the use of
crypto tokens like smart cards and USB tokens isn’t just due to user reticence though.
Usability evaluations of these devices have shown that people find them extremely
difficult to use, with smart card users taking more than twice as long as USB token
users to send a sample set of emails, creating
seven times the volume of tech support
calls, and making more than twice the number of errors in the process. A breakdown
of the problems encountered indicates that they’re mostly due to the poor usability of
the card and reader combination. Users accidentally unplugged the reader, inserted
cards upside-down, inserted them only partially (69% of errors were due to some
form of card insertion problem), and so on. Approximately half of all these errors
resulted in calls to tech support for help. In the final user evaluation after the trial had
been concluded, depicted in the pie chart in Figure 19, not one participant said that
they’d want to use a smart card security token [197].
Figure 19: Pie chart of smart card usability evaluation results (“Would you use a
smart card as an Internet security token?”: No, 100%)
In addition to the actual usage problems, the user tests had to make use of devices that
had been pre-installed and tested by IT administrators, since requiring the users to
install the readers themselves would quite likely have halted the testing at that point
for many participants — one driver evaluation test across a range of device vendors
found drivers that disabled laptop power management, stalled the PC while waiting
for USB activity or alternatively stalled the PC until some timeout value (30s or 45s)
was exceeded, disabled other USB and/or serial devices on the system, performed
constant CPU-intensive device polling that drained laptop batteries, and so on [198].
While requiring that users install the devices themselves couldn’t have lowered the
final score (since it was already at 0%), it would probably have prevented much of
the user evaluation from even taking place.
In hindsight this result seems rather obvious. When smart cards were first proposed
in the 1960s, the idea of putting a computer into a credit card was straight from
science fiction (the microprocessor hadn’t even been invented at the time). Apart
from a vague plan to use them to replace mag-stripe cards, no-one really thought
about what you’d do with the cards when it became possible to realise them. When
they did finally turn up, people were presented with something rather less capable
than the most primitive 1970s kids’ home computer: no display capabilities, no input
capabilities, no onboard power or clock, in fact nothing that you’d need for an
effective security token. So while smart cards have found niche markets in prepay
micropayment application areas like prepay phones, bus fares, and pay TV, they
don’t provide any usable, or even useful solution to common computer security
problems.
The smart card result can be generalised to state that users dislike being forced to use
a particular interface, with one Gartner group survey finding that although users
claimed that they wanted more security, when it came down to it they really wanted
to stick with passwords rather than going to more (theoretically) secure solutions like
smart cards and RSA keys [199]. The usability problems of this type of token have
led to some ingenious efforts to make them more convenient for everyday use. The
SecurID fob camera, in which a user got tired of having to have a SecurID token on
him and “solved” the usability problem by placing it under a webcam that he could
access via any web browser, is one such example [200]. More recently this approach
has been extended with OCR software, completely removing the human from the
loop [201]. Extending this a step further, the ultimate user-friendly authentication
token would be one with a built-in web server that allows its output to be
automatically retrieved by anything needing authentication information. Although
this is meant as a tongue-in-cheek observation, this is more or less the approach used
by single-sign-on initiatives like OpenID, with security consequences that have
already been discussed.
User Involvement
An earlier section warned of the dangers of requiring users to make decisions about
things that they don’t care about. Users won’t pay much attention to a security user
interface feature unless it’s a part of the critical action sequence, or in plain English
an integral part of what they’re doing. This is why almost no-one bothers to check
site certificates on web sites, because it’s not an essential part of what they’re trying
to accomplish, which is to use the web site. If a user is trying to perform task A and
an unexpected dialog box B pops up (Figure 20), they aren’t going to stop and
carefully consider B. Instead, they’re going to find the quickest way to get rid of B so
that they can get back to doing their intended task A (Figure 21).
Figure 20: What the developers wrote
This is reflected in studies of the effectiveness of security user interface elements in
web browsers. One such study, carried out on a cross-section of university-educated
computer users who were aware in advance (via the study’s consent form) that they
were taking part in a security usability evaluation, found that all of the standard browser
security indicators were incapable of effectively communicating the security states to
the user: 65% ignored the padlock, 59% paid no attention to the
https:// in the
address bar, 77% didn’t notice the Firefox address bar SSL indicator (and of the few
who did notice it, only two users actually understood its significance), and when
presented with an invalid-certificate warning dialog, 68% immediately clicked ‘OK’
without reading the dialog. Of the total number of users in the study, just one single
user was able to explain what they’d done when they clicked on the dialog [98].
This phenomenon isn’t confined just to browser security indicators. In real life as in
online life, few people ever read warnings even though they may claim that they do.
One study of warning labels placed conspicuously on (potentially) dangerous
products found that although 97% of subjects claimed to have read the warning
labels, only 23% actually did [202].
Figure 21: What the user sees
Another study into the effectiveness of browser security mechanisms found that not a
single user checked the certificate when deciding whether a site was secure or not
[203]. Other studies that examined users’ abilities to detect non-SSL-protected or
spoofed sites found similar results: browser security indicators simply don’t work,
with one study of experienced computer users finding that only 18% of them could
correctly identify an unprotected (no SSL vs. SSL-protected) site, concluding that
“current browser indicators are not sufficient for security” [47].
A typical user’s comment on the browser security indicators is found in an online
discussion of certificate security: “I am an end user and I don’t know what any of the
stuff that comes up in the boxes means. I knew about the lock meaning it’s supposed
to be secure, but I didn’t realize how easy it was get that [buy or self-sign a
certificate]. Also, I hadn’t realized that the address bar changing color has to do with
secure sites” [204]. Even hardcore geeks can’t figure it out. As programmer and
human factors specialist Jeff Atwood puts it, “none of that makes any sense to me,
and I’m a programmer. Imagine the poor end user trying to make heads or tails of
this” [205].
There may be an even deeper problem underlying all of this: many users, including
ones who appear to be quite well-informed about technology like HTTP and
SSL/TLS, seem to be unaware that SSL/TLS provides server authentication (!!)
[190]. So it’s not just that people don’t notice security indicators like the padlock or
don’t know what they signify, they aren’t even aware of what the underlying security
mechanism does! As the authors of the survey that revealed this conclude, “the
evidence suggests that respondents were unaware of the benefits (or importance) of
server authentication in communicating with secure sites, including many
respondents who demonstrated detailed technical knowledge of at least some aspects
of the SSL/TLS protocol”.
An extreme example of the click, whirr response occurs with EULAs (End-User
License Agreements for software), which no-one ever reads because all that they do
is stall progress when setting up an application. Usability researchers have performed
an experiment in which they expended considerable effort to make the EULA easier
to read, but found that this didn’t help because users still didn’t read it [206].
Spyware and malware developers take advantage of this fact when they install their
malware on a PC with the user’s “permission”. Probably the best approach to this
problem is the EULAlyzer, a tool that scans EULAs for trigger words and phrases
and alerts the user if any are present [207]. The fact that EULAs have become an
arms race between vendors’ lawyers and users is an indication of just how
dysfunctional this mechanism really is.
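The basic idea is simple enough that a few lines of code capture it. The sketch below
is not EULAlyzer’s actual algorithm or phrase list, just an illustration of flagging
trigger phrases in licence text:

  import re

  # Illustrative trigger phrases only; a real tool would use a much larger,
  # curated list and some scoring of the hits.
  TRIGGER_PHRASES = [
      "third party", "advertising", "collect information",
      "install additional software", "may not be uninstalled",
  ]

  def flag_eula(text):
      # Return (phrase, occurrence count) for each trigger phrase found.
      hits = []
      for phrase in TRIGGER_PHRASES:
          count = len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))
          if count:
              hits.append((phrase, count))
      return hits

  # for phrase, count in flag_eula(open("eula.txt").read()):
  #     print("review:", phrase, "appears", count, "time(s)")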
Copyright notices at the start of a videotape or DVD run into the same problem as
EULAs, with users either fast-forwarding through them on their VCRs or ignoring
them after film studios forced DVD player manufacturers to disable fast-forward
while the copyright notice was being displayed. Film enthusiasts will go so far as to
re-master DVDs just to get rid of the annoying messages that interrupt their
enjoyment of the film that they’ve bought. Users want to see a film (or run an
application), and reading a legal notice is just an impediment to doing this. For
example in the EULA study, typical user feedback was “No matter what you do,
eventually I’m going to ignore it and install the software anyway” [206]. Just how
pernicious this issue is was illustrated by Google security researcher Niels Provos,
who explained that when Google warned users about malware-infected pages when
displaying search results, 30-40% of users ignored the warning and clicked through to
the infected site (the interface was later changed to require manually cutting and
pasting the link in order to visit the site) [208].
Figure 22: I can see the dancing bunnies!
This phenomenon, illustrated in Figure 22, is known to user interface developers as
the “dancing bunnies problem”, based on an earlier observation that “given a choice
between dancing pigs and security, users will pick dancing pigs every time” [209].
The dancing bunnies problem is a phishing-specific restatement observing that users
will do whatever it takes to see the dancing bunnies that an email message is telling
them about [210]. In one phishing study, nearly half of the users who fell victim to
phishing sites said that they were concentrating on getting their job done rather than
monitoring security indicators, with several noting that although they noticed some of
the security warnings, they had to take some risks in order to get the job done [211].
Something similar happened during usability testing of a password-manager plugin
for the Firefox browser: users simply gave up trying to use the password manager
rather than looking to the documentation for help [212].
The ‘Simon Says’ Problem
Related to this issue is what usability researcher Ka-Ping Yee has called the “Simon
Says problem”. In the children’s game of the same name, users are expected to do
what a leader tells them when they precede the order with “Simon says...”, but to
change their behaviour in the absence of the “Simon says” phrase. In other words
users are expected to react to the
absence of a stimulus rather than its presence,
something that anyone who’s ever played the game can confirm is very difficult.
This problem is well-known to social psychologists, who note that it’s one of the
things that differentiate novices from experts — an expert will notice the absence of a
particular cue while a novice won’t, because they don’t know what’s supposed to
happen and therefore don’t appreciate the significance of something when it doesn’t
happen.
Psychologists have known about the inability of humans to react to the absence of
stimuli for some time. In one experiment carried out more than a quarter of a century
ago, participants were shown sets of trigrams (groups of three letters) and told that
one of them was special. After seeing 34 sets of trigrams on average, they were able
to figure out that the special feature in the trigram was that it contained the letter T.
When this condition was reversed and the special trigram lacked the letter T, no-one
was ever able to figure this out, no matter how many trigrams they saw [213]. In
other words they were totally unable to detect the absence of a certain stimulus.
Unfortunately this lack of something happening is exactly what web browsers expect
users to respond to: a tiny padlock indicates that SSL security is in effect, but the
absence of a padlock indicates that there’s a problem.
Another contributing factor towards the Simon Says problem is the fact that people
find negative information far more difficult to process than positive information
[214][215]. This problem is well known among educational psychologists, who
advise educators against using negative wording in teaching because information is
learned as a series of positively-worded truths, not a collection of non-facts and false
statements [216]. Consider the following propositional calculus problem:
If today is not Wednesday then it is not a public holiday.
Today is not a public holiday.
Is today not Wednesday? Research has shown that people find negative-information
problems like this much harder to evaluate than positive-information ones (“If today
is Wednesday ...”), and are far more likely to get it wrong. Now compare this to the
problem presented by browser security indicators, “If the padlock is not showing then
the security is not present”. This is the very problem form that psychological
research tells us is the hardest for people to deal with!
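For the record, a mechanical truth-table check (a small Python sketch, nothing more)
shows just how little the doubly-negated premises actually pin down:

  from itertools import product

  def follows(premises, conclusion):
      # True iff the conclusion holds in every truth assignment (Wednesday?,
      # holiday?) that satisfies all of the premises.
      return all(conclusion(w, h)
                 for w, h in product([True, False], repeat=2)
                 if all(p(w, h) for p in premises))

  premises = [
      lambda w, h: w or not h,  # "if today is not Wednesday then it is not a public holiday"
      lambda w, h: not h,       # "today is not a public holiday"
  ]
  print(follows(premises, lambda w, h: not w))
  # False: a Wednesday that isn't a public holiday satisfies both premises,
  # so they leave "is today not Wednesday?" undecided, which is easy to miss
  # when everything is phrased negatively.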
Contributing to the problem is the fact that the invisibly secured (via SSL) web
browser looks almost identical to the completely unsecured one, making it easy for
the user to overlook. In technical terms, the Hamming weight of the security
indicators is close to zero. This has been confirmed in numerous studies. For
example one study, which went to some trouble to be as realistic as possible by
having users use their own accounts and passwords and giving them browser security
training beforehand, reported a 100% failure rate for browser HTTPS indicators —
not one user noticed that they were missing [217]. Another example of an indicator
with insufficient Hamming weight is the small warning strip that was added to
Internet Explorer 6 SP2, with one usability test on experienced users and developers
finding that no-one had noticed its presence [218]. Another test that examined the
usability of password managers found that no-one noticed the fact that the password
manager changed the background colour of password fields to indicate that the
password had been secured [212].
Figure 23: Vista UAC dialog
Another (informal) evaluation of Windows Vista’s (much-maligned) User Account
Control (UAC) dialog, of which an example is shown in Figure 23, found that not one
user noticed that the UAC dialog title had different colours in different situations
[219], let alone knew what the different colours signified (this dialog is
another gem from the “Ken Thompson’s car” school of user interface design).
Neither the official Microsoft overview of UAC in Microsoft Technet [220], nor
popular alternative information sources like Wikipedia [221] even document (at the
time of writing) the existence of these colour differences, let alone indicating what
they mean. It requires digging deep down into an extremely long and geeky
discussion of UAC to find that a red title means that the application is blocked by
Windows Group Policy, blue/green means that it’s a Vista administrative application,
grey means that it’s an Authenticode signed application, and yellow means that it’s
not signed [222]. Bonus points if you can explain the significance of those
distinctions.
This problem isn’t confined solely to security indicators. In one user test, the status
bar on the spreadsheet application being tested would flash the message “There is a
$50 bill taped to the bottom of your chair. Take it!”. After a full day of user testing,
not one user had claimed the bill [223]. A better-known example of the phenomenon,
which has been used in a number of pop-psychology TV programs, was demonstrated
by 2004 Ig Nobel prize winners Daniel Simons and Christopher Chabris in a 1999
experiment in which test subjects were asked to observe a video of people playing
basketball in front of three elevator doors. Halfway through the video, a tall woman
carrying an umbrella or a person dressed in a gorilla suit (both obviously non-players)
walked across the scene. Only 54% of the test subjects noticed [224].
Security Indicators and Inattentional Blindness
The phenomenon whereby people are unable to perceive unexpected objects is known
as inattentional blindness. This occurs because humans have a deficit of something
called “attention”. The exact nature of attention isn’t too well understood yet [225],
but whatever it is we don’t have enough of it to go around. Since attention is needed
in order to spot change and is strongly tied to the motion signals that accompany the
change, a lack of motion and/or a swamping of the signals results in an inability to
spot the change. To see this effect in nature, think of a predator slowly and
cautiously stalking its prey and only risking drawing attention through a burst of
speed right at the end when it’s too late for the victim to do anything.
One very common situation in which this occurs is on the road, where drivers are
looking for other cars and (on some streets) pedestrians, but are unable to register the
presence of unexpected objects. Cyclists and motorbike riders were all too familiar
with this problem decades before it even had a name because they found that they
were more or less invisible to the drivers that they shared the roads with. A simple
change to a motorbike such as mounting a pair of driving lights relatively far apart on
a bike can greatly improve your “visibility” to drivers (at the expense of making your
bike look really ugly) because now you match the visual pattern “car” rather than the
invisible “not-a-car”. The first major work to explore this area concluded that “there
is no conscious perception without attention” [226]. If people aren’t specifically
looking for something like a security indicator then most of them won’t see it when it
appears.
Over several million years of human evolution, we have learned to focus our attention
on what’s important to us (things like imminent danger) and filter out irrelevant
details. Human perception therefore acts to focus us on important details and
prevents us from being distracted by irrelevant (or irrelevant-seeming) noise [227].
Over time, humans have learned to instinctively recognise obvious danger indicators
like snakes, flashing red lights, and used-car salesmen, and can react automatically to
them without having to stop and think about it. Psychologists have found that
subjects who have never even seen something like a snake before are still
instinctively afraid of it the first time that they’re shown one. Having your
application flash up a photo of a cobra about to strike probably isn’t a good idea
though.
On the other hand people pay scant attention to the lack of a padlock because it’s both
unobvious and something that’s never been associated with danger.
After all, why would a computer allow them to simply go ahead and perform a
dangerous operation? Would someone build a house in which the power was carried
by exposed copper wiring along the walls, with a little lightning-bolt icon down at
ground level to warn users of the danger of electrocution? If they did, how long
would they stay in business?
It’s not just humans that have had problems adapting to modern times. Animals like
sheep will run in a straight line in front of a car (rather than ducking to the side to
escape harm) because they know that by ducking aside they’ll be presenting their
vulnerable flank to the predator that’s chasing them. Kangaroos will actually leap
directly in front of a speeding car for the same reason, and the less said about the
maladaptive behaviour of the hedgehog, the better.
Even the more obvious indicators like the security toolbars that are available as
various forms of browser plugin have little additional value when it comes to securing
users (and that’s assuming that the toolbars themselves aren’t the source of security
holes [228]). A study of the effectiveness of a range of these toolbars on university-
educated users who had been informed in advance that they were taking part in a
phishing study (informed consent is an ethical requirement in studies on human
subjects) found that an average of 39% of users were fooled by phishing sites across
the entire range of toolbars [229]. Without this advance warning the figures would be
far worse, both because users wouldn’t specifically be on the lookout for phishing
attacks and, more importantly, because most users wouldn’t notice the toolbars and, if
they did, would have had little idea what they signified.
So is inattentional blindness merely accelerated forgetting (we register something at
some level but don’t retain it, so-called inattentional amnesia), or are we truly blind?
This is an interesting question because experiments with other types of attention have
shown that we often register things even when we’re not consciously aware of it
[230][231][232] (although in general research on unconscious perception is
somewhat controversial and so far, rather inconclusive. Note in particular that the
more outrageous claims that have been made about unconscious perception,
specifically what’s popularly known as “subliminal messages”, are pure
pseudoscience. The only connection they have with psychology is the type that the
marketers of the material are using on a gullible public). When it comes to
inattentional blindness though, we really are blind: functional magnetic resonance
imaging (fMRI) experiments have shown that when attention is occupied with
another task there’s no brain activity whatsoever arising from the new stimulus [233].
In other words we really are totally blind (or deaf, or whatever senses the stimulus
isn’t engaging) in this situation, at least as far as higher-level brain activity is
concerned.
(fMRI is a wonderful thing. If you’re a guy then the next time your SO bugs you
about playing too much Halo 3 tell her that the portions of male brains associated
with reward and addiction are more likely to be activated by video games than female
brains, and it’s really not your fault [234]. Hopefully a similar result for beer will be
forthcoming).
User Education, and Why it Doesn’t Work
Don’t rely on user education to try and solve problems with your security user
interface. More than a century ago Thomas Jefferson may have been able to state that
“if we think [people] not enlightened enough to exercise their control with
wholesome discretion, the remedy is not to take it from them, but to inform their
discretion by education” [235], but today’s computer security is simply too
complicated, and the motivation for most users to learn its intricacies too low for this
strategy to ever work. Even just the basic task of communicating the information to
the user in a meaningful manner is complex enough to fill entire books [236]. As
earlier portions of this section have pointed out, this is just not something that the
human mind is particularly well set-up to deal with.
Nobody wants to read instruction manuals, even if they’re in the form of pop-up
dialogs. Studies of real-world users have shown that they just aren’t interested in
having to figure out how an application works in order to use it. Furthermore, many
concepts in computer security are just too complex for anyone but a small subset of
hardcore geeks to understand. For example one usability study in which technology-
savvy university students were given 2-3 page explanations of PKI technology (as it
applied to SSL) found that none of them could understand it, and that was after
reading a long explanation of how it worked, a point that the typical user would never
even get to [237]. Before the PGP fans leap on this as another example of X.509’s
unusability, it should be mentioned that PGP, which a mere 10% of users could
understand, fared little better.
These results have been confirmed again and again by experiments and studies across
the globe. For example one two-year trial in Italy, which tried to carefully explain the
security principles involved to its users, received feedback like “please remove all
these comments about digital certificates etc., just write in the first page ‘protected by
128bit SSL’ as everybody else does” [238].
This lack of desire and inability to understand applies even more to something where
the benefits are as nebulous as providing security, as opposed to something concrete
like removing red-eye from a photograph. When confronted with a user interface,
people tend to scan some of the text and then click on the first reasonable option, a
technique called satisficing that allows users to find a solution that both satisfies and
suffices (this is a variation of the singular evaluation approach that we encountered
earlier). As a result, they don’t stop to try and figure out how things work, they just
muddle through [239]. The French have formalised this process under the name “le
système D”, where the D stands for “se débrouiller”, meaning “to muddle through”.
In addition to applying système D, users don’t really appear to mind how many times
they click (at least up to a point), as long as each click is an unambiguous, mindless
choice [240]. People don’t make optimal choices, they satisfice, and only resort to
reading instructions after they’ve failed at several attempts to muddle through.
Unfortunately when working with computer user interfaces we can’t employ standard
approaches to dealing with these sorts of operator errors. In standard scenarios where
errors are an issue (the canonical example being operating a nuclear reactor or an
aircraft), we can use pre-selection screening (taking into account supposedly
representative indices like school grades), applicant screening (application exams,
psychological screening, and so on), and job training (both before the user begins
their job, and continuous assessment as they work). Such processes however aren’t
possible for the majority of cases that involve computer use. In effect we’re dealing
with vast hordes of totally untrained, often totally unsuitable (by conventional
selection methods) operators of equipment whose misuse can have serious
consequences for themselves, and occasionally others.
Even such mildly palliative measures as trying to avoid making critical decisions in
the early hours of the morning (when more errors occur than at other times of the
day) [241] aren’t possible because we have no control over when users will be at their
computers. No conventional human-centred error management techniques such as
user screening and training, which have evolved over decades of industry practice,
are really applicable to computer use, because in most cases we have no control over
the users or the environment in which they’re operating.
Attackers will then take advantage of the complexity of the user interface, lack of
user understanding, and user satisficing, to sidestep security measures. For example
when users, after several years of effort, finally learned that clicking on random email
attachments was dangerous, attackers made sure that the messages appeared to come
from colleagues, friends, trading partners, or family (malware going through a user’s
address book and sending a copy of itself to all of their contacts is a standard tactic).
For example AOL reported that in 2005 six of the top ten spam subject lines fell into
this category [242], completely defeating the “Don’t click on attachments from
someone you don’t know” conditioning. In addition to this problem, a modern
electronic office simply can’t function without users clicking on attachments from
colleagues and trading partners, rendering years of user education effort mostly
useless.
A better use of the time and effort required for user education would have been to
concentrate on making the types of documents that are sent as attachments purely
passive and unable to cause any action on the destination machine. A generalisation
of this problem is that we have Turing machines everywhere — in the pursuit of
extensibility, everything from Word documents to web site URLs has been turned
into a programming language (there’s even a standards group that manages the
creation of such embedded Turing machines [243][244]). You can’t even trust
hardcopy any more, since it’s a trivial task to use the programmability of printer
languages like Postscript to have the screen display one thing (for example a payment
value of $1,000) and the printout display another ($10,000 or $100, depending on
which way you want to go) [245].
Since many of these embedded Turing machines don’t look anything like
programming languages, it’s very difficult to disable or even detect their use. A
better alternative to trying to screen them would be to only allow them to be run in a
special least-privileges context from which they couldn’t cause any damage, or a
variety of other basic security measures dating back to the 1960s and 70s. For
example most operating systems provide a means of dropping privileges, allowing the
attachment to be viewed in a context in which it’s incapable of causing any damage.
A large amount of work exists in this area, with approaches that range from
straightforward application wrappers through to system-call filtering, in-kernel access
interception and monitoring, and specialised operating system designs in which each
application (or data object) is treated as its own sub-user with its own privileges and
permissions [246].
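As a concrete (and deliberately minimal) illustration of the privilege-dropping
approach, the POSIX-only Python sketch below forks, switches to an unprivileged
account and then runs a viewer on the untrusted attachment. The account and viewer
names are placeholders, and a real sandbox would also need to restrict the filesystem,
the network, and the available system calls:

  import os
  import pwd

  def run_viewer_unprivileged(viewer_argv, username="nobody"):
      # Fork, drop root privileges to an unprivileged account, then exec the
      # viewer on the untrusted attachment.
      pw = pwd.getpwnam(username)
      pid = os.fork()
      if pid == 0:                  # child
          os.setgroups([])          # drop supplementary groups
          os.setgid(pw.pw_gid)      # group first, while we're still root
          os.setuid(pw.pw_uid)      # then give up root for good
          os.execvp(viewer_argv[0], viewer_argv)
      return os.waitpid(pid, 0)[1]  # parent: wait for the viewer to exit

  # e.g. run_viewer_unprivileged(["xpdf", "/tmp/attachment.pdf"])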
Unfortunately current practice seems to be moving in exactly the opposite direction, a
recent example being Windows Vista’s Sidebar, whose only possible security setting
for scripts is “full access” (other settings are theoretically possible but not supported),
and which serves arbitrary third-party scripts/gadgets from a Microsoft official web
site, a sure recipe for disaster once Vista becomes widespread enough for malware
authors to specifically target it.
Figure 24: A typical security dialog translated into plain language
Another reason why user education doesn’t work is that it’s often used as a catch-all
for problems that are too hard for the security application developer to solve: “If a
problem is too complicated to solve easily, we’ll make it a user education issue, and
then it’s someone else’s problem”. Any dialog that asks a question phrased
something like “There may or may not be something dangerous ahead, do you want
to continue?” is an example of an instance where the application developer has
simply given up (see Figure 24). Interaction designer Alan Cooper calls this
“uninformed consent”— all the power of the application’s security mechanisms is
now being controlled by a single user judgement call [223]. By offloading this
responsibility, the user will still fall head-first down the mine-shaft, but now it’s their
fault and not the developer’s.
HCI researchers label this use of dialogs warn-and-continue (WC), acknowledging
the fact that the majority of users will dismiss the dialog and continue anyway. The
user’s handling of such confirmation dialogs has been characterised as “Yes, yes, yes,
yes, oh dear” [247]. While dropping security decisions into a WC may satisfy the
application developer, it does little to protect the user. This “not-my-problem”
approach to handling responsibility for security decisions was illustrated in one study
into the effectiveness of browser security which found that “users expect the browser
to make such trust decisions correctly; however browser vendors do not accept this
responsibility, and expect users to make the ultimate trust decision” [47]. As a result,
no-one took responsibility for (in this case) trusting keys and certificates, since both
sides assumed that it was the other side’s problem and that they therefore didn’t have
to concern themselves with it. Psychology professor James Reason, whose specialty
is the breakdown of complex technological systems, calls such design flaws latent
pathogens, problems that aren’t discovered until the user has fallen victim to them
[248].
Another motivation for the proliferation of warning dialogs has been suggested by a
Mozilla developer, who reports them as being “a chronicle of indecision within the
walls of Netscape. Every option, confirmation window, and question to the user
marks another case where two internal camps couldn’t agree on the most secure way
to proceed and instead deferred to the user’s decision” [249]. Although developers
are usually quite capable of shooting users in the foot without outside assistance, this
degree of bureaucratic indecision can’t have helped.
Firefox developers discovered via feedback from users that the users actually saw
through this deception, recognising the warning dialogs as “intentionally obfuscated
warnings that companies can point to later and say ‘Look, we warned you!’” [249].
Since the intent of security mechanisms is to gain the user’s trust, exposing them to
what are obviously weasel-words designed to pin the blame on them seems rather
counterproductive. As Microsoft usability researcher Chris Nodder admits, “security
dialogs present dilemmas, not decisions” [250].
Attacks against the user interface are getting better and better as attackers gain more
experience in this area. As these attacks evolve, they’re tested in the world’s largest
usability testing lab (the real world), with ones that succeed being developed further
and ones that fail being dropped (compare this to general-purpose software, where
buggy and hard-to-use software often persists for years because the same
evolutionary pressures don’t exist). Usability researchers have actually found that
their work makes them much better at attacking users, because by studying security
usability they’re able to easily defeat the (often totally inadequate) security user
interface in applications. Just as spammers have employed professional linguists to
help them to get around spam filters and phishers have employed psychology
graduates to help them scam victims, so it’s only a matter of time before attackers use
user interface research against poorly-designed security applications. As one study
into the effectiveness of phishing puts it, “None of these [papers proposing security
mechanisms] consider that these indicators of trust may be spoofed and that the very
guidelines that are developed for legitimate organisations can also be adopted by
phishers” [98]. Don’t assume that some sort of user education can make a complex
user interface provide security — it’ll only work until the bad guys use its complexity
against it, or a new crop of non-educated (for that particular interface) users appears.
Only a small number of real-world evaluations of the effectiveness of user education
have been performed to date, and the outcomes have been discouraging. In one
evaluation of the effectiveness of trying to educate users about phishing, researchers
discovered that the education attempts made no difference in users’ ability to detect
phishing email. What it did do was scare them into rejecting more phishing emails,
but also rejecting proportionately more non-phishing emails (the same thing
happened in the false-web-site detection tests discussed earlier). The ratio of rejected
phishing emails to non-phishing emails was identical before and after the
“education”; the only thing that had changed was users’ fear-based rejection threshold
for any email at all [251]. While fear-based marketing has long been a staple of the
security industry (see the discussion of people’s fears of losing something in the next
section for why this is so effective), this may be the first experiment that reveals that
in some cases fear is the sole effect of trying to inform people of security issues.
These results are quickly explained by psychological research into the effectiveness
of fear-based appeals. These types of appeals have been studied extensively in the
two fields of medicine (where the work is mostly theoretical) and marketing (where
it’s applied practically with great enthusiasm). The two main requirements for an
effective fear-based appeal are that the target must be convinced that this is a serious
problem that affects them, and that they can avoid it by taking some specific action
[252][253][254][255]. While it’s not hard to convince someone that spam, viruses,
phishing, and assorted other Internet bogeymen are a very real threat, the best
palliative measure that most users are aware of is the extremely vague “Run some
anti-virus software” (and not even up-to-date anti-virus software at that: something
that came free with their Dell PC five years ago and expired four years ago is
considered fine).  So while
the fear-based appeal is half effective because it grabs the user’s attention, the lack of
any obvious ways to deal with the fear means that it manifests itself mostly through
maladaptive behaviour and inappropriate responses.
Other education attempts have fared even worse. In the EV certificate evaluation
discussed earlier, users actually performed worse after they’d been “educated”
because they were inadvertently being trained to rely on the wrong security
indicators, and as earlier discussions have pointed out, US banks have a proud
tradition of mis-educating users into insecure behaviour. Outside the direct security
context, widely-used applications like Facebook are also busy training users to do the
wrong thing security-wise [256]. Against this level of competition, security
education has little chance.
A more succinct summary of the fallacy of user education as a solution to the
problem has been offered by anti-virus researcher Vesselin Bontchev: “If user
education was going to work, it would have worked by now” [257].
References
[1] “Adaptive Thinking: Rationality in the Real World”, Gerd Gigerenzer, Oxford
University Press, 2000.
[2] “The Psychology of Decision Making (second edition)”, Lee Roy Beach and
Terry Connolly, Sage Publications, 2005.
[3] “Decision making in complex systems”, Baruch Fischhoff,
Proceedings of the NATO Advanced Study Institute on Intelligent Decision Support
in Process Environments, Springer-Verlag, 1986, p.61.
[4] “Theory of Games and Economic Behaviour”, John von Neumann and Oskar
Morgenstern, Princeton University Press, 1944.
[5] “Emotion and Reason: The Cognitive Neuroscience of Decision Making”, Alain
Berthoz, Oxford University Press, 2006.
[6] “Models of Man : Social and Rational”, Herbert Simon, John Wiley and Sons,
1957.
[7] “Reason in Human Affairs”, Herbert Simon, Stanford University Press, 1983.
[8] “Decision Analysis and Behavioural Research”, Detlof von Winterfeldt and
Ward Edwards, Cambridge University Press, 1986.
[9] “Judgement in Managerial Decision Making (4th ed)”, Max Bazerman, Wiley
and Sons, 1997.
[10] “The Fiction of Optimisation”, Gary Klein, in “Bounded Rationality: The
Adaptive Toolbox”, MIT Press, 2001, p.103.
[11] “Human Memory: An Adaptive Perspective”, John Anderson and Robert
Milson,
Psychological Review, Vol.96, No.4 (October 1989), p.703.
[12] “Bounded Rationality in Macroeconomics”, Thomas Sargent, Oxford University
Press, 1993.
[13] “Rethinking Rationality”, Gerd Gigerenzer and Reinhard Selten, in “Bounded
Rationality: The Adaptive Toolbox”, MIT Press, 2001, p.1.
[14] “Fault Trees: Sensitivity of Estimated Failure Probabilities to Problem
Representation”, Baruch Fischhoff, Paul Slovic, and Sarah Lichtenstein,
Journal
of Experimental Psychology: Human Perception and Performance
, Vol.4, No.2
(May 1978), p.330.
[15] “Recognition-primed decisions”, Gary Klein, in “Advances in Man-Machine
Systems Research 5”, JAI Press, 1989, p.47.
[16] “A recognition-primed decision (RPD) model of rapid decision making”, Gary
Klein, in “Decision making in action: Models and Methods”, Ablex Publishing,
1993, p.138.
[17] “Irrationality: The Enemy Within”, Stuart Sutherland, Penguin Books, 1992.
[18] “Sources of Power: How People Make Decisions”, Gary Klein, MIT Press,
1998.
[19] “Die Logik des Mißlingens. Strategisches Denken in komplexen Situationen”,
Dietrich Dörner, rowohlt Verlag, 2003.
[20] “When Do People Use Simple Heuristics and How Can We Tell?”, Jörg
Rieskamp and Ulrich Hoffrage, in “Simple Heuristics that Make Us Smart”,
Oxford University Press, 1999.
[21] “Is There Evidence for an Adaptive Toolbox”, Abdolkarim Sadrieh, Werner
Güth, Peter Hammerstein, Stevan Harnard, Ulrich Hoffrage, Bettina Kuon,
Bertrand Munier, Peter Todd, Massimo Warglien, and Martin Weber, in
“Bounded Rationality: The Adaptive Toolbox”, MIT Press, 2001, p.83.
[22] “The Complete Problem Solver (2nd ed)”, John Hayes, Lawrence Erlbaum, 1989.
[23] “The Ideal Problem Solver: A Guide to Improving Thinking, Learning, and
Creativity (2nd ed)”, John Bransford and Barry Stein, Worth Publishers, 1993.
[24] “Choice Under Conflict: The Dynamics of Deferred Decision”, Amos Tversky
and Eldar Shafir,
Psychological Science, Vol.3, No.6 (1992), p.358.
[25] “Contingent Weighting in Judgment and Choice”, Amos Tversky, Shmuel
Sattath, and Paul Slovic, in “Choices, Values, and Frames”, Cambridge
University Press, 2000, p.503.
[26] “The disjunction effect in choice under uncertainty”, Amos Tversky and Eldar
Shafir,
Psychological Science, Vol.3, No.5 (September 1992), p.261.
[27] “Consumer Preference for a No-Choice Option”, Ravi Dhar,
Journal of
Consumer Research
, Vol.24, No.2 (September 1997), p.215.
[28] “Elastic Justification: How Unjustifiable Factors Influence Judgments”,
Christopher Hsee,
Organizational Behavior and Human Decision Processes,
Vol.66, No.1 (1996), p.122.
[29] “Elastic Justification: How Tempting but Task-Irrelevant Factors Influence
Decisions”, Christopher Hsee,
Organizational Behavior and Human Decision
Processes
, Vol.62, No.3 (1995), p.330.
[30] “Insights about Insightful Problem Solving”, Janet Davidson, in “The
Psychology of Problem Solving”, Cambridge University Press, 2003, p.149.
[31] “Characteristics of Skilled Option Generation in Chess”, Gary Klein, S. Wolf,
Laura Militello, and Carolyn Zsambok,
Organizational Behavior and Human
Decision Processes
, Vol.62, No.1 (April 1995), p.63.
[32] “Metacognitive Aspects of Reading Comprehension: Studying Understanding in
Legal Case Analysis”, Mary Lundeberg,
Reading Research Quarterly, Vol.22,
No.4 (Autumn 1987), p.407.
[33] “Problem Solving”, Alan Lesgold, “The Psychology of Human Thought”,
Cambridge University Press, 1988, p.188.
[34] “Problem finding and teacher experience”, Michael Moore, Journal of Creative
Behavior
, Vol.24, No.1 (1990), p.39.
[35] “Motivating Self-Regulated Problem Solvers”, Barry Zimmerman and Magda
Campillo, in “The Psychology of Problem Solving”, Cambridge University
Press, 2003, p.233.
[36] “The psychology of experts: An alternative view”, James Shanteau, in
“Expertise and Decision Support”, Plenum Press, 1992, p.11.
[37] “Environmental load and the allocation of attention”, Sheldon Cohen, Advances
in Environmental Psychology: Vol I — The Urban Environment: John Wiley &
Sons, 1978, p.1.
[38] “Decision making under stress: scanning of alternatives under controllable and
uncontrollable threats”, Giora Keinan,
Journal of Personality and Social
Psychology
, Vol.52, No.3 (March 1987), p.639.
[39] “‘Information Load’ and Consumers”, Debra Scammon, Journal of Consumer
Research: An Interdisciplinary Quarterly, Vol.4, Issue 3, 1977, p.148.
[40] “On Leaping to Conclusions When Feeling Tired: Mental Fatigue Effects on
Impressional Primacy”, Donna Webster, Linda Richter, and Arie Kruglanski,
Journal of Experimental Social Psychology, Vol.32, No.2 (March 1996), p.181.
[41] “The Unsafe Sky”, William Norris, Norton, 1982.
[42] “How we Reason”, Philip Johnson-Laird, Oxford University Press, 2006.
[43] “Some experiments on the recognition of speech with one and two ears”,
E.C.Cherry,
Journal of the Acoustic Society of America, Vol.25, No.5 (May
1953), p.975.
[44] “Some Philosophical Problems from the Standpoint of Artificial Intelligence”,
John McCarthy and Patrick Hayes, in “Machine Intelligence 4”, Edinburgh
University Press, 1969, p.463.
[45] “Recognizing, Defining, and Representing Problems”, Jean Pretz, Adam Naples,
and Robert Sternberg, in “The Psychology of Problem Solving”, Cambridge
University Press, 2003, p.3.
[46] “Why we Believe what we Believe”, Andrew Newberg and Mark Waldman,
Free Press, 2006.
[47] “Security and Identification Indicators for Browsers against Spoofing and
Phishing Attacks”, Amir Herzberg and Ahmad Jbara, Cryptology ePrint
Archive,
http://eprint.iacr.org/2004/, 2004.
[48] “Heuristics and Reasoning: Making Deduction Simple”, Maxwell Roberts, in
“The Nature of Reasoning”, Cambridge University Press, 2004, p.234.
[49] “Beyond intuition and instinct blindness: toward an evolutionarily rigorous
cognitive science”, Leda Cosmides and John Tooby,
Cognition. Vol.50, No.1-3
(April-June 1994), p.41.
[50] “The Adapted Mind: Evolutionary Psychology and the Generation of Culture”,
Jerome Barkow, Leda Cosmides, and John Tooby (eds), Oxford University
Press, 1995.
[51] “Evolutionary Psychology: The New Science of the Mind”, David Buss, Allyn
and Bacon, 1998.
[52] “Human Evolutionary Psychology”, Louise Barrett, Robin Dunbar, and John
Lycett, Princeton University Press, 2002.
[53] “The Evolution of Reasoning”, Denise Dellarosa Cummins, in “The Nature of
Reasoning”, Cambridge University Press, 2004, p.273.
[54] “Oxford Handbook of Evolutionary Psychology”, Robin Dunbar and Louise
Barrett (eds), Oxford University Press, 2007.
[55] “Human Error: Cause, Prediction, and Reduction”, John Senders and Neville
Moray, Lawrence Baum Associates, 1991.
[56] “Are humans good intuitive statisticians after all? Rethinking some conclusions
from the literature on judgment under uncertainty”, Leda Cosmides and John
Tooby,
Cognition, Vol.58, No.1 (January 1996), p.1.
[57] “The Adaptive Decision Maker”, John Payne, James Bettman, and Eric Johnson,
Cambridge University Press, 1993.
[58] “Betting on One Good Reason: The Take The Best Heuristic”, Gerd Gigerenzer
and Daniel Goldstein, in “Simple Heuristics that Make Us Smart”, Oxford
University Press, 1999, p.75.
[59] “How Good are Simple Heuristics”, Jean Czerlinski, Gerd Gigerenzer, and
Daniel Goldberg, in “Simple Heuristics that Make Us Smart”, Oxford University
Press, 1999, p.97.
[60] “The Recognition Heuristic: How Ignorance Makes us Smart”, Daniel Goldstein
and Gerd Gigerenzer, in “Simple Heuristics that Make Us Smart”, Oxford
University Press, 1999, p.37.
[61] “Accuracy and frugality in a tour of environments”, Jean Czerlinski, Gerd
Gigerenzer, and Daniel Goldstein, in “Simple Heuristics That Make Us Smart”,
Gerd Gigerenzer, Peter Todd, and ABC Research Group (eds), Oxford
University Press, p.59.
[62] “Cognitive Heuristics: Reasoning the Fast and Frugal Way”, Barnaby Marsh,
Peter Todd, and Gerd Gigerenzer, in “The Nature of Reasoning”, Cambridge
University Press, 2004, p.273.
[63] “Bayesian Benchmarks for Fast and Frugal Heuristics”, Laura Martignon and
Kathryn Laskey, in “Simple Heuristics that Make Us Smart”, Oxford University
Press, 1999, p.169.
[64] “Strategies in sentential reasoning” Jean Baptiste Van der Henst, Yingrui Yang,
and Philip Johnson-Laird,
Cognitive Science, Vol.26, No.4 (July-August 2002),
p.425.
[65] “Controlled and Automatic Human Information Processing: 1. Detection,
Search, and Attention”, Walter Schneider and Richard Shiffrin,
Psychological
Review
, Vol.84, No.1 (January 1977), p.1.
[66] “Controlled & automatic processing: behavior, theory, and biological
mechanisms”, Walter Schneider and Jason Chein,
Cognitive Science, Vol.27,
No.3 (May/June 2003), p.525.
[67] “Attention to Action: Willed and automatic control of behaviour”, Donald
Norman and Tim Shallice, in “Consciousness and Self-Regulation: Advances in
Research and Theory”, Plenum Press, 1986, p.1.
[68] “A Connectionist/Control Architecture for Working Memory”, Mark Detweiler
and Walter Schneider, in “The Psychology of Learning and Motivation:
Advances in Research and Theory”,
Vol.21, Academic Press, 1987, p.54.
[69] “On the Control of Automatic Processes: A Parallel Distributed Processing
Account of the Stroop Effect”, Jonathan Cohen, Kevin Dunbar, and James
McClelland,
Psychological Review, Vol.97, No.3 (July 1990), p.332.
[70] “Human Error”, James Reason, Cambridge University Press, 1990.
[71] “Hare Brain, Tortoise Mind: How Intelligence Increases When You Think
Less”, Guy Claxton, Fourth Estate, 1997.
[72] “Listening to one of two synchronous messages”, Donald Broadbent,
Journal of
Experimental Psychology
, Vol.44, No.1 (July 1952), p.51.
[73] “Perception and Communication”, Donald Broadbent, Pergamon Press, 1958.
[74] “Dual-task Interference and Elementary Mental Mechanisms”, Harold Pashler, in
“Attention and Performance XIV: Synergies in Experimental Psychology,
Artificial Intelligence, and Cognitive Neuroscience”, MIT Press, 1993, p.245.
[75] “The Psychology of Attention (2nd ed)”, Elizabeth Styles, Psychology Press,
2006.
[76] “Human memory: A proposed system and its control processes”, Richard
Atkinson and Richard Shiffrin, in “The Psychology of learning and motivation:
Advances in research and theory (vol. 2)”, Academic Press, 1968, p.89.
[77] “On Human Memory: Evolution, Progress, and Reflections on the 30th
Anniversary of the Atkinson-Shiffrin Model”, Chizuko Izawa (ed), Lawrence
Erlbaum Associates, 1999.
[78] “When Paying Attention Becomes Counterproductive: Impact of Divided Versus
Skill-Focused Attention on Novice and Experienced Performance of
Sensorimotor Skills”, Sian Beilock, Thomas Carr, Clare MacMahon, and Janet
Starkes,
Journal of Experimental Psychology: Applied, Vol.8, No.1 (March
2002), p.6.
[79] “Distinguishing Unconscious from Conscious Emotional Processes:
Methodological Considerations and Theoretical Implications”, Arne Öhman, in
“Handbook of Cognition and Emotion”, John Wiley and Sons, 1999, p.321.
[80] “More than 450 Phishing Attacks Used SSL in 2005”, Rich Miller, 28 December
2005,
http://news.netcraft.com/archives/2005/12/28/-
more_than_450_phishing_attacks_used_ssl_in_2005.html
.
[81] “Cardholders targetted by Phishing attack using visa-secure.com”, Paul Mutton,
8 October 2004,
http://news.netcraft.com/archives/2004/10/08/-
cardholders_targetted_by_phishing_attack_using_visasecurecom.html
.
[82] “BrainLog, August 21, 2003”, Dan Sanderson,
http://www.dansanderson.com/blog/archives/2003/08/-
clarification_t.php
[83] “Judgement under uncertainty: Heuristics and biases”, Amos Tversky and
Daniel Kahneman,
Science, Vol.185, Issue 4157 (27 September 1974), p.1124.
[84] “Judgment under Uncertainty: Heuristics and Biases”, Daniel Kahneman, Paul
Slovic, and Amos Tversky, Cambridge University Press, 1982.
[85] “Critical Thinking Skills in Tactical Decision Making: A Model and A Training
Strategy”, Marvin Cohen, Jared Freeman, and Bryan Thompson, in “Making
Decisions Under Stress: Implications for Individual and Team Training”,
American Psychological Association (APA), 1998, p.155.
[86] “The new organon and related writings”, Francis Bacon, Liberal Arts Press,
1960 (originally published in 1620).
[87] “On the failure to eliminate hypotheses in a conceptual task”, Peter Wason,
Quarterly Journal of Experimental Psychology, Vol.12, No.4 (1960) p.129.
[88] “Cognitive Ability and Variation in Selection Task Performance”, Keith
Stanovich and Richard West,
Thinking & Reasoning, Vol.4, No.3 (1 July 1998),
p.193.
[89] “The Fundamental Computational Biases of Human Cognition: Heuristics That
(Sometimes) Impair Decision Making and Problem Solving”, Keith Stanovich,
in “The Psychology of Problem Solving”, Cambridge University Press, 2003,
p.291.
[90] “Thinking and Reasoning”, Philip Johnson-Laird and Peter Wason, Penguin,
1968.
[91] “Confirmation Bias: A Ubiquitous Phenomenon in Many Guises”, Raymond
Nickerson,
Review of General Psychology, Vol.2, Issue 2 (June 1998), p.175.
[92] “The Cambridge Handbook of Thinking and Reasoning”, Keith Holyoak and
Robert Morrison (eds), Cambridge University Press, 2005.
[93] “Recent Research on Selective Exposure to Information”, Dieter Frey,
Advances
in Experimental Social Psychology
, Vol.19, 1986, Academic Press, p.41.
[94] “Selection of Information after Receiving more or Less Reliable Self-
Threatening Information”, Dieter Frey and Dagmar Stahlberg,
Personality and
Social Psychology Bulletin
, Vol.12, No.4 (December 1986), p.434.
[95] “Biased Assimilation and Attitude Polarization: The effects of Prior Theories on
Subsequently Considered Evidence”, Charles Lord, Lee Ross, and Mark Lepper,
Journal of Personality and Social Psychology, Vol.37, No.11 (November 1979),
p.2098.
[96] “The Influence of Prior Beliefs on Scientific Judgments of Evidence Quality”
Jonathan Koehler,
Organizational Behavior and Human Decision Processes.
Vol.56, Issue 1 (October 1993), p.28.
[97] “Psychological Defense: Contemporary Theory and Research”, D.Paulhus,
B.Fridhandler, and S.Hayes, Handbook of Personality Psychology, Academic
Press, p.543-579.
[98] “Why Phishing Works”, Rachna Dhamija, J.D.Tygar, and Marti Hearst,
Proceedings of the Conference on Human Factors in Computing Systems
(CHI’06)
, April 2006, p.581.
[99] “On the Conflict Between Logic and Belief in Syllogistic Reasoning”, J.Evans,
J.Barston, and P.Pollard,
Memory and Cognition, Vol.11, No.3 (May 1983),
p.295.
[100] “The Bias Blind Spot: Perceptions of Bias in Self Versus Others”, Emily
Pronin, Daniel Lin, and Lee Ross,
Personality and Social Psychology Bulletin,
Vol.28, No.3 (March 2002), p.369.
[101] “Peering into the bias blindspot: People’s Assessments of Bias in
Themselves and Others”, Joyce Ehrlinger, Thomas Gilovich, and Lee Ross,
Personality and Social Psychology Bulletin, Vol.31, No.5 (May 2005), p.680.
[102] “Psychology of Intelligence Analysis”, Richards Heuer, Jr, Center for the
Study of Intelligence, Central Intelligence Agency, 1999.
[103] “Legacy of Ashes: The History of the CIA”, Tim Weiner, Doubleday, 2007.
[104] “Does Personality Matter? An Analysis of Code-Review Ability”,
Alessandra Devito da Cunha and David Greathead,
Communications of the
ACM
, Vol.50, No.5 (May 2007), p.109.
[105] “Defender Personality Traits”, Tara Whalen and Carrie Gates, Dalhousie
University Technical Report CS-2006-01, 10 January 2006.
[106] “A Guide to the Development and Use of the Myers-Briggs Type Indicator”,
Isabel Briggs Myers and Mary McCaulley, Consulting Psychologists Press,
1985.
[107] “Essentials of Myers-Briggs Type Indicator Assessment”, Naomi Quenk,
Wiley 1999.
[108] “Psychology (7th ed)”, David Myers, Worth Publishers, 2004.
[109] “Adaptive Thinking: Rationality in the Real World”, Gerd Gigerenzer,
Oxford University Press, 2000.
[110] “Immediate Deduction between Quantified Sentences”, Guy Politzer, in
“Lines of thinking: Reflections on the psychology of thought”, John Wiley &
Sons, 1990, p.85.
[111] “On the interpretation of syllogisms”, Ian Begg and Grant Harris,
Journal of
Verbal Learning and Verbal Behaviour
, Vol.21, No.5 (October 1982), p.595.
[112] “Interpretational Errors in Syllogistic Reasoning”, Stephen Newstead,
Journal of Memory and Language, Vol.28, No.1 (February 1989), p.78.
[113] “Are conjunction rule violations the result of conversational rule
violations?”, Guy Politzer and Ira Noveck,
Journal of Psycholinguistic
Research
, Vol.20, No.2 (March 1991), p.83.
[114] “Task Understanding”, Vittorio Girotto, in “The Nature of Reasoning”,
Cambridge University Press, 2004, p.103.
[115] “Influence: Science and Practice”, Robert Cialdini, Allyn and Bacon, 2001.
[116] “Perseverance in self perception and social perception: Biased attributional
processes in the debriefing paradigm”, Lee Ross, Mark Lepper, and Michael
Hubbard,
Journal of Personality and Social Psychology, Vol.32, No.5
(November 1975), p.880.
[117] “Human Inferences: Strategies and Shortcomings of Social Judgment”,
Richard Nisbett and Lee Ross, Prentice-Hall, 1980.
[118] “Graphology - a total write-off”, Barry Beyerstein, in “Tall Tales about the
Mind and Brain: Separating Fact from Fiction”, p.265.
[119] “The Fallacy of Personal Validation: A classroom Demonstration of
Gullibility”, Bertram Forer,
Journal of Abnormal Psychology, Vol.44 (1949),
p.118.
[120] “The ‘Barnum Effect’ in Personality Assessment: A Review of the
Literature”, D.Dickson, and I.Kelly,
Psychological Reports. Vol.57, No.2
(October 1985), p.367.
[121] “Talking with the dead, communicating with the future and other myths
created by cold reading”, Ray Hyman, in “Tall Tales about the Mind and Brain:
Separating Fact from Fiction”, p.218.
[122] “Three Men in a Boat”, Jerome K. Jerome, 1889.
[123] “Human Error”, James Reason, Cambridge University Press, 1990.
[124] “Pretty good persuasion: a first step towards effective password security in
the real world”, Dirk Weirich and Angela Sasse,
Proceedings of the 2001 New
Security Paradigms Workshop (NSPW’01)
, September 2001, p.137.
[125] “The default answer to every dialog box is ‘Cancel’”, Raymond Chen,
http://blogs.msdn.com/oldnewthing/archive/2003/09/01/54734.aspx, 1
September 2003.
[126] “XP Automatic Update Nagging”, Jeff Atwood, 13 May 2005,
http://www.codinghorror.com/blog/archives/000294.html.
[127] “Irrationality: The Enemy Within”, Stuart Sutherland, Penguin Books, 1992.
[128] “Norton must die!”, ‘GFree’, http://ask.slashdot.org/comments.pl?-
sid=205872&cid=16791238
, 10 November 2006.
[129] “Why can't I disable the Cancel button in a wizard?”, Raymond Chen,
http://blogs.msdn.com/oldnewthing/archive/2006/02/24/538655.aspx, 24
February 2006.
[130] “The Belief Engine”, James Alcock,
The Skeptical Enquirer, Vol.19, No.3
(May/June 1995), p.255.
[131] “Why we Lie”, David Livingstone Smith, St.Martin’s Press, 2004.
[132] “Irrationality: Why We Don’t Think Straight!”, Stuart Sutherland, Rutgers
University Press, 1994.
[133] “Human Inference: Strategies and shortcomings of social judgement”,
Richard Nisbett and Lee Ross, Prentice-Hall, 1985.
[134] “Mental Models and Reasoning”, Philip Johnson-Laird, in “The Nature of
Reasoning”, Cambridge University Press, 2004, p.169.
[135] “The Hot Hand in Basketball: On the Misperception of Random
Sequences”, Thomas Gilovich, Robert Allone, and Amos Tversky,
Cognitive
Psychology
, Vol.17, No.3 (July 1985), p.295.
[136] “The Cold Facts about the ‘Hot Hand’ in Basketball”, Amos Tversky and
Thomas Gilovich, in “Cognitive Psychology: Key Readings”, Psychology Press,
2004, p.643.
[137] “How we Know what Isn’t So”, Thomas Gilovich, The Free Press, 1991.
[138] “Interactional Biases in Human Thinking”, Stephen Levinson, in “Social
Intelligence and Interaction: Expressions and Implications of the Social Bias in
Human Intelligence”, Cambridge University Press, 1995, p.221.
[139] “The perception of randomness”, Ruma Falk,
Proceedings of the Fifth
Conference of the International Group for the Psychology of Mathematics
Education (PME5)
, 1981, p.64.
[140] “The social function of intellect”, Nicholas Humphrey, in “Growing Points
in Ethology”, Cambridge University Press, 1976, p.303.
[141] “Shared Mental Models: Ideologies and Institutions”, Arthur Denzau and
Douglass North,
Kyklos, Vol.47, No.1 (1994), p.3.
[142] “Uncertainty, humility, and adaptation in the tropical forest: the agricultural
augury of the Kantu”, Michael Dove,
Ethnology, Vol.32, No.2 (Spring 1993),
p.145.
[143] “Do Security Toolbars Actually Prevent Phishing Attacks”, Min Wu, Robert
Miller, and Simson Garfinkel,
Proceedings of the SIGCHI conference on Human
Factors in Computing Systems (CHI’06)
, April 2006, p.601.
[144] “The Social Brain: Discovering the Networks of the Mind”, Michael
Gazzaniga, Basic Books, 1987.
[145] “A new look at the human split brain”, Justine Sergent,
Brain: A Journal of
Neurology
, October 1987, p.1375.
[146] “Nature’s Mind: The Biological Roots of Thinking, Emotions, Sexuality,
Language, and Intelligence”, Michael Gazzaniga, Basic Books, 1994.
[147] “Genesis of popular but erroneous psychodiagnostic observations”, Loren
Chapman and Jean Chapman,
Journal of Abnormal Psychology, Vol.72, No.3
(June 1967), p.193.
[148] "Phantoms in the Brain: Probing the Mysteries of the Human Mind",
V.S.Ramachandran and Sandra Blakeslee, Harper Perennial, 1999.
[149] “Visual memory for natural scenes: Evidence from change detection and
visual search”, Andrew Hollingworth,
Visual Cognition, Vol.14, No.4-8
(August-December 2006), p.781.
[150] “Perceptual Restoration of Missing Speech Sounds”, Richard Warren,
Science, Volume 167, Issue 3917 (23 January 1970), p.392.
[151] “Auditory Perception: A New Analysis and Synthesis (2nd ed)”, Richard
Warren, Cambridge University Press, 1999.
[152] “Phonemic restoration: The brain creates missing speech sounds”, Makio
Kashino,
Acoustic Science and Technology (Journal of the Acoustical Society of
Japan)
, Vol.27, No.6 (2006), p.318.
[153] “Speech perception without traditional speech cues”, Robert Remez, Philip
Rubin, David Pisoni, Thomas Carrell,
Science, Vol.212, No.4497 (22 May
1981), p.947.
[154] “Phantom Words and other Curiosities”, Diana Deutsch,
http://www.philomel.com/.
[155] “Self-Deception and Emotional Coherence”, Baljinder Sahdra and Paul
Thagard, in “Hot Thought: Mechanisms and Applications of Emotional
Cognition”, MIT Press, p.219.
[156] “Judgment of contingency in depressed and nondepressed students: sadder
but wiser?”, Lyn Abramson and Lauren Alloy,
Journal of Experimental
Psychology (General)
, Vol.108, No.4 (December 1979), p.441.
[157] “Depression and pessimism for the future: biased use of statistically relevant
information in predictions for self versus others”, Anthony Ahrens and Lauren
Alloy,
Journal of Personality and Social Psychology, Vol.52, No.2 (February
1987), p.366.
[158] “The sad truth about depressive realism”, Lorraine Allan, Shepard Siegel,
and Samuel Hannah,
Quarterly Journal of Experimental Psychology, Vol.60,
No.3 (March 2007), p.482.
[159] “Illusion and Well-Being: A Social Psychological Perspective on Mental
Health”, Shelley Taylor and Jonathon Brown,
Psychological Bulletin, Vol.103,
No.2 (March 1988), p.193.
[160] “Cognitive Processes in Depression”, Lauren Alloy, Guilford Press, 1988.
[161] “The Role of Positive Affect in Syllogism Performance”, Jeffrey Melton,
Personality and Social Psychology Bulletin, Vol.21, No.8 (August 1995), p.788.
[162] “The Influence of Mood State on Judgment and Action: Effects on
Persuasion, Categorization, Social Justice, Person Perception, and Judgmental
Accuracy”, Robert Sinclair and Melvin Mark, in “The Construction of Social
Judgments”, Lawrence Erlbaum Associates, 1992, p.165.
[163] “Handbook of Cognition and Emotion”, Tim Dalgleish and Michael Power
(eds), John Wiley and Sons, 1999.
[164] “An Influence of Positive Affect on Decision Making in Complex
Situations: Theoretical Issues With Practical Implications”, Alice Isen,
Journal
of Consumer Psychology
, Vol.11, No.2 (2001), p.75.
[165] “The Effects of Mood on Individuals’ Use of Structured Decision
Protocols”, Kimberly Elsbach and Pamela Barr,
Organization Science, Vol.10,
No.2 (February 1999) p.181.
[166] “A neuropsychological theory of positive affect and its influence on
cognition”, F. Gregory Ashby, Alice Isen, and And U.Turken,
Psychological
Review
, Vol.106, No.3 (July 1999), p.529.
[167] “Emotional context modulates subsequent memory effect”, Susanne Erk,
Markus Kiefer, J.o Grothea, Arthur Wunderlich, Manfred Spitzer and Henrik
Walter,
NeuroImage, Vol.18, No.2 (February 2003), p.439.
[168] “Some Ways in Which Positive Affect Facilitates Decision Making and
Judgment”, Alice Isen and Aparna Labroo, in “Emerging Perspectives on
Judgment and Decision Research”, Cambridge University Press, 2003, p.365.
[169] “Positive affect facilitates creative problem solving”, Alice Isen, Kimberly
Daubman, and Gary Nowicki,
Journal of Personality and Social Psychology,
Vol.52, No.6 (June 1987), p.1122.
[170] “Affective Causes and Consequences of Social Information Processing”,
Gerald Clore, Norbert Schwarz, and Michael Conway, in “Handbook of Social
Cognition”, Lawrence Erlbaum Associates, 1994, p.323.
[171] “Feeling and Thinking: Implications for Problem Solving”, Norbert Schwarz
and Ian Skurnik, in “The Psychology of Problem Solving”, Cambridge
University Press, 2003, p.263.
[172] “Insensitivity to future consequences following damage to human prefrontal
cortex”, Antoine Bechara, Antonio Damasio, Hanna Damasio, and Steven
Anderson,
Cognition, Vol.50, No.1-3. (April-June 1994), p.7.
[173] “Descartes’ Error: Emotion, Reason, and the Human Brain”, Antonio
Damasio, Avon Books, 1994.
[174] “Student Descriptive Questionnaire (SDQ)”, College Board Publications,
1976-1977.
[175] “Are we all less risky and more skilful than our fellow drivers?”, Ola
Svenson,
Acta Psychologica, Vol.47, No.2 (February 1981), p.143.
[176] “The Self-Concept, Volume 2: Theory and Research on Selected Topics”,
Ruth Wylie, University of Nebraska Press, 1979.
[177] “Public Beliefs About the Beliefs of the Public”, James Fields and Howard
Schuman,
The Public Opinion Quarterly, Vol.40, No.4 (Winter 1976-1977),
p.427.
[178] “Why we are fairer than others”, David Messick, Suzanne Bloom, Janet
Boldizar and Charles Samuelson,
Journal of Experimental Social Psychology,
Vol.21, No.5 (September 1985), p.407.
[179] “Self-serving biases in the attribution of causality: Fact or fiction?”, Dale
Miller and Michael Ross,
Psychological Bulletin, Vol.82, No.2 (March 1975),
p.213.
[180] “Evidence for a self-serving bias in the attribution of causality”, James
Larson Jr.,
Journal of Personality, Vol.45, No.3 (September 1977), p.430.
[181] “Locus of Control and Causal Attribution for Positive and Negative
Outcomes on University Examinations”, Timothy Gilmor and David Reid,
Journal of Research in Personality, Vol.13, No.2 (June 1979), p.154.
[182] “Why a Rejection? Causal Attribution of a Career Achievement Event”,
Mary Wiley, Kathleen Crittenden, and Laura Birg,
Social Psychology Quarterly,
Vol.42, No.3 (September 1979), p.214.
[183] “Attributions for Exam Performance”, Mark Davis and Walter Stephan,
Journal of Applied Social Psychology, Vol.10, No.3 (June 1980), p.191.
[184] “Attributions in the sports pages”, Richard Lau and Dan Russell,
Journal of
Personality and Social Psychology
, Vol.39, No.1 (July 1980), p.29.
[185] “The Commercial Malware Industry”, Peter Gutmann, talk at Defcon 15,
August 2007,
https://www.defcon.org/images/defcon-15/dc15-
presentations/dc-15-gutmann.pdf
.
[186] “McAfee-NCSA Online Safety Study”, National Cyber Security
Alliance/McAfee, October 2007,
http://staysafeonline.org/pdf/-
McAfee%20NCSA%20NewsWorthy%20Analysis_Final.pdf
.
[187] “Eighty percent of new malware defeats antivirus”, Munir Kotadia, 19 July
2006,
http://www.zdnet.com.au/news/security/soa/Eighty-percent-of-
new-malware-defeats-antivirus/0,130061744,139263949,00.htm
.
[188] “An Honest Man Has Nothing to Fear: User Perceptions on Web-based
Information Disclosure”, Gregory Conti and Edward Sobiesk,
Proceedings of
the Third Symposium on Usable Privacy and Security (SOUPS’07)
, July 2007,
p.112.
[189] “Security as a Practical Problem: Some Preliminary Observations of
Everyday Mental Models”, Paul Dourish, Jessica Delgado de la Flor, and
Melissa Joseph, Workshop on HCI and Security Systems, at the Conference on
Human-Computer Interaction (CHI’03), April 2003,
http://www.andrewpatrick.ca/CHI2003/HCISEC/hcisec-workshop-
dourish.pdf
.
[190] “User Perceptions of Privacy and Security on the Web”, Scott Flinn and
Joanna Lumsden,
Proceedings of the Third Annual Conference on Privacy,
Security and Trust (PST’05)
, October 2005, http://www.lib.unb.ca/-
Texts/PST/2005/pdf/flinn.pdf
.
[191] “Users are not the enemy”, Anne Adams and Martina Sasse,
Communications of the ACM, Vol.42, No.12 (December 1999), p.41.
[192] “Unverhältnismäßiges Urteil”, Ulf Kersing,
c’t Magazin für
Computertechnik
, 2 October 2006, p.11.
[193] “TLS and SSL in the real world”, Eric Lawrence, 20 April 2005,
http://blogs.msdn.com/ie/archive/2005/04/20/-
410240.aspx
.
[194] “SSL without a PKI”, Steve Myers, in “Phishing and Countermeasures”,
Markus Jakonsson and Steven Myers (eds), John Wiley and Sons, 2007.
[195] “Human-Centered Design Considerations”, Jeffrey Bardzell, Eli Blevis, and
Youn-Kyung Lim, in “Phishing and Countermeasures: Understanding the
Increasing Problem of Electronic Identity Theft”, John Wiley and Sons, 2007.
[196] “The TIPPI Point: Toward Trustworthy Interface”, Sara Sinclair and Sean
Smith,
IEEE Security and Privacy, Vol.3, No.4 (July/August 2005), p.68.
[197] “The Usability of Security Devices”, Ugo Piazzalunga, Paolo Salvaneschi,
and Paolo Coffetti, in “Security and Usability: Designing Secure Systems That
People Can Use”, O’Reilly, 2005, p.221.
[198] “Interoperabilitätstests von PKCS #11-Bibliotheken”, Matthias Bruestle,
December 2000.
[199] “Gartner: Consumers Dissatisfied with Online Security”, Paul Roberts, PC
World, December 2004.
[200] “FobCam”,
http://fob.webhop.net/.
[201] “Zufall unter Beobachtung”, Michael Schilli,
Linux Magazine, May 2007,
p.98.
[202] “Risk taking and accident causation”, Willem Wagenaar, in “Risk-taking
Behaviour”, John Wiley and Sons, 1992, p.257.
[203] “Gathering Evidence: Use of Visual Security Cues in Web Browsers”, Tara
Whalen and Kori Inkpen, Proceedings of the 2005 Conference on Graphics
Interface, 2005, p.137.
[204] User comment in “Digital Certificates: Do They Work?”, “Emily”, 1
January 2008,
http://www.codinghorror.com/blog/archives/001024.html.
[205] “Digital Certificates: Do They Work?”, Jeff Atwood, 20 December 2007,
http://www.codinghorror.com/blog/archives/001024.html?r=20357.
[206] “Stopping Spyware at the Gate: A User Study of Privacy, Notice and
Spyware”, Nathaniel Good, Rachna Dhamija, Jens Grossklags, David Thaw,
Steven Aronowitz, Deirdre Mulligan, and Joseph Konstan,
Proceedings of the
2005 Symposium on Usable Privacy and Security
, July 2005, p.43.
[207] “EULAlyzer”, Javacool Software,
http://www.javacoolsoftware.com/eulalyzer.html.
[208] “The Ghost In The Browser: Analysis of Web-based Malware”, Niels
Provos, Dean McNamee, Panayiotis Mavrommatis, Ke Wang, and Nagendra
Modadugu, comments during a presentation at the
First Workshop on Hot
Topics in Understanding Botnets (HotBots'07)
, April 2007.
[209] “Securing Java”, Edward Felten and Gary McGraw, John Wiley and Sons,
1999.
[210] “Beware of the dancing bunnies”, Larry Osterman, 12 July 2005,
http://blogs.msdn.com/larryosterman/archive/2005/-
07/12/438284.aspx
.
[211] “Do Security Toolbars Actually Prevent Phishing Attacks”, Min Wu, Robert
Miller, and Simson Garfinkel,
Proceedings of the SIGCHI conference on Human
Factors in Computing Systems (CHI’06)
, April 2006, p.601.
[212] “A Usability Study and Critique of Two Password Managers”, Sonia
Chiasson, Paul van Oorschot, and Robert Biddle, Proceedings of the 15th Usenix
Security Symposium (Security’06), August 2006, p.1.
[213] “The Feature-Positive Effect in Adult Human Subjects”, Joseph Newman,
William Wolff, and Eliot Hearst,
Journal of Experimental Psychology: Human
Learning and Memory
, Vol.6, No.5 (September 1980), p.630.
[214] “Thinking and Reasoning”, Alan Garnham and Jane Oakhill, Blackwell
Publishing, 1994.
[215] “Thought and Knowledge: An Introduction to Critical Thinking (4th ed)”,
Diane Halpern, Lawrence Erlbaum Associates, 2002.
[216] “Handbook of Classroom Assessment: Learning, Achievement, and
Adjustment”, Gary Phye (ed), Academic Press Educational Psychology Series,
1996.
[217] “The Emperor’s New Security Indicators”, Stuart Schechter, Rachna
Dhamija, Andy Ozment, and Ian Fischer,
IEEE Symposium on Security and
Privacy
, May 2007, to appear.
[218] “Better Website Identification and Extended Validation Certificates in IE7
and Other Browsers”, Rob Franco, 21 November 2005,
http://blogs.msdn.com/ie/archive/2005/11/21/495507.aspx.
[219] “Tricking Vista’s UAC To Hide Malware”, “kdawson”, 26 February 2007,
http://it.slashdot.org/article.pl?sid=07/02/26/-
0253206
.
[220] “User Account Control Overview”, 7 February 2007,
http://www.microsoft.com/technet/windowsvista/-
security/uacppr.mspx
.
[221] “User Account Control”,
http://en.wikipedia.org/wiki/-
User_Account_Control
.
[222] “Understanding and Configuring User Account Control in Windows Vista”,
http://www.microsoft.com/technet/windowsvista/-
library/00d04415-2b2f-422c-b70e-b18ff918c281.mspx
.
[223] “The Inmates Are Running the Asylum: Why High Tech Products Drive Us
Crazy and How To Restore The Sanity”, Alan Cooper, Sams, 1999.
[224] “Gorillas in our midst: sustained inattentional blindness for dynamic
events”, Dan Simons and Christopher Chabris,
Perception, Vol.28 (1999),
p.1059.
[225] “Cognitive Neuroscience of Attention”, Michael Posner (ed), Guilford
Press, 2004.
[226] “Inattentional Blindness”, Arien Mack and Irvin Rock, MIT Press, 1998.
[227] “How the Mind Works”, Steven Pinker, W.W.Norton and Company, 1997.
[228] “A Remote Vulnerability in Firefox Extensions”, Christopher Soghoian, 30
May 2007,
http://paranoia.dubfire.net/2007/05/remote-
vulnerability-in-firefox.html
.
[229] “A Usability Study and Critique of Two Password Managers”, Sonia
Chiasson, Paul van Oorschot, and Robert Biddle, Proceedings of the 15th Usenix
Security Symposium (Security’06), August 2006, p.1.
[230] “Information processing of visual stimuli in an ‘extinguished’ field”, Bruce
Volpe, Joseph Ledoux, and Michael Gazzaniga,
Nature, Vol.282, No.5740 (13
December 1979), p.722.
[231] “Unconscious activation of visual cortex in the damaged right hemisphere of
a parietal patient with extinction”, Geraint Rees, Ewa Wojciulik, Karen Clarke,
Masud Husain, Chris Frith and Jon Driver,
Brain, Vol.123, No.8 (August 2000),
p.1624.
[232] “Levels of processing during non-conscious perception: a critical review of
visual masking”, Sid Kouider and Stanislas Dehaene,
Philosophical
Transactions of the Royal Society B (Biological Sciences)
, Vol.362, No.1481 (29
May 2007), p.857.
[233] “Inattentional Blindness Versus Inattentional Amnesia for Fixated But
Ignored Words”, Geraint Rees, Charlotte Russell, Christopher Frith, and Jon
Driver,
Science, Vol.286, No.5449 (24 December 1999), p.2504.
[234] “Gender differences in the mesocorticolimbic system during computer
game-play”, Fumiko Hoeft, Christa Watson, Shelli Kesler, Keith Bettinger,
Allan Reiss,
Journal of Psychiatric Research, 2008 (to appear)
[235] “The Writings of Thomas Jefferson”, Thomas Jefferson and Henry
Washington, Taylor and Maury, 1854.
[236] “Calculated Risks”, Gerd Gigerenzer, Simon and Schuster, 2002.
[237] “Making Security Usable”, Alma Whitten, PhD thesis, Carnegie Mellon
University, May 2004.
[238] “Re: Intuitive cryptography that’s also practical and secure”, Andrea
Pasquinucci, posting to the cryptography@metzdowd.com mailing list, message-
ID
20070130203352.G[email protected]me, 30 January 2007.
[239] “Models of Man: Social and Rational”, Herbert Simon, Wiley and Sons,
1957.
[240] “Don’t Make Me Think : A Common Sense Approach to Web Usability”,
Steve Krug, New Riders Press, 2005.
[241] “Human Error: Cause, Prediction, and Reduction”, John Senders and
Neville Moray, Lawrence Baum Associates, 1991.
[242] “AOL Names Top Spam Subjects For 2005”, Antone Gonsalves,
InformationWeek TechWeb News, 28 December 2005,
http://www.informationweek.com/news/showArticle.jhtml?-
articleID=175701011
.
[243] Microformats,
http://microformats.org/wiki/Main_Page.
[244] “Microformats: Empowering your Markup for Web 2.0”, John Allsop,
Friends of Ed Press, 2007.
[245] “Vorgetäuscht: Böse Textdokumente
― Postscript gone wild”, Michael
Backes, Dominique Unruh, and Markus Duermuth, iX, September 2007, p.136.
[246] “Trusted Computing Platforms and Security Operating Systems”, Angelos
Keromytis, in “Phishing and Countermeasures: Understanding the Increasing
Problem of Electronic Identity Theft”, John Wiley and Sons, 2007.
[247] “Design Rules Based on Analyses of Human Error”, Donald Norman,
Communications of the ACM, Vol.26, No.4 (April 1983), p.255.
[248] “Human Error”, James Reason, Cambridge University Press, 1990.
[249] “Firefox and the Worry-Free Web”, Blake Ross, in “Security and Usability:
Designing Secure Systems That People Can Use”, O’Reilly, 2005, p.577.
[250] “Users and Trust: A Microsoft Case Study”, Chris Nodder, in “Security and
Usability: Designing Secure Systems That People Can Use”, O’Reilly, 2005,
p.589.
[251] “Phishing IQ Tests Measure Fear, Not Ability”, Vivek Anandpara, Andrew
Dingman, Markus Jakobsson, Debin Liu, and Heather Roinestad,
Usable
Security 2007 (USEC’07)
, February 2007,
http://usablesecurity.org/program.html.
[252] “Cognitive and physiological processes in fear appeals and attitude change:
A revised theory of protection motivation”, Ronald Rogers, in “Social
Psychophysiology: A Sourcebook”, Guildford Press, 1983, p.153.
[253] “The Protection Motivation Model: A Normative Model of Fear Appeals”
John Tanner, James Hunt, and David Eppright,
Journal of Marketing, Vol.55,
No.3 (July 1991), p.36.
[254] “Protection Motivation Theory”, Henk Boer and Erwin Seydel, in
“Predicting Health Behavior: Research and Practice with Social Cognition
Models”, Open University Press, 1996, p.95.
[255] “Putting the fear back into fear appeals: The extended parallel process
model”, Kim Witte,
Communication Monographs, Vol.59, No.4 (December
1992), p.329.
[256] “Stalking 2.0: privacy protection in a leading Social Networking Site”, Ian
Brown, Lilian Edwards and Chris Marsden, presentation at GikII 2: Law,
Technology and Popular Culture, London, September 2007,
http://www.law.ed.ac.uk/ahrc/gikii/docs2/edwards.pdf.
[257] Vesselin Bontchev, remarks during the “Where have all the OUTBREAKS
gone” panel session, Association of anti Virus Asia Researchers (AVAR) 2006
conference, Auckland, New Zealand, December 2006.
Security Usability Design
Now that we’ve looked at all of the problems that need to be solved (or at least
addressed) in designing a security user interface, we can move on to the security
usability design process. The following sections look at various user interface design
issues and ways of addressing some of the problems mentioned in the previous
chapter.
Ease of Use
Users hate configuring things, especially complex security technology that they don’t
understand. One usability study of a PKI found that a group of highly technical users,
most with PhDs in computer science, took over two hours to set up a certificate for
their own use, and rated it as the most difficult computer task they’d ever been asked
to perform [1].  What’s more, when they’d finished they had no idea what they’d just
done to their computers, with several commenting that had something gone wrong
they would have been unable to perform even basic troubleshooting, a problem that
they had never encountered before.
In practice, security experts are
terrible at estimating how long a task will take for a
typical user. In the PKI usability study, other security researchers who reviewed the
paper had trouble believing the empirical results obtained because it couldn’t possibly
take users that long to obtain and configure a certificate (“I’m sorry but your facts just
don’t support our theory”). The researchers who set up the study had themselves
managed to complete the task in two-and-a-half minutes. The test users (who, as has
already been mentioned, had PhDs in computer science and were given screenshot-
by-screenshot paint-by-numbers instructions showing them what to do) took two
hours and twenty minutes. A more typical user, without a PhD and paint-by-numbers
instructions to guide them, has no hope of ever completing this task.
On the other hand the equipment vendors (who have direct contact with end users)
were under no illusions about the usability of PKI, expressing surprise that anyone
would take on the complexity of a PKI rather than just going with user names and
passwords. The assumption by the security experts was that if they could do it in ten
minutes then anyone could do it in ten minutes, when in fact a typical user may still
not be able to do it after ten hours. This is because users aren’t interested in finding
out how something works, they just want to use it to do their job. This is very hard
for techies, who are very interested in how things work, to understand [2].
Consumer research has revealed that the average user of a consumer electronics
device such as a VCR or cell phone will struggle with it for twenty minutes before
giving up [3]. Even the best-designed, simplest security mechanism requires more
effort to use than not using any security at all, and once we get to obscure
technologies like certificates, for which the perceived benefits are far less obvious
than for cell phones and VCRs, the user’s level of patience drops correspondingly
(even the two-and-a-half minutes required by seasoned experts is probably too long
for this task).
To avoid problems like this, it should be immediately obvious to a user how the basic
security features of your application work.  Unlike web browsers, word processors,
and photo editors, the security portions of an application aren’t something that users
spend hours a day inside, so they never build up the time investment needed to
memorise how to use them.  Your application should therefore auto-configure itself
as much as possible, leaving only a minimal set of familiar operations for the user.
For example a network server can automatically generate a self-signed certificate on
installation and use that to secure communications to it, avoiding the complexity and
expense of obtaining a certificate from an external CA.  An email application can
automatically obtain a certificate from a local CA if there’s one available (for
example an in-house one if the software is being used in an organisation) whenever a
new email address is set up.  Even if you consider this to be a lowering of theoretical
security, it’s raising the effective security because now the security will actually be
used.
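As an illustration of what this kind of auto-configuration can look like, here’s a
minimal sketch of generating a self-signed server certificate at install time using
Python’s third-party cryptography package; the hostname, file names, and one-year
lifetime are illustrative placeholders rather than anything mandated here, and a real
installer would also set restrictive permissions on the key file:

    from datetime import datetime, timedelta, timezone
    from cryptography import x509
    from cryptography.x509.oid import NameOID
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import rsa

    def make_self_signed_cert(hostname="myserver.local", days=365):
        # Generate a key pair and a certificate signed with that same key, so the
        # server can offer TLS straight after installation with no CA involved.
        key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
        name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, hostname)])
        now = datetime.now(timezone.utc)
        cert = (x509.CertificateBuilder()
                .subject_name(name)
                .issuer_name(name)              # self-signed: subject == issuer
                .public_key(key.public_key())
                .serial_number(x509.random_serial_number())
                .not_valid_before(now)
                .not_valid_after(now + timedelta(days=days))
                .add_extension(x509.SubjectAlternativeName([x509.DNSName(hostname)]),
                               critical=False)
                .sign(key, hashes.SHA256()))
        with open("server-key.pem", "wb") as f:
            f.write(key.private_bytes(serialization.Encoding.PEM,
                                      serialization.PrivateFormat.PKCS8,
                                      serialization.NoEncryption()))
        with open("server-cert.pem", "wb") as f:
            f.write(cert.public_bytes(serialization.Encoding.PEM))

    make_self_signed_cert()

The user never sees any of this; the only visible effect is that connections to the
freshly-installed server are encrypted by default.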
On the client side, your application can use cryptlib’s plug-and-play PKI facility to
automatically locate and communicate with a CA server [4], requiring that the user
enter nothing more than a name and password to authenticate themselves (this
process takes less than a minute, and doesn’t require a PhD in computer science to
understand). For embedded devices, the operation can occur automatically when the
device is configured at the time of manufacture.
Since all users are quite used to entering passwords, your application can use the
traditional user name and password (tunnelled over a secure channel such as
SSL/TLS or SSH) rather than more complex mechanisms like PKI, which in most
cases is just an awkward form of user name and password (the user name and
password unlock the private key, which is then used to authenticate the user). Many
users choose poor passwords, so protocols like TLS’ pre-shared key mechanism
(TLS-PSK), a password-based failsafe authentication that never transmits the
password even over the secured link, should be preferred to ones that do.  TLS-PSK
used in this manner is
automatically part of the critical action sequence.
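To make the idea concrete, here’s a minimal client-side sketch of TLS-PSK using
Python’s standard ssl module.  It assumes Python 3.13 or later (which added the PSK
callbacks) and an OpenSSL build with PSK ciphersuites enabled; the server name,
port, identity string, and the bare SHA-256 hash used as the key-derivation step are
all illustrative placeholders (a real application would use a proper salted KDF such as
PBKDF2):

    import hashlib
    import socket
    import ssl

    # Derive the pre-shared key from the user's password; neither the password
    # nor the derived key is ever transmitted during the handshake.
    password = "correct horse battery staple"
    psk = hashlib.sha256(password.encode("utf-8")).digest()

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False                    # no certificates involved at all
    ctx.verify_mode = ssl.CERT_NONE
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2  # use the classic PSK ciphersuites
    ctx.set_ciphers("PSK")
    ctx.set_psk_client_callback(lambda hint: ("alice", psk))

    with socket.create_connection(("server.example.com", 8443)) as sock:
        with ctx.wrap_socket(sock) as tls:
            # A successful handshake means that both sides have proven knowledge
            # of the shared secret: mutual authentication with no PKI in sight.
            tls.sendall(b"hello")

The server side mirrors this with set_psk_server_callback(), looking up the key for
whichever identity the client presents.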
An additional benefit of TLS’ password-based authentication is that it performs
mutual authentication of both parties, identifying not only the client to the server but
also the server to the client, without any of the expense, overhead, or complexity of
certificates and a PKI. Whereas PKI protects names (which isn’t very useful), TLS-
PSK protects relationships (which is). Interestingly, RSA Data Security, the
company that created Verisign, has recently advocated exactly this method of
authentication in place of certificates [5].  Of course users don’t know (or care) that
they’re performing mutual authentication; all they see is that they’re entering their
password as usual, and all they care about is that they end up with a verified secure
channel to the other party.
TLS-PSK actually provides something that’s rather better than conventional mutual
authentication, which is usually built around some form of challenge/response
protocol. The authentication provided in TLS-PSK is so-called failsafe authentication
in which neither side obtains the other side’s authentication credentials if the
authentication fails. In other words it fails safe, as opposed to many other forms of
authentication (most notably the standard password-based authentication in HTTP-
over-TLS and SSH) in which, even if the authentication fails, the other side still ends
up with a copy of your authentication credentials such as a password (this flaw is
what makes phishing work so well).
A final benefit of TLS-PSK is that it allows the server to perform password-quality
checks and reject poor, easy-to-guess passwords (“What’s your dog’s maiden
name?”). With certificates there’s no such control, since the server only sees the
client’s certificate and has no idea of the strength of the password that’s being used to
protect it on the client machine. A survey of SSH public-key authentication found
that nearly two thirds of all private keys weren’t just poorly protected, they used no
protection at all.  As far as the server was concerned the clients were using (hopefully
strong) public-key-based authentication, when the private keys were actually being
held on disk as unprotected plaintext files [6].  Furthermore, SSH’s known-hosts
mechanism would tell an attacker who gains access to a client key file exactly which
systems they could compromise using the unprotected key.
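Since with TLS-PSK the server is involved when the shared secret is first set up, it’s
in a position to apply exactly these kinds of quality checks.  A minimal sketch of such
a server-side check follows; the length threshold, the character-class rule, and the tiny
common-password list are arbitrary illustrations, and a real deployment would test
against a much larger list of known-compromised passwords:

    import re

    # Tiny illustrative list; a real service would use a large leaked-password list.
    COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein", "iloveyou"}

    def password_acceptable(password, min_length=10):
        """Server-side check applied when a user sets or changes their password."""
        if len(password) < min_length:
            return False, "too short"
        if password.lower() in COMMON_PASSWORDS:
            return False, "appears on a common-password list"
        # Require at least two character classes to weed out the very worst choices.
        classes = sum(bool(re.search(pattern, password))
                      for pattern in (r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]"))
        if classes < 2:
            return False, "not enough variety"
        return True, "ok"

    print(password_acceptable("fluffy"))                        # (False, 'too short')
    print(password_acceptable("correct horse battery staple"))  # (True, 'ok')

With certificate-based client authentication there’s no equivalent place to hook in a
check like this, since the password protecting the key never leaves the client machine.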
You can obtain invisible TLS-PSK-style benefits from other security mechanisms
that piggyback the security on an operation that the user already wants to perform
anyway.  Perhaps the best-known of these is the use of an
ignition key in a car. Drivers don’t use their car keys as a security measure, they use
them to tell the car when to start and stop. However, by doing so they’re also getting
security at a cost so low that no-one notices.
A more overt piggybacking of security on usability was the design of the common fill
device for the KW-26 teletype link encryptor, which was keyed using a punched card
supported at the ends by pins.  To prevent the same card from being re-used, it was cut
in half when the card reader door was opened [7]. Since it was supported only at the
two ends by the pins, it wasn’t possible to use it any more. This meant that the
normal process of using the device guaranteed (as a side-effect) that the same key
was never re-used, enforcing with the simplest mechanical measures something that
no amount of military discipline had been able to achieve during the previous world
war. Being able to double up the standard use of an item with a security mechanism
in this manner unfortunately occurs only rarely, but when it does happen it’s
extraordinarily effective.
(In practice the KW-26 mechanism wasn’t quite as effective as its designers had
hoped. Since distributing the cards ended up costing $50-100 a pop due to the
security requirements involved, and there were potentially a dozen or more devices to
re-key, mistakes were quite costly. Users discovered that it was possible, with a bit
of patience, to tape the segments back together again in such a way that they could be
re-used, contrary to the designers’ intentions. This is the sort of problem that a post-
delivery review, discussed in the section on usability testing, would have turned up.)
Post-delivery reviews can also turn up other problems that aren’t obvious at the
design stage. Sometimes the results from changing a security system to make it
“easier to use” can be counterintuitive. In one evaluation carried out with electronic
door locks, users complained that the high-tech electronic lock was more
cumbersome than using old-fashioned keys. The developers examined video footage
of users and found that both methods took about the same amount of time, but when
using standard keys to open the door users spent most of their time taking keys from
their pockets, finding the correct one, inserting it in the lock, unlocking the door,
removing the key, and so on and so forth.  In contrast, with the electronic lock all of
these calisthenics became unnecessary and users spent most of their time waiting for
the locking system to perform its actions. As a result, the electronic lock rated lower
than the original key-based one because fiddling with keys acted as a pacifier that
occupied users’ minds during the unlocking process while the electronic lock had no
such pacifier, making every little delay stand out in the mind of the user [8].
Automation vs. Explicitness
When you’re planning the level of automation that you want to provide for users,
consider the relative tradeoffs between making things invisible and automated vs.
obvious but obtrusive. Users will act to minimise or eliminate monotonous computer
tasks if they can, since humans tend to dislike repetitive tasks and will take shortcuts
wherever possible. The more that users have to perform operations like signing and
encryption, the more they want shortcuts to doing so, which means either making it
mostly (or completely) automated, with a concomitant drop in security, or having
them avoid signing/encrypting altogether. So a mechanism that requires the use of a
smart card and PIN will inevitably end up being rarely used, while one that
automatically processes anything that floats by will see constant use.  You’ll need to
decide where
the best trade-off point lies — see the section on theoretical vs. effective security
above for more guidance on this.
There are however cases where obtrusive security measures are warranted, such as
when the user is being asked to make important security decisions. In situations like
this, the user should be required to explicitly authorise an action before the action can
proceed. In other words any security-relevant action that’s taken should represent a
conscious expression of the will of the user. Silently signing a message behind the
user’s back is not only bad practice (it’s the equivalent of having them sign a contract
without reading it), but is also unlikely to stand up in a court of law, thus voiding the
reason usually given for signing a document.
If the user is being asked to make a security-relevant decision of this kind, make sure
that the action of proceeding really does represent an informed, conscious decision
on their part. Clear the mouse and keyboard buffers to make sure that a keystroke or
mouse click still present from earlier on doesn’t get accepted as a response for the
current decision. Don’t assign any buttons as the default action, since something as
trivial as bumping the space bar will, with most GUIs, trigger the default action and
cause the user to inadvertently sign the document (in this case the secure default is to
do nothing, rather than allowing the user to accidentally create a signature). If
necessary, consult with a lawyer about requirements for the wording and presentation
of requests for security-related decisions that may end up being challenged in court.
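As a rough illustration of these points, the sketch below (plain Python 3 using the
standard Tkinter toolkit purely as an example; the same ideas apply to any GUI
framework) builds a signing dialog with no default button, no window-close control,
and a short guard interval during which input is ignored so that a stale keystroke or
queued mouse click can’t trigger the action. The one-second guard interval and the
wording are illustrative assumptions, not values taken from any particular product or
from legal advice.

    # Sketch: a signing dialog that requires a deliberate, informed action.
    # Assumes Python 3 with the standard tkinter module; all names are illustrative.
    import time
    import tkinter as tk

    GUARD_INTERVAL = 1.0    # seconds during which input is ignored (illustrative value)

    def ask_to_sign(agreement_text):
        result = {"signed": False}
        root = tk.Tk()
        root.title("Create signature")
        root.protocol("WM_DELETE_WINDOW", lambda: None)   # window-close does nothing
        shown_at = time.monotonic()

        tk.Label(root, justify="left", padx=20, pady=10,
                 text="By clicking 'Sign' below I acknowledge that I am\n"
                      "entering into a legally binding agreement:\n\n" +
                      agreement_text).pack()

        def sign():
            # Ignore clicks that arrive before the user could plausibly have
            # read the dialog (e.g. a queued double-click or bumped space bar).
            if time.monotonic() - shown_at < GUARD_INTERVAL:
                return
            result["signed"] = True
            root.destroy()

        buttons = tk.Frame(root)
        buttons.pack(pady=10)
        # Note: neither button is made the default, so just hitting Enter does nothing.
        tk.Button(buttons, text="Sign", command=sign).pack(side="left", padx=12)
        tk.Button(buttons, text="Cancel", command=root.destroy).pack(side="left", padx=12)

        root.mainloop()
        return result["signed"]

    if __name__ == "__main__":
        print("Signed:", ask_to_sign("...terms of the agreement go here..."))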
Making sure that the input that your user interface is getting was directly triggered by
one of the interface elements is an important security measure. If you don’t apply
measures like this, you make yourself vulnerable to a variety of presentation attacks
in which an attacker redirects user input elsewhere to perform various malicious
actions. Consider a case where a web page asks the user to type in some scrambled
letters, a standard CAPTCHA/reverse Turing test used to prevent automated misuse
of the page by bots. The letters that the user is asked to type are “xyz”. When the
user types the ‘x’, the web page tries to install a malicious ActiveX control. Just as
they type the ‘y’, the browser pops up a warning dialog asking the user whether they
want to run the ActiveX control, with a Yes/No button to click. The input focus is
now on the warning dialog rather than the web page, which receives the user’s typed
‘y’ and instantly disappears again as the browser installs the malicious ActiveX
control. This attack, which was first noticed by the Firefox browser developers
[9][10][11] but also affected Internet Explorer [12], is somewhat unusual in that it’s
more effective against skilled users, whose reaction time to unexpected stimuli is far
slower than their typing speed.
This type of attack isn’t limited solely to the keyboard. Since dialogs pop up at
known locations, it’s possible to use enqueued mouse clicks in a similar way to
enqueued keystrokes, having users double-click on something and then popping up a
dialog under the location of the second click, or forcing them to click away a series of
popups with something critical hidden at the bottom of the stack. On most systems
this occurs so quickly that the user won’t even be aware that it’s happened [13].
The Firefox solution to this problem was to clear the input queue and insert a time
delay into the XPI extension installation button, hopefully giving users time to react
to the dialog before taking any action [14]. Unfortunately users weren’t aware of
why the delay was there and perceived it as a nagware tactic, in some cases altering
their browser configuration to reduce the delay to zero [15][16]. There’s even an XPI
plugin to remove the XPI plugin install delay [17]. A “Why is this button greyed out”
tooltip would have helped here.
Apple’s solution to the problem was to force users to use a mouse click to
acknowledge an install dialog, and to add a second “Are you sure?” dialog to confirm
this. While this isn’t useful against user conditioning to click ‘OK’ on any dialog that
pops up, it does insert enough of a speed bump that users can’t be tricked into
installing something without even knowing that they’ve done it, or at least no more so
than a standard ‘click, whirr’ response would allow anyway. As the second attack
variant described above indicates, just the mouse-only requirement by itself isn’t a
practical defence against this type of attack, and it has the added drawback of making
the dialog inaccessible to non-mouse users.
Help OK Cancel
Warning! You are about to enter into a legally
binding agreement which stipulates that ...
Consider the signature dialog above, which represents the first attempt at an
appropriate warning dialog in a digital signature application. When challenging this
in court, J.P.Shyster (the famous defence lawyer) claims that his client, dear sweet
Granny Smith, was merely acknowledging a warning when she clicked OK, and had
no idea that she was entering into a legally binding contract. The sixty-year-old judge
with a liberal arts degree and a jury of people whose VCRs all blink ‘12:00’ agree
with him, and the contract is declared void.
Help Sign Cancel
Warning! You are about to enter into a legally
binding agreement which stipulates that ...
So the application designers try again, and having learned their lesson come up with
the dialog above. This time it’s obvious that a signature is being generated.
However, now J.P.Shyster points out that the buttons are placed in a non-standard
manner (the ‘Sign’ button is where the ‘Cancel’ button would normally be) by
obviously incompetent programmers, and produces a string of expert witnesses and
copies of GUI design guidelines to back up his argument [18]. The judge peers at the
dialog through his trifocals and agrees, and the case is again dismissed.
Sign Cancel Help
Warning! You are about to enter into a legally
binding agreement which stipulates that ...
The designers try again, at the third attempt coming up with the dialog above. This
time, J.P.Shyster argues that Granny Smith was again merely being presented with a
warning that she was about to enter into an agreement, and that there was no
indication in the dialog that she was assenting to the agreement the instant she clicked
‘Sign’. The judge, who’s getting a bit tired of this and just wants to get back to his
golf game, agrees, and the case is yet again dismissed.
Sign Cancel Help
By clicking 'Sign' below I acknowledge that I am
entering into a legally binding agreement ...
The application designers’ fourth attempt is shown above. J.P.Shyster has since
moved on to a successful career in politics, so this time the design isn’t tested in
court. This does, however, show how tricky it is to get even a basic security dialog
right, or at least capable of standing up to hostile analysis in court (a skilled lawyer
will be able to find ambiguity in a “No smoking” sign). More dangerous than the
most devious phisher, more dangerous even than a government intelligence agency, a
hostile expert witness is the most formidable attack type that any security application
will ever have to face.
An example of just how awkward this can become for programmers has been
demonstrated by the ongoing legal wrangling over the source code for some of the
breath analysers used in the US for breath-alcohol measurements in drink-driving
cases. Although the technology has been used for some decades, a US Fifth District
Court of Appeals ruling that “one should not have privileges and freedom jeopardized
by the results of a mystical machine that is immune from discovery” has resulted in at
least 1,000 breath tests being thrown out of court in a single county in 2005 alone, the
year that the ruling was first applied [19]. It didn’t help when, after nearly two years
of legal wrangling, the code was finally released and found to be of somewhat
dubious quality, with erratic and in some cases entirely disabled handling of error
conditions [20]. Something similar occurred in Australia in the 1990s, when the
veracity of a supposedly (but not really) tamperproof security surveillance system
was questioned. Given the state of most software systems it seems that the best way
to deal with the situation where a product will be subject to legal scrutiny is to leave
that portion of the market to someone else, preferably a problematic competitor.
Safe Defaults
As an earlier section has already pointed out, the provision of user-configurable
security options in applications is often just a way for developers to dump
responsibility onto users. If the developers can’t decide whether option X or Y is
better, they’ll leave it up to the user to decide, who has even less idea than the
developer. Most users simply stay with the default option, and only ever take the
desperate option of fiddling with settings if the application stops working in some
way and they don’t have any other choices.
As the section on the psychology of insecurity has already stated, there’s plenty of
psychological research to fall back on to explain this phenomenon, which
psychologists call the status quo bias [21]. As the name implies, people are very reluctant to
change the status quo, even if it’s obvious that they’re getting a bad deal out of it.
For example in countries where organ donorship is opt-out (typical of many European
countries), organ donor rates are as high as 80%. In countries where it’s opt-in (as in
the US), donor rates can be as low as 10% [22]. The status quo for
most Europeans is to be an organ donor while the status quo in the US is to not be an
organ donor, and few people bother to change this.
If you’re tempted to dismiss this merely as a difference in attitude to organ donorship
between Europe and the US, here’s an example from the US only. In the early 1990s
the two demographically similar, neighbouring states of New Jersey and
Pennsylvania updated their automotive insurance laws. New Jersey adopted as
default a cheaper system that restricted the right to sue while Pennsylvania adopted as
default a more expensive one that didn’t restrict this right (this has been referred to as
a “natural quasi-experiment” by one author [23]). Because of the status quo effect,
most Pennsylvania drivers stayed with the default even though it was costing them
more than the alternative, about 200 million dollars at the time the first analysis of the
issue was published [24]. Looking at something with a rather more human cost,
cancer researchers report that when a doctor recommends a screening mammogram,
90% of patients comply. Without the doctor’s prompting, 90% don’t get the
screening [25].
The status quo bias effect creates nasty problems for application developers (and by
extension the users of their work) because the more obscure options are unlikely to
ever see much real-world use, with the result being that they can break (either in the
“not-work” or the “broken security” sense) when they do get used.
As the Tor developers point out, the real problem that developers are faced with is
that they end up having to choose between “insecure” and “inconvenient” as the
default configurations for their applications [26]. Either they turn on everything, with
the result that many of the options that are enabled are potentially insecure, or they
lock everything down, with the result that there may be problems with the locked-
down application interacting with one run by someone who has everything enabled.
This problem is compounded by the fact that developers generally run their code on a
shielded network with every bell, whistle, and gong enabled for testing/debugging
purposes, so they never see what happens when the result of their work is run in the
real world.
Your application should provide sensible security defaults, and in particular ensure
that the default/most obvious action is the safest one. In other words if the user
chooses to click “OK” for every action (as most users will do), they should be kept
from harming themselves or others. Remember that if you present the user with a
dialog box that asks “A possible security problem has been detected, do you want to
continue [Yes/No]”, what the user will read is “Do you want this message to go away
[Yes/No]” (or more directly “Do you want to continue doing your job [Yes/No]”, see
Figure 25). Ensuring that the Yes option is the safe one helps prevent the user from
harming themselves (and others) when they click it automatically.
OK Cancel
A possible security problem has been
detected, do you want to continue?
OK Cancel
Do you want this warning to go away?
Figure 25: What the developer wrote (above); what the user sees (below)
One simple way to test your application is to run it and click OK (or whatever the
default action is) on every single security-related dialog that pops up (usability testing
has shown that there are actually users who’ll behave in exactly this manner). Is the
result still secure?
Now run the same exercise again, but this time consider that each dialog that’s
thrown up has been triggered by a hostile attack rather than just a dry test-run. In
other words the “Are you sure you want to open this document (default ‘Yes’)”
question is sitting in front of an Internet worm and not a Word document of last
week’s sales figures. Now, is your application still secure? A great many
applications will fail even this simple security usability test.
cryptlib already enforces this secure-by-default rule by always choosing safe settings
for security options, algorithms, and mechanisms, but you should carefully check
your application to ensure that any actions that it takes (either implicitly, or explicitly
when the user chooses the default action in response to a query) are the safest ones.
The use of safe defaults is also preferable to endless dialogs asking users to confirm
every action that needs to be taken, which rapidly becomes annoying and trains users
to dismiss them without reading them.
One way of avoiding the “Click OK to make this message go away” problem is to
change the question from a basic yes/no one to a multi-choice one, which makes user
satisficing much more difficult. In one real-world test, about a third of users fell prey
to attacks when the system used a simple yes/no check for a security property such as
a verification code or key fingerprint, but this dropped to zero when users were asked
to choose the correct verification code from a selection of five (one of which was
“None of the above”) [27]. The reason for this was that users either didn’t think
about the yes/no question at all, or applied judgemental heuristics to rationalise any
irregularities away as being transient errors, while the need to choose the correct
value from a selection of several actually forced them to think about the problem.
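A minimal sketch of this approach is shown below (plain Python 3, with console
prompts standing in for a real dialog; the fingerprint format and the decoy values are
invented for illustration). Rather than being asked “does this code match? [y/n]”, the
user has to pick the value that matches what the other party reads out, with “None of
the above” as one of the choices.

    # Sketch: replace a yes/no fingerprint check with a forced multiple choice.
    # Plain Python 3; fingerprints and decoys below are illustrative only.
    import random

    def verify_by_selection(genuine, decoys):
        """Ask the user to pick the value that matches the one their peer reads
        out, rather than asking 'does it match? [y/n]'."""
        options = decoys + [genuine]
        random.shuffle(options)
        options.append("None of the above")

        print("Which of these matches the verification code your contact reads to you?")
        for i, opt in enumerate(options, 1):
            print(f"  {i}. {opt}")

        choice = input("Enter a number: ").strip()
        if not choice.isdigit() or not 1 <= int(choice) <= len(options):
            return False                       # unusable input fails safe
        return options[int(choice) - 1] == genuine

    if __name__ == "__main__":
        ok = verify_by_selection("49A1 E5B2",
                                 ["4AA1 E5B2", "49A1 E582", "94A1 5EB2"])
        print("Verified" if ok else "Not verified - do not proceed")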
The developers of an SMS-based out-of-band web authentication mechanism used
this approach when they found that users were simply rationalising away any
discrepancies between the information displayed on the untrusted web browser and
the cellphone authentication channel. As a result the developers changed the
interface so that instead of asking the user whether one matched the other, they had to
explicitly select the match, dropping the error rate for the process from 30% to 0%
[28]. Other studies have confirmed the significant drop in error rates when using this
approach, but found as an unfortunate side-effect that the authentication option that
did this had dropped from most-preferred to least-preferred in user evaluations [29],
presumably because it forced them to stop and think rather than simply clicking
‘OK’.
The shareware WinZip program uses a similar technique to force users to stop and
think about the message that it displays when an unregistered copy is run, swapping the
buttons around so that users are forced to stop, read the text, and think about what
they’re doing rather than automatically clicking ‘Cancel’ (this technique has been
labelled ‘polymorphic dialogs’ by security
researchers evaluating its effectiveness [30]). Similarly, the immigration form used
by New Zealand Customs swaps some of the yes/no questions so that it’s not possible
to simply check every box in the same column without reading the questions (this is a
particularly evil thing to do to a bunch of half-asleep people who have just come off
the 12-hour flight that it takes to get there).
Another technique that you can use is to disable (grey out) the button that invokes the
dangerous action for a set amount of time to force users to take notice of the dialog.
If you do this, make the greyed-out button display a countdown timer to let users
know that they can eventually continue with the action, but have to pause for a short
time first (hopefully they’ll read and think about the dialog while they’re waiting).
The Firefox browser uses this trick when browser plugins are installed, although in
the case of Firefox it was actually added for an entirely different reason which was
obscure enough that it was only revealed when a Firefox developer posted an analysis
of the design rationale behind it [31]. Although this is borrowing an annoying
technique from nagware, it may be the only way that you can get users to consider the
consequences of their actions rather than just ploughing blindly ahead. Obviously
you should restrict the use of this technique to exceptional error conditions rather than
something that the user encounters every time that they want to use your application.
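A sketch of this technique follows (Python 3 with the standard Tkinter toolkit; the
five-second delay and the wording are illustrative choices rather than values taken
from Firefox or any other product). The dangerous button starts out greyed out and
its caption counts down, so the user can see both why they’re waiting and when the
action will become available.

    # Sketch: grey out a dangerous action and count down before enabling it,
    # telling the user why they're waiting.  Python 3 + tkinter; the delay and
    # wording are illustrative.
    import tkinter as tk

    DELAY_SECONDS = 5

    root = tk.Tk()
    root.title("Install add-on")

    tk.Label(root, padx=20, pady=10, justify="left",
             text="This add-on will be able to read and change your data.\n"
                  "The Install button is disabled for a few seconds so that a\n"
                  "stray click can't install it by accident.").pack()

    install = tk.Button(root, state="disabled")
    install.pack(side="left", padx=12, pady=10)
    tk.Button(root, text="Cancel", command=root.destroy).pack(side="left", padx=12, pady=10)

    def countdown(remaining):
        if remaining > 0:
            install.config(text=f"Install ({remaining})")    # explain the wait
            root.after(1000, countdown, remaining - 1)       # tick once a second
        else:
            install.config(text="Install", state="normal",
                           command=lambda: print("install confirmed"))

    countdown(DELAY_SECONDS)
    root.mainloop()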
Techniques such as this, which present a roadblock to muscle memory, help ensure
that users pay proper attention when they’re making security-relevant decisions.
Another muscle memory roadblock, already mentioned earlier, is removing the
window-close control on dialog boxes. There also exist various other safety measures
that you can adopt for actions that have potentially dangerous consequences. For
example Apple’s user interface guidelines recommend spacing buttons for dangerous
actions at least 24 pixels away from other buttons, twice the normal distance of 12
pixels [32].
Another way of enforcing the use of safe defaults is to require extra effort from the
user to do things the unsafe way, and to make it extremely obvious that this is a bad
way to do things. The technical term for this type of mechanism, which prevents (or
at least makes unlikely) some type of mistake, is a forcing function [33]. Forcing
functions are used in a wide variety of applications to dissuade users from taking
unwise steps. For example the programming language Oberon requires that users
who want to perform potentially dangerous type casts import a pseudo-module called
SYSTEM that provides the required casting functions. The presence of this import in
the header of any module that uses it is meant to indicate, like the fleur-de-lis brand
on a criminal, that unsavoury things are taking place here and that this is something
you may want to avoid contact with.
An example of a security-related forcing function occurs in the MySQL database
replication system, which has a master server controlling several networked slave
machines. The replication system user starts the slave with start slave, which
automatically uses SSL to protect all communications with the master. To run
without this protection, the user has to explicitly say start slave without
security, which both requires more effort to do and is something that will give
most users an uneasy feeling. Contrast this with many popular mail clients, which
perform all of their communication with the host in the clear unless the user
remembers to check the “Use SSL” box buried three levels down in a configuration
dialog or include the ssl option on the command-line. As one assessment of the
Thunderbird email client software puts it, “This system is only usable by computer
experts. The only reason I was able to ‘quickly’ sort this out was because I knew (as
an experienced cryptoplumber) exactly what it was trying to do. I know that TLS
requires a cert over the other end, and there is a potential client-side cert. But without
that knowledge, a user would be lost [...] It took longer to do the setting up of some
security options than it takes to download, install, and initiate an encrypted VoIP call
over Skype with someone who has never used Skype before” [34].
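The same forcing-function idea can be applied at the API level as well as in a
command syntax. The hypothetical Python sketch below makes the protected path the
zero-effort default and makes the unprotected path both wordy and self-incriminating;
the function and parameter names are invented for illustration and don’t correspond to
MySQL or any other real product.

    # Sketch: a connection API where security is the default and turning it off
    # requires deliberate, self-incriminating effort.  All names are hypothetical.

    def connect(host, port=443, *, i_want_no_encryption_and_accept_the_risk=False):
        """Open a connection to host.  TLS is always used unless the caller
        explicitly passes the (deliberately awkward) keyword argument."""
        if i_want_no_encryption_and_accept_the_risk:
            # The insecure path still works, but the caller has to spell out
            # exactly what they're doing, and it's trivial to grep for later.
            print(f"WARNING: connecting to {host}:{port} WITHOUT encryption")
            return _open_plaintext(host, port)
        return _open_tls(host, port)

    def _open_tls(host, port):
        return f"<TLS connection to {host}:{port}>"        # stand-in for real I/O

    def _open_plaintext(host, port):
        return f"<PLAINTEXT connection to {host}:{port}>"   # stand-in for real I/O

    if __name__ == "__main__":
        print(connect("mail.example.com"))                             # secure by default
        print(connect("legacy.example.com", port=25,
                      i_want_no_encryption_and_accept_the_risk=True))  # the hard way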
Requirements and Anti-requirements
One way to analyse potential problem areas is to create a set of anti-requirements to
parallel the more usual design requirements. In other words, what
shouldn’t your
user interface allow the user to do? Should they really be able to disable all of the
security features of your software via the user interface (see Figure 26)? There are in
fact a whole raft of viruses and worms that disable Office and Outlook security via
OLE automation, and no Internet worm would be complete without including
facilities to disable virus checkers and personal firewalls. This functionality is so
widespread that at one point it was possible to scan for certain malware by checking
not so much for the malware itself as for the presence of code to turn off
protection features.
Figure 26: Would you buy a car that had a ‘disable the brakes’ option?
Just because malware commonly takes advantage of such capabilities, don’t assume
that these actions will be taken only by malware. Many vendor manuals and websites
contain step-by-step instructions (including screenshots) telling users how to disable
various Windows security features in order to make some piece of badly-written
software run, since it’s easier to turn off the safety checks than to fix the software. So
create a list of anti-requirements — things that your user interface should on no
account allow the user to do — and then make sure that they are in fact prevented
from doing them.
Another way to analyse potential problems in the user interface is to apply the
bCanUseTheDamnThing test (if you’re not familiar with Hungarian notation, the
b prefix indicates that this is a boolean variable and the rest of the variable name
should be self-explanatory). This comes from an early PKI application in which the
developers realised that neither programmers nor users were even remotely interested
in things such as whether an X.509 certificate’s policy-mapping-inhibit constraint on
a mapped policy derived from an implicit initial policy set had triggered or not, all
that they cared about was
bCanUseTheDamnThing. Far too many security user
interfaces (and at a lower level programming libraries) present the user or developer
with a smorgasbord of choices and then expect them to be able to mentally map this
selection onto
bCanUseTheDamnThing themselves. As the previous section
showed, users will invariably map a confusing choice that they’re presented with to
bCanUseTheDamnThing = TRUE because they don’t understand what they’re
being asked to decide but they do understand that a value of TRUE will produce the
results they desire.
The
bCanUseTheDamnThing test is a very important one in designing usable
security interfaces. If the final stage of your interface algorithm consists of “the user
maps our explanation of the problem to
bCanUseTheDamnThing” then it’s a sign
that your interface design is incomplete, since it’s offloading the final (and probably
most important) step onto the user rather than handling it itself. Lack of attention to
bCanUseTheDamnThing shows up again and again in post-mortem analyses of
industrial accidents and aircraft crashes: by the time the operators have checked the
800 dials and lights to try and discover where the problem lies, the reactor has already
gone critical. It’s traditional to blame such faults on “human error”, although the
humans who made the mistake are really the ones who designed latent pathogens into
the interface and not the operators.
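A sketch of what handling that final step inside the software might look like is shown
below (plain Python 3; the individual checks and the key representation are
placeholders rather than a real certificate or key-management library). All of the
low-level details are evaluated internally, and the caller gets back a single
can-use/can’t-use answer plus a plain-language reason that can be shown to the user.

    # Sketch: perform the detailed checks internally and hand back a single
    # can-use / can't-use answer with a plain-language reason.  The individual
    # checks here are placeholders, not a real certificate library.
    from dataclasses import dataclass

    @dataclass
    class UsabilityVerdict:
        can_use: bool
        reason: str = ""

    def check_key_usability(key):
        # Each low-level check pairs a pass/fail result with a reason for failure.
        checks = [
            (not key.get("expired", False),        "the key has expired"),
            (not key.get("revoked", False),        "the key has been revoked"),
            (key.get("trusted_issuer", True),      "the key wasn't issued by anyone you trust"),
            (key.get("usage_allows_signing", True), "the key isn't approved for signing"),
        ]
        for ok, reason in checks:
            if not ok:
                return UsabilityVerdict(False, reason)
        return UsabilityVerdict(True)

    if __name__ == "__main__":
        verdict = check_key_usability({"expired": True})
        if verdict.can_use:
            print("Key is OK to use")
        else:
            print(f"This key can't be used because {verdict.reason}")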
Interaction with other Systems
Secure systems don’t exist in a vacuum, but need to interact not only with users but
also with other, possibly insecure systems. What assumptions is your design making
about these other systems? Which ones does it trust? Given Robert Morris Sr.’s
definition of a trusted system as “one that can violate your security policy”, what
happens if that trust is violated, either deliberately (it’s compromised by an attacker)
or accidentally (it’s running buggy software)? For example a number of SSH
implementations assumed that when the other system had successfully completed an
SSH handshake this constituted proof that it would only behave in a friendly manner,
and were completely vulnerable to any malicious action taken by the other system.
On a similar note, there’s more spam coming from compromised “good” systems than
“bad” ones. Trust but verify — a digitally signed virus is still a virus, even if it has a
valid digital signature attached.
Going beyond the obvious “trust nobody” approach, your application can also
provide different levels of functionality under different conditions. Just as many file
servers will allow read-only access or access to a limited subset of files under a low
level of user authentication and more extensive access or write/update access only
under a higher level of authentication, so your application can change its functionality
based on how safe (or unsafe) it considers the situation to be. So instead of simply
disallowing all communications to a server whose authentication key has changed (or,
more likely, connecting anyway to avoid user complaints), you can run in a “safe
mode” that disallows uploads of (potentially sensitive) data to the possibly-
compromised server and is more cautious about information coming from the server
than usual.
The reason for being cautious about uploads as well as downloads is that setting up a
fake server is a very easy way to acquire large amounts of sensitive information with
the direct cooperation of the user. For example if an attacker knows that a potential
victim is mirroring their data via SSH to a network server, a simple trick like ARP
spoofing will allow them to substitute their own server and have the victim hand over
their sensitive files to the fake server. Having the client software distrust the server
and disallow uploads when its key changes helps prevent this type of attack.
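A sketch of this kind of graduated response is given below (plain Python 3; the
fingerprint handling, the file location, and the mode names are illustrative
simplifications rather than a description of any real client). If the server’s key doesn’t
match the one recorded earlier, the client keeps working but drops into a safe mode
that refuses uploads.

    # Sketch: degrade to a "safe mode" instead of a blunt allow/deny decision
    # when a server's key changes.  Fingerprint handling is simplified and all
    # names, including the pin-file path, are illustrative.
    import json, os

    PIN_FILE = os.path.expanduser("~/.example_client_pins.json")   # hypothetical path

    def load_pins():
        try:
            with open(PIN_FILE) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def save_pins(pins):
        with open(PIN_FILE, "w") as f:
            json.dump(pins, f)

    def classify_server(host, observed_fingerprint):
        """Return 'normal' or 'safe-mode' depending on whether the server's key
        matches what we saw last time."""
        pins = load_pins()
        known = pins.get(host)
        if known is None:                      # first contact: remember the key
            pins[host] = observed_fingerprint
            save_pins(pins)
            return "normal"
        return "normal" if known == observed_fingerprint else "safe-mode"

    def upload(host, data, mode):
        if mode == "safe-mode":
            # Key changed: don't hand potentially sensitive data to what may be
            # an impostor, but allow the user to keep reading with caution.
            raise PermissionError(f"uploads to {host} disabled until its new key is confirmed")
        print(f"uploaded {len(data)} bytes to {host}")

    if __name__ == "__main__":
        mode = classify_server("backup.example.com", "aa:bb:cc:dd")
        try:
            upload("backup.example.com", b"medical records", mode)
        except PermissionError as e:
            print("Refused:", e)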
Be careful what you tell the other system when a problem occurs — it could be
controlled by an attacker who’ll use the information against you. For example some
SSH implementations send quite detailed diagnostic information to the other side,
which is great for debugging the implementation, but also rather dangerous because
it’s providing an attacker with a deep insight into the operation of the local system. If
you’re going to provide detailed diagnostics of this kind, make it a special debug
option and turn it off by default. Better yet, make it something that has to be
explicitly enabled for each new operation, to prevent it from being accidentally left
enabled after the problem is diagnosed (debugging modes, once enabled, are
invariably left on “just in case”, and then forgotten about).
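The sketch below shows one way of arranging this (plain Python 3 using the standard
logging module; the messages and the log file name are illustrative). Full diagnostic
detail goes into a local log, the remote side gets only a generic failure indication, and
verbose output over the wire has to be requested explicitly for each individual
operation.

    # Sketch: keep detailed diagnostics local and send the peer only a generic
    # error, unless verbose diagnostics were explicitly enabled for this one
    # operation.  Logging setup and messages are illustrative.
    import logging

    logging.basicConfig(filename="protocol-debug.log", level=logging.DEBUG)
    log = logging.getLogger("handshake")

    def report_failure(exc, *, debug_this_operation=False):
        """Return the error string that will be sent to the remote system."""
        log.debug("handshake failed: %r", exc)        # full detail stays local
        if debug_this_operation:
            # Opt-in, per-operation: the flag isn't a sticky global setting that
            # can be left enabled and forgotten about.
            return f"handshake failed: {exc!r}"
        return "handshake failed"                     # all the remote side gets

    if __name__ == "__main__":
        err = ValueError("MAC check failed at record 3, key ID 0x1F2E")
        print(report_failure(err))                             # generic
        print(report_failure(err, debug_this_operation=True))  # verbose, one-off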
Security systems can also display emergent properties unanticipated by their original
designers when they interact, often creating new vulnerabilities in the process.
Consider what happens when you connect a PC with a personal firewall to an 802.11
access point. An attacker can steal the PC’s IP and MAC address and use the access
point, since the personal firewall will see the attacker’s packets as a port scan and
silently drop them. Without the personal firewall security system in place, the
attacker’s connections would be reset by the PC’s IP stack. It’s only the modification
of the two security systems’ designed behaviours that occurs when they interact that
makes it possible for two systems with the same IP and MAC addresses to share the
connection. So as well as thinking about the interaction of security systems in the
traditional “us vs. them” scenario, you should also consider what happens when they
interact constructively to produce an unwanted effect.
Conversely, be very careful with how you handle any information from the remote
system. Run it through a filter to strip out any special non-printable characters and
information before you display it to the user, and present it in a context where it’s
very clear to the user that the information is coming from another system (and is
therefore potentially controlled by a hostile party) and not from your application.
Consider the install dialog in Figure 27. The attacker has chosen a description for
their program that looks like instructions from the application to the user.
Figure 27: Spoofed plugin install dialog
Since the dialog doesn’t make a clear distinction between information from the
application and information from the untrusted source, it’s easy for an attacker to
mislead the user. Such attacks have already been used in the past in conjunction with
Internet Explorer, with developers of malicious ActiveX controls giving them
misleading names and descriptions that appear to be instructions from the browser.
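A minimal sketch of this kind of filtering is shown below (plain Python 3; the length
limit, the category-based filtering rule, and the labelling text are illustrative choices).
Control and other non-printable characters are stripped from the remote-supplied
description, it’s truncated to a sane length, and it’s displayed in a way that makes its
untrusted origin explicit.

    # Sketch: filter text received from another system before showing it to the
    # user, and make its untrusted origin explicit.  Limit and wording are
    # illustrative.
    import unicodedata

    MAX_REMOTE_TEXT = 200   # keep remote-supplied text short (illustrative limit)

    def sanitise_remote_text(text):
        cleaned = []
        for ch in text:
            # Drop control and format characters and other non-printables that
            # could be used to reorder, hide, or fake parts of the display.
            if unicodedata.category(ch)[0] == "C":
                continue
            cleaned.append(ch)
        return "".join(cleaned)[:MAX_REMOTE_TEXT]

    def describe_plugin(remote_description):
        safe = sanitise_remote_text(remote_description)
        # Clearly separate our text from theirs instead of blending the two.
        return ("The site you are visiting describes this plugin as:\n"
                f'    "{safe}"\n'
                "This description was written by the site, not by your browser.")

    if __name__ == "__main__":
        tricky = "Click Yes to continue safely\x08\x08\x08 (recommended by Windows)\u202e"
        print(describe_plugin(tricky))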
Matching Users’ Mental Models
In order to be understandable to users, it’s essential that your application match the
user’s mental model of how something should work and that it follow the flow of the
users’ conception of how a task should be performed. As an earlier section pointed
out, users’ mental models of how the security mechanisms themselves work are often
wildly inaccurate, and the best approach is to avoid exposing them to the details as
much as possible. If you don’t follow the users’ conception of how the task should be
performed, they’ll find it very difficult to accomplish what they want to do when they
sit down in front of your application.
In most cases users will already have some form of mental model of what your
software is doing, either from the real world or from using similar software
(admittedly the accuracy of their model will vary from good through to bizarre, but
there’ll be some sort of conception there). Before you begin, you should try and
discover your users’ mental models of what your application is doing and follow
them as much as possible, because an application that tries to impose an unfamiliar
conceptual model on its users instead of building on the knowledge and experience
that the users already have is bound to run into difficulties. This is why (for example)
photo-management applications go to a great deal of programming effort to look like
photo albums even if it means significant extra work for the application developers,
because that’s what users are familiar with.
Consider the process of generating a public/private key pair. If you’re sitting at a
Unix command line, you fire up a copy of
gpg or openssl, feed it a long string of
command-line options, optionally get prompted for further pieces of input, and at the
end of the process have a public/private key pair stored somewhere as indicated by
one of the command-line options.
Figure 28: KGPG key generation dialog
This command-line interface-style design has been carried over to equivalent
graphical interfaces that are used to perform the same operation: -k keysize has
become a drop-down combo box, -a algorithm a set of checkboxes, and so on,
with Figure 28 being an example of this type of design (note the oxymoronic ‘Expert
mode’ option, which leads to an even more complex interface, dropping the user to a
command prompt). Overall, it’s just a big graphical CLI-equivalent, with each
command-line option replaced by a GUI element, often spread over several screens
for good measure (one large public CA requires that users fill out
eleven pages of
such information in order to be allowed to generate their public/private key pair,
making the process more a test of the user’s pain tolerance than anything useful).
These interfaces violate the prime directive of user interface design: Focus on the
users and their tasks, not on the technology [35].
The problem with this style of interface, which follows a design style known as task-
directed design, is that while it may cater quite well to people moving over from the
command-line interface it’s very difficult to comprehend for the average user without
this level of background knowledge, who will find it a considerable struggle to
accomplish their desired goal of generating a key to protect their email or web
browsing. What’s a key pair? Why do I need two keys instead of just one? What’s a
good key size? What are Elgamal and DSA? What’s the significance of an expiry
date? Why is an email application asking me for my email address when it already
knows it? What should I put in the comment field? Doesn’t the computer already
know my name, since I logged on using it? This dialog should be taken outside and
shot.
Interfaces designed by engineers tend to end up looking like something from Terry
Gilliam’s “Brazil”, all exposed plumbing and wires. To an engineer, the inner
workings of a complex device are a thing of beauty and not something to be hidden
away. In the software world, this results in a user interface that has a button for every
function, a field for every data input, a dialog box for every code module. To a
programmer, such a model is definitively accurate. Interaction with the user occurs in
perfect conformity with the internal logic of the software. Users provide input when
it’s convenient for the application to accept it, not when it’s convenient for the user to
provide it. This problem is exemplified by the Windows Vista UAC dialog discussed
in a previous chapter. Informing the user that an application has been blocked
because of a Windows Group Policy administrative setting may be convenient for the
programmer, but it provides essentially zero information to the user (the manifold
shortcomings of the UAC dialog have provided fertile ground for user interface
designers ever since it was released).
The reason why task-directed design is so popular (apart from the fact that it closely
matches programmers’ mental models) is that as security properties are very abstract
and quite hard to understand, it’s easier for application developers to present a bunch
of individual task controls rather than trying to come up with something that achieves
a broader security goal. However, wonderful though your application may be, to the
majority of users it’s merely a means to an end, not an end itself. Rather than
focusing on the nuts and bolts of the key generation process, the interface should
instead focus on the activity that the user is trying to perform, and concentrate on
making this task as easy as possible. Microsoft has espoused this user interface
design principle in the form of Activity-Based Planning, which instead of giving the
user a pile of atomic operations and forcing them to hunt through menus and dialogs
to piece all the bits and pieces together to achieve their intended goal, creates a list of
things that a user might want to do (see the section on pre-implementation testing
further on) and then builds the user interface around those tasks.
Activity-Based Planning
Activity-based planning matches users’ natural ways of thinking about their activities.
Consider the difference in usability between a car designed with the goal of letting
the user control the fuel system, camshafts, cooling system, ignition system,
turbochargers, and so on, and one designed with goal of making the car go from A to
B in the most expedient manner possible. Outside of a few hardcore petrol-heads, no-
one would be able to use the former type of car. In fact, people pay car
manufacturers significant amounts of money to ensure that the manufacturer spends
even more significant amounts of money to keep all of this low-level detail as far
away from them as possible. The vast majority of car owners see a car as simply a
tool for achieving a goal like getting from A to B, and will spend only the minimal
effort required by law (and sometimes not even that) to learn its intricacies.
A similar thing occurs with security applications. Users focus on goals such as “I
want my medical records kept private” or “I want to be sure that the
person/organisation that I’m talking to really is who they claim to be”, rather than
focusing on technology such as “I want to use an X.509 certificate in conjunction
with triple-DES encryption to secure my communications”. Your application should
therefore present the task involving security in terms of the users’ goals rather than of
the underlying security technology, and in terms that the users can understand (most
users won’t speak security jargon). This both makes it possible for users to
understand what it is they’re doing, and encourages them to make use of the security
mechanisms that are available.
The actual goals of the user often come as a considerable surprise to security people
(there’s more on this in the section on security usability testing). For example
security researchers have been pushing for voter-verified paper audit trails (VVPAT) as a
safety mechanism for electronic voting machines in the face of a seemingly never-
ending stream of reports about the machines’ unreliability and insecurity. However,
when voting machines with VVPAT capabilities were tested on voters, they
completely ignored the paper record, and had less confidence in the VVPAT-enabled
devices than in the purely electronic ones, despite extensive and ongoing publicity
about their unreliability [36]. A two-year study carried out in Italy ran into the same
issues, receiving user comments like “this receipt stuff and checking the votes are
dangerous, please give only the totals at the end and no receipts” [37]. This indicates
that the users of the equipment (the voters) had very different goals to the security
people who were designing them (or at least trying to fix up the designs of existing
devices).
A useful trick to use when you’re creating the text for your user interface is to pretend
that you’re looking over the user’s shoulder explaining how to accomplish the task to
them, because this tends to lead naturally towards a goal-oriented workflow. If your
application is telling the user what to do, use the second person: “Choose the key that
you want to use for encryption”. If the user is telling the application what to do, use
the first person: “Use this key for encryption”.
Using the key generation example from earlier, the two activities mentioned were
generating a key to protect email, and generating a key to protect web browsing (in
other words, for an SSL web server). This leads naturally to an interface in which the
user is first asked which of the two tasks they want to accomplish, and once they’ve
made their choice, asked for their name and email address (for the email protection
key) or their web server address (for the SSL/web browsing key). Obviously if the
key generation is integrated into an existing application, you’d skip this step and go
straight to the actual key generation stage — most users will be performing key
generation as a side-effect of running a standard application, not because they like
playing key administrator with a key management program.
A better option when you’re performing the key generation for an application-specific
purpose is to try to determine the details automatically, for example by reading the
user’s name and email address information from the user’s mail application
configuration and merely asking them to confirm the details. Under Windows you
can use CDO (Collaboration Data Objects) to query the CdoPR_GIVEN_NAME,
CdoPR_SURNAME, and CdoPR_EMAIL fields of the CurrentUser object. Under
OS X you can use the ABAddressBook class of the AddressBook framework to query
the “Me” (current) user’s kABFirstNameProperty, kABLastNameProperty, and
kABEmailProperty and use them to automatically populate the dialog fields. OS X is
particularly good in this regard, asking for your address book data the first time that
you log in, after which applications automatically use the address book information
instead of asking for it again and again in each application. The Opera web browser
tries to fix this problem from the opposite end with its Magic Wand feature, which
initially records user details and then template-matches them to fields in web pages,
providing a browser-based equivalent to the OS X address book, at least for web-
based forms.
Conversely, Linux and the *BSDs seem to have no facility for such centralised user
information management, requiring that you manually enter the same information
over and over again for each application that needs it. One thing that computers are
really good at is managing data, so the user shouldn’t be required to manually re-enter
information that the computer already knows. This is one of the tenets of Macintosh
user interface design, the user should never have to tell the machine anything that it
already knows or can deduce for itself.
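As a rough sketch of the pre-filling approach (plain Python 3; the sources queried
here, git configuration and a couple of environment variables, are merely stand-ins
for platform facilities like the CDO and AddressBook APIs mentioned above, and the
variable names are assumptions), the code below tries a few places where the user’s
name and email address may already be recorded and falls back to empty fields for
the user to fill in or correct.

    # Sketch: pre-fill name and email from information the machine already has,
    # then let the user confirm or correct it.  The sources queried are
    # illustrative stand-ins for platform address-book APIs.
    import os, subprocess

    def _git_config(key):
        try:
            out = subprocess.run(["git", "config", "--get", key],
                                 capture_output=True, text=True, timeout=5)
            return out.stdout.strip() or None
        except (OSError, subprocess.TimeoutExpired):
            return None

    def guess_user_details():
        name = _git_config("user.name") or os.environ.get("FULLNAME") or ""
        email = _git_config("user.email") or os.environ.get("EMAIL") or ""
        return name, email

    if __name__ == "__main__":
        name, email = guess_user_details()
        print(f"Your key will be labelled as belonging to '{name}' <{email}>.")
        print("Press Enter to accept, or type corrected details.")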
Another benefit of pre-filling in fields is that, even if the information isn’t quite what
the user wanted and they have to manually correct it, it still provides them with a
template to guide them, the equivalent of a default choice in dialog box buttons that
provides just-in-time instructions to help them figure out how to complete a field.
Again, see the section on pre-implementation testing for a discussion of how to work
out details such as where to store the generated key.
There are three additional considerations that you need to take into account when
you’re using Activity-Based Planning to design your user interface. First, you need
to be careful to plan the activities correctly, so that you cover the majority of typical
use cases and don’t alienate users by forcing them down paths that they don’t want to
take, or by making them mentally reverse-engineer the flow to guess which path they
have to take to get to their desired goal (think of a typical top-level phone menu,
where several of the initial choices might plausibly lead to the desired
goal). If you have, for example, a key generation wizard that involves more than
three or four steps then it’s a sign that a redesign is in order.
Figure 29: A portion of the GPA key generation wizard
GPA, an application from the same family as KGPG, used to use an almost identical
key generation dialog to KGPG’s, but in more recent versions has switched to using a
wizard-style interface, of which one screen is shown in Figure 29. Unfortunately this
new interface is merely the earlier (incomprehensible) dialog cut up into lots of little
pieces and presented to the user a step at a time, adding Chinese water torture to the
sins of its predecessor.
The second additional consideration is that you should always provide an opt-out
capability to accommodate users who don’t want to perform an action that matches
one of your pre-generated ones. This would be handled in the key-generation
interface by the addition of a third option to generate some other (user-defined) type
of key, the equivalent of the “Press 0 to talk to an operator” option in a phone menu.
Taking advantage of extensive research by educational psychologists, the dialog uses
a conversational rather than formal style. When the user’s brain encounters this style
of speech rather than the more formal lecturing style used in many dialogs, it thinks
that it’s in a conversation and therefore has to pay more attention to hold up its end.
In other words at some level your brain pays more attention when it’s being talked
with rather than talked at [38].
Finally, you should provide a facility to select an alternative interface, usually
presented as an expert or advanced mode, for users who prefer the nuts-and-bolts
style interface in which they can specify every little detail themselves (dropping to
the command-line, however, is not a good way to do this). Although the subgroup of
users who prefer this level of configurability for their applications is relatively small,
it tends to be a rather vocal minority who will complain loudly about the inability to
specify their favourite obscure algorithm or select some peculiar key size (this level
of flexibility can actually represent a security risk, since it’s possible to fingerprint
users of privacy applications if they choose unusual combinations of algorithms and
key sizes, so that even if their identity is hidden they can be tracked based on their
algorithm choice).
The need to handle special cases is somewhat unfortunate since a standard user
interface design rule is to optimise your design for the top 80% of users (the so-called
“80 percent rule”). The 80% rule works almost everywhere, but there are always
special cases where you need to take extra care. An example of such a case is word
processors, which will be reviewed by journalists who use the software in very
different ways than the average user. So if you want to get a positive review for your
word processor, you have to make sure that features used by journalists like an article
word count are easy to use. Similarly, when you’re designing a security user
interface, it’s the 1-2% of users who are security experts (self-appointed or otherwise)
who will complain the most when your 80 percent solution doesn’t cater to their
particular requirements.
Figure 30: OS X Wizard interface
If you’re going to provide an expert-mode style interface, remember to make the
simplest, most straightforward interface configuration the default one, since studies
have shown that casual users don’t customise their interfaces (typically for fear of
“breaking something”) even when a configuration capability is available.
Design Example: Key Generation
Let’s look at a simple design exercise for activity-based planning, in this case
involving the task of user key generation for a mail encryption program. The first
page of the wizard, shown in Figure 31, explains what’s about to happen, and gives
the user the choice of using information obtained automatically from the default mail
application, or of entering the details themselves.
To communicate securely with others, you need to create an
encryption key. This key will be labelled as belonging to Bob Sample with the
email address bob@sample.com.
If you'd like to change your key settings, click 'Change
Details', otherwise click 'Create Key'.
Create Key Change Details Cancel
Create Key - Step 1 of 2
Figure 31: Key creation wizard, part 1
There are several things to note about this dialog. The most obvious one is the
contents of the title bar, which gives the operation being performed as “Create Key”
rather than “Generate Key” or “Generate Key Pair”. This is because users create
documents and create images rather than generating them, so it makes sense that they
should also create a key. In addition what they’re creating is a key, not a key
pair — most users will have no idea what a key pair is or why they need two of them.
Finally, the title bar indicates their progress through the wizard process, removing the
uncertainty over whether they’re going to be subject to any Chinese water torture to
get their key. OS X Assistants (the equivalent of Windows’ wizards, shown in Figure
30) display a list of steps on the left-hand side of the dialog box, including the
progress indicator as a standard part of the dialog.
The other point to note is the default setting ‘Create Key’, and the fact that it’s
worded as an imperative verb rather than a passive affirmation. This is because the
caption for a button should describe the action that the button initiates rather than
being a generic affirmation like ‘OK’, which makes obvious the action that the user is
about to perform. In addition, by being the default action it allows the majority of
users who simply hit Enter without reading the dialog text to Do The Right Thing.
Finally, note the absence of a window-close control, preventing the user from
automatically getting rid of the dialog and then wondering why the application is
complaining about the absence of a key.
Your key has been created and saved.
In order for others to communicate securely with you, you
need to publish your key information. Would you like to do
this now?
Publish Key Don't Publish Key
Create Key - Step 2 of 2
Figure 32: Key creation wizard, part 2
The next step, shown in Figure 32, informs the user that their key has been created
and safely stored for future use. Again, the default action publishes the key for others
to look up. If the user chooses not to publish the key, they’re led to a more expert-
mode style dialog that warns them that they’ll have to arrange key distribution
themselves, and perhaps gives them the option of exporting it in text format to mail to
others or post to a web page.
Your new key is now ready for use.
Finish
Create Key - Done
Figure 33: Key creation wizard, step 3
The final step, shown in Figure 33, completes the wizard and lets the user know that
their key is now ready for use (although completion pages for wizards are in general
frowned upon, in this case the use is permissible in order to make explicit the fact that
the previous action, which would otherwise be invisible to users, has completed
successfully). In the worst case, all that the user has to do is hit Enter three times
without bothering to stop and read the dialog, and everything will be set up for them.
One possible extra step that isn’t shown here is the processing of some form of
password or PIN to protect the newly-generated key. This is somewhat situation-
specific and may or may not be necessary. For example the key might be stored in a
USB security token or smart card that’s already been enabled via a PIN, or protected
by a master password that the user entered when the application started.
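The overall control flow of the wizard can be sketched as follows (plain Python 3,
with console prompts standing in for the dialogs shown in the figures; the key
generation and publication steps are placeholders). The point of the sketch is the
flow: the details are pre-filled, the default at every step is the safe and useful one, and
simply pressing Enter at each step produces a working, published key.

    # Sketch of the wizard's control flow; console prompts stand in for the
    # dialogs in the figures, and "generation" and "publication" are placeholders.
    def prompt(question, default):
        answer = input(f"{question} [{default}]: ").strip()
        return answer or default

    def create_key_wizard(name, email):
        # Step 1 of 2: confirm the pre-filled details; 'Create Key' is the default.
        print(f"Your key will be labelled as belonging to {name} <{email}>.")
        if prompt("Create Key or Change Details?", "Create Key") != "Create Key":
            name = prompt("Name", name)
            email = prompt("Email address", email)

        key = f"<key for {name} <{email}>>"        # placeholder for real generation
        print("Your key has been created and saved.")

        # Step 2 of 2: publishing is the default so that others can find the key.
        if prompt("Publish Key or Don't Publish Key?", "Publish Key") == "Publish Key":
            print("Key published.")                # placeholder for real publication
        else:
            print("You will need to distribute your key to your correspondents yourself.")

        print("Your new key is now ready for use.")
        return key

    if __name__ == "__main__":
        create_key_wizard("Bob Sample", "bob@sample.com")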
An interesting phenomenon occurs when users are exposed to this style of simple-
but-powerful interface. In a usability test of streamlined scanner software, every one
of the test users commented that it was the “most powerful” one that they’d tried, even
though it had fewer features than the competition. What made it powerful was the
effective power realised by the user, not the feature count. A side-effect of this
“powerful” user interface was that it generated a radically smaller number of tech
support calls than was normal for a product of this type [39]. This confirms
interaction designer Alan Cooper’s paraphrasing of architect Mies van der Rohe’s
dictum “Less is more” into the user interface design principle “No matter how cool
your user interface is, less of it would be better” [40] (a remark that’s particularly
applicable to skinnable interfaces).
Use of Familiar Metaphors
Many users are reluctant to activate security measures because the difficulty of
configuring them is greater than any perceived benefits. Using a metaphor that’s
familiar to the user can help significantly in overcoming this reluctance to deal with
security issues. For example most users are familiar with the use of keys as security
tools, making a key-like device an ideal mechanism for propagating security
parameters from one system to another. One of the most usable computer security
devices ever created, the Datakey, is shown in Figure 34. To use the Datakey, you
insert it into the reader and turn it to the right until it clicks into place, just like a
standard key. To stop using it, you turn it back to the left and remove it from the
reader.
Figure 34: A Datakey being used to key a VPN box
Instead of a conventional key, the device used to initialise security parameters
across devices is a USB memory key that the user takes to each device that’s being
initialised. This mechanism is used in Microsoft’s Windows Network Smart Key
(WNSK), in which Windows stores WiFi/802.11 encryption keys and other
configuration details onto a standard USB memory key, which is then inserted into
the other wireless devices that need to be configured.
Since USB keys can store amounts of information that would be impossible for
humans to carry from one device to another (the typical WNSK file size is around
100KB), it’s possible to fully automate the setup using full-strength security
parameters and configuration information that would be impossible for humans to
manage. In addition to the automated setup process, for compatibility with non-
WNSK systems it’s also possible to print out the configuration parameters, although
the manual data entry process is rather painful. Using the familiar metaphor of
inserting a key into an object in order to provide security greatly increases the
chances that it’ll actually be used, since it requires almost no effort on the part of the
user.
This type of security mechanism is known as a location-limited channel, one in which
the user’s credentials are proven by the fact that they have physical access to the
device(s) being configured. This is a generalisation of an older technique from the
field of secure transaction processing called geographic entitlement, in which users
were only allowed to initiate a transaction from a fixed location like a secure terminal
room in a brokerage house, for which access required passing through various
physical security controls [41]. If the threat model involves attackers coming in over
a network, such a location-limited channel is more secure than any fancy (and
complex) use of devices such as smart cards and certificates, since the one thing that
a network attacker can’t do is plug a physical token into the device that’s being
configured.
A similar type of mechanism, which is often combined with a location-limited
channel, is a time-limited channel in which two devices have to complete a secure
initialisation within a very small time window. An example of such a mechanism is
one in which the user simultaneously presses secure initialisation buttons on both
devices being configured. The device being initialised would then assume that
anything that responded at that exact point in time would be its intended peer device.
A variation of this has the user wait for an indicator such as an LED on one device to
light up before pressing the button on the other device. By repeating this as required,
the chances of an attacker spoofing the exchange can be made arbitrarily small [42].
An additional countermeasure against a rogue device trying to insert itself into the
channel is to check whether more than one response is received (one from the
legitimate device and one from the rogue one) within the given time window, and
reset the process if this type of tampering is detected. Like tamper-evident seals on
food containers, this is a simple, effective measure that stops all but the most
determined attacker. This mechanism combines both location-limited channels (the
user is demonstrating their authorisation by being able to activate the secure
initialisation process) and a time-limited channel (the setup process has to be carried
out within a precise time window in order to be successful).
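A simplified simulation of the time-window check is sketched below (plain Python 3;
the two-second window and the simulated responses are illustrative, and a real
implementation would combine this with the location-limited button press and the
retry logic described above). A pairing attempt only succeeds if exactly one response
arrives within the window; anything else aborts the attempt.

    # Sketch: simulate a time-limited pairing window.  A response is accepted
    # only if exactly one arrives within the window after the button press;
    # more than one response (a possible interloper) aborts the attempt.
    PAIRING_WINDOW = 2.0      # seconds (illustrative)

    def pair(button_press_time, responses):
        """responses is a list of (arrival_time, device_id) tuples."""
        in_window = [dev for t, dev in responses
                     if 0 <= t - button_press_time <= PAIRING_WINDOW]
        if len(in_window) == 1:
            return in_window[0]            # pair with the single responder
        if len(in_window) > 1:
            print("Multiple responses in the pairing window - possible attack, restarting")
        else:
            print("No response in the pairing window")
        return None

    if __name__ == "__main__":
        print(pair(10.0, [(11.2, "living-room AP")]))                       # succeeds
        print(pair(10.0, [(11.2, "living-room AP"), (11.9, "unknown")]))    # aborts
        print(pair(10.0, [(15.0, "late responder")]))                       # times out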
This type of secure initialisation mechanism has already been adopted by some
vendors of 802.11 wireless devices who are trying to combat the low level of
adoption of secure wireless setups, although unfortunately since there’s no industry
standard for this they all do it differently. An example of this is the use of location-
limited and time-limited channels in Broadcom’s SecureEasySetup, which is used for
secure initialisation of 802.11 WPA devices via a secure-setup pushbutton or an
equivalent mechanism like a mouse click on a PC dialog [43][44]. Since Broadcom
are an 802.11 chipset vendor, anyone using their chipsets has the possibility to
employ this type of simple security setup. Although only minimal technical details
have been published [45], the Broadcom design appears to be an exact
implementation of the type of channel described above. This is a good example of
effective (rather than theoretically perfect) security design. As David Cohen, a senior
product manager at Broadcom, puts it, “The global problem we’re trying to solve is
over 80 percent of the networks out there are wide open. Hackers are going to jump
on these open networks. We want to bring that number down”.
A further extension of the location-limited channel concept provides a secure key
exchange between two portable devices with wireless interfaces. This mechanism
relies for its security on the fact that when transmitting over an open medium, an
opponent can’t tell which of the two devices sent a particular message, but the
devices themselves can. To establish a shared secret, the devices are held together
and shaken while they perform the key exchange, with key bits being determined by
which of the two devices sent a particular message. Since they’re moving around, an
attacker can’t distinguish one device from the other via signal strength measurements
[46]. This is an extremely simple and effective technique that works with an out-of-
the-box unmodified wireless device, providing a high level of security while being
extremely easy to use.
These types of security mechanisms provide both the ease of use that’s necessary in
order to ensure that they’re actually used, and a high degree of security from outside
attackers, since only an authorised user with physical access to the system is capable
of performing the initialisation steps.
Note though that you have to exercise a little bit of care when you’re designing your
location-limited channel. The Bluetooth folks, for example, allowed
anyone (not just
authorised users) to perform this secure initialisation (forced re-pairing in Bluetooth
terminology), leading to the sport of bluejacking, in which a hostile party hijacks
someone else’s Bluetooth device. A good rule of thumb for these types of security
measures is to look at what Bluetooth does for its “security” and then make sure that
you don’t do anything like it.
References
[1] “In Search of Usable Security: Five Lessons from the Field”, Dirk Balfanz, Glenn Durfee, Rebecca Grinter, and D.K. Smetters, IEEE Security and Privacy, Vol.2, No.5 (September/October 2004), p.19.
[2] “Leading Geeks: How to Manage and Lead the People Who Deliver Technology”, Paul Glen, David Maister, and Warren Bennis, Jossey-Bass, 2002.
[3] “Scientist: Complexity causes 50% of product returns”, Reuters, 6 March 2006, http://www.computerworld.com/hardwaretopics/hardware/story/0,10801,109254,00.html.
[4] “Plug-and-Play PKI: A PKI Your Mother Can Use”, Peter Gutmann, Proceedings of the 12th Usenix Security Symposium, August 2003, p.45.
[5] “Trends and Attitudes in Information Security — An RSA Security e-Book”, RSA Data Security, 2005.
[6] “Inoculating SSH Against Address Harvesting”, Stuart Schechter, Jaeyon Jung, Will Stockwell, and Cynthia McLain, Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS’06), February 2006.
[7] “Securing Record Communications: The TSEC/KW-26”, Melville Klein, Center for Cryptologic History, National Security Agency, 2003.
[8] “Lessons Learned From the Deployment of a Smartphone-Based Access Control System”, Lujo Bauer, Lorrie Faith Cranor, Michael Reiter, and Kami Vaniea, Proceedings of the Third Symposium on Usable Privacy and Security (SOUPS’07), July 2007, p.64.
[9] “Race conditions in security dialogs”, Jesse Ruderman, http://www.squarefree.com/2004/07/01/race-conditions-in-security-dialogs/, 1 July 2004.
[10] “Mozilla XPInstall Dialog Box Security Issue”, Secunia Advisory SA11999, http://secunia.com/advisories/11999/, 5 July 2004.
[11] “Race conditions in security dialogs”, Jesse Ruderman, http://archives.neohapsis.com/archives/fulldisclosure/2004-07/0264.html, 7 July 2004.
[12] “Microsoft Internet Explorer Keyboard Shortcut Processing Vulnerability”, Secunia Research, http://secunia.com/secunia_research/2005-7/advisory/, 13 December 2005.
[13] “Internet Explorer Suppressed “Download Dialog” Vulnerability”, Secunia Research, http://secunia.com/secunia_research/2005-21/advisory/, 13 December 2005.
[14] “Bugzilla Bug 162020: pop up XPInstall/security dialog when user is about to click”, https://bugzilla.mozilla.org/show_bug.cgi?query_format=specific&order=relevance+desc&bug_status=__open__&id=162020.
[15] “Disable Extension Install Delay (Firefox)”, http://kb.mozillazine.org/Disable_Extension_Install_Delay_(Firefox).
[16] “MR Tech Disable XPI Install Delay”, Mel Reyes, https://addons.mozilla.org/firefox/775/, 20 April 2006.
[17] “MR Tech Disable XPI Install Delay”, 23 March 2007, https://addons.mozilla.org/firefox/775/.
[18] “Liability and Computer Security: Nine Principles”, Ross Anderson, Proceedings of the European Symposium on Research in Computer Security (ESORICS’94), Springer-Verlag Lecture Notes in Computer Science No.875, 1994, p.231.
[19] “Florida Standoff on Breath Tests Could Curb Many DUI Convictions”, Lauren Etter, The Wall Street Journal, 16 December 2005, p.1.
[20] “Secret Breathalyzer Software Finally Revealed”, Lawrence Taylor, 4 September 2007, http://www.duiblog.com/2007/09/04/secret-breathalyzer-software-finally-revealed/.
[21] “Status Quo Bias in Decision Making”, William Samuelson and Richard Zeckhauser, Journal of Risk and Uncertainty, Vol.1, No.1 (March 1988), p.7.
[22] “Do Defaults Save Lives?”, Eric Johnson and Daniel Goldstein, Science, Vol.302, No.5649 (21 November 2003), p.1338.
[23] “Psychology in Economics and Business: An Introduction to Economic Psychology”, Gerrit Antonides, Springer-Verlag, 1996.
[24] “Framing, probability distortions, and insurance decisions”, Eric Johnson, John Hershey, Jacqueline Meszaros, and Howard Kunreuther, Journal of Risk and Uncertainty, Vol.7, No.1 (August 1993), p.35.
[25] “The role of the physician as an information source on mammography”, Lisa Metsch, Clyde McCoy, H. Virginia McCoy, Margaret Pereyra, Edward Trapido, and Christine Miles, Cancer Practice, Vol.6, No.4 (July-August 1998), p.229.
[26] “Anonymity Loves Company: Usability and the Network Effect”, Roger Dingledine and Nick Mathewson, in “Security and Usability: Designing Secure Systems That People Can Use”, O’Reilly, 2005, p.547.
[27] “Users are not dependable — how to make security indicators to better protect them”, Min Wu, Proceedings of the First Workshop on Trustworthy Interfaces for Passwords and Personal Information, June 2005.
[28] “Fighting Phishing at the User Interface”, Robert Miller and Min Wu, in “Security and Usability: Designing Secure Systems That People Can Use”, O’Reilly, 2005, p.275.
[29] “Usability Analysis of Secure Pairing Methods”, Ersin Uzun, Kristiina Karvonen, and N. Asokan, Proceedings of Usable Security 2007 (USEC’07), February 2007, http://www.usablesecurity.org/papers/uzun.pdf.
[30] “Improving Security Decisions with Polymorphic and Audited Dialogs”, José Brustuloni and Ricardo Villamarin-Salomón, Proceedings of the 3rd Symposium on Usable Privacy and Security (SOUPS’07), July 2007, p.76.
[31] “Race conditions in security dialogs”, Jesse Ruderman, http://www.squarefree.com/2004/07/01/race-conditions-in-security-dialogs/, 1 July 2004.
[32] “Apple Human Interface Guidelines”, Apple Computer Inc, November 2005.
[33] “Programmers are People, Too”, Ken Arnold, ACM Queue, Vol.3, No.5 (June 2005), p.54.
[34] “Case Study: Thunderbird’s brittle security as proof of Iang’s 3rd Hypothesis in secure design: there is only one mode, and it’s secure”, Ian Grigg, 23 July 2006, http://financialcryptography.com/mt/archives/000755.html.
[35] “GUI Bloopers: Don’ts and Do’s for Software Developers and Web Designers”, Jeff Johnson, Morgan Kaufmann, 2000.
[36] “The Importance of Usability Testing of Voting Systems”, Paul Herson, Richard Niemi, Michael Hanmer, Benjamin Bederson, Frederick Conrad, and Michael Traugott, Proceedings of the 2006 Usenix/Accurate Electronic Voting Technology Workshop, August 2006.
[37] “Re: Intuitive cryptography that’s also practical and secure”, Andrea Pasquinucci, posting to the cryptography@metzdowd.com mailing list, message-ID 20070130203352.G[email protected]me, 30 January 2007.
[38] “The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places”, Byron Reeves and Clifford Nass, Cambridge University Press, 1996.
[39] “The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How To Restore The Sanity”, Alan Cooper, Sams, 1999.
[40] “About Face 2.0: The Essentials of Interaction Design”, Alan Cooper and Robert Reimann, Wiley, 2003.
[41] “Principles of Transaction Processing”, Philip Bernstein and Eric Newcomer, Morgan Kaufmann, 1997.
[42] “BEDA: Button-Enabled Device Pairing”, Claudio Soriente, Gene Tsudik, and Ersin Uzun, First International Workshop on Security for Spontaneous Interaction (IWSSI’07), September 2007.
[43] “Another Take on Simple Security”, Glenn Fleishman, 6 January 2005, http://wifinetnews.com/archives/004659.html.
[44] “Under the Hood with Broadcom SecureEasySetup”, Glenn Fleishman, 12 January 2005, http://wifinetnews.com/archives/004685.html.
[45] “Securing Home WiFi Networks: A Simple Solution Can Save Your Identity”, Broadcom white paper Wireless-WP200-x, 18 May 2005.
[46] “Shake them Up!: A movement-based pairing protocol for CPU-constrained devices”, Claude Castelluccia and Pars Mutaf, Proceedings of the 3rd International Conference on Mobile Systems, Applications, and Services (MobiSys’05), June 2005, p.51.
Security User Interaction
An important part of the security usability design process is how to interact with users
of the security application in a meaningful manner. The following section looks at
various user interaction issues and discusses some solutions to user communications
problems.
Speaking the User’s Language
When interacting with a user, particularly over a topic as complex as computer
security, it’s important to speak their language. To evaluate a message presented by a
security user interface, users have to be both motivated and able to do so. Developers
who spend their lives immersed in the technology that they’re creating often find it
difficult to step back and view it from a non-technical user’s point of view, with the
result that the user interface that they create assumes a high degree of technical
knowledge in the end user. An example of the type of problem that this leads to is the
typical jargon-filled error message produced by most software. Geeks love to
describe the problem, when they should instead be focusing on the solution. While
the maximum amount of detail about the error may help other geeks diagnose the
problem, it does little more than intimidate the average user.
Figure 35: “The experienced user will usually know what's wrong”
On the other hand you should be careful not to present so little information that any
resulting decision made by the user, whether a rank beginner or a hardcore geek, is
reduced to a coin toss. Figure 35, from the “Ken Thompson’s car” school of user
interface design, is an example of such a coin-toss interface.
The easiest way to determine how to speak the user’s language when your application
communicates with them is to ask the users what they’d expect to see in the interface.
Studies of users have shown however that there are so many different ways to
describe the same sorts of things that using the results from just one or two users
would invariably lead to difficulties when other users expecting different terms or
different ways of explaining concepts use the interface.
A better alternative is to let users vote on terminology chosen from a list of user-
suggested texts and then select the option that garners the most votes. A real-world
evaluation of this approach found that users of the interface with the highest-polling
terminology made between
two and five times fewer mistakes than when they used the
same interface with the original technical terminology, the interface style that’s
currently found in most security applications.
The same study found that after prolonged use, error rates were about the same for
both interfaces, indicating that, given enough time, users can eventually learn more or
less anything... until an anomalous condition occurs, at which point they’ll be
completely lost with the technical interface.
In a similar vein, consider getting your user manuals written by non-security people
to ensure that the people writing the documentation use the same terminology and
have the same mindset as those using it. You can always let the security people
nitpick the text for accuracy after it’s finished.
Effective Communication with Users
In addition to speaking the user’s language, you also need to figure out how to
effectively communicate your message to them and turn the click, whirr response into
controlled responding in which users react based on an actual analysis of the
information that they’ve been given. Previous sections have pointed out a number of
examples of ineffective user communication which would imply that this is tricky
area to get right, however in this case we’re lucky to have an entire field of research
(with the accompanying industries of advertising and politics) dedicated to the
effective communication of messages. For example social psychologists have
determined that a request made with an accompanying explanation is far more likely
to elicit an appropriate response than the request on its own [1]. So telling the user
that something has gone wrong and that continuing with their current course of action
is dangerous “because it may allow criminals to steal money from your bank account”
is far more effective than just the generic warning by itself.
The text of this message also takes advantage of another interesting result from
psychology research: People are more motivated by the fear of losing something than
the thought of gaining something [2][3]. For example doctors’ letters warning
smokers of the number of years of life that they’d lose by not giving up smoking have
been found to be more effective than ones that describe the number of extra years
they’d have if they do kick the habit [4]. This has interesting ramifications.
Depending on whether you frame an issue as a gain or a loss, you can completely
change people’s answers to a question about it. The theory behind this was
developed by Nobel prize-winners Daniel Kahneman and Amos Tversky under the
name Prospect Theory [5][6]. In Kahneman and Tversky’s original experiment,
subjects were asked to choose between a sure gain of $500 and a 50% chance of
gaining $1000 / 50% chance of gaining nothing. The majority (84%) chose the sure
gain of $500. However, when the problem was phrased in reverse, with subjects
being told they would be given $1000 with a sure loss of $500 or a 50% chance of
losing the entire $1000 / 50% chance of losing nothing, only 31% chose the sure loss,
even though it represented the exact same thing as the first set of choices.
One real-world example of the deleterious effects of this can be seen in a study of the
working habits of New York taxi drivers [7], which found that many of the drivers would
set themselves a given earning target each day and quit once they’d reached their
target (setting targets can be very motivating when performing boring or tedious
activities, which is why it’s so popular with people on things like exercise programs).
However, while this can be a great motivator when there’s nothing to be gained or
lost (except for weight in an exercise program), it doesn’t work so well when there’s
more at stake than this. In the case of the taxi drivers, what they were doing was
quitting early when they were making good money, and working longer hours when
they were earning little. If instead they had worked longer hours on good days and
quit early on bad days, their earnings would have increased by 15%. Simply working
the same hours each day would have increased their income by 8% (this result is
directly contrary to supply-side economics, which argues that if you increase wages,
people will work more in order to earn more).
Taking advantage of the findings from Prospect Theory, the previous message was
worded as a warning about theft from a bank account rather than a bland reassurance
that doing this would keep the user’s funds safe. As the discussion of the rather
nebulous term “privacy” in the previous chapter showed, some fundamental concepts
related to security, and users’ views of security, can in fact only be defined in terms
of what users will lose rather than anything that they’ll gain.
An additional important result from psychology research is the finding that if
recipients of such a fear appeal aren’t given an obvious way of coping then they’ll
just bury their heads in the sand and try and avoid the problem [8]. So as well as
describing the consequences of incorrect action, your message has to provide at least
one clear, unambiguous, and specific means of dealing with the problem. The
canonical “Something bad may be happening, do you want to continue?” is the very
antithesis of what extensive psychological research tells us we should be telling the
user.
Another result from psychology research (although it’s not used in the previous
message example) is that users are more motivated to think about a message if it’s
presented as a question rather than an assertion. The standard “Something bad may
be happening, do you want to continue?” message is an assertion dressed up as a
question. “Do you want to connect to the site even though it may allow criminals to
steal money from your bank account?” is a question that provides users with
appropriate food for thought for the decision that they’re about to make. A button
labelled “Don’t access the site” then provides the required clear, specific means of
dealing with the problem.
A further psychological result that you can take advantage of is the phenomenon of
social validation, the tendency to do something just because other people (either an
authority figure or a significant number of others) have done it before you. This
technique is well-understood and widely used in the advertising and entertainment
industries through tricks such as salting donation boxes and collection trays with a
small amount of seed money, the use of claques (paid enthusiastic audience members)
in theatres, and the use of laugh tracks in TV shows. The latter is a good example of
applying psychology to actual rather than claimed human behaviour: both performers
and the audience dislike laugh tracks, but entertainment companies keep using them
for the simple reason that they work, increasing viewer ratings for the show that
they’re used with. This is because the laugh track, even though it’s obviously fake,
triggers the appropriate click, whirr response in the audience and provides social
validation of the content. Laugh tracks are the MSG of comedy. Even though
audience members, if asked, will claim that it doesn’t affect them, real-world
experience indicates otherwise. The same applies for many of the other results of
psychology research mentioned above — you can scoff at them, but that won’t
change the fact that they work when applied in the field.
You can use social validation in your user interface to guide users in their decision-
making. For example when you’re asking the user to make a security-related
decision, you can prompt them that “most users would do
xyz” or “for most users, xyz
is the best action”, where xyz is the safest and most appropriate choice. This both
provides them with guidance on what to do (which is particularly important in the
common case where the user won’t understand what it is that they’re being asked)
and gently pushes them in the direction of making the right choice, both now and in
the future where this additional guidance may not be available.
If you’d like to find out more about this field, some good starting points are
Influence: Science and Practice by Robert Cialdini, Persuasion: Psychological
Insights and Perspectives
by Timothy Brock and Melanie Green, and Age of
Propaganda: Everyday Use and Abuse of Persuasion
by Anthony Pratkanis and
Elliot Aronson (if the phishers ever latch onto books like this, we’ll be in serious
trouble).
As with the user interface safety test that was described in the section on safe
defaults, there’s a relatively simple litmus test that you can apply to the messages that
you present to users. Look at each message that you’re displaying to warn users of a
security condition and see if they deal with the responses “Why?” and “So what?”.
For example you may be telling the user that “The server’s identification has changed
since the last time that you connected to it”. So what? “This may be a fake server
pretending to be the real thing, or it could just mean that the server software has been
reinstalled”. So what? “If it’s a fake server then any information that you provide to
it may be misused by criminals. Are you sure that you really want to continue?”.
Finally the user knows why they’re being shown the dialog! The “Why?” and “So
what?” tests may not apply to all dialogs (usually only one applies to any particular
dialog), but if the dialog message fails the test then it’s a good indication that you
need to redesign it.
Design Example: Connecting to a Server whose Key has Changed
Let’s look at a design exercise for speaking the user’s language in which a server’s
key (which is usually tied to its identity) has changed when the user connects to it.
Many applications will present the user with either too little information (“The key
has changed, continue?”), too much information (a pile of incomprehensible X.509
technobabble; in one PKI usability study not a single user was able to make any sense
of the certificate information that Windows displayed to them [9], and a SANS
(SysAdmin, Audit, Network, Security) Institute analysis of a phishing attack that
examined what a user would have to go through to verify a site using certificates
described the certificate dialog as “filled with mind-numbing gobbledygook […]
most of it seemed to be written in a foreign language” [10]), or the wrong kind of
information (“The sky is falling, run away”).
Figure 36: Internet Explorer certificate warning dialog
The standard certificate dialog used by Internet Explorer is shown in Figure 36. The
typical user’s response to this particularly potent latent pathogen will be something
like “What the &*^#@*! is that supposed to mean?”, and this is the improved version
— earlier versions were even more incomprehensible (recognising the nature of this
type of question, pre-release versions of Windows ’95 used the text “In order to
demonstrate our superior intellect, we will now ask you a question you cannot
answer” as a filler where future text was to be added [11]). A few rare users may
click on “View Certificate”, but then they’ll have no idea what they’re supposed to be
looking for there. In any case this additional step is completely pointless since if the
certificate’s contents can’t be verified there’s no point in examining them as the
certificate’s creators could have put anything they wanted in there.
In addition, users have no idea what the certifying authority (CA) that’s mentioned in
the dialog is. In one PKI usability study carried out with experienced computer users,
81% identified VeriSlim as a trusted CA (VeriSlim doesn’t exist), 84% identified
Visa as a trusted CA (Visa is a credit card company, not a CA), and no-one identified
Saunalahden as a trusted CA (Saunalahden is a trusted CA located in Finland) [9].
22% of these experienced users didn’t even know what a CA was, and an informal
survey of typical (non-experienced) users was unable to turn up anyone who knew
what a CA was. In situations like this, applying judgemental heuristics (in other
words guessing) makes perfect sense, since it’s completely unclear which option is
best. Since (from the user’s point of view) the best option is to get rid of the dialog
so that they can get on with their work, they’ll take whatever action is required to
make it go away.
Finally, the dialog author has made no attempt to distinguish between different
security conditions — a day-old expired certificate is more benign than a year-old
expired certificate, which in turn is more benign than a certificate belonging to
another domain or issued by an unknown CA. The dialog doesn’t even bother to
filter out things that aren’t a problem (“the certificate date is valid”) from things that
are (“the name on the certificate is invalid”). This is simply a convenient (to the
application developer) one-size-fits-all dialog that indicates that something isn’t quite
right somewhere, and would the user like to ignore this and continue anyway. The
only good thing that can be said about this dialog is that the default action is not
‘Yes’, requiring that the user at least move the mouse to dismiss it.
Figure 37: One-size-fits-all password entry
The inability to distinguish between different security levels is endemic to other
browsers as well. For example Firefox uses a single master password to protect all
secrets in the system, whether it’s the password for the Knitting Pattern Weekly or the
password for your online bank account. As shown in Figure 37, users end up either
over-protecting something of little to no value (a long, complex master password used
to protect access to Knitting Pattern Weekly) or under-protecting something of
considerable value (a short, easy-to-type master password used to protect access to
your PayPal account).
A better trade-off would have been to break the master-password control mechanism
into two or even three levels, one with no master password at all for the large number
of sites that require nuisance signups before they’ll allow you to participate, one with
a relatively easy-to-type master password for moderate-value sites, and a high-
security one with a more complex master password that has to be re-entered on each
use, for high-value sites such as online bank account access. Having to explicitly
enter the high-value master password both makes users more aware of the
consequences of their actions and ensures that a master password-enabled action
performed some arbitrary amount of time in the past can’t be exploited later when the
user browses to a completely unrelated (and possibly malicious) site. Finally, since
the browser now knows (via the use of the standard vs. high-value master password
selection) that the user is performing a high-value transaction, it can apply additional
safety checks such as more stringent filtering of what information gets sent where.
Doing this for every site visited would “break” a lot of sites, but by taking advantage
of the inside knowledge of which sites are considered important by the user, the
browser can apply the potentially site-breaking extra security measures only to the
cases where it really matters.
Internet Explorer 7 finally appears to be taking some steps towards fixing the
incomprehensible certificate warning problem, although it remains to be seen how
effective these measures will really be. For example one of the measures consists of
warning users when they visit suspected phishing sites, even though an AOL UK user
survey found that 84% of users didn’t know what phishing was and were therefore
unlikely to get anything from the warning. Another potential problem with the
proposed changes is a profusion of URL-bar colour-codes for web sites and colour-
code differences between browsers — Firefox uses yellow for SSL-secured sites
while MSIE 7 uses it to indicate a suspected phishing site, so that Firefox users will
think an MSIE 7 phishing site is secured with SSL while MSIE 7 users will think that
SSL-secured sites are phishing sites (this colour-change booby trap is a bit like
changing the meaning of red and green traffic lights in different cities). Finally, there
are plans to display the certificate issuer name in the URL bar alternating with the
certificate subject name, a proposal that has the potential to equal the
<blink> tag
in annoyance value (displaying it as a tooltip would be a better idea), as well as being
more or less meaningless to most users.
Look at the problem from the point of view of the user. They’re connecting to a
server that they’ve connected to many times in the past and that they need to get to
now in order to do their job. Their natural inclination will be to do whatever it takes
to get rid of the warning and connect anyway, making it another instance of the “Do
you want this message to go away” problem presented earlier.
Your user interface should therefore explain the problem to them, for example “The
server’s identification has changed since the last time that you connected to it. This
may be a fake server pretending to be the real thing, or it could just mean that the
server software has been reinstalled. If it’s a fake server then any information
that you provide to it may be misused by criminals. Are you sure that you really want
to continue?”. Depending on the severity of the consequences of connecting to a fake
server, you can then allow them to connect anyway, connect in a reduced-
functionality “safe” mode such as one that disallows uploads of (potentially sensitive)
data to the possibly-compromised server and is more cautious about information
coming from the server than usual, or perhaps even require that they first verify the
server’s authenticity by checking it with the administrator who runs it. If you like,
you can also include an “Advanced” option that displays the usual X.509
gobbledegook.
An alternative approach, which is somewhat more drastic but also far more effective,
is to treat a key or certificate verification failure in the same way as a standard
network server error. If the user is expecting to talk to a server in a secure manner
and the security fails, then that’s a fatal error, not just a one-click speed-bump. This
approach has already been adopted by some newer network clients such as Linux’s
xsupplicant and Windows XP’s PEAP (Protected Extensible Authentication
Protocol) client. This is a failure condition that users will instinctively understand,
and that shifts the burden from the user to the server administrators. Users no longer
have to make the judgement call, it’s now up to the server administrators to get their
security right. In an indistinguishable-from-placebo environment this is probably the
only safe way to handle key/certificate verification errors.
There’s also a third alternative that runs the middle ground between these two
extremes, which provides a mechanism for allowing the user to safely accept a new
key or certificate. Instead of allowing the user to blindly click ‘OK’ to ignore the
error condition, you can require that they enter an authorisation code for the new key
or certificate that they can only obtain from the server administrator or certificate
owner, forcing them to verify the key before they enable its use. The “authorisation
code” is a short string of six to eight characters that’s used to calculate an HMAC
(hashed Message Authentication Code, a cryptographic checksum that incorporates
an encryption key so that only someone else who has the key can recreate the
checksum) of the new key or certificate, with the first two characters being used as
the HMAC key and the remaining characters being the (truncated) HMAC result, as
illustrated in Figure 38. For example if you set the first two characters to “ab” then
computing HMAC( “ab”, key-or-certificate ) will produce a unique HMAC value for
that key or certificate. Taking the base64 encoding of the last few bytes of the
HMAC value (say, “cdefg”) produces the six-character authorisation code “abcdefg”.
When the user enters this value, their application performs the same calculation and
only permits the use of the key or certificate if the calculated values match.
Figure 38: Generation of a certificate authorisation code (the administrator computes the HMAC of the certificate using “ab” as the key and hands the resulting code “abcdefg” to the user, whose software repeats the calculation and compares the two)
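As a purely illustrative sketch, the calculation might look like the following; the choice of SHA-256 and the exact truncation and encoding are assumptions, since the text only fixes the general shape of the scheme (two key characters plus a few characters of truncated HMAC).

import base64, hashlib, hmac

def authorisation_code(certificate, key_chars="ab"):
    """Derive a short authorisation code from a key or certificate.

    The first two characters act as the HMAC key (in practice they'd be
    chosen randomly for each certificate); the rest of the code is a
    truncated, base64-encoded slice of the HMAC result, mirroring the
    "ab" + "cdefg" = "abcdefg" example in the text.
    """
    mac = hmac.new(key_chars.encode("ascii"), certificate, hashlib.sha256).digest()
    tail = base64.b64encode(mac[-4:]).decode("ascii").rstrip("=")[:5]
    return key_chars + tail

# The administrator computes the code over the new certificate and passes it
# to the user out of band; the user's software repeats the calculation and
# only enables the certificate if the two codes match.
code = authorisation_code(b"...DER-encoded certificate...")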
Obviously this use of an HMAC as a salted hash isn’t terribly secure, but it doesn’t
have to be — what it’s doing is raising the bar for an attacker, changing the level of
effort from the trivial (sending out phishing/spam email) to nontrivial (impersonating
an interactive communication between the key/certificate owner and the user). A
determined attacker can still do this, but their job has suddenly become a whole lot
harder since they now have to control the authorisation side-channel as well.
Incidentally, the reason for using a keyed hash (the HMAC) rather than a standard
hash is that most software already displays a hash of the key to the user, usually
labelled as a fingerprint or thumbprint. If they copied this value across to the
authorisation check, the user could bypass the separate side-channel that they’d
otherwise be forced to use.
One thing that SSH does which SSL/TLS should really copy is keep a record of
whether a trusted domain (that is, a server using SSH or SSL/TLS) has been visited
before, and as an extension how many times it’s been visited before (neither SSH nor
SSL/TLS currently do the latter). With this information at hand the application can
change its behaviour depending on whether this is a first visit, an infrequent visit, or a
frequent visit. For example if the user frequently visits
https://www.paypal.com but is now visiting https://www.paypai.com
for the first time, the application can warn that this is a potentially suspicious site that
the user doesn’t normally visit. This has been shown to significantly increase a user’s
ability to detect spoofed web sites [12]. Because SSL use is infrequent and is
normally only applied to sites where the user has to enter valuable information such
as credit card details, you can take advantage of the fact that the users themselves will
be telling you when to be careful.
If you implement this measure you need to be careful to mask the list of hosts visited
to avoid both privacy concerns and the ability of an attacker who gains access to the
list to perform address-harvesting of the list of known/trusted hosts, a particular
problem with SSH’s
known-hosts mechanism. Various workarounds for this
problem are possible [13], the simplest of which is to store a MAC of the host name
rather than the actual name. Since all that we’re interested in is a presence-check, a
comparison of the MAC value will serve just as well as a comparison of the full
name.
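A minimal sketch of this kind of masked visit tracking, assuming a per-installation secret and SHA-256 (neither of which is prescribed by the text), might look like this:

import hashlib, hmac

INSTALL_SECRET = b"per-installation secret key"   # assumed to be generated once and stored locally

def host_tag(hostname):
    # MAC of the host name, so a stolen visit list can't be harvested for addresses.
    return hmac.new(INSTALL_SECRET, hostname.lower().encode("utf-8"),
                    hashlib.sha256).hexdigest()

def record_visit(visits, hostname):
    # Record a visit and return how many times this host has now been seen.
    tag = host_tag(hostname)
    visits[tag] = visits.get(tag, 0) + 1
    return visits[tag]

visits = {}
for _ in range(20):
    record_visit(visits, "www.paypal.com")        # the site the user normally uses

if record_visit(visits, "www.paypai.com") == 1:   # first visit to a lookalike domain
    print("You haven't visited this site before -- treat it with suspicion")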
Design Example: Inability to Connect to a Required Server
A variation of the problem in the previous design example occurs when you can’t
connect to the other system at all, perhaps because it’s down, or has been taken
offline by a DDoS attack, or because of a network outage. Consider for example the
use of OCSP, a somewhat awkward online CRL query protocol, in combination with
a web browser. The user visits a couple of sites with OCSP enabled, and everything
works fine (although somewhat slowly, because of the extra OCSP overhead). Then
they switch to a disconnected LAN, or a temporary network outage affects access to
the OCSP server, or some similar problem occurs. Suddenly their browser is
complaining whenever they try to access SSL sites (such problems are already being
reported with OCSP-enabled browsers like Firefox [14][15]). When they disable
OCSP, everything works again, so obviously there was a problem with OCSP. As a
result, they leave it disabled, and don’t run into any more problems accessing SSL
servers.
The failure pattern that we see here is that this is a feature that provides no directly
visible benefit to the user while at the same time visibly reducing reliability. Since
it’s possible to turn it off and it’s not necessary to turn it on again, it ends up disabled.
The survivability of such a “feature” is therefore quite low.
What the addition of the extra security features has done is make the system
considerably more brittle, reducing its reliability to the lowest common denominator
of the web server and the OCSP server. While we’ve learned to make web servers
extremely reliable, we haven’t yet done the same for OCSP servers, and it’s unlikely
that there’ll ever be much evolutionary pressure to give them the same level of
reliability and performance that web servers enjoy. In fact things seem to be going
very much in the opposite direction: since the OCSP protocol is inherently non-
scalable, a recent performance “enhancement” was to remove protection against man-
in-the-middle attacks, making it possible for a server (or an attacker) to replay an old
response instead of having to generate a new one that reflects the true state of the
certificate [16].
Exactly such a lowest-common-denominator reliability problem has already occurred
with the Windows 2000 PKI implementation. Microsoft hardcoded the URL for a
Verisign CRL server into their software, so that attempts to find a CRL for any
certificate (no matter who the CA actually was) went to this particular Verisign
server. When Verisign reorganised their servers, the URL ceased to function. As a
result, any attempt to fetch a CRL resulted in Windows groping around blindly on the
net for a minute, after which it timed out and continued normally.
In practice this wasn’t such a big problem because CRL checking was turned off by
default so almost no-one noticed, but anyone who did navigate down through all the
configuration dialogs to enable it quickly learned to turn it off again. Another
example is found in some JCE implementations, in which the JVM checks a digital
signature on the provider when it’s instantiated. This process involves some form of
network access, with the results being the same as for the Windows CRL check —
the JVM gropes around for a while and then times out and continues anyway. All that
the user notices is that the application stalls for quite some time every time
it starts (one Java developer referred to this process as “being held captive to some
brain-dead agenda” [17]).
This is another example of the Simon Says problem. From the certificate (or site)
owner’s point of view, it’s in their best interests
not to use OCSP, since this reduces
the chances of site visitors being scared away by error messages when there’s a
problem with the OCSP server. The nasty misfeature of this mechanism is that it’s
only when you
enable the use of OCSP that users start seeing indications of trouble
— if you just go ahead and use the certificate without trying to contact the OCSP
server, everything seems to work OK.
To determine how to fix this (or whether it needs fixing at all), it’s instructive to
perform a cost/benefit analysis of the use of OCSP with SSL servers. First of all, it’s
necessary to realise that OCSP can’t prevent most types of phishing attacks. Since
OCSP was designed to be fully compatible with CRLs and can only return a negative
response, it can’t be used to obtain the status of a forged or self-signed certificate.
For example when fed a freshly-issued certificate and asked “Is this a valid
certificate”, it can’t say “Yes” (a CRL can only answer “revoked”), and when fed an
Excel spreadsheet it can’t say “No” (the spreadsheet won’t be present in any CRL).
More seriously, CRLs and OCSP are incapable of dealing with a manufactured-
certificate attack in which an attacker issues a certificate claiming to be from a
legitimate CA — since the legitimate CA never issued it, it won’t be in its CRL,
therefore a blacklist-based system can’t report the certificate as invalid. Finally,
when used with soundalike certificates in secure phishing attacks, the certificate will
be reported as not-revoked (valid) by OCSP (since it was issued by a legitimate CA)
until such time as the phish is discovered, at which point the site will be shut down by
the hosting ISP, making it mostly irrelevant whether its certificate is revoked or not.
The result of this analysis is that there’s no real benefit to the use of OCSP with SSL
servers, but considerable drawbacks in the form of adverse user reaction if there’s a
problem with the OCSP server. The same problem affected the NSA-designed
system mentioned earlier, in which the users’ overriding concern was availability and
not confidentiality/security.
Looking beyond the problems inherent in the use of the OCSP mechanism, we can
use the X.509 CRL reason codes used by OCSP to try and determine whether
revocation checking is even necessary. Going through each of the reason codes, we
find that “key compromise” is unlikely to be useful unless the attacker helpfully
informs the server administrator that they’ve stolen their key, “affiliation changed” is
handled by obtaining a new certificate for the changed server URL, “superseded” is
handled in the same way, and “cessation of operation” is handled by shutting down
the server. In none of these cases is revocation of much use.
No doubt some readers are getting ready to jump up and down claiming that
removing a feature in this manner isn’t really an example of security user interface
design. However, as the analysis shows, it’s of little to no benefit, but potentially a
significant impediment. The reason why OCSP was used in this design example is
because such cases of reduncandy⁶ only seem to occur in the PKI world. Outside of
PKI, they’re eliminated by normal Darwinian processes, but these don’t seem to
apply to PKI. So this is an example of a user interface design process that removes
features in order to increase usability instead of adding or changing them.
Use of Visual Cues
The use of colour can play an important role in alerting users to safe/unsafe
situations. Mozilla-based web browsers updated their SSL indication mechanism
from the original easily-overlooked tiny padlock at the bottom of the screen to
changing the background colour of the browser’s location bar when SSL is active and
the certificate is verified, as shown in Figure 39 (if you’re seeing this on a black-and-
white printout, the real thing has a yellow background). Changing the background
colour or border of the object that the user is looking at or working with is an
extremely effective way of communicating security information to them, since they
don’t have to remember to explicitly look elsewhere to try and find the information.
The colour change also makes it very explicit that something special has occurred
with the object that’s being highlighted (one usability study found that the number of
users who were able to avoid a security problem doubled when different colours were
used to explicitly highlight security properties).
Figure 39: Unambiguous security indicators for SSL
⁶ “Redundancy” is a term used to refer to fault-prone systems run in parallel so that if one fails another can take over. “Reduncandy” refers to fault-prone systems run in series so that a fault in any will bring them all down.
When you do this, you need to take care to avoid the angry-fruit-salad effect in which
multiple levels of security indicator overlap to do little more than confuse the user.
For example a copy of Firefox with various useful additional security plugins
installed might have a yellow URL bar from Firefox telling the user that SSL is in
use, a red indicator from the Petnames plugin telling the user that it’s an unrecognised
site, a green indicator from the Trustbar plugin telling the user that they’ve been there
before, and another yellow indicator from an OCSP responder indicating that the
OCSP status isn’t available.
When you’re using colour or similar highlighting methods in your application,
remember that the user has to be aware of the significance of the different colours in
order to be able to make a decision based on them, that some users may be colour-
blind to particular colour differences, and that colours have different meanings across
different cultures. For example the colour red won’t automatically be interpreted to
indicate danger in all parts of the world, or its meaning as a danger/stop signal may
work differently in different countries. In the UK, heavy machinery is started with a
green button (go) and stopped with a red button (stop). Across the channel in France,
it’s started with a red button (a dangerous condition is being created) and stopped
with a green button (it’s being rendered safe). Similar, but non-colour-based,
indicator reversals occur for applications like software media players, where some
players use the
► ‘Play’ symbol to indicate that content is now being played back,
while others use it to indicate that content will be played back if the symbol is clicked
(without firing it up to check, can you say which option your media player application
or applications use?).
When it comes to colour-blindness, about 8% of the population will be affected, with
the most common type being partial or complete red-green colour-blindness (in case
you’re wondering how this works with traffic lights, they have a fixed horizontal
ordering so that colour-blind people still have some visual indication through the
position of the light that’s lit). Ensuring that your interface also works without the
use of colour, or at least making the colour settings configurable, is one way of
avoiding these problems [18]. If you ever get a chance to compare the Paris and
London underground/tube/subway maps, see if you can guess which one was
designed with colour-blind users in mind.
Here’s a simple way of handling visual indications for colour-blind users. Use the
configuration dialog shown in Figure 40, which provides a simple, intuitive way of
letting colour-blind users choose the colour scheme that provides the best visual
indication of a particular condition.
Figure 40: Visual cue colour chooser (“Which of these looks right?” over Danger / Caution / Safe swatches)
Another way to handle colour issues, which works if there are only one or two
colours in use (for example to indicate safe vs. unsafe) and the colour occurs in a long
band like a title bar, is to use a colour gradient fading from a solid colour on one side
to a lighter shade on the other. This makes the indicator obvious even to colour-blind
users.
Indicators for blind or even partially sighted users are a much harder problem. Blind
users employ Braille keyboards and readers (which translate onscreen text
electromechanically into Braille’s raised dots) or text-to-speech software that scans
onscreen text and can also announce the presence of certain user interface events like
drop-down menus and option tabs.
Since virtually all security indicators are visual, this makes them almost impossible to
use for blind users. Paradoxically, the ones that do work well with screen-reader
software are the ones that are typically embedded in web page content by phishing
sites (or US banks). This is a nasty catch-22 situation because in order to be non-
spoofable the security user interface elements have to be customised and therefore
more or less inaccessible to screen readers that read out the principal content of a
window but generally don’t pay any attention to any further user interface
pyrotechnics happening on the periphery in order to avoid overloading the user with
noise. Imagine how painful web browsing would be if the screen-reader software had
to announce things like “the URL bar is displaying the value ‘http://www.amazon.com/Roadside-Picnic-Collectors-Arkadi-Strugatsky/dp/0575070536/ref=pd_bbs_sr_1/002-6272379-7676069?ie=UTF8&s=books&qid=1186064937&sr=1-1’
in black text over a light yellow background with the middle portion in boldface
while the rest is in a standard font” (this is how Firefox 2 displays security indicators
in its URL bar). Now extend this to cover all of the other eye candy that’s produced
by a browser when visiting a web site and you can see that trying to interpret
conventional security indicators would quickly cause the web browsing experience to
grind to a halt.
There doesn’t seem to be any work available on security user interface design for
blind users (if you’re an academic reading this, take it as an interesting research
opportunity). The most usable option is probably something like TLS-PSK, whose
totally unambiguous “yes, with both sides authenticated” or “no” outcome doesn’t
rely on user interpretation of GUI elements or other leaps of faith. Even this though
would require some cooperation (or at least awareness) from screen reader software
that can indicate that a secure (rather than spoofed via a web page) interface element
is in effect.
Figure 41: Unprotected login screen
Visual cues can also be used to provide an indication of the absence of security,
although how to effectively indicate the absence of a property is in general a hard
problem to solve (see the earlier discussion on the Simon Says problem). For
example password-entry fields in dialog boxes and web pages always blank out the
typed password (typically with asterisks or circles) to give the impression that the
password is secret and/or protected in some manner. Even if the password is sent in
the clear without any protection (which is the case for many web pages, see Figure
41), it’s still blanked out in the user display. Conversely, information such as credit
card numbers, which are usually sent encrypted over SSL connections, are displayed
to the user. By not blanking the password field when there’s no protection being used
(see Figure 42), you’re providing instant, unmistakeable feedback to the user that
there’s no security active.
Figure 42: Unprotected login screen, with (in)security indicators
The fact that their password is being shown in the clear will no doubt make many
users nervous, because they’ve been conditioned to seeing it masked out. However,
making users nervous is exactly what this measure is meant to do: a password
displayed in this manner may now be vulnerable to shoulder surfing, but it’s even
more vulnerable to network sniffing and similar attacks (this is disregarding the
question of why a user would be accessing sensitive information in a password-
protected account in an environment that’s vulnerable to shoulder-surfing in the first
place).
Displaying the password in the clear makes real and present what the user cannot see,
that there’s no security active to protect either their password or any sensitive
information that the password will unlock. To avoid adverse user reactions, you
should add a tooltip “Why is my password showing?” to the password-entry box
when the password isn’t masked (see Figure 42), explaining to users what’s going on
and the potential consequences of their actions (tooltips have other names in different
environments, for example OS X calls them help tags). The tooltips act as a clue box
in this type of application.
Although studies of users have shown that they completely ignore tooltips in
(equally-ignored) user interface elements like security toolbars [19], it’s only acting
as an optional explanatory element in this case, so it doesn’t matter if users ignore it
or not. In any case since the non-masked password has already got their attention and
they’ll be after an explanation for its presence, the tooltip provides this explanation if
they need it.
This combination of measures provides both appropriate warning and enough
information for the user to make an informed decision about what to do next.
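As a rough sketch of how the masking decision might be wired up (Tkinter is used purely for illustration, and connection_is_secure is an assumed flag that a real client would take from its transport layer):

import tkinter as tk

def password_field(parent, connection_is_secure):
    # Mask the password only when the connection actually protects it.
    entry = tk.Entry(parent, show="\u2022" if connection_is_secure else "")
    entry.pack()
    if not connection_is_secure:
        # Stand-in for the "Why is my password showing?" tooltip from the text.
        tk.Label(parent, fg="red", wraplength=300,
                 text="Why is my password showing? This connection is not "
                      "encrypted, so anything you type is sent in the clear.").pack()
    return entry

root = tk.Tk()
password_field(root, connection_is_secure=False)
root.mainloop()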
Figure 43: TargetAlert displaying browser link activation details
Tooltip-style hints are useful in other situations as well. For example you can use
them on mouseover of a screen element to provide additional security-relevant
information about what’ll happen when the user activates that element with the
mouse. An example of this type of behaviour is shown in Figure 43, in which the
TargetAlert plugin for the Mozilla web browser is indicating that clicking on the link
will cause the browser to hang trying to load the Adobe Acrobat plugin. TargetAlert
has other indicators to warn the user about links that are executable, pop up new
windows, execute Javascript, and so on.
Figure 44: Slashdot displaying the true destination of a link
A variation of this technique is used by the Slashdot web site to prevent link
spoofing, in which a link that appears to lead to a particular web site instead leads to a
completely different one. This measure, shown in Figure 44, was introduced to
counter the widespread practice of having a link to a supposedly informational site
lead instead to the (now-defunct)
goatse.cx (a site that may euphemistically be
described as “not work-safe”), the Internet equivalent of a whoopee cushion. A
similar such simple measure, displaying on mouseover the domain name of the site
that a link leads to, would help combat the widespread use of disguised links in
phishing emails.
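A sketch of the underlying check, comparing the host a link displays with the one it actually points to (the helper name and the host-likeness heuristic are illustrative):

from urllib.parse import urlparse

def link_warning(display_text, href):
    # Warn when a link's visible text names a different host than its target.
    real_host = (urlparse(href).hostname or "").lower()
    text = display_text.strip()
    if "://" not in text:
        text = "http://" + text
    shown_host = (urlparse(text).hostname or "").lower()
    if "." not in shown_host:       # visible text isn't host-like; nothing to compare
        return None
    if shown_host != real_host:
        return "Link claims to go to %s but actually goes to %s" % (shown_host, real_host)
    return None

print(link_warning("www.bankofamerica.com", "http://www.phishing.com/login"))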
<form action="http://www.bankofamerica.com">
  <input type="password" name="password">
  <input type="submit" value="submit"
    onclick='this.form.action="http://www.phishing.com"'>
</form>
Figure 45: User interface spoofing using Javascript
When you use measures like this, make sure that you display the security state in a
manner that can’t be spoofed by an attacker. For example web browsers are
vulnerable to many levels of user interface spoofing using methods such as HTML to
change the appearance of the browser or web page, or Javascript or XUL to modify or
over-draw browser UI elements. An example of this type of attack, which uses
Javascript to redirect a typed password to a malicious web site, is shown in Figure 45.
A better-known example from the web is the use of cross-site scripting (XSS), which
allows an attacker to insert Javascript into a target’s web page. One such attack,
employed against financial institution sites like Barclay’s Bank and MasterCard,
allowed an attacker to deliver their phishing attack over SSL from the bank’s own
secure web server [20]. To protect against these types of attack, you should ensure
that your security-status display mechanism can’t be spoofed or overridden by
external means.
Design Example: TLS Password-based Authentication
A useful design exercise for visual cues involves the use of TLS’ password-based
failsafe authentication (TLS-PSK). What’s required to effectively apply this type of
failsafe authentication are three things:
1. A means of indicating that TLS-PSK security is in effect, namely that both
client and server have performed a failsafe authentication process.
2. An unmistakeable means of obtaining the user password that can’t be
spoofed by something like a password-entry dialog on a normal web page.
3. An unmistakeable link between the TLS-PSK authentication process and the
web page that it’s protecting.
The obvious way to meet the first requirement is to set the URL bar to a distinctive
colour when TLS-PSK is in effect. For TLS-PSK we’ll use light blue to differentiate
it from the standard SSL/TLS security, producing a non-zero Hamming weight for the
security indicators. Using an in-band indicator (for example something present on the
web page) is a bad idea, both because as the previous section showed it’s quite easily
spoofable by an attacker, and because usability tests on such an interface have shown
that users just consider it part of the web page and don’t pay any attention to it [9].
Unfortunately a number of usability tests (discussed elsewhere) have shown that
simply colouring the URL bar isn’t very effective, both because users don’t notice it
and, if they do, they have no idea what the colouring signifies. This is where the
second and third design elements come in.
Figure 46: Non-spoofable password-entry dialog
To meet the second and third requirements, instead of popping up a normal password-
entry dialog box in front of the web page (which could be coming from hostile code
on the web page itself), we make the blue URL bar zoom out into a blue-tinted or
blue-bordered password-entry dialog, and then zoom back into the blue URL bar once
the TLS-PSK authentication is complete. The Camino browser for OS X already uses
a non-spoofable interface of this kind, as shown in Figure 46. When the browser
requests a password from the user, the password-entry dialog scrolls out of the
browser title bar (outside the normal display area of the browser) in a manner and at a
location that no web content can emulate (since this is a complex animation, the
single static image of the dialog’s final form and location shown above doesn’t really
do it justice). An additional benefit of pinning the password-entry dialog to the
window that it corresponds to is that it can’t be confused with input intended for
another window, as a standard floating password dialog can.
This process creates a clear indication even for novice users of a connection between
the URL bar indicating that TLS-PSK security is in effect, the TLS-PSK password-
entry system, and the final result of the authentication. The user learning task has
been simplified to a single bit, “If you don’t see the blue indicators and graphical
effects, run away”.
Finally, this authentication mechanism is an integral part of the critical action
sequence. If it’s implemented as described above then you can’t do TLS-PSK
authentication without being exposed to the security interface. Unlike the certificate
check in standard SSL/TLS security, you can’t choose to avoid it, and as the
discussion of users’ mental models in a previous section showed, it matches users’
expectations of security: When TLS-PSK is in effect, entering your user name and
password as a site authenticity check is perfectly valid since only the genuine site will
be able to authenticate itself by demonstrating prior knowledge of the name and
password. A fake site won’t know the password in advance and therefore won’t be
able to demonstrate its TLS-PSK credentials to the user.
Design Example: Other Password Protection Mechanisms
TLS-PSK is the most powerful password mechanism, but sometimes the need for
compatibility with legacy systems means that it’s not possible to employ it. There are
however a variety of alternatives that you can use that go beyond the current “hand
over the user’s password to anyone who asks for it” approach. These alternatives
work by adding an extra layer of indirection to the password-entry process, sending to
the remote system not the actual user password but some unrelated value specific to
that particular system. So for example a user password of “mypassword” might
translate to a Hotmail password of “5kUqedtM2I”, a PayPal password of
“Y6WOMZuWLG”, and an Amazon password of “xkepKEoVOG”. This concept
has been around for quite some time, going back more than ten years to the Lucent
Personalised Web Assistant, which used a master password supplied to a proxy server
(this was before browser plugins) [21][22][23].
The advantage of this extra level of indirection is that it provides password
diversification. Every site gets its own unique, random password, so that if one of
these derived passwords is ever compromised it won’t affect the security of any other
site. Password diversification is an important element in protecting user passwords,
since users tend to re-use the same password across multiple sites, with one survey
finding that 96% of users reused passwords [24] and another survey of more than
3,000 users finding that more than half used the same password for all their accounts
[25]. This password cross-pollination practice makes it easy for attackers to perform
leapfrog attacks in which they obtain the password for a low-value site that users
don’t take much care to protect and then use it to access a high-value site. The fact
that attackers are making use of this has been confirmed by the phishers themselves
[26].
An additional benefit to password diversification is that since the derived password is
unrelated to the user’s actual password, they still get to use strong passwords on every
site even if their master password is relatively weak. Finally, this approach provides
a good deal of phishing protection. Since passwords are site-specific, a phishing site
will be sent a password that’s completely unrelated to the one used at the site that it’s
impersonating.
Figure 47: Password diversification using a password wallet
There are two possible approaches to password diversification. The first one is the
password wallet technique shown in Figure 47. The user-supplied master password is
used to decrypt the wallet, which contains per-site random passwords generated by
the application. When the user wants to access a site, the application looks up the
appropriate password by server name or URL. If it’s a site that the user hasn’t visited
before, the application can warn the user (in case it’s a phishing site) and if they’re
sure that they want to continue, create a new random password for them.
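As a rough sketch of how the wallet technique might be implemented (assuming Python, the standard-library hashlib and secrets modules, and the third-party cryptography package for the wallet encryption; the function names used here are purely illustrative):

   import json, secrets, base64, hashlib
   from cryptography.fernet import Fernet  # third-party 'cryptography' package

   def wallet_key(master_password, salt):
       # Derive the wallet-encryption key from the master password
       raw = hashlib.pbkdf2_hmac('sha256', master_password.encode(), salt, 200000)
       return base64.urlsafe_b64encode(raw)  # Fernet expects a base64-encoded 32-byte key

   def get_site_password(encrypted_wallet, master_password, salt, site):
       # Decrypt the wallet and look up the per-site random password, creating a
       # new one (after warning the user) if the site hasn't been visited before
       fernet = Fernet(wallet_key(master_password, salt))
       wallet = json.loads(fernet.decrypt(encrypted_wallet))
       if site not in wallet:
           wallet[site] = secrets.token_urlsafe(12)
       return fernet.encrypt(json.dumps(wallet).encode()), wallet[site]

The essential point is that the value sent to the site comes from the wallet, not from the user’s fingers, so the master password itself never leaves the local machine.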
Figure 48: Password diversification using hashing
The second approach to password diversification is shown in Figure 48. The user-
supplied master password is hashed with a random salt value and the server name/site
URL to again provide a site-specific password unrelated to the original user
password. The random salt makes it impossible for an attacker to guess the user’s
master password if they acquire one of the site passwords. An additional level of
diversification involves hashing the application name into the mix, making the
password application-specific as well as site-specific. In this way a compromise of
(for example) an SSL/TLS password doesn’t compromise the corresponding SSH
password.
As with the password wallet approach, this approach guarantees that the spoofed
phishing site can never obtain the password for the site that it’s impersonating.
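A minimal sketch of this hashing approach (again in Python with only standard-library modules; the function name, the choice of HMAC-SHA256, and the output length are illustrative assumptions rather than a prescribed construction):

   import hmac, hashlib, base64

   def site_password(master_password, salt, site, application=''):
       # Mix the master password, random salt, site name, and (optionally) the
       # application name so that each site/application pair gets its own password
       mac = hmac.new(salt, digestmod=hashlib.sha256)
       mac.update(master_password.encode())
       mac.update(site.encode())
       mac.update(application.encode())
       # Truncate/encode into something that a password field will accept
       return base64.urlsafe_b64encode(mac.digest())[:16].decode()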
Figure 49: Extra protection for the master password
An enhancement of this technique that provides extra protection against offline
password-guessing attacks adds a pre-processing step that converts the master
password into a master secret value via a lengthy iterated hashing process and then
uses the master secret to generate site passwords instead of applying the master
password directly [27]. This additional step is shown in Figure 49, and would
typically be done when the software is first installed. By adding this one-off
additional step, the time for an attacker to guess each password becomes the sum of
the lengthy initial setup time and the quick per-site password generation time. In
contrast a legitimate user only experiences the quick per-site password generation
time.
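A sketch of this pre-processing step (in Python; PBKDF2 with a large iteration count stands in here for the lengthy iterated hashing and isn’t necessarily the exact construction used in [27]):

   import hashlib

   def master_secret(master_password, salt):
       # One-off lengthy computation, typically performed when the software is
       # first installed; an attacker has to repeat it for every password guess,
       # while the legitimate user pays the cost only once
       return hashlib.pbkdf2_hmac('sha256', master_password.encode(), salt, 5000000)

   # The quick per-site hashing step is then keyed with this master secret
   # rather than with the master password directly.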
Astoundingly, these obvious approaches to password protection, which date back
more than ten years, aren’t used by any current password-using application. All of
them simply connect to anything listening on the appropriate port and hand over the
user-entered password (most browsers implement some form of password-storage
mechanism, but that just records user-entered passwords rather than managing
randomly generated, un-guessable, site-specific ones). The level of interest in this
style of password management is demonstrated by the existence of at least half a
dozen independently-created Firefox browser plugins that retroactively add this
functionality [28], but despite positive third-party evaluations such as “[PwdHash] is
so seamless that were it installed in every browser since the foundation of the web,
users would notice virtually no difference aside from improved security” [27], no
browser supports this functionality out of the box. The existence of these various
implementations in the form of browser plugins does however provide a nice
opportunity for evaluating the usability of various approaches to solving the
password-management problem.
The chapter on security usability testing contains further information on requirements
for password interfaces.
Design Example: Strengthening Passwords against Dictionary Attacks
One slight drawback to passwords is that they’re vulnerable to dictionary attacks, in
which an attacker tries every possible word in the dictionary in the hope that one of
them is what the user is using as a password. If you’re using one of the password-
diversification schemes described above, this becomes a great deal more difficult
since the passwords are completely random strings and so dictionary attacks don’t
work any more. However, there may be cases where this isn’t possible (for example
when the user enters their master password), and there it would be good to have some
form of protection against dictionary attacks.
(Dialog mock-up: “Set Master Password”, with Password and Confirm Password fields, explanatory text reading “The master password unlocks the access codes used to secure your accounts. Enter it below and then re-enter it a second time to confirm it” and “To strengthen your password protection, you can add a processing delay to the password each time that it’s entered. The longer the delay, the stronger the protection”, a Processing delay slider running from 0 s to 10 s, and Set Password/Cancel buttons.)
Figure 50: Master password entry dialog box
One standard technique for strengthening password protection against dictionary
attacks is to iterate the hashing to slow down an attacker. A means of building this
into your user interface is shown in Figure 50. This gives the user the choice of a
small number of iterations and correspondingly lower dictionary-attack resistance for
impatient users, or a larger number of iterations and higher dictionary-attack
resistance for more patient users. To determine the correlation between hashing
iterations and time, have your application time a small, fixed number of iterations
(say 1000) and then use the timing information to mark up the slider controls. This
approach has an advantage over hard-coding a fixed number of iterations: as
computers get faster, the attack resistance of the password increases with them.
Conversely, it doesn’t penalise users with less powerful machines.
Leave the default processing time at 1 second. Most users won’t change this, and it’s
short enough not to be a noticeable inconvenience. If you want to provide more
meaningful feedback on what the delay is buying the user, you can also add an
estimate of the resulting protection strength below the slider: “One day to break”,
“One week to break”, and so on.
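A sketch of the calibration step (assuming Python and a PBKDF2-style iterated hash; the function name and probe size are illustrative):

   import time, hashlib

   def calibrate_iterations(target_delay_in_seconds):
       # Time a small, fixed number of iterations and scale up so that the full
       # computation takes roughly the delay that the user selected on the slider
       probe = 1000
       start = time.perf_counter()
       hashlib.pbkdf2_hmac('sha256', b'timing probe', b'salt', probe)
       elapsed = time.perf_counter() - start
       return max(probe, int(probe * target_delay_in_seconds / elapsed))

   iterations = calibrate_iterations(1.0)  # default of one second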
Design Example: Enhanced Password-entry Fields
The practice of blanking out password-entry fields arises from a thirty- to forty-year-
old model of computer usage that assumes that users are sitting in a shared terminal
room connected to a mainframe. In this type of environment, depicted in Figure 51,
printing out the user’s password on the hardcopy terminal (a “terminal” being a
keyboard/keypunch and printer; video displays existed only for a select few
specialised graphics devices) was seen as a security risk since anyone who got hold of
the discarded printout would have been able to see the user’s password.
Figure 51: The reference model for Internet user authentication
(Image courtesy Alcatel-Lucent)
When CRT-based terminals were introduced in the mid-1970s, the practice was
continued, since shoulder surfing in the shared mainframe (or, eventually,
minicomputer) terminal room was still seen as a problem, especially since the line-
mode interface meant that it would take a while before the displayed input scrolled off
the screen.
The two paragraphs above show just how archaic the conceptual password-entry
model that we still use today actually is. Tell any current user about using teletypes
to communicate with computers and they’ll give you the sort of look that a cow gives
an oncoming train, and yet this is the usage model that password-entry dialogs are
built around. Such a model was perfectly acceptable thirty years ago when users
were technically skilled and motivated to deal with computer quirks and peculiarities,
the password-entry process provided a form of location-limited channel (even basic
communication with the computer required that the user demonstrate physical access
to the terminal room), and users had to memorise only a single password for the
computer that they had access to.
Today none of these conditions apply any more. What’s worse, some of these
historic design requirements play right into the hands of attackers. Blanking out
entered passwords so that your model 33 teletype doesn’t leave a hardcopy record for
the next user means that you don’t get any feedback about the password that you’ve
just entered. As a result, if the system tells users that they’ve entered the wrong
password, they’ll re-enter it (often several times), or alternatively try passwords for
other accounts (which in the password-entry dialog all appear identical) on the
assumption that they unthinkingly entered the password for the wrong account. This
practice, which was covered in a previous chapter, is being actively exploited by
phishers in man-in-the-middle attacks, and to harvest passwords for multiple accounts
in a single attack.
So how can we update the password-entry interface and at least bring it into the
1980s? The problem here is that the (usually) unnecessary password blanking
removes any feedback to the user about the entered password. While it could be
argued that performing no blanking at all wouldn’t be such a bad approach since it
might help discourage users from doing online banking in random Internet cafes, in
practice we need to provide at least some level of comfort blanking to overcome
users’ deeply-ingrained conditioning that a visible password isn’t protected while a
blanked one is (this fallacy was examined in a previous chapter).
Apple usability guru Bruce Tognazzini has come up with a nice way to handle this
using a rolling password blackout. With a rolling blackout, the entered password
characters are slowly faded out so that the last two or three characters are still visible
to some degree, but after that point they’ve been faded/masked out to the usual
illegible form. As reported in the user evaluation results for this design, “users were
able to comfortably and accurately detect errors, while eavesdropping failed’ [29].
This type of password handling, which only became possible with the more
widespread use of graphical interfaces in the 1980s and 1990s, is a nice tradeoff
between user comfort and security functionality.
Even this level of protection isn’t actually necessary in many environments. What’s
the eavesdropping threat on a home user’s password that blanking is protecting
against? Their cat? On the other hand home users are exactly the ones who’ll be
most vulnerable to phishing attacks that play on the weaknesses of the password-
entry model that’s in use today. Some applications like Lotus Notes carry this to
ridiculous lengths, not only masking out the password characters but echoing back a
random number of (blanked) characters for each one typed, which creates the
appearance of typos even when the user manages to enter the password correctly.
Other applications that store passwords for the user make it almost impossible for the
user, the putative owner of the data, to see them. For example Firefox first requires
that users jump through multiple sets of hoops to see their own passwords, and then
removes the ability to copy them to another application, requiring that users manually
re-type them into the target window, an issue that helped kill multilevel secure (MLS)
workstations in the 1980s. More recently, the same issue dissuaded users from
employing a password manager plugin for Firefox since it made them feel that they’d
lost control over their own passwords, a problem explored in more detail in the
chapter on usability testing.
Hopeless Causes
An earlier design example looked at revocation checking in SSL/TLS and concluded
that, as implemented via CRLs or OCSP, it offered no actual benefits but had
considerable drawbacks. There are numerous other security usability situations in
which, to quote the computer in the movie Wargames, “the only winning move is not to play”.
Consider the use of threshold schemes for key safeguarding. A threshold scheme
allows a single key to be split into two or more shares, of which a certain number
have to be recombined to recover the original key. For example a key could be split
into three shares, of which any two can be recombined to recover the original.
Anyone who gains possession of just a single share can’t recover the original key.
Conversely, if one of the shares is lost, the original key can be recovered from the
other two. The shares could be held by trustees or locked in a bank vault, or have any
of a range of standard, established physical controls applied to them.
Threshold schemes provide a high degree of flexibility and control since they can be
made arbitrarily safe (many shares must be combined to recover the key) or
arbitrarily fault-tolerant (many shares are distributed, only a few need to be
recombined to recover the key). In addition they can be extended in various fancy
ways, for example to allow shareholders to vote out other shareholders who may no
longer be trusted, or to recreate lost shares, or to allow arbitrary boolean expressions
like ‘A and (B or C)’ for the combination of shares.
Let’s consider just the simplest case, a basic m-of-n threshold scheme where any m
shares of a total of n will recover the key. What sort of user interface would you
create for this?
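(The arithmetic itself isn’t the obstacle. A toy sketch of an m-of-n split in Python, using Shamir’s scheme over a small prime field and intended purely as an illustration rather than as production code, fits in a couple of dozen lines; the difficulty lies entirely in the interfaces that have to be wrapped around it.)

   import secrets

   PRIME = 2**127 - 1  # field larger than the secrets being split

   def split(secret, m, n):
       # Evaluate a random degree-(m-1) polynomial, with the secret as its
       # constant term, at the points x = 1 .. n
       coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(m - 1)]
       poly = lambda x: sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
       return [(x, poly(x)) for x in range(1, n + 1)]

   def recover(shares):
       # Lagrange interpolation at x = 0 recovers the secret from any m shares
       secret = 0
       for i, (xi, yi) in enumerate(shares):
           num, den = 1, 1
           for j, (xj, _) in enumerate(shares):
               if i != j:
                   num = (num * -xj) % PRIME
                   den = (den * (xi - xj)) % PRIME
           secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
       return secret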
There are actually two levels of interface involved here, the programming interface in
which the application developer or user interface designer talks to the crypto layer
that implements the threshold scheme, and the user interface layer in which the
threshold scheme is presented to the user. At the crypto API layer, a typical operation
might be:
encrypt( message, length, key );
or:
sign( message, length, key );
Now compare this to what’s required for a threshold scheme:
“add share 7 of a total of 12, of which at least 8 are needed,
returning an error indicating that more shares are required”
with a side order of:
“using 3 existing valid shares, vote out a rogue share and regenerate
a fresh share to replace it”
if you want to take advantage of some of the more sophisticated features of threshold
schemes. If you sit down and think about this for a while, the operations are quite
similar to what occurs in a relational database, or at least what a database API like
ODBC provides. Obviously full ODBC is overkill, but the data representation and
access model used is a reasonably good fit, and it’s an established, well-defined
standard.
That immediately presents a problem: Who would want to implement and use an
ODBC complexity-level API just to protect a key? And even if you can convince a
programmer to work with an API at this level of complexity, how are you going to fit
a user interface to it?
The closest real-world approximation that we have to the process of applying
threshold scheme-based shares to crypto keying is the launch process for a nuclear
missile, which requires the same carefully choreographed sequence of operations all
contributing to the desired (at least from the point of view of the launch officer)
effect. In order to ensure that users get this right, they are pre-selected to fit the
necessary psychological profile and go through extensive training and ongoing drills
in mock launch centres, with evaluators scrutinising every move from behind one-
way mirrors. In addition to the normal, expected flow of operations, these training
sessions expose users to a barrage of possible error and fault conditions to ensure that
they can still operate the equipment when things don’t go quite as smoothly as
expected.
This intensive drilling produces a Pavlovian conditioning in which users
mechanically iterate through pre-prepared checklists that cover each step of the
process, including handling of error conditions. Making a single critical error results
in a lot of remedial training for the user and possibly de-certification, which causes
both loss of face and extra work for their colleagues who have to work extra shifts to
cover for them.
Unfortunately for user interface designers, we can’t rely on being able to subject our
users to this level of training and daily drilling. In fact the typical end user will
have no training at all, with the first time that they’re called on to do this being when
some catastrophic failure has destroyed the original key and it’s necessary to recover
it from its shares.
So we’re faced with the type of task that specially selected, highly trained, constantly
drilled military personnel can have trouble with, but we need to somehow come up
with an interface that makes the process usable by arbitrary untrained users. Oh, and
since this is a security procedure that fails closed, if they get it wrong by too much
then they’ll be locked out, so it has to more or less work on the first try or two.
This explains why no-one has ever seriously deployed threshold scheme-based key
safeguarding outside of a few specialised crypto hardware modules where this may be
mandated by FIPS requirements [30]. The cognitive load imposed by this type of
mechanism is so high that it’s virtually impossible to render it practical for users, with
even the highly simplified “insert tab A in slot B” mechanisms employed by some
crypto hardware vendors (which usually merely XOR two key parts together to
recover the original rather than using a full threshold scheme) reportedly taxing users
to their limits, to the extent that they’re little-used in practice.
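(For comparison, the simplified all-parts-required XOR splitting used by these vendors amounts, cryptographically, to little more than the following sketch, again a purely illustrative Python fragment; the taxing part is entirely the human procedure wrapped around it.)

   import secrets

   def xor_split(key, parts=2):
       # All-parts-required split: every part is needed to recover the key
       shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
       last = key
       for share in shares:
           last = bytes(a ^ b for a, b in zip(last, share))
       return shares + [last]

   def xor_recover(shares):
       key = shares[0]
       for share in shares[1:]:
           key = bytes(a ^ b for a, b in zip(key, share))
       return key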
There are a great many (non-computer-related) examples of geeks becoming so
caught up in the technology that they forget the human angle. For example when the
UK embarked on its high-speed train project, the engineers came up with an
ingenious scheme that avoided the need to lay expensive cambered track along the
entire route as had been done in other countries. Instead, they designed a train whose
carriages could lean hydraulically into corners. Unfortunately when the train was
eventually tested the passengers indicated that being seasick on a train was no more
enjoyable than on a ship, and the project was abandoned.
Although there’s a natural geek tendency (known as the “reindeer effect”) to dive in
and start hacking away at an interesting problem, some problems just aren’t worth
trying to solve because there’s no effective solution. In this case, as the Wargames computer says, the only winning strategy is not to play.
Legal Considerations
As the earlier section has already pointed out, when you’re designing your user
interface you need to think about the legal implications of the messages that you
present to the user [31]. Aside from the problem of failing to adequately protect users
already covered earlier, you also have to worry about over-protecting them in a way
that could be seen as detrimental to their or a third party’s business. If your security
application does something like mistakenly identify an innocent third party’s software
as malicious, they may be able to sue you for libel, defamation, trade
libel/commercial disparagement, or tortious interference, a lesser-known adjunct to
libel and defamation in which someone damages the business relationship between
two other parties. For example if your application makes a flat-out claim that a
program that it’s detected is “spyware” (a pejorative term with no widely-accepted
meaning) then it had better be very sure that it is in fact some form of obviously
malicious spyware program. Labelling a grey-area program such as a (beneficial to
the user) search toolbar with assorted (not necessarily beneficial to the user)
supplemental functionality as outright spyware might make you the subject of a
lawsuit, depending on how affronted the other program’s lawyers feel.
This unfortunate requirement for legal protection leads to a direct conflict with the
requirement to be as direct with the user as possible in order for the message to sink
in. Telling them that program XYZ that your application has detected may possibly
be something that, all things considered, they’d prefer not to have on their machine,
might be marvellous from a legal point of view but won’t do much to discourage a
user from allowing it onto their system anyway.
There are two approaches to addressing this inherent conflict of interests. The first
(which applies to any security measures, not just the security user interface) is to
apply industry best practice as much as possible. For example if there’s a particular
widely-used and widely-accepted classification mechanism for security issues then
using that rather than one that you’ve developed yourself can be of considerable help
in court. Instead of having to explain why your application has arbitrarily declared
XYZ to be malicious and prevented it from being installed, you can fall back on the
safety net of accepted standards and practices, which makes a libel claim difficult to
support since merely following industry practice makes it hard to claim deliberate
malicious intent.
A related, somewhat weaker defence if there are no set industry standards is to
publicise the criteria under which you classify something as potentially dangerous. In
that case it’ll be more difficult to sue over a false positive because you were simply
following your published policies, and not applying arbitrary and subjective
classification mechanisms.
The second defence is to use weasel-words. As was mentioned above, this is rather
unfortunate, since it diminishes the impact of your user interface’s message on the
user. If you’re not 100% certain then instead of saying “application XYZ from XYZ
Software Corporation is adware”, say “an application claiming to be XYZ from XYZ
Software Corporation may produce unwanted pop-up messages on your system” (it
may be only pretending to be from XYZ Software Corporation, or the pop-up
messages could be marginally useful so that not all users would immediately perceive
them as unwanted). Since spamware/spyware/adware vendors try as hard as possible
to make their applications pseudo-legitimate, you have to choose your wording very
carefully to avoid becoming a potential target for a lawsuit. The only thing that saved
SpamCop in one spammer-initiated lawsuit was the fact that they merely referred
complaints to ISPs (rather than blocking the message) and included a disclaimer that
they couldn’t verify each and every complaint and that it might in fact be an
“innocent bystander” [32], which is great as a legal defence mechanism but less
useful as a means of effectively communicating the gravity of the situation to a user.
One simple way of finding the appropriate weasel-words (which was illustrated in the
example above) is to describe the properties of a potential security risk rather than
applying some subjective tag to it. Although there’s no clear definition of the term
“adware”, everyone will agree that it’s a pejorative term. On the other hand no-one
can fault you for saying that the application will create possibly unwanted pop-up
messages. The more objective and accurate your description of the security issue, the
harder it will be for someone to claim in court that it’s libellous. This technique
saved Lavasoft (the authors of the popular Ad-Aware adware/spyware scanner) in
court [33]. The downside to this approach is that it’s now up to the user to perform
the necessary mental mapping from “potentially unwanted popups” to “adware” (a
variant of the bCanUseTheDamnThing problem), and not all users will be able to do that.
References
[1] “The mindlessness of ostensibly thoughtful action: The role of ‘placebic’ information in interpersonal interaction”, Ellen Langer, Arthur Blank, and Benzion Chanowitz, Journal of Personality and Social Psychology, Vol.36, No.6 (June 1978), p.635.
[2] “Gain-Loss Frames and Cooperation in Two-Person Social Dilemmas: A Transformational Analysis”, Carsten de Dreu and Christopher McCusker, Journal of Personality and Social Psychology, Vol.72, No.5 (1997), p.1093.
[3] “The framing of decisions and the psychology of choice”, Amos Tversky and Daniel Kahneman, Science, Vol.211, No.4481 (30 January 1981), p.453.
[4] “Framing of decisions and selection of alternatives in health care”, Dawn Wilson, Robert Kaplan, and Lawrence Schneiderman, Social Behaviour, No.2 (1987), p.51.
[5] “Prospect Theory: An Analysis of Decision under Risk”, Daniel Kahneman and Amos Tversky, Econometrica, Vol.47, No.2 (March 1979), p.263.
[6] “Against the Gods: The Remarkable Story of Risk”, Peter Bernstein, John Wiley and Sons, 1998.
[7] “Taxi Drivers and Beauty Contests”, Colin Camerer, Engineering and Science, California Institute of Technology, Vol.60, No.1 (1997), p.11.
[8] “Age of Propaganda: Everyday Use and Abuse of Persuasion”, Anthony Pratkanis and Elliot Aronson, W.H. Freeman and Company, 1992.
[9] “Design Principles and Patterns for Computer Systems That Are Simultaneously Secure and Usable”, Simson Garfinkel, PhD thesis, Massachusetts Institute of Technology, May 2005.
[10] “Phollow the Phlopping Phish”, Tom Liston, 13 February 2006, http://isc.sans.org/diary.html?storyid=1118.
[11] “In order to demonstrate our superior intellect, we will now ask you a question you cannot answer”, Raymond Chen, http://blogs.msdn.com/oldnewthing/archive/2004/04/26/120193.aspx, April 2004.
[12] “Phishing with Rachna Dhamija”, Federico Biancuzzi, 19 June 2006, http://www.securityfocus.com/columnists/407.
[13] “Inoculating SSH Against Address Harvesting”, Stuart Schechter, Jaeyeon Jung, Will Stockwell, and Cynthia McLain, Proceedings of the 13th Annual Network and Distributed System Security Symposium (NDSS’06), February 2006.
[14] “VeriSign digital certificates with Firefox”, Stuart Fermenick, posting to netscape.public.mozilla.crypto, 24 January 2006, message-ID 11tdkb65dm21r8f@corp.supernews.com.
[15] “Re: VeriSign digital certificates with Firefox”, Nelson Bolyard, posting to netscape.public.mozilla.crypto, 25 January 2006, message-ID ctudnWMsabyYdUre[email protected]rg.
[16] “Lightweight OCSP Profile for High Volume Environments”, Alex Deacon and Ryan Hurst, draft-ietf-pkix-lightweight-ocsp-profile-03.txt, January 2006.
[17] Ian Grigg, private communications.
[18] “How to make figures and presentations that are friendly to color blind people”, Masataka Okabe and Kei Ito, http://jfly.iam.u-tokyo.ac.jp/html/color_blind/.
[19] “Why Phishing Works”, Rachna Dhamija, J.D. Tygar, and Marti Hearst, Proceedings of the Conference on Human Factors in Computing Systems (CHI’06), April 2006, p.581.
[20] “Bank’s own developers a much bigger problem than browsers”, ‘mhp’, 18 July 2004, http://news.netcraft.com/archives/2004/07/18/banks_own_developers_a_much_bigger_problem_than_browsers.html.
[21] “How to Make Personalized Web Browsing Simple Secure and Anonymous”, Eran Gabber, Phillip Gibbons, Yossi Matias, and Alain Mayer, Proceedings of Financial Cryptography 1997 (FC’97), Springer-Verlag Lecture Notes in Computer Science No.1318, February 1997, p.17.
[22] “On Secure and Pseudonymous Client-Relationships with Multiple Servers”, Eran Gabber, Phillip Gibbons, David Kristol, Yossi Matias, and Alain Mayer, ACM Transactions on Information and System Security (TISSEC), Vol.2, No.4 (November 1999), p.390.
[23] “Consistent, Yet Anonymous, Web Access with LPWA”, Eran Gabber, Phillip Gibbons, David Kristol, Yossi Matias, and Alain Mayer, Communications of the ACM, Vol.42, No.2 (February 1999), p.42.
[24] “A Usability Study and Critique of Two Password Managers”, Sonia Chiasson, Paul van Oorschot, and Robert Biddle, Proceedings of the 15th Usenix Security Symposium (Security’06), August 2006, p.1.
[25] “Mobile phones ‘dumbing down brain power’”, Ben Quinn, Daily Telegraph, 15 July 2007, http://www.telegraph.co.uk/news/main.jhtml?xml=/news/2007/07/13/nbrain113.xml.
[26] “Phishing Social Networking Sites”, “RSnake”, 8 May 2007, http://ha.ckers.org/blog/20070508/phishing-social-networking-sites/.
[27] “A convenient method for securely managing passwords”, J. Alex Halderman, Brent Waters, and Ed Felten, Proceedings of the 14th International World Wide Web Conference (WWW’05), May 2005, p.471.
[28] “Stronger Password Authentication Using Browser Extensions”, Blake Ross, Collin Jackson, Nick Miyake, Dan Boneh, and John Mitchell, Proceedings of the 14th Usenix Security Symposium (Usenix Security’05), August 2005, p.17.
[29] “Design for Usability”, Bruce Tognazzini, in “Security and Usability: Designing Secure Systems That People Can Use”, O’Reilly, 2005, p.31.
[30] “Cryptographic security and key management systems — the nFast/KM solution”, nCipher Corporation, January 1998.
[31] “Building a Better Filter: How To Create a Safer Internet and Avoid the Litigation Trap”, Erin Egan and Tim Jucovy, IEEE Security and Privacy, Vol.4, No.3 (May/June 2006), p.37.
[32] OptInRealBig.com, LLC v. IronPort Systems, Inc, US District Court, Northern District of California, Oakland Division, case number 4:04-CV-01687-SBA, 2 September 2004.
[33] New.net, Inc. v. Lavasoft, U.S. District Court, Central District of California, case number CV 03-3180 GAF.
Security Usability Testing
Designing a usable security interface for an application is inherently difficult (even
more so than general user interface design) because of the high level of complexity in
the underlying security mechanisms, the nebulous nature of any benefits to the user,
and the fact that allowing the user to muddle through (a practice that’s sufficient for
most interfaces) isn’t good enough when they’re up against an active and malicious
adversary. You therefore need to get the user interface designers in on the process as
early as possible, and ensure that the interface drives the security technology and not
the other way round. Usability testing is a step that you can’t avoid, because even if
you choose not to do it explicitly, it’ll be done for you implicitly once your
application is released. The major difference is that if you perform the testing
explicitly, you get to control the testing process and the manner in which results are
applied, whereas if you leave it to the market to test, you’re liable to get test results
like “d00d, your warez SUCKS”, or even worse, a CERT advisory.
Usability testing is a two-phase process, pre-implementation testing (trying to figure
out what you want to build) and post-implementation testing (verifying that what you
eventually built — which given the usual software development process could be
quite different from what was planned — is actually the right thing). This section
covers both pre- and post-implementation testing of the security usability of an
application.
Pre-implementation Testing
Testing at the design stage (before you even begin implementation, for example using
a mock-up on paper or a GUI development kit) can be enormously useful in assessing
the users’ reactions to the interface and as a driver for further design effort [1].
Consider having the designers/developers play the part of the computer when
interacting with test users, to allow them to see what their planned interface needs to
cope with. Although users aren’t professional user interface designers, they are very
good at reacting to designs that they don’t like, or that won’t work in practice.
Looking at this from the other side, you could give users the various user interface
elements that you’ll need to present in your design and ask them to position them on
a blank page/dialog, and explain how they’d expect each one to work.
One thing to be aware of when you’re creating a paper prototype is to make sure that
it really is a paper prototype and not a polished-looking mock-up created with a GUI
building toolkit or drawing package. If you create a highly-polished design, people
will end up nitpicking superficial details and overlook fundamental design issues.
For example they might fixate on the colour and style of button placement rather than
questioning why the button is there in the first place [2]. If you like doing your UI
prototyping in Java, there’s a special pluggable napkin look-and-feel for Java that’ll
give your prototype the required scrawled-on-a-napkin look [3].
A useful design technique to get around the engineering-model lock-in is the “pretend
it’s magic” trick, in which you try and imagine how the interface would work if it was
driven by magic instead of whatever API or programming environment you’re
working with. Find a user (or users) and ask them how they’d want it to work,
without interrupting them every minute or two to tell them that what they’re asking
for isn’t possible and they’ll have to start again.
Another useful trick is to sit down and write down each step of the process that you’ll
be expecting users to perform so that you can see just how painful it (potentially) is in
practice. Carrying out this exercise would have quickly helped identify the
unworkability of the certificate-enrolment process described in a previous section.
Stereotypical Users
A useful pre-implementation testing technique is to imagine a stereotypical end user
(or several types of stereotypical users if this applies) and think about how they’d use
the software. What sort of things would they want to do with it? How well would
they cope with the security mechanisms? How would they react to security
warnings? The important thing here is that you shouldn’t just add a pile of features
that you think are cool and then try and figure out how to justify their use by the end
user, but that you look at it from the user’s point of view and add only those features
that they’ll actually need and be able to understand. When left to their own devices,
developers tend to come up with self-referential designs where the category of “user”
doesn’t extend any further than people very much like the developer [4].
There’s a particular art to the use of stereotypical users, which usability designer Alan
Cooper covers in some detail in his book The Inmates are Running the Asylum. In
choosing your user(s), it’s important to recreate as much of a real person as you can:
give them names, write a short bio for them, and try and find a representative photo
(for example from a magazine or online) that allows you to instantly identify with
them. The more specific you can be (at least up to a point), the better.
The reason for this specificity is that a generic cardboard-cut-out user (sometimes
referred to as “the elastic user”) is far too flexible to provide a very real test of the
user interface. Need to choose a key storage location? No problem, the user can
handle it. Need to provide an X.500 distinguished name in a web form? Sure, the
user can do that. On the other hand 70-year-old Aunty May, whose primary use for
her computer is to spam her relatives with emailed jokes, will never go for this.
Designing for the elastic user gives you a free hand to do whatever you feel like while
still appearing to serve “the user”. Creating a user who’s as close as possible to a real
person (not necessarily an actual person, just something more concrete than a
cardboard cut-out) on the other hand lets you directly identify with them and put their
reactions to your user interface design into perspective. How would Aunty May
handle a request for a public/private key pair file location? By turning off the
computer and heading out to do a spot of gardening. Time to rethink your design.
A similar problem occurs with people planning for or deploying security technology.
In this case the elastic user becomes a nebulous entity called “the IT department”,
which takes care of all problems. Take all of the points raised in the previous
paragraph and substitute “the IT department can formulate a policy to cover it” for
“the user can handle it” and you can see where this type of thinking leads. Only a
few large corporations can afford the luxury of having an IT department define
policies for every security eventuality, and even then truly effective policies usually
only appear after a crisis has occurred. For everyone else, they are the IT
department, leading to farcical situations such as Aunty May sitting at her home PC
in front of a dialog box telling her to contact her system administrator for help.
Note though that you should never employ the technique of stereotypical users as a
substitute for studying real users if such access is available. An amazing amount of
time is wasted at the design stage of many projects as various contributors argue over
what users might in theory do if they were to use the system, rather than simply going
to the users and seeing what they actually do.
All too frequently, user interfaces go against the user’s natural expectations of how
something is supposed to work. For example a survey of a range of users from
different backgrounds on how they expected public keys and certificates to be
managed produced results that were very, very different from how X.509 says it
should be done, suggesting at least one reason for X.509’s failure to achieve any real
penetration [5].
A final useful function provided by the stereotypical user is that they act as a sanity
check for edge cases. The unerring ability of geeks to home in on small problems
(and then declare the entire approach un-workable because of the corner case they’ve
thought up) has already been covered. This problem is so significant that the
developers of the ZoneAlarm firewall, which has a design goal of being (among other
things) “an application your mom could use”, have made an explicit design decision
for their product not to sacrifice common-use cases for obscure corner cases that
may never happen [6].
Geeks have major problems distinguishing possibility from probability. To a geek
(and especially a security geek) a probability of one in a million is true. To a
cryptographer, a probability of 1 in 2^56 or even 1 in 2^80 (that’s 1 in 1.2 million
million million million, a one followed by twenty-four zeroes) is true. To anyone else
(except perhaps Terry Pratchett fans), a one-in-a-million chance is false — there’s a
possibility of it being true, but the actual probability is minuscule to the point of
irrelevance. Personas provide a sanity check for such edge cases. Yes, this is a
special case, but would Aunty May ever want to do that?
Input from Users
Asking users how they think that something should work is an extremely useful
design technique. Consider the question of storing users’ private keys. Should they
be stored in one big file on disk? Multiple files? In the registry (if the program is
running under Windows)? On a USB token? In their home directory? In a hidden
directory underneath their home directory? What happens if users click on one of
these files? What if they want to move a particular key to another machine? How
about all of their keys? What happens when they stop using the machine or account
where the keys are stored? How are the keys protected? How are they backed up?
Should they even be backed up?
All of these questions can be debated infinitely, but there’s a far simpler (and more
effective) way to resolve things. Go and ask the users how they would expect them to
be done. Many users won’t know, or won’t care, but eventually you’ll see some sort
of common model for key use and handling start to appear. This model will be the
one that most clearly matches the user’s natural expectations of how things are
supposed to work, and therefore the one that they’ll find the easiest to use. The
problems that can occur when an application doesn’t meet users’ expectations for key
storage were illustrated in one PKI-based tax filing scheme where users weren’t able
to figure out how key storage worked and solved the problem by requesting a new
certificate at each interim filing period (two months). This resulted in an enormous
and never-ending certificate churn that completely overloaded the ability of the
certificate-issuing process to handle it, and led to unmanageably large CRLs.
Testing by asking users for input has been used for some years by some companies
when developing new user interface features. For example in the early 1980s
whenever a new interface feature was implemented for the Apple Lisa, Apple
developer Larry Tesler would collar an Apple employee to try it out. If they couldn’t
figure it out, the feature was redesigned or removed.
Another advantage of asking users what they want is that they frequently come up
with issues that the developers haven’t even dreamed about (this is why writers have
editors to provide an external perspective and catch things that the writers themselves
have missed). If you do this though, make sure that you occasionally refresh your
user pool, because as users spend more and more time with your interface they
become less and less representative of the typical user, and therefore less able to pick
up potential usability problems.
When you ask users for input, it’s important to ask the right users. Another problem
that the PKI tax filing scheme mentioned above ran into was the difference between
the claimed and the actual technology level of the users. When various managers
were surveyed during the requirements process, they all replied that their staff had the
latest PCs on their desks and were technology-literate. In other words the managers
were describing themselves. In actual fact the people doing the tax filing were, as
one observer put it, “little old ladies sitting in front of dusty PCs with post-it notes
telling them what to do stuck to the monitor”. The post-it notes contained paint-by-
numbers instructions for the tax filing process, and as soon as one of the post-its
didn’t match what was on the screen, the users called the help desk. The result was
that most of the electronic filing was being done by proxy by the helpdesk staff, and
the system haemorrhaged money at an incredible rate until it was finally upgraded
from electronic back to paper-based filing.
The importance of going directly to the end users (rather than relying on testimony
from their superiors) can’t be over-emphasised. No manager will ever admit that
their employees aren’t capable of doing something (it would make the manager look
bad if they did), so the response to “can your people handle X” is invariably “yes”,
whether they really can or not. I once went into the paging centre at a large hospital,
where messages to and from doctors are dispatched, to talk to the staff about their
requirements. After a few minutes there I was somewhat
disturbed to discover that this was the first time that anyone had ever asked the users
what they actually needed the software to do for them. In the entire lifetime of the
hospital, no-one had ever asked the users what they needed! Needless to say, using
the software was a considerable struggle (it was an extreme example of task-directed
design), and even a preliminary set of minor changes to the interface improved the
users’ satisfaction considerably.
Post-implementation Testing
Once you’ve finished your application, take a few non-technical people, sit them in a
room with a copy of the software running, and see how they handle it. Which parts
take them the longest? At what points do they have to refer to the manual, or even
ask for help? Did they manage to get the task done in a secure manner, meaning that
their expectations of security (not just yours) were met? Can a section that caused
them problems be redesigned or even eliminated by using a safe default setting? Real
testing before deployment (rather than shipping a version provisionally tagged as a
beta release and waiting for user complaints) is an important part of the security
usability evaluation process.
Logging of users’ actions during this process can help show up problem areas, either
because users take a long time to do something or because their actions generate
many error messages. Logging also has the major advantage that (except for privacy
concerns) it’s totally non-invasive, so that users can ignore the logging and just get on
with what they’re doing. Microsoft used logging extensively in designing the new
interface for Office 2007/Office 12, analysing 1.3 billion Office 2003 sessions in
order to determine what users were and weren’t using [7].
Logging can also help reveal discrepancies between users’ stated behaviour and their
actual behaviour. Personal firewall maker ZoneAlarm carried out user surveys in
which users across a wide range of skill levels were unanimous in stating that they
wanted to be involved in every decision made by the firewall software, but analysis of
actual user behaviour showed the exact opposite — users wanted to know that they
were being protected, but didn’t want to be bothered with having to make the decision
each time [8].
There’s another useful litmus test that you can use for your post-implementation
testing to find potential security weaknesses. Imagine that your application has been
deployed for a while and there’s been a report of a catastrophic security failure in it.
Yes, we know that your application is perfect in every way, but somehow some part
of it has failed and the only error information that you have to work with is the report
that it failed. Where do you think the failure was? How would you fix it?
This type of analysis is an interesting psychological technique called a premortem
strategy [9]. The US Navy gave it the name “crystal-ball technique” in its review of
decision-making under stress that occurred after the erroneous shootdown of a
civilian airliner by the USS Vincennes [10]. In the Navy version, people are told to
assume that they have a crystal ball that’s told them that their favoured hypothesis is
incorrect, so that they have to come up with an alternative explanation for an event.
This is also one of the techniques used to try to combat cognitive bias that was
mentioned in a previous chapter in the discussion of the CIA analyst training manual.
No matter what you call it, premortem analysis compensates for the overconfidence
that anyone who’s intimately involved in a work’s creation develops over extended
exposure to it. If you ask a designer or programmer to
review their application, their review will be rather half-hearted, since they want to
believe that what they’ve created is pretty good. The premortem strategy helps them
break their emotional attachment to the project’s success and objectively identify
likely points of failure. Real-world testing has shown that it takes less than ten
minutes for failures and their likely causes to be discovered [11].
User Testing
User interface design is usually a highly iterative process, so that the standard
{ design, implement, test } cycle described above probably won’t be enough to shake
out all potential problems, particularly in the case of something as complex and hard
to predict as security user interface design. Instead of a single cycle, you may need to
use multiple cycles of user testing, starting with a relatively generic design
(sometimes known as low-fi prototyping) and then refining it based on user feedback
and experience.
This testing process needn’t be complex or expensive. Usability expert Jakob Nielsen
has shown that once you go beyond about five users tested, you’re not getting much
more information in terms of usability results [12]. This phenomenon occurs because
as you add more and more users, there’s increasing overlap in what they do, so that
you learn less and less from each new user that you add. So if you have (say) 20 test
users, it’s better to use them in four different sets of tests on different versions or
iterations of the interface than to commit all 20 to a single test. A variation of this
situation occurs when a single group contains highly distinct subgroups of users, such
as one where half the users are technical and the other half are non-technical. In this
case you should treat each subgroup as a separate unit for 5-user test purposes, since
they’re likely to produce very different test results.
A useful tool to employ during this iterative design process is to encourage users to
think out loud as they’re using the software. This verbalisation of users’ thoughts
helps track not just what users are doing but why they’re doing it, allowing you to
locate potential stumbling blocks and areas that cause confusion. Make sure though
that you actually analyse a user’s comments about potential problems. If a user
misses an item in a dialog or misreads a message, they may end up in trouble at some
point further down the road, and come up with complex rationalisations about why
the application is broken at the point where they realise that they’re in trouble, rather
than at the point where they originally made the error.
Note also that the very act of verbalising (and having to provide an explanation for)
their actions can make a user think much more about what they’re doing, and as a
result change their behaviour. Tests with users have shown that they’re much better
at performing a user interface task when they’re required to think out loud about what
they’re doing.
To get around this, you can allow the user to perform less thinking out loud, and
instead prompt them at various points for thoughts on what they’re doing. Asking
questions like “What do you expect will happen if you do this?” or “Is that what you
expected would happen?” are excellent ways of turning up flawed assumptions in
your design.
A variation of thinking out loud is constructive interaction, in which two users use a
system together and comment on each other’s actions (imagine your parents sitting in
front of their PC trying to figure out how to send a photo attachment via their Hotmail
account). This type of feedback-gathering is somewhat more natural than thinking
out loud, so there’s less chance of experimental bias being introduced.
Another trick that you can use during user interface testing is to insert copier’s traps
into the interface to see if users really are paying attention. Copier’s traps are little
anomalies inserted into maps by mapmakers that allow them to detect if a competitor
has copied one of their maps, since a map prepared from original mapping data won’t
contain the fictitious feature shown in the trap.
You can use the same technique in your user interface to see if users really have
understood the task that they’re performing or whether they’re just muddling through.
In a standard application, muddling through a task like removing red-eye from a
photo is fine as long as the end result looks OK, but in a security context with an
active and malicious adversary it can be downright dangerous even if the result does
appear to be OK. Adding a few copier’s traps during the testing phase will tell you
whether the interface really is working as intended, or whether the user has simply
managed to bluff their way through.
Testing/Experimental Considerations
Evaluating the actual effectiveness of security mechanisms (rather than just
performing basic usability testing) is a bit of an art form, because you need to
determine not just whether a user can eventually muddle their way through your user
interface but whether they’re secure when they do so. The most important factor that
you need to take into consideration when you’re evaluating the security effectiveness
of your application is the issue of pollutants in the evaluation methodology.
Pollutants are an undesired contaminant that affects the outcome of the evaluation.
For example if you tell your test users that you’re evaluating the security of the
system, they’ll tend to be far more cautious and suspicious than they’d normally be.
On the other hand if you tell them that you’re evaluating the usability aspects of your
application, they’ll assume that they’re running in a benign environment since
they’ve been asked specifically to comment on usability and by implication not to
worry about security. As a result, they’ll be less cautious than they’d normally be.
This is a bit of a tricky problem to solve. Perhaps the best option is to tell the users
that you’re evaluating the effectiveness specifically of the security user interface
(rather than the security of the application as a whole), or the effectiveness of the
workflow, which is halfway between instilling too much and too little paranoia.
Another problem arises with your choice of data for testing. If you give users what’s
obviously artificial test data to play with, they’ll be less careful with it than they
would with real data like their own account credentials or credit card information.
One strategy here is to use real user data, under the guise of usability or workflow
evaluation, but be very careful to never record or store any of the information that’s
entered. Even this can be problematic because users might feel apprehensive about
the use of real data and instead invent something that they can be relatively careless
with. One workaround for this is to load a small amount of value onto their credit
card (if that’s what you’re testing) and let them spend it, which both guarantees that
they’re using their real credit card information and encourages them to be careful
with what they do with it. Using real data is only safe though if you can carefully
control the environment and ensure that no data ever leaves the local test setup, which
can be difficult if the evaluation requires interacting with and providing credentials to
remote servers.
Another approach that’s been used with some success in the past is to have the users
play the role of a person who’d need to interact with the security features of the
application as part of their day-to-day work. For example the ground-breaking
evaluation of PGP’s usability had users play the role of an election campaign
manager running an election via PGP in the presence of hostile opposition campaign
organisers [13].
Another aid to helping participants get into the spirit of things is to allow them to
wager small amounts on the outcome of their actions, a technique that’s frequently
used in various forms in psychological experiments. Not only does this incentivise
participants to take the whole thing more seriously, but it also provides a good
indication of their level of confidence in what they’ve done. Someone may claim that
they’re certain that they’ve acted securely, but it’s the actual value that they’re
prepared to attach to this assertion that’s the best indicator of how they really feel
about it.
Once the evaluation is over, you may need to debrief the participants. The exact level
at which you do this depends on the overall formality of the evaluation process. If
you’re just asking a few colleagues to play with the user interface and give their
opinions then perhaps all you need to do is reassure them that no sensitive data was
logged or recorded and, if you’re a manager, that this isn’t going to appear on their
next performance review.
If it’s a more formal evaluation, and in particular one with outside participants, you’ll
need to perform a more comprehensive debriefing, letting the participants know the
purpose of the evaluation and explaining what safeguards you’ve applied to protect
any sensitive data. As a rule of thumb, the more formal the evaluation and the more
public the participation, the more careful you have to be about how you conduct it.
For general evaluations you’ll need to take various legal considerations into account
[14], and for academic research experimentation there are also ethical considerations
[15].
A problem with this follow-the-rules-to-the-letter approach is that you can find
yourself terminally bogged down in red tape every time you want to determine the
effects of moving the position of a checkbox in a security dialog. One rule of thumb
that you can use in determining how formal you need to make things is to use the
analogy of borrowing someone’s car. If you want to borrow your brother’s car, it’s
just a case of picking up the keys. If you want to borrow a friend or neighbour’s car,
it may take a little reassurance and persuasion. If you borrow a stranger’s car it’ll
take an exchange of money, filling in a rental agreement, and proof of fitness to drive
and creditworthiness. Roughly the same scaling applies to user evaluation tests,
depending on whether you’re performing the testing using a friend or colleague,
someone slightly more distant, or a complete stranger.
Other Sources of Input
One of the most valuable but at the same time one of the most under-utilised sources
of user input is the contents of user support calls, email, and web forums. If any part
of the user interface receives more than its share of user support inquiries then that’s a
sign that there’s a problem there that needs to be resolved. Customer support
channels constitute the largest and cheapest usability-testing lab in the world. These
are real users employing your software in real situations, and providing you with
feedback at no cost. While they can’t replace a proper usability testing lab, they can
provide a valuable adjunct to it, and for smaller organisations and in particular open-
source developers who can’t afford a full-blown usability lab they’re often the next-
best thing.
The cryptlib development process has benefited extensively from this feedback
mechanism, allowing areas that caused problems for users to be targeted for
improvement. A result of this user-driven development process has been that many
usability obstacles have been removed (or at least moderated), an effect that can be
measured directly by comparing the number of user requests for help on the cryptlib
support forum with the number on similar fora such as the OpenSSL mailing list. A
convenient side-effect of this type of usability refinement is that it significantly
reduces the user support load for the product developers.
Usability Testing Examples
This section presents a number of case studies of security usability problems that
were turned up by user testing. Unfortunately almost all of the testing was reactive
rather than proactive and has resulted in few changes to products either because it’s
too late to fix things now or because the affected organisations aren’t interested in
making changes. As well as providing interesting usability case studies, these
examples could be seen as a strong argument for pre-release testing.
Encrypted Email
An example of the conflict between user expectations and security design was turned
up when security usability studies showed that email users typically weren’t aware
that (a) messages can be modified as they move across the Internet, (b) encrypting a
message doesn’t provide any protection against such modification, and (c) signing a
message does protect it. The users had assumed that encrypting a message provided
integrity protection but signing it simply appended the equivalent of a pen-and-paper
signature to the end of it [16]. Furthermore, users often have a very poor grasp of the
threat model for email, assuming for example that the only way to spoof email is to
break into someone’s account and plant it there [17].
A similar gap in the understanding of what the crypto provides was found in a survey
of SSL users, with more than a third of respondents indicating that as far as they were
aware SSL (as used to secure web sites) didn’t protect data in transit [18].
Real-world testing and user feedback is required to identify these issues so that they
can be addressed, for example by explaining signing as protecting the message from
tampering rather than the easily-misunderstood “signing”. Similarly, the fact that
encryption doesn’t provide integrity protection can be addressed either at the user
interface level by warning the user that the encrypted message isn’t protected from
modification (trying to “fix” the user), or at the technical level by adding an MDC
(modification detection code) inside the encryption layer or a MAC (message
authentication code) outside it (actually fixing the problem). Of these two, the latter
is the better option since it “fixes” the encryption to do what users expect without
additionally burdening the user. This is the approach taken by OpenPGP, which
added a SHA-1 hash to the encrypted data (S/MIME doesn’t appear to be interested
in fixing this). Modifying the application to do what the user wants is always
preferable to trying to modify the user to do what the application wants.
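As a concrete illustration of fixing the crypto rather than the user, the following minimal sketch (Python, assuming the third-party cryptography package is installed) uses the AES-GCM authenticated-encryption mode, which builds the integrity check into the encryption itself. If anything in the ciphertext is modified, decryption simply fails, so there’s no tampering warning for the user to notice, interpret, or ignore.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)              # must be unique for each message
    aesgcm = AESGCM(key)

    ciphertext = aesgcm.encrypt(nonce, b"Please pay account 12-3456-789", None)

    # An attacker flips one bit of the message in transit...
    tampered = bytearray(ciphertext)
    tampered[5] ^= 0x01

    try:
        aesgcm.decrypt(nonce, bytes(tampered), None)
    except InvalidTag:
        # The library refuses to hand back modified plaintext; no user
        # decision is required.
        print("Message rejected: integrity check failed")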
Browser Cookies
Another example of a problem that would have been turned up by post-
implementation testing occurs with the handling of cookies in browsers. This has
slowly (and painfully) improved over the years from no user control over what a
remote web site could do to rather poor control over what it could do. The reason for
this was that cookies are a mechanism designed purely for the convenience of the
remote site to make the stateless HTTP protocol (slightly) stateful. No-one ever
considered the consequences for users, and as a result it’s now extremely hard to fix
the problem and make the cookie mechanism safe [19][20]. For example once a
browser connects to a remote site, it automatically sends any cookies it has for the
site to the remote server instead of requiring that the server explicitly request them.
While more recent browsers allow users to prevent some types of cookies from being
stored, it’s not the storage that’s the problem but their usage by the remote system,
and the user has no control over that since changing current browsers’ behaviour
would require the redesign of vast numbers of web sites. Similarly, while in recent
browsers users have been given the ability to selectively enable storage of cookies
from particular sites, clearing them afterwards is still an all-or-nothing affair. There’s
no way to say “clear all cookies except for the ones from the sites I’ve chosen to
keep”.
Another problem arises because of the way that the browser’s user interface presents
cookie management to users. Recent versions of Internet Explorer have grouped
cookies (or at least the option to clear cookies) with the options for the browser
cache, with both being covered by explanatory text indicating that they speed up
browsing. As a result, many users who were technical enough to know about cookies
believed that they were used primarily to speed up web browsing, often confusing them
with the browser cache [18]. Since few people are keen to deliberately slow down
their web browsing, there’s a reluctance to use a browser’s cookie management
facilities to delete the cookies.
Other, more sophisticated cookie-management techniques based on social validation
(something like eBay’s seller feedback ratings) have also been proposed [21],
although it’s not certain whether the required infrastructure could ever be deployed in
practice, or how useful the ratings would actually be in the light of the dancing-
bunnies problem.
With more testing of the user side of the cookie mechanism, it should have been
obvious that having the user’s software volunteering information to a remote system
in this manner was a poor design decision. Now that usability researchers have
looked at it and pointed out the problems, it’s unfortunately too late to change the
design.
(Note that fixing cookies wouldn’t have solved the overall problem of site control
over data stored on the user’s machine, because there are cookie-equivalent
mechanisms that sites can use in place of cookies, and these can’t be made safe (or at
least safer) in the way that cookies can without significantly curtailing browser
operations. For example the browser cache operates in somewhat the same way as
cookies, allowing site-controlled data to be temporarily stored on the user’s machine.
By setting the Last-Modified field in the header (which is required in order for
caching to work) and reading it back when the browser sends its If-Modified-Since in
future requests, a server can achieve the same effect as storing a cookie on the client’s
machine. There are other tricks available to servers if the client tries to sidestep this
cache-cookie mechanism [22], and the capabilities provided can be quite
sophisticated, acting as a general-purpose remote memory structure rather than their
originally intended basic remote-state-store [23]. So even with a better user interface
and a fixed design that makes the cookie client-controlled, malicious servers will
always have a cookie-like mechanism available to them).
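As a purely illustrative sketch (not taken from the cited work, and far cruder than the real attacks), here’s roughly how a server could use the Last-Modified/If-Modified-Since mechanism as a cookie substitute using nothing more than Python’s standard http.server:

    from http.server import BaseHTTPRequestHandler, HTTPServer
    import time

    class CacheCookieHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            tag = self.headers.get("If-Modified-Since")
            if tag:
                # Returning visitor: the timestamp they echo back is the
                # identifier that was planted earlier.
                self.log_message("recognised client tag %s", tag)
                self.send_response(304)   # keeps the cached copy (and the tag) alive
                self.end_headers()
            else:
                # New visitor: encode an identifier as a bogus modification time.
                client_id = int(time.time()) % 10**9
                self.send_response(200)
                self.send_header("Last-Modified", self.date_time_string(client_id))
                self.send_header("Content-Type", "text/plain")
                self.end_headers()
                self.wfile.write(b"hello")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), CacheCookieHandler).serve_forever()

A real attack would also set suitable caching headers and encode rather more than a timestamp, but the principle is the same: the client faithfully echoes server-chosen state back on every revalidation.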
Key Storage
Post-implementation testing can often turn up highly surprising results arising from
issues that would never have occurred to implementers. A representative example
from outside the security world occurred in the evolution of what we’re now familiar
with as the ‘OK’ button, which in its early days was labelled quite differently since it
was felt that ‘OK’ was a bit too colloquial for serious computer use. In 1981 when
Apple was performing early user testing on the nascent Macintosh user interface, the
button was labelled ‘Do It’. However, the testing revealed that ‘Do It’ was a bit too
close visually to ‘Dolt’, and some users were becoming upset that a computer touted
for its user-friendliness was calling them dolts [24]. The designers, who knew that
the text said Do It because they were the ones who had written it, would never have
been able to see this problem because they knew a priori what the text was meant to
say. The alternative interpretation was only revealed through testing with users
uncontaminated by involvement in the Macintosh design effort.
Getting back to the security world, the developers of the Tor anonymity system found
that Tor users were mailing out their private keys to other Tor users, despite the fact
that they were supposed to know not to do this. Changing the key filename to include
a secret_ prefix at the front solved the problem by making it explicit to users that
this was something that shouldn’t be shared [25]. PGP solves the problem in a
similar manner by only allowing the public key components to be exported from a
PGP keyring, even if the user specifies that the PGP private keyring be used as the
source for the export.
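The same safeguard is easy to apply at the API level. The following hypothetical sketch (using the Python cryptography package, and not PGP’s actual code) shows an export routine that only ever serialises the public half of a key, no matter what it’s given:

    from cryptography.hazmat.primitives.asymmetric import rsa
    from cryptography.hazmat.primitives import serialization

    def export_key(key) -> bytes:
        """Export a key for sharing; private material is never written out."""
        if isinstance(key, rsa.RSAPrivateKey):
            key = key.public_key()      # silently drop the private half
        return key.public_bytes(
            encoding=serialization.Encoding.PEM,
            format=serialization.PublicFormat.SubjectPublicKeyInfo,
        )

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pem = export_key(private_key)       # safe to mail to anyone
    assert b"PRIVATE" not in pem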
Conversely, Windows/PKCS #12 takes exactly the opposite approach, blurring any
distinction between the two in the form of a single “digital identity” or PKCS
#12/PFX file, so that users are unaware that they’re handing over their private keys as
part of their digital identity (one paper likens this practice to “pouring weed killer into
a fruit juice bottle and storing it on an easily accessible shelf in the kitchen
cupboard”) [26]. The term “digital identity” is in fact so meaningless to users that in
one usability test they weren’t able to usefully explain what it was after they’d used it
for more than half an hour [27]. Think about this yourself for a second: Excluding
the stock response of “It’s an X.509 certificate”, how would you define the term
“digital identity”?
Another issue with private keys held in crypto tokens like USB keys or smart cards
involves how users perceive these devices. In theory, USB tokens are superior to
smart cards in every way: They’re a more convenient form factor, less physically
fragile, easier to secure (because they’re not limited to the very constrained smart
card form factor), more flexible through the ability to add additional circuitry, don’t
require a separate reader, and so on. Smart cards have only a single advantage over
USB tokens: the USB tokens are (conceptually) very close to standard keys,
which get shared among members of the family, lent to relatives or friends who may
be visiting, or left with the neighbours so that they can feed the cat and water the
plants when the owners are away.
Smart cards, when the correct measures are used, don’t have this problem. If you
take a smart card and personalise it for the user with a large photo of the owner, their
name and date of birth, a digitised copy of their signature, and various extras like a
fancy hologram and other flashy bits, they’ll be strongly inclined to guard it closely
and highly reluctant to lend it out to others. The (somewhat unfortunate) measure of
making the card an identity-theft target ensures that it’ll get looked after better than
an anonymous USB token.
Banking Passwords
The threat model for passwords on the Internet is quite different from the historic
threat model, which dates back to the 1960s with users logging onto centralised
mainframes via dedicated terminals. In this mainframe environment, the attacker
keeps trying passwords against a user account until they guess the right one. What
this means is that the user name stays constant and the password varies. The defence
against this type of attack is the traditional “three strikes and you’re out” one in which
three incorrect password attempts set off alarms, cause a delay of several minutes
before you can try again, or in the most paranoid cases lock the account, a marvellous
self-inflicted denial-of-service attack.
That was the threat model (and corresponding defence) from forty years ago. Today,
the threat is quite different. In response to the three-strikes-and-you’re-out defence,
attackers are keeping the password constant and varying the user name instead of the
other way round. With a large enough number of users, they’ll eventually find a user
that’s using the password that they’re trying against each account, and since they only
try one password per account (or more generally a value less than the lockout
threshold), they never trigger the defence mechanisms. This is a 21st-century Internet
attack applied against an anachronistic threat model that hasn’t really existed for
some decades.
Consider the following example of this attack, based on research carried out on the
Norwegian banking system in 2003-4 [28]. The Norwegian banks used a standard
four-digit PIN and locked the account after the traditional three attempts. Using the
fixed-PIN/varying-userID approach, an attacker would be able to access one account
out of every 220,000 tried (a botnet would be ideal for this kind of attack). On the
next scan with a different PIN, they’d get another account. Although this sounds like
an awfully low yield, the only human effort required is pointing a botnet at the target
and then sitting back and waiting for the results to start rolling in. The banking
password authentication mechanisms were never designed to withstand this type of
attack, since they used as the basis for their defence the 1960s threat model that
works just fine when the user is at an ATM.
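A back-of-the-envelope calculation shows why the economics favour the attacker. The numbers below are purely illustrative (they assume uniformly-distributed PINs rather than the actual parameters of the Norwegian system), but the shape of the result is the same:

    pin_space = 10_000            # four-digit PIN, assumed uniformly distributed
    accounts_scanned = 1_000_000  # a modest botnet's worth of targets
    guesses_per_account = 1       # always below the three-strikes threshold

    expected_hits = accounts_scanned * guesses_per_account / pin_space
    print(f"Expected compromised accounts per scan: {expected_hits:.0f}")  # ~100

No alarm ever triggers, because no individual account sees more than a single failed attempt.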
There are many variants of this attack. Some European banks use dynamic PIN
calculators, which generate a new time-based or pseudorandom-sequence based PIN
for each logon. In order to accommodate clock drift or a value in the sequence being
lost (for example due to a browser crash or network error), the servers allow a
window of a few values in either direction of the currently expected value. As with
the static-password model, this works really well against an attacker that tries to
guess the PIN for a single account, but really badly against an attacker that tries a
fixed PIN across all accounts, because as soon as their botnet has hit enough accounts
they’ll come up a winner.
For all of these attacks (and further variations not covered here), a basic level of post-
release analysis would have uncovered the flaw in the threat model. Unfortunately
the testing was only performed some years later by academic researchers, and the
affected organisations mostly ignored their findings [28].
Password Managers
The previous chapter looked at the use of strengthened password mechanisms to
protect users’ passwords, and mentioned that facilities of this type are already
available in some cases, typically as plugins for the Firefox web browser. How do
these plugins stand up in practice? A usability study of two popular password
managers, PwdHash and Password-Multiplier, found that they fall far short of their
authors’ expectations due to a variety of user interface problems [29].
The biggest problem with these browser plugins is that they are exactly that, browser
plugins. The lack of integration into the browser created almost all of the usability
problems that users experienced. For example if the plugin wasn’t installed or was
bypassed by a malicious web page using Javascript or a similar technique, the user
would end up entering their master password on a remote login page instead of
having the plugin provide a site-specific random password.
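The underlying idea of these tools is simple, and is worth seeing in miniature. The following sketch illustrates the general principle of deriving a per-site password from a master password and the site’s domain (it’s an illustration of the concept only, not PwdHash’s or Password Multiplier’s actual algorithm):

    import base64
    import hashlib
    import hmac

    def site_password(master_password: str, domain: str, length: int = 12) -> str:
        # The per-site password is an HMAC of the domain keyed with the master
        # password, so one memorised secret yields a different, unguessable
        # password at every site.
        digest = hmac.new(master_password.encode(),
                          domain.lower().encode(),
                          hashlib.sha256).digest()
        return base64.b64encode(digest).decode()[:length]

    print(site_password("correct horse battery staple", "example.com"))
    print(site_password("correct horse battery staple", "phish.example"))

The security of the scheme stands or falls on the master password only ever being fed into this derivation step, which is exactly the property that the plugins’ awkward activation mechanisms failed to guarantee.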
This reiterates an important point that’s already been made elsewhere: In order to be
effective, a security measure has to be a native part of the underlying application. It
has to be present and active at all times. It can’t be an optional add-on component
that may or may not be currently active, or for which users have to expend conscious
effort to notice its presence, because they simply won’t notice its absence (see the
earlier discussion on the psychological aspects of the security user interface for more
on this problem).
A second problem with the lack of direct integration was that the add-on nature of the
browser plugins led to complex and awkward interaction mechanisms, since the
plugins had no direct access to browser internals. A direct consequence of this
awkwardness was that only one of the five tasks that users were asked to
complete in the study had a success rate over 50%, with failure rates being as high as
84%. Alarmingly, one of the failure modes that was revealed was that users tried
entering every password they could think of when they couldn’t access the site using
the plugin.
For the plugin tested in the usability study, users were required to use special
attention-key sequences like ‘@@’ or Alt-P or F2 to activate the security
mechanisms, and these were only effective if the cursor was already present in the
password text fields. Users either forgot to use the attention sequence, got it wrong,
or used it at the wrong time. They therefore found it very hard to tell
whether they’d successfully activated and applied the plugin security mechanisms,
and several said that if they hadn’t been participating in a study they’d have long
since signed up for a new account with a standard password rather than struggle
further with the password-manager plugins.
These problems came about entirely because of the need to implement the security
features as a plugin. If they’d been built directly into the browser, none of this would
have occurred.
Another interesting feature that was turned up by the user testing was that people
were profoundly uneasy about the fact that they no longer knew the passwords that
they were using, leading to complaints like “I wish it would show me my password
when it first generates it. I won’t lose it or share it!” [29]. This loss of control
negatively affected users’ perceptions of the password manager. One way of
mitigating this problem, already provided by the rudimentary password-saving
features built into existing browsers, is to display the password when the user
requests it. This helps fight the users’ perception that they’ve lost control of their
passwords when they let the password manager handle them.
File Sharing
A similar problem to the Tor one was turned up by post-implementation testing of the
Kazaa file-sharing application (“post-implementation testing” in this case means that
after the software had been in use for a while, some researchers went out and had a
look at how it was being used) [30]. They found that Kazaa exhibited a considerable
number of user interface problems, with only two of twelve users tested being able to
determine which files they were sharing with the rest of the world. Both design
factors and the Kazaa developers’ lack of knowledge of user behaviour through pre-
or post-implementation testing contributed to these problems. For example Kazaa
manages shared files through two independent locations, via the “Shared Folders”
dialog box and the “My Media” downloads folder. Items that were shared through
one weren’t reflected in the other, so if a user chose to download files to their
Windows C: drive, they inadvertently shared the entire drive with other Kazaa users
(!!!) without the “Shared Folders” dialog indicating this. The number of users caught
out by this was indicated by over four hundred sample searches carried out in a period
of twelve hours, with 61% of the searches returning hits for Kazaa users’ Outlook
Express mail files, a representative file that would never (knowingly) be shared with
the rest of the world. Possibly in response to this, Apple’s security usability
guidelines explicitly warn developers that “if turning on sharing for one file also lets
remote users read any other file in the same folder the interface must make this clear
before sharing is turned on” [31].
Another file-sharing study, which looked only for banking files, found large numbers
of files containing sensitive banking information being inadvertently shared by bank
employees [32]. Kazaa’s poor default settings have even led one lawyer to comment
that it offers “no reasonable expectation of privacy” [33].
An aspect of user behaviour that was unanticipated by the Kazaa developers was the
fact that users were in general unaware that sharing a folder (directory) would share
the contents of all of the subdirectories beneath it, and were also unaware that sharing
a folder shared all of the files in it rather than just a particular file type such as music
files. Part of this problem was again due to the user interface design, where clicking
on a parent folder such as “My Documents” (which is automatically recommended
for sharing by Kazaa when it’s set up) gave no indication that all files and subfolders
beneath it would also be shared.
As with the mismatch of user expectations over message encryption that were
covered earlier, there are two ways to address this problem. The first is to attempt to
“fix” the user by warning them that they’re sharing all files and subdirectories, an
action that the previous sections have shown is likely to have little effect on security
(users will satisfice their way past it — they want to trade files, not read warnings).
A much better approach uses activity-based planning to avoid ever putting the user in
a situation where such a warning is necessary. With this style of interface, the user is
given the option to share music in the current folder, share pictures in the current
folder, share movies in the current folder (let’s face it, Kazaa isn’t used to exchange
knitting patterns), or go to an advanced sharing mode interface. This advanced/expert
mode interface allows the specification of additional file types to share and an option
to share such file types in subdirectories, disabled by default.
Some P2P applications in fact do the exact opposite of this, searching users’ hard
drives for any folders containing music or videos and then sharing the entire folder
that contains the music file(s) [32]. This fault is compounded by the fact that many
users don’t really understand the concept of folders and tend to save documents
wherever the ‘Save’ dialog happens to be pointing to when it pops up (one sysadmin
describes the resulting collection of data as “not so much filed as sprayed at random
across the filesystem”). As a result, the letter to the bank is stored next to the holiday
photos, the Quicken account data, and the video of the dancing bunnies, all shared
with anyone else with an Internet connection. Compounding the problem even
further, the set-and-forget nature of P2P applications and the lack of interaction with
the user once they’ve been started leaves users with no indication that saving or
copying any new files into shared folders is publishing that information for the entire
Internet to see.
An additional safety feature would be to provide the user with a capsule summary of
the types and number of files being shared (“21 video files, 142 sound files, 92
images, 26 documents, 4 spreadsheets, 17 programs, 218 other files”) as an additional
warning about what it is they’re doing (“Why does it say that I’m sharing documents
and spreadsheets when I thought I was only sharing sounds and images?”). Strangely
enough, the My Media folder (ostensibly meant for incoming files) provides exactly
this summary information, while the Shared Folders interface doesn’t, merely
showing a directory tree view. This simple change to the user interface now makes
the application behave in the way that the user expects it to, with no loss of
functionality but a significant gain in security. Even a quick change to the current
user interface, having it auto-expand the first one or two levels of directories in the
tree view to show that all of the sub-folders are selected, would at least go some way
towards fixing the interface’s security problems.
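The capsule summary itself is only a few lines of code. The following hypothetical sketch (with arbitrary, far-from-exhaustive file-type lists) walks the folder tree that’s about to be shared, including the subfolders that users don’t realise are included, and reports what’s actually in it:

    import os
    from collections import Counter

    CATEGORIES = {
        "video files":  {".avi", ".mpg", ".mp4", ".mov"},
        "sound files":  {".mp3", ".wav", ".ogg"},
        "images":       {".jpg", ".jpeg", ".png", ".gif"},
        "documents":    {".doc", ".docx", ".pdf", ".txt"},
        "spreadsheets": {".xls", ".xlsx", ".csv"},
    }

    def share_summary(folder: str) -> str:
        counts = Counter()
        for root, _dirs, files in os.walk(folder):   # subfolders are shared too
            for name in files:
                ext = os.path.splitext(name)[1].lower()
                for label, extensions in CATEGORIES.items():
                    if ext in extensions:
                        counts[label] += 1
                        break
                else:
                    counts["other files"] += 1
        return ", ".join(f"{count} {label}" for label, count in counts.items())

    print("You are about to share:", share_summary(os.path.expanduser("~/Documents")))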
Site Images
In 2005 the US Federal Financial Institutions Examination Council (FFIEC) issued
guidance requiring that US financial institutions use two-factor authentication
(strictly speaking they said that single-factor authentication was inadequate and
required that “financial institutions offering Internet-based products and services to
their customers should use effective methods to authenticate the identity of
customers”) [34]. The poor security practices of US financial institutions have
already been covered in previous chapters; in this case they redefined “two-factor
authentication” so that it no longer required the use of a security token like a SecurID
or a challenge/response calculator of the type used by European banks (which would
have cost money to deploy), but merely required them to display a personalised
image on the user’s logon page [35]. In other words their definition of “two-factor
authentication” was “twice as much one-factor authentication”.
They then compounded the error by training users to ignore the standard HTTPS
indicators in favour of the site images. Figure 52 and Figure 53 provide two
examples of this problem.
Figure 52: Training users to ignore HTTPS indicators
When security researchers looked at the effectiveness of these security indicators, the
results were alarming, but predictable: Users were ignoring the existing HTTPS
indicators (in the study not one user was stopped by the absence of HTTPS
indicators), but also not paying much attention to the absence of the site image either.
Simply replacing the image with a message telling users that “bank-name is currently
upgrading our award-winning site-image brand-name feature. Please contact customer
service if your site-image brand-name does not reappear within the next 24 hours”
was enough to convince 92% of the participants in the study that it was safe to
use the site [36]. Although it wouldn’t have been too hard to simply copy the site
image from the genuine site (it took about a minute to defeat the purported additional
challenge-question security measures to obtain the sample image shown in Figure
52), an attacker doesn’t even have to go to this minimal level of effort to defeat it — a
maintenance message is all that’s required, and thanks to the banks’ conditioning of
users the SSL indicators are bypassed to boot.
Figure 53: More user insecurity training
In a real-world demonstration of its ineffectiveness, the analysis of one widespread
piece of malware found that the most popular banking target for the software was
https://sitekey.bankofamerica.com (the URL for the site-image logon page),
indicating that site images present no problems for criminals [37].
The effectiveness of so trivial a measure as removing the site images through a bogus
“under construction” message is a follow-on effect of the “all the ads all the time”
nature of today’s web sites. Just as users expect ASP and Javascript problems,
transient network outages, broken links and 404 errors, and similar issues whenever
they go online, they’re also quite used to constantly-mutating web sites where almost
anything can change between visits. As with the SSL indicators mentioned in an
earlier section, trying to detect security problems using a mechanism with a close to
100% false positive rate isn’t notably useful.
Other attacks on site images include a standard man-in-the-middle attack (which is
quite simple to perform, despite claims from the marketing manager of the service
that it’s impossible) [38], or just displaying a random image from the selection
provided by the bank. Although the effectiveness of the latter approach hasn’t been
experimentally evaluated, the results of other studies on users’ attention to security
indicators of this type suggest that a significant number of users won’t notice that
anything is amiss.
In any case the redefinition of “two-factor authentication” to mean “twice as much
one-factor authentication” presented the merest speed-bump to malware authors, who
bypassed it with little effort. For example the Gozi Trojan, among its many other
capabilities, has a “grabs” module that hooks into the browser’s Javascript engine to
obtain any extra credentials communicated via AJAX mechanisms rather than a
standard password-entry dialog [39]. There’s no indication from the malware
community that the twice-as-much-one-factor approach is presenting any difficulty to
attackers.
Signed Email
Today virtually all use of signed messaging occurs in automated protocols and
processes like EDI buried deep down in the IT infrastructure. The vision that
flourished during the crypto wars of the 1990s that everyone would eventually be
using signed and/or encrypted email has pretty much evaporated. So why is no-one
signing their messages?
Part of the blame can be laid at the feet of the un-usability of the PKI or PKI-like
mechanisms that are required to support signing, but another part of the problem is
the fact that while geeks will do something with a computer just because it’s geeky,
the rest of the world needs a reason to do things with their computers. Why would
the average user care about signed email? If it’s from someone that they know then
they’ll verify the message’s authenticity based on the message contents, so-called
semantic integrity, and not a digital signature [40]. On the other hand if it’s from
someone that they don’t know then it doesn’t matter whether the message is signed or
not. In neither case is the large amount of effort required in order to work with digital
signatures justified in the eyes of the typical user.
This doesn’t apply only to everyday users. When the S/MIME standards group
debated whether they should switch to using S/MIME signed email for their
discussions, they came to the same conclusion. In other words one of the groups that
sets the standards for digitally signed messages decided that there wasn’t much point
to actually using them. Although it can be argued (endlessly) that everyone should be
using mechanisms like signed email to prevent things like phishing attacks, a user
base that has problems with something as basic as a padlock icon will never be able
to cope with the massive complexity that comes with digitally signed email. So
despite the best efforts of the protocol designers and programmers and the ensuing
result that the majority of the world’s desktops have digital-signature-enabled email
clients built into them, the market has decided that, by and large, digitally-signed
email just isn’t worth the effort. As with several of the other examples presented
here, real-world user testing would have saved considerable misspent effort (both at
the IT and the government/legislative level, consider all of the moribund digital
signature legislation that half the world’s governments were busy passing in the late
1990s) and helped focus efforts elsewhere.
(Another concern, which because of the lack of digital signature usage comes up
mostly among privacy advocates, is the problem of incrimination. There is already
concern among some users about the size of the digital footprint or data shadow that
they create in their everyday use of computers and the Internet. Digitally signing
everything, the equivalent of creating a notarised document under some digital
signature regimes, doesn’t help allay these concerns).
Signed Email Receipts
An even more extreme example than simple signed email occurs with signed email
receipts. If you dig down into Microsoft Outlook to find the security configuration
tab in the options dialog you’ll find checkboxes for options like “Request a secure
receipt for all digitally signed messages”. Even finding this facility requires
extensive spelunking inside the Outlook user interface, where it’s hidden several
levels down, and in the case of the full Outlook rather than Outlook Express, quite
some way away from the normal receipt configuration settings, which are themselves
behind a gauntlet of dialogs, tabs, and buttons. Anyway, assuming that you’ve
managed to dig up the necessary configuration setting, let’s look at what happens
when you enable it.
As the checkbox implies, your mailer will now include a request for a signed delivery
receipt when it sends an S/MIME signed message. How many of those do you send
every day? Assuming though that you have S/MIME signing turned on by default
(perhaps as a requirement of corporate policy) then the recipient (or at least the
recipient’s mailer) will receive a request to send a signed S/MIME receipt. Most
mailers won’t understand this, or if they do will ignore it, but let’s say for argument’s
sake that the target mailer not only understands it but decides to act on it. If the
recipient has a smart card or other crypto token, they now have to locate and insert
the token and enter their PIN. Even if it’s a software-only implementation, they
usually still need to enter a password or authorise the signing action in some manner
(software that silently signs messages behind your back is a dangerous concept, as
discussed in the section on legal issues). Outlook’s default setting (unless the user or
a group policy setting has changed it) is to ask the user what to do every time that a
receipt request is received, which is the safest setting for signing, but will doubtless
lead to it being quickly disabled after about the first half-dozen receipt dialogs have
popped up.
Assuming though that they decide to send a signed receipt, the target user’s mailer
then generates the receipt and sends it back to your mailer. If this is a message sent to
a mailing list, consider how many people you’ve managed to annoy and the volume
of mail that’s about to hit your mailer from this one action alone!
At this point the receipt comes back to you. It may get dropped on the way, or
perhaps blocked by spam filters, but eventually it may end up at your mailer, which in
turn may ignore it or discard it, but at best will put up some minute indicator next to
the message in the sent-mail folder to indicate that a receipt was received.
Let’s look at the security implications of this mechanism. Assume a worst-case
scenario in which an active attacker able to intercept and modify all of your
communications is sitting between you and the email recipient, meticulously
intercepting and deleting every single signed receipt that appears. The security
consequences of this are... nothing. No alarm is raised, nothing at all happens. In
fact, nothing can happen, because if an alarm was raised every time the recipient’s
mailer didn’t understand the request, or ignored it, or sent an invalid one (for example
one that’s signed by a certificate issued by a CA that you don’t recognise), or it got
lost, or caught in spam filters, you’d be buried in false alarms. Even under near-
perfect conditions there’s no clear idea as to when to raise a no-receipt-received
alarm. Do you do it after an hour? A day? A week? What if the recipient is away
for the day, it’s just before a weekend, or they’re taking their annual leave?
So the correct label for this option should really be “Annoy random recipients and
cause occasional email floods”. This is a great example of “because we can” security
user interface design. It’s a feature that was added so that the developers could show
off the fact that their software knows what a signed receipt is, because without this
option to enable no user would even know that this “feature” existed. In terms of the
actual user experience though, the only thing missing is a Douglas Adams-inspired
sign that lights up telling users not to click on this option again when they click on
it.
Post-delivery Reviews
A final stage of testing is the post-delivery review, sometimes referred to as a
retrospective. Most advocates of this process suggest that 3-12 months after release
is the best time to carry out this type of review, this being the point at which users
have become sufficiently familiar with the software to locate problem areas, and at
which point the software has had sufficient exposure to the real world to reveal any
flaws in the design or its underlying assumptions.
This final stage of the design process is extremely important when deploying a
security system. The reason why the Walker spy ring was able to compromise the
NSA-designed security of the US Navy so effectively was that the NSA and Navy in
combination had ended up creating an overall system that was (as the post-mortem
report mentioned earlier puts it) “inherently insecure and unusable”, despite the fact
that it had been built on (theoretically) secure components [41]. The report goes on
to say that “time and again, individuals made decisions based on assumptions that
proved to be woefully incorrect. In many cases, these assumptions were based on
nothing more than wishful thinking, or on the fact that it would be very convenient if
certain things were true […] Just as good design involves finding out how the
encryptor behaves as the battery loses its charge or the device gets splashed with
water, so also good system design should take into account what happens when the
operators do not behave as they ought to — whether through malice, carelessness, or
simple inability to carry out the requirements with the resources available. The latter
two cases can be minimized or even eliminated through better design: that is, the
designer must make it as easy as possible to do the right thing and as hard as possible
to do the wrong thing. This needs to be an iterative process, based on close
observation of what ordinary sailors actually do during fleet deployments, and
incorporating improvements and innovations as they become available”.
The rest of the report constitutes a fascinating insight into just how badly a
theoretically secure system that ignores real-world considerations can fail in practice,
with almost every aspect of the system compromised in one way or the other once it
came into contact with the real world. This shows just how important both studying
real users (during the pre-implementation phase) and observing how it’s used once
it’s deployed (during the post-implementation phase) can be in ensuring that the
system actually has the properties that it’s supposed to have.
Post-delivery reviews are important for shaking out emergent properties unanticipated
by the designers that even post-implementation testing with users can’t locate. For
example when the folks who wrote RFC 1738 provided for URLs of the form
user@hostname, they never considered that a malicious party could use this to
construct URLs like http://www.bankofamerica.com@1234567/, which
points to a server whose numeric IP address is 1234567 while appearing to users to be
a legitimate bank server’s address. Testing in a hostile environment (the real world)
provides additional feedback on secure user interface design. Although it’s unlikely
that attackers will cooperate in performing this type of testing for you, over the years
a large body of knowledge has been established that you can use to ensure that your
application doesn’t suffer from the same weaknesses. Books on secure programming
like Building Secure Software by John Viega and Gary McGraw and Writing Secure
Code by Michael Howard and David LeBlanc contain in-depth discussions of
“features” to avoid when you create an application that needs to process or display
security-relevant information.
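Returning to the user@hostname example, even a standard URL parser makes the deception obvious once the URL is examined programmatically rather than eyeballed, as this minimal Python illustration shows:

    from urllib.parse import urlparse

    url = urlparse("http://www.bankofamerica.com@1234567/")
    print(url.username)   # 'www.bankofamerica.com' -- pure decoration
    print(url.hostname)   # '1234567'               -- where the browser actually goes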
References
[1] “Paper Prototyping: The Fast and Easy Way to Design and Refine User
Interfaces”, Carolyn Snyder, Morgan Kaufmann, 2003.
[2] “Matching design sketches to the desired level of design feedback”, Jan
Miksovsky, 26 October 2006,
http://miksovsky.blogs.com/flowstate/2006/10/using_crude_ske.html.
[3] “Napkin Look and Feel”,
http://napkinlaf.sourceforge.net/.
[4] “About Face 2.0: The Essentials of Interaction Design”, Alan Cooper and
Robert Reimann, John Wiley and Sons, 2003.
[5] “PKI Technology Survey and Blueprint”, Peter Gutmann, Proceedings of the
2006 New Security Paradigms Workshop (NSPW’06), October 2006.
[6] “ZoneAlarm: Creating Usable Security Products for Consumers”, Jordy Berson,
in “Security and Usability: Designing Secure Systems That People Can Use”,
O’Reilly, 2005, p.563.
[7] “Inside Deep Thought (Why the UI, Part 6)”, Jensen Harris, 31 October 2005,
http://blogs.msdn.com/jensenh/archive/2005/10/31/487247.aspx.
[8] “ZoneAlarm: Creating Usable Security Products for Consumers”, Jordy Berson,
in “Security and Usability: Designing Secure Systems That People Can Use”,
O’Reilly, 2005, p.563.
[9] “Critical Thinking Skills in Tactical Decision Making: A Model and A Training
Method”, Marvin Cohen, Jared Freeman, and Bryan Thompson, in “Making
Decisions Under Stress: Implications for Individual and Team Training”,
American Psychological Association (APA), 1998, p.155.
[10] “Critical Thinking Skills in Tactical Decision Making: A Model and A Training
Strategy”, Marvin Cohen, Jared Freeman, and Bryan Thompson, in “Making
Decisions Under Stress: Implications for Individual and Team Training”,
American Psychological Association (APA), 1998, p.155.
[11] “Sources of Power: How People Make Decisions”, Gary Klein, MIT Press,
1998.
[12] “Why You Only Need to Test With 5 Users”, Jakob Nielsen,
http://www.useit.com/alertbox/20000319.html, March 2000.
[13] “Why Johnny Can't Encrypt: A Usability Evaluation of PGP 5.0”, Alma Whitten
and J. D. Tygar, Proceedings of the 8th Usenix Security Symposium
(Security’99), August 1999, p.169.
[14] “Legal Considerations in Phishing Research”, Beth Cate, in “Phishing and
Countermeasures: Understanding the Increasing Problem of Electronic Identity
Theft”, John Wiley and Sons, 2007.
[15] “Designing and Conducting Phishing Experiments”, Peter Finn and Markus
Jakobsson, in “Phishing and Countermeasures: Understanding the Increasing
Problem of Electronic Identity Theft”, John Wiley and Sons, 2007.
[16] “Johnny 2: A User Test of Key Continuity Management with S/MIME and
Outlook Express”, Simson Garfinkel and Robert Miller, Proceedings of the 2005
Symposium on Usable Privacy and Security (SOUPS'05), July 2005, p.13.
[17] “Social Phishing”, Tom Jagatic, Nathaniel Johnson, Markus Jakobsson, and
Filippo Menczer,
Communications of the ACM, to appear.
[18] “User Perceptions of Privacy and Security on the Web”, Scott Flinn and Joanna
Lumsden, Proceedings of the Third Annual Conference on Privacy, Security and
Trust (PST’05), October 2005,
http://www.lib.unb.ca/Texts/PST/2005/pdf/flinn.pdf.
[19] “Cookies and Web Browser Design: Toward Realizing Informed Consent
Online”, Lynette Millett, Batya Friedman, and Edward Felten, Proceedings of
the 2001 Conference on Human Factors in Computing Systems (CHI’01), April
2001, p.46.
[20] “Protecting Browser State”, Collin Jackson, Andres Bortiz, Dan Boneh, and
John Mitchell, in “Phishing and Countermeasures: Understanding the Increasing
Problem of Electronic Identity Theft”, John Wiley and Sons, 2007.
[21] “Social Approaches to End-User Privacy Management”, Jeremy Goecks and
Elizabeth Mynatt, in “Security and Usability: Designing Secure Systems That
People Can Use”, O’Reilly, 2005, p.523.
[22] “Silence on the Wire”, Michal Zalewski, No Starch Press, 2004.
[23] “Cookies for Authentication”, Ari Juels, Markus Jakobson, and Tom Jagatic, in
“Phishing and Countermeasures: Understanding the Increasing Problem of
Electronic Identity Theft”, John Wiley and Sons, 2007.
[24] “Revolution in the Valley”, Andy Hertzfeld, O’Reilly Media Inc, 2005.
[25] Nick Mathewson, private communications.
[26] “Lessons Learned in Implementing and Deploying Crypto Software”, Peter
Gutmann, Proceedings of the 11th Usenix Security Symposium, August 2002,
p.315.
[27] “Design Principles and Patterns for Computer Systems That Are Simultaneously
Secure and Usable”, Simson Garfinkel, PhD thesis, Massachusetts Institute of
Technology, May 2005.
[28] “Case Study: Online Banking Security”, Kjell Hole, Vebjørn Moen, and Thomas
Tjøstheim,
IEEE Security and Privacy, Vol.4, No.2 (March/April 2006), p.14.
[29] “A Usability Study and Critique of Two Password Managers”, Sonia Chiasson,
Paul van Oorschot, and Robert Biddle,
Proceedings of the 15th Usenix Security Symposium (Security’06), August 2006,
p.1.
[30] “Usability and privacy: a study of Kazaa P2P file-sharing”, Nathaniel Good and
Aaron Krekelberg, Proceedings of the SIGCHI Conference on Human Factors in
Computing Systems, April 2003, p.137.
[31] “Application Interfaces That Enhance Security”, Apple Computer, 23 May 2006,
http://developer.apple.com/documentation/Security/Conceptual/SecureCodingGuide/Articles/AppInterfaces.html.
[32] “Inadvertent Disclosure: Information Leaks in the Extended Enterprise”, M.Eric
Johnson and Scott Dynes, Proceedings of the 6th Workshop on the Economics of
Information Security (WEIS’07), June 2007.
[33] “RIAA “extortion”: why the only RICO they fear is Suave”, Eric Bangeman, 6
May 2007,
http://arstechnica.com/news.ars/post/20070506-riaa-extortion-why-the-only-rico-they-fear-is-suave.html.
[34] “Authentication in an Internet Banking Environment”, Federal Financial
Institutions Examination Council, October 2005,
http://www.ffiec.gov/pdf/authentication_guidance.pdf.
[35] “Fraud Vulnerabilities in SiteKey Security at Bank of America”, Jim Youll,
Challenge/Response LLC, 18 July 2006,
http://cr-labs.com/publications/SiteKey-20060718.pdf.
[36] “The Emperor’s New Security Indicators”, Stuart Schechter, Rachna Dhamija,
Andy Ozment, and Ian Fischer,
IEEE Symposium on Security and Privacy, May
2007, to appear.
[37] “[Prg] Malware Case Study”, Secure Science Corporation and Michael Ligh, 13
November 2006,
http://www.securescience.net/FILES/securescience/10378/pubMalwareCaseStudy.pdf.
[38] “A Deceit-Augmented Man In The Middle Attack Against Bank of America's
SiteKey® Service”, Christopher Soghoian and Markus Jakobsson, 10 April
2007, http://paranoia.dubfire.net/2007/04/deceit-augmented-man-in-middle-attack.html.
[39] “Gozi Trojan”, Don Jackson, 20 March 2007,
http://www.secureworks.com/research/threats/gozi.
[40] “A Case (Study) For Usability in Secure Email Communication”, Apu Kapadia,
IEEE Security and Privacy, Vol.5, No.2 (March/April 2007), p.80.
[41] “An Analysis of the System Security Weaknesses of the US Navy Fleet
Broadcasting System, 1967-1974, as exploited by CWO John Walker”, Laura
Heath, Master of Military Art and Science thesis, US Army Command and
General Staff College, Ft.Leavenworth, Kansas, 2005.