Executive Summary
I. Introduction
II. Life Cycle of Big Data
III. Big Data’s Benefits and Risks
IV. Considerations for Companies in Using Big Data
A. Potentially Applicable Laws
Questions for Legal Compliance
B. Special Policy Considerations Raised by Big Data Research
Summary of Research Considerations
V. Conclusion
Separate Statement of Commissioner Maureen K. Ohlhausen
Executive Summary
We are in the era of big data. With a smartphone now in nearly every pocket, a computer in nearly every
household, and an ever-increasing number of Internet-connected devices in the marketplace, the amount of
consumer data flowing throughout the economy continues to increase rapidly.
The analysis of this data is often valuable to companies and to consumers, as it can guide the
development of new products and services, predict the preferences of individuals, help tailor services and
opportunities, and guide individualized marketing. At the same time, advocates, academics, and others have
raised concerns about whether certain uses of big data analytics may harm consumers, particularly low-
income and underserved populations.
To explore these issues, the Federal Trade Commission (“FTC” or “the Commission”) held a public
workshop, Big Data: A Tool for Inclusion or Exclusion?, on September 15, 2014. The workshop brought
together stakeholders to discuss both the potential of big data to create opportunities for consumers and
to exclude them from such opportunities. The Commission has synthesized the information from the
workshop, a prior FTC seminar on alternative scoring products, and recent research to create this report.
Though “big data” encompasses a wide range of analytics, this report addresses only the commercial use
of big data consisting of consumer information and focuses on the impact of big data on low-income and
underserved populations. Of course, big data also raises a host of other important policy issues, such as
notice, choice, and security, among others. Those, however, are not the primary focus of this report.
As “little” data becomes “big” data, it goes through several phases. The life cycle of big data can be
divided into four phases: (1) collection; (2) compilation and consolidation; (3) analysis; and (4) use.
This report focuses on the fourth phase and discusses the benefits and risks created by the use of big data
analytics; the consumer protection and equal opportunity laws that currently apply to big data; research in
the field of big data; and lessons that companies should take from the research. Ultimately, this report is
intended to educate businesses on important laws and research that are relevant to big data analytics and
provide suggestions aimed at maximizing the benefits of big data analytics and minimizing its risks.
Big Data’s Benefits and Risks
Big data analytics can provide numerous opportunities for improvements in society. In addition to
more effectively matching products and services to consumers, big data can create opportunities for low-
income and underserved communities. For example, workshop participants and others have noted that big
data is helping target educational, credit, healthcare, and employment opportunities to low-income and
underserved populations. At the same time, workshop participants and others have noted how potential
inaccuracies and biases might lead to detrimental effects for low-income and underserved populations.
For example, participants raised concerns that companies could use big data to exclude low-income and
underserved communities from credit and employment opportunities.
Federal Trade Commission
Consumer Protection Laws Applicable to Big Data
Workshop participants and commenters discussed how companies can use big data in ways that provide
benets to themselves and society, while minimizing legal and ethical risks. Specically, they noted that
companies should have an understanding of the various laws, including the Fair Credit Reporting Act, equal
opportunity laws, and the Federal Trade Commission Act, that may apply to big data practices.
1. Fair Credit Reporting Act
The Fair Credit Reporting Act (“FCRA”) applies to companies, known as consumer reporting agencies
or CRAs, that compile and sell consumer reports, which contain consumer information that is used or
expected to be used for credit, employment, insurance, housing, or other similar decisions about consumers’
eligibility for certain benefits and transactions. Among other things, CRAs must implement reasonable
procedures to ensure maximum possible accuracy of consumer reports and provide consumers with access to
their own information, along with the ability to correct any errors.
Traditionally, CRAs include credit bureaus, employment background screening companies, and other
specialty companies that provide particularized services for making consumer eligibility decisions, such as
check authorizations or tenant screenings. Some data brokers may also be considered CRAs subject to the
FCRA, particularly if they advertise their services for eligibility purposes. The Commission has entered into
a number of consent decrees with data brokers that advertise their consumer profiles for employment and
tenant screening purposes. Companies that use consumer reports also have obligations under the FCRA.
Workshop panelists and commenters discussed a growing trend in big data, in which companies may
be purchasing predictive analytics products for eligibility determinations. Under traditional credit scoring
models, companies compare known credit characteristics of a consumer—such as past late payments—with
historical data that shows how people with the same credit characteristics performed over time in meeting
their credit obligations. Similarly, predictive analytics products may compare a known characteristic of a
consumer to other consumers with the same characteristic to predict whether that consumer will meet his or
her credit obligations. The difference is that, rather than comparing a traditional credit characteristic, such
as debt payment history, these products may use non-traditional characteristics—such as a consumer’s zip
code, social media usage, or shopping history—to create a report about the creditworthiness of consumers
that share those non-traditional characteristics, which a company can then use to make decisions about
whether that consumer is a good credit risk. The standards applied to determine the applicability of the
FCRA in a Commission enforcement action, however, are the same.
Only a fact-specific analysis will ultimately determine whether a practice is subject to or violates the
FCRA, and as such, companies should be mindful of the law when using big data analytics to make FCRA-
covered eligibility determinations.
Big Data: A Tool for Inclusion or Exclusion?
2. Equal Opportunity Laws
Companies should also consider a number of federal equal opportunity laws, including the Equal Credit
Opportunity Act (“ECOA”), Title VII of the Civil Rights Act of 1964, the Americans with Disabilities
Act, the Age Discrimination in Employment Act, the Fair Housing Act, and the Genetic Information
Nondiscrimination Act. These laws prohibit discrimination based on protected characteristics such as race,
color, sex or gender, religion, age, disability status, national origin, marital status, and genetic information.
Of these laws, the FTC enforces ECOA, which prohibits credit discrimination on the basis of race, color,
religion, national origin, sex, marital status, age, or because a person receives public assistance. To prove a
violation of ECOA, plaintiffs typically must show “disparate treatment” or “disparate impact.” Disparate
treatment occurs when a creditor treats an applicant differently based on a protected characteristic. For
example, a lender cannot refuse to lend to single persons or offer less favorable terms to them than married
persons even if big data analytics show that single persons are less likely to repay loans than married
persons. Disparate impact occurs when a company employs facially neutral policies or practices that have
a disproportionate adverse effect or impact on a protected class, unless those practices or policies further a
legitimate business need that cannot reasonably be achieved by means that are less disparate in their impact.
For example, if a company makes credit decisions based on consumers’ zip codes, such decisions may have a
disparate impact on particular ethnic groups because certain ethnic groups are concentrated in particular zip
codes. Accordingly, the practice may be a violation of ECOA. The analysis turns on whether the decisions
have a disparate impact on a protected class and are not justified by a legitimate business necessity. Even if
evidence shows the decisions are justified by a business necessity, if there is a less discriminatory alternative,
the decisions may still violate ECOA.
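As a rough illustration of the zip-code example above, the sketch below tallies approval rates by group under a facially neutral rule keyed to zip code. The applicant data, group labels, zip codes, and the function name `approval_rates` are all hypothetical. Comparing rates is only a first screening step and does not by itself establish a violation, which also turns on business necessity and less discriminatory alternatives.

```python
from collections import defaultdict


def approval_rates(applicants, approved_zips):
    """Return the share of applicants approved in each group.

    applicants: list of (group, zip_code) pairs (hypothetical data).
    approved_zips: zip codes that the facially neutral rule approves.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, zip_code in applicants:
        totals[group] += 1
        if zip_code in approved_zips:
            approvals[group] += 1
    return {group: approvals[group] / totals[group] for group in totals}


# Hypothetical applicant pool: group B is concentrated in zip code 00002.
applicants = (
    [("A", "00001")] * 80 + [("A", "00002")] * 20
    + [("B", "00001")] * 30 + [("B", "00002")] * 70
)

# The rule never looks at group membership, only at zip code.
rates = approval_rates(applicants, approved_zips={"00001"})

# Ratio of the lower approval rate to the higher one; a value well
# below 1 suggests the rule deserves a closer look.
ratio = rates["B"] / rates["A"]
print(rates, round(ratio, 3))
```

The rule never references group membership, yet the approval rates diverge because the groups are distributed unevenly across zip codes.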
Workshop discussions focused on whether advertising could implicate equal opportunity laws. In most
cases, a company’s advertisement to a particular community for a credit offer that is open to all to apply
is unlikely, by itself, to violate ECOA, absent disparate treatment or an unjustified disparate impact in
subsequent lending. Nevertheless, companies should proceed with caution in this area. For advertisements
relating to credit products, companies should look to Regulation B, which is the implementing regulation
for ECOA. It prohibits creditors from making oral or written statements, in advertising or otherwise,
to applicants or prospective applicants that would discourage on a prohibited basis a reasonable person
from making or pursuing an application. With respect to prescreened solicitations, Regulation B also
requires creditors to maintain records of the solicitations and the criteria used to select potential recipients.
Advertising and marketing practices could impact a creditor’s subsequent lending patterns and the terms and
conditions of the credit received by borrowers, even if credit offers are open to all who apply. In some cases,
the Department of Justice has cited a creditor’s advertising choices as evidence of discrimination.
Ultimately, as with the FCRA, whether a practice is unlawful under equal opportunity laws is a
case-specific inquiry, and as such, companies should proceed with caution when their practices could result
in disparate treatment or have a demonstrable disparate impact based on protected characteristics.
3. The Federal Trade Commission Act
Workshop participants and commenters also discussed the applicability of Section 5 of the Federal Trade
Commission Act (“FTC Act”), which prohibits unfair or deceptive acts or practices, to big data analytics.
Companies engaging in big data analytics should consider whether they are violating any material promises
to consumers—whether that promise is to refrain from sharing data with third parties, to provide consumers
with choices about sharing, or to safeguard consumers’ personal information—or whether they have failed
to disclose material information to consumers. In addition, companies that maintain big data on consumers
should take care to reasonably secure consumers’ data. Further, at a minimum, companies must not sell
their big data analytics products to customers if they know or have reason to know that those customers will
use the products for fraudulent or discriminatory purposes. The inquiry will be fact-specific, and in every
case, the test will be whether the company is offering or using big data analytics in a deceptive or unfair way.
Research on Big Data
Workshop participants, academics, and others also addressed the ways big data analytics could affect
low-income, underserved populations, and protected groups. Some pointed to research that demonstrates
that there is a potential for incorporating errors and biases at every stage—from choosing the data set used
to make predictions, to defining the problem to be addressed through big data, to making decisions based
on the results of big data analysis—which could lead to potential discriminatory harms. Others noted that
these concerns are overstated or simply not new, and emphasized that rather than disadvantaging minorities,
big data can create opportunities for low-income and underserved populations.
To maximize the benefits and limit the harms of big data, the Commission encourages companies to
consider the following questions raised by research in this area:
How representative is your data set? Companies should consider whether their data
sets are missing information about certain populations, and take steps to address issues of
underrepresentation and overrepresentation. For example, if a company targets services to consumers
who communicate through an application or social media, they may be neglecting populations that
are not as tech-savvy.
Does your data model account for biases? Companies should consider whether biases are being
incorporated at both the collection and analytics stages of big data’s life cycle, and develop strategies
to overcome them. For example, if a company has a big data algorithm that only considers
applicants from “top tier” colleges to help them make hiring decisions, they may be incorporating
previous biases in college admission decisions.
How accurate are your predictions based on big data? Companies should remember that while
big data is very good at detecting correlations, it does not explain which correlations are meaningful.
A prime example that demonstrates the limitations of big data analytics is Google Flu Trends, a
machine-learning algorithm for predicting the number of flu cases based on Google search terms.
While, at first, the algorithm appeared to create accurate predictions of where the flu was more
prevalent, it generated highly inaccurate estimates over time. This could be because the algorithm
failed to take into account certain variables. For example, the algorithm may not have taken into
account that people would be more likely to search for flu-related terms if the local news ran a story
on a flu outbreak, even if the outbreak occurred halfway around the world.
Does your reliance on big data raise ethical or fairness concerns? Companies should assess the
factors that go into an analytics model and balance the predictive value of the model with fairness
considerations. For example, one company determined that employees who live closer to their jobs
stay at these jobs longer than those who live farther away. However, another company decided
to exclude this factor from its hiring algorithm because of concerns about racial discrimination,
particularly since different neighborhoods can have different racial compositions.
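The first question in the list above, about how representative a data set is, can be approached with a simple share comparison. The sketch below is a minimal illustration under stated assumptions: the group labels, the counts, the reference population shares, the 0.05 tolerance, and the function name `representation_gaps` are all hypothetical and chosen only for the example.

```python
def representation_gaps(sample_counts, population_shares):
    """Return sample share minus reference population share per group."""
    total = sum(sample_counts.values())
    return {
        group: sample_counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical counts of consumers reached through a mobile application.
sample_counts = {"group_1": 700, "group_2": 250, "group_3": 50}

# Hypothetical shares of the same groups in the population of interest.
population_shares = {"group_1": 0.45, "group_2": 0.38, "group_3": 0.17}

gaps = representation_gaps(sample_counts, population_shares)

# Flag groups whose share of the sample falls short of their population
# share by more than an arbitrary, illustrative tolerance of 0.05.
underrepresented = [group for group, gap in gaps.items() if gap < -0.05]
print(gaps, underrepresented)
```

A negative gap means the group makes up less of the data set than of the reference population, which is the kind of underrepresentation the first question asks companies to look for.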
The Commission encourages companies to apply big data analytics in ways that provide benefits
and opportunities to consumers, while avoiding pitfalls that may violate consumer protection or equal
opportunity laws, or detract from core values of inclusion and fairness. For its part, the Commission will
continue to monitor areas where big data practices could violate existing laws, including the FTC Act, the
FCRA, and ECOA, and will bring enforcement actions where appropriate. The Commission will also
continue to examine and raise awareness about big data practices that could have a detrimental impact on
low-income and underserved populations, and promote the use of big data that has a positive impact on
such populations.
I. Introduction
The era of big data has arrived. While companies historically have collected and used information about
their customer interactions to help improve their operations, the expanding use of online technologies has
greatly increased the amount of consumer data that flows throughout the economy. In many cases, when
consumers engage digitally—whether by shopping, visiting websites, paying bills, connecting with family
and friends through social media, using mobile applications, or using connected devices, such as fitness
trackers or smart televisions—companies collect information about their choices, experiences, and individual
characteristics. The analysis of this consumer information is often valuable to companies and to consumers,
as it provides insights into market-wide tastes and emerging trends, which can guide the development of new
products and services. It is also valuable to predict the preferences of specific individuals, help tailor services,
and guide individualized marketing of products and services.
The term “big data” refers to a confluence of factors, including the nearly ubiquitous collection of
consumer data from a variety of sources, the plummeting cost of data storage, and powerful new capabilities
to analyze data to draw connections and make inferences and predictions.
A common framework for characterizing big data relies on the “three Vs,” the volume, velocity, and
variety of data, each of which is growing at a rapid rate as technological advances permit the analysis and use
of this data in ways that were not possible previously. Volume refers to the vast quantity of data that can
be gathered and analyzed effectively. The costs of collecting and storing data continue to drop dramatically.
And the ability to access millions of data points increases the predictive power of consumer data analysis.
Velocity is the speed with which companies can accumulate, analyze, and use new data. Technological
improvements allow companies to harness the predictive power of data more quickly than ever before,
sometimes instantaneously.
Variety means the breadth of data that companies can analyze effectively. Companies can now combine
very different, once unlinked, kinds of data—either on their own or through data brokers or analytics
firms—to infer consumer preferences and predict consumer behavior, for example.
Together, the three Vs allow for more robust research and correlation. Previously, finding a
representative data sample sufficient to produce statistically significant results could be very difficult and
expensive. Today, the scope and scale of data collection enable cost-effective, substantial research of
even obscure or mundane topics (e.g., the amount of foot traffic in a park at different times of day).
Big data can produce tremendous benefits for society, such as advances in medicine, education, health,
and transportation, and in many instances, without using consumers’ personally identifiable information.
Big data also can allow companies to improve their offerings, provide consumers with personalized goods
and services, and match consumers with products they are likely to be interested in. At the same time,
advocates, academics, and others have raised concerns about whether certain uses of big data analytics may
harm consumers. For example, if big data analytics incorrectly predicts that particular consumers are not
likely to respond to prime credit offers, certain types of educational opportunities, or job openings requiring
a college degree, companies may miss a chance to reach individuals that desire this information. In addition,
if big data analytics incorrectly predicts that particular consumers are not good candidates for prime credit
offers, educational opportunities, or certain lucrative jobs, such educational opportunities, employment, and
credit may never be offered to these consumers. Some fear that such incorrect predictions could perpetuate
existing disparities.
To examine these issues, the Federal Trade Commission (“FTC” or “the Commission”) held a public
workshop, Big Data: A Tool for Inclusion or Exclusion?, on September 15, 2014. In particular, the workshop
explored the potential impact of big data on low-income and underserved populations. The workshop
brought together academics, government representatives, consumer advocates, industry representatives,
legal practitioners, and others to discuss the potential of big data to create opportunities for consumers or
exclude them from such opportunities. The workshop consisted of four panels addressing the following
topics: (1) current uses of big data; (2) potential uses of big data; (3) the application of equal opportunity
and consumer protection laws to big data; and (4) best practices to enhance consumer protection in the use
of big data. The Commission also received sixty-five public comments on these issues from private citizens,
industry representatives, trade groups, consumer and privacy advocates, think tanks, and academics.
The Commission has synthesized the discussions and comments from the workshop—along with
the record from a prior FTC seminar on alternative scoring products and recent research—to create this
report, which focuses on the impact of big data on low-income and underserved populations. The report is
divided into four sections. First, the report describes the “life cycle” of big data and how “little” data turns
into big data. Second, it discusses some of the benefits and risks created by the use of big data. Third, it
describes some of the consumer protection laws that currently apply to big data. Finally, it discusses certain
research in the field of big data and lessons that companies should take from the research in order to help
them maximize the benefits of big data while mitigating risks. Importantly, though the term “big data”
encompasses a wide range of analytics, this report addresses only the commercial use of big data consisting of
consumer information.
II. Life Cycle of Big Data
The life cycle of big data can be divided into four phases: (1) collection; (2) compilation and
consolidation; (3) data mining and analytics; and (4) use.
As to the first step, not all data starts as big data. Rather, companies collect bits of data from a variety
of sources. For example, as consumers browse the web or shop online, companies can track and link their
activities. Sometimes consumers log into services or identify themselves when they make a purchase. Other
of statistical models to generate new data. Developing and testing the models that find patterns and make
predictions can require the collection and use of copious amounts of data. In a market context, a common
purpose of big data analytics is to draw inferences about consumers’ likely choices. Companies may decide
to adopt big data analytics to better understand consumers, potentially by using data to attribute to an
individual the qualities of those who appear statistically similar, e.g., those who have made similar decisions
in similar situations in the past. Thus, a retail firm might use data on its customers’ past purchases, web
searches, shopping habits, and prices paid to create a statistical model of consumers’ purchases at different
prices. With that model, the retailer could then compare a prospective consumer’s characteristics or past
purchases, web searches, and location information to predict how likely the consumer is to purchase a
product at various price points.
e nal step in the life cycle of big data is use. e Commissions May 2014 report entitled Data
Brokers: A Call for Transparency and Accountability focused on the rst three steps in the life cycle of big data
within that industry—collection, compilation, and analytics.
It discussed how information gathered for
one purpose (e.g., paying for goods and services) could be compiled and analyzed for other purposes, such as
for marketing or risk mitigation. In contrast, this report focuses on certain uses of big data. It examines the
question of how companies use big data to help consumers and the steps they can take to avoid inadvertently
harming consumers through big data analytics.
III. Big Data’s Benefits and Risks
Companies have been analyzing data from their own customer interactions on a smaller scale for many
years, but the era of big data is still in its infancy. As a result, mining large data sets to find useful, non-
obvious patterns is a relatively new but growing practice in marketing, fraud prevention, human resources,
and a variety of other fields. Companies are still learning how to deal with big data and unlock its potential
while avoiding unintended or unforeseen consequences.
Appropriately employing big data algorithms on data of sufficient quality can provide numerous
opportunities for improvements in society. In addition to the market-wide benefits of more efficiently
matching products and services to consumers, big data can create opportunities for low-income and
underserved communities.
Workshop participants and others have noted that big data is already being
used to:
Increase educational attainment for individual students. Educational institutions have used
big data techniques to identify students for advanced classes who would otherwise not have been
eligible for such classes based on teacher recommendations alone. These institutions have also used
big data techniques to help identify students who are at risk of dropping out and in need of early
intervention strategies. Similarly, organizations have used big data analytics to demonstrate how
certain disciplinary practices, such as school suspensions, affect African-American students far more
than Caucasian students, thereby partly explaining the large discrepancy between the graduation
rates of these two groups.
Provide access to credit using non-traditional methods. Companies have used big data to provide
alternative ways to score populations that were previously deemed unscorable. For example,
LexisNexis has created an alternative credit score called RiskView. This product relies on traditional
public record information, such as foreclosures and bankruptcies, but it also includes educational
history, professional licensure data, and personal property ownership data. Thus, consumers who
may not have access to traditional credit, but, for instance, have a professional license, pay rent
on time, or own a car, may be given better access to credit than they otherwise would have.
Furthermore, big data algorithms could help reveal underlying disparities in traditional credit
markets and help companies serve creditworthy consumers from any background.
Provide healthcare tailored to individual patients’ characteristics. Organizations have used big
data to predict life expectancy, genetic predisposition to disease, likelihood of hospital readmission,
and likelihood of adherence to a treatment plan in order to tailor medical treatment to an
individual’s characteristics. This, in turn, has helped healthcare providers avoid one-size-fits-all
treatments and lower overall healthcare costs by reducing readmissions. Ultimately, data sets
with richer and more complete data should allow medical practitioners more effectively to perform
“precision medicine,” an approach for disease treatment and prevention that considers individual
variability in genes, environment, and lifestyle.
Provide specialized healthcare to underserved communities. IBM, for example, has worked with
hospitals to develop an Oncology Diagnosis and Treatment Advisor. This system synthesizes vast
amounts of data from textbooks, guidelines, journal articles, and clinical trials to help physicians
make diagnoses and identify treatment options for cancer patients. In rural and low-income areas,
where there is a shortage of specialty providers, IBM’s Oncology Diagnosis and Treatment Advisor
can provide underserved communities with better access to cancer care and lower costs.
Increase equal access to employment. Companies have used big data to help promote a more
diverse workforce. Google, for example, recognized that its traditional hiring process was
resulting in a homogeneous workforce. Through analytics, Google identified issues with its hiring
process, which included an emphasis on academic grade point averages and “brainteaser” questions
during interviews. Google then modified its interview practices and began asking more structured
behavioral questions (e.g., how would you handle the following situation?). This new approach
helped ensure that potential interviewer biases had less effect on hiring decisions.
While recognizing these potential benefits, some researchers and others have expressed concern that the
use of big data analytics to make predictions may exclude certain populations from the benefits society and
markets have to offer. This concern takes several forms. First, some workshop participants and commenters
expressed concerns about the quality of data, including its accuracy, completeness, and representativeness.
Another concern is that there are uncorrected biases in the underlying consumer data. For example,
one academic has argued that hidden biases in the collection, analysis, and interpretation stages present
considerable risks. If the process that generated the underlying data reflects biases in favor of or against
certain types of individuals, then some statistical relationships revealed by that data could perpetuate those
biases. When not recognized and addressed, poor data quality can lead to inaccurate predictions, which in
turn can lead to companies erroneously denying consumers offers or benefits. Although the use of inaccurate
or biased data and analysis to justify decisions that have harmed certain populations is not new,
commenters worry that big data analytics may lead to wider propagation of the problem and make it more
difficult for the company using such data to identify the source of discriminatory effects and address it.
Second, while big data may be highly effective in showing correlations, it is axiomatic that correlation is
not causation. Indeed, with large enough data sets, one can generally find some meaningless correlations.
For example, in eighteen out of the past twenty U.S. Presidential elections, if the Washington, D.C.
professional football team won its last home game before the election, the incumbent’s party continued to
hold the presidency; if the team lost that last home game, the out-of-office party unseated the incumbent.
Other examples of spurious correlations abound. If companies use correlations to make decisions
about people without understanding the underlying reasons for the correlations, those decisions might be
faulty and could lead to unintended consequences or harm for consumers and companies.
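The point about meaningless correlations can be seen in a small simulation. The sketch below is an illustration only, not drawn from the workshop record: it generates an outcome and many candidate variables that are pure random noise, then reports the strongest correlation found. The sample size of 20, the 1,000 candidate variables, the fixed seed, and the helper name `pearson` are assumptions made for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_observations = 20     # e.g., twenty past events
n_features = 1000       # number of unrelated variables examined

# The outcome and every candidate variable are independent random noise,
# so any correlation between them is spurious by construction.
outcome = [rng.gauss(0, 1) for _ in range(n_observations)]

strongest = 0.0
for _ in range(n_features):
    feature = [rng.gauss(0, 1) for _ in range(n_observations)]
    strongest = max(strongest, abs(pearson(feature, outcome)))

print(round(strongest, 2))
```

With this many unrelated variables and so few observations, at least one of them will almost always appear strongly correlated with the outcome, even though none has any real relationship to it.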
Ultimately, all of these concerns feed into the larger concern about whether big data may be used to
categorize consumers in ways that can result in exclusion of certain populations. Workshop participants and
others have noted how potential inaccuracies and biases might lead to detrimental effects for low-income and
underserved populations.
According to these commenters, particular uses of big data may:
• Result in more individuals mistakenly being denied opportunities based on the actions of
  others. Participants raised concerns that big data can lead to decision-making based on the actions
  of others with whom consumers share some characteristics. Several commenters explained that
  some credit card companies have lowered a customer’s credit limit, not based on the customer’s
  payment history, but rather based on analysis of other customers with a poor repayment history who
  had shopped at the same establishments where the customer had shopped. Indeed, one credit card
  company settled FTC allegations that it failed to disclose its practice of rating consumers as having a
  greater credit risk because they used their cards to pay for marriage counseling, therapy, or tire-repair
  services, based on its experiences with other consumers and their repayment histories. Using this
  type of statistical model might reduce the cost of credit for some individuals, but may also result
  in some creditworthy consumers being denied or charged more for credit than they might otherwise
  have been charged.
• Create or reinforce existing disparities. Participants raised concerns that when big data is used to
  target ads, particularly for financial products, low-income consumers who may otherwise be eligible
  for better offers may never receive them.
• Expose sensitive information. Participants also raised concerns about the potential exposure
  of characteristics that people may view as sensitive. For example, one study combined data on
  Facebook “Likes” and limited survey information to determine that researchers could accurately
  predict a male user’s sexual orientation 88 percent of the time; a user’s ethnic origin 95 percent of
  the time; and whether a user was Christian or Muslim (82 percent), a Democrat or Republican (85
  percent), or used alcohol, drugs, or cigarettes (between 65 percent and 75 percent).
• Assist in the targeting of vulnerable consumers for fraud. Unscrupulous companies can use big
  data to direct misleading offers or scams at the most vulnerable prospects. According to public
  reports, unscrupulous companies can obtain lists of people who reply to sweepstakes offers and thus
  are more likely to respond to enticements, as well as lists of “suffering seniors” who are identified
  as having Alzheimer’s or similar maladies. Big data analytics allows companies to more easily and
  accurately identify such vulnerable prospects.
• Create new justifications for exclusion. Big data analytics may give companies new ways to
  attempt to justify their exclusion of certain populations from particular opportunities. For example,
  one big data analytics study showed that “people who fill out online job applications using browsers
  that did not come with the computer . . . but had to be deliberately installed (like Firefox or Google’s
  Chrome) perform better and change jobs less often.” If an employer were to use this correlation
  to refrain from hiring people who used a particular browser, they could be excluding qualified
  applicants for reasons unrelated to the job at issue.
• Result in higher-priced goods and services for lower income communities. Some commentators
  have raised concerns about potential effects on prices in lower income communities. For example,
  research has shown that online companies may charge consumers in different zip codes different
  prices for standard office products. If such pricing results in consumers in poorer neighborhoods
  having to pay more for online products than consumers in affluent communities, where there is
  more competition from brick-and-mortar stores, these poorer communities would not realize the full
  competition benefit of online shopping.
• Weaken the effectiveness of consumer choice. Some researchers have argued that, even when
  companies offer consumers choices about data collection, the companies may still use big data to
  draw inferences about consumers who choose to restrict the collection of their data. Indeed, using
  data from consumers who opt in or decline to opt out, big data algorithms can still be employed to
  infer information about similarly-situated individuals who chose not to share their data.
As these examples show, big data offers companies the opportunity to facilitate inclusion or exclusion.
Companies can use big data to advance education, credit, and employment opportunities for low-income
communities or to exclude them from these opportunities. They can use big data to target products to those
who are most interested or to target products in ways that could exclude certain populations. The remainder
of this report is intended to guide companies on some of the laws that may apply when using big data, raise
awareness about the ethical implications of using big data, and highlight potential biases that companies
should consider as they use big data.
IV. Considerations for Companies in Using Big Data
The challenge for companies is not whether they should use big data; indeed, the reality of today’s
marketplace is that big data now fuels the creation of innovative products and systems that consumers and
companies quickly are coming to rely upon and expect. Rather, the challenge is how companies can use big
data in a way that benefits them and society, while minimizing legal and ethical risks.
In assessing risks, companies should first have an understanding of the laws that may apply to big data
practices. Second, they should be aware of important research in the field of big data aimed at identifying
potential biases and inaccuracies. This section provides a starting point for companies using big data
analytics. It is not intended to provide an exhaustive list of considerations. Rather, companies using big
data should consider the issues raised in this report as they engage in big data practices and build on the
questions posed to examine the legal, privacy, and ethical implications of their work.
A. Potentially Applicable Laws
The following section describes some of the laws that may apply to big data practices. Although the
laws discussed do not address every potential misuse, as noted above, this report is not intended to identify
legal or policy gaps; rather, it attempts to guide companies on laws, such as the Fair Credit Reporting Act,
equal opportunity laws, and the Federal Trade Commission Act, that may apply to big data practices.
61 See, e.g., Big Data Tr. 38 (Kristin Amerling), 45–47, 69–70 (David Robinson), 95, 120–22 (Stuart Pratt), 99, 108 (Pamela
Dixon), 268 (Christopher Calabrese), 163–213 (Leonard Chanin, Carol Miaskoff, Montserrat Miller, C. Lee Peeler,
and Peter Swire in conversation); Alternative Scoring Tr. 36–37, 71 (Stuart Pratt). See generally Comment #00075 from
Michelle De Mooy, Ctr. for Democracy & Tech., to Fed. Trade Comm’n (Oct. 31, 2014),
https://www.ftc.gov/system/files/documents/public_comments/2014/10/00075-92928.pdf; Comment #00068 from Julie Kearney
& Alexander Reynolds, Consumer Elecs. Assoc., to Fed. Trade Comm’n (Oct. 31, 2014),
https://www.ftc.gov/system/files/documents/public_comments/2014/10/00068-92917.pdf; Software & Info. Indus. Assoc.
Comment #00067, supra note 2; Future of Privacy Forum Comment #00065, supra note 2; Direct Mktg. Assoc. Comment
#00063, supra note 23; Comment #00062 from David Hoffman, Intel Corp., to Fed. Trade Comm’n (Oct. 31, 2014),
https://www.ftc.gov/system/files/documents/public_comments/2014/10/00062-92887.pdf; Comment #00061 from Jeff Chester,
Ctr. for Dig. Democracy, & Edmund Mierzwinski, U.S. PIRG Educ. Fund, to Fed. Trade Comm’n (Oct. 29, 2014),
https://www.ftc.gov/system/files/documents/public_comments/2014/10/00061-92886.pdf; Comment #00059 from Laura Murphy
& Rachel Goodman, Am. Civil Liberties Union, to Fed. Trade Comm’n (Oct. 27, 2014),
https://www.ftc.gov/system/files/documents/public_comments/2014/10/00059-92874.pdf; Ctr. for Data Innovation Comment
#00026, supra note 8; Comment #00025 from Dennis Hirsch, Cap. Univ. L. Sch., to Fed. Trade Comm’n (Aug. 15, 2014),
https://www.ftc.gov/system/files/documents/public_comments/2014/08/00025-92435.pdf; Comment #00021 from U.S. Chamber
of Commerce, to Fed. Trade Comm’n (Aug. 15, 2014),
https://www.ftc.gov/system/files/documents/public_comments/2014/08/00021-92389.pdf;
Comment #00020 from Jim Halpert, Internet Commerce Coal., to Fed. Trade Comm’n (Aug. 15, 2014), https://
1. The Fair Credit Reporting Act
The FTC has the authority to enforce compliance with the Fair Credit Reporting Act (“FCRA”).
The FCRA applies to companies, known as consumer reporting agencies or CRAs, that compile and sell
consumer reports, which contain consumer information that is used or expected to be used for credit,
employment, insurance, housing, or other similar decisions about consumers’ eligibility for certain benefits
and transactions. Among other things, CRAs must implement reasonable procedures to ensure maximum
possible accuracy of consumer reports and provide consumers with access to their own information, along
with the ability to correct any errors. CRAs can only provide consumer reports to those entities that will
use them for certain specified permissible purposes, such as for credit, employment, insurance, or housing
eligibility determinations.
Traditionally, CRAs include credit bureaus, employment background screening companies, and other
specialty companies that provide particularized services for making consumer eligibility decisions, such as
check authorizations or tenant screenings. Some data brokers that compile non-traditional information,
including social media information, may also be considered CRAs subject to the FCRA, as demonstrated
by the Commission’s enforcement actions. For example, the Commission entered into a consent decree
with online data broker Spokeo to resolve allegations that the company violated the FCRA. As set forth
in the FTC’s complaint, Spokeo assembled personal information from hundreds of online and offline data
sources, including social networks, and merged that data to create detailed personal profiles, including name,
address, age range, hobbies, ethnicity, and religion, and marketed these profiles for use by human resources
departments in making hiring decisions. Based on the allegations that the company marketed consumer
profiles specifically for employment purposes, the Commission charged that Spokeo was subject to, but had
failed to comply with, the FCRA. Accordingly, the FTC entered into a consent decree that required Spokeo
to pay $800,000 in civil penalties.
In another matter, the Commission alleged that the data broker Instant Checkmate advertised potential
uses of its consumer data for employment and tenant screening purposes, both through its website and
through blog posts, but did not comply with the FCRA. According to the complaint, the company
used a Google AdWords campaign to display ads for its services that would appear in search results when
consumers sought background checks on “nannies,” “babysitters,” “maids,” and “housekeepers.” Thus, the
Commission alleged that the company was subject to the FCRA, entered into a consent order to ensure
future compliance, and obtained $550,000 in civil penalties. In both Spokeo and Instant Checkmate, the
companies included a disclaimer on their websites stating that they were not CRAs and that users could not
use their data for eligibility purposes. These disclaimers were not effective in insulating the companies from
FTC enforcement. As these cases demonstrate, the scope of the FCRA extends beyond traditional credit
bureaus.
Companies that use consumer reports also have obligations under the FCRA. They must, among
other things, provide consumers with “adverse action” notices if the companies use the consumer report
information to deny credit, insurance, employment, housing, or certain other covered benefits.
Companies that use consumer reports must also provide “risk-based pricing” notices if they charge consumers
more to obtain credit or insurance based on consumer report information. The purpose of both types
of notices is to enable consumers to check their consumer reports and correct any inaccuracies. The
Commission has brought actions against various companies for violation of these provisions. For example,
in an action against Time Warner Cable, the Commission alleged that the company used consumer report
information to determine whether to require deposits on consumers’ cable bills. The complaint alleged that
consumers who were charged a deposit should have received a risk-based pricing notice informing them that the
charge was based on information in their consumer report. The consent order barred Time Warner Cable
from future violations of the Risk-Based Pricing Rule and required the company to pay $1.9 million in
civil penalties. In addition, in 2015, the Commission brought an action against Sprint alleging that the
company failed to give proper risk-based pricing notices to consumers who were placed in a program for
customers with lower credit scores and charged an extra monthly fee. The consent order requires Sprint to
pay a $2.95 million penalty and to give timely notice to consumers placed in such a program.
The FCRA, however, does not apply to companies when they use data derived from their own
relationship with their customers for purposes of making decisions about them. But if an unaffiliated
firm regularly evaluates companies’ own data and provides the evaluations to the companies for eligibility
determinations, the unaffiliated firm would likely be acting as a CRA, each company would likely be a
user of consumer reports, and all of these entities would be subject to Commission enforcement under the
FCRA.
Workshop panelists and commenters discussed a growing trend in big data, in which companies may
be purchasing predictive analytics products for eligibility determinations.
the consumer and the reporting entity is also within the exception.”).
81 See, e.g., Big Data Tr. 38 (Kristin Amerling), 69–70 (David Robinson), 99–100 (Pamela Dixon); Alternative Scoring Tr.
100–101 (Pamela Dixon). See also Nat’l Consumer L. Ctr. Comment #00018, supra note 1, at 20–23; World Privacy
Forum Comment #00014, supra note 19, at 19–21; Ctr. for Dig. Democracy & U.S. PIRG Educ. Fund Comment
#00003, supra note 8, at 13–15; Comment #00006 from Jeff Chester, Ctr. for Dig. Democracy, & Edmund Mierzwinski,
Under traditional credit scoring
models, companies compare known credit characteristics of a consumer, such as past late payments, with
historical data that shows how people with the same credit characteristics performed over time in meeting
their credit obligations. Similarly, predictive analytics products may compare a known characteristic of a
consumer to other consumers with the same characteristic to predict whether that consumer will meet his or
her credit obligations. The difference is that, rather than comparing a traditional credit characteristic, such
as debt payment history, these products may use non-traditional characteristics (such as a consumer’s zip
code, social media usage, or shopping history) to create a report about the creditworthiness of consumers
that share those non-traditional characteristics, which a company can then use to make decisions about
whether that consumer is a good credit risk. The standards applied to determine the applicability of the
FCRA, however, are the same.
In exercising its enforcement authority, the Commission looks to the FCRA’s definition of a “consumer
report.” The FCRA defines a consumer report as a communication from a CRA (1) bearing on a consumer’s
personal characteristics or mode of living (2) that “is used or expected to be used . . . for the purpose of
serving as a factor in establishing the consumer’s eligibility.” Under this definition, the communication
must be prepared or provided to others to make an eligibility determination about a particular consumer.
Suppose a company asks a consumer to provide her zip code and information about her social media
and shopping behavior on a credit application, strips the consumer’s identifying information, and sends
the application to an analytics firm. The firm then analyzes the creditworthiness of people in the same zip
code with similar social media and shopping behaviors as the consumer and provides that analysis (be it,
for example, in the form of a score, a grade, or a recommendation) to the company, knowing that it is to
be used for a credit decision. Because the company is using information about the consumer to generate an
analysis of a group that shares some characteristics with the consumer and then is using that analysis to make
a decision about the consumer, the Commission would likely regard the analysis to be a consumer report,
and FCRA requirements and protections would likely apply.
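The data flow in this hypothetical can be sketched in code. This is an illustrative sketch only: the field names, the cohort key, the sample history, and the use of a simple repayment rate as the "score" are all assumptions invented for the example, and nothing here describes an actual product or a method endorsed by the Commission. It shows that stripping identifiers does not change the fact that the score is generated with reference to one particular application and is then used to decide about that applicant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str          # identifying information
    zip_code: str
    social_media: str  # hypothetical coarse category, e.g. "heavy" or "light"
    shopping: str      # hypothetical coarse category, e.g. "discount"


def strip_identifiers(app):
    """Company side: drop the name, keep only the cohort characteristics."""
    return (app.zip_code, app.social_media, app.shopping)


def cohort_score(cohort_key, history):
    """Analytics-firm side: share of past borrowers in the same cohort who repaid.

    `history` is a list of (cohort_key, repaid) pairs. Returns None when the
    firm holds no data on the cohort.
    """
    outcomes = [repaid for key, repaid in history if key == cohort_key]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)


# Hypothetical historical data held by the analytics firm.
history = [
    (("20001", "heavy", "discount"), True),
    (("20001", "heavy", "discount"), False),
    (("20001", "heavy", "discount"), False),
    (("20001", "heavy", "discount"), True),
    (("94105", "light", "luxury"), True),
]

applicant = Application("Jane Doe", "20001", "heavy", "discount")
key = strip_identifiers(applicant)
score = cohort_score(key, history)
# The score carries no name, yet it was produced for this applicant's
# application and feeds a credit decision about her.
print(key, score)
```

In this sketch the score reflects only how other people in the same cohort behaved, which is the feature of such products that the surrounding discussion identifies as bringing them within the consumer report analysis.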
U.S. PIRG Educ. Fund, to Fed. Trade Comm’n (Mar. 18, 2014), https://www.ftc.gov/system/files/documents/public_
82 See, e.g., Big Data Tr. 69–70 (David Robinson) (noting that these “thinly aggregated scores . . . may be used to lower
[consumers’] credit limits”); 99–100 (Pamela Dixon) (noting that these scores are “problematic for ensuring privacy and
fairness” because they rely on “[un]regulated data”); Alternative Scoring Tr. 94 (Pamela Dixon) (describing “cohort scoring,”
which is a type of score based on a consumer’s social media friends). See also World Privacy Forum Comment #00014, supra
note 19, at 32–38. But see supra text accompanying notes 27–30 (explaining how big data analytics can be used to expand
credit availability).
83 As noted in Trans Union Corp. v. FTC, this part of the test is not a very demanding one, for almost any information about
consumers arguably bears on their personal characteristics or mode of living. 81 F.3d 228, 231 (D.C. Cir. 1996).
84 15 U.S.C. § 1681a(d)(1) (emphasis added).
85 In 2011, FTC sta issued the  Y FCRA R. In that report, sta stated that “[i]nformation that does not identify
a specic consumer does not constitute a consumer report even if the communication is used in part to determine eligibility.
 Y FCRA R, supra note 80, at 20. e Commission does not believe that this statement is accurate. If a report
is crafted for eligibility purposes with reference to a particular consumer or set of particular consumers (e.g., those that have
In contrast, if a company uses an analytics rms report simply to inform its general policies, then the
Commission would likely not regard the report to be a consumer report under the FCRA because such a
general report does not relate to a particular consumer. For example, if an analytics rms report simply
provides an “aggregate credit score” for every zip code in the United States, a company nds the report
through a search engine, and the company uses the report to inform its policies, the Commission would
likely not consider the analytics rms report to be a consumer report or the analytics rm to be a CRA.
As noted above, it is well settled under the FCRA that when a company denies a consumer credit,
or charges a higher price for credit, based on information from a CRA, the company must provide the
consumer with an adverse action notice. But a creditor may still have obligations under the FCRA even
in cases where the creditor obtains information from a company other than a CRA. Section 615(b) of the
FCRA provides that, when a company denies a consumer credit, or charges a higher price for credit, based
on information from a person other than a CRA, the consumer may request, in writing, that the company
disclose to him or her the nature of the information leading to the denial or increase in charge.
continuing with the example above, even if a store nds a general analytics company report through a search
engine and then uses the report to inform its credit granting policies, the store would have to disclose the
nature of the report upon the consumers request if the consumer’s application for credit is denied or the
charge for such credit is increased as a result of reliance on the report.
Only a fact-specic analysis will ultimately determine whether a practice is subject to or violates
the FCRA, and as such, companies should be mindful of the law when using big data analytics to make
FCRA-covered eligibility determinations.
2. Equal Opportunity Laws
When engaging in big data analytics, companies should also consider federal equal opportunity laws,
including the Equal Credit Opportunity Act (“ECOA”),
Title VII of the Civil Rights Act of 1964,
applied for credit), the Commission will consider the report a consumer report even if the identifying information of the
consumer has been stripped.
the Americans with Disabilities Act, the Age Discrimination in Employment Act (“ADEA”), the Fair
Housing Act (“FHA”), and the Genetic Information Nondiscrimination Act (“GINA”). These laws
prohibit discrimination based on protected characteristics such as race, color, sex or gender, religion, age,
disability status, national origin, marital status, and genetic information.
86 Companies that determine eligibility based on zip codes should exercise caution. Such a practice could still implicate equal
opportunity laws, if that policy has a disproportionate adverse effect or impact on a protected class, unless those practices or
policies further a legitimate business need that cannot reasonably be achieved by means that are less disproportionate in their
impact. See discussion infra Part IV.A.2.
87 See 15 U.S.C. § 1681m(b).
88 15 U.S.C. §§ 1691 et seq. (2014). In addition to prohibiting discrimination, ECOA and Regulation B include other
requirements that may be implicated by business practices that utilize big data analytics. Informing credit applicants
about adverse actions related to applications for credit and identifying the specific reasons an adverse action was taken
may be challenging when those reasons implicate big data analytics. See 12 C.F.R. § 1002.9. Lenders may also need to
review Regulation B requirements on how information is obtained and retained in the credit application process. See
12 C.F.R. § 1002.5(b)–(d), 1002.12(a)(2).
89 42 U.S.C. §§ 2000e et seq. (2014). The Civil Rights Act of 1964 also applies to education, voting, and public
Companies should review these laws and take steps to ensure their use of big data analytics complies
with the discrimination prohibitions that may apply. This section discusses some examples of relevant
considerations under these laws related to employment and credit, as highlighted in the workshop.
To prove a violation of federal equal credit or employment opportunity laws, plaintiffs typically must
show “disparate treatment” or “disparate impact.” Disparate treatment occurs when an entity, such as
a creditor or employer, treats an applicant differently based on a protected characteristic such as race or
national origin. Systemic disparate treatment occurs when an entity engages in a pattern or practice of
differential treatment on a prohibited basis. In some cases, the unlawful differential treatment could be
based on big data analytics. For example, an employer may not disfavor a particular protected group
because big data analytics show that members of this protected group are more likely to quit their jobs
within a five-year period. Similarly, a lender cannot refuse to lend to single persons or offer less favorable
terms to them than married persons even if big data analytics show that single persons are less likely to repay
loans than married persons. Evidence of such violations could include direct evidence of the reasons for
the company’s choices, or circumstantial evidence, such as significant statistical disparities in outcomes for
protected groups that are unexplained by neutral factors.
90 42 U.S.C. §§ 12101 et seq. (2014).
91 29 U.S.C. §§ 621 et seq. (2014).
92 42 U.S.C. §§ 3601 et seq. (2014).
93 42 U.S.C. §§ 2000ff et seq. (2014). GINA also applies to health insurance.
94 A number of different agencies have the authority to enforce the various equal opportunity laws. The Equal Employment
Opportunity Commission, for example, is responsible for enforcing Title VII of the Civil Rights Act of 1964 (along with
the Department of Justice (“DOJ”)), the Age Discrimination in Employment Act of 1967, and GINA. The Department of
Housing and Urban Development and the DOJ enforce the FHA. The FTC, DOJ, and the Consumer Financial Protection
Bureau (“CFPB”), among other agencies, enforce ECOA and its implementing Regulation B.
95 See, e.g., Big Data Tr. 168–170 (Carol Miaskoff). Disparate impact claims are not permitted under Title II of GINA.
Background Information for EEOC Notice of Proposed Rulemaking on Title II of the Genetic Information Nondiscrimination Act
of 2008, U.S. Equal Emp’t Opportunity Comm’n, http://www.eeoc.gov/policy/docs/qanda_geneticinfo.html (last modified
May 12, 2009).
96 See, e.g., 29 U.S.C. § 623(a)(1); 42 U.S.C. § 2000e–2(k)(1)(A)(i); 42 U.S.C. § 12112(b)(1); 12 C.F.R. Part 1002 Supp. I §
97 See, e.g., Int’l Bhd. of Teamsters v. United States, 431 U.S. 324, 334–35 (1977).
98 See, e.g., Big Data Tr. 168–170 (Carol Miaskoff).
99 Cf. id. (explaining how the various equal opportunity laws may apply to big data analytics).
Practices that have a “disparate impact” on protected classes may also violate equal credit or employment
opportunity laws. While specific disparate impact standards vary depending on the applicable law, in
general, disparate impact occurs when a company employs facially neutral policies or practices that have a
disproportionate adverse effect or impact on a protected class, unless those practices or policies further a
legitimate business need that cannot reasonably be achieved by means that have less disparate an impact.
Disparate impact analysis has important implications for big data. Under such an analysis, a
company that avoids, for example, expressly screening job applicants based on gender and instead uses big
data analytics to screen job applicants in a way that has a disparate impact on women may still be subject
to certain equal employment opportunity laws, if the screening does not serve a legitimate business need
or if the need can reasonably be achieved by another means with a smaller disparate impact.
If a company makes credit decisions based on zip codes, it may be violating ECOA if the decisions have
a disparate impact on a protected class and are not justified by a legitimate business necessity. Even if
evidence shows the decisions are justified by a business necessity, if there is a less discriminatory alternative,
the decisions may still violate ECOA.
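One way a company might screen its own outcomes for a possible disparate impact is to compare selection rates across groups. The sketch below is an assumption-laden illustration and is not a legal test described in this report: the function names and the applicant counts are hypothetical, and the 0.8 threshold is borrowed from the “four-fifths” rule of thumb in the EEOC’s Uniform Guidelines on Employee Selection Procedures, which serves only as an initial screen and does not by itself establish or rule out a violation of any law discussed here.

```python
def selection_rate(selected, applicants):
    """Fraction of applicants in a group who were selected."""
    if applicants <= 0:
        raise ValueError("applicants must be positive")
    return selected / applicants


def impact_ratios(groups):
    """Map each group to its selection rate divided by the highest group's rate.

    `groups` maps a group label to a (selected, applicants) pair.
    """
    rates = {g: selection_rate(s, n) for g, (s, n) in groups.items()}
    top = max(rates.values())
    if top == 0:
        raise ValueError("no group has any selections")
    return {g: rate / top for g, rate in rates.items()}


def flag_for_review(groups, threshold=0.8):
    """Return the groups whose impact ratio falls below the threshold."""
    return sorted(g for g, r in impact_ratios(groups).items() if r < threshold)


# Hypothetical outcomes of an automated applicant screen.
outcomes = {"men": (60, 100), "women": (30, 100)}
ratios = impact_ratios(outcomes)
flagged = flag_for_review(outcomes)
print(ratios, flagged)
```

A flagged group is a prompt for closer review, not a conclusion. The questions that follow are the ones the text describes: whether the practice serves a legitimate business need, and whether that need could reasonably be met by means with a smaller disparate impact.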
100 See, e.g., 29 U.S.C. § 631(a); 42 U.S.C. § 2000e–2(k); 42 U.S.C. § 12112(b)(6); 24 C.F.R. § 100.500; 12 C.F.R. Part 1002
Supp. I § 1002.6(a)–2. On June 25, 2015, the Supreme Court in Texas Department of Housing and Community Affairs v.
Inclusive Communities Project, Inc., 135 S.Ct. 2507 (2015), held that the disparate impact theory is valid under the FHA.
101 See, e.g., 12 C.F.R. § 1002.6 (citing Griggs v. Duke Power Co., 401 U.S. 424 (1971), and Albemarle Paper Co. v. Moody, 422
U.S. 405, 430–31 (1975)); 12 C.F.R. Part 1002 Supp. I § 1002.6(a)–2; Policy Statement on Discrimination in Lending, 59
Fed. Reg. 18,266, 18,268 (Apr. 14, 1994).
102 See, e.g., Tex. Dep’t of Cmty. Affairs v. Burdine, 450 U.S. 248, 256–58 (1981); N.Y. City Transit Auth. v. Beazer, 440 U.S.
568, 587 (1979); Zamlen v. City of Cleveland, 906 F.2d 209, 218–20 (6th Cir. 1990); Evans v. City of Evanston, 881 F.2d
382, 383 (7th Cir. 1989); Aguilera v. Cook County Police & Corr. Merit Bd., 760 F.2d 844, 846–47 (7th Cir. 1985). See
also 12 C.F.R. § 1002.6(a). However, with respect to ADEA cases, the formulation applied by courts is slightly different. See,
e.g., Smith v. City of Jackson, 544 U.S. 228, 243 (2005) (holding that the “reasonable factor other than age” test, rather than
the business necessity test, is the appropriate standard for determining lawfulness of a practice that disproportionally affects
older workers under the ADEA). See also Questions and Answers on EEOC Final Rule on Disparate Impact and “Reasonable
Factors Other Than Age” Under the Age Discrimination Employment Act of 1967, U.S. Equal Emp’t Opportunity Comm’n,
http://www.eeoc.gov/laws/regulations/adea_rfoa_qa_final_rule.cfm (last visited on Dec. 28, 2015).
103 See, e.g., Albemarle Paper, 422 U.S. at 425; Int’l Bhd. of Elec. Workers, AFL-CIO, Local Unions Nos. 605 & 985 v. Miss.
Power & Light Co., 442 F.3d 313, 318–19 (5th Cir. 2006); Smith v. City of Des Moines, Iowa, 99 F.3d 1466, 1473 (8th Cir.
1996); Contreras v. City of Los Angeles, 656 F.2d 1267, 1285 (9th Cir. 1981); El v. Se. Pa. Transp. Auth., 418 F. Supp. 2d
659, 672 (E.D. Pa. 2005) aff’d, 479 F.3d 232 (3d Cir. 2007).
104 Big data can also facilitate the identification of disparate impact. See infra notes 145–47 and accompanying text.
105 See, e.g., Big Data Tr. 170 (Carol Miaskoff).
106 The use of zip codes can also raise concerns of redlining, a form of discrimination involving differential treatment on the
basis of the race, color, national origin, or other protected characteristic of residents of those areas in which the credit seeker
resides, or will reside, or in which residential property to be mortgaged is located. The CFPB and DOJ recently concluded a
redlining enforcement action against Hudson City Savings Bank. See Complaint, CFPB v. Hudson City Sav. Bank,
No. 15-07056 (D.N.J. Sept. 24, 2015), http://files.consumerfinance.gov/f/201509_cfpb_hudson-city-joint-complaint.pdf. See also
Consumer Fin. Prot. Bureau, CFPB Examination Procedures: ECOA Baseline Review Modules 16–18 (2013),
107 The examples above are illustrative and do not necessarily provide an exhaustive list of all ways that big data could have a
disparate impact on consumers.
The FTC’s enforcement actions include dozens of consent orders resolving alleged violations of ECOA.
Some of these cases have been based on a disparate treatment theory. For example, ECOA prohibits
discrimination against applicants who are receiving public assistance. The Commission has brought cases
against lenders that allegedly excluded public assistance income in deciding whether to extend credit.
Likewise, ECOA prohibits discounting or refusing to consider income on the basis of marital status.
The FTC has brought cases against lenders that allegedly failed to aggregate the income of unmarried joint
applicants, while combining incomes for applicants who were married.
The FTC also has alleged discrimination under a disparate impact legal standard under ECOA. For
example, the FTC settled two cases alleging that lenders failed to appropriately monitor loan officers
whose mortgage loans resulted in minority applicants’ being charged higher prices than non-Latino white
applicants. The Commission alleged that the statistically significant pricing disparities could not be
explained by any legitimate underwriting risk factors or credit characteristics of the applicants.
Workshop discussions focused in particular on whether advertising could implicate equal opportunity
laws. For example, suppose big data analytics show that single women are more likely to apply for
subprime credit products. Would targeting advertisements for these products to single women violate
ECOA? Certainly, prohibiting single women from applying for a prime credit card based on their marital
status would violate ECOA. But what if a single woman would qualify for the prime product, but because
of big data analytics, the subprime product with a higher interest rate is the only one advertised to her?
In most cases, a companys advertisement to a particular community for a credit oer that is open to
all to apply is unlikely, by itself, to violate ECOA, absent disparate treatment or an unjustied disparate
108 15 U.S.C. § 1691(a)(2).
109 See, e.g., Complaint, United States v. Franklin Acceptance Corp., No. 99-cv-2435 (E.D. Penn. filed May 13, 1999), https://
110 15 U.S.C. § 1691(a)(1).
111 See, e.g., Complaint, United States v. Ford Motor Credit Co., No. 99-cv-57887 (GEW) (E.D. Mich. filed Dec. 9, 1999),
112 See Complaint, FTC v. Gateway Diversified Funding Mortg. Servs., No. 08-5805 (E.D. Pa. filed Dec. 16, 2008), https://
www.ftc.gov/sites/default/files/documents/cases/2008/12/081216gatewaycmpt.pdf; Complaint, FTC v. Golden Empire
Mortgage, Inc., No. 09-03227 CAS(SHx) (C.D. Cal. filed May 7, 2009), https://www.ftc.gov/sites/default/files/documents/
113 See, e.g., Big Data Tr. 179–83 (Peter Swire), 187–90 (Peter Swire, Leonard Chanin, and C. Lee Peeler in conversation),
204–05 (Peter Swire), 268–69 (Christopher Calabrese).
114 In the context of mortgage advertising, creditors should also consider the FHA. 42 U.S.C. §§ 3601–3631; 24 C.F.R. Parts
100, 103, and 104. Regulations that implement the FHA prohibit “[f]ailing or refusing to provide to any person information
regarding the availability of loans or other financial assistance, application requirements, procedures or standards for the
review and approval of loans or financial assistance, or providing information which is inaccurate or different from that
provided others, because of race, color, religion, sex, handicap, familial status, or national origin.” 24 C.F.R. § 100.120(b)(1).
115 15 U.S.C. § 1691(a)(1).
impact in subsequent lending.
Nevertheless, companies should proceed with caution in this area. In credit transactions, Regulation B,
which is the implementing regulation for ECOA, prohibits creditors from making oral or written statements,
in advertising or otherwise, to applicants or prospective applicants that would discourage on a prohibited
basis a reasonable person from making or pursuing an application. With respect to prescreened solicitations,
Regulation B also requires creditors to maintain records of the solicitations and the criteria used to select
potential recipients. Advertising and marketing practices could impact a creditor’s subsequent lending
patterns and the terms and conditions of the credit received by borrowers, even if credit offers are open to
all who apply. In some cases, the DOJ has cited a creditor’s advertising choices as evidence of discrimination.
Ultimately, as with the FCRA, the question of whether a practice is unlawful under equal opportunity
laws is a case-specific inquiry. Accordingly, companies should proceed with caution if their practices could
suggest disparate treatment or have a demonstrable disparate impact based on protected characteristics.
3. The Federal Trade Commission Act
Section 5 of the Federal Trade Commission Act (“Section 5”) prohibits unfair or deceptive acts or practices
in or affecting commerce. Unlike the FCRA or equal opportunity laws, Section 5 is not confined to particular
market sectors but is generally applicable to most companies acting in commerce. Under Section 5, an act
or practice is deceptive if it involves a material statement or omission that is likely to mislead a consumer acting
reasonably under the circumstances. For example, if a company violates a material promise, whether that
116 See, e.g., Big Data Tr. 178–191 (Peter Swire, C. Lee Peeler, and Leonard Chanin in conversation).
117 Under Regulation B, credit transaction means “every aspect of an applicant’s dealings with a creditor regarding an application
for credit or an existing extension of credit (including, but not limited to, information requirements; investigation procedures;
standards of creditworthiness; terms of credit; furnishing of credit information; revocation, alteration, or termination of
credit; and collection procedures).” 12 C.F.R. § 1002.2(m).
118 Under Regulation B, a creditor “does not include a person whose only participation in a credit transaction involves honoring
a credit card.” Id. § 1002.2(l).
119 Id. § 1002.4(b).
120 Id. § 1002.12(b)(7).
121 See, e.g., Complaint, United States v. First United Sec. Bank, No. 1:09-cv-00644 (S.D. Ala. filed Sept. 30, 2009), http://www.
122 15 U.S.C. § 45(a)(1) (2012).
123 The FTC’s consumer protection mandate is broad. Under Section 5 of the FTC Act, 15 U.S.C. § 45, the Commission has
the power to prevent “persons, partnerships, and corporations” from using unfair or deceptive acts or practices in or affecting
commerce, with certain limited exceptions. Those exceptions include: (1) banks and savings and loan institutions as described
in 15 U.S.C. § 57a(f)(2) and (3); (2) federal credit unions as described in 15 U.S.C. § 57a(f)(4); (3) common carrier
activities subject to subtitle IV of title 49 and the Communications Act of 1934; and (4) air carriers and foreign air carriers.
124 FTC Policy Statement on Deception, 103 F.T.C. 110, 174 (1984) (appended to Cliffdale Assocs., Inc., 103 F.T.C. 110, 174
(1984)). See also POM Wonderful LLC, No. C-9344, 2013 WL 268926, at *18 (F.T.C. Jan. 16, 2013).
promise is to refrain from sharing data with third parties, to provide consumers choices about sharing, or
to safeguard consumers’ personal information, it will likely be engaged in a deceptive practice under
Section 5.
Likewise, a failure to disclose material information may violate Section 5. In CompuCredit, for instance,
the FTC included an allegation in the complaint that although a credit card marketing company touted
the ability of consumers to use the card for cash advances, it deceptively failed to disclose that, based on a
behavioral scoring model, consumers’ credit lines would be reduced if they used their cards for such cash
advances or if they used their cards for certain types of transactions, including marriage counseling, bars and
nightclubs, pawn shops, and massage parlors.
Among other things, the settlement prohibits CompuCredit
from making misrepresentations to consumers in the marketing of credit cards, including misrepresentations
about the amount of available credit.
In addition, under Section 5, an act or practice is unfair if it is likely to cause substantial consumer
injury, the injury is not reasonably avoidable by consumers, and the injury is not outweighed by benefits to
consumers or competition. One example of a potentially unfair practice is the failure to reasonably secure
consumers’ data where that failure is likely to cause substantial injury. Companies that maintain big data
on consumers should take care to reasonably secure that data commensurate with the amount and sensitivity
125 See, e.g., Goldenshores Techs., LLC, No. C-4446 (F.T.C. Mar. 31, 2014), https://www.ftc.gov/system/files/documents/
cases/140409goldenshoresdo.pdf; FTC v. Myspace LLC, No. C-4369 (F.T.C. Aug. 30, 2012), https://www.ftc.gov/sites/
126 See, e.g., Compete, Inc., No. C-4384 (F.T.C. Feb. 20, 2013), https://www.ftc.gov/sites/default/files/documents/
cases/2013/02/130222competedo.pdf; United States v. Path, Inc., No. C-13-0448 (N.D. Cal. Feb. 8, 2013), https://www.
ftc.gov/sites/default/files/documents/cases/2013/02/130201pathincdo.pdf; Google Inc., No. C-4336 (F.T.C. Oct. 13, 2011),
https://www.ftc.gov/sites/default/files/documents/cases/2011/10/111024googlebuzzdo.pdf; Facebook, Inc., No. C-4365
(F.T.C. July 27, 2012), https://www.ftc.gov/sites/default/files/documents/cases/2012/08/120810facebookdo.pdf; Chitika,
Inc., No. C-4324 (F.T.C. June 7, 2011), https://www.ftc.gov/sites/default/files/documents/cases/2011/06/110617chitikado.
127 See, e.g., Snapchat, Inc., No. C-4501 (F.T.C. Dec. 23, 2014), https://www.ftc.gov/system/files/documents/
cases/141231snapchatdo.pdf; Fandango, LLC, No. C-4481 (F.T.C. Aug. 13, 2014), https://www.ftc.gov/system/files/
documents/cases/140819fandangodo.pdf; Credit Karma, Inc., No. C-4480 (F.T.C. Aug. 13, 2014), https://www.ftc.gov/system/
files/documents/cases/1408creditkarmado.pdf; Twitter, Inc., No. C-4316 (F.T.C. Mar. 2, 2011), https://www.ftc.gov/sites/
default/files/documents/cases/2011/03/110311twitterdo.pdf; Reed Elsevier Inc., No. C-4226 (F.T.C. July 29, 2008), https://
128 Complaint, CompuCredit, No. 1:08-cv-1976-BBM-RGV (N.D. Ga. filed June 10, 2008), https://www.ftc.gov/sites/default/
129 Id.
130 15 U.S.C. § 45(n) (2012). See also FTC Policy Statement on Unfairness (appended to Int’l Harvester Co., 104 F.T.C. 949,
1070 (1984)).
131 See, e.g., GMR Transcription Servs., Inc., No. C-4482 (F.T.C. Aug. 14, 2014), https://www.ftc.gov/system/files/documents/
cases/140821gmrdo.pdf; GeneWize Life Scis., Inc., No. C-4457 (F.T.C. May 8, 2014), https://www.ftc.gov/system/files/
documents/cases/140512foruintdo.pdf; HTC Am., Inc., No. C-4406 (F.T.C. June 25, 2013), https://www.ftc.gov/sites/
default/files/documents/cases/2013/07/130702htcdo.pdf; Compete, No. C-4384 (F.T.C. Feb. 20, 2013), https://www.ftc.gov/
sites/default/files/documents/cases/2013/02/130222competedo.pdf; Upromise, Inc., No. C-4351 (F.T.C. Mar. 27, 2012),
of the data at issue, the size and complexity of the company’s operations, and the cost of available security
measures. For example, a company that maintains Social Security numbers or medical information about
individual consumers should have particularly robust security measures as compared to a company that
maintains consumers’ names only.
Another example of a potentially unfair practice that the Commission has challenged is the sale of data
to customers that a company knows or has reason to know will use the data for fraudulent purposes. The
Commission’s cases against Sequoia One and ChoicePoint are instructive in this regard. In Sequoia One, the
FTC’s complaint alleges that the company sold the personal information of financially distressed payday loan
applicants (including Social Security numbers, financial account numbers, and bank routing numbers) to
non-lender third parties, and that one of these third parties used the information to withdraw millions of dollars
from consumers’ accounts without their authorization.
In ChoicePoint, the Commission alleged that the company sold the personal information of more than
163,000 consumers to identity thieves posing as legitimate subscribers, despite obvious red flags that should
have alerted the company to the potential fraud. As these cases show, at a minimum, companies must not
sell their big data analytics products to customers if they know or have reason to know that those customers
will use the products for fraudulent purposes.
Section 5 may also apply under similar circumstances if products are sold to customers that use the
products for discriminatory purposes. The inquiry will be fact-specific, and in every case, the test will be
whether the company is offering or using big data analytics in a deceptive or unfair way.
132 See generally Fed. Trade Comm’n, Start with Security: A Guide for Business (2015), https://www.ftc.gov/system/files/
133 FTC v. Sequoia One, LLC, No. 2:15-cv-01512 (D. Nev. Aug. 10, 2015), https://www.ftc.gov/system/files/documents/
cases/150812sequoiaonemcdonnellstip.pdf; Complaint, Sequoia One, No. 2:15-cv-01512 (D. Nev. filed Aug. 7, 2015), https://
www.ftc.gov/system/files/documents/cases/150812sequoiaonecmpt.pdf. See also Press Release, Fed. Trade Comm’n, FTC
Charges Data Broker with Facilitating the Theft of Millions of Dollars from Consumers’ Accounts (Dec. 23, 2014), https://
www.ftc.gov/news-events/press-releases/2014/12/ftc-charges-data-broker-facilitating-theft-millions-dollars. In LeapLab, the
Commission’s complaint alleges that the company bought payday loan applications of financially strapped consumers, and
then sold that information (including Social Security numbers and financial account numbers) to marketers whom it knew
had no legitimate need for it. Complaint at 5–10, LeapLab, LLC, No. 2:14-cv-02750 (D. Ariz. filed Dec. 22, 2014), https://
www.ftc.gov/system/files/documents/cases/141223leaplabcmpt.pdf. One of these marketers allegedly used the information to
withdraw millions of dollars from consumers’ accounts without their authorization. Id. at 9–10.
134 United States v. ChoicePoint, Inc., No. 1:06-cv-0198-JTC (N.D. Ga. Feb. 15, 2006), https://www.ftc.gov/sites/default/files/
135 Cf. Data Broker Report, supra note 7, at 56.
Questions for Legal Compliance
In light of these existing laws, companies already using or considering engaging in big data
analytics should, among other things, consider the following:
• If you compile big data for others who will use it for eligibility decisions (such as credit,
  employment, insurance, housing, government benefits, and the like), are you complying with
  the accuracy and privacy provisions of the FCRA? FCRA requirements include requirements
  to (1) have reasonable procedures in place to ensure the maximum possible accuracy of the
  information you provide, (2) provide notices to users of your reports, (3) allow consumers to
  access information you have about them, and (4) allow consumers to correct inaccuracies.
• If you receive big data products from another entity that you will use for eligibility decisions,
  are you complying with the provisions applicable to users of consumer reports? For
  example, the FCRA requires that entities that use this information for employment purposes
  certify that they have a “permissible purpose” to obtain it, certify that they will not use it in a
  way that violates equal opportunity laws, provide pre-adverse action notice to consumers,
  and thereafter provide adverse action notices to those same consumers.
• If you are a creditor using big data analytics in a credit transaction, are you complying with
  the requirement to provide statements of specific reasons for adverse action under ECOA?
  Are you complying with ECOA requirements related to requests for information and record
  retention?
• If you use big data analytics in a way that might adversely affect people in their ability to
  obtain credit, housing, or employment:
  ◦ Are you treating people differently based on a prohibited basis, such as race or national
    origin?
  ◦ Do your policies, practices, or decisions have an adverse effect or impact on a member
    of a protected class, and if they do, are they justified by a legitimate business need that
    cannot reasonably be achieved by means that are less disparate in their impact?
• Are you honoring promises you make to consumers and providing consumers material
  information about your data practices?
• Are you maintaining reasonable security over consumer data?
• Are you undertaking reasonable measures to know the purposes for which your customers
  are using your data?
  ◦ If you know that your customer will use your big data products to commit fraud, do not
    sell your products to that customer. If you have reason to believe that your data will be
    used to commit fraud, ask more specific questions about how your data will be used.
  ◦ If you know that your customer will use your big data products for discriminatory
    purposes, do not sell your products to that customer. If you have reason to believe that
    your data will be used for discriminatory purposes, ask more specific questions about
    how your data will be used.
B. Special Policy Considerations Raised by Big Data Research
Workshop and seminar panelists, academics, and others have also engaged in important research in
the field of big data. Some of this research has focused on how big data analytics could negatively affect
low-income and underserved populations. Researchers note there is a potential for incorporating errors
and biases at every stage, from choosing the data set used to make predictions, to defining the problem
to be addressed through big data, to making decisions based on the results of big data analysis.
Although having the ability to use more data can increase the power of the analysis, simply adding more data does
not necessarily correct inaccuracies or remove biases. In addition, the complexity of the data and statistical
models can make it difficult for analysts to fully understand and explain the underlying model or its results.
Even when data analysts are very careful, the results of their analysis may affect particular sets of individuals
differently because their models may use variables that turn out to operate no differently than proxies for
protected classes. Or researchers may simply lack information that would allow them to determine
whether their results have such effects. Numerous researchers and commenters discuss how big data could
be used in the future to the disadvantage of low-income and underserved communities and adversely affect
consumers on the basis of legally protected characteristics in hiring, housing, lending, and other processes.
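One simple way to see how a seemingly neutral variable can act as a proxy is to ask how much better a protected characteristic can be guessed once that variable is known. The Python sketch below is illustrative only: the function name proxy_lift, the ZIP codes, and the group labels are hypothetical, and a real analysis would use more rigorous statistical methods.

```python
from collections import Counter, defaultdict


def proxy_lift(feature_values, protected_values):
    """Return (baseline_accuracy, feature_accuracy) for guessing the protected
    attribute, first with no information and then from the feature alone."""
    if len(feature_values) != len(protected_values) or not feature_values:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(protected_values)
    # Baseline: always guess the most common protected-attribute value.
    baseline = Counter(protected_values).most_common(1)[0][1] / n
    # With the feature: guess the most common protected value within each feature value.
    by_feature = defaultdict(Counter)
    for feature, protected in zip(feature_values, protected_values):
        by_feature[feature][protected] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_feature.values())
    return baseline, correct / n


# Hypothetical toy records of (ZIP code, group label); not real data.
records = [
    ("10001", "A"), ("10001", "A"), ("10001", "A"), ("10001", "B"),
    ("20002", "B"), ("20002", "B"), ("20002", "B"), ("20002", "A"),
]
zips = [zip_code for zip_code, _ in records]
groups = [group for _, group in records]
baseline, with_zip = proxy_lift(zips, groups)
print(f"baseline={baseline:.2f} with_zip={with_zip:.2f}")
```

In this toy data the ZIP code raises the accuracy of guessing group membership from 0.50 to 0.75, which suggests that a model using ZIP code could partly stand in for the protected characteristic.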
136 See generally Robinson + Yu Comment #00080, supra note 53; Ctr. for Data Innovation Comment #00055, supra note
8; Comment #00042 from Peter Swire, Ga. Inst. of Tech. & Future of Privacy Forum, to Fed. Trade Comm’n (Sept. 15,
2014), https://www.ftc.gov/system/files/documents/public_comments/2014/09/00042-92638.pdf; Future of Privacy Forum
Comment #00027, supra note 23; Ctr. on Privacy & Tech. at Geo. L. Comment #00024, supra note 8; Nat’l Consumer L.
Ctr. Comment #00018, supra note 1; N.Y.U. Info. L. Inst. Comment #00015, supra note 8; World Privacy Forum Comment
#00014, supra note 19; Tech. Pol’y Inst. Comment #00010, supra note 8; Ctr. for Dig. Democracy & U.S. PIRG Educ. Fund
Comment #00003, supra note 8.
137 See, e.g., Solon Barocas & Andrew Selbst, Big Data’s Disparate Impact, 104 Calif. L. Rev. _ (forthcoming 2016), http://papers.
ssrn.com/sol3/papers.cfm?abstract_id=2477899##; Alex Rosenblat et al., Networked Employment Discrimination, (Data &
Society Research Inst., Working Paper Oct. 8, 2014), http://www.datasociety.net/pubs/fow/EmploymentDiscrimination.pdf;
Gary Marcus & Ernest Davis, Eight (No, Nine!) Problems With Big Data, N.Y. Times (Apr. 6, 2014), http://www.nytimes.
com/2014/04/07/opinion/eight-no-nine-problems-with-big-data.html?_r=0; Tim Harford, Big Data: Are We Making a Big
Mistake?, FT Mag. (Mar. 28, 2014), http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html.
See generally Joseph Turow, The Daily You: How the New Advertising Industry Is Defining Your Identity and Your
Worth (2012).
138 See, e.g., Big Data Tr. 19–25 (Solon Barocas). See also Nat’l Consumer L. Ctr. Comment #00018, supra note 1, at 14–15;
World Privacy Forum Comment #00014, supra note 19, at 6–17. See generally Barocas & Selbst, supra note 137.
139 Barocas & Selbst, supra note 137, at 20–22. Researchers note that data mining poses the additional problem of giving
data miners the ability to disguise intentional discrimination as unintentional. Id. at 22–23. See also Paul Ohm, Changing
the Rules: General Principles for Data Use and Analysis, in Privacy, Big Data, and the Public Good: Frameworks for
Engagement 100–02 (Julia Lane et al. eds., 2014). For examples of the kinds of analyses that can be conducted to
detect whether model variables are proxies for protected characteristics, see generally Fed. Trade Comm’n, Credit-Based
Insurance Scores: Impacts on Consumers of Automobile Insurance (2007), http://www.ftc.gov/sites/default/files/
p044804facta_report_credit-based_insurance_scores.pdf, and Bd. of Governors of the Fed. Reserve Sys., Report to the
Congress on Credit Scoring and Its Effects on the Availability and Affordability of Credit (2007), http://www.
140 See generally Robinson + Yu Comment #00080, supra note 53; Am.’s Open Tech. Inst. Comment #00078, supra note 46; Ctr.
for Democracy & Tech. Comment #00075, supra note 61; Am. Civil Liberties Union Comment #00059, supra note 61; Ctr.
on Privacy & Tech. at Geo. L. Comment #00024, supra note 8; Nat’l Consumer L. Ctr. Comment #00018, supra note 1;
On the other hand, several stakeholders argue that these concerns are overstated. Some emphasize that,
to the extent the various steps in data mining lead to disparate impact, these issues are not new; they are
inherent in any statistical analysis. Other writers note that, rather than disadvantaging minorities in the
hiring process, big data can help to create “a labor market that’s fairer to people at every stage of their careers.”
For example, companies can use big data algorithms to find employees from within underrepresented segments
of the population. They can also use big data to identify biases so that they can choose candidates based on
merit rather than using mechanisms that depend on the reviewers’ biases.
Furthermore, as other stakeholders
have noted, big data can help “reduce the rate of ‘false positive’ cases that potentially make disparate treatment a
problem,” and can help identify whether correlations exist between prices and variables such as race, gender or
ethnicity. These stakeholders do not argue that we should ignore discrimination where it occurs; rather, they
argue that we should recognize the potential benefits of big data to reduce discriminatory harm.
Common Sense Media Comment #00016, supra note 8; N.Y.U. Info. L. Inst. Comment #00015, supra note 8; World Privacy
Forum Comment #00014, supra note 19; Ctr. for Dig. Democracy & U.S. PIRG Educ. Fund Comment #00003, supra note
8. See also Barocas & Selbst, supra note 137; Crawford, supra note 39.
141 See, e.g., Big Data Tr. 75 (Gene Gsell). See generally Comment #00081 from Berin Szoka & Tom Struble, TechFreedom,
& Geoffrey Manne & Ben Sperry, Int’l Ctr. for L. & Econ., to Fed. Trade Comm’n (Nov. 3, 2014), https://www.ftc.
gov/system/files/documents/public_comments/2014/11/00081-92956.pdf; Comment #00074 from Howard Fienberg,
Mktg. Research Assoc., to Fed. Trade Comm’n (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_
comments/2014/10/00074-92927.pdf; Comment #00070 from Bijan Madhani, Computer & Commc’ns Indus. Assoc., to
Fed. Trade Comm’n (Oct. 31, 2014), https://www.ftc.gov/system/files/documents/public_comments/2014/10/00070-92912.
pdf; NetChoice Comment #00066, supra note 23; Ctr. for Data Innovation Comment #00055, supra note 8; Ctr. for Data
Innovation Comment #00026, supra note 8; Tech. Pol’y Inst. Comment #00010, supra note 8; Viktor Mayer-Schönberger
& Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work, and Think (2013).
142 See, e.g., Dan Gray, Ethics, Privacy and Discrimination in the Age of Big Data, Dataconomy (Dec. 3, 2014), http://
dataconomy.com/ethics-privacy-and-discrimination-in-the-age-of-big-data/. But see Jeff Leek, Why Big Data Is in Trouble:
They Forgot About Applied Statistics, Simply Statistics (May 7, 2014), http://simplystatistics.org/2014/05/07/why-big-data-is-
in-trouble-they-forgot-about-applied-statistics/ (noting that big data users have not given sufficient attention to issues that
statisticians have been thinking about for a long time: sampling populations, multiple testing, bias, and overfitting).
143 See, e.g., Don Peck, They’re Watching You at Work, Atlantic (Dec. 2013), http://www.theatlantic.com/magazine/
144 See, e.g., Big Data Tr. 126 (Mark MacCarthy), 251 (Christopher Wolf). See also Software & Info. Indus. Assoc. Comment
#00067, supra note 2, at 7; Future of Privacy Forum Comment #00027, supra note 23, attached report entitled, Big Data: A
Tool for Fighting Discrimination and Empowering Groups, at 1–2.
145 See, e.g., Anne Loehr, Big Data for HR: Can Predictive Analytics Help Decrease Discrimination in the Workplace?, Huffington
Post (Mar. 23, 2015), http://www.huffingtonpost.com/anne-loehr/big-data-for-hr-can-predi_b_6905754.html.
146 White House Feb. 2015 Report, supra note 56, at 16.
147 Id. at 17. Economists have documented ways that data can help identify discrimination against protected groups in a wide
variety of settings. For example, a randomized experiment changed the names on resumes sent to employers from white-
sounding names to African-American sounding names; resumes with white-sounding names were 50 percent more likely to be
called back for an interview. Marianne Bertrand & Sendhil Mullainathan, Are Emily and Greg More Employable Than Lakisha
and Jamal? A Field Experiment on Labor Market Discrimination, 94 Am. Econ. Rev. 991, 991–1013 (2004). Research from
the early days of the Internet found that African-Americans and Latinos paid about 2 percent more for used cars purchased
offline, but paid similar prices for those purchased online; the proffered reason was that individuals were anonymous online.
Fiona Scott Morton et al., Consumer Information and Discrimination: Does the Internet Affect the Pricing of New Cars to Women
and Minorities?, 1 Quantitative Mktg. & Econ. 65, 65–92 (2003). See also Devin Pope & Justin Sydnor, Implementing
Anti-Discrimination Policies in Statistical Profiling Models, 3 Am. Econ. J.: Econ. Pol’y 206, 206–231 (2011), http://faculty.
Collectively, this research suggests that big data offers both new potential discriminatory harms and
new potential solutions to discriminatory harms. To maximize the benefits and limit the harms, companies
should consider the questions raised by research in this area. These questions include the following:
1. How representative is your data set?
Workshop participants and researchers note that the data sets, on which all big data analysis relies, may
be missing information about certain populations, e.g., individuals who are more careful about revealing
information about themselves, who are less involved in the formal economy, who have unequal access or
less fluency in technology resulting in a digital divide or data desert, or whose behaviors are simply not
observed because they are believed to be less profitable constituencies.
Recent examples demonstrate the impact of missing information about particular populations on data
analytics. For example, Hurricane Sandy generated more than twenty million tweets between October 27
and November 1, 2012. If organizations were to use this data to determine where services should be
deployed, the people who needed services the most may not have received them. The greatest number of
tweets about Hurricane Sandy came from Manhattan, creating the illusion that Manhattan was the hub
of the disaster. Very few messages originated from more severely affected locations, such as Breezy Point,
Coney Island, and Rockaway, areas with lower levels of smartphone ownership and Twitter usage. As
extended power blackouts drained batteries and limited cellular access, even fewer tweets came from the
worst hit areas. As one researcher noted, “data are assumed to accurately reflect the social world, but there
are significant gaps, with little or no signal coming from particular communities.”
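A basic representativeness check compares each community’s share of the data with its share of the underlying population. The Python sketch below is a minimal illustration under stated assumptions: the area names, the counts, the 0.5 cutoff, and the function name representation_ratios are all hypothetical and are not drawn from the Hurricane Sandy data.

```python
def representation_ratios(sample_counts, population_counts):
    """Map each group to (share of sample) / (share of population).
    Values well below 1 suggest under-representation; well above 1, over-representation."""
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    if sample_total <= 0 or population_total <= 0:
        raise ValueError("totals must be positive")
    ratios = {}
    for group, population in population_counts.items():
        if population <= 0:
            raise ValueError(f"population for {group!r} must be positive")
        sample_share = sample_counts.get(group, 0) / sample_total
        population_share = population / population_total
        ratios[group] = sample_share / population_share
    return ratios


# Hypothetical residents and message counts per area (not real data).
population = {"area_a": 500, "area_b": 300, "area_c": 200}
messages = {"area_a": 900, "area_b": 90, "area_c": 10}
ratios = representation_ratios(messages, population)
# Flag areas whose share of messages is under half their share of residents.
under_represented = sorted(group for group, ratio in ratios.items() if ratio < 0.5)
print(under_represented)
```

Here area_a contributes far more messages than its population share would predict, while area_b and area_c are flagged. The cutoff is arbitrary, and a low ratio is a prompt for further inquiry rather than a conclusion.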
Organizations have developed ways to overcome this issue. For example, the city of Boston developed
an application called Street Bump that utilizes smartphone features such as GPS feeds to collect and
report to the city information about road conditions, including potholes. However, after the release of
the application, the Street Bump team recognized that because lower income individuals may be less likely
to carry smartphones, the data was likely not fully representative of all road conditions. If the city had
148 A digital divide refers to the fact that certain populations may not have access to the Internet. See, e.g., Ctr. for Data
Innovation Comment #00055, supra note 8, at 2; Nat’l Consumer L. Ctr. Comment #00018, supra note 1, at 9, 27; Ctr. for
Dig. Democracy & U.S. PIRG Educ. Fund Comment #00003, supra note 8, at 2.
149 Data deserts are geographic “areas characterized by a lack of access to high-quality data that may be used to generate social
and economic benefits.” Ctr. for Data Innovation, Comment #00055, supra note 8, at 3. “[I]f some communities are not
represented in the data, decisions may overlook members of these communities and their unique needs.” Id., attached report
entitled, W E R A D D, at 1.
150 See, e.g., Big Data Tr. 100–02 (Dr. Nicol Turner-Lee), 256–58 (Daniel Castro). See also Ctr. for Dig. Democracy & U.S.
PIRG Educ. Fund Comment #00003, supra note 8, at 2; Quentin Hardy, Why Big Data Is Not Truth, N.Y. Times (June 1,
2013), http://bits.blogs.nytimes.com/2013/06/01/why-big-data-is-not-truth/?_php=true&_type=blogs&_r=1 (reviewing
a speech provided by Kate Crawford); danah boyd & Kate Crawford, Critical Questions for Big Data, 15 Info., Commc’n &
Soc’y 662, 668–70 (2012), http://dx.doi.org/10.1080/1369118X.2012.678878.
151 See, e.g., Crawford, supra note 39. See also Grinberg et al., supra note 37.
152 Crawford, supra note 39.
continued relying on the biased data, it might have skewed road services to higher income neighborhoods.
The team addressed this problem by issuing its application to city workers who service the whole city and
supplementing the data with that from the public. This example demonstrates why it is important to
consider the digital divide and other issues of underrepresentation and overrepresentation in data inputs
before launching a product or service in order to avoid skewed and potentially unfair ramifications.
2. Does your data model account for biases?
While large data sets can give insight into previously intractable challenges, hidden biases at both the
collection and analytics stages of big data’s life cycle could lead to disparate impact. Researchers have
noted that big data analytics “can reproduce existing patterns of discrimination, inherit the prejudice of
prior decision-makers, or simply reflect the widespread biases that persist in society.” For example, if
an employer uses big data analytics to synthesize information gathered on successful existing employees
to define a “good employee candidate,” the employer could risk incorporating previous discrimination in
employment decisions into new employment decisions. Even prior to the widespread use of big data,
there is some evidence of the use of data leading to the reproduction of existing biases. For example, one
researcher has noted that a hospital developed a computer model to help identify “good medical school
applicants” based on performance levels of previous and existing students, but, in doing so, the model
reproduced prejudices in prior admission decisions.
Companies can also design big data algorithms that learn from human behavior; these algorithms
may “learn” to generate biased results. For example, one academic found that Reuters and Google queries
for names identified by researchers to be associated with African-Americans were more likely to return
advertisements for arrest records than were queries for names identified by researchers to be associated with white
Americans. The academic concluded that determining why this discrimination was occurring was beyond
the scope of her research, but reasoned that search engines’ algorithms may learn to prioritize arrest record
ads for searches of names associated with African-Americans if people click on such ads more frequently than
other ads. This could reinforce the display of such ads and perpetuate the cycle.
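The reinforcement cycle described above can be sketched with a toy model in Python. This is an assumption-laden illustration, not a description of how any actual ad system works: the allocation rule (impressions proportional to cumulative clicks), the click rates, and the function name simulate_ad_shares are hypothetical.

```python
def simulate_ad_shares(click_rates, rounds, impressions_per_round=1000.0):
    """Return, for each round, the share of impressions each ad receives when
    impressions are allocated in proportion to cumulative clicks so far."""
    if rounds < 1 or not click_rates:
        raise ValueError("need at least one round and one ad")
    clicks = [1.0] * len(click_rates)  # equal starting weight for every ad
    history = []
    for _ in range(rounds):
        total = sum(clicks)
        shares = [count / total for count in clicks]
        history.append(shares)
        # Expected clicks this round (deterministic; no random sampling).
        for index, rate in enumerate(click_rates):
            clicks[index] += impressions_per_round * shares[index] * rate
    return history


# Hypothetical click rates: ad 0 is clicked slightly more often than ad 1.
history = simulate_ad_shares([0.06, 0.05], rounds=20)
print(f"ad 0 share: first round {history[0][0]:.2f}, last round {history[-1][0]:.2f}")
```

Under this rule, a small initial difference in click rates steadily increases the share of impressions given to the more-clicked ad, even though both ads start with equal exposure.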
153 See, e.g., Big Data Tr. 21–22 (Solon Barocas), 259–60 (Michael Spadea). See also Tech. Pol’y Inst. Comment #00010, supra
note 8, at 4 & attached report at 15; White House May 2014 Report, supra note 1, at 51–52.
154 See, e.g., Big Data Tr. 19–25 (Solon Barocas), 40–41 (Joseph Turow).
155 Barocas & Selbst, supra note 137, at 3–4.
156 See, e.g., Big Data Tr. 168–70 (Carol Miaskoff). Cf. Barocas & Selbst, supra note 137, at 9–11.
157 See generally Stella Lowry & Gordon Macpherson, A Blot on the Profession, 296 Brit. Med. J. 657, 657–58 (1988), http://
158 See generally Latanya Sweeney, Discrimination in Online Ad Delivery, 56 Commc’ns of the ACM 44 (2013), http://papers.
ssrn.com/sol3/papers.cfm?abstract_id=2208240&download=yes. See also Big Data Tr. 64–65 (David Robinson); Robinson +
Yu Comment #00080, supra note 53, at 16–17; N.Y.U. Info. L. Inst. Comment #00015, supra note 8, at 6.
159 Sweeney, supra note 158, at 34. See also Bianca Bosker, Google’s Online Ad Results Guilty of Racial Profiling,
According to New Study, Huffington Post (Feb. 5, 2013), http://www.huffingtonpost.com/2013/02/05/online-
Companies should therefore think carefully about how the data sets and the algorithms they use have
been generated. Indeed, if they identify potential biases in the creation of these data sets or the algorithms,
companies should develop strategies to overcome them. As noted above, Google changed its interview and
hiring process to ask more behavioral questions and to focus less on academic grades after discovering that
replicating its existing denitions of a “good employee” was resulting in a homogeneous tech workforce.
More broadly, companies are starting to recognize that if their big data algorithms only consider applicants
from “top tier” colleges to help them make hiring decisions, they may be incorporating previous biases in
college admission decisions.
As in the examples discussed above, companies should develop ways to use
big data to expand the pool of qualified applicants they will consider.
3. How accurate are your predictions based on big data?
Some researchers have also found that big data analysis does not give sufficient attention to traditional
applied statistics issues, thus leading to incorrect results and predictions. They note that while big data is
very good at detecting correlations, it does not explain which correlations are meaningful.
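A short sketch can show why detected correlations are not necessarily meaningful. It assumes a hypothetical data set in which every variable is pure random noise, so any correlation it finds is a coincidence. The function names, sample sizes, and seed are illustrative assumptions.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_features=1000, n_samples=20, seed=0):
    """Correlate many noise features with a noise target; return the largest |r|."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


print(f"strongest correlation found in pure noise: {strongest_chance_correlation():.2f}")
```

With enough candidate variables and few observations, some variable will appear strongly correlated with the target even though no real relationship exists, which is one reason an analyst needs more than the correlation itself.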
A prime example that demonstrates the limitations of big data analytics is Google Flu Trends, a
machine-learning algorithm for predicting the number of flu cases based on Google search terms. To predict the spread
of influenza across the United States, the Google team analyzed the top fifty million search terms for indications
that the flu had broken out in particular locations. While, at first, the algorithm appeared to create accurate
predictions of where the flu was more prevalent, it generated highly inaccurate estimates over time. This
could be because the algorithm failed to take into account certain variables. For example, the algorithm may
not have taken into account that people would be more likely to search for flu-related terms if the local news
ran a story on a flu outbreak, even if the outbreak occurred halfway around the world. As one researcher has
noted, Google Flu Trends demonstrates that a “theory-free analysis of mere correlations is inevitably fragile.
racial-proling_n_2622556.html (“[O]ver time, as certain templates are clicked more frequently than others, Google will
attempt to optimize its customers ad by more frequently showing the ad that garners the most clicks.”).
160 See supra notes 35–36 and accompanying text. See also Am.’s Open Tech. Inst. Comment #00078, supra note 46, at 60–61.
161 Cf. Matt Richtel, How Big Data Is Playing Recruiter for Specialized Workers, N.Y. Times (Apr. 27, 2013), http://www.nytimes.
com/2013/04/28/technology/how-big-data-is-playing-recruiter-for-specialized-workers.html (noting that some companies
are using technology to find candidates based on their ability to succeed on the job rather than traditional markers, such as a
degree from a top college).
162 The Commission recognizes that, to address data sets that incorporate previous prejudices, companies may need to collect
demographic information about consumers that they would not otherwise collect. If they do collect this information, they
should provide disclosures and choices to consumers where appropriate.
163 See, e.g., David Lazer et al., The Parable of Google Flu: Traps in Big Data Analysis, 343 Science 1203, 1203–05 (2014), http://
gking.harvard.edu/files/gking/files/0314policyforum.pdf; Marcus & Davis, supra note 137; Steve Lohr, Google Flu Trends:
The Limits of Big Data, N.Y. Times (Mar. 28, 2014), http://bits.blogs.nytimes.com/2014/03/28/google-flu-trends-the-limits-
164 See, e.g., Marcus & Davis, supra note 137. Likewise, these researchers note that whenever the source of information for a big
data analysis is itself a product of big data, opportunities for reinforcing errors exist. See id.
165 See supra note 163 and accompanying text. Cf. Tech. Pol’y Inst. Comment #00010, supra note 8, attached report at 5–6.
If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down.”
As another example, workshop participants discussed the fact that lenders can improve access to credit by
using non-traditional indicators, e.g., rental or utility bill payment history.
Consumers, however, have the
right to withhold rent if their landlord does not provide heat or basic sanitation services. In these instances,
simply compiling rental payment history would not necessarily demonstrate whether the person is a good
credit risk.
In some cases, these sources of inaccuracies are unlikely to have significant negative effects on consumers.
For example, it may be that big data analytics shows that 30 percent of consumers who buy diapers will
respond to an ad for baby formula. That response rate may be enough for a marketer to find it worthwhile
to send buyers of diapers an advertisement for baby formula. The 70 percent of consumers who buy diapers
but are not interested in formula can disregard the ad or discard it at little cost. Similarly, consumers who
are interested in formula and who do not buy diapers are unlikely to be substantially harmed because they
did not get the ad.
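The marketer's reasoning in this example comes down to simple expected-value arithmetic, sketched below. The 30 percent response rate is taken from the text; the revenue and cost figures and the function names are hypothetical assumptions.

```python
def expected_profit_per_ad(response_rate, revenue_per_response, cost_per_ad):
    """Expected profit from sending one ad to one diaper buyer."""
    return response_rate * revenue_per_response - cost_per_ad


def break_even_response_rate(revenue_per_response, cost_per_ad):
    """Lowest response rate at which sending the ad still pays for itself."""
    return cost_per_ad / revenue_per_response


# Response rate from the text; dollar amounts are assumed for illustration.
profit = expected_profit_per_ad(0.30, revenue_per_response=5.00, cost_per_ad=0.50)
threshold = break_even_response_rate(revenue_per_response=5.00, cost_per_ad=0.50)
print(f"expected profit per ad: ${profit:.2f}; break-even response rate: {threshold:.0%}")
```

Under these assumed figures the campaign is worthwhile even though most recipients ignore the ad, and the cost of each wrong prediction is only the cost of one discarded ad.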
On the other hand, if big data analytics are used as the basis for access to credit, housing, or other
similar benets, the potential eects on consumers from inaccuracies could be substantial.
For example,
suppose big data analytics predict that people who do not participate in social media are 30 percent more
likely to be identity thieves, leading a fraud detection tool to ag such people as “risky.” Suppose further
that a wireless company uses this tool and requires “risky” people to submit additional documentation before
they can obtain a cell phone contract. ese people may not be able to obtain the contract if they do not
have the required documentation. And they may never know why they were denied the ability to complete
166 Harford, supra note 137, at 133.
167 See, e.g., Big Data Tr. 51–52 (David Robinson), 83–84 (Mark MacCarthy), 102–06 (Stuart Pratt), 231–32 (Michael Spadea).
See also Software & Info. Indus. Assoc. Comment #00067, supra note 2, at 5–6 and attached report at 7; Tech. Pol’y Inst.
Comment #00010, supra note 8, at 5–6.
168 Some workshop participants and commenters note other challenges of using utility payments as a non-traditional indicator.
See, e.g., Big Data Tr. 51–53 (David Robinson). See also Robinson + Yu Comment #00080, supra note 53, at 10–11; Nat’l
Consumer L. Ctr. Comment #00018, supra note 1, at 13–14; Ctr. for Dig. Democracy & U.S. PIRG Educ. Fund Comment
#00003, supra note 8, at 17.
169 See, e.g., Frank Pasquale, The Dark Market for Personal Data, N.Y. Times (Oct. 16, 2014), http://www.nytimes.
com/2014/10/17/opinion/the-dark-market-for-personal-data.html?module=Search&mabReward=relbias%3Aw; Danielle
Keats Citron, Big Data Should Be Regulated By ‘Technological Due Process,’ N.Y. Times (Aug. 6, 2014), http://www.nytimes.
Cathy O’Neil, The Dark Matter of Big Data, Mathbabe (June 25, 2014), http://mathbabe.org/2014/06/25/the-dark-
matter-of-big-data/; boyd & Crawford, supra note 150, at 670–73; Ylan Q. Mui, Little Known Firms Tracking Data Used in
Credit Scores, Wash. Post (July 16, 2011), http://www.washingtonpost.com/business/economy/little-known-firms-tracking-
data-used-in-credit-scores/2011/05/24/gIQAXHcWII_story.html. For the reasons set forth in her separate statement,
Commissioner Ohlhausen believes that to assess properly any risks of harm from big data inaccuracies, such risks must be
evaluated in the context of the competitive process.
the transaction or be able to correct the information used to flag them as “risky” even if the underlying
information was inaccurate.
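A brief worked calculation shows how much weight a "30 percent more likely" finding can bear. The base rate of identity theft used below is a hypothetical assumption, as are the function and parameter names; only the 30 percent figure comes from the example above.

```python
def share_of_flagged_who_are_legitimate(base_rate, relative_increase):
    """Fraction of flagged people who are not identity thieves.

    base_rate: assumed identity-theft rate among social media participants.
    relative_increase: 0.30 means the flagged group is 30 percent more likely.
    """
    flagged_rate = base_rate * (1.0 + relative_increase)
    return 1.0 - flagged_rate


# Assume 1 in 1,000 social media participants is an identity thief.
legit = share_of_flagged_who_are_legitimate(base_rate=0.001, relative_increase=0.30)
print(f"{legit:.2%} of flagged applicants are not identity thieves")
```

Under this assumed base rate, the flagged group's rate rises only from 0.10 percent to 0.13 percent, so roughly 99.87 percent of the people asked for additional documentation would be legitimate applicants.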
In using big data to make decisions that affect consumers’ ability to complete transactions, companies
should consider the potential benefits and harms, especially where their policies could negatively affect
certain populations.
4. Does your reliance on big data raise ethical or fairness concerns?
Companies should consider performing their own assessment of the factors that go into an analytics
model and balancing the predictive value of the model with fairness considerations.
Indeed, overreliance
on the predictions of big data analytics could potentially result in a company not thinking critically
about the value, fairness, and other implications of their uses of big data.
For example, one company
determined that employees who live closer to their jobs stay at these jobs longer than those who live farther
away. However, another company decided to exclude this factor from its hiring algorithm because of
concerns about racial discrimination, particularly since different neighborhoods can have different racial
compositions.
Many companies are not only considering ethical concerns with using big data, but are actively using
big data to advance the interests of minorities and fight discrimination. For example, there are now
recruiting tools available that match companies in search of employees with candidates who hold the
necessary qualifications, but also ensure that those candidates are not limited to particular gender, racial,
and experiential backgrounds.
Individual companies are also changing their hiring techniques to promote diversity.
170 See D B R, supra note 7, at 53–54.
171 See, e.g., Big Data Tr. 238–40 (Jeanette Fitzgerald). See generally e Internet Assoc. Comment #00073, supra note 23;
Comment #00071 from Pam Dixon, World Privacy Forum, to Fed. Trade Comm’n (Oct. 31, 2014), https://www.ftc.gov/
system/les/documents/public_comments/2014/10/00071-92911.pdf; Computer & Commc’ns Indus. Assoc. Comment
#00070, supra note 141; Consumer Elecs. Assoc. Comment #00068, supra note 61; Intel Corp. Comment #00062, supra
note 61; Comment #00060 from Yael Weinman, Info. Tech. Indus. Council, to Fed. Trade Comm’n (Oct. 27, 2014), https://
www.ftc.gov/system/les/documents/public_comments/2014/10/00060-92877.pdf; Info. Accountability Found. Comment
#00049, supra note 2; Comment #00048 from Bojana Bellamy & Markus Heyder, Ctr. for Info. Pol’y Leadership, to Fed.
Trade Comm’n (Oct. 8, 2014), https://www.ftc.gov/system/les/documents/public_comments/2014/10/00048-92775.pdf;
Future of Privacy Forum Comment #00027, supra note 23.
172 See, e.g., Michael Schrage, Big Data’s Dangerous New Era of Discrimination, Harv. Bus. Rev. (Jan. 29, 2014), https://hbr.
org/2014/01/big-datas-dangerous-new-era-of-discrimination/. Cf. Alessandro Acquisti et al., Face Recognition and Privacy in
the Age of Augmented Reality, 6 J. of Privacy & Confidentiality 1–20 (2014), http://repository.cmu.edu/cgi/viewcontent.
cgi?article=1122&context=jpc (showing that big data analytics can now identify strangers online (on a dating site where
individuals protect their identities by using pseudonyms) and offline (in a public space), based on photos made publicly
available on a social network site, and then infer additional and sensitive information about those consumers with relative
ease).
173 See, e.g., Robinson + Yu Comment #00080, supra note 53, at 15. See also Joseph Walker, Meet the New Boss: Big Data, Wall
St. J. (Sept. 20, 2012), http://online.wsj.com/news/articles/SB10000872396390443890304578006252019616768.
174 See supra note 173.
175 See, e.g., Future of Privacy Forum Comment #00027, supra note 23, attached report entitled, Big Data: A Tool for
Fighting Discrimination and Empowering Groups, at 1.
Xerox now uses an online evaluation tool developed by a data analytics firm to assess applicants,
in addition to conducting interviews, to determine which applicants are most qualified for available
jobs. In developing this new assessment process, Xerox also learned that previous similar employment
experience—one of the few criteria that Xerox had explicitly prioritized in the past—turns out to have no
bearing on either productivity or retention.
In addition, state and local government entities are using big data to help underrepresented communities
obtain better municipal services. For example, states are using big data to identify the needs of lesbian, gay,
bisexual, and transgender individuals and to create more tailored approaches to reduce health disparities
impacting these individuals.
And big data was used to convince a city to redraw its boundaries to extend
city services to historically African-American neighborhoods.
As these examples show, organizations can
use big data in ways that provide opportunity to underrepresented and underserved communities.
176 See, e.g., Tim Smedley, Forget the CV, Data Decide Careers, Fin. Times (July 9, 2014), http://www.ft.com/cms/s/2/e3561cd0-
177 See, e.g., Peck, supra note 143.
178 Id.
179 See, e.g., Future of Privacy Forum Comment #00027, supra note 23, attached report entitled, Big Data: A Tool for
Fighting Discrimination and Empowering Groups, at 4; Computer & Commc’ns Indus. Assoc. Comment #00070,
supra note 141, at 6–7. See also Laura Nahmias, State Agencies Launch LGBT Data-Collection Effort, Politico N.Y. (July 24,
2014), http://www.capitalnewyork.com/article/albany/2014/07/8549536/state-agencies-launch-lgbt-data-collection-effort.
180 See, e.g., Future of Privacy Forum Comment #00027, supra note 23, attached report entitled, Big Data: A Tool for
Fighting Discrimination and Empowering Groups, at 3.
Summary of Research Considerations
In light of this research, companies already using or considering engaging in big data analytics should:
• Consider whether your data sets are missing information from particular populations and, if
they are, take appropriate steps to address this problem.
• Review your data sets and algorithms to ensure that hidden biases are not having an
unintended impact on certain populations.
• Remember that just because big data found a correlation, it does not necessarily mean
that the correlation is meaningful. As such, you should balance the risks of using those
results, especially where your policies could negatively affect certain populations. It may be
worthwhile to have human oversight of data and algorithms when big data tools are used to
make important decisions, such as those implicating health, credit, and employment.
• Consider whether fairness and ethical considerations advise against using big data in
certain circumstances. Consider further whether you can use big data in ways that advance
opportunities for previously underrepresented populations.
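One way a company might begin the kind of review described in these considerations is to compare how often a model selects members of different groups. The sketch below is illustrative only: the group labels, the decision data, and the 0.8 threshold are hypothetical assumptions (the threshold is a common rule of thumb, not something this report prescribes), and a low ratio is a prompt for human review rather than a legal conclusion.

```python
def selection_rates(outcomes):
    """Map each group to its share of positive decisions.

    outcomes: dict of group name -> non-empty list of booleans (True = selected).
    """
    return {group: sum(decisions) / len(decisions) for group, decisions in outcomes.items()}


def impact_ratio(outcomes):
    """Lowest group selection rate divided by the highest."""
    rates = selection_rates(outcomes)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so there is no disparity to measure
    return min(rates.values()) / highest


# Hypothetical decisions from a hypothetical screening model.
decisions = {
    "group_a": [True] * 60 + [False] * 40,  # 60 percent selected
    "group_b": [True] * 30 + [False] * 70,  # 30 percent selected
}
ratio = impact_ratio(decisions)
needs_review = ratio < 0.8  # assumed rule-of-thumb threshold
print(f"impact ratio: {ratio:.2f}; flag for human review: {needs_review}")
```

A check like this does not explain why the rates differ; it only identifies where a closer look at the data set and the algorithm is warranted.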
V. Conclusion
Big data will continue to grow in importance, and it is undoubtedly improving the lives of underserved
communities in areas such as education, health, local and state services, and employment. Our collective
challenge is to make sure that big data analytics continue to provide benefits and opportunities to consumers
while adhering to core consumer protection values and principles. For its part, the Commission will
continue to monitor areas where big data practices could violate existing laws, including the FTC Act, the
FCRA, and ECOA, and will bring enforcement actions where appropriate. In addition, the Commission
will continue to examine and raise awareness about big data practices that could have a detrimental impact
on low-income and underserved populations and promote the use of big data that has a positive impact on
such populations. Given that big data analytics can have big consequences, it is imperative that we work
together—government, academics, consumer advocates, and industry—to help ensure that we maximize big
data’s capacity for good while identifying and minimizing the risks it presents.
Separate Statement of Commissioner Maureen K. Ohlhausen
Big Data: A Tool for Inclusion or Exclusion?
January 6, 2016
I support today’s report on big data as a useful contribution to the ongoing policy discussion about the
effect of big data analysis on low-income, disadvantaged, and vulnerable consumers. One part of the report
summarizes the concerns of several privacy advocates and academics over the potential inaccuracies of big
data analytics. I write separately to emphasize the importance of evaluating these opinions in the context of
market and competitive forces that affect all companies using big data analytics.
The report details the use of big data as it affects low-income, disadvantaged, or vulnerable consumers.
Importantly, the report describes some of the many ways companies are already using big data to benefit
such consumers, and others. The report also recognizes big data’s massive potential benefits. In addition,
the report sketches the legal landscape implicated by big data and offers questions that companies may find
useful as they apply big data techniques to solve their business challenges.
The report also describes certain concerns about big data tools raised by some consumer advocates and
researchers. Specifically, some fear that big data analysis will produce inaccurate or incomplete results, and
that actions based on such flawed analysis will harm low-income, disadvantaged, or vulnerable consumers.
For example, some worry that companies may use inaccurate big data analysis to deny opportunities to
otherwise eligible low-income or disadvantaged consumers, or to fail to advertise high-quality lending
products to eligible low-income customers.
Concerns about the eects of inaccurate data are certainly legitimate, but policymakers must evaluate
such concerns in the larger context of the market and economic forces companies face. Businesses have
strong incentives to seek accurate information about consumers, whatever the tool. Indeed, businesses use
big data specically to increase accuracy. Our competition expertise tells us that if one company draws
incorrect conclusions and misses opportunities, competitors with better analysis will strive to ll the gap.
1 F. T C’, B D: A T  I  E U  I 8–11, 25–27
(2016). e report also references other concerns that big data analysis will be too accurate: companies will understand their
consumers too well and misuse that data to the consumers detriment. Market forces also constrain many such potential
harms, but other such harms could actually undermine market forces. For example, the report describes concerns that
unscrupulous businesses will use big data techniques to develop “sucker lists” of consumers particularly vulnerable to scams
and misleading oers. e report does a good job laying out the existing legal framework that applies to such harmful uses.
2 Id. at 9–11.
3 A real world example of the competitive advantages of novel but accurate application of data analytics was famously
chronicled in the book (and movie) Moneyball. See Michael Lewis, Moneyball: The Art of Winning an Unfair Game
(2004). Oakland’s strategy succeeded precisely because it “liberated” baseball players from “unthinking prejudice rooted in
baseball’s traditions . . . allowing them to demonstrate their true worth.” Id. at iiv. Each baseball franchise continually faces
Therefore, to the extent that companies today misunderstand members of low-income, disadvantaged,
or vulnerable populations, big data analytics combined with a competitive market may well resolve these
misunderstandings rather than perpetuate them.
In particular, a company’s failure to communicate
premium offers to eligible consumers presents a prime business opportunity for a competitor with a better
algorithm.
To understand the benets and risks of tools like big data analytics, we must also consider the powerful
forces of economics and free-market competition. If we give undue credence to hypothetical harms, we risk
distracting ourselves from genuine harms and discouraging the development of the very tools that promise
new benets to low income, disadvantaged, and vulnerable individuals.
Today’s report enriches the conversation about big data. My hope is that future participants in this
conversation will test hypothetical harms with economic reasoning and empirical evidence.
marketplace pressures to improve player quality predictions. Similarly, companies using big data analytics face competitive
forces that punish inaccuracy and reward accuracy.
4 Indeed, there is strong theoretical and empirical economic evidence that low income and other disadvantaged households
stand to gain more than the wealthy from many applications of big data analytics. See James C. Cooper, Separation,
Pooling, and Predictive Privacy Harms from Big Data: Confusing Benefits for Costs 38–49 (2015), http://
ssrn.com/abstract=2655794 (describing theoretical and empirical studies on the effects of big data in credit markets, price
discrimination, and labor markets for low income individuals). One simple example: lenders do not need big data analytics
to identify creditworthy high-income persons, as nearly all have credit files and most are lower-risk. However, lower-income
groups contain both high- and low-risk borrowers. Big data analysis can help bring credit to the lower-risk low income
borrowers with thin or no credit files. See id. at 38–39.
5 Transcript of Big Data: A Tool for Inclusion or Exclusion?, in Washington, D.C. (Sept. 15, 2014), at 231–32 (Daniel
Castro and Michael Spadea in conversation), https://www.ftc.gov/system/files/documents/public_events/313371/bigdata-
transcript-9_15_14.pdf (highlighting the business opportunities in improved accuracy of credit scoring for low-income
individuals). Indeed, our workshop on lead generation showed that lenders and other businesses are highly motivated to
reach potential customers and spend a lot of money and effort to do so. See generally Follow the Lead: An FTC Workshop on
Lead Generation, Fed. Trade Comm’n (Oct. 30, 2015), https://www.ftc.gov/news-events/events-calendar/2015/10/follow-
6 For example, Cooper describes a useful framework to help identify under which conditions the presumption should be for or
against big data uses. See Cooper, supra note 4, at 33–38.