A Security Analysis of the Facebook Ad Library
Abstract:
Actors engaged in election disinformation are using online advertising platforms to spread political messages. In response to this threat, online advertising networks have started making political advertising on their platforms more transparent in order to enable third parties to detect malicious advertisers. We present a set of methodologies and perform a security analysis of Facebook's U.S. Ad Library, which is their political advertising transparency product. Unfortunately, we find that there are several weaknesses that enable a malicious advertiser to avoid accurate disclosure of their political ads. We also propose a clustering-based method to detect advertisers engaged in undeclared coordinated activity. Our clustering method identified 16 clusters of likely inauthentic communities that spent a total of over four million dollars on political advertising. This supports the idea that transparency could be a promising tool for combating disinformation. Finally, based on our findings, we make recommendations for improving the security of advertising transparency on Facebook and other platforms.
Introduction
Online advertising plays an increasingly important role in political elections and has thus attracted the attention of attackers focused on undermining free and fair elections. This includes both foreign electoral interventions, such as those launched by Russia during the 2016 U.S. elections, and continued deceptive online political advertising by domestic groups. In contrast to traditional print and broadcast media, online U.S. political advertising lacks specific federal regulation for disclosure.
Absent federal online political ad regulation, platforms have enacted their own policies, primarily focused on fact checking and political ad disclosure. The former is concerned with labelling ads as truthful or misleading, and the latter refers to disclosing alongside political ads who is financially and legally responsible for them. However, significant challenges remain in understanding political ad activity on platforms due to personalization (ads tailored to potentially small audiences) and scale (both in the number of advertisers and the number of unique ads). One common feature of the platforms’ voluntary approaches to mitigating these issues has been to deploy public political ad transparency systems that enable external auditing by independent third parties. These companies promote their transparency products as a method for securing elections. Yet to date, it is unclear whether this intervention can be effective.
Because these systems are so new, we currently lack a framework for third parties to audit the transparency efforts of online advertising networks. Anecdotal reports have discussed issues with the implementation and security of Facebook’s transparency efforts. However, absent a third-party auditor, it is unclear how severe or systematic these problems are.
In this paper, we focus on a security analysis of Facebook’s Ad Library for ads about social issues, elections or politics. The key questions we investigate are: Does the Facebook Ad Library provide sufficient transparency to be useful for detecting illicit behaviour? To what extent is it possible for adversarial advertisers to evade that transparency? What prevents the Ad Library from being more effective?
We propose a set of methodologies and conduct a security audit of Facebook’s Ad Library with regards to inclusion and disclosure. In addition, we propose a clustering method for identifying advertisers that are engaged in undeclared coordinated advertising activities, some of which are likely disinformation campaigns.
During our study period from May 7th, 2018 to June 1st, 2019, we encountered a variety of technical issues, which we brought to Facebook’s attention. More recently, Facebook’s Ad Library had a partial outage, resulting in 40% of ads in the Ad Library being inaccessible. Facebook did not publicly report this outage; researchers had to discover it themselves. We have also found that contrary to their promise of keeping political ads accessible for seven years, Facebook has retroactively removed access to certain ads in the archive.
We also found persistent issues with advertisers failing to disclose political ads. Our analysis shows that 68,879 pages (54.6% of pages with political ads included in the Ad Library) never provide a disclosure string. Overall, 357,099 ads were run without disclosure strings, and advertisers spent at least $37 million on such ads. We also found that even advertisers who did disclose their ads sometimes provided disclosure strings that did not conform to Facebook’s requirements.
Facebook has created a policy against misrepresentation that prohibits "Mislead[ing] people about the origin of content" and has periodically removed ‘Coordinated Inauthentic Behavior’ from its platform. Google and Twitter have also increased their efforts to remove inauthentic content from their platforms. We applaud these policies and the improvements in their enforcement by the platforms. However, our clustering method, and manual analysis of the resulting clusters, still find numerous likely inauthentic groups buying similar ads in a coordinated way. Specifically, we found 16 clusters of likely inauthentic communities that spent $3,867,613 on a total of 19,526 ads. The average lifespan of these clusters was 210 days, demonstrating that Facebook is not effectively enforcing their policy against misrepresentation.
We will make publicly available all of our analysis code, and we will also make our ad data available to organisations approved to access Facebook’s Ad Library API.
In summary, our main contributions are as follows:
We present an algorithm for discovering advertisers engaging in potentially undeclared coordinated activity. We then use our method to find advertisers likely violating Facebook’s policies. This demonstrates that transparency can potentially improve security.
We show that Facebook’s Ad Library, as currently implemented, has both design and implementation flaws that degrade transparency.
We suggest improvements to the security of political advertising transparency on Facebook and other platforms.
Background
A key feature of advertising on social media platforms is fine grained targeting based on users’ demographic and behavioural characteristics, allowing advertisers to create custom-tailored messaging for narrow audiences. As a result, different users see different ads, and it is challenging for outsiders to expose unethical or illegal advertising activity.
In an effort to provide more transparency in the political advertising space, several social media platforms have created public archives of ads that are deemed political. Different platforms have taken different approaches about which ads they include in their archive, and how much metadata they make available. In the remainder of this paper, we focus on Facebook’s approach, as it is the largest both in size and scope.
We also restrict our analysis to the U.S. market.
A. Facebook
Ads on Facebook resemble posts in the sense that in addition to the text, image, or video, they always contain the name and picture of a Facebook page as their "author." In practice, advertisers do not necessarily create their own pages to run ads. Instead, they may hire social media influencers to run ads on their behalf; these ads appear as if "authored" by the influencer’s page. In the remainder of this paper, we refer to the entity that pays for the ad as the advertiser, and the Facebook page running the ad as the ad’s sponsor. If an ad’s advertiser and sponsor are different, the advertiser does not interact with Facebook; the sponsor creates the ad in the system and is responsible for complying with Facebook’s policies.
1) Scope
Facebook has relatively broad criteria for making ads transparent, including not only ads about political candidates at any level of public office, but also ads about social issues. In detail, Facebook includes any ad that "(1) Is made by, on behalf of, or about a current or former candidate for public office, a political party, a political action committee, or advocates for the outcome of an election to public office; (2) Is about any election, referendum, or ballot initiative, including ‘get out the vote’ or election information campaigns; (3) Is about social issues in any place where the ad is being run; (4) Is regulated as political advertising". Relevant social issues include Abortion, Budget, Civil Rights, Crime, Economy, Education, Energy, Environment, Foreign Policy, Government Reform, Guns, Health, Immigration, Infrastructure, Military, Poverty, Social Security, Taxes, Terrorism, and Values.
2) Policies & Enforcement
In the political space, Facebook aims to provide some transparency by requiring ad sponsors to declare each individual ad as political, and to disclose the identity of the advertiser who paid for it. Many details of Facebook’s policies changed over the course of our research, often without public announcement, and sometimes retroactively. For instance, Facebook retroactively introduced a grace period before enforcing the requirement that political ads be declared, and retroactively exempted ads run by news outlets. Here, we give a broad overview of the policies in effect at the time the ads in our dataset were created.
Before ad sponsors can declare that an ad is political, they must undergo a vetting process, which includes identity verification. As part of this process, they also create "disclaimers," which we call disclosure strings. During the time period covered by our dataset, disclosure strings were free-form text fields with the requirement that they "accurately represent the name of the entity or person responsible for the ad," and "not include URLs or acronyms, unless they make up the complete official name of the organisation". Once the vetting process has completed, for each new ad that they create, ad sponsors can (and must) declare whether it is political by selecting a checkbox. As a consequence of declaring an ad as political, the ad will be archived in Facebook’s public Ad Library for seven years. Furthermore, the disclosure string will be displayed with the ad when it is shown to users on Facebook or Instagram.
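The disclosure string rules above lend themselves to simple heuristic checks. The following sketch flags likely violations of the two stated requirements (no URLs; no acronyms unless the acronym is the complete official name). This is our own illustrative heuristic, not Facebook's actual validation logic, and the regular expressions are example patterns rather than an exhaustive rule set.

```python
import re

# Example patterns for URL fragments and all-caps acronyms; these are
# illustrative assumptions, not Facebook's real validation rules.
URL_RE = re.compile(r"(https?://|www\.|\.(com|org|net|us)\b)", re.IGNORECASE)
ACRONYM_RE = re.compile(r"\b[A-Z]{2,}\b")

def disclosure_violations(disclosure: str) -> list:
    """Return a list of likely policy violations for a disclosure string."""
    violations = []
    if URL_RE.search(disclosure):
        violations.append("contains a URL")
    # Acronyms are only permitted when they make up the complete
    # official name (e.g. "ACLU"); flag acronyms embedded in a longer
    # string such as "The ABC Committee".
    acronyms = ACRONYM_RE.findall(disclosure)
    if acronyms and disclosure.strip() not in acronyms:
        violations.append("contains acronym(s): %s" % ", ".join(acronyms))
    return violations
```

In practice such a checker only surfaces candidates for manual review, since a string like "Paid for by Example Group" can pass both checks while still misidentifying the true funder.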
Facebook primarily relies on the cooperation of ad sponsors to comply proactively with this policy. Only vetted accounts can declare an ad as political, and even then, ad sponsors must "opt in" each individual ad. According to our understanding, Facebook uses a machine learning approach to detect political ads that their sponsors failed to declare. Undeclared ads detected prior to the start of the campaign are terminated, and not included in the Ad Library. Once ads are active, users can report them as not complying with disclosure requirements. Furthermore, Facebook appears to conduct additional, potentially manual, ad vetting depending on the ad’s reach, i.e., for ads with high impression counts. Undeclared political ads that are caught after they have already been shown to users are terminated, and added to the Ad Library with an empty disclosure string. According to private conversations with Facebook, enforcement was done at an individual ad level. As a result, there appeared to be little to no consequences for similar undisclosed ads, or for repeat offenders.
3) Implementation
Facebook operates a general Ad Library, which contains all ads that are currently active on Facebook and Instagram. At the time of writing, the website is freely accessible and contains ad media such as the text, image, or video. However, access through automated processes such as web crawlers is disallowed. For political ads only, the library also includes historical data. The website launched in May 2018, and notes that political ads are to be archived for seven years.
The political ads in the library are accessible through an API. For each ad, the API contains a unique ID, impression counts, and the dollar amount spent on the ad, as well as the dates when the ad campaign started and ended. Facebook releases ad impression and spend data in imprecise ranges, such as a spend of less than $100, or 1,000 – 5,000 impressions. At the time of our study, some data available through the web portal were not accessible through the API. Specifically, ad images and videos were not programmatically accessible.
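Because spend and impressions are only released as ranges, any aggregate statistic over ads is itself a range. One conservative way to handle this, sketched below, is to carry lower-bound, midpoint, and upper-bound totals separately. The `lower_bound`/`upper_bound` field names follow the Ad Library API's convention; the sample values are made up for illustration.

```python
def range_estimate(bounds: dict) -> tuple:
    """Turn an API range dict into (lower, midpoint, upper) estimates."""
    lo = int(bounds["lower_bound"])
    # Open-ended top ranges may omit an upper bound; fall back to the
    # lower bound so the estimate stays conservative.
    hi = int(bounds.get("upper_bound", lo))
    return lo, (lo + hi) / 2, hi

# Two hypothetical ads with ranged spend data, mimicking the API shape.
ads = [
    {"spend": {"lower_bound": "0", "upper_bound": "99"}},
    {"spend": {"lower_bound": "1000", "upper_bound": "4999"}},
]
spend_floor = sum(range_estimate(a["spend"])[0] for a in ads)   # 1000
spend_mid = sum(range_estimate(a["spend"])[1] for a in ads)     # 3049.0
```

Reporting the floor alongside the midpoint makes statements like "advertisers spent at least $37 million" directly auditable from the ranged data.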
In addition to the ad library, Facebook also publishes a daily Ad Library Report containing all pages that sponsored political ads, as well as the disclosure strings used, and the exact dollar amount spent (if above $100). During our study period, 126k pages sponsored at least one political ad.
Related Work
A. Online Ad Transparency
Prior work has proposed methods for independently collecting and analysing data about online ad networks. Guha et al. proposed a set of statistical methodologies for improving online advertising transparency. Barford et al. deployed Adscape, a crawler-based method of collecting and analysing online display ads independent of the ad network. Lécuyer et al. proposed a statistical method for inferring customization of websites, including targeted ads. The Sunlight system was able to infer some segment and demographic targeting information of online ads using statistical methods. All of this prior work was limited by the small amount of data these systems could collect, and the inherent noise of attempting to infer information from likely biased data.
More recently, Facebook has deployed an ad targeting transparency feature, which provides a partial explanation to users why they are seeing a certain ad. Andreou et al. investigated the limitations and usefulness of this explanation. In a separate work, Andreou et al. built a browser plugin that collected crowdsourced ad and targeting information, and performed an analysis of the advertisers using Facebook’s ad network. This prior work focuses on understanding transparency around ad targeting.
Closest to our work is a pair of studies analysing political advertisers using data from Facebook’s Ad Library and ProPublica’s browser plugin. Ghosh et al. demonstrated that larger political advertisers frequently use lists of Personally Identifiable Information (PII) for targeting. Edelson et al. mentioned the existence of problematic political for-profit media and corporate astroturfing advertisers. However, our study is, to the best of our knowledge, the first to propose an auditing framework for online ad transparency portals, and to conduct a security analysis of Facebook’s Ad Library.
B. Disinformation/Information Warfare
There is a growing body of prior work reviewing recent Russian attempts to interfere in the democratic elections of other countries via information attacks. Farrell and Schneier examine disinformation as a common-knowledge attack against western-style democracies. Caufield et al. review recent attacks in the United States and United Kingdom, as well as potential interventions, through the lens of usable security. Starbird et al. present case studies of disinformation campaigns on Twitter and detail many of the key features that such campaigns share. One insight is that disinformation attacks often involve the creation of inauthentic communities. This insight is a key part of the design of our algorithm for detecting likely undisclosed coordinated advertising.
C. Clustering Based Abuse Detection Methods
There is a wealth of prior work exploring how to detect spam and other abuse using content analysis and clustering methods. Many studies have proposed text similarity and clustering methods to detect email, Online Social Networking (OSN), SMS, and website spam, as well as other types of abusive activity. Our method of detecting undisclosed coordinated activity between political advertisers is largely based on this prior work. In the space of political advertising, Kim et al. manually annotated ads with topics and advertisers for the purpose of grouping and analysis. In contrast, our clustering method is automated except for manual validation of parameter thresholds.
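To make the general idea behind such content-clustering methods concrete, the following is a minimal sketch: pages whose ads contain near-duplicate text (Jaccard similarity over word shingles above a threshold) are linked, and connected components of the resulting graph form candidate coordinated clusters. The shingle size and threshold here are arbitrary example values, not the parameters used in this paper.

```python
def shingles(text: str, k: int = 3) -> set:
    """Break ad text into overlapping k-word shingles."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_pages(page_ads: dict, threshold: float = 0.5) -> list:
    """page_ads maps page name -> ad text; returns clusters of 2+ pages."""
    pages = list(page_ads)
    sigs = {p: shingles(page_ads[p]) for p in pages}
    parent = {p: p for p in pages}          # union-find forest

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path compression
            p = parent[p]
        return p

    # Link every pair of pages whose ad text is near-duplicate.
    for i, p in enumerate(pages):
        for q in pages[i + 1:]:
            if jaccard(sigs[p], sigs[q]) >= threshold:
                parent[find(p)] = find(q)

    clusters = {}
    for p in pages:
        clusters.setdefault(find(p), []).append(p)
    return [c for c in clusters.values() if len(c) > 1]
```

A production version would compare every ad (not one text per page) and use locality-sensitive hashing to avoid the quadratic pairwise comparison, but the clustering logic is the same.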
Methodology Framework
The goal of this paper is twofold. First, we aim to provide a framework of methodologies for auditing the tools introduced by social media platforms to improve transparency around advertising of political and societal topics. From a security point of view, issues of interest are how the platform’s implementation of transparency affects ad sponsors’ compliance with transparency policies, how the platform handles noncompliance, and whether the available data is rich enough to detect advertising behaviour that likely violates the platform’s policies. Based on the transparency tools currently available, this concretely involves retrieving the complete archive of ads deemed political, verifying the consistency of the archive, auditing the disclosures of who paid for ads, and detecting undesirable advertising behaviour in the archive, especially with respect to potential violations of platform policies. In addition to proposing this methodology framework, as the second goal of this paper, we apply this methodology to conduct a security analysis of Facebook’s Ad Library. We chose Facebook because to date it is the largest archive, both in scale and scope.
Limitations: Ideally, efforts to audit transparency tools should also assess the completeness of the ad archive, i.e., how many (undetected) political ads on the platform are incorrectly missing from the archive. For platforms that ban political advertising, an important question is whether the ban is enforced effectively. Another key issue is whether disclosures are accurate, i.e., whether they identify the true source of funding. Unfortunately, answering these important questions is difficult or impossible with the data made available by the social media platforms at the time of our study. As we have to operate within the constraints of the available data, we can only provide limited insight into these aspects at this time. We leave a more comprehensive study of archive completeness and disclosure accuracy for future work. Similarly, we focus our current efforts on metadata analysis, and plan to investigate ad contents, such as topics, messaging, and customization, in more detail in future work.
A. Data Collection
As a prerequisite for all subsequent analysis, we need to retrieve all ad data available in the transparency archive. In the case of Facebook’s Ad Library, at the time of our study, API access to ads was only provided through keyword search, or by using the identifier of the sponsoring page. Therefore, we proceed in two steps. The first step consists of collecting a comprehensive list of Facebook pages running political ads. We obtained this list from Facebook’s Ad Library Report. We download this report once a week, selecting a seven-day time range. Subsequently, we use Facebook’s Ad Library API to retrieve all (new) ads from that week’s batch of pages. We also execute occasional backfills to compensate for failures.
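The two-step collection loop can be sketched as follows. The `fetch` callable stands in for the real HTTP request to the Ad Library API (which returns a JSON page of ads plus a paging cursor); injecting it keeps the pagination logic testable offline. The cursor field names mirror the API's `paging.cursors.after` convention, and the loop structure is our illustration rather than our exact production crawler.

```python
def collect_ads(page_ids, fetch):
    """Retrieve all ads for each sponsoring page, following paging cursors."""
    ads = []
    for page_id in page_ids:
        cursor = None
        while True:
            batch = fetch(page_id, cursor)  # one page of API results
            ads.extend(batch["data"])
            # Stop when the API no longer returns an "after" cursor.
            cursor = batch.get("paging", {}).get("cursors", {}).get("after")
            if not cursor:
                break
    return ads
```

A real deployment would add retry logic and persist the last cursor per page, which is what makes the occasional backfills mentioned above cheap to run.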
Even though Facebook’s Ad Library went into effect on May 7th, 2018, actual enforcement began at a later date, on May 24th, 2018. For the purpose of our analysis, we use a study period running from May 24th, 2018, when enforcement began, to June 1st, 2019. Our dataset contains 3,685,558 ads created during the study period. Ad data collected via the API is right-censored, in the sense that ads created during our study period can still undergo changes after the end of the study period. For example, an undisclosed ad might be detected with a delay and added to the Ad Library after our last observation, meaning that it would be incorrectly excluded from our analysis. To avoid this issue, when performing time-based analysis, we do not report data for the last month of our study period. As a result, for each ad included in our analysis, we capture all possible changes that occurred within a delay of up to one month after ad creation.
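The right-censoring guard amounts to a date filter on ad creation. The sketch below uses the study-period dates stated above and approximates "one month" as 31 days, which is our assumption for illustration rather than the paper's exact cutoff.

```python
from datetime import date, timedelta

# Study period boundaries from the text; the 31-day margin is an
# assumed approximation of the one-month censoring window.
ENFORCEMENT_START = date(2018, 5, 24)
LAST_OBSERVATION = date(2019, 6, 1)
CUTOFF = LAST_OBSERVATION - timedelta(days=31)

def in_analysis_window(created: date) -> bool:
    """True if an ad created on this date is safe for time-based analysis,
    i.e., it was observable for at least a month after creation."""
    return ENFORCEMENT_START <= created <= CUTOFF
```

Every ad passing this filter had at least a month in which a delayed disclosure detection could still be observed before data collection ended.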