On November 4, 2022, LinkedIn announced a “significant win” for the platform and its members against “personal data scraping.” The win resulted from a six-year legal battle that asked, in part, whether LinkedIn must allow hiQ Labs to scrape data from the public profiles of LinkedIn members.
Last Friday, the U.S. District Court for the Northern District of California answered that question by ruling that LinkedIn’s User Agreement “unambiguously prohibits hiQ’s scraping and unauthorized use of the scraped data.” And as such, hiQ breached LinkedIn’s User Agreement “through its own scraping of LinkedIn’s site and using scraped data.”[1]
An Overview of Data Scraping
Data scraping is a technique by which a computer program extracts data from another program or source. The technique typically uses scraper bots, which send a request to a specific website and, when the site responds, parse the response and extract specific data in accordance with their creators’ wishes.
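As a rough illustration of the parse-and-extract step described above, the following Python sketch pulls prices out of a small piece of HTML using only the standard library. Everything in it is hypothetical: the sample markup, the `price` class name, and the `PriceExtractor` and `extract_prices` names are assumptions made for illustration, and the request step is replaced by a hard-coded string so that no real site is contacted. It is not a description of hiQ’s software.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collects the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        # Keep only the text found inside a price element
        if self._in_price:
            self.prices.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


# Hypothetical stand-in for the body of a site's response
SAMPLE_RESPONSE = """
<ul>
  <li>Widget <span class="price">$19.99</span></li>
  <li>Gadget <span class="price">$5.00</span></li>
</ul>
"""


def extract_prices(html):
    """Parse the HTML and return the extracted price strings."""
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return parser.prices


print(extract_prices(SAMPLE_RESPONSE))  # ['$19.99', '$5.00']
```

A real bot would obtain the HTML by sending a request to the target site before parsing it, which is the point at which a site’s terms of use become relevant.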
Scraper bots can be built for a multitude of purposes, including:
- Content scraping – pulling content from a site to replicate it elsewhere.
- Price scraping – extracting prices from a competitor.
- Contact scraping – compiling email, phone number, and other contact information.
In today’s economy, data is key, and data scraping is an efficient means of acquiring huge amounts of specific data. Yet, this court ruling signals that companies may need to be more cautious about how and where they use data scraping bots.
hiQ’s Data Scraping Violates LinkedIn’s User Agreement
Founded in 2012 as a “people analytics” company, hiQ Labs provides information to businesses about their workforces. To do this, hiQ relied extensively on automated software to scrape data from LinkedIn’s public profiles. hiQ then aggregated, analyzed, and summarized that data to create two products, “Keeper” and “Skill Mapper,” which allowed businesses to improve their employee engagement and reduce costs associated with external talent acquisition.
However, in 2017, LinkedIn sent a cease-and-desist letter threatening legal action against hiQ, arguing that LinkedIn’s User Agreement prohibits data scraping. Specifically, the User Agreement states:
You agree that you will not:
- Scrape or copy profiles and information of others through any means (including crawlers, browser plugins and add-ons, and any other technology or manual work);
. . .
- Use manual or automated software, devices, scripts[,] robots, other means or processes to access, ‘scrape,’ ‘crawl’ or ‘spider’ the Services or any related data or information;
- Use bots or other automated methods to access the Services, add or download contacts, send or redirect messages.
Court records indicate that hiQ had known about this prohibition since 2015 yet continued scraping data from LinkedIn’s public profiles and even “attempted to reverse engineer LinkedIn’s systems . . . to avoid detection by simulating human site-access behaviors.” Based on these facts, LinkedIn sought a partial summary judgment finding hiQ liable for breach of contract.
From hiQ Labs’ perspective, while the above User Agreement language may appear clear, language elsewhere in the User Agreement seemed to provide users and members with a right to scrape data from public profiles. Specifically, the User Agreement provides the following when delineating members’ rights and obligations:
2. Obligations
. . .
When you share information, others can see, copy and use that information.
. . .
3.1 Your License to LinkedIn
. . .
c. We will get your consent if we want to give others the right to publish your posts beyond the Service. However, other Members and/or Visitors may access and share your content and information, consistent with your settings and degree of connection with them.
hiQ argued that the User Agreement’s statements that “Visitors may access and share your content and information consistent with your settings” and that “[w]hen you share information, others can see, copy and use that information” are inconsistent with the prohibition of scraping data. As a user and member of LinkedIn who had agreed to the User Agreement, hiQ read this inconsistency to mean that it had the right to scrape data from public profiles.
Unfortunately for hiQ, this argument failed. The court concluded that informing users that their data may be copied and used does not contradict LinkedIn’s prohibition against scraping, crawling, or spidering. “The two concepts are not mutually exclusive – a warning to members that a third party may collect their public-facing data is not a blessing for third parties to do so through expressly prohibited means.”
Thus, hiQ breached LinkedIn’s User Agreement, which “clear[ly]” prohibits data scraping, by scraping LinkedIn’s site and using that scraped data.
LinkedIn May Lose Despite This Victory
It is important to note that, although LinkedIn considered this a victory, the court only granted partial summary judgment in favor of LinkedIn on its breach of contract claim.
hiQ raised numerous defenses to LinkedIn’s breach of contract claim, including waiver and estoppel, arguing that LinkedIn knew about hiQ’s data scraping as early as 2014 yet failed to act until the cease-and-desist letter in 2017. In short, hiQ argued that because LinkedIn knew about the scraping but delayed taking legal steps to prevent it, LinkedIn either waived its right to bring the breach of contract claim or should be estopped from bringing it because hiQ reasonably relied on LinkedIn’s acquiescence to the data scraping.
The court concluded that there is at least a genuine dispute of material fact as to whether LinkedIn knew about hiQ’s data scraping as early as 2014, which – if sufficiently proven – could provide grounds for hiQ to raise the defenses of waiver and estoppel.
These arguments remain unresolved, and it is not clear at this time whether hiQ and LinkedIn will continue battling in court – especially given that hiQ has gone dormant since 2019 – but we will continue monitoring for further developments.
Further Privacy Concerns
Lastly, this case brings to mind broader legal issues regarding publicly available personal information.
Under the California Consumer Privacy Act of 2018 (CCPA), as amended by the California Privacy Rights Act of 2020 (CPRA), businesses must satisfy numerous obligations when processing personal information. However, the definition of “personal information” does not include “information made available by a person to whom the consumer has disclosed the information if the consumer has not restricted the information to a specific audience.”
Similarly, under the EU’s General Data Protection Regulation (GDPR), the law’s prohibition against the processing of special data categories (e.g., race, ethnicity, religion, and health) does not apply if the “processing relates to personal data which are manifestly made public by the data subject.”
These exceptions are reminiscent of hiQ’s argument in this case: that LinkedIn’s User Agreement expressly said that “[v]isitors [of LinkedIn] may access and share your content and information consistent with your settings.” Meaning, the users themselves provided their information to LinkedIn and purposefully, via their settings choices, made their information available to the public.
Putting aside that LinkedIn’s User Agreement prohibited data scraping, hiQ’s argument raises the question: was hiQ scraping publicly available personal information, as it is understood under the GDPR and CCPA / CPRA? And if so, does that mean that hiQ would not have to comply with some requirements imposed by applicable general data protection laws?
The answer will likely depend on a fact-specific inquiry into the circumstances surrounding the user content, such as (i) which data protection law applies to the data subjects in question; (ii) whether privacy settings were readily apparent to users when they initially posted their profiles/content; and (iii) whether users took affirmative actions to publicly post their information.
In the meantime, businesses should remain aware that scraping personal information, even publicly available information, requires proper planning and due diligence.
Key Takeaways
- Data scraping remains a prevalent data collection practice, but individuals and companies may be liable for breach of contract if their scraping violates a User Agreement.
- On the other hand, if a business wants to quash a company’s known data scraping practices that violate the User Agreement, waiting too long to take legal steps may result in the business forfeiting a breach of contract claim.
- Either way, this ruling indicates that companies must take User Agreements seriously, both their own (if they want to prevent data scraping) and those belonging to others (if they want to scrape data).
- Lastly, a question remains as to whether the data in this case was made publicly available, as the term is understood under US and EU data protection laws.
[1] Note: The court also concluded that hiQ separately breached LinkedIn’s User Agreement by hiring independent contractors to create fake LinkedIn accounts to conduct “quality assurance” while logged into LinkedIn by “viewing and confirming hiQ customers’ employees’ identities manually.” LinkedIn’s User Agreement expressly prohibits creating false identities.