Is Web Scraping Legal? What You Need to Know Before You Scrape

March 12, 2026

15 min read


A comprehensive guide to the legal landscape of web scraping. Covers the Computer Fraud and Abuse Act, GDPR, robots.txt, Terms of Service enforcement, the landmark hiQ v. LinkedIn ruling, and practical guidelines for staying on the right side of the law when extracting data from websites.
Autonoly Team

AI Automation Experts

Why the Legality of Web Scraping Matters

Web scraping is one of the most powerful data collection techniques available, but it operates in a legal gray area that confuses businesses, developers, and researchers alike. The question "is web scraping legal?" does not have a simple yes-or-no answer — it depends on what data you scrape, how you scrape it, where you are located, and how you use the data afterward.

Understanding the legal landscape is not just an academic exercise. Companies have faced lawsuits, injunctions, and significant legal costs over web scraping activities. At the same time, some of the most valuable businesses in the world — search engines, price comparison sites, market research firms — are built on web scraping. The difference between a legally defensible scraping operation and a legally risky one often comes down to specific technical and procedural choices that are easy to get right once you understand the rules.

This guide covers the major legal frameworks that apply to web scraping, the landmark court cases that have shaped current law, and the practical guidelines that help you scrape responsibly. This is educational content, not legal advice — consult an attorney for guidance on your specific situation.

The Computer Fraud and Abuse Act (CFAA)

The Computer Fraud and Abuse Act is the primary US federal law that gets invoked in web scraping disputes. Originally passed in 1986 to criminalize computer hacking, the CFAA prohibits accessing a computer "without authorization" or "exceeding authorized access." The key legal question for scraping is whether visiting a public website and extracting data constitutes "unauthorized access."

The "Without Authorization" Debate

For decades, there was genuine legal uncertainty about whether scraping a public website could violate the CFAA. Website operators argued that their Terms of Service defined the scope of "authorized access," and that scraping in violation of those terms constituted unauthorized access under the CFAA. This interpretation, if accepted broadly, would have made virtually all web scraping a potential federal crime — since most major websites prohibit automated access in their Terms of Service.

Van Buren v. United States (Supreme Court)

The legal landscape shifted dramatically with the Supreme Court's decision in Van Buren v. United States. While this case involved a police officer accessing a license plate database for personal reasons (not web scraping), the Court's ruling narrowed the CFAA's scope significantly. The Court held that "exceeding authorized access" means accessing information that a person is not entitled to obtain at all — not accessing information in a manner that violates a policy or agreement. This interpretation strongly suggests that scraping publicly available data — information that anyone can access by visiting a website — does not violate the CFAA, even if the website's Terms of Service prohibit scraping.

Practical Implications

After Van Buren, the CFAA is much less likely to be a successful legal weapon against scrapers who access only publicly available data. However, the ruling does not protect all scraping activities:

  • Logging in to access data: If you create an account, agree to Terms of Service, and then scrape data that is only available behind authentication, you may be "exceeding authorized access" under the CFAA. The data behind a login wall is not publicly available — you accessed it through credentials, and the ToS may define the scope of that access.
  • Circumventing technical barriers: If a website implements IP blocking, CAPTCHAs, or other technical measures to prevent scraping, and you circumvent those measures, the legal analysis becomes more complex. While the CFAA's applicability is debatable, other laws (like the DMCA) may apply to the act of circumvention.
  • Government and protected systems: Scraping government databases, financial systems, or healthcare portals carries significantly higher legal risk regardless of whether the data appears publicly accessible.

hiQ Labs v. LinkedIn: The Landmark Scraping Case

The hiQ Labs v. LinkedIn case is the most important legal precedent specifically addressing web scraping, and it is worth understanding in detail because its reasoning directly applies to most commercial scraping activities.

Background

hiQ Labs was a data analytics company that scraped publicly available LinkedIn profile data to build workforce analytics products — tools that helped employers predict employee turnover and identify skills gaps. hiQ had been scraping LinkedIn profiles for years when LinkedIn sent a cease-and-desist letter demanding hiQ stop scraping, asserting that continued scraping would violate the CFAA. LinkedIn also implemented technical measures to block hiQ's access.

hiQ's Response

Rather than comply, hiQ sued LinkedIn, seeking an injunction that would prevent LinkedIn from blocking its access to public profiles. hiQ argued that LinkedIn's publicly available profile data was not protected by the CFAA because anyone could view it without logging in.

The Ninth Circuit's Ruling

The Ninth Circuit Court of Appeals ruled in hiQ's favor twice: in its initial decision, and again on remand after the Supreme Court asked it to reconsider the case in light of Van Buren. The key holdings were:

  • Public data is not protected by the CFAA. The court found that when a website makes data available to the general public without requiring authentication, accessing that data does not constitute "unauthorized access" under the CFAA. A website cannot use the CFAA to create a private right of action simply by including a prohibition in its Terms of Service.
  • LinkedIn cannot unilaterally block access to public data. The court granted hiQ a preliminary injunction requiring LinkedIn to remove technical barriers that blocked hiQ's access. This was a remarkable outcome — a court ordering a website to allow scraping.
  • hiQ had a legitimate business interest. The court considered the balance of harms and found that blocking hiQ's scraping would destroy its business, while allowing scraping caused LinkedIn minimal harm since the data was already public.

What hiQ v. LinkedIn Means for Scrapers

This ruling provides the strongest legal foundation for web scraping of public data in the United States. However, it has important limitations:

  • It is a Ninth Circuit decision — binding in California and western states but only persuasive authority elsewhere.
  • It applies to publicly available data that does not require authentication.
  • It does not address all potential legal theories — LinkedIn could still pursue claims under state unfair competition laws, contract law, or copyright.
  • The case settled before a final trial, so there is no definitive jury verdict on the underlying claims.

Despite these limitations, hiQ v. LinkedIn established a practical precedent that most US courts have followed: scraping publicly available data is generally permissible under the CFAA.

Terms of Service and robots.txt: Do They Carry Legal Weight?

Two of the most frequently cited references in scraping legality discussions are website Terms of Service (ToS) and robots.txt files. Understanding what legal weight each actually carries helps you make informed decisions about your scraping practices.

Terms of Service

Almost every major website includes language in its Terms of Service prohibiting automated access, scraping, crawling, or data extraction. These prohibitions are standard boilerplate — you will find them on Amazon, Google, Facebook, LinkedIn, TikTok, and virtually every other platform. The legal question is whether violating these terms creates actionable legal liability.

Contract law theory: Website operators argue that visiting their site creates a binding contract (a "browsewrap" agreement) and that scraping in violation of the ToS constitutes breach of contract. Courts have been skeptical of this theory for browsewrap agreements — terms that are only accessible through a small link at the bottom of the page, which most users never see or read. For this theory to work, the website must show that the scraper had actual or constructive knowledge of the terms and took some affirmative action to accept them.

Clickwrap agreements are different. If you create an account and explicitly agree to Terms of Service by clicking an "I Agree" button, that creates a much stronger contractual obligation. Scraping after agreeing to ToS that prohibit scraping is a clearer breach of contract than scraping a site you have never logged into.

Practical reality: Very few ToS violation cases result in significant legal consequences for scrapers. Website operators typically enforce their ToS through technical measures (blocking IPs, deploying CAPTCHAs) rather than lawsuits. Litigation is expensive, and the damages from scraping public data are often difficult to quantify. That said, respect for ToS signals good faith and reduces your overall legal risk.

robots.txt

The robots.txt file is a text file at the root of a website (e.g., example.com/robots.txt) that tells web crawlers which parts of the site they should not access. It is part of the Robots Exclusion Protocol, a voluntary standard created in 1994.

robots.txt is not legally binding. It is a convention, not a law. No court has ruled that violating robots.txt directives constitutes illegal activity on its own. The robots.txt file is a request, not a command — it says "please do not crawl this" rather than "you are prohibited from crawling this."

However, robots.txt is legally relevant. Courts consider robots.txt compliance as evidence of good faith or bad faith. A scraper who respects robots.txt directives demonstrates responsible behavior, while a scraper who explicitly ignores robots.txt restrictions may have that held against them in a legal dispute. Think of robots.txt as a factor that influences legal outcomes, not as a legal barrier itself.

Practical Recommendation

Check robots.txt before scraping any site. Respect the directives when practical. If you need to scrape a section that robots.txt restricts, understand that you are accepting additional risk — not necessarily illegal risk, but reputational and evidentiary risk that could matter if a dispute arises. For more on responsible scraping practices, see our web scraping best practices guide.
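Checking robots.txt can be done programmatically with Python's standard-library `urllib.robotparser`, which parses the file and answers per-URL questions. A minimal sketch (the robots.txt body, user-agent string, and URLs below are hypothetical examples; in practice you would fetch the real file from the site's root):

```python
from urllib import robotparser

# Hypothetical robots.txt body; in practice, load it with
# rp.set_url("https://example.com/robots.txt") followed by rp.read().
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Ask before each request whether the path is allowed for your user agent.
print(rp.can_fetch("my-scraper", "https://example.com/products"))      # True
print(rp.can_fetch("my-scraper", "https://example.com/private/page"))  # False
print(rp.crawl_delay("my-scraper"))                                    # 5
```

Honoring the reported crawl delay between requests is an easy way to turn the "good faith" factor discussed above into a concrete, auditable behavior of your scraper.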

GDPR, CCPA, and International Privacy Regulations

While the CFAA and ToS focus on whether you are allowed to access and collect data, privacy regulations focus on what kind of data you collect and how you handle it afterward. Privacy law is where web scraping legality gets significantly more complex, especially when personal data is involved.

GDPR (General Data Protection Regulation)

The EU's GDPR is the most comprehensive privacy regulation affecting web scraping. Key GDPR principles that apply to scraping:

Lawful basis for processing: GDPR requires a lawful basis for collecting and processing personal data. For scrapers, the most commonly cited basis is "legitimate interest" — the argument that your business has a legitimate reason for processing the data that is not overridden by the data subject's rights. Market research, competitive analysis, and academic research can qualify as legitimate interests, but you must conduct a balancing test that weighs your interest against the individual's privacy expectations.

Personal data scope: GDPR defines personal data broadly — any information that can identify a natural person, directly or indirectly. Names, email addresses, photos, usernames, and even IP addresses are personal data under GDPR. If your scraping collects any of these data points about EU residents, GDPR applies regardless of where your company is located.

Data minimization: GDPR requires that you collect only the personal data necessary for your stated purpose. If you are scraping product prices, you do not need to collect seller names. If you are analyzing market trends, you may not need individual reviewer identities. Collect the minimum personal data required for your research objective.

Right to erasure: Data subjects have the right to request deletion of their personal data. If you scrape and store personal data, you need a process for handling erasure requests. This is particularly relevant for scraped datasets that include names, usernames, or profile information.

CCPA (California Consumer Privacy Act)

California's CCPA gives consumers rights over their personal information, including the right to know what data a business collects, the right to delete that data, and the right to opt out of data sales. CCPA applies to businesses that collect personal information of California residents and meet certain revenue or data volume thresholds. If your scraping operation collects personal data about California residents and you meet the thresholds, CCPA compliance is required.

Other International Regulations

Many countries have enacted their own data protection laws that affect web scraping:

  • Brazil's LGPD mirrors GDPR in many respects and applies to processing of personal data of Brazilian residents.
  • Canada's PIPEDA requires consent for collection of personal information, with limited exceptions for publicly available data.
  • Australia's Privacy Act regulates collection and handling of personal information, including data scraped from websites.
  • Japan's APPI requires proper handling of personal information and restricts cross-border data transfers.

Practical Privacy Compliance for Scrapers

The safest approach to privacy compliance when scraping:

  1. Avoid personal data when possible. If your research objective can be met with aggregated or anonymized data, do not collect personal identifiers.
  2. Document your legitimate interest. Write a brief assessment explaining why your data collection is justified and how you balance it against privacy interests.
  3. Implement data retention limits. Do not store personal data indefinitely. Set retention periods aligned with your research purpose and delete data when it is no longer needed.
  4. Secure the data. Scraped personal data must be stored securely, with access controls and encryption appropriate to the sensitivity of the data.
  5. Be prepared for subject requests. Have a process for responding to access and deletion requests from individuals whose data you have scraped.

Industry-Specific Legal Considerations

Certain industries and data types carry additional legal considerations beyond the general frameworks discussed above.

Financial Data

Scraping financial data (stock prices, trading volumes, financial statements) from sources like Yahoo Finance, Bloomberg, or SEC filings intersects with securities regulations. SEC filings are public records and freely scrapeable. However, real-time stock price data is often licensed from exchanges, and redistributing it may violate exchange data licensing agreements. Delayed price data and historical data carry lower risk.

Healthcare Data

If your scraping captures any information that could be considered Protected Health Information (PHI) under HIPAA, you face significant regulatory obligations. Scraping public health directories, physician review sites, or clinical trial databases requires careful consideration of whether any data points constitute PHI. Generally, publicly available provider directory information (names, addresses, specialties) is not PHI, but patient-related data is.

Real Estate Data

Real estate data scraping (from Zillow, Realtor.com, MLS listings) is common but involves specific legal considerations. MLS data is typically licensed and restricted — scraping MLS listings may violate licensing agreements and NAR rules. Public property records from government databases are generally freely scrapeable, as they are public records. For detailed guidance, see our article on scraping Zillow data.

Social Media Platforms

Social media scraping (Facebook, Instagram, TikTok, Twitter/X) is one of the most legally contested areas. Meta (Facebook/Instagram) has been particularly aggressive in pursuing legal action against scrapers, including obtaining criminal convictions in some jurisdictions. The legal landscape varies by platform and by the specific data being scraped. Public posts and profiles are generally lower risk, while private messages, friend lists, and advertising data are high risk.

Government and Public Records

Government data published on public websites is generally the safest category to scrape. Freedom of Information principles support public access to government data, and many government datasets are explicitly published for public use. However, some government systems have specific Terms of Use, and overwhelming government servers with scraping traffic can draw unwanted attention. The US government's data.gov portal and similar open data initiatives provide structured datasets that are explicitly intended for download and reuse.

Practical Guidelines: How to Scrape Legally and Responsibly

Based on the current legal landscape, here are actionable guidelines for conducting web scraping that minimizes legal risk while maximizing the value of your data collection.

The Green Zone: Generally Safe Practices

  • Scrape publicly available data without logging in. Data visible to anonymous visitors has the strongest legal protection under hiQ v. LinkedIn and Van Buren.
  • Extract factual data points. Prices, ratings, dates, specifications, and other facts are not copyrightable and carry minimal legal risk.
  • Use data for internal analysis and research. Transformative use of scraped data for your own business intelligence is well-supported legally.
  • Respect robots.txt and rate limits. Good-faith compliance with voluntary standards strengthens your legal position.
  • Scrape government public records. Public records are explicitly intended for public access.

The Yellow Zone: Proceed with Caution

  • Scraping data that includes personal information. Legal under some frameworks but requires GDPR/CCPA compliance if personal data of regulated residents is involved.
  • Ignoring ToS prohibitions. Not illegal on its own (for public data), but creates contractual risk and may be used as evidence of bad faith.
  • High-volume scraping. Large-scale scraping that impacts site performance could be framed as a tortious interference or trespass to chattels claim.
  • Scraping content for AI training. An evolving legal area with active litigation — see our guide on web scraping for AI training and RAG.

The Red Zone: High Legal Risk

  • Scraping behind authentication. Accessing data behind a login wall and scraping it likely constitutes exceeding authorized access under the CFAA.
  • Republishing scraped creative content. Copying articles, images, or descriptions and publishing them is copyright infringement.
  • Scraping and selling personal data. Collecting personal information through scraping and selling it to third parties creates significant privacy law exposure.
  • Overwhelming servers. Scraping at volumes that degrade website performance can support trespass to chattels and intentional interference claims.

Building a Legally Defensible Scraping Operation

If web scraping is core to your business, invest in building a legally defensible operation:

  1. Document your practices. Maintain written policies covering what you scrape, why, how you store data, and how long you retain it.
  2. Implement technical safeguards. Use rate limiting, respect robots.txt, and avoid scraping behind authentication. Tools like Autonoly's browser automation include built-in rate limiting and responsible scraping defaults.
  3. Conduct regular legal reviews. The law in this area is evolving rapidly. Review your practices with legal counsel periodically, especially when expanding into new data sources or jurisdictions.
  4. Separate data collection from data use. Keep your scraping infrastructure and data storage architecturally separate from your analytics and reporting systems. This makes it easier to comply with deletion requests and audit your data practices.
  5. Have a takedown process. If a website operator contacts you and requests that you stop scraping, have a process for evaluating and responding to that request promptly. Even if you believe you have a legal right to scrape, engaging constructively with takedown requests demonstrates good faith.
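The rate limiting in point 2 can be as simple as a client-side minimum interval between consecutive requests. A minimal sketch (the interval value is an assumption; tune it per site, and prefer the site's robots.txt crawl delay when one is published):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between calls, e.g. HTTP requests."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        """Sleep just long enough to keep requests min_interval apart."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

# Demo with a short interval; call limiter.wait() before each request.
limiter = RateLimiter(min_interval=0.1)
start = time.monotonic()
for _ in range(3):
    limiter.wait()  # would precede each HTTP request here
elapsed = time.monotonic() - start
print(f"3 waits took {elapsed:.2f}s")
```

Beyond politeness, a documented, enforced request interval is concrete evidence against the "overwhelming servers" theories (trespass to chattels, tortious interference) discussed in the red zone above.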

Frequently Asked Questions

Is web scraping illegal?

Web scraping of publicly available data is generally not illegal in the United States. The hiQ v. LinkedIn ruling established that scraping public data does not violate the Computer Fraud and Abuse Act, and the Supreme Court's Van Buren decision narrowed the CFAA's scope to exclude claims based merely on the manner of access. However, scraping behind authentication, republishing copyrighted content, and collecting personal data without privacy compliance can create legal liability.
