Tips before scraping data with Facebook Scraper.

2020-09-11 16:57:49

Facebook has billions of active users, which makes it one of the richest sources of data about individuals. Scraping Facebook means extracting data about people’s interests, behavior, and demographics. The scraped data is usually stored in a document or spreadsheet for later use.


But before scraping Facebook data, there are some things you need to know.


When planning to scrape a website, always check its robots.txt file first. Robots.txt is a file that websites use to tell “bots” whether and how the site may be crawled and indexed. You can view it by adding “/robots.txt” to the end of the site’s root address, for example https://example.com/robots.txt.
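A minimal Python sketch of that lookup, using only the standard library. `robots_url` is a hypothetical helper name. Because robots.txt always sits at the root of a host, the helper discards the path, query, and fragment of whatever page link you start from instead of appending to it literally. It assumes the link includes a scheme such as `https://`.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url.

    Assumes page_url includes a scheme (e.g. "https://"); without one,
    urlsplit cannot tell where the host name starts.
    """
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host, so keep only scheme and host
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.facebook.com/some/page?x=1"))
# https://www.facebook.com/robots.txt
```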


Enter Facebook’s robots.txt address in your browser, and let’s look at its rules.


The file’s rules state that Facebook prohibits all automated scrapers: no part of the website may be visited by an automated crawler.
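To see how a crawler interprets rules like these, here is a sketch using Python’s built-in `urllib.robotparser`. The rules below are an illustrative excerpt written for this example (`ExampleBot` and `MyScraper` are made-up names), not a copy of Facebook’s live file. The catch-all `User-agent: *` with `Disallow: /` is the pattern that blocks every crawler not named elsewhere in the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only (an assumption, not Facebook's real file):
# one hypothetical bot is blocked from /ajax/, every other bot from everything.
SAMPLE_ROBOTS = """\
User-agent: ExampleBot
Disallow: /ajax/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS.splitlines())

checks = [
    ("MyScraper", "https://www.facebook.com/somepage"),   # falls under "*": False
    ("ExampleBot", "https://www.facebook.com/somepage"),  # no matching rule: True
    ("ExampleBot", "https://www.facebook.com/ajax/data"), # matches /ajax/: False
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```

To check a live site instead, `RobotFileParser(url)` followed by `read()` downloads and parses the file; that step needs network access.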


Why do we need to respect robots.txt?


Websites use the robots file to specify rules for how visitors and bots should interact with them. When a website blocks all crawlers, the best thing to do is leave that site alone. Following the robots file helps you avoid unethical data gathering and reduces the risk of legal consequences.


Facebook warns at the very beginning of their robots file: “Crawling Facebook is prohibited unless you have express written permission.”

Follow the link on the second line and you will find Facebook’s Automated Data Collection Terms, last revised on April 15th, 2010.


GDPR only applies to personal data.

If you aren’t scraping personal data, then GDPR does not apply.

In short, under GDPR it is now illegal to scrape an EU resident’s personal data unless you have that person’s explicit consent.


That said, you may still be able to scrape the Facebook data you need, provided you have Facebook’s permission and comply with GDPR.

Disclaimer: This article was contributed by one of our users. Please let us know if it infringes any rights, and we will remove it immediately.

