【Flowchart Mode】How to configure the scraping task

2018-10-15 19:24:17

Abstract: This tutorial shows you how to configure a scraping task.

In the Flowchart Mode of ScrapeStorm, a task is configured while editing its scraping rules, before the task starts. Click the “Settings” button in the lower right corner to open the task settings window.

The settings fall into two groups, running settings and anti-blocking settings, as shown in the following figure:

1. Running Settings

(1) Encountering existing data

Scrape again: scrape all the data, regardless of whether it has been scraped before. “Scrape again” is selected by default.
Skip and continue: when data that has already been scraped is encountered, skip it and continue with the new data.
Stop scraping: when data that has already been scraped is encountered, stop scraping and end the scraping task.
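
The three options above can be pictured as a small decision rule applied to each record. The following Python sketch is only an illustration of that rule, not ScrapeStorm's implementation; the names `OnExisting` and `collect` are hypothetical, and records are simplified to plain strings.

```python
from enum import Enum
from typing import Iterable, List, Set


class OnExisting(Enum):
    """Hypothetical names for the three options described above."""
    SCRAPE_AGAIN = "scrape again"
    SKIP_AND_CONTINUE = "skip and continue"
    STOP_SCRAPING = "stop scraping"


def collect(records: Iterable[str], seen: Set[str], policy: OnExisting) -> List[str]:
    """Return the records kept under `policy`, given the already-scraped set `seen`."""
    kept: List[str] = []
    for record in records:
        if record in seen:
            if policy is OnExisting.SKIP_AND_CONTINUE:
                continue  # ignore this record, keep going with the next one
            if policy is OnExisting.STOP_SCRAPING:
                break  # end the task at the first already-scraped record
            # SCRAPE_AGAIN falls through and keeps the record anyway
        kept.append(record)
        seen.add(record)
    return kept
```

For example, with the records `a`, `b`, `c` and `b` already scraped, “Scrape again” keeps all three, “Skip and continue” keeps `a` and `c`, and “Stop scraping” keeps only `a`.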

(2) Request waiting time

Some web pages are slow to open, which can affect the extraction results. Users can set a waiting time so that such pages have time to load, which can improve the quality of the extraction. The default waiting time is 1 second, and users can change it as needed.

(3) Block Images

In general, blocking images can improve the scraping speed. However, if the web page being scraped requires a verification code, this function should not be used: with images blocked, the verification code cannot be displayed and the data cannot be scraped.

(4) Block Ads
Blocking ads can improve the scraping speed. However, because ads are identified by an intelligent algorithm, content that is not an advertisement may also be blocked. Users should use this function with caution.

2. Anti-blocking Settings
Some websites take blocking measures that prevent data from being scraped properly. In this case, the anti-blocking functions can be enabled to improve the scraping results.

(1) Switch browser regularly
Switching the browser version at a regular interval can help avoid being blocked. The switching cycle can be freely selected.

(2) Clear cookies regularly
Clearing the web page cookies at a regular interval can help avoid being blocked. The clearing cycle can be freely selected.
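
Both anti-blocking settings amount to running an action on a fixed cycle. The Python sketch below illustrates that idea under stated assumptions; it is not how ScrapeStorm works internally. `Session` and `Every` are hypothetical names, and the browser versions are placeholder labels rather than real browser identifiers.

```python
import itertools
import time
from typing import Callable, Dict, List


class Session:
    """Hypothetical stand-in for a scraping session (not a ScrapeStorm API)."""

    def __init__(self, browser_versions: List[str]) -> None:
        self._versions = itertools.cycle(browser_versions)
        self.browser = next(self._versions)
        self.cookies: Dict[str, str] = {}

    def switch_browser(self) -> None:
        # Move to the next browser version, wrapping around at the end.
        self.browser = next(self._versions)

    def clear_cookies(self) -> None:
        self.cookies.clear()


class Every:
    """Runs `action` once at least `interval` seconds have passed since its last run."""

    def __init__(self, interval: float, action: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval
        self._action = action
        self._clock = clock
        self._last = clock()

    def tick(self) -> bool:
        """Call between requests; returns True if the action ran."""
        now = self._clock()
        if now - self._last < self._interval:
            return False
        self._action()
        self._last = now
        return True
```

A scraping loop would create one `Every` per setting, for example `Every(30, session.switch_browser)` and `Every(60, session.clear_cookies)`, and call `tick()` on each between requests. The `clock` parameter exists only so the cycle can be tested without waiting in real time.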