
【Smart Mode】How to enter the URL correctly

2018-10-15 19:33:32

Abstract: This tutorial explains how to enter URLs correctly in Smart Mode.

In Smart Mode, the URL you enter when creating a new task largely determines the final extraction result. This article explains how to enter the URL correctly so that you extract the data you want.

1. Where to enter the URL:

(1) On the scraper home page: only one URL can be entered.

(2) In the URL edit window that opens when you create a new Smart Mode task:

Smart Mode supports entering multiple URLs or importing URLs from a local file (currently only TXT files are supported; other file formats are under development).
When entering or importing URLs in this window, make sure they meet the following requirements:

ⅰ. All web pages belong to the same website;
ⅱ. Multiple URLs are separated by line breaks (press Enter after each one), so that there is only one URL per line;
ⅲ. All web pages belong to the same type, for example, list-type page or details page.
P.S. Pages from different websites, or different types of pages from the same website, should each be set up as separate tasks.
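
The first two requirements above can be checked on a TXT file before it is imported. The following is a minimal Python sketch, not part of ScrapeStorm: the function name check_url_list is hypothetical, and "same website" is approximated by comparing host names, so www.example.com and example.com would be reported as different sites. Whether all pages are of the same type (requirement ⅲ) cannot be judged from the URL alone and still has to be checked by hand.

```python
from urllib.parse import urlsplit


def check_url_list(text):
    """Return (urls, problems) for text that should hold one URL per line.

    Hypothetical helper for illustration; it only approximates the
    requirements listed above and is not a ScrapeStorm API.
    """
    urls, problems = [], []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if len(line.split()) > 1:
            problems.append(f"line {number}: more than one URL on the line")
            continue
        try:
            parts = urlsplit(line)
        except ValueError:
            problems.append(f"line {number}: cannot be parsed as a URL")
            continue
        if parts.scheme not in ("http", "https") or not parts.hostname:
            problems.append(f"line {number}: not a full http(s) URL")
            continue
        urls.append(line)

    # Assumption: "same website" means the same host name.
    hosts = {urlsplit(url).hostname for url in urls}
    if len(hosts) > 1:
        problems.append(
            "URLs point to more than one website: " + ", ".join(sorted(hosts))
        )
    return urls, problems


if __name__ == "__main__":
    sample = (
        "https://www.example.com/list?page=1\n"
        "https://www.example.com/list?page=2\n"
    )
    urls, problems = check_url_list(sample)
    print(len(urls), "URLs,", len(problems), "problems")
```

If the list of problems is empty, the file contains one well-formed URL per line and all URLs share a host name.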

(3) In the interface of an open Smart Mode task:

Here you can edit the URLs. If the task has more than 200 URLs, please edit the local file directly instead.
P.S. If the URLs were imported from a local file, changes made here will not affect the local file.

After editing, the URLs should still meet the requirements listed in section (2) above.

2. Which URL to enter:

In Smart Mode, ScrapeStorm can turn pages automatically, but it cannot perform operations such as entering text or running a search (if you need these, please use Flowchart Mode).

The URL you enter should therefore be the page you reach after the search has already been performed, that is, the page showing the content you ultimately want to scrape (or the first of the consecutive pages to be scraped).

For example, single-URL extraction: search for “The Kite Runner” on Goodreads, open the corresponding page, and copy its URL.

For example, single-URL extraction: search for “mac pro” on Amazon to get the search result list page, then copy the URL of the first page.

For example, multi-URL extraction: search for “restaurant” and “bar” on Yelp, and copy the URL of each result page separately.
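
For multi-URL extraction with many keywords, the list of result-page URLs can also be prepared as a TXT file with one URL per line. The sketch below is only an illustration under stated assumptions: the site https://www.example.com/search and its query parameter q are hypothetical, and the helper names are made up. Real sites use their own URL patterns, so copying the URL from the browser after searching, as described above, remains the reliable way to find the correct form.

```python
from urllib.parse import urlencode


def build_search_urls(base_url, param, keywords):
    """Build one search-result URL per keyword (hypothetical URL pattern)."""
    return [f"{base_url}?{urlencode({param: keyword})}" for keyword in keywords]


def write_url_file(path, urls):
    """Write the URLs to a TXT file, one URL per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # Assumed site and parameter name, for illustration only.
    urls = build_search_urls(
        "https://www.example.com/search", "q", ["restaurant", "bar"]
    )
    write_url_file("urls.txt", urls)
    print("\n".join(urls))
```

All URLs produced this way belong to the same website and the same page type (search result lists), which matches the requirements in section 1 (2).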