【Smart Mode】【Flowchart Mode】How to Scrape Links of Detail Pages

2022-04-25 19:59:07

Abstract: This article explains how to scrape the links of detail pages.

When scraping data, you often need the links of the detail pages as well. This article explains three ways to scrape them with ScrapeStorm's Smart Mode; the same steps apply in Flowchart Mode.

First way: Automatic detection

Smart Mode detects the list automatically. In most cases, when the list is detected, the links of the detail pages are detected along with it.

Note: If the automatic detection is inaccurate, you can also select the list manually.

For more details, you can refer to the tutorial:

How to scrape a list page

Second way: Through “Scrape In”

Sometimes list detection cannot find the link to the detail page. In that case, you can use “Scrape In” to enter the detail page and scrape its link from there.

1) After the list is detected, add a field and select the data that carries the link to the detail page. The software generates the field automatically.

Note: The data that carries the link is usually the article title, the product name, or something similar. You can confirm this by clicking it in the browser.

2) Right-click the generated field, set “Extract Type”, and select “Link URL”.

3) Click “Scrape In” to enter the detail page.

For more details, you can refer to the tutorial:

How to Scrape In

4) On the detail page, add any field, then right-click the generated field, set “Special Value”, and select “Page URL”.
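
Steps 2 and 4 depend on the difference between a link's visible text (for example, the article title) and the URL it points to. As an illustration outside ScrapeStorm, here is a minimal Python sketch that uses only the standard library. It reads a hypothetical list page, records each link's text and href, and resolves relative hrefs against the list page's URL. The HTML, the base URL https://example.com/news/ and the LinkCollector class are assumptions made up for this example; the sketch does not describe how ScrapeStorm works internally.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []    # collected (text, absolute URL) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Relative hrefs such as "item/101" are resolved against the list page URL.
            self.links.append((text, urljoin(self.base_url, self._href)))
            self._href = None


# Hypothetical list page: each title links to its detail page.
LIST_HTML = """
<ul>
  <li><a href="item/101">First article</a></li>
  <li><a href="/news/item/102">Second article</a></li>
  <li><a href="https://example.com/news/item/103">Third article</a></li>
</ul>
"""

collector = LinkCollector("https://example.com/news/")  # hypothetical base URL
collector.feed(LIST_HTML)
collector.close()
for text, url in collector.links:
    print(text, "->", url)
```

Each list item yields its title (what a plain text field would capture) and its absolute link (what “Link URL” is meant to capture). The resolved URL normally matches what “Page URL” records in step 4, but a site that redirects can make the two differ, and this sketch sends no requests.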

Third way: Get the link by splicing

If neither of the methods above can scrape the link of the detail page, but the detail page's ID can be extracted with XPath or a regular expression, you can use “Modify Data” to splice the ID into the full link of the detail page.

Note: If you are not familiar with XPath or regular expressions, please contact us for a customized solution.

Right-click the field that holds the ID, set “Modify Data”, and create a new “Add Prefix” operation, entering the part of the detail-page URL that comes before the ID.

This way, we can get the link of the detail page.
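
The third way comes down to two steps: extract the ID, then put a fixed prefix in front of it. Below is a minimal Python sketch of that splicing under stated assumptions. The ID is hypothetically stored in a data-id attribute, and detail pages are hypothetically located at https://example.com/product/ followed by the ID. Both the regular expression and the prefix would have to be adapted to the real site.

```python
import re

# Hypothetical prefix: the fixed part of every detail-page URL.
PREFIX = "https://example.com/product/"

# Hypothetical pattern: the ID sits in a data-id attribute of each list item.
ID_PATTERN = re.compile(r'data-id="(\d+)"')


def detail_links(list_html, prefix=PREFIX):
    """Extract every ID from the list HTML and splice it onto the prefix."""
    return [prefix + item_id for item_id in ID_PATTERN.findall(list_html)]


sample = (
    '<div class="item" data-id="4821">Phone</div>'
    '<div class="item" data-id="4822">Tablet</div>'
)
print(detail_links(sample))
```

With this sample input, detail_links returns ['https://example.com/product/4821', 'https://example.com/product/4822']. Keeping the prefix separate from the pattern mirrors the split in ScrapeStorm between extracting the ID and adding the prefix with “Modify Data”.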
