ScrapeStorm Tutorial: How to scrape details of hot questions from Stack Exchange

2018-07-27 16:54:50

Abstract: This tutorial explains in detail how to extract data from question detail pages via ScrapeStorm's Smart Mode.

Stack Exchange is a network of Q&A sites, each of which covers questions in a different area. These sites use a reputation system in which users vote on questions and answers, and those votes affect the reputation of the users who posted them. The reputation system lets these sites moderate themselves.

The following is a detailed description of extracting data from this Q&A website.


Step 1. Creating a task.

Open ScrapeStorm, select “Smart Mode”, and click “Start”.

Enter a listing URL, such as https://stackexchange.com/, then click “Create”.

Step 2. Scraping into the question detail pages.

Select the title link column and click “Scrape Into”.

On the detail page, click the “Add Field” button and then select the element on the web page whose text you want to extract.

Rename the fields.

Select “Modify Data” from the drop-down box, and click “Extract Number”.
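To get a feel for what a step like “Extract Number” does to a field, here is a minimal Python sketch of the same idea: pulling the first number out of a piece of text. It is only an illustration; the function name extract_number and the sample strings are made up for this example, and ScrapeStorm's own implementation may behave differently.

```python
import re


def extract_number(text):
    """Return the first number found in text as an int or float, or None.

    Hypothetical helper for illustration only; not part of ScrapeStorm.
    """
    # Match an optional minus sign, digits with optional thousands
    # separators, and an optional decimal part.
    match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    raw = match.group(0).replace(",", "")
    return float(raw) if "." in raw else int(raw)


# Made-up sample field values, similar to text found on a question page.
for sample in ["Viewed 1,234 times", "12 answers", "no digits here"]:
    print(sample, "->", extract_number(sample))
```

A field that contains no digits at all comes back as None in this sketch, so such rows are easy to spot afterwards.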

Step 3. Starting to extract.

Click “Start”. In the pop-up box, check “Block Ads” to keep ads from being extracted, and change the request time to 5s. ScrapeStorm then starts extracting the data.

Click “Export” to download your data.

After the extraction is completed, you can export the data to a local file (Excel, HTML, CSV, etc.) or to a database.

P.S. The data of the list page and the detail page will be merged during the extraction.
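The merge mentioned above can be pictured as joining each list-page row with the detail-page row that belongs to the same question. The Python sketch below shows one way to do such a join; it assumes the rows share a link column, and the names merge_rows, url, title and votes are hypothetical. How ScrapeStorm matches rows internally is not described in this tutorial.

```python
def merge_rows(list_rows, detail_rows, key="url"):
    """Combine list-page rows with detail-page rows that share the same key.

    Illustrative sketch only; the column names are assumptions.
    """
    # Index the detail rows by the shared key for quick lookup.
    details_by_key = {row[key]: row for row in detail_rows}
    merged = []
    for row in list_rows:
        combined = dict(row)  # copy so the input rows are not modified
        # Add the detail fields if a matching detail row exists.
        combined.update(details_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


# Made-up sample rows with placeholder URLs.
list_rows = [
    {"url": "https://example.com/q/1", "title": "First question"},
    {"url": "https://example.com/q/2", "title": "Second question"},
]
detail_rows = [
    {"url": "https://example.com/q/1", "votes": 5},
]
for row in merge_rows(list_rows, detail_rows):
    print(row)
```

In this sketch a list row without a matching detail row is kept as it is, so no question is dropped from the result.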

The following image is a screenshot of the file exported to Excel 2007:

If you are still unsure about the process, please watch the tutorial video below: