【Smart mode】Basic operational procedures
Abstract: This tutorial demonstrates the basic operating procedure of Smart Mode.
ScrapeStorm is a new generation of web scraping software, developed by former members of the Google search technology team and based on artificial intelligence technology. The software can not only extract data automatically, but also clean the data during extraction, filtering content such as numbers and e-mail addresses at the data source. In Smart Mode, the user enters the correct URL and runs the task according to rules configured automatically by the system, or configured manually, to extract the required data. This tutorial demonstrates the basic operation flow of Smart Mode.
(1) Enter the correct URL
Copy the URL you want to scrape from your browser, then open ScrapeStorm's Smart Mode and paste the URL to create a new scraping task.
Click here to learn more about how to enter the correct URL.
(2) Select page type and set the pagination
After entering the URL to extract, we select the page type, choose what to extract, and set the pagination. Page types fall into two categories: single pages (detail pages) and list pages. Smart Mode can extract the contents of a single page (detail page), a list page, or a list page combined with detail pages. Once the page type is determined, we can set the pagination. Smart Mode identifies the page type correctly about 99% of the time, so users can usually start scraping directly; if the system misidentifies the page, the page type can be set manually.
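ScrapeStorm performs this page-type detection internally. Purely for illustration (this is not ScrapeStorm's API, and the heuristic is a simplified assumption), the idea can be sketched in Python: a list page typically contains many elements sharing the same tag/class signature, one per list item, while a detail page does not.

```python
from html.parser import HTMLParser
from collections import Counter

class ItemCounter(HTMLParser):
    """Count occurrences of each (tag, class) pair in a page."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        self.counts[(tag, cls)] += 1

def guess_page_type(html, threshold=5):
    """Heuristic: a list page has many elements with the same
    tag/class signature (one per item); a detail page does not."""
    parser = ItemCounter()
    parser.feed(html)
    repeated = max(parser.counts.values(), default=0)
    return "list" if repeated >= threshold else "detail"

list_html = "<ul>" + "".join(
    f'<li class="item"><a href="/p/{i}">p{i}</a></li>' for i in range(10)
) + "</ul>"
detail_html = '<article class="post"><h1>Title</h1><p>Body text</p></article>'
print(guess_page_type(list_html))    # list
print(guess_page_type(detail_html))  # detail
```

A real detector would use far richer signals (DOM depth, link density, URL patterns); the threshold here is an arbitrary illustrative value.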
(3) Scrape the content that needs to be logged in to view
In the process of data scraping, we sometimes encounter webpages whose content can only be viewed after logging in. In this case, we need to use the pre-login function to log in to the webpage before scraping data normally.
Click here to learn more about how to log in to the web page.
(4) Switch browser mode
In the process of data scraping, different browser modes can produce different scraping results. We can improve the results by switching browsers, but not every webpage is suited to this operation; users should judge whether to use this function according to the actual situation.
Click here to learn more about how to switch browser mode.
(5) Set the extraction field
After the URL is entered, the system automatically identifies the page and sets the extraction fields. Users can scrape with these fields directly, or customize the fields to extract.
Click here for more ways to set up the extracted fields.
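In ScrapeStorm, extraction fields are configured through the interface. As a conceptual illustration only (the field names and patterns below are hypothetical, not part of ScrapeStorm), a field can be thought of as a rule that pulls one value out of the page:

```python
import re

# Hypothetical field definitions: each field maps to a regex capturing
# its value from the page HTML. (ScrapeStorm configures fields via its
# UI; this is just a sketch of the underlying idea.)
FIELDS = {
    "title": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S),
    "price": re.compile(r'class="price"[^>]*>\s*\$?([\d.]+)'),
}

def extract_fields(html):
    """Apply each field pattern and collect the first match (or None)."""
    return {name: (m.group(1).strip() if (m := pat.search(html)) else None)
            for name, pat in FIELDS.items()}

page = '<h1>Widget</h1><span class="price">$19.99</span>'
print(extract_fields(page))  # {'title': 'Widget', 'price': '19.99'}
```

Production scrapers generally use CSS selectors or XPath rather than regexes on raw HTML; regexes are used here only to keep the sketch dependency-free.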
(6) Configure the scraping task
After the extraction fields are set, the scraping task can be configured. Users can keep the system's default settings or configure the task themselves.
Click here to learn more about how to configure the scraping task.
(7) Scheduled job
Ordinary users can schedule data scraping to start at a fixed point in time. In addition, Premium Plan and above users can scrape data continuously at a fixed interval.
Click here for more information on scheduled job.
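The difference between a one-off start time and a fixed interval can be made concrete with a small scheduling sketch (this is generic Python, not ScrapeStorm's scheduler): given the last run and a fixed period, the next run is the first slot at or after the current time.

```python
from datetime import datetime, timedelta

def next_run(last_run: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the next scheduled run at or after `now`, given the last
    run time and a fixed interval (missed slots are skipped)."""
    if now <= last_run:
        return last_run + interval
    missed = (now - last_run) // interval  # whole intervals already elapsed
    return last_run + (missed + 1) * interval

last = datetime(2024, 1, 1, 8, 0)
# Task runs every 6 hours; it is now 15:30, so the 14:00 slot was missed.
print(next_run(last, timedelta(hours=6), datetime(2024, 1, 1, 15, 30)))
# -> 2024-01-01 20:00:00
```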
(8) Synchronize to the database
Professional Plan and above users can use the Synchronize to Database function to export data while the task is running; there is no need to wait until the task finishes to export the data. Combined with the scheduled scraping function, synchronizing to a database can greatly save time and improve efficiency. It is suitable for users who need to query data continuously or monitor public opinion.
Click here to learn more about syncing to the database.
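The essence of synchronizing to a database is writing rows as they are scraped, instead of in one export at the end. As a generic illustration (SQLite and the table schema here are assumptions, not ScrapeStorm's implementation), deduplicating on a key such as the URL lets repeated scheduled runs append only new data:

```python
import sqlite3

def sync_rows(conn, rows):
    """Insert scraped rows as they arrive, skipping duplicates by URL,
    so data is queryable before the task finishes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO results (url, title) VALUES (?, ?)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
sync_rows(conn, [("https://example.com/1", "First")])   # first batch
sync_rows(conn, [("https://example.com/1", "First"),    # re-scraped, ignored
                 ("https://example.com/2", "Second")])  # new row
print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # 2
```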
(9) Download images
If users need to save the images on a webpage to their local machine, the Download Images feature can be used to meet this requirement.
Click here to learn more about how to download images locally.
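Under the hood, downloading an image comes down to fetching the bytes and choosing a local file name. A minimal standard-library sketch (generic Python, not ScrapeStorm's code; `example.com` is a placeholder):

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

def filename_from_url(url: str) -> str:
    """Derive a local file name from the image URL's path component."""
    name = os.path.basename(urlparse(url).path)
    return name or "image.bin"  # fallback when the path has no file name

def download_image(url: str, folder: str) -> str:
    """Fetch one image and save it under `folder`; returns the local path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_from_url(url))
    with urlopen(url) as resp, open(path, "wb") as f:
        f.write(resp.read())
    return path

print(filename_from_url("https://example.com/img/logo.png"))  # logo.png
```

A real downloader would also handle retries, content-type checks, and name collisions, which are omitted here for brevity.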
(10) View the extraction results and export data
After the task is set, the user can view the extraction result and export the data.
ScrapeStorm exports data without any restrictions and completely free of charge, so you can use it with confidence.
Click here for more ways to view the results of the extraction and export the data.