【Smart Mode】How to configure the extracted field
Abstract: In Smart Mode, ScrapeStorm automatically identifies the URL and sets the fields to extract. If the system sets too many fields, or you have other requirements, you can configure the extracted fields yourself. This tutorial shows you how to set the extracted fields.
In Smart Mode, ScrapeStorm automatically identifies the URL and sets the extraction fields. These default fields are the system's estimate of what the user wants to extract.
If the fields extracted by the system do not meet your needs, or you need to extract some new fields, right-click a field and choose a setting from the menu.
The settings are described in detail below:
2. Merge fields
There are two ways to merge fields.
(1) Right-click a field that needs to be merged and select “Merge”, then select the fields you want to merge with it in the page.
(2) Hold Ctrl or Shift to select multiple fields, then right-click and select “Merge”. This method is suitable for merging several fields at once.
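Conceptually, merging combines the values of several columns into a single column, row by row. The following is a minimal Python sketch of that idea, not ScrapeStorm's implementation: the sample rows, the field names, the `merge_fields` helper and the space separator are all assumptions made for illustration.

```python
def merge_fields(rows, fields, merged_name, sep=" "):
    """Return new rows in which `fields` are joined into one `merged_name` column."""
    merged_rows = []
    for row in rows:
        # Keep every column that is not being merged.
        new_row = {k: v for k, v in row.items() if k not in fields}
        # Join the non-empty values of the merged columns, in the given order.
        new_row[merged_name] = sep.join(row[f] for f in fields if row.get(f))
        merged_rows.append(new_row)
    return merged_rows


# Hypothetical extracted data: two rows with three fields each.
rows = [
    {"title": "Blue Widget", "price": "19.99", "currency": "USD"},
    {"title": "Red Widget", "price": "24.50", "currency": ""},
]

merged = merge_fields(rows, ["price", "currency"], "price_with_currency")
for row in merged:
    print(row)
```

Empty values are skipped in this sketch so that a missing field does not leave a stray separator in the merged column.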
3. Select in page
If you want to change the content that a field extracts, or set the extraction content for a newly added field, click “Select in page” and then select the required data in the web page.
4. Edit XPath
XPath is a path query language that uses path expressions to locate the data we need in a web page. Users with a programming background can use this feature to set a new XPath for a field.
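To make the idea of a path expression concrete, here is a small Python sketch that runs two XPath-style queries against a page fragment. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath; the page fragment, class names and expressions are hypothetical and are not taken from ScrapeStorm.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <div class="product">
      <h2 class="title">Blue Widget</h2>
      <span class="price">19.99</span>
    </div>
    <div class="product">
      <h2 class="title">Red Widget</h2>
      <span class="price">24.50</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE.strip())

# Path expression: every <h2 class="title"> directly inside a <div class="product">.
title_xpath = ".//div[@class='product']/h2[@class='title']"
titles = [node.text for node in root.findall(title_xpath)]

# Same pattern for the price of each product.
price_xpath = ".//div[@class='product']/span[@class='price']"
prices = [node.text for node in root.findall(price_xpath)]

print(titles)
print(prices)
```

Each expression matches one node per product, which corresponds to one value per row in the extracted field.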
5. Extract Type
Different kinds of data need different extract types. When you set a new field, its extract type defaults to text.
In general, when you select new data, ScrapeStorm determines the extract type of the field automatically, so you do not need to set it. If it guesses wrong, you can set the extract type of the field yourself.
Extract text: Suitable for ordinary text data.
Extract innerHTML: Suitable for extracting the HTML inside the selected element, not including the element's own tag.
Extract outerHTML: Suitable for extracting the HTML of the selected element, including the element's own tag.
Extract link URL: Suitable for extracting the address of a link.
Extract image URL: Suitable for extracting the address of an image.
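The difference between these extract types is easiest to see on one element. The Python sketch below reproduces each type with the standard library's `xml.etree.ElementTree`; the HTML fragment is hypothetical, and the code illustrates the concepts rather than how ScrapeStorm computes the values.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment: a link containing an image and some text.
FRAGMENT = (
    '<a href="/item/1">'
    '<img src="/img/1.png" alt="Blue Widget"/>'
    "<b>Blue</b> Widget"
    "</a>"
)

link = ET.fromstring(FRAGMENT)

# Extract text: the text content only, with all tags removed.
text = "".join(link.itertext())

# Extract outerHTML: the element's own tag plus everything inside it.
outer_html = ET.tostring(link, encoding="unicode")

# Extract innerHTML: everything inside the element, without its own tag.
inner_html = (link.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in link
)

# Extract link URL / image URL: the href and src attribute values.
link_url = link.get("href")
image_url = link.find("img").get("src")

for name, value in [
    ("text", text),
    ("outerHTML", outer_html),
    ("innerHTML", inner_html),
    ("link URL", link_url),
    ("image URL", image_url),
]:
    print(f"{name}: {value}")
```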
Tips: HTML is a markup language used to describe web pages. It is mainly used to control the structure and display of content. HTML documents are also called web pages.
6. Modify data
Sometimes the content of an extracted field needs further processing. For example, you may want to keep only the numbers or email addresses in a field, replace text in the field with new text, remove blank characters at the beginning and end, or apply your own regular expression. In these cases, you can click “Modify Data”.
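The kinds of processing listed above can be sketched in a few lines of Python with the standard `re` module. The sample value and the regular expressions are assumptions chosen for illustration; they are deliberately simple and are not the patterns ScrapeStorm uses.

```python
import re

# Hypothetical raw field value, with blank characters at both ends.
raw = "  Price: $1,299.00 (was $1,499.00). Contact: sales@example.com  "

# Clear the blank characters at the beginning and the end.
trimmed = raw.strip()

# Keep only the numbers (digits with optional thousands/decimal separators).
numbers = re.findall(r"\d+(?:[.,]\d+)*", trimmed)

# Keep only the email addresses (a simplified pattern, not a full validator).
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", trimmed)

# Replace text in the field with new text.
replaced = trimmed.replace("Contact:", "Email:")

print(trimmed)
print(numbers)
print(emails)
print(replaced)
```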
7. Special value
During scraping, some users need special fields, such as the scraping time, the page source code, the current page title, or the current page URL.
These fields cannot be selected directly in the web page, so use “Special Value” to set them. You can create a new field and change it to a special field, or change an existing field to a special field.
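Special values describe the page and the scraping run rather than anything selected inside the page. The Python sketch below shows the idea by attaching such values to an extracted row; the `add_special_values` helper, the column names and the sample data are hypothetical and do not reflect ScrapeStorm's internal naming.

```python
from datetime import datetime, timezone


def add_special_values(row, page_url, page_title, page_source):
    """Return a copy of `row` with page-level special values added."""
    enriched = dict(row)  # Copy so the original row is left unchanged.
    # Scraping time: when this row was collected (UTC, ISO 8601).
    enriched["scrape_time"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Values that describe the page itself, not a selected element.
    enriched["page_url"] = page_url
    enriched["page_title"] = page_title
    enriched["page_source"] = page_source
    return enriched


# Hypothetical extracted row and page details.
row = {"title": "Blue Widget", "price": "19.99"}
source = "<html><head><title>Widgets</title></head><body>...</body></html>"

enriched = add_special_values(
    row,
    page_url="https://example.com/widgets",
    page_title="Widgets",
    page_source=source,
)
print(sorted(enriched))
```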
8. Delete Column
You can right-click a field and select “Delete”, or hold Ctrl or Shift to select multiple fields and delete them together.
9. Clear
If you do not need the fields that the system recognized automatically, click “Clear” to remove all of them, and then set the fields you need.
10. Add field
If you want to add a new field, click “Add Field” in the upper right corner, right-click the newly added field, click “Select in Page”, and then select the required data in the page.