【Flowchart Mode】What is a Behavior Component
Abstract: This tutorial describes what behavior components are and what each one does.
A behavior component is a building block used in ScrapeStorm's Flowchart Mode; each component corresponds to one operation performed on a web page.
There are a total of 8 behavior components, including:
Open URL, Scroll, Click element, Enter text, Hover element, Extract data, Save data, and Select dropList.
Components can be generated automatically by clicking elements on the web page, or added manually by dragging them from the component window below. To delete a component, simply click the “X” in its upper right corner.
1. Open URL
The “Open URL” component corresponds to the behavior of opening a web page. When we enter a URL in Flowchart Mode, an “Open URL” component appears on the task flow interface; the component can also be dragged onto the operation interface manually.
The settings for the “Open URL” component include the following four sections:
(1) Edit URL: Click this setting to modify the URL to open.
(2) Set the timeout period: Sets how long the software waits for the webpage to open. The default is 10 seconds, and the user can modify it.
(3) Custom cookie: Works like the pre-login function; you can enter a cookie here to achieve the pre-login effect. By default, no login is required.
(4) Clear the cache: Clears the browsing records kept for some websites. It has nothing to do with the scraping task itself, and the cache is not cleared by default.
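As an illustration of the format a custom cookie field typically accepts, the sketch below parses a hypothetical cookie string with Python's standard library. The cookie names here are made up, and ScrapeStorm's internal handling may differ; this only shows what a “name=value; name=value” cookie string contains.

```python
from http.cookies import SimpleCookie

# A hypothetical cookie string, as copied from a browser's request headers.
# The names "sessionid" and "token" are made-up examples.
raw = "sessionid=abc123; token=xyz789"

jar = SimpleCookie()
jar.load(raw)

# Each "name=value" pair becomes one cookie entry.
for name, morsel in jar.items():
    print(name, morsel.value)
```

Pasting a string in this shape into the custom cookie setting is what lets the task reuse an existing logged-in session.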
2. Scroll
The “Scroll” component handles pages where the items do not all appear on the first screen when the URL is opened; you have to scroll the page manually to load more items.
In this case, the default settings may affect the extraction results, so the user can add a scroll component to solve the problem.
Scrolling method: You can choose to scroll one screen at a time or scroll directly to the bottom.
Number of scrolls: How many times to scroll in this operation. The default is 2; the user can change it as needed.
Scrolling interval (seconds): The wait time between two scrolls. The default is 2 seconds.
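The three settings interact simply: the task performs the chosen scroll action the configured number of times, pausing for the interval between scrolls. The plain-Python sketch below illustrates that loop as a list of steps; it is an illustration of the settings' meaning, not ScrapeStorm's actual implementation.

```python
def scroll_plan(method="one_screen", times=2, interval=2):
    """Return the sequence of (action, wait_seconds) steps a scroll
    component with these settings would perform. The defaults mirror
    the software defaults described above (2 scrolls, 2-second interval)."""
    return [(method, interval) for _ in range(times)]

steps = scroll_plan()
total_wait = sum(wait for _, wait in steps)  # 2 scrolls * 2 s = 4 s of waiting
```

Increasing the number of scrolls loads more items but lengthens the run by one interval per extra scroll.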
3. Click element
The “Click element” component corresponds to a click operation on a web page. Its settings include the XPath of the element and whether to open a new tab.
A “Click element” component added via assisted point-and-click is configured automatically; we can use it directly without manual configuration.
When a “Click element” component is added by manual dragging, we can generate the XPath by clicking the button to the right of the settings box and then clicking the element on the page, or we can edit the XPath directly.
For the option to open a new tab, we generally choose not to open one; only when you need to extract data from a details page should you choose to open a new tab.
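An XPath is simply a path expression that identifies an element in the page's HTML. The sketch below locates an element with an XPath-style expression using Python's standard library (ElementTree supports only a subset of XPath, and the page fragment and class name here are hypothetical):

```python
import xml.etree.ElementTree as ET

# A made-up fragment of a product listing page.
html = """
<div>
  <a class="item" href="/p/1">First product</a>
  <a class="item" href="/p/2">Second product</a>
</div>
"""

root = ET.fromstring(html)
# The XPath-style expression ".//a[@class='item']" matches the kind of
# link a "Click element" component might target.
link = root.find(".//a[@class='item']")
print(link.text, link.get("href"))
```

This is why editing the XPath directly requires some familiarity with the page's HTML structure, while the point-and-click button generates it for you.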
4. Enter text
The “Enter Text” component corresponds to the operation of entering text in a web page, and its settings include the XPath of the input box and the input text.
During assisted operation, we enter the text in the operation prompt box as the software guides us. In this case the component is already configured, and we can use it directly without manual setup.
When adding the “Enter Text” component by manual dragging, you can set the XPath by clicking the search box on the web page. Users with a programming background can also edit the XPath manually, and then enter the text to search in the input text box.
Tip: If we need the search to jump to the results page, we must add a “Click element” component after the “Enter Text” component; otherwise the task will only fill in the search box and will not navigate to the results page.
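The tip above amounts to a fixed ordering of components. A hypothetical search flow could be pictured as a simple step list (the URL and XPaths below are made-up examples, not values from ScrapeStorm):

```python
# A hypothetical search flow. The "Click element" step placed right
# after "Enter Text" is what triggers navigation to the results page.
flow = [
    ("Open URL", {"url": "https://example.com"}),
    ("Enter Text", {"xpath": "//input[@id='q']", "text": "laptops"}),
    ("Click element", {"xpath": "//button[@type='submit']"}),
    ("Extract data", {}),
    ("Save data", {}),
]

component_names = [name for name, _ in flow]
```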
5. Hover element
The “Hover element” component applies to web pages that only display certain content when the mouse moves over a particular location. Its main setting is the XPath of the element.
This component usually needs to be added by manual dragging. You can generate the XPath by clicking the button to the right of the settings box and then clicking the element on the page, or you can edit the XPath directly.
6. Extract data
The “Extract data” component extracts data from a web page; its settings define the fields to extract.
See the separate tutorial on the “Extract data” component to learn more.
7. Save data
The “Save data” component saves the scraped data. It must be present in every scraping task rule; without it, the task cannot save any data.
The “Save data” component is usually placed after the last “Extract data” component so that all data extracted before it is merged.
For example, suppose we first open a list page and then click a link on the list page to open a detail page; the task will then contain two “Extract data” components, one for the list page and one for the detail page.
If we add a “Save data” component after each “Extract data” component, the results from the two “Extract data” components are saved separately.
If we add a single “Save data” component only after the last “Extract data” component, the fields extracted from the detail page are merged with the fields extracted from the list page.
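The difference between the two placements can be pictured with plain dictionaries: one record per page level when saved separately, one combined record when a single “Save data” component sits after the last “Extract data” component. The field names below are hypothetical.

```python
# Fields a list-page "Extract data" component might produce (hypothetical).
list_row = {"title": "Widget", "url": "/p/1"}
# Fields a detail-page "Extract data" component might produce (hypothetical).
detail_row = {"price": "9.99", "stock": "in stock"}

# A "Save data" after each extract: two separate result sets.
separate = [list_row, detail_row]

# A single "Save data" after the last extract: one merged row
# combining list-page and detail-page fields.
merged = {**list_row, **detail_row}
```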
8. Select dropList
The “Select dropList” component is used to select an option from a drop-down list on a web page. Its settings include the XPath of the drop-down list and the option to select.
When the “Select dropList” component is added via assisted clicking, the software sets the XPath automatically; we only need to choose the list option in the operation prompt box as prompted.
When adding the “Select dropList” component by manual dragging, you can set the XPath by clicking the list box on the web page. Users with a programming background can also edit the XPath manually.
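In HTML, a drop-down list is a `<select>` element whose `<option>` children are the choices: the XPath identifies the `<select>`, and the option setting picks one of its children. A stdlib sketch over a made-up fragment:

```python
import xml.etree.ElementTree as ET

# A hypothetical sort-order drop-down list.
html = """
<select id="sort">
  <option value="new">Newest</option>
  <option value="price">Price</option>
</select>
"""

dropdown = ET.fromstring(html)
options = [opt.text for opt in dropdown.findall("option")]
# A "Select dropList" component would target the <select> by XPath
# and then choose one of these options.
print(options)
```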