A new version (1.0.0) of the RapidMiner extension "Web Table Extraction" has been released on the RapidMiner Marketplace. This version adds a new Operator, "Extract Structured Data", which extracts structured data items from HTML documents. The Operator is broadly applicable: it can extract data items defined with schema.org microdata as well as plain HTML elements identified by their HTML attributes or CSS classes. The Operator's documentation includes several tutorial processes that demonstrate extraction from various websites.
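To illustrate what extracting schema.org microdata involves, here is a minimal Python sketch built on the standard library's `html.parser`. It is not the extension's implementation and says nothing about the Operator's parameters. The class name `MicrodataParser` and the sample product markup are hypothetical. The sketch assumes well-formed HTML and keeps only the last value of a repeated property.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Hypothetical, simplified schema.org microdata extractor (sketch only)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the document
        self._open = []   # one entry per open non-void element (None if not relevant)

    def _current_item(self):
        # Innermost enclosing itemscope element's item, or None.
        for entry in reversed(self._open):
            if entry and entry[0] == "item":
                return entry[1]
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        parent = self._current_item()
        entry = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if prop and parent is not None:
                parent["properties"][prop] = item  # nested item used as a property value
            else:
                self.items.append(item)            # top-level item
            entry = ("item", item)
        elif prop and parent is not None:
            if "content" in a:                     # e.g. <meta itemprop="price" content="...">
                parent["properties"][prop] = a["content"]
            elif tag in VOID_TAGS:                 # e.g. <img itemprop="image" src="...">
                parent["properties"][prop] = a.get("src") or a.get("href")
            else:                                  # value is the element's text content
                entry = ("prop", parent, prop, [])
        if tag not in VOID_TAGS:
            self._open.append(entry)

    def handle_data(self, data):
        # Append text to every open property element.
        for entry in self._open:
            if entry and entry[0] == "prop":
                entry[3].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        entry = self._open.pop()
        if entry and entry[0] == "prop":
            _, item, prop, parts = entry
            # Collapse whitespace in the collected text.
            item["properties"][prop] = " ".join("".join(parts).split())


# Usage with hypothetical product markup.
html = """
<div itemscope itemtype="https://schema.org/Product">
  <span itemprop="name">Oak Bookshelf</span>
  <div itemprop="offers" itemscope itemtype="https://schema.org/Offer">
    <meta itemprop="price" content="129.00">
    <span itemprop="priceCurrency">EUR</span>
  </div>
</div>
"""
parser = MicrodataParser()
parser.feed(html)
print(parser.items)
```

Each item becomes a nested dictionary of its type and properties. A table-like result such as a data mining process would consume could be produced by flattening these dictionaries, one row per top-level item.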
This new capability lets users conveniently integrate structured data into their data mining processes. For instance, users can now extract product data (for furniture, electronic components, books, etc.) from web shop catalogues, or news articles and blog posts from other websites. The extracted data can then be used to train analytics models for tasks such as product ranking, price prediction, or sentiment analysis.