Release of Data Search for Data Mining extension

Discover and Enrich Data Tables using the Search-Join method

Posted by Edwin Yaqub (RapidMiner) on May 1, 2017

Data is often scattered across many locations and stored in different formats. A major gap in the state of the art is the effective use of such large, heterogeneous data corpora: the problem is to discover contextually relevant data and join it to existing tables. The Data Search for Data Mining extension, developed jointly by RapidMiner and the University of Mannheim, addresses this problem. It consists of three parts: 1) a backend in which data tables are downloaded from various sources, preprocessed and indexed in a Lucene search engine; 2) a web service that provides an interface for discovering tables that match given criteria, using advanced schema and instance matching algorithms; and 3) a highly interactive graphical front end that eases query invocation and provides visual aids for inspecting and manually refining results, as well as (semi-)automatic data integration and fusion. The blog post on the Data Search for Data Mining extension provides further details.
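To make the Search-Join idea more concrete, here is a minimal Python sketch. It is not the extension's actual code or API: the function names, the in-memory corpus, and the similarity threshold are all assumptions made for illustration. String similarity on column headers stands in for schema matching, and key-value overlap stands in for instance matching. The real backend uses a Lucene index and more advanced algorithms.

```python
from difflib import SequenceMatcher


def label_similarity(a, b):
    """Case-insensitive string similarity between two column headers (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_join(query_rows, key, wanted, corpus, min_label_sim=0.8):
    """Add the column `wanted` to `query_rows` using tables found in `corpus`.

    Hypothetical helper: tables are lists of dicts, `key` is the query table's
    subject column, and `corpus` is a plain list of tables.
    """
    query_keys = {str(row[key]).lower() for row in query_rows}
    candidates = []
    for table in corpus:
        if not table:
            continue
        headers = list(table[0].keys())
        # Schema matching (simplified): pick the header closest to `wanted`.
        value_col = max(headers, key=lambda h: label_similarity(h, wanted))
        if label_similarity(value_col, wanted) < min_label_sim:
            continue
        # Instance matching (simplified): pick the column whose values
        # overlap most with the query table's key values.
        key_col, best_overlap = None, 0
        for h in headers:
            if h == value_col:
                continue
            overlap = len(query_keys & {str(r[h]).lower() for r in table})
            if overlap > best_overlap:
                key_col, best_overlap = h, overlap
        if key_col is not None:
            candidates.append((best_overlap, table, key_col, value_col))
    # Prefer tables that cover more of the query keys.
    candidates.sort(key=lambda c: c[0], reverse=True)

    # Fusion (simplified): take the first value found, best table first.
    enriched = []
    for row in query_rows:
        value = None
        for _, table, key_col, value_col in candidates:
            for r in table:
                if str(r[key_col]).lower() == str(row[key]).lower():
                    value = r[value_col]
                    break
            if value is not None:
                break
        enriched.append({**row, wanted: value})
    return enriched


# Toy data, invented for this example.
query = [{"country": "Germany"}, {"country": "France"}, {"country": "Peru"}]
corpus = [
    [{"Nation": "germany", "Population": 83}, {"Nation": "France", "Population": 68}],
    [{"City": "Lima", "Mayor": "X"}],
    [{"name": "Peru", "population": 34}, {"name": "Germany", "population": 80}],
]
enriched = search_join(query, key="country", wanted="population", corpus=corpus)
```

In this toy run the second table is discarded because none of its headers resembles "population", and the value for Peru comes from the third table because the first table does not cover it. Fusion here is a simple "first value wins" rule, which is one of several possible conflict-resolution strategies.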