RapidMiner participated in the 2nd Big Data All Hands Meeting (BDAHM), which was held in conjunction with the 2nd Smart Data Innovation Conference (SDIC), organized by the Karlsruhe Institute of Technology on October 11-12, 2017. In their joint presentation, entitled "Realizing Smart Data by automating Tabular Search, Integration and Extraction methods", Dr. Edwin Yaqub and Philipp Schlunder presented various aspects of the DS4DM work. They focused on the challenge of dealing with increasingly large and heterogeneous data, part of which is siloed in corporate data stores, while a large amount is also publicly available on the web.
Dr. Edwin Yaqub (Data Scientist, RapidMiner) at the 2nd BDAHM and SDIC Conference (BDAHM/SDIC 2017) in Karlsruhe, Germany
Philipp Schlunder (Data Scientist, RapidMiner) at the 2nd BDAHM and SDIC Conference (BDAHM/SDIC 2017) in Karlsruhe, Germany
The talk highlighted how the DS4DM project developed automated methods to separate signal from noise when searching tabular collections for relevant new data. In addition, the talk explained the DS4DM extraction extensions, which target non-trivial document formats such as PDF and HTML, cloud-based and API-driven online spreadsheets such as Microsoft Excel Online and Google Spreadsheets, and data held in SharePoint data stores. The presenters pointed out the lack of practical tools for consolidating data from such disparate sources. They concluded by arguing that the DS4DM extensions for RapidMiner narrow this gap and help to realize smart data in a domain-independent fashion. In the more informal discussions that followed, live demonstrations were given to participants on an individual basis to raise further awareness of the DS4DM work.
Realizing Smart Data by automating Tabular Search, Integration and Extraction methods, Dr. Edwin Yaqub, Philipp Schlunder, David Arnu and Ralf Klinkenberg