Data Mining using R

Course Overview

ID: 2201006
Duration: 2 days
Methods: Lecture with examples and exercises
Prerequisites: General knowledge of mathematics
Target group: Information workers, IT professionals
Predecessor course: 2201001

Overview

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD) is the computational process of discovering patterns in large data sets using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Beyond the raw analysis step, it involves database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

Dates


There are currently no open dates available. Alternatively, take advantage of the in-house option.

Learn with customized examples and content, precisely tailored to your requirements.

Your benefits at a glance

  • Flexible choice of date
  • Customized content
  • Intensive exchange
  • High practical relevance

Description

R offers a wide range of packages for multivariate analysis and data mining. With R, you can use data mining to identify patterns in data, such as groups, important variables, or relationships, which can then be used for classification and prediction. This seminar shows you how to apply many data mining methods using RStudio and common R packages. It covers the mathematical background of each method and demonstrates how to carry out data mining in practice with R, RStudio, and R Data Miner (Rattle).

Services

  • Lunch / catering
  • Help with hotel / travel
  • Comelio certificate
  • Flexible: free cancellation up to one day before the course starts


Content

Data Mining Fundamentals

  • Statistics, multivariate statistics, and data mining
  • The data mining cycle
  • Data preprocessing: descriptive data summarization, data cleaning, data integration and transformation
  • Data reduction
  • Discretization and concept hierarchies
  • Data mining and business intelligence: databases, data warehouses, and OLAP as the basis for data mining

Data Mining with Association Analysis

  • Finding frequent combinations (frequent itemset mining)
  • The Apriori algorithm
  • Association rules and association analysis
  • Market basket analysis

Data Mining with Decision Trees

  • Decision tree induction
  • Attribute selection
  • Tree pruning
  • Rule extraction
  • Quality measures and model comparison

Data Mining with Probability Theory

  • Probability theory and Bayes' theorem
  • The naïve Bayes algorithm
  • Bayesian networks

Advanced Data Mining Methods for Classification

  • Artificial neural networks and the backpropagation algorithm
  • Support vector machines for linearly and nonlinearly separable data
  • Classification with association analysis
  • Lazy and eager learners

Cluster Analysis

  • Introduction to cluster analysis
  • Similarity and distance measures
  • Variants and basic techniques
  • Partitioning methods: k-means
  • Hierarchical methods: agglomerative and divisive approaches
  • Other methods: density-based and grid-based methods

Instructor

Our trainer for statistics and data mining with R, Marco Skulschus, studied economics in Wuppertal and Paris. For more than 10 years he has worked as a lecturer, as an author of specialist books on databases and data analysis, and as a consultant for statistical analysis with R. Participants in his R seminars work in marketing or quality assurance, or are (aspiring) data scientists who want to use R for statistics and data mining.

Publications

  • Grundlagen empirische Sozialforschung (Comelio Medien), ISBN 978-3-939701-23-1
  • System und Systematik von Fragebögen (Comelio Medien), ISBN 978-3-939701-26-2
  • Oracle SQL (Comelio Medien), ISBN 978-3-939701-41-5
  • MS SQL Server - T-SQL - Abfragen und Analysen (Comelio Medien), ISBN 978-3-939701-69-9

Projects

As a consultant, Mr. Skulschus designs analysis systems based on relational databases and then develops statistical models and analyses in R. His clients include market research companies, marketing departments, quality assurance and process optimization departments, and research institutions.

Research

He led a multi-year research project, funded by the BMWi and carried out in collaboration with various universities, to develop a questionnaire system with an ontology-based data model and innovative question-answer representations.