Data Mining - Concepts and Techniques

Course Overview

ID 2858813
Duration 2.0 days
Methods Lecture with examples and exercises.
Prerequisites Basics in Statistics
Target group Information workers, IT professionals

Overview

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD) is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

Dates

OPEN
IN-HOUSE

No open dates are currently available. Alternatively, use the in-house option.

Learn with customized examples and content—precisely tailored to your requirements.

Your benefits at a glance

  • Flexible preferred date
  • Customized content
  • Intensive exchange
  • High practical relevance

Description

In this seminar, you'll learn what data mining is and how you can use it to answer questions about your data. You'll work with Weka, the graphical open-source tool from the University of Waikato, to identify patterns in data, such as groups, important variables, or relationships that can be used for classification and prediction.

Services

  • Lunch / catering
  • Help with hotel / travel
  • Comelio certificate
  • Flexible: free cancellation up to one day before


Content

Introduction to Data Mining

Overview: Why Data Mining? What Is Data Mining? What Kinds of Data Can Be Mined? What Kinds of Patterns Can Be Mined? Which Technologies Are Used? - Data Preparation: Data Objects and Attribute Types, Basic Statistical Descriptions of Data, Measuring Data Similarity and Dissimilarity - Data Preprocessing: Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization - Data Warehousing and Online Analytical Processing (OLAP)

Data Mining for Frequent Patterns

Frequent Itemset Mining Methods - The Apriori Algorithm - Market Basket Analysis - Pattern Evaluation Methods
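
To give a flavor of this module, the following is a minimal sketch of Apriori-style frequent itemset mining in plain Python. It is an illustration only and not the Weka implementation used in the seminar; the function name `apriori`, the sample baskets, and the use of an absolute support count as the threshold are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all frequent itemsets.

    min_support is an absolute count (assumption for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market baskets, invented for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what distinguishes Apriori from brute-force counting: an itemset can only be frequent if every one of its subsets is frequent, so most candidates are discarded before their support is ever counted.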

Classification using Decision Trees

Decision Tree Induction - Attribute Selection Measures - Tree Pruning - Scalability and Decision Tree Induction - Rule-Based Classification
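
As an illustration of an attribute selection measure, the sketch below computes information gain, one common measure used during decision tree induction. The helper names `entropy` and `information_gain` and the tiny weather-style data set are assumptions for this example and are not taken from the course material.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        # Weight each partition's entropy by its share of the rows.
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


# Hypothetical training rows, invented for illustration.
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "yes"},
]
gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
print(gain_outlook, gain_windy)
```

In this toy data set, "outlook" separates the classes perfectly while "windy" carries no information about the class, so a tree learner using this measure would split on "outlook" first.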

Classification using Probabilistic Approaches

Bayes Classification Methods - Bayes' Theorem - Naïve Bayes Algorithm - Bayesian Networks - Model Evaluation and Selection - Techniques to Improve Classification Accuracy
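
Bayes' theorem, which underlies the methods in this module, can be shown with a short worked calculation. The sketch below assumes a binary hypothesis; the function name `posterior` and all probabilities are hypothetical values chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) via Bayes' theorem for a binary hypothesis H.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of observing the evidence E.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% of customers churn (H); a warning signal (E)
# fires for 90% of churners and for 5% of non-churners.
p = posterior(prior=0.01, likelihood=0.90, false_positive_rate=0.05)
print(round(p, 3))
```

With these assumed inputs the posterior is only about 0.15: because the prior is low, most warnings still come from non-churners. A naïve Bayes classifier applies the same rule per class and multiplies the per-attribute likelihoods under an independence assumption.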

Classification: Advanced Methods

Classification by Backpropagation and Artificial Neural Networks - Support Vector Machines - Lazy Learners

Cluster Analysis

Overview of Basic Clustering Methods - Measuring Data Similarity and Dissimilarity: Data Matrix versus Dissimilarity Matrix, Proximity Measures for Nominal, Ordinal, and Binary Attributes, Dissimilarity of Numeric Data - Partitioning Methods (k-Means and k-Medoids) - Hierarchical Methods: Agglomerative versus Divisive Hierarchical Clustering
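
The partitioning approach listed above can be sketched in a few lines. The following is a minimal k-means loop using Euclidean distance as the dissimilarity measure; the function name `k_means`, the fixed starting centroids, and the sample points are assumptions for illustration, and production tools (including Weka) handle initialization and convergence differently.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def k_means(points, centroids, max_iter=100):
    """Cluster `points` around the given starting `centroids`."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visually separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

k-medoids follows the same assign-and-update pattern but restricts each cluster center to an actual data point, which makes it less sensitive to outliers.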

Instructor

Our statistics and data mining trainer, Marco Skulschus, studied economics in Wuppertal and Paris and has worked for over 10 years as a lecturer, as an author of specialist books on databases, and as a developer of analytics platforms. He builds reporting solutions with data mining components in Microsoft Fabric and Oracle DB and programs in R, Python, and Oracle PL/SQL.

Publications

  • Grundlagen empirische Sozialforschung (Comelio Medien)
    978-3-939701-23-1
  • System und Systematik von Fragebögen (Comelio Medien)
    978-3-939701-26-2
  • Oracle SQL (Comelio Medien)
    978-3-939701-41-5
  • MS SQL Server - T-SQL - Abfragen und Analysen (Comelio Medien)
    978-3-939701-69-9

Projects

He developed analysis and reporting solutions for the processes of an insurance company and for the risk management of a bank, as well as a survey system for a human resources consultancy.

Research

He led a multi-year research project to develop a questionnaire system with an ontology-based data model and innovative question-answer representations. The project was funded by the BMWi and carried out in collaboration with various universities.