Data Science - Part 1

Name of the subject in Hungarian: Adatbányászat - 1

Last updated: 10 November 2010

Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics

Computer Engineering program

BSc program

Course ID   Semester   Assessment   Credit   Subject semester
VISZA083               0/0/2/f      2
3. Course coordinator and department: Dr. Gyula Katona
Web page of the course: www.cs.bme.hu/....
4. Instructors
Name               Position          Department
András A. BENCZÚR  Lecturer          Department of Computer Science and Information Theory
András LUKÁCS      Lecturer          Department of Computer Science and Information Theory
Gyula Y. KATONA    Assoc. professor  Department of Computer Science and Information Theory
Gábor WIENER       Assoc. professor  Department of Computer Science and Information Theory

5. Required knowledge
The course requires basic knowledge of calculus, probability theory, and linear algebra. Familiarity with graphs and basic algorithms is an advantage.
7. Objectives, learning outcomes and obtained knowledge
The aim of the course is to provide a basic but comprehensive introduction to data mining. By the end of the course, students will be able to build models, choose algorithms, and implement and evaluate them.

8. Synopsis

1. Motivations for data mining. Examples of application domains. Methodology of knowledge discovery in databases (KDD) and data mining (DM). Formulation of the main problems of data mining.
2. Understanding data: preparation and exploration. Sampling.
3. Basics of classification. Concepts of training and prediction. Decision trees.
4. Models and algorithms for classification: k-NN, naïve Bayes. Measuring the quality of classification models and comparing them.
5. Introduction to the WEKA data mining software. Classification with WEKA.
6. More models and algorithms for classification: neural networks, linear separation methods, support vector machines (SVM).
7. Feature selection: filter and wrapper methods. Midterm test.
8. Basics of cluster analysis. Types of variables, measuring similarity and distance. Partitioning clustering algorithms: k-means, k-medoids.
9. Hierarchical clustering algorithms. Density-based clustering: DBSCAN, OPTICS. Cluster analysis with WEKA.
10. Introduction to frequent itemset mining. Applications for finding association rules.
11. Level-wise algorithms: APRIORI. Partitioning and Toivonen algorithms.
12. Pattern growth methods: FP-growth. Constraint handling.
13. Hierarchical and general association rules. Pattern mining with WEKA.
14. Sequential and subgraph patterns. Final test.
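
To give a flavour of the level-wise approach named in item 11, the following is a minimal Python sketch of APRIORI-style frequent itemset mining. It is an illustration only, not the course's reference implementation (the practical sessions use WEKA). The function name `apriori`, the choice to express `min_support` as an absolute transaction count, and the shopping-basket data are all assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets found in at least
    `min_support` transactions (absolute count; an assumption of this sketch)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions that contain each candidate itemset
        # and keep only the candidates that reach the support threshold.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical example data: four shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
```

The prune step is what makes the search level-wise: a k-itemset is only counted if all of its (k-1)-subsets were already found frequent, so each pass over the data considers far fewer candidates than a brute-force enumeration would.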

9. Method of instruction
Handouts, PowerPoint presentations, relevant research papers, web page, course mailing list and Wiki. Regular weekly office hours for consultation.

10. Assessment

At the end of the semester, there will be a comprehensive written test of the theory. 

Grading will be based on the following criteria:

Class participation and activity   30 points
Comprehensive written test         70 points

12. Consultations
The instructor can be reached for consultation at the following e-mail address:

András A. Benczúr: benczur@ilab.sztaki.hu

13. References, textbooks and resources
Jiawei Han and Micheline Kamber: Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2006.
Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Addison-Wesley, 2006.
T. Hastie, R. Tibshirani, J. H. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.

14. Required learning hours and assignment
Number of contact hours     28
Preparation for classes     12
Preparation for the tests   20
Homework
Assigned reading
Preparation for the exam
Total                       60
15. Syllabus prepared by
Name               Position  Department
András A. Benczúr  Lecturer  Department of Computer Science and Information Theory