Budapest University of Technology and Economics, Faculty of Electrical Engineering and Informatics


    Data Science - Part 1

    Name of the subject in Hungarian: Adatbányászat - 1

    Last updated: 10 November 2010


    Computer Engineering program

    BSc program

    Course ID   Semester   Assessment   Credit   Subject semester
    VISZA083               0/0/2/f      2
    3. Course coordinator and department Dr. Katona Gyula, Department of Computer Science and Information Theory
    Web page of the course www.cs.bme.hu/....
    4. Instructors

    Name                 Position           Department
    András A. BENCZÚR    Lecturer           Department of Computer Science and Information Theory
    András LUKÁCS        Lecturer           Department of Computer Science and Information Theory
    Gyula Y. KATONA      Assoc. professor   Department of Computer Science and Information Theory
    Gábor WIENER         Assoc. professor   Department of Computer Science and Information Theory

    5. Required knowledge The course requires basic knowledge of calculus, probability theory, and linear algebra. Knowledge of graphs and basic algorithms is an advantage.
    7. Objectives, learning outcomes and obtained knowledge The aim of the course is to provide a basic but comprehensive introduction to data mining. By the end of the course, students will be able to build models, choose algorithms, and implement and evaluate them.

     

    8. Synopsis

    1.      Motivations for data mining. Examples of application domains. Methodology of knowledge discovery in databases (KDD) and data mining (DM). Formulation of the main problems of data mining.
    2.      Understanding data: preparation and exploration. Sampling.
    3.      Basics of classification. Concepts of training and prediction. Decision trees.
    4.      Models and algorithms for classification: k-NN, naïve Bayes. Measuring the quality of classification models and comparing them.
    5.      Introduction to the WEKA data mining software. Classification with WEKA.
    6.      More models and algorithms for classification: neural networks, linear separation methods, support vector machines (SVM).
    7.      Feature selection: filter and wrapper methods. Midterm test.
    8.      Basics of cluster analysis. Types of variables, measuring similarity and distance. Partitioning clustering algorithms: k-means, k-medoids.
    9.      Hierarchical clustering algorithms. Density-based clustering: DBSCAN, OPTICS. Cluster analysis with WEKA.
    10. Introduction to frequent itemset mining. Applications for finding association rules.
    11. Level-wise algorithms, APRIORI. Partitioning and Toivonen algorithms.
    12. Pattern growth methods, FP-growth. Constraint handling.
    13. Hierarchical and general association rules. Pattern mining with WEKA.
    14. Sequential and subgraph patterns. Final test.

    9. Method of instruction Handouts, PowerPoint presentations, relevant research papers, web page, course mailing list and Wiki. Regular weekly office hours for consultation.

     

    10. Assessment

    At the end of the semester, there will be a comprehensive written test of the theory. 

    Grading will be based on the following criteria:

    Class participation & activity    30 points
    Comprehensive written test        70 points

     

    12. Consultations You can reach the instructor at the following e-mail address for consultation:

     

    András A. Benczúr: benczur@ilab.sztaki.hu

     

    13. References, textbooks and resources Jiawei Han and Micheline Kamber: Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2006.

     

    Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Addison-Wesley, 2006.

     

    T. Hastie, R. Tibshirani, J. H. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.

     

    14. Required learning hours and assignment

    Number of contact hours        28
    Preparation for the classes    12
    Preparation for the tests      20
    Homework
    Assigned reading
    Preparation for the exam
    Total                          60

    15. Syllabus prepared by

    Name                 Position   Department
    András A. Benczúr    Lecturer   Department of Computer Science and Information Theory