
    Very Large Databases

    Name of the subject in Hungarian: Nagyméretű adathalmazok kezelése

    Last updated: 2 December 2015

    Budapest University of Technology and Economics
    Faculty of Electrical Engineering and Informatics

    Engineering Information Technology, MSc

    Theory of Computation, minor specialization

    Course ID   Semester   Assessment   Credits
    VISZMA01    2          2/1/0/v      4
    (2/1/0/v: weekly hours of lecture/practice/laboratory; v = exam)
    3. Course coordinator and department: Dr. Gyula Katona, Department of Computer Science and Information Theory
    Web page of the course: cs.bme.hu/nagyadat
    4. Instructors

    Dr. Gyula Katona

    associate professor

    Department of Computer Science and Information Theory

    Bálint Daróczy 

    PhD

    MTA SZTAKI

    5. Required knowledge: database theory, graph theory, basic algorithmic techniques
    6. Pre-requisites
    Required:
    NEM ( TárgyEredmény( "BMEVISZM144" , "jegy" , _ ) >= 2
    VAGY
    TárgyEredmény("BMEVISZM144", "FELVETEL", AktualisFelev()) > 0)

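    The above is Neptun's own notation; for technical reasons it has not been changed. In words: the course cannot be taken by a student who has already passed BMEVISZM144 (grade of at least 2) or who is enrolled in BMEVISZM144 in the current semester.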

    The mandatory prerequisites are listed on the website and in the curriculum of the respective degree program.

    7. Objectives, learning outcomes and obtained knowledge: The course gives an overview of the special theoretical and practical problems that arise when working with large data sets. Students gain insight into modern trends and into the theoretical and practical questions of data mining, relational databases, large graphs and data streams.
    8. Synopsis
    1. Basic machine learning tasks: discriminative and generative models, attribute types.

    2. Nearest neighbor search: normalization, distance.

    3. Decision trees: tree-building algorithms (C4.5, regression trees), purity measures, splits.

    4. Pre-pruning and post-pruning, handling of continuous variables.

    5. Naive Bayes: handling continuous variables, m-estimate.

    6. Perceptron: activation function, stochastic gradient descent.

    7. Clustering: centroid-based methods (k-means, bisecting k-means).

    8. Density-based methods (DBSCAN, OPTICS), hierarchical clustering (linkage).

    9. Recommendation Systems: collaborative filtering (matrix factorization, nearest neighbor methods), content-based recommendation.

    10. Searching: index building, ranking (TF-IDF, BM25, PageRank).

    11. Support vector machines (SVM): maximal margin, kernel functions

    12. Principal Component Analysis (PCA)

    13. Artificial neural networks (ANN): the unsupervised case (restricted Boltzmann machines).

    14. Artificial neural networks (ANN): the supervised case (multilayer perceptron).
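
    A few of the techniques listed above (topics 3, 5, 6, 7 and 10) can be illustrated with a minimal Python sketch, assuming only the standard library: entropy and Gini impurity for decision-tree splits, the m-estimate used in Naive Bayes, perceptron training with the classic update rule applied one example at a time in random order, Lloyd's k-means, and one common TF-IDF variant (raw term frequency times log(N/df)). This is illustrative only and not course material; all function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Entropy impurity (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def m_estimate(n_c, n, p, m):
    """m-estimate of a conditional probability: n_c matching examples out
    of n, prior estimate p, equivalent sample size m."""
    return (n_c + m * p) / (n + m)


def perceptron(X, y, epochs=20, lr=1.0, seed=0):
    """Perceptron with a step activation; labels in y are -1 or +1.
    Examples are visited in random order and the weights are updated
    after each misclassified example (stochastic updates)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            activation = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * activation <= 0:  # misclassified or on the boundary
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def k_means(points, k, iters=100, seed=0):
    """Lloyd's k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its cluster, until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        # An empty cluster keeps its previous centroid.
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def tf_idf(docs):
    """TF-IDF weights for tokenized documents: raw term count times
    log(N / df), where df is the number of documents containing the term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]
```

    For example, for the labels ["a", "a", "b", "b"] the Gini impurity is 0.5 and the entropy is 1.0 bit, while m_estimate(0, 10, 0.5, 2) gives 1/12 instead of a zero probability for an attribute value never seen with a class.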
    9. Method of instruction: lectures and computer-aided practice sessions.
    10. Assessment

    Signature (requirement for admission to the exam):

    Two midterm tests, each of which must reach at least 40%. Homework is optional; points earned on it are added to the midterm results as extra points.


    Final:

    The grade is based on the midterm results and can be improved at an oral exam.

    13. References, textbooks and resources: Tan, Steinbach, Kumar: Introduction to Data Mining, Pearson Education, 2nd revised edition (2013)
    14. Required learning hours and assignment
    In class                     42
    Preparation for classes      28
    Preparation for midterms     20
    Homework                     15
    Reading assignment
    Preparation for final        15
    Total                       120
    Comments