Intelligent Data Analysis

Name of the subject in Hungarian: Intelligens adatelemzés

Last updated: 27 August 2016

Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics
PhD Course
Course ID   Semester   Assessment   Credit   Course semester
VIMMD294               4/0/0/v      5
3. Course coordinator and department: Dr. Antal Péter
5. Required knowledge: basics of probability theory
7. Objectives, learning outcomes and obtained knowledge

The rapidly escalating challenges in data science with respect to data size, dimensionality or heterogeneity have highlighted the importance of the whole data analysis process, including study design, data collection, data engineering, the combination of a priori knowledge with data, the combination of multiple inductive modules into a complex system, and the derivation of optimal interventions. In parallel, these unprecedented challenges have renewed interest in complex inductive schemes, such as the learning of overall network models and causal system models, and active and reinforcement learning.

The course provides a systematic overview of both the intelligent methods used throughout the data analysis process and the intelligent, complex machine learning schemes used in modern data analysis. The unifying themes of this dual approach are the Bayesian decision-theoretic framework, network- and systems-based approaches, data and knowledge fusion, the use of ontologies and semantic technologies, and active, online (reinforcement) learning, which integrate the various phases and aspects of data analysis. The course also presents and discusses real-world applications from the fields of biomedicine, pharmaceutical research and system diagnostics.

The course lies at the crossroads of statistics, big data analytics, artificial intelligence and machine learning. It is self-contained, but ideally complements earlier studies in these areas.

After completing this course, you will be familiar with the following:

(1)    Theoretical bases of induction. The engineering workflow of data analysis.

(2)    Optimization, Bayesian model averaging and sensitivity analysis using resampling methods in data analysis.

(3)    Semantic data repositories, data visualization, dimensionality reduction, data engineering/transformations using ontologies, data cleaning and imputation.

(4)    Unsupervised learning: clustering, module learning, self-organizing maps, network science, metric learning.

(5)    Supervised learning: decision trees, regression, kernel methods, multilayer perceptron, deep neural networks.

(6)    Probabilistic graphical models: Bayesian networks, dynamic/temporal Bayesian networks.

(7)    Reinforcement, active, budgeted and online learning.

(8)    Knowledge and data fusion: ontologies, semantic technologies, linked open data.

8. Synopsis
  1. Introduction: the basic task of data analysis. Intelligent data analysis: mathematical statistics + machine learning.
  2. Some complex data analysis examples: industrial problems, medical decision making, financial data prediction, etc.  
  3. Theoretical bases of induction. The frequentist approach. The Bayesian approach.
  4. The bias-variance dilemma. Essential inequalities. PAC learning. The VC-dimension.
  5. Decision theory and machine learning. Evaluation and estimation of future performance, early discovery measures.
  6. Optimization (from gradient descent and simulated annealing to constrained optimizations).
  7. Supervised learning (SL): Decision-tree learning.
  8. SL: from linear discriminants/regression to the perceptron and the multilayer perceptron (MLP).
  9. SL: kernel methods, sparse models (SVM, RVM, etc.).
  10. SL: from knowledge-based MLPs to deep learning architectures.
  11. Data visualization, dimensionality reduction, and data engineering.
  12. Data cleaning, outlier/anomaly detection, incomplete data.
  13. The data analysis workflow: examples of the complex process of data analysis.
  14. Complex models in data analysis: examples.
  15. Unsupervised learning (UL): clustering.
  16. UL: module learning, network science.
  17. Bayesian inference (development of Monte Carlo methods).
  18. Resampling methods: bootstrap and permutation tests.
  19. Naïve Bayesian network, logistic regression.
  20. Hidden Markov Models, Kalman filters.
  21. Bayesian networks and their extensions.
  22. Causality research, causal Bayesian networks.
  23. Dynamic Bayesian networks. Longitudinal data and time series analysis. Gaussian processes.
  24. Reinforcement learning. Active/budgeted learning. Bandits, sequential/online learning.
  25. Knowledge and data fusion. Ontologies, semantic technologies, linked open data, semantic data repositories. Multiple hypothesis testing and correction, enrichment methods.
  26. Rank learning, prioritization methods, recommendation systems, matrix factorization methods.
  27. Homework presentation.
  28. Overview, outlook.
13. References, textbooks and resources

D. J. Hand: Intelligent Data Analysis

C. M. Bishop: Neural Networks for Pattern Recognition

A. Gelman: Bayesian Data Analysis

T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning

R. G. Cowell: Probabilistic Networks and Expert Systems

14. Required learning hours and assignment
Contact hours
Preparation for classes during the semester
Preparation for midterm tests
Homework preparation
Study of assigned written material
Exam preparation
Total