Mathematical Statistics

Name of the subject in Hungarian: Matematikai statisztika

Last updated: December 5, 2017

Budapest University of Technology and Economics
Faculty of Electrical Engineering and Informatics

PhD course

Electrical Engineering

Informatics Engineering 

Course ID   Semester   Assessment   Credit   Subject semester
VISZD302               4/0/0/v      5        1/1
3. Course coordinator and department Dr. László Győrfi, Department of Computer Science and Information Theory
Web page of the course www.cs.bme.hu/matstatd
4. Instructors Dr. László Ketskeméty, associate professor, Department of Computer Science and Information Theory
5. Required knowledge Probability theory
7. Objectives, learning outcomes and obtained knowledge

In this course students are introduced to the theory and applications of mathematical statistics.

After a review of the necessary tools and definitions from probability theory, the course introduces the main statistical methods, with emphasis on both their mathematical background and their practical use. Practical examples are also worked through on a computer.

8. Synopsis


  1. Basic concepts. The basic model of mathematical statistics. Sample, sample realization. The data matrix. Measurement levels: nominal, ordinal, and scale. The concept of a statistic. Descriptive statistics. Mean, trimmed mean, M-estimators. Median, mode. Variance, standard deviation, range, kurtosis, skewness. Quantiles, quartiles. Graphs: bar, pie, and line charts, histogram, stem-and-leaf plot, P-P and Q-Q plots, boxplot, scatter plot. Tables: frequencies, descriptives, crosstabs. Order statistics. Empirical distribution function.

  2. Distribution function of the sample, the concept of a parameter. Theory of estimation. Unbiasedness, asymptotic unbiasedness, consistency, strong consistency, efficiency, sufficiency. Examples. Theoretical background: Glivenko-Cantelli, Neyman-Fisher, and Rao-Blackwell-Kolmogorov theorems, Cramér-Rao inequality. Lukacs theorem. Distributions derived from the normal distribution: Student's t, chi-squared, and Fisher (F) distributions.

  3. Maximum likelihood method, method of moments. Cramér-Dugué theorem. Examples. Confidence intervals. Confidence intervals for the parameters of the normal distribution. Confidence interval for the parameter of the exponential distribution. Basic concepts of hypothesis testing: null hypothesis, alternative hypothesis, test statistic, type I error, type II error, power function, consistency of a test, uniformly most powerful test, Stein's lemma.

  4. Parametric tests: one-sample u- and t-tests. Two-sample u-test. Independent-samples t-test. Paired-samples t-test. F-test. Welch test. One-way ANOVA. Bartlett's test. Examples.

  5. Nonparametric or asymptotic tests. The fundamental theorem of the chi-squared test. Kolmogorov and Gnedenko theorems. Goodness-of-fit tests: chi-squared tests, one-sample Kolmogorov-Smirnov test. Examples of testing goodness of fit in the discrete and continuous cases. Testing independence with the chi-squared test. Measures of the strength of association. Examples.

  6. Tests of homogeneity. Two independent samples: Mann-Whitney test and two-sample Kolmogorov-Smirnov test. Two paired samples: Wilcoxon test. More than two independent samples: Kruskal-Wallis test. More than two paired samples: Friedman test. Examples. Exact tests for small samples. Sequential tests. Wald’s theorem.

  7. Regression. The concept and properties of conditional expectation. Linearity of the conditional expectation in the normal case. Theoretical linear regression. Parametric regression, the method of least squares. Two-variable linear regression. Interpretation of the coefficients. Coefficient of determination (R-squared). ANOVA table. Confidence bands. Residuals. Reduction of two-parameter nonlinear regressions to the linear case. Polynomial regression. Nonlinear regression. Levenberg-Marquardt method. Nadaraya method.

  8. Multivariate linear regression. Model building techniques: ENTER, STEPWISE, BACKWARD, FORWARD, REMOVE. Partial F-test. Beta coefficients. Multicollinearity. Heteroskedasticity. Sensitivity analysis, average matrix, detection of outliers. Multinomial logistic regression. Correlation measures: total, partial, and multiple correlations. Covariance and correlation matrices.

  9. Factor analysis and principal component analysis. The k-factor model. Decomposition of the covariance matrix. Kaiser-Meyer-Olkin statistic and the measure of sampling adequacy (MSA). Bartlett's sphericity test. Communalities. Loading matrix. Rotations: varimax, quartimax, equamax. Scree plot (knee diagram). Principal components, principal directions. Watanabe theorem.

  10. Multidimensional scaling (MDS). Metrics, examples. Special properties of the Euclidean distance matrix. Constructing a point representation for a given Euclidean distance matrix. Classical MDS, nonmetric MDS (RMDS), weighted MDS. Stress coefficients, RSQ statistic. Examples.

  11. Mathematical model of statistical pattern recognition. Bayesian decision function, Bayes risk. Clustering (unsupervised learning). Dynamic methods. K-means and k-medoids methods. MacQueen's theorem. Hierarchical clustering. Distances between clusters. Dendrogram, the compactness function.

  12. Learning algorithms, learning data. Preparation of the learning data. Discriminant analysis. Separation with hypersurfaces. Classification and leave-one-out classification tables. Wilks' lambda statistic. Nearest neighbour methods. Risk of this method. L-NN methods. Fast nearest neighbour search. Examples.

  13. Surveys. Elements of the questionnaire. Open and closed questions. Likert scale, semantic differential. Interviewer-administered and self-completion questionnaires. Measuring the reliability of responses. Sampling. Population and sampling frame. Simple random sampling, systematic sampling. Stratified sampling. Cluster sampling. Non-random sampling: expert sampling, snowball sampling. Determining the required sample size. Sample size estimation based on the central limit theorem. Estimates based on the Bernstein, Hoeffding, and Chernoff inequalities.

  14. Stochastic processes. Time series. Basic concepts. Stationarity. Mean value and variance functions. Autocovariance and autocorrelation functions (ACF), partial autocorrelation function (PACF), cross-correlation function (CCF). Some types of time series: Markov, Wiener, and Poisson processes. Fields of application. Forecasting, interpolation, process control. Deterministic models. White noise. Tests for white noise: sign, peak, and alternation tests. The portmanteau test. Testing autocorrelation with the Durbin-Watson test. Trend analysis. Exponential smoothing. ARIMA models.
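
As a rough illustration of the sample size estimates named in item 13, the sketch below compares the central limit theorem approximation with the Hoeffding bound for estimating the mean of a variable bounded in [0, 1] (for example, a proportion). The function names and the parameters `eps` (tolerated error), `delta` (allowed failure probability), and `sigma` (assumed standard deviation, worst case 0.5) are illustrative assumptions and are not taken from the course material.

```python
import math
from statistics import NormalDist


def n_clt(eps, delta, sigma=0.5):
    """Approximate sample size from the normal approximation:
    n >= (z_{1 - delta/2} * sigma / eps)^2.
    sigma = 0.5 is the worst case for a variable in [0, 1] (assumption)."""
    z = NormalDist().inv_cdf(1 - delta / 2)
    return math.ceil((z * sigma / eps) ** 2)


def n_hoeffding(eps, delta):
    """Sample size guaranteed by Hoeffding's inequality for values in [0, 1]:
    P(|mean - mu| >= eps) <= 2 * exp(-2 * n * eps^2) <= delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


if __name__ == "__main__":
    eps, delta = 0.05, 0.05
    print("CLT approximation:", n_clt(eps, delta))        # 385
    print("Hoeffding bound:  ", n_hoeffding(eps, delta))  # 738
```

The Hoeffding figure is larger because it holds for every sample size and every distribution on [0, 1], whereas the CLT figure is only an asymptotic approximation.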

9. Method of instruction 4 hours of lectures and laboratories per week.
10. Assessment

Signature: Midterm exam and homework

Final: Oral exam

12. Consultations In office hours or by appointment.
13. References, textbooks and resources John A. Rice, Mathematical Statistics and Data Analysis, Cengage Learning, 2006, 688 pages.
Wolfgang Karl Härdle, Zdeněk Hlávka, Multivariate Statistics: Exercises and Solutions, Springer, 2015.
14. Required learning hours and assignment
In class                       60
Preparation for lectures       15
Preparation for laboratories   10
Preparation for midterms       25
Preparation for the final      40

Total                          150
15. Syllabus prepared by Dr. László Ketskeméty, associate professor, Department of Computer Science and Information Theory