
    Computer Vision Systems

    Name of the subject in Hungarian: Számítógépes látórendszerek

    Last updated: May 7, 2025

    Budapest University of Technology and Economics
    Faculty of Electrical Engineering and Informatics
    Autonomous Vehicle Engineer, MSc
    Course ID Semester Assessment Credit Subject semester
    VIIIMA19   2/1/0/v 5  
    3. Course coordinator and department Dr. Szemenyei Márton,
    Web page of the course https://deeplearning.iit.bme.hu/oktatas/
    4. Instructors Dr. Márton Szemenyei
    5. Required knowledge Programming, Linear Algebra, Optimization, Signal Processing
    6. Pre-requisites
    Mandatory:
    NEM
    (TárgyEredmény( "BMEVIIIMA07", "jegy" , _ ) >= 2
    VAGY
    TárgyEredmény("BMEVIIIMA07", "FELVETEL", AktualisFelev()) > 0)

    The form above is Neptun's own syntax; for technical reasons it has been left unchanged.

    The mandatory prerequisite requirements can be found on the website and in the curriculum of the given degree programme.

    Recommended:
    Linear Algebra
    7. Objectives, learning outcomes and obtained knowledge The demand for processing image-based information has been rapidly increasing over the past decades. Examples include industrial quality control, the gaming and entertainment industry, modern imaging diagnostic tools, and more recently, the development of autonomous vehicles and the fight against terrorism. The aim of the course is to familiarize students with the theory and practice of computer-based image processing, object recognition, and comparative analysis. Based on what they learn in the course, students will be able to apply the fundamentals of machine vision (such as image capture, storage, and processing), as well as solve more complex image processing tasks and carry out development work.
    8. Synopsis

    1. Introduction, basic tasks and challenges of computer vision, semantic gap. Fundamentals of image sensing, human vision, photodiode, CCD, CMOS, color vision. Sources of image noise and defects, blurriness, focus, image storage techniques. Role of color components, color spaces. Image enhancement techniques, intensity transformations, histograms, histogram transformations.

    2. Filtering in the image domain, convolution, smoothing, sharpening, and edge detection filters, nonlinear filters. Edge detection, Canny algorithm. Image arithmetic, interpolation techniques, fittings.

    3. Image processing in the frequency domain, 2D Fourier transform, analysis of image spectrum. Filtering in the frequency domain, properties of ideal and other filters. Classification based on spectrum, analysis of periodic noise. DCT, JPEG compression, Wiener deconvolution.

    4. Types and extraction of image features. Template matching, similarity metrics. Corner detection, local structure matrix, KLT, Harris. Invariances to transformations, SIFT, ORB. Classification methods: Haar features, Viola-Jones, Bag of Visual Words, Deformable Parts. Tracking solutions: Pixel-based tracking, Optical Flow, LK and Farneback methods. Iterative and pyramid optical flow. Application of HMM and Kalman Filter, object matching based on affinity.

    5. Categorization of segmentation methods. Intensity-based segmentation, thresholding, histogram-based methods. Clustering techniques: k-Means, MoG, Mean-shift. Region growing, Split & Merge, SRM. Watershed, graph cuts, motion segmentation.

    6. Processing of binary images, basic morphological operations, opening, closing, contour detection. Distance and adjacency, Jordan property. Skeletonization. Binary object descriptors: Euler number, fingerprint, position, orientation. Object counting and labeling. Hough transform.

    7. Basics of machine learning, structure of learning systems, types of learning. Examples of learning systems, kNN. Neural networks, fundamental learning challenges, overfitting, data quality. Steps of supervised learning. Perceptron model, decision function. Error functions, gradient method, higher-order methods. MLP and backpropagation.

    8. Structure of convolutional networks. Well-known architectures: VGG, Inception, ResNet, DenseNet, EfficientNet. Neural network visualization, adversarial attacks.

    9. Deep learning in practice, ensuring convergence, avoiding overfitting. Hyperparameter search, model compression, pruning and ensembles.

    10. Detection architectures: R-CNN variants, YOLO. Key metrics and databases, anchor-based and anchor-free solutions. Mask and other R-CNN extensions. Segmentation methods: U-Net, upscaling techniques. ASPP and CRF extensions.

    11. Video processing, levels of fusion, 3D convolution. Recurrent architectures: RNN, BPTT, vanishing gradients. LSTM and GRU, soft attention mechanisms. Self-attention and vision transformer solutions.

    12. Basics of projective geometry, types of transformations and their properties. Imaging geometry, pinhole camera model, extrinsic and intrinsic parameters. Camera calibration methods: 3D marker-based and chessboard-based solutions, self-calibration.

    13. Stereo setup, epipolar geometry, essential and fundamental matrix. Stereo calibration, rectification. Concept of disparity and methods for its determination: BM, SGBM, BP. 3D reconstruction and its invariances, practical applications. SLAM and SfM, multi-view reconstruction.

    9. Method of instruction Lectures, programming practicals
    10. Assessment

    During the semester:
    To obtain the course signature, the following requirement must be met:

    1. Summary assessment: Completion of one midterm test with a minimum score of 40%.

    During the exam period:
    Students earn their final grade by completing a written exam. The score from the midterm test contributes 20% to the final exam grade. The final grade is determined based on the following point scale:

    • 0–39%: Fail

    • 40–54%: Pass

    • 55–69%: Satisfactory

    • 70–84%: Good

    • 85–100%: Excellent
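
    The grading rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the syllabus says the midterm contributes 20% to the final exam grade, and the sketch assumes this means a weighted sum of 20% midterm and 80% written exam, both expressed as percentages. It also assumes that each band starts at its listed lower bound, so a fractional result such as 39.5% counts as a fail. The names `final_percentage`, `grade_label`, and `GRADE_BANDS` are hypothetical and do not come from the course material.

    ```python
    # Lower bound (in percent) of each passing band, taken from the scale above.
    GRADE_BANDS = [
        (85, "Excellent"),
        (70, "Good"),
        (55, "Satisfactory"),
        (40, "Pass"),
    ]


    def final_percentage(midterm_pct: float, exam_pct: float,
                         midterm_weight: float = 0.20) -> float:
        """Combine midterm and exam scores (both 0-100) into one percentage.

        Assumption: the midterm's 20% contribution is a simple weighted sum.
        """
        return midterm_weight * midterm_pct + (1.0 - midterm_weight) * exam_pct


    def grade_label(pct: float) -> str:
        """Map a final percentage to the grade name from the point scale."""
        for lower_bound, label in GRADE_BANDS:
            if pct >= lower_bound:
                return label
        return "Fail"


    if __name__ == "__main__":
        # Example: 50% on the midterm and 90% on the written exam.
        total = final_percentage(50, 90)
        print(total, grade_label(total))
    ```

    Under these assumptions, a student with 50% on the midterm and 90% on the exam would end up at 82%, which falls in the "Good" band. The actual weighting and rounding rules are set by the instructor.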

    11. Retakes Students may retake the midterm test once during the semester. The midterm cannot be retaken again during the makeup week.
    12. Consultations In-person consultations are available on request.
    13. References, textbooks and resources

    1. Lecture notes and slides

    2. John C. Russ, The Image Processing Handbook, CRC Press, 2017, https://doi.org/10.1201/b18983

    3. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, https://www.deeplearningbook.org/

    14. Required learning hours and assignment
    Contact hours: 42
    Preparation for classes during the semester: 10
    Preparation for the midterm test: 45
    Homework: 0
    Study of assigned written material: 20
    Exam preparation: 33
    Total: 150
    15. Syllabus prepared by Dr. Szemenyei Márton, associate professor, IIT