
    Introduction to Python and Natural Language Technologies

    Name of the subject in Hungarian: Bevezetés a Python és nyelvtechnológia világába

    Last updated: June 5, 2020

    Course expiry date: July 31, 2023

    Budapest University of Technology and Economics
    Faculty of Electrical Engineering and Informatics
    Free elective
    Course ID   Semester   Assessment   Credit   Course semester
    VIAUAV35               2/0/2/V      4
    3. Course coordinator and department

    Dr. Gulyás Gábor György
    4. Instructors

    Name                      Position               Department
    Judit Ács                 junior lecturer        Department of Automation and Applied Informatics
    Ádám Kovács               PhD student            Department of Automation and Applied Informatics
    Kinga Gémes               PhD student            Department of Automation and Applied Informatics
    Dr. Gábor György Gulyás   associate researcher   Department of Automation and Applied Informatics


    5. Required knowledge

    Basics of Object-Oriented Programming, basics of software development


    6. Pre-requisites
    Recommended:

    None

    7. Objectives, learning outcomes and obtained knowledge

    The aim of this course is to give students an overview of the theory and practice of current natural language processing (NLP) technologies while they gain hands-on experience in a popular, high-level programming language. Students will not only become acquainted with all major fields of NLP but will also be expected to implement simple solutions for each level of language processing.


    8. Synopsis

     

    Week 1
    Lecture: Introduction. What is natural language processing, typical applications, history, major areas of NLP/CL, relationship to computer science and linguistics.
    Lab: Setting up: using the provided VMs, git repository, basic NLP tools.

    Week 2
    Lecture: Introduction to Python, basic syntax, built-in types, operators, functions, file manipulation.
    Lab: Using Jupyter. Writing simple functions. Handling text files.

    Week 3
    Lecture: Built-in types in detail (list, set, dict), immutability. Advanced string manipulation, encodings, regular expressions.
    Lab: Typical string manipulation exercises. Writing a simple parser with regular expressions.

    Week 4
    Lecture: Object-oriented Python, properties, static methods, class methods, magic methods, operator overloading. Iterators, generators. Context managers.
    Lab: Complex OOP exercise. Writing our own iterator and context manager.

    Week 5
    Lecture: Decorators. Functional programming in Python. Writing command line applications in Python. Using Linux command line applications for text processing. IO redirection, pipelines.
    Lab: Writing a simple command line application. Using it in the terminal, interacting with built-in Linux commands via pipes.

    Week 6
    Lecture: Scientific Python. NumPy, SciPy. Basic matrix operations. Sparse matrices.
    Lab: Matrix manipulation. Working with large sparse matrices.

    Week 7
    Lecture: Data science. Handling text data. Basic shell commands. Pandas.
    Lab: Handling text data exercises. Data cleaning. Pandas exercises.

    Week 8
    Lecture: Deep learning for NLP. Feed-forward neural networks, recurrent neural networks. LSTM, GRU.
    Lab: PyTorch basics. Defining neural networks, training and evaluation loops.

    Week 9
    Lecture: Textual sequence modeling: sequence labeling and classification, sequence-to-sequence models. Attention.
    Lab: Sequence modeling in PyTorch.

    Week 10
    Lecture: Language modeling. Word vectors. Contextualized language models. Transformers. BERT.
    Lab: Exploring pretrained models: word2vec, GloVe, fastText, BERT, ELMo.

    Week 11
    Lecture: Dependency parsing. Universal Dependencies.
    Lab: Working with Universal Dependencies. Multilingual models and problems.

    Week 12
    Lecture: NLP applications I: neural machine translation, sentiment analysis, question answering, summarization, dialogue.
    Lab: NLP in practice. Using pretrained models for machine translation, question answering, etc.

    Week 13
    Lecture: NLP applications II: knowledge bases.
    Lab: NLP in practice II. Homework consultation.

    Week 14
    Lecture: Practical session.
    Lab: Practical session.

    9. Method of instruction

    Lecture (2x45 min. / week) and Laboratory (2x45 min. / week)


    10. Assessment

     

    A. 3 homework assignments during the term

    B. Oral exam

    To pass, students must receive a passing grade (>=2) on each of the three homework assignments and on the oral exam. The exam grade and the average of the three homework grades each count for 50% of the final grade.
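
The grading rule above can be sketched as a short calculation. This is only an illustration: the function name `final_grade` is hypothetical, and the result is left unrounded because the syllabus does not state how the final grade is rounded.

```python
from statistics import mean

PASSING_GRADE = 2  # minimum passing grade stated in the assessment rules


def final_grade(homework_grades, exam_grade):
    """Return the weighted final grade, or None if any component is failed.

    Assumption: the result is not rounded, since the syllabus does not
    specify a rounding rule.
    """
    if len(homework_grades) != 3:
        raise ValueError("exactly three homework grades are expected")
    components = list(homework_grades) + [exam_grade]
    if any(grade < PASSING_GRADE for grade in components):
        return None  # at least one component was not passed
    # The homework average and the exam grade each carry 50% of the weight.
    return 0.5 * mean(homework_grades) + 0.5 * exam_grade


print(final_grade([4, 5, 3], 5))  # 4.5
print(final_grade([5, 1, 5], 5))  # None
```

A failed homework assignment blocks a pass even when the weighted average would otherwise be high, which is why the second call returns `None`.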


     

    11. Retakes and late submissions

    Homework assignments may be submitted late until the repeat period, in accordance with the Code of Studies and Exams.


    12. Consultations

    Arranged on demand with instructors.

    13. References, textbooks and resources

     

    - Jurafsky, D., & Martin, J. H. (2014). Speech and Language Processing. Pearson.

    - Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.


     

    14. Required learning hours and assignment

    Contact class                                  56
    Preparation for classes during the semester     4
    Preparation for the midterm exam                0
    Preparing homework                             30
    Preparation for the final exam                 30
    Total                                         120

    15. Syllabus prepared by

    Name                      Position               Department
    Judit Ács                 junior lecturer        Department of Automation and Applied Informatics
    Ádám Kovács               PhD student            Department of Automation and Applied Informatics
    Kinga Gémes               PhD student            Department of Automation and Applied Informatics
    Dr. Gábor György Gulyás   associate researcher   Department of Automation and Applied Informatics