Course Overview
This course focuses on the many ways human language can be represented as computational systems and on how to exploit those representations to build programs for translation, summarization, information extraction, question answering, natural language interfaces to databases, and conversational agents. It draws on concepts central to Machine Learning (discrete classification, probability models) and to Linguistics (morphology, syntax, semantics). Students will learn computational treatments of words, sounds, sentences, meanings, and conversations, and will see how probabilities and real-world text data can help in building such systems. The course also covers high-level formalisms (e.g., regular expressions) and tools (e.g., Python) that can greatly simplify prototype implementation. Finally, students will learn techniques for addressing the social impact of natural language processing, such as demographic bias, exclusion, and overgeneralization.