28–29 Jan 2026
Instituto Superior Técnico - Campus Alameda
Europe/Lisbon timezone

Noise isolation and voice isolation for crowds and festivals

29 Jan 2026, 18:15
15m
Departamento de Matemática - PA1 (Instituto Superior Técnico - Campus Alameda)

Av. Rovisco Pais 1, 1049-001 Lisboa
Workshop 2025/2026

Description

Monaural speech enhancement is the task of improving the quality and intelligibility of speech recorded in noisy, reverberant environments using only a single microphone. Traditional signal-processing methods rely on statistical assumptions about how speech and noise behave, so they struggle in real-world conditions where the noise changes over time or the signal-to-noise ratio is low. More recently, deep learning methods have outperformed these approaches by learning rich spectral representations directly from data, and they perform particularly well in the presence of nonstationary noise.

This report focuses on frequency-domain speech enhancement methods, with particular emphasis on deep learning-based techniques. It covers the main building blocks of these systems: feature extraction, network architectures, training targets, and loss functions. Special attention is given to complex-valued models, especially convolutional recurrent networks, which currently achieve state-of-the-art results. The report lays the groundwork for a project exploring state-of-the-art deep learning techniques for speech enhancement and their potential applications.
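As a rough illustration of the frequency-domain building blocks listed above, the following sketch computes STFT features and two common mask-style training targets on a synthetic signal. It is not taken from the report: the function names, the tone-plus-white-noise test signal, and the frame parameters are all assumptions chosen for illustration, and the oracle masks stand in for what a trained network would have to predict.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Feature extraction: STFT with a periodic Hann window, shape (frames, bins)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def ideal_ratio_mask(clean_spec, noise_spec, eps=1e-8):
    """Real-valued training target in [0, 1]; it leaves the noisy phase untouched."""
    s = np.abs(clean_spec) ** 2
    n = np.abs(noise_spec) ** 2
    return np.sqrt(s / (s + n + eps))

def complex_ratio_mask(clean_spec, noisy_spec, eps=1e-8):
    """Complex-valued training target M with M * noisy approximately equal to clean."""
    return clean_spec * np.conj(noisy_spec) / (np.abs(noisy_spec) ** 2 + eps)

def snr_db(reference, estimate):
    """Signal-to-noise ratio of an estimate against a reference, in dB."""
    err = reference - estimate
    return 10 * np.log10(np.sum(np.abs(reference) ** 2) / np.sum(np.abs(err) ** 2))

# Hypothetical test signal: a 440 Hz tone in white noise, 1 s at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
clean = 0.5 * np.sin(2 * np.pi * 440 * t)
noise = 0.3 * rng.standard_normal(sr)

clean_spec = stft(clean)
noise_spec = stft(noise)
noisy_spec = clean_spec + noise_spec  # the STFT is linear

irm = ideal_ratio_mask(clean_spec, noise_spec)
crm = complex_ratio_mask(clean_spec, noisy_spec)

snr_noisy = snr_db(clean_spec, noisy_spec)
snr_irm = snr_db(clean_spec, irm * noisy_spec)
snr_crm = snr_db(clean_spec, crm * noisy_spec)
print(f"noisy: {snr_noisy:.1f} dB, IRM: {snr_irm:.1f} dB, CRM: {snr_crm:.1f} dB")
```

The magnitude mask only rescales each time-frequency bin, so its estimate keeps the noisy phase, whereas the complex mask can also correct phase. This is one common motivation for complex-valued models, although a real system has to estimate the mask from the noisy input alone.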

Field of Research/Work

Beyond Physics

Author

André Feliciano (Instituto Superior Técnico - University of Lisbon)
