Abstract
Speech enhancement methods have become effective at estimating a clean magnitude spectrum from a noisy speech signal. However, they are much less effective at recovering the noise-free phase. At higher signal-to-noise ratios (SNRs) this is unimportant, but at lower SNRs the noisy phase introduces a perceptible distortion to the enhanced speech that reduces quality and intelligibility. Complex masking methods have addressed this problem to some extent, but they report underestimation of the imaginary mask, which in turn limits the possible phase correction. This work first analyses the problem of imaginary mask estimation and further examines its effect on both phase and magnitude masking. Second, a CNN-DNN architecture is proposed for complex mask estimation that uses a new loss function designed to give errors in the imaginary mask component a greater contribution in model training. Experimental results are presented that consider variations to the loss function and demonstrate that improved speech quality and intelligibility can be achieved.
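As a rough illustration of the idea described in the abstract, the sketch below computes a complex ratio mask for individual time-frequency bins and a mean squared error in which the imaginary component is weighted more heavily. This is not the loss function proposed in the paper, whose exact form is not given here. The names `complex_ratio_mask`, `weighted_mask_loss` and `imag_weight`, and the default weight of 2.0, are assumptions made for illustration only.

```python
def complex_ratio_mask(clean, noisy, eps=1e-12):
    """Complex ratio mask M = S / Y for one time-frequency bin.

    clean (S) and noisy (Y) are Python complex numbers; eps is a small
    constant (an assumption of this sketch) that avoids division by zero.
    """
    denom = noisy.real ** 2 + noisy.imag ** 2 + eps
    m_real = (noisy.real * clean.real + noisy.imag * clean.imag) / denom
    m_imag = (noisy.real * clean.imag - noisy.imag * clean.real) / denom
    return complex(m_real, m_imag)


def weighted_mask_loss(estimated, target, imag_weight=2.0):
    """Mean squared error over mask bins, with the imaginary-part error
    scaled by imag_weight (hypothetical parameter, not from the paper)."""
    n = len(target)
    real_err = sum((e.real - t.real) ** 2 for e, t in zip(estimated, target)) / n
    imag_err = sum((e.imag - t.imag) ** 2 for e, t in zip(estimated, target)) / n
    return real_err + imag_weight * imag_err


# Toy spectra: two bins of clean and noisy speech (made-up values).
clean_bins = [1 + 2j, 0.5 - 1j]
noisy_bins = [2 + 1j, 1 + 1j]
target_mask = [complex_ratio_mask(s, y) for s, y in zip(clean_bins, noisy_bins)]

# An estimate that gets the real part right but predicts zero imaginary mask,
# mimicking the underestimation the abstract mentions.
underestimated = [complex(m.real, 0.0) for m in target_mask]

loss_equal = weighted_mask_loss(underestimated, target_mask, imag_weight=1.0)
loss_weighted = weighted_mask_loss(underestimated, target_mask, imag_weight=2.0)
```

With `imag_weight` above 1, an estimate that collapses the imaginary mask towards zero is penalised more than it would be under a plain mean squared error, which is the general effect the abstract attributes to its loss function.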
| Original language | English |
|---|---|
| Pages | 131-135 |
| Number of pages | 5 |
| Publication status | Published - 2023 |
| Event | 31st European Signal Processing Conference (EUSIPCO), 4 Sept 2023 → 8 Sept 2023 |
Keywords
- Speech enhancement
- Complex masking
- Loss functions