Citation
S. S. Kajarekar and A. Stolcke, “NAP and WCCN: Comparison of Approaches using MLLR-SVM Speaker Verification System,” 2007 IEEE International Conference on Acoustics, Speech and Signal Processing – ICASSP ’07, 2007, pp. IV-249-IV-252, doi: 10.1109/ICASSP.2007.366896.
Abstract
We compare two recently proposed techniques, within-class covariance normalization (WCCN) [1] and nuisance attribute projection (NAP) [2], for intersession variability compensation in speaker verification. The comparison is performed using an MLLR-SVM speaker verification system. Both techniques model intersession variability using a within-speaker covariance matrix (WSCM), but they manipulate the eigenvectors of this matrix differently. We compare them on the 2005 and 2006 NIST speaker recognition evaluation (SRE) tasks. Results show that WCCN is more sensitive to the choice of background speakers, whereas NAP is more sensitive to the choice of data for WSCM estimation. WCCN gives the best performance on the 2005 SRE. On the 2006 SRE, both techniques give similar performance under matched conditions. Further experiments with a simple combination of these techniques show slight improvements over the best performance of either technique alone. Overall results show that an MLLR-SVM system with either NAP or WCCN performs comparably to the best single systems in the 2006 NIST SRE.
Index Terms: Speaker recognition, Intersession variability, MLLR transforms, SVM
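The abstract notes that NAP and WCCN both start from the within-speaker covariance matrix but treat its eigenvectors differently. The following is a minimal NumPy sketch of that contrast under commonly used textbook formulations, not necessarily the authors' exact implementation: NAP is taken to project out the top eigen-directions of the WSCM, and WCCN is taken to rescale every eigen-direction by the inverse square root of its eigenvalue. The function names, the `smoothing` parameter, and the toy data are all assumptions introduced for illustration.

```python
import numpy as np

def within_speaker_covariance(features, speaker_ids):
    """Pooled WSCM: covariance of features after removing each speaker's mean."""
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    centered = np.empty_like(features)
    for spk in np.unique(speaker_ids):
        mask = speaker_ids == spk
        centered[mask] = features[mask] - features[mask].mean(axis=0)
    return centered.T @ centered / len(features)

def nap_projection(wscm, num_directions):
    """NAP-style projection P = I - U U^T, where the columns of U are the
    eigenvectors of the WSCM with the largest eigenvalues."""
    _, eigvecs = np.linalg.eigh(wscm)      # eigenvalues in ascending order
    top = eigvecs[:, -num_directions:]     # highest-variance nuisance directions
    return np.eye(wscm.shape[0]) - top @ top.T

def wccn_transform(wscm, smoothing=1e-3):
    """WCCN-style transform B with B B^T = inverse of the (smoothed) WSCM.
    `smoothing` is a hypothetical regularizer that keeps eigenvalues positive."""
    dim = wscm.shape[0]
    smoothed = (1.0 - smoothing) * wscm + smoothing * np.eye(dim)
    eigvals, eigvecs = np.linalg.eigh(smoothed)
    return eigvecs / np.sqrt(eigvals)      # scales column i by 1/sqrt(eigvals[i])

# Toy data (an assumption): 5 speakers, 20 sessions each, 4-dimensional features.
rng = np.random.default_rng(0)
speaker_ids = np.repeat(np.arange(5), 20)
features = rng.normal(size=(100, 4)) * np.array([3.0, 1.0, 0.5, 0.2])

wscm = within_speaker_covariance(features, speaker_ids)
nap_features = features @ nap_projection(wscm, num_directions=1)
wccn_features = features @ wccn_transform(wscm)
```

In this sketch NAP discards the selected directions entirely and leaves the rest untouched, while WCCN keeps every direction but down-weights those with high within-speaker variance.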