Feature Space Mahalanobis Sequence Kernels: Application to SVM Speaker Verification

Bibliographic Details
Title: Feature Space Mahalanobis Sequence Kernels: Application to SVM Speaker Verification
Authors: Jérôme Louradour, K. Daoudi, Francis Bach
Contributors: Geometry and Statistics in acquisition data (GeoStat), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)
Source: IEEE Transactions on Audio, Speech and Language Processing
IEEE Transactions on Audio, Speech and Language Processing, 2007, 15 (8), pp. 2465-2475
Publication Information: Institute of Electrical and Electronics Engineers (IEEE), 2007.
Publication Year: 2007
Subject Terms: Mahalanobis distance, Acoustics and Ultrasonics, business.industry, Pattern recognition, Speaker recognition, Linear discriminant analysis, Support vector machine, symbols.namesake, Kernel method, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, symbols, Degree of a polynomial, Artificial intelligence, Electrical and Electronic Engineering, business, [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing, Gaussian process, ComputingMilieux_MISCELLANEOUS, Kernel (category theory), Mathematics
Description: The generalized linear discriminant sequence (GLDS) kernel has been shown to provide very good performance and efficiency at the NIST Speaker Recognition Evaluations (SRE) in the last few years. This kernel is based on an explicit map of polynomial expansions of input frames which, because of practical limitations, have to be of a degree less than or equal to three. In this paper, we consider an extension of the GLDS kernel to allow not only any polynomial degree but also any embedding, including infinite-dimensional ones associated with Mercer kernels (such as Gaussian kernels). It turns out that the resulting kernels belong to the family of posterior covariance kernels. However, their exact "kernelized" form involves the computation of the Gram matrix on background data, and may be intractable when the background corpus is very large (which is the case in speaker verification). To overcome this problem, we use a low-rank approximation of the Gram matrix to provide an approximate but tractable form of these kernels. We then present comparative experiments on NIST SRE 2005. The results show that our sequence kernel outperforms the GLDS one, and gives similar (individual) performance to the traditional universal background model-Gaussian mixture model (UBM-GMM) system. As expected, the fusion of both improves the scores.
ISSN: 1558-7916
DOI: 10.1109/tasl.2007.905147
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::36271bbc8a0173890a0f009e212659cf
https://doi.org/10.1109/tasl.2007.905147
Rights: CLOSED
Accession Number: edsair.doi.dedup.....36271bbc8a0173890a0f009e212659cf
Database: OpenAIRE
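
Illustrative note (not part of the catalog record): the description says the GLDS kernel relies on an explicit polynomial expansion of input frames. The sketch below shows one common way such a kernel is written: each sequence is reduced to the mean of its expanded frames, and two sequences are compared through the inverse second-moment matrix of expanded background frames. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names `poly_expand` and `glds_style_kernel`, the `ridge` regularizer, and the toy data are all hypothetical, and the paper's feature-space extension and low-rank Gram approximation are not reproduced here.

```python
import numpy as np
from itertools import combinations_with_replacement


def poly_expand(frames, degree=2):
    """Map each frame (row) to all monomials of its entries up to `degree`."""
    n, d = frames.shape
    cols = [np.ones(n)]  # constant (degree-0) term
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            # Product of the selected coordinates, e.g. x0*x1 or x2*x2
            cols.append(np.prod(frames[:, list(idx)], axis=1))
    return np.column_stack(cols)


def glds_style_kernel(X, Y, background, degree=2, ridge=1e-6):
    """Compare two frame sequences X and Y (assumed GLDS-style form).

    K(X, Y) = mean_phi(X)^T R^{-1} mean_phi(Y), where R is the second-moment
    matrix of the expanded background frames. `ridge` is a hypothetical
    regularizer added here only for numerical stability.
    """
    phi_bg = poly_expand(background, degree)
    R = phi_bg.T @ phi_bg / phi_bg.shape[0]
    R += ridge * np.eye(R.shape[0])
    bx = poly_expand(X, degree).mean(axis=0)
    by = poly_expand(Y, degree).mean(axis=0)
    return float(bx @ np.linalg.solve(R, by))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    background = rng.normal(size=(500, 3))  # toy stand-in for background frames
    X = rng.normal(size=(40, 3))            # toy "utterance" of 40 frames
    Y = rng.normal(size=(60, 3))
    print(glds_style_kernel(X, Y, background))
```

The explicit expansion grows combinatorially with the frame dimension and the degree, which is consistent with the practical degree limit mentioned in the description.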