System and method for multi-lingual speech recognition

Bibliographic Details
Title: System and method for multi-lingual speech recognition
Patent Number: 7,761,297
Publication Date: July 20, 2010
Appl. No: 10/779764
Application Filed: February 18, 2004
Abstract: A system for multi-lingual speech recognition. The system includes a speech modeling engine, a speech search engine, and a decision reaction engine. The speech modeling engine receives a mixed multi-lingual speech signal and converts it into speech features. The speech search engine locates and compares candidate data sets. The decision reaction engine selects resulting speech models from the candidate speech models and generates a speech command.
Inventors: Lee, Yun-Wen (Taipei, TW)
Assignees: Delta Electronics, Inc. (Taoyuan Sien, TW)
Claim: 1. A system for multi-lingual speech recognition, comprising: a digital signal processing unit; a speech modeling system, receiving and transferring a mixed multi-lingual speech signal into a plurality of speech features; a multi-lingual baseform mapping engine, comparing a plurality of multi-lingual query commands to obtain a plurality of multi-lingual baseforms; a cross-lingual diphone model generation engine executed by the digital signal processing unit, coupled to the multi-lingual baseform mapping engine, selecting and combining the multi-lingual baseforms, further comprising: fixing left contexts of the multi-lingual baseforms and mapping right contexts of the multi-lingual baseforms to obtain a mapping result; fixing right context and mapping the left contexts of the multi-lingual baseforms to obtain the mapping result if the contexts of the multi-lingual baseforms mapping fails; and obtaining the multi-lingual context-speech mapping data according to the mapping result; storing the multi-lingual context-speech mapping data in a multi-lingual model database; a speech search engine, coupled to the speech modeling engine, receiving the speech features, and locating and comparing a plurality of candidate data sets corresponding to the speech features according to the multi-lingual model database to find match probability of a plurality of candidate speech models of the candidate data sets; and a decision reaction engine, coupled to the speech search engine, selecting a plurality of resulting speech models corresponding to the speech features according to the match probability from the candidate speech models to generate a speech command.
Claim: 2. The system as claimed in claim 1, wherein the multi-lingual model database comprises a plurality of multi-lingual anti-models.
Claim: 3. The system as claimed in claim 2, further comprising: at least one uni-lingual anti-model generation engine, receiving a plurality of multi-lingual query commands to generate a plurality of uni-lingual anti-models corresponding to specific languages; and an anti-model combination engine, coupled to the uni-lingual anti-model generation engine, calculating the uni-lingual anti-models to generate the multi-lingual anti-models.
Claim: 4. The system as claimed in claim 1, wherein the speech search engine locates and compares the candidate data sets, further referring the connecting sequences of the speech features and a speech rule database.
Claim: 5. A method for multi-lingual speech recognition, comprising the steps of: performing the following steps by a digital signal processing system; transferring a mixed multi-lingual speech signal into a plurality of speech features; comparing a plurality of multi-lingual query commands to obtain a plurality of multi-lingual baseforms; selecting and combining the multi-lingual baseforms, comprising: fixing left contexts of the multi-lingual baseforms and mapping right contexts of the multi-lingual baseforms to obtain a mapping result; fixing right context and mapping the left contexts of the multi-lingual baseforms to obtain the mapping result if the right contexts of the multi-lingual baseforms mapping fails; and obtaining the multi-lingual context-speech mapping data according to the mapping result; storing the multi-lingual context-speech mapping data in a multi-lingual model database; locating and comparing a plurality of candidate data sets corresponding to the speech features according to the multi-lingual model database to find match probability of a plurality of candidate speech models of the candidate data sets; and selecting a plurality of resulting speech models corresponding to the speech features from the candidate speech models according to the match probability to generate a speech command.
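Claims 1 and 5 both recite the same context-mapping fallback for building cross-lingual diphone models: first fix the left context and remap the right context to a similar sound; only if that fails, fix the right context and remap the left. A minimal, hypothetical Python sketch of that fallback order follows; the mapping table, the similarity sets, and all phone symbols are illustrative assumptions, not taken from the patent.

```python
from typing import Optional, Tuple

# Toy multi-lingual model database: (left, right) context pairs for which
# a trained diphone model is assumed to exist (illustrative only).
KNOWN_DIPHONES = {
    ("k", "a"), ("a", "t"), ("t", "a"), ("l", "a"),
}

# Contexts treated as acoustically similar across languages (assumption).
SIMILAR = {
    "ae": ["a"],   # e.g. an English /ae/ falling back to a shared /a/ model
    "r":  ["l"],
}

def map_diphone(left: str, right: str) -> Optional[Tuple[str, str]]:
    """Return a usable (left, right) diphone per the claimed fallback order:
    try the pair as-is; then fix the left context and remap the right;
    then fix the right context and remap the left; otherwise fail."""
    if (left, right) in KNOWN_DIPHONES:
        return (left, right)
    # 1) Fix the left context, map the right context to similar sounds.
    for r in SIMILAR.get(right, []):
        if (left, r) in KNOWN_DIPHONES:
            return (left, r)
    # 2) Fallback: fix the right context, map the left context instead.
    for l in SIMILAR.get(left, []):
        if (l, right) in KNOWN_DIPHONES:
            return (l, right)
    return None  # no multi-lingual context-speech mapping obtained
```

The result of each successful mapping would populate the multi-lingual context-speech mapping data stored in the multi-lingual model database.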
Claim: 6. The method as claimed in claim 5, wherein the multi-lingual model database comprises a plurality of multi-lingual anti-models.
Claim: 7. The method as claimed in claim 6, further comprising the steps of: receiving a plurality of multi-lingual query commands corresponding to specific languages and generate a plurality of uni-lingual anti-models; and combining the uni-lingual anti-models to generate the multi-lingual anti-model.
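Claims 6 and 7 combine per-language anti-models into a multi-lingual anti-model but do not specify the combination rule. One simple pooling scheme, sketched in Python under the assumption that an anti-model can be represented as a function scoring a feature vector (an interface invented here for illustration):

```python
from typing import Callable, List, Sequence

# Assumed interface: an anti-model maps a feature vector to a score
# (e.g. a log-likelihood). This representation is illustrative only.
AntiModel = Callable[[Sequence[float]], float]

def combine_anti_models(models: List[AntiModel]) -> AntiModel:
    """Pool uni-lingual anti-models into one multi-lingual anti-model by
    taking the best (maximum) score over languages, one plausible choice."""
    def multi_lingual(features: Sequence[float]) -> float:
        return max(m(features) for m in models)
    return multi_lingual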
Claim: 8. The method as claimed in claim 5, wherein locating and comparison of the candidate data sets further refers the connecting sequences of the speech features and a speech rule database.
Current U.S. Class: 704/252
Patent References Cited: 4882759 November 1989 Bahl et al.
5835888 November 1998 Kanevsky et al.
5897617 April 1999 Collier
6076056 June 2000 Huang et al.
6085160 July 2000 D'hoore et al.
6912499 June 2005 Sabourin et al.
6928404 August 2005 Gopalakrishnan et al.
6999925 February 2006 Fischer et al.
7149688 December 2006 Schalkwyk
7295979 November 2007 Neti et al.
2002/0035469 March 2002 Holzapfel
2002/0040296 April 2002 Kienappel
2004/0088163 May 2004 Schalkwyk
Other References: Waibel, “Interactive Translation of Conversational Speech”, IEEE Computer, Jul. 1996, vol. 29, issue 7, pp. 41-48. cited by examiner
L. Manzara and D.R. Hill, “DEGAS: a System for Rule Based Diphone Speech Synthesis.” ICSLP 92, Banff, Oct. 12-16, 1992. cited by examiner
Black and Lenzo, "Building Voices in the Festival Speech Synthesis System," http://festvox.org, 2000. cited by examiner
J. Kohler, “Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds,” in Proc. ICASSP, pp. 2195-2198, 1996. cited by examiner
Boeffard, O., Miclet, S. & White, S. (1992). Automatic generation of optimized unit dictionaries for text-to-speech synthesis. Proceedings of the International Conference on Speech and Language Processing '92, Banff, pp. 1211-1214. cited by examiner
T. Holter and T. Svendsen, “Incorporation of linguistic knowledge and automatic baseform generation in acoustic subword unit based speech recognition,” in Proc. Eur. Conf. Speech Communication Technology (EUROSPEECH), 1997, pp. 1159-1162. cited by examiner
T. Holter and T. Svendsen, “Combined optimization of baseforms and model parameters in speech recognition based on acoustic subword units,” in Proc. IEEE Workshop Automatic Speech Recognition, 1997, pp. 199-206. cited by examiner
Schwartz, R., Chow, Y., Kimball, O., Roucos, S., Krasner, M., and Makhoul, J., "Context-Dependent Modeling for Acoustic-Phonetic Recognition of Continuous Speech". in: IEEE International Conference on Acoustics, Speech, and Signal Processing 1985, pp. 1205-1208. cited by examiner
Glass, J., J. Chang, and M. McCandless. (1996) "A Probabilistic Framework for Feature-based Speech Recognition," Proc. ICSLP '96, Philadelphia, PA, pp. 2277-2280. cited by examiner
Bonaventura, P., Gallocchio, F., Micca, G., 1997. Multilingual speech recognition for flexible vocabularies. In: Proceedings Eurospeech'97, pp. 355-358. cited by examiner
J. J-X Wu et al, “Modeling context-dependent phonetic units in a continuous speech recognition system for Mandarin Chinese”, Proc. ICSLP96, pp. 2281-2284, Philadelphia, 1996. cited by examiner
L. F. Lamel and J. -L. Gauvain, “Cross-lingual experiments with phone recognition,” in Proc. IEEE ICASSP-93 (Minneapolis, MN), Apr. 1993, pp. 507-510, vol. 2. cited by examiner
Altosaar et al. (1989) "A Knowledge-based Approach to Unlimited Vocabulary Speech Recognition for the Finnish Language," Proc. Eurospeech-89, pp. 613-616. cited by examiner
Assistant Examiner: Borsetti, Greg A
Primary Examiner: Smits, Talivaldis I
Attorney, Agent or Firm: Birch, Stewart, Kolasch & Birch, LLP
Accession Number: edspgr.07761297
Database: USPTO Patent Grants