Vocabulary partitioned speech recognition apparatus

Bibliographic Details
Title: Vocabulary partitioned speech recognition apparatus
Patent Number: 5,136,654
Publication Date: August 04, 1992
Appl. No: 07/424,139
Application Filed: October 19, 1989
Abstract: The speech recognition system disclosed herein operates to select, from a collection of tokens which represent vocabulary words, those tokens which most closely match an unknown spoken word. The collection of tokens is divided into partitions, each of which is characterized or identified by a representative one of the tokens. Both the tokens and the unknown speech word are represented by a sequence of standard data frames which may, for example, define characteristic spectra. In operation, the system computes the distance from the unknown to each of the representative tokens and then, starting with the partition having the nearest representative token and proceeding through partitions represented by successively more distant tokens, examines the other tokens in that partition while keeping a list of predetermined length identifying the examined tokens which thus far provide the best match. This process is continued until the number of distance calculations performed reaches a preselected level.
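Illustrative sketch: the selection procedure summarized in the abstract might look roughly like the following Python. This is not the patented implementation. The names `frame_distance` and `select_best`, the sum-of-absolute-differences distance, the dictionary layout of a partition, and the decision to offer representatives to the best-match list are all assumptions made for the example.

```python
import heapq


def frame_distance(a, b):
    """Hypothetical distance: sum of absolute differences between frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_best(unknown, partitions, list_size, max_calcs):
    """Return up to list_size (distance, name) pairs, nearest first.

    partitions: list of {"rep": (name, frames), "others": [(name, frames), ...]}
    max_calcs:  budget on the total number of distance calculations
    """
    best = []   # heap of (-distance, name); the root is the worst match kept
    calcs = 0

    def offer(d, name):
        # Keep only the list_size nearest tokens seen so far.
        if len(best) < list_size:
            heapq.heappush(best, (-d, name))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, name))

    # Step 1: distance from the unknown to every representative.
    ranked = []
    for i, part in enumerate(partitions):
        name, frames = part["rep"]
        d = frame_distance(unknown, frames)
        calcs += 1
        offer(d, name)  # assumption: representatives are candidates too
        ranked.append((d, i))
    ranked.sort()

    # Step 2: walk partitions from nearest to farthest representative,
    # stopping once the calculation budget is used up.
    for _, i in ranked:
        for name, frames in partitions[i]["others"]:
            if calcs >= max_calcs:
                return sorted((-nd, n) for nd, n in best)
            d = frame_distance(unknown, frames)
            calcs += 1
            offer(d, name)
    return sorted((-nd, n) for nd, n in best)


example_partitions = [
    {"rep": ("alpha", [1, 2, 4]),
     "others": [("beta", [1, 2, 3]), ("gamma", [2, 2, 4])]},
    {"rep": ("delta", [9, 9, 9]),
     "others": [("epsilon", [1, 2, 3])]},
]
```

With a budget of four calculations, the distant partition is never examined, so `epsilon` is missed even though it matches exactly. This is the speed versus accuracy trade-off that the preselected calculation limit controls.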
Inventors: Ganong, III, William F. (Brookline, MA); Bauer, William F. (Belmont, MA); Sevush, Daniel (Hopkinton, MA); Rosnow, Harley M. (Cambridge, MA)
Assignees: Kurzweil Applied Intelligence, Inc. (Waltham, MA)
Claim: What is claimed is
Claim: 1. In a large vocabulary speech recognition system, a method of partitioning a collection of data tokens which represent the system's entire vocabulary and of identifying tokens to be representative of each partition to facilitate the subsequent selection of tokens which most closely match a spoken word which is to be recognized, said method comprising
Claim: examining said tokens sequentially and, for each token, determining if said token matches any previously identified representative within a preselected tolerance and, if it does not, identifying said token as a new representative; and
Claim: assigning each token which is not identified as a representative token to a partition corresponding to a representative token which is within said preselected tolerance irrespective of whether the non-representative token corresponds to a different vocabulary word than the representative token thereby to form partitions encompassing different vocabulary words.
Claim: 2. The method according to claim 1 further comprising the step of determining the number of partitions created and, if the number of partitions created exceeds a selected value, increasing said preselected tolerance and repeating the procedures of claim 1.
Claim: 3. The method according to claim 1 further comprising the step of examining all partitions created and, for partitions containing fewer than a preselected number of tokens, deleting those partitions and assigning the tokens which were in those deleted partitions to a common partition.
Claim: 4. The method according to claim 1 further comprising the step of examining all partitions created and, for partitions containing no tokens other than a representative token, deleting those partitions and assigning the tokens which were in those deleted partitions to a common partition.
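Illustrative sketch: the partitioning steps recited in claims 1 to 4 could be read roughly as the Python below. It is a sketch under stated assumptions, not the patentee's code. The names `leader_pass` and `build_partitions`, the distance function, the fixed tolerance increment `step`, the "first representative within tolerance" assignment rule, and the treatment of "within tolerance" as less than or equal are all hypothetical choices.

```python
def frame_distance(a, b):
    """Hypothetical distance: sum of absolute differences between frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def leader_pass(tokens, tolerance):
    """One sequential pass. Each partition is a list whose first item is
    its representative token."""
    parts = []
    for name, frames in tokens:
        for part in parts:
            rep_frames = part[0][1]
            if frame_distance(frames, rep_frames) <= tolerance:
                part.append((name, frames))   # joins an existing partition
                break
        else:
            parts.append([(name, frames)])    # becomes a new representative
    return parts


def build_partitions(tokens, tolerance, max_partitions, step=1, min_size=2):
    """Return (kept_partitions, common_partition, final_tolerance)."""
    if max_partitions < 1 or step <= 0:
        raise ValueError("max_partitions must be >= 1 and step must be > 0")
    # Claim 2: widen the tolerance and repeat while there are too many partitions.
    while True:
        parts = leader_pass(tokens, tolerance)
        if len(parts) <= max_partitions:
            break
        tolerance += step
    # Claims 3 and 4: partitions below min_size are dissolved into a common
    # partition (min_size=2 dissolves representative-only partitions).
    kept, common = [], []
    for part in parts:
        if len(part) < min_size:
            common.extend(part)
        else:
            kept.append(part)
    return kept, common, tolerance


example_tokens = [
    ("a", [0, 0]), ("b", [1, 0]),
    ("c", [10, 10]), ("d", [10, 11]),
    ("e", [50, 50]),
]
```

Because the pass is sequential, the resulting partitions depend on the order in which tokens are examined, and tokens for different vocabulary words can share a partition, as the claims allow.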
Claim: 5. In a large vocabulary speech recognition system, a method of selecting those data tokens which most closely match an unknown spoken word from a collection of tokens which are assigned to a plurality of partitions, the various partitions being characterized by respective representative tokens, all tokens and the unknown spoken word being characterized by both a coarse sequence of vectors and a fine sequence of vectors, said method comprising
Claim: using only the respective coarse sequences of vectors determining a coarse vector distance between the unknown spoken word and each representative;
Claim: starting with the nearest and proceeding through successively more distant representatives as measured by the respective said coarse vector distances, examining the tokens within the respective partition and, using only the respective coarse sequences of vectors, determining a coarse vector distance between the unknown and each token within said respective partition while maintaining a list of preselected size identifying those examined tokens which are least distant from the unknown as measured by the respective said coarse vector distances, said examining being continued until the number of distance determinations reaches a preselected level; and
Claim: using the respective fine sequences of vectors, determining a fine vector distance between the unknown and each token identified in said list thereby to enable the tokens in said list to be ranked on the basis of the respective fine vector distances for output from the system.
Claim: 6. The method of claim 5 wherein said distance calculations are performed according to a numerical system providing a maximum possible distance and wherein the tokens comprising said list are identified by pointers contained in a table having entries corresponding in number to the value of the maximum possible distance between an unknown and any
Claim: 7. In a large vocabulary speech recognition system, a method of partitioning a collection of data tokens which represent the system's entire vocabulary and of subsequently selecting tokens which most closely match an unknown spoken word which is to be recognized, said method comprising
Claim: examining said tokens sequentially and, for each token, determining if said token matches any previously identified representative within a preselected tolerance and, if it does not, identifying said token as a new representative; assigning each token which is not identified as a representative token to a partition corresponding to a representative token which is within said preselected tolerance irrespective of whether the non-representative token corresponds to a different vocabulary word than the representative token thereby to form partitions encompassing different vocabulary words;
Claim: computing the distance between the unknown spoken word and each representative; and
Claim: starting with the nearest representative and proceeding through successively more distant representatives, computing the distance between the unknown spoken word and each token within the respective partition until the number of distance computations reaches a preselected value, thereby to identify a number of tokens to be output together with respective distance measurements enabling the identified tokens to be ranked as to closeness of match with said unknown spoken word.
Claim: 8. The method according to claim 7 further comprising the step of determining the number of partitions created and, if the number of partitions created exceeds a selected value, increasing said preselected tolerance and repeating the procedures of claim 7.
Claim: 9. The method according to claim 7 further comprising the step of examining all partitions created and, for partitions containing fewer than a preselected number of tokens, deleting those partitions and assigning the tokens which were in those deleted partitions to a common partition.
Claim: 10. The method of claim 7 further comprising maintaining a list of preselected size containing the tokens which are least distant from the unknown.
Claim: 11. In a large vocabulary speech recognition system, a method of partitioning a collection of data tokens which represent the system's entire vocabulary and subsequently selecting tokens which most closely match an unknown spoken word which is to be recognized, said method comprising
Claim: for each token in sequence, determining if said token matches any previously identified representative within a preselected tolerance and, if it does not, identifying said token as a new representative;
Claim: assigning each token which is not identified as a representative token to a partition corresponding to a representative token which is within said preselected tolerance irrespective of whether the non-representative token corresponds to a different vocabulary word than the respective token thereby to form partitions encompassing different vocabulary words;
Claim: combining all partitions which contain no tokens other than a representative token into a common partition;
Claim: computing the distance between the unknown spoken word and each token in the common partition;
Claim: computing the distance between the unknown and each representative; and
Claim: starting with the nearest representative as determined by the ranking of the respective computed distances and proceeding through successively more distant representative, computing the distance between the unknown spoken word and each token within the respective partition until the number of distance computations reaches a preselected value thereby to identify a number of tokens to be output together with respective distance measurements enabling the identified tokens to be ranked as to closeness of match with said unknown spoken word.
Claim: 12. In a large vocabulary speech recognition system, a method of selecting those data tokens which most closely match an unknown spoken word from a collection of tokens which are assigned to a plurality of partitions including a common partition and a group of other partitions which are characterized by respective representative tokens and which can encompass tokens corresponding to different vocabulary words, said method comprising
Claim: starting with the nearest representative as determined by the ranking of the respective computed distances and proceeding through successively more distant representative, computing the distance between the unknown spoken word and each token within the respective partition until the number of distance computations reaches a preselected value thereby to identify a number of tokens to be output together with respective distance measurements enabling the identified tokens to be ranked as to closeness of match with said unknown spoken word.
Claim: 13. The method of claim 12 further comprising maintaining a list of preselected size containing the tokens which are least distant from the unknown thereby to limit the number of distances which must be ranked.
Claim: 14. In a speech recognition system, a method of selecting from a collection of data tokens those tokens which most closely represent an unknown spoken word according to a numeric vector distance calculation which provides a predetermined number of distance values, said method comprising
Claim: initializing a table having ordered entries corresponding in number to the values of the possible distances between an unknown spoken word and any token, each entry being active when it contains a pointer to a linked list of tokens;
Claim: placing in the last entry in said table a pointer to a linked list of selected length;
Claim: computing the distance between the unknown spoken word and a succession of candidate tokens;
Claim: for each computed distance which is less than that corresponding to the lowest active entry in said table, deleting an element from the linked list corresponding to the lowest active entry and inserting a linked list element corresponding to the computed distance
Claim: whereby, after the succession of candidate tokens is scanned, the entries in the table will identify those candidate tokens which are closest to the unknown, the number of thereby identified candidate tokens being equal to said selected length.
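Illustrative sketch: the distance-indexed table of claim 14 can be approximated in Python as shown below. Several assumptions apply: distances are taken to be precomputed integers in the range 0 to `max_dist`, plain Python lists stand in for the linked lists named in the claim, `None` placeholders fill the initial list in the last entry, and "lowest active entry" is read as the active entry with the largest distance. The function name `closest_tokens` is hypothetical.

```python
def closest_tokens(candidates, max_dist, list_size):
    """Keep the list_size nearest candidates without sorting.

    candidates: iterable of (name, distance), distance an int in 0..max_dist
    Returns (distance, name) pairs ordered by increasing distance.
    """
    if list_size < 1:
        raise ValueError("list_size must be >= 1")
    # One table entry per possible distance value; an entry is "active"
    # when its list is non-empty.
    table = [[] for _ in range(max_dist + 1)]
    # The last entry starts with list_size placeholder elements.
    table[max_dist] = [None] * list_size
    worst = max_dist  # index of the active entry with the largest distance

    for name, d in candidates:
        if d < worst:
            table[worst].pop()        # drop one element from the worst entry
            table[d].append(name)     # record the new, closer candidate
            # Move down to the next active entry if the worst one emptied.
            while worst > 0 and not table[worst]:
                worst -= 1

    return [(d, name)
            for d, names in enumerate(table)
            for name in names
            if name is not None]


example_candidates = [("a", 7), ("b", 3), ("c", 9), ("d", 3), ("e", 5), ("f", 1)]
```

The total number of elements in the table stays equal to `list_size` throughout, so each candidate costs one comparison and at most one removal and one insertion, plus an occasional downward scan for the next active entry.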
Current U.S. Class: 381/41; 381/43
Current International Class: G10L 5/00
Patent References Cited: 4181821 January 1980 Pirz et al.
4601054 July 1986 Watari et al.
4715004 December 1987 Kabasawa et al.
4797929 January 1989 Gerson et al.
4799262 January 1989 Feldman et al.
4802224 January 1989 Shiraki et al.
4837831 June 1989 Gillick et al.
Primary Examiner: Shaw, Dale M.
Assistant Examiner: Doerrler, Michelle
Attorney, Agent or Firm: Pahl, Jr., Henry D.
Accession Number: edspgr.05136654
Database: USPTO Patent Grants