语音识别外文文献翻译

资源ID：131273 资源大小：66KB 全文页数：14页
资源格式： DOC 下载积分：100金币

快捷下载

账号登录下载

三方登录下载：

下载资源需要100金币

邮箱/手机：
温馨提示：	快捷下载时，用户名和密码都是您填写的邮箱或者手机号，方便查询和重复下载（系统自动生成）。如填写123，账号就是123，密码也是123。
支付方式：
验证码：	换一换

账号：
密码：
验证码：	换一换
当日自动登录忘记密码？

友情提示

1、下载资料失败解决办法

2、PDF文件下载后，可能会被浏览器默认打开，此种情况可以点击浏览器菜单，保存网页到桌面，就可以正常下载了。

3、本站不支持迅雷下载，请使用电脑自带的IE浏览器，或者360浏览器、谷歌浏览器下载即可。

4、本站资源下载后的文档和图纸-无水印,预览文档经过压缩，下载后原文更清晰。

语音识别外文文献翻译

1、Speech Recognition Victor Zue, Ron Cole, & Wayne Ward MIT Laboratory for Computer Science, Cambridge, Massachusetts, USA Oregon Graduate Institute of Science & Technology, Portland, Oregon, USA Carnegie Mellon University, Pittsburgh, Pennsylvania, USA 1 Defining the Problem Speech recognition is the

2、 process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as commands & control, data entry, and document preparation. They can also serve as the input to further linguistic processin

3、g in order to achieve speech understanding, a subject covered in section. Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in Figure. An isolated-word speech recognition system requires that the speaker pause briefly between words, whe

4、reas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies, and is much more difficult to recognize than speech read from script. Some systems require speaker enrollment-a user must provide samples of his or her speech before using

5、them, whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence

6、 of words, language models or artificial grammars are used to restrict the combination of words. The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language a

7、re specified in terms of a context-sensitive grammar. One popular measure of the difficulty of the task, combining the vocabulary size and the 1 language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applie

8、d (see section for a discussion of language modeling in general and perplexity in particular). Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.

9、 Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic vari

10、abilities are exemplified by the acoustic differences of the phoneme， At word boundaries, contextual variations can be quite dramatic-making gas shortage sound like gash shortage in American English, and devo andare sound like devandare in Italian. Second, acoustic variabilities can result from chan

11、ges in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speakers physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal t

12、ract size and shape can contribute to across-speaker variabilities. Figure shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, 2 typically once every 10-20 msec (see section

13、sand 11.3 for signal representation and digital signal processing, respectively). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determi

14、ne the values of the model parameters. Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the s

15、ignal, and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to

16、those of the current speaker during system use, (see section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts； this is called context dependent acoustic modeling. Word level variability can be handled

17、by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical langu

18、age models, based on estimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words. The dominant recognition paradigm in the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in

19、 which the generation of the underlying phoneme string and the frame-by-frame, surface acoustic realizations are both represented probabilistically as Markov processes, as discussed in sections,and 11.2. Neural networks have also been used to estimate the frame based scores； these scores are then in

20、tegrated into HMM-based system architectures, in what has come to be known as hybrid systems, as described in section 11.5. An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternate approach is to first id

21、entify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks. 2 State of the Art Comments about the state-of-the-art need to be made in the context of specific applications which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is

注意事项: 本文（语音识别外文文献翻译）为本站会员（泛舟）主动上传，毕设资料网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若此文所含内容侵犯了您的版权或隐私，请联系网站客服QQ：540560583，我们立即给予删除！