Audio Search for Spoken Documents Content-Based Retrieval
Abstract: The amount of digital audio and video documents being recorded and stored in large archives is growing faster than ever before. However, the lack of effective content-based search methods is a major barrier that prevents people from using audio and video databases pervasively and intelligently. This paper reviews the state of the art in audio information retrieval and presents an audio search solution based on statistical pattern matching techniques. The proposed two-tier speech retrieval solution, named Audio Search, consists of a fast search component followed by an ends-free Viterbi keyword spotter combined with a histogram-based phoneme duration model. Its performance has been evaluated on the 1997 HUB4 database of Chinese broadcast news. Compared with systems based on large-vocabulary automatic speech recognition, the Audio Search system shows significant improvements for both long queries (typically 5 to 7 syllables) and out-of-vocabulary (OOV) queries.
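To make the idea of a histogram-based phoneme duration model more concrete, here is a minimal Python sketch. The abstract does not say how the model is estimated or combined with the keyword spotter, so everything here is an assumption made for illustration: the class name `HistogramDurationModel`, the helper `rescore`, the add-one smoothing, the clipping of durations to `max_frames`, and the linear combination with a `weight` parameter are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


class HistogramDurationModel:
    """Per-phoneme duration histogram with add-one smoothing (illustrative only)."""

    def __init__(self, max_frames=50):
        # Assumption: durations are measured in frames and clipped to [1, max_frames].
        self.max_frames = max_frames
        self.counts = defaultdict(Counter)  # phoneme -> {duration bin: count}

    def _bin(self, frames):
        # Clip a duration into the range of histogram bins.
        return min(max(frames, 1), self.max_frames)

    def fit(self, segments):
        """Accumulate counts from (phoneme, duration_in_frames) pairs."""
        for phoneme, frames in segments:
            self.counts[phoneme][self._bin(frames)] += 1
        return self

    def log_prob(self, phoneme, frames):
        """Smoothed log-probability of a phoneme lasting `frames` frames."""
        hist = self.counts[phoneme]
        total = sum(hist.values())
        # Add-one smoothing over the max_frames bins keeps unseen durations finite.
        return math.log((hist[self._bin(frames)] + 1) / (total + self.max_frames))


def rescore(acoustic_log_score, alignment, model, weight=1.0):
    """Add weighted duration log-probabilities to a keyword hypothesis score.

    `alignment` is a list of (phoneme, duration_in_frames) pairs, such as a
    Viterbi alignment of the keyword against a candidate audio segment.
    """
    duration_score = sum(model.log_prob(p, d) for p, d in alignment)
    return acoustic_log_score + weight * duration_score
```

Under these assumptions, a keyword hypothesis whose phoneme durations resemble those seen in training data receives a higher combined score than one with implausible durations, which is one way a duration model could help reject false alarms from a keyword spotter.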