
Speech corpus

From Wikipedia, the free encyclopedia


A speech corpus (or spoken corpus) is a database of speech audio files and text transcriptions in a format that can be used to create acoustic models (which can then be used with a speech recognition engine).
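As a rough illustration of the pairing described above, the following minimal Python sketch represents a corpus as a list of records, each linking one audio file to its transcription. The names `Utterance` and `load_manifest`, the tab-separated manifest layout, and the file paths are assumptions made for this example only; actual corpora and speech recognition toolkits each define their own formats.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: an audio file paired with its text transcription."""
    audio_path: Path
    transcript: str


def load_manifest(lines):
    """Parse 'audio_path<TAB>transcript' lines into Utterance records.

    The tab-separated layout is a hypothetical format chosen for this sketch.
    Blank lines are skipped; a line without a tab separator raises ValueError.
    """
    utterances = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        audio, sep, text = line.partition("\t")
        if not sep:
            raise ValueError(f"line {number}: expected 'audio<TAB>transcript'")
        utterances.append(Utterance(Path(audio), text.strip()))
    return utterances


# Hypothetical manifest entries, for illustration only.
example = [
    "audio/utt0001.wav\tone two three four",
    "audio/utt0002.wav\tthe quick brown fox",
]
corpus = load_manifest(example)
print(len(corpus), corpus[0].transcript)
```

Keeping each transcription next to its audio path in a single record is what lets a training tool iterate over matched audio and text pairs when building an acoustic model.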

The word corpus is singular and refers to one such database; corpora is the plural and refers to several of them.

There are two types of speech corpora:

  • (1) Read speech - which includes:
      • Book excerpts
      • Broadcast news
      • Lists of words
      • Sequences of numbers
  • (2) Spontaneous speech - which includes:
      • Dialogs - between two or more people (includes meetings);
      • Narratives - a person telling a story (one such corpus is the Buckeye Corpus);
      • Map-tasks - one person explains a route on a map to another;
      • Appointment-tasks - two people try to find a common meeting time based on individual schedules.

A special kind of speech corpus is the non-native speech database, which contains speech with a foreign accent.

