Datasets


This page lists a number of datasets grouped by application, as well as research results (papers, numerical results, output transcriptions, intermediary data, etc) corresponding to each dataset.

Automatic speech recognition

CHiME Challenge (2011)

Artificially distorted version of the small-vocabulary GRID audio-visual corpus (audio only): binaural reverberated speech with the speaker situated in front of the microphones and additive household noises impinging from different directions; see

J. Barker, E. Vincent, N. Ma, H. Christensen, P. Green, "The PASCAL CHiME speech separation and recognition challenge", Computer Speech & Language, Volume 27, Issue 3, May 2013, Pages 621-633.

Paper available from Computer Speech & Language here

Corpus available here (no cost)
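For readers unfamiliar with how such artificially distorted corpora are built, the minimal Python sketch below illustrates the general recipe: the clean speech is reverberated with a binaural room impulse response and a household noise segment is added. All file names and recordings in the sketch are hypothetical placeholders; this is an illustration of the mixing principle, not the official CHiME tools.

  # Minimal illustration (not the official CHiME tools) of how a binaural
  # reverberated-plus-noise mixture is typically constructed.
  # All file names below are hypothetical placeholders.
  import numpy as np
  import soundfile as sf
  from scipy.signal import fftconvolve

  speech, fs = sf.read("grid_utterance.wav")    # clean GRID utterance (mono)
  brir, _ = sf.read("binaural_rir.wav")         # binaural room impulse response, shape (taps, 2)
  noise, _ = sf.read("household_noise.wav")     # household noise recording, shape (samples, 2)

  # Reverberate the speech through the left and right impulse responses.
  rev = np.stack([fftconvolve(speech, brir[:, ch]) for ch in range(2)], axis=1)

  # Cut a noise segment of matching length (assumes the noise file is longer)
  # and add it to obtain the noisy binaural mixture.
  start = np.random.randint(0, len(noise) - len(rev))
  mixture = rev + noise[start:start + len(rev)]

  sf.write("mixture.wav", mixture, fs)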

Resources

  • Training recipe for HTK here.

Baselines

  • See the paper above for a summary of the challenge results.


AURORA 4 (2002)

Artificially distorted version of the 5K-word Wall Street Journal corpus (WSJ0). Stationary and non-stationary noises are added, and a second set of recordings made with a distant, mismatched microphone is available; see

Günter Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task", ETSI STQ Aurora DSR Working Group, 2002.

Corpora available from ELRA here and here
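Here too the noisy data are obtained by artificially mixing noise with the clean recordings. The sketch below shows how a noise signal is typically scaled and added to a clean WSJ0 utterance at a chosen signal-to-noise ratio; the file names and the 10 dB target SNR are hypothetical placeholders, not the official AURORA 4 scripts.

  # Minimal illustration (not the official AURORA 4 scripts): adding noise to a
  # clean utterance at a chosen signal-to-noise ratio. File names are placeholders.
  import numpy as np
  import soundfile as sf

  speech, fs = sf.read("wsj0_utterance.wav")    # clean WSJ0 utterance (mono)
  noise, _ = sf.read("noise.wav")               # stationary or non-stationary noise (mono)
  noise = noise[:len(speech)]                   # assumes the noise file is at least as long

  target_snr_db = 10.0                          # hypothetical target SNR

  # Scale the noise so that 10*log10(speech power / noise power) equals the target SNR.
  speech_power = np.mean(speech ** 2)
  noise_power = np.mean(noise ** 2)
  gain = np.sqrt(speech_power / (noise_power * 10 ** (target_snr_db / 10)))

  noisy = speech + gain * noise
  sf.write("noisy_utterance.wav", noisy, fs)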

Resources

  • Training recipe for HTK available here. Note that this recipe is for the Wall Street Journal corpus (WSJ0), i.e. the clean speech version of AURORA 4. Small changes are needed in the feature extraction scripts to account for the different file name extensions (a minimal illustration is sketched below).
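In practice such changes often amount to regenerating the HCopy script file, which lists on each line a source audio file and the target feature file, using the extension of the distorted corpus. The Python sketch below is a hypothetical illustration; the directory names and the extension are placeholders, not part of the recipe.

  # Hypothetical illustration: regenerating the HCopy script file so that feature
  # extraction picks up audio files with a different file name extension.
  # Directory names and the extension below are placeholders.
  from pathlib import Path

  audio_dir = Path("aurora4/train")             # directory holding the distorted audio
  feat_dir = Path("features/train")             # where the MFCC feature files should go
  audio_ext = ".wv1"                            # change to match the corpus release

  feat_dir.mkdir(parents=True, exist_ok=True)
  with open("train_hcopy.scp", "w") as scp:
      for wav in sorted(audio_dir.rglob("*" + audio_ext)):
          mfc = feat_dir / (wav.stem + ".mfc")
          scp.write(f"{wav} {mfc}\n")           # one "source target" pair per line, as HCopy expects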

Baselines

Speaker identification and verification

Speech enhancement and separation

Other applications

Contribute a dataset

To contribute a new dataset, please:

  • create an account and login
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your dataset (the datasets are ordered by year of collection)
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • name of the dataset and year of collection
  • authors, institution, contact information
  • link to the dataset and to side resources (lexicon, language model, etc)
  • short description (nature of the data, license, etc) and link to a paper/report describing the dataset, if any
  • at least 1 research result obtained for this dataset (see below)

We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.

Contribute a research result

To contribute a new research result, please:

  • create an account and login
  • go to the wiki page and the section corresponding to the dataset for which this result was obtained
  • click on the "Edit" link on the right of the section header and add a new item for your result
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • authors, paper/report title, means of publication
  • link to the pdf of the paper
  • link to derived data (output transcriptions, intermediary data, etc)

In order to save storage space, please do not upload the paper on this wiki, but link to it whenever possible from your institutional archive, from another public archive (e.g., arXiv) or from the publisher's website (e.g., IEEE Xplore).

We currently cannot provide storage space for large amounts of derived data. Please upload the derived data at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.