Difference between revisions of "Datasets"
Revision as of 17:19, 8 August 2014
This page lists datasets with detailed attributes and links to the corresponding research results (papers, numerical results, output transcriptions, intermediary data, etc.). Each dataset may be used for one or more applications, including automatic speech recognition, speaker identification and verification, source localization, and speech enhancement and separation.
Disclaimer: Only publicly available datasets with a total duration longer than 5 min are listed.
Column groups, left to right: General attributes, Speech, Channel, Noise, Ground truth.
Dataset | release | scenario | total duration (h) | sampling rate (kHz) | degraded channels | cameras | cost (acad) | links | speech duration (h) | unique speakers | language | unique words (k) | speaking style | simultaneous speakers | speaker overlap | channel type | radiation | speaker location | speaker movements | noise type | speech signal | speaker location, orientation | words | nonverbal traits | noise events |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ShATR | 1994 | meeting | 0.6 | 48 | 3 (distant) | no | free | download email paper | 0.6 | 5 | UK English | 1 | colloquial | 5 | multiple conversations | reverb | human | quasi-fixed | head | meeting | headset | yes | yes | no | yes |
LLSEC | 1996 | conversation | 1.4 | 16 | 4 (distant) | no | free | download email | ? | 12 | N/S | N/S | read, colloquial | 2 | conversation | reverb | human | quasi-fixed | head | hallway, restaurant (scenarized) | no | yes | no | no | no |
RWCP Spoken Dialog Corpus | 1996 - 1997 | conversation | 10 | 16 | 2 (close but cross-talk) | no | free | download email paper | 10 | 39 | Japanese | ? | colloquial | 1 or 2 | conversation | reverb | human | quasi-fixed | head | stationary background | no | no | yes | no | no |
Aurora-2 | 2000 | public spaces | 33 | 8 - 16 | 1 (close) | no | free given TIDigits | download email paper | 33 | 214 | US English | 0.01 | digits | 1 | no | simulated phone | human | N/S | no | various real environments | original | N/S | yes | no | yes |
SPINE1, SPINE2 | 2000 - 2001 | military | 38 | 16 | 2 (close) | no | 7400 $ | purchase email paper | ? | 100 | US English | 1 | command, colloquial | 1 or 2 | no | simulated radio | human | quasi-fixed | head | military | no | no | yes | no | no |
Aurora-3 (subset of SpeechDat-Car) | 2000 - 2003 | car | ? | 16 | 4 (distant) | no | 1000 € | purchase papers | ? | ? | Finnish, German, Spanish, Danish, Italian | ? | digits, command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
RWCP Meeting Speech Corpus | 2001 | meeting | 3.5 | 16 - 48 | 1 (distant) | 3 | free | download email paper | 3.5 | ? | Japanese | ? | colloquial | 1 to 5 | meeting | reverb (low) | human | quasi-fixed | head | stationary background | headset | no | yes | no | no |
RWCP Real Environment Speech and Acoustic Database | 2001 | domestic, office | ? | 16 - 48 | 30 (distant) | no | free | download email paper | ? | 5 | Japanese | ? | read | 1 | no | real rir, reverb | loudspeaker | various | no, pivoting arm | stationary background | original | yes | yes | no | yes |
SpeechDat-Car | 2001 - 2011 | car | ? | 16 | 4 (distant) | no | 39000 - 182000 k€ per lang | purchase paper | ? | 300 per lang | various | ? | digits, command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
Aurora-4 | 2002 | public spaces | ? | 8 - 16 | 1 (close) | no | free given WSJ0 | download email paper | ? | 101 | US English | 10 | read | 1 | no | simulated phone | human | N/S | no | various real environments | original | N/S | yes | no | yes |
TED | 2002 | seminar | 47 | 16 | 1 (distant) | no | 525 $ | purchase paper | 47 | 188 | English (mostly non-native) | ? | lecture | 1 or more | seminar | reverb | human | quasi-fixed | head | stationary background | lapel | no | partial | no | no |
CUAVE | 2002 | cocktail party | 3 | 44 | 1 (distant) | 1 | free | download email paper | 3 | 36 | US English | 0.01 | digits | 1 or 2 | full | reverb | human | quasi-fixed | head | stationary background | no | no | yes | no | no |
CU-Move Microphone Array Data | 2002 - 2011 | car | 286 | 44 | 6 to 8 (distant) | no | 25000 $ | purchase email paper | 286 | 172 | US English | 12 | digits, command, read, dialogue | 1 | no | reverb | human | quasi-fixed | head | car | no | no | yes | no | no |
CENSREC-1 (Aurora-2J) | 2003 | public spaces | ? | 8 | 1 (close) | no | free | download email paper | | 214 | Japanese | 0.01 | digits | 1 | no | simulated phone | human | N/S | no | various real environments | original | N/S | yes | no | yes |
AVICAR | 2004 | car | 29 | 16 | 7 (distant) | 4 | free | download email paper | 29 | 86 | US English, non-native English | 1 | read | 1 | no | reverb | human | quasi-fixed | head | car | no | no | yes | no | no |
AV16.3 | 2004 | meeting | 1.5 | 16 | 16 (distant) | 3 | free | download email paper | 1.5 | 12 | N/S | N/S | colloquial | 1 to 3 | full | reverb | human | various | walk | stationary background | no | yes | no | no | no |
ICSI Meeting Corpus | 2004 | meeting | 72 | 16 | 6 (distant) | no | 2800 $ | purchase email paper | 72 | 53 | US English | 13 | meeting | 3 to 10 | meeting | reverb | human | quasi-fixed | head | stationary background | headset, lapel | no | yes | yes | no |
NIST Meeting Pilot Corpus Speech | 2004 | meeting | 15 | 16 | 7 (distant) | no | 5500 $ | purchase email paper | 15 | 61 | US English | 6 | meeting | 3 to 9 | meeting | reverb | human | various | walk | stationary background | headset, lapel | no | yes | no | no |
CHIL Meetings | 2004 - 2007 | seminar, meeting | 60 | 44 | 79 to 147 (distant) | 6 to 9 | 3500 € | purchase email paper | ? | ? | non-native English | ? | seminar, meeting | 3 to 20 | seminar, meeting | reverb | human | quasi-fixed | head | meeting (scenarized) | headset | yes | yes | yes | no |
SPEECON | 2004 - 2011 | public space, domestic, office, car | ? | 16 | 3 (distant) | no | 75000 € per lang | purchase email paper | ? | 600 per lang | various | ? | command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | various real environments | headset | no | yes | no | no |
CENSREC-2 | 2005 | car | ? | 16 | 1 (distant) | no | free | download email paper | ? | 214 | Japanese | 0.01 | digits | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
CENSREC-3 | 2005 | car | ? | 16 | 1 (distant) | no | 21000 ¥ | purchase email paper | ? | 311 | Japanese | 0.05 | read | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
Aurora-5 | 2006 | public spaces, domestic, office, car | ? | 8 | 1 (distant) | no | free given TIDigits | download email paper | ? | 225 | US English | 0.01 | digits | 1 | no | no, simulated rir, real rir | loudspeaker | N/S | no | various real environments | original | no | yes | no | yes |
AMI | 2006 | meeting | 100 | 16 | 16 (distant) | 6 | free | download email paper | ? | 189 | UK English | 8 | meeting | 4 (18% overlap) | meeting | reverb | human | quasi-fixed | head | stationary background | headset, lapel | yes | yes | yes | no |
PASCAL SSC | 2006 | cocktail party | 8.8 | 25 | 1 (mixing console) | no | free | email paper | 8.8 | 34 | UK English | 0.05 | command | 2 | full | no | human | N/S | no | no | original | N/S | yes | no | no |
HIWIRE | 2007 | airplane | 21 | 16 | 1 (close) | no | 50 € | purchase email paper | 21 | 81 | non-native English | 0.1 | command | 1 | no | no | human | N/S | head | airplane | original | N/S | yes | no | no |
UT-Drive | 2007 | car | 40 | 25 | 5 (distant) | 2 | 25000 $ | download email paper | 40 | 25 | US English | 2.4 | command, dialogue | 1 to 2 | conversation | reverb | human | quasi-fixed | head | car | headset (low quality) | no | partial | no | no |
SASSEC, SiSEC underdetermined | 2007 - 2011 | cocktail party | 0.3 | 16 | 2 (distant) | no | free | download email paper | 0.3 | 16 | N/S | N/S | read | 3 or 4 | full | simulated rir, real rir, reverb | no, loudspeaker | fixed | no | no | original, spatial image | yes | no | no | no |
MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData | 2007 - 2014 | cocktail party | 10 | 16 | 8 to 40 (distant) | no | 1500 $ | purchase email paper paper | ? | 45 | UK English | 10 | read | 1 or 2 | full | reverb | human | various | walk | stationary background | headset, lapel | yes | yes | no | no |
CENSREC-4 (Simulated) | 2008 | public spaces, domestic, office, car | ? | 16 | 1 (distant) | no | free | download email paper | ? | 214 | Japanese | 0.01 | digits | 1 | no | real rir | dummy | fixed | no | various real environments | original | no | yes | no | yes |
CENSREC-4 (Real) | 2008 | public spaces, domestic, office, car | ? | 16 | 1 (distant) | no | free | download email paper | ? | 10 | Japanese | 0.01 | digits | 1 | no | reverb | human | quasi-fixed | head | various real environments | headset | no | yes | no | yes |
DICIT | 2008 | domestic | 6 | 48 | 16 (distant) | 2 | free | download email paper | 1 | ? | Italian | ? | command | 4 | no | reverb | human | various | walk | domestic (scenarized) | headset, tv | yes | yes | no | yes |
SiSEC head-geometry | 2008 | cocktail party | 1.9 | 16 | 2 (distant) | no | free | download email paper | 1.9 | ? | N/S | N/S | read | 2 | full | real rir | loudspeaker | various | no | no | original, spatial image | yes | no | no | no |
COSINE | 2009 | conversation | 38 | 48 | 20 (distant) | no | free | download email paper | 11 | 91 | US English, non-native English | 5 | colloquial | 2 to 7 | conversation | reverb | human | various | walk | various real environments | headset, throat mic | no | yes | no | no |
SiSEC real-world noise | 2010 | public spaces | 0.3 | 16 | 2 to 4 (distant) | no | free | download email paper | 0.3 | 6 | N/S | N/S | read | 1 or 3 | full | no, reverb (other room) | loudspeaker | various | no | various real environments | original, spatial image | yes | no | no | no |
SiSEC dynamic | 2010 - 2011 | cocktail party | 0.2 | 16 | 2 to 4 (distant) | no | free | download email paper | 0.2 | ? | N/S | N/S | read | many but only 2 simultaneous | full | reverb | loudspeaker | various | simulated | no | original, spatial image | yes | no | no | no |
CHiME 1, CHiME 2 Grid | 2011 - 2012 | domestic | 70 | 16 | 2 (distant) | no | free | download email paper | 12 | 34 | UK English | 0.05 | command | 1 | no | real rir | dummy | quasi-fixed | simulated head | domestic | yes | yes | yes | no | no |
CHiME 2 WSJ0 | 2012 | domestic | 78 | 16 | 2 (distant) | no | free given WSJ0 | download email paper | 33 | 101 | US English | 11 | read | 1 | no | real rir | dummy | fixed | no | domestic | yes | yes | yes | no | no |
ETAPE | 2012 | TV/radio debates, outdoor interviews... | 42 | 16 | 1 (mixing console) | 1 | ? | email paper | 32 | 347 | French | 16 | colloquial | 1 or more (up to 10% overlap) | conversation | reverb (some) | human | quasi-fixed | head | various real environments | no | N/S | yes | no | yes |
GALE (Chinese broadcast conversation) | 2013 | TV conversation | 120 | 16 | 1 (mixing console) | no | 3500 $ | purchase email | 108 | ? | Mandarin | ? | colloquial | 1 or more | conversation | no | human | quasi-fixed | head | no | no | N/S | yes | no | no |
GALE (Arabic broadcast conversation) | 2013 | TV conversation | 251 | 16 | 1 (mixing console) | no | 7000 $ | purchase email | 234 | ? | Arabic | ? | colloquial | 1 or more | conversation | no | human | quasi-fixed | head | no | no | N/S | yes | no | no |
REVERB SimData | 2013 | domestic, office | 25 | 16 | 8 (distant) | no | free given WSJCAM0 | purchase email paper | 25 | 130 | UK English | 10 | read | 1 | no | real rir | loudspeaker | fixed | no | stationary background | original, spatial image | yes | yes | no | yes |
DIRHA | 2014 | domestic | 3.8 | 48 | 40 (distant) | no | free | download email paper | 1.3 | 30 | Italian, German, Greek, Portuguese | various | various | 1 or more | simulated | real rir | loudspeaker | various | no | domestic (sum of events) | yes | yes | yes | no | yes |
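The table above can also be queried programmatically once its rows are available as pipe-delimited text. The following is only a minimal sketch under stated assumptions: the names `GENERAL_COLUMNS`, `SAMPLE_ROWS`, `parse_general`, `duration_hours` and `select_free` are hypothetical, the sample rows are shortened copies of four table rows, and only the first eight cells of each row (the dataset name plus the first seven attribute columns) are read.

```python
# Names of the first eight cells of a table row (dataset name included).
GENERAL_COLUMNS = [
    "dataset", "release", "scenario", "total duration (h)",
    "sampling rate (kHz)", "degraded channels", "cameras", "cost (acad)",
]

# Shortened copies of four rows from the table, kept as plain text.
SAMPLE_ROWS = [
    "ShATR | 1994 | meeting | 0.6 | 48 | 3 (distant) | no | free |",
    "Aurora-4 | 2002 | public spaces | ? | 8 - 16 | 1 (close) | no | free given WSJ0 |",
    "ICSI Meeting Corpus | 2004 | meeting | 72 | 16 | 6 (distant) | no | 2800 $ |",
    "AMI | 2006 | meeting | 100 | 16 | 16 (distant) | 6 | free |",
]


def parse_general(line):
    """Map the leading cells of one row to the general attribute names."""
    cells = [cell.strip() for cell in line.split("|")]
    # zip stops at the shorter sequence, so any extra cells are ignored.
    return dict(zip(GENERAL_COLUMNS, cells))


def duration_hours(record):
    """Return the total duration as a float, or None when it is unknown."""
    try:
        return float(record.get("total duration (h)", ""))
    except ValueError:
        # Cells such as "?" or an empty string are treated as unknown.
        return None


def select_free(lines, min_hours):
    """Keep rows whose cost is exactly "free" and whose duration is known
    and at least min_hours."""
    selected = []
    for line in lines:
        record = parse_general(line)
        hours = duration_hours(record)
        if record.get("cost (acad)") == "free" and hours is not None and hours >= min_hours:
            selected.append(record)
    return selected


if __name__ == "__main__":
    for record in select_free(SAMPLE_ROWS, min_hours=10.0):
        print(record["dataset"], record["total duration (h)"])
```

Entries such as "free given WSJ0" are deliberately not counted as free here, since they depend on access to another corpus; a different policy would only require changing the comparison in `select_free`.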
Automatic speech recognition
1st CHiME Challenge (2011)
Artificially distorted version of the small vocabulary GRID audio-visual corpus (audio only). Binaural reverberated speech with speaker situated in front of the microphones. Additive household noises impinging from different directions. Clean-training, noisy-training, development and evaluation sets available, see
- J. Barker, E. Vincent, N. Ma, H. Christensen, P. Green, "The PASCAL CHiME speech separation and recognition challenge", Computer Speech & Language, Volume 27, Issue 3, May 2013, Pages 621-633.
Available from Computer Speech and Language here
Corpus available here (no cost)
Resources
Baselines
- See the paper above for results for a wide range of techniques.
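As a rough illustration of the kind of distortion described for this corpus (reverberated two-channel speech plus additive noise), the sketch below convolves a mono signal with a left and a right impulse response and adds a noise signal to each channel. It is not the challenge's actual data generation code: the function names `convolve` and `binaural_mixture` are hypothetical, and the toy signals stand in for real impulse responses and household noise recordings.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def add_noise(channel, noise):
    """Add noise sample by sample; the noise must cover the whole channel."""
    if len(noise) < len(channel):
        raise ValueError("noise is shorter than the reverberated signal")
    return [c + n for c, n in zip(channel, noise)]


def binaural_mixture(speech, rir_left, rir_right, noise_left, noise_right):
    """Return (left, right) channels: reverberated speech plus additive noise."""
    left = add_noise(convolve(speech, rir_left), noise_left)
    right = add_noise(convolve(speech, rir_right), noise_right)
    return left, right


if __name__ == "__main__":
    # Toy values only: a unit impulse as "speech" and two-tap impulse responses.
    speech = [1.0, 0.0, 0.0]
    left, right = binaural_mixture(
        speech,
        rir_left=[1.0, 0.5],
        rir_right=[0.5, 0.25],
        noise_left=[0.25] * 4,
        noise_right=[0.0] * 4,
    )
    print(left)
    print(right)
```

For signals of realistic length, an FFT-based convolution would be the usual choice; the direct form is used here only to keep the sketch dependency-free.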
AURORA 5 (2007)
Artificially distorted version of the TIDigits digit corpus. Sets with additive noise only and with additive noise plus reverberation. Variable SNR range. Various mixed training sets, no evaluation set, see
- G. Hirsch, "Aurora-5 Experimental Framework for the Performance Evaluation of Speech Recognition in Case of a Hands-free Speech Input in Noisy Environments", Niederrhein University of Applied Sciences, 2007.
Paper available online here (no cost)
Corpus available from LDC here
Resources
- A training recipe for HTK is provided with the corpus.
Baselines
- Reproducible baseline: the paper cited above includes a baseline for the ETSI Advanced Front-End.
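To make the notion of additive noise at a chosen SNR concrete, the sketch below scales a noise signal so that the speech-to-noise power ratio matches a target value in dB before adding it to the speech. This is the generic textbook scaling, not the Aurora-5 tooling itself, which may apply further processing not reproduced here; the function names `power` and `mix_at_snr` are hypothetical.

```python
import math


def power(samples):
    """Mean squared amplitude of a non-empty list of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech so that the resulting SNR equals snr_db.

    Returns the noisy signal and the gain that was applied to the noise.
    """
    if len(noise) < len(speech):
        raise ValueError("noise is shorter than speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # SNR(dB) = 10 log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    noisy = [s + gain * n for s, n in zip(speech, noise)]
    return noisy, gain


if __name__ == "__main__":
    speech = [1.0, -1.0, 1.0, -1.0]
    noise = [0.5, 0.5, -0.5, -0.5]
    for snr_db in (0.0, 5.0, 10.0, 15.0):
        noisy, gain = mix_at_snr(speech, noise, snr_db)
        print(snr_db, round(gain, 4))
```

Power is measured over the whole signal in this sketch; measuring it over speech-active segments only is a common alternative that gives different gains for utterances with long pauses.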
AURORA 4 (2002)
Artificially distorted version of the 5k-word Wall Street Journal corpus (WSJ0). Stationary and non-stationary noises added. Second recordings with a distant, mismatched microphone. Clean-training, mixed-training, noisy-training and test sets available. No evaluation set, see
- G. Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task", ETSI STQ Aurora DSR Working Group, 2002.
Paper available with the corpus.
Corpora available from ELRA here and here
Resources
- Training recipe for HTK available here. Note that this recipe is for the Wall Street Journal corpus (WSJ0), which is the clean speech version of AURORA 4. Small changes are needed in the feature extraction scripts to account for the different file extensions.
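One way to handle the differing file terminations mentioned above (presumably file name extensions) is to rewrite the file lists fed to the feature extraction step instead of editing each script by hand. The sketch below is only an illustration: the function `swap_extension` is hypothetical, and the extensions and paths in the example are placeholders, not the actual ones used by WSJ0 or AURORA 4.

```python
from pathlib import PurePosixPath


def swap_extension(paths, old_ext, new_ext):
    """Replace old_ext with new_ext on every path that ends with old_ext.

    Paths with any other extension are returned unchanged.
    """
    rewritten = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix == old_ext:
            path = path.with_suffix(new_ext)
        rewritten.append(str(path))
    return rewritten


if __name__ == "__main__":
    # Placeholder file list; real lists would come from the recipe's scripts.
    file_list = ["data/train/utt001.aaa", "data/train/utt002.aaa", "data/README.txt"]
    for line in swap_extension(file_list, ".aaa", ".bbb"):
        print(line)
```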
Speaker identification and verification
Speech enhancement and separation
Other applications
Contribute a dataset
To contribute a new dataset, please
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your dataset (the datasets are ordered by year of collection)
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- name of the dataset and year of collection
- authors, institution, contact information
- link to the dataset and to side resources (lexicon, language model, etc.)
- short description (nature of the data, license, etc.) and link to a paper/report describing the dataset, if any
- at least 1 research result obtained for this dataset (see below)
We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.
Contribute a research result
To contribute a new research result, please
- create an account and log in
- go to the wiki page and the section corresponding to the dataset for which this result was obtained
- click on the "Edit" link on the right of the section header and add a new item for your result
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- authors, paper/report title, means of publication
- link to the pdf of the paper
- link to derived data (output transcriptions, intermediary data, etc)
- code and instructions to reproduce the experiments (if available)
To save storage space, please do not upload the paper to this wiki. Instead, link to it on your institutional archive, on another public archive (e.g., arXiv) or on the publisher's website (e.g., IEEE Xplore).
We currently cannot provide storage space for large datasets. Please upload the derived data at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.