Difference between revisions of "Datasets"
m (→Impulse response datasets) |
(126 intermediate revisions by 6 users not shown) | |||
Line 1: | Line 1: | ||
− | |||
− | + | == [[Speech datasets]] == | |
+ | The table below lists speech datasets with detailed attributes and links to software baselines and evaluation results. Each dataset may be used for one or more applications, such as automatic speech recognition, speaker identification and verification, source localization, and speech enhancement and separation. The meaning of each attribute is detailed [[#speech_attributes|below]]. | ||
− | {| class="wikitable sortable" style="font-size: | + | Disclaimer: Only datasets that are '''publicly available''', at least partially '''annotated''', '''suitable for research on robustness''', and '''longer than 5 min''' are listed. Other relevant datasets are listed [[#Other datasets|below]]. |
+ | |||
+ | If you would like to refer to this table, please cite: | ||
+ | '''J. Le Roux and E. Vincent, "A categorization of robust speech processing datasets", Mitsubishi Electric Research Laboratories Technical Report, TR2014-116, Aug. 2014.''' | ||
+ | |||
+ | |||
+ | {| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | ||
|- | |- | ||
− | !style="width: | + | !style="width: 40px" rowspan="2" class="unsortable"|Datasets |
!colspan="8" |General attributes | !colspan="8" |General attributes | ||
!colspan="7" |Speech | !colspan="7" |Speech | ||
!colspan="4" |Channel | !colspan="4" |Channel | ||
− | !Noise | + | !colspan="2" |Noise |
!colspan="5" |Ground truth | !colspan="5" |Ground truth | ||
|- | |- | ||
− | !scope="col" width=" | + | !scope="col" width="40px" | rel. year |
− | !scope="col" width=" | + | !scope="col" width="40px" | use case |
− | !scope="col" width=" | + | !scope="col" width="40px" | total time (h) |
− | !scope="col" width=" | + | !scope="col" width="40px" | sam. rate (kHz) |
− | !scope="col" width=" | + | !scope="col" width="40px" | dist. or noisy mics |
− | !scope="col" width=" | + | !scope="col" width="40px" | video cams |
− | !scope="col" width=" | + | !scope="col" width="40px" | cost (non- memb) |
− | !scope="col" width=" | + | !scope="col" width="40px" class="unsortable" | links |
− | !scope="col" width=" | + | !scope="col" width="40px" | speak. time (h) |
− | !scope="col" width=" | + | !scope="col" width="40px" | uniq. speak. |
− | !scope="col" width=" | + | !scope="col" width="40px" | lang. |
− | !scope="col" width=" | + | !scope="col" width="40px" | uniq. words (k) |
− | !scope="col" width=" | + | !scope="col" width="40px" | speak. style |
− | !scope="col" width=" | + | !scope="col" width="40px" | speak. / rec. |
− | !scope="col" width=" | + | !scope="col" width="40px" | overl. type |
− | !scope="col" width=" | + | !scope="col" width="40px" | chan. type |
− | !scope="col" width=" | + | !scope="col" width="40px" | speak. radiat. |
− | !scope="col" width=" | + | !scope="col" width="40px" | speak. loc. |
− | !scope="col" width=" | + | !scope="col" width="40px" | speak. moves |
− | !scope="col" width=" | + | !scope="col" width="40px" | noise type |
− | !scope="col" width=" | + | !scope="col" width="40px" | avg. SNR |
− | !scope="col" width=" | + | !scope="col" width="40px" | ref. signal |
− | !scope="col" width=" | + | !scope="col" width="40px" | speak. loc., orient. |
− | !scope="col" width=" | + | !scope="col" width="40px" | words |
− | !scope="col" width=" | + | !scope="col" width="40px" | non- verb. traits |
+ | !scope="col" width="40px" | noise events | ||
|- | |- | ||
!ShATR | !ShATR | ||
|1994 | |1994 | ||
|meeting | |meeting | ||
− | |0.6 | + | |{{no|0.6}} |
− | |48 | + | |{{yes|48}} |
− | |3 | + | |{{some|3}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://spandh.dcs.shef.ac.uk/projects/shatrweb/ download] | + | |[http://spandh.dcs.shef.ac.uk/projects/shatrweb/ download] |
− | |0.6 | + | [http://spandh.dcs.shef.ac.uk/projects/shatrweb/papers/ioa94.html paper] |
− | |5 | + | |{{no|0.6}} |
+ | |{{no|5}} | ||
|UK English | |UK English | ||
− | |1 | + | |{{some|1}} |
− | | | + | |{{yes|spontaneous}} |
|5 | |5 | ||
− | |multiple | + | |multiple dialogs |
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |meeting | + | |{{yes|meeting}} |
+ | |{{no|high}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{yes}} | |{{yes}} | ||
Line 67: | Line 76: | ||
!LLSEC | !LLSEC | ||
|1996 | |1996 | ||
− | | | + | |dialog |
− | |1.4 | + | |{{some|1.4}} |
− | |16 | + | |{{some|16}} |
− | |4 | + | |{{yes|4}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html download | + | |[https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html download] |
|{{dunno}} | |{{dunno}} | ||
− | |12 | + | |{{some|12}} |
|{{n/s}} | |{{n/s}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |read, | + | |{{yes|read, spontaneous}} |
|2 | |2 | ||
− | | | + | |dialog |
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |hallway, restaurant | + | |{{some|hallway, restaurant (scenarized)}} |
+ | |{{some|medium}} | ||
|{{no}} | |{{no}} | ||
|{{yes}} | |{{yes}} | ||
|{{no}} | |{{no}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !MicArray | ||
+ | |1996 | ||
+ | |office | ||
+ | |{{no|0.2}} | ||
+ | |{{some|16}} | ||
+ | |{{yes|9 - 16}} | ||
+ | |{{no}} | ||
+ | |{{yes|free}} | ||
+ | |[http://www.speech.cs.cmu.edu/databases/micarray/ download] | ||
+ | [http://www.cs.cmu.edu/afs/cs/user/robust/www/Thesis/tms_thesis.pdf paper] | ||
+ | |{{no|0.2}} | ||
+ | |{{some|14}} | ||
+ | |US English | ||
+ | |{{no|0.07}} | ||
+ | |{{no|digits, command}} | ||
+ | |1 | ||
+ | |no | ||
+ | |{{yes|reverb}} | ||
+ | |{{yes|human}} | ||
+ | |{{some|quasi-fixed}} | ||
+ | |{{yes|head}} | ||
+ | |{{yes|stationary background}} | ||
+ | |{{some|medium}} | ||
+ | |{{some|headset}} | ||
+ | |{{no}} | ||
+ | |{{yes}} | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
Line 94: | Line 133: | ||
!RWCP Spoken Dialog Corpus | !RWCP Spoken Dialog Corpus | ||
|1996 - 1997 | |1996 - 1997 | ||
− | | | + | |dialog |
− | |10 | + | |{{yes|10}} |
− | |16 | + | |{{some|16}} |
− | |2 | + | |{{some|2}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://research.nii.ac.jp/src/en/RWCP-SP96.html download] | + | |[http://research.nii.ac.jp/src/en/RWCP-SP96.html download] |
− | |10 | + | [http://scitation.aip.org/content/asa/journal/jasa/100/4/10.1121/1.416338 paper] |
− | |39 | + | |{{yes|10}} |
+ | |{{some|39}} | ||
|Japanese | |Japanese | ||
|{{dunno}} | |{{dunno}} | ||
− | | | + | |{{yes|spontaneous}} |
− | |1 | + | |1 - 2 |
− | | | + | |dialog |
− | |reverb | + | |{{yes|reverb (low)}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
|{{yes}} | |{{yes}} | ||
|{{no}} | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !SUSAS | ||
+ | |1999 | ||
+ | |stress | ||
+ | |{{dunno}} | ||
+ | |{{no|8}} | ||
+ | |{{no|1}} | ||
+ | |{{no}} | ||
+ | |{{some|0.5 k$}} | ||
+ | |[https://catalog.ldc.upenn.edu/LDC99S78 download] | ||
+ | [https://catalog.ldc.upenn.edu/LDC99T33 download] | ||
+ | [https://catalog.ldc.upenn.edu/docs/LDC99S78/susas_rev1b4.ps paper] | ||
+ | |{{dunno}} | ||
+ | |{{some|36}} | ||
+ | |US English | ||
+ | |{{no|0.035}} | ||
+ | |{{no|command}} | ||
+ | |1 | ||
+ | |no | ||
+ | |{{yes|reverb}} | ||
+ | |{{yes|human}} | ||
+ | |{{some|quasi-fixed}} | ||
+ | |{{yes|head}} | ||
+ | |{{yes|stationary background}} | ||
+ | |{{no|high}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |{{yes}} | ||
+ | |{{yes}} | ||
|{{no}} | |{{no}} | ||
|- | |- | ||
Line 122: | Line 193: | ||
|2000 | |2000 | ||
|public spaces | |public spaces | ||
− | |33 | + | |{{yes|33}} |
− | |8 - 16 | + | |{{some|8 - 16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free given TIDigits | + | |{{some|free given TIDigits (0.5 k$)}} |
− | |[http:// | + | |[http://catalog.elra.info/product_info.php?cPath=37_40&products_id=693 purchase] (incl. HTK) |
− | |33 | + | [http://www.isca-speech.org/archive_open/asr2000/asr0_181.html paper] |
− | |214 | + | [http://aurora.hsnr.de/download.html features] |
+ | |{{yes|33}} | ||
+ | |{{yes|214}} | ||
|US English | |US English | ||
− | |0.01 | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |simulated phone | + | |{{some|simulated phone}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{no}} | |{{no}} | ||
− | |various real environments | + | |{{some|various real environments (rescaled)}} |
+ | |{{yes|low}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{n/s}} | |{{n/s}} | ||
Line 149: | Line 223: | ||
|2000 - 2001 | |2000 - 2001 | ||
|military | |military | ||
− | |38 | + | |{{yes|38}} |
− | |16 | + | |{{some|16}} |
− | |2 | + | |{{some|2}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{no|7.4 k$}} |
− | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=SPINE purchase] | + | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=SPINE purchase] |
+ | [http://dl.acm.org/citation.cfm?id=1289199 paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |100 | + | |{{yes|100}} |
|US English | |US English | ||
− | |1 | + | |{{some|1}} |
− | |command, | + | |{{yes|command, spontaneous}} |
− | |1 | + | |1 - 2 |
− | |{{ | + | |no |
− | |simulated radio | + | |{{some|simulated radio}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |military ( | + | |{{some|military (rescaled)}} |
+ | |{{yes|low}} | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
Line 173: | Line 249: | ||
|{{no}} | |{{no}} | ||
|- | |- | ||
− | !Aurora-3 (subset of SpeechDat-Car) | + | !Aurora-3 (subset of SpeechDat- Car) |
|2000 - 2003 | |2000 - 2003 | ||
|car | |car | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 | + | |{{some|16}} |
− | |4 | + | |{{yes|4}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{some|1 k€}} |
− | |[http://catalog.elra.info/index.php?cPath=37_40 purchase] [http://aurora.hsnr.de/aurora-3/reports.html papers] | + | |[http://catalog.elra.info/index.php?cPath=37_40 purchase] (incl. HTK) |
+ | [http://aurora.hsnr.de/aurora-3/reports.html papers] | ||
|{{dunno}} | |{{dunno}} | ||
− | |{{ | + | |{{yes|730}} |
− | | | + | |various |
− | |{{ | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | + | |{{yes|reverb}} | |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |car | + | |{{yes|car}} |
+ | |{{yes|low}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
Line 203: | Line 281: | ||
|2001 | |2001 | ||
|meeting | |meeting | ||
− | |3.5 | + | |{{some|3.5}} |
− | |16 - 48 | + | |{{yes|16 - 48}} |
− | |1 | + | |{{no|1}} |
− | |3 | + | |{{yes|3}} |
− | |free | + | |{{yes|free}} |
− | |[http://research.nii.ac.jp/src/en/RWCP-SP01.html download] | + | |[http://research.nii.ac.jp/src/en/RWCP-SP01.html download] |
− | |3.5 | + | [http://id.nii.ac.jp/1001/00057420/ paper] |
+ | |{{some|3.5}} | ||
|{{dunno}} | |{{dunno}} | ||
|Japanese | |Japanese | ||
|{{dunno}} | |{{dunno}} | ||
− | | | + | |{{yes|spontaneous}} |
− | |1 | + | |1 - 5 |
|meeting | |meeting | ||
− | |low | + | |{{yes|reverb (low)}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
Line 227: | Line 307: | ||
|{{no}} | |{{no}} | ||
|- | |- | ||
− | !RWCP Real Environment Speech | + | !RWCP Real Environment Speech Database |
|2001 | |2001 | ||
|domestic, office | |domestic, office | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 - 48 | + | |{{yes|16 - 48}} |
− | | | + | |{{yes|84}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://research.nii.ac.jp/src/en/RWCP-SSD.html download] | + | |[http://research.nii.ac.jp/src/en/RWCP-SSD.html download] |
+ | [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/356.htm paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |5 | + | |{{no|5}} |
− | |Japanese | + | |US English, Japanese |
|{{dunno}} | |{{dunno}} | ||
− | |read | + | |{{some|read}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |real rir, reverb | + | |{{yes|real rir, reverb}} |
|{{no|loudspeaker}} | |{{no|loudspeaker}} | ||
|{{yes|various}} | |{{yes|various}} | ||
|{{some|no, pivoting arm}} | |{{some|no, pivoting arm}} | ||
− | | | + | |{{some|various (sum of events)}} |
+ | |{{some|medium}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{yes}} | |{{yes}} | ||
Line 254: | Line 336: | ||
|{{yes}} | |{{yes}} | ||
|- | |- | ||
− | !SpeechDat-Car | + | !SpeechDat- Car |
|2001 - 2011 | |2001 - 2011 | ||
|car | |car | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 | + | |{{some|16}} |
− | |4 | + | |{{yes|4}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{no|39 - 182 k€ per lang}} |
− | |[http://catalog.elra.info/search.php purchase] [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/373.htm paper] | + | |[http://catalog.elra.info/search.php purchase] |
+ | [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/373.htm paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |300 per lang | + | |{{yes|300 per lang}} |
|various | |various | ||
|{{dunno}} | |{{dunno}} | ||
− | |digits, command, read, spontaneous | + | |{{yes|digits, command, read, spontaneous}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | + | |{{yes|reverb}} | |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |car | + | |{{yes|car}} |
+ | |{{yes|low}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
Line 285: | Line 369: | ||
|public spaces | |public spaces | ||
|{{dunno}} | |{{dunno}} | ||
− | |8 - 16 | + | |{{some|8 - 16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free given WSJ0 | + | |{{some|free given WSJ0 (1.5 k$)}} |
− | |[http:// | + | |[http://catalog.elra.info/index.php?cPath=37_40 purchase] |
+ | [http://aurora.hsnr.de/aurora-4/reports.html paper] | ||
+ | [http://www.keithv.com/software/htk/ HTK] | ||
|{{dunno}} | |{{dunno}} | ||
− | |101 | + | |{{yes|101}} |
|US English | |US English | ||
− | |10 | + | |{{yes|10}} |
− | |read | + | |{{some|read}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |simulated phone | + | |{{some|simulated phone}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{no}} | |{{no}} | ||
− | |various real environments | + | |{{some|various real environments (rescaled)}} |
+ | |{{yes|low}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{n/s}} | |{{n/s}} | ||
Line 311: | Line 398: | ||
|2002 | |2002 | ||
|seminar | |seminar | ||
− | |47 | + | |{{yes|47}} |
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{some|0.5 k$}} |
− | |[https://catalog.ldc.upenn.edu/LDC2002S04 purchase] [http://perso.limsi.fr/lamel/icslp94ted.pdf paper] | + | |[https://catalog.ldc.upenn.edu/LDC2002S04 purchase] |
− | |47 | + | [http://perso.limsi.fr/lamel/icslp94ted.pdf paper] |
− | |188 | + | |{{yes|47}} |
− | | | + | |{{yes|188}} |
+ | |non-native English | ||
|{{dunno}} | |{{dunno}} | ||
− | |lecture | + | |{{some|lecture}} |
|1 or more | |1 or more | ||
|seminar | |seminar | ||
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{some|lapel}} | |{{some|lapel}} | ||
|{{no}} | |{{no}} | ||
Line 337: | Line 426: | ||
!CUAVE | !CUAVE | ||
|2002 | |2002 | ||
− | | | + | |speech overlap |
− | |3 | + | |{{some|3}} |
− | |44 | + | |{{yes|44}} |
− | |1 | + | |{{no|1}} |
− | |1 | + | |{{some|1}} |
− | |free | + | |{{yes|free}} |
− | |[http:// | + | |[http://media.clemson.edu/cuave/CUAVE-092908.iso download] |
− | |3 | + | [http://asp.eurasipjournals.com/content/2002/11/208541 paper] |
− | |36 | + | |{{some|3}} |
+ | |{{some|36}} | ||
|US English | |US English | ||
− | |0.01 | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
− | |1 | + | |1 - 2 |
|full | |full | ||
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
Line 365: | Line 456: | ||
|2002 - 2011 | |2002 - 2011 | ||
|car | |car | ||
− | |286 | + | |{{yes|286}} |
− | |44 | + | |{{yes|44}} |
− | |6 | + | |{{yes|6 - 8}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{no|25 k$}} |
− | |[http://crss.utdallas.edu/ purchase] | + | |[http://crss.utdallas.edu/ purchase] |
− | |286 | + | [http://www.isca-speech.org/archive/eurospeech_2001/e01_2023.html paper] |
− | |172 | + | |{{yes|286}} |
+ | |{{yes|172}} | ||
|US English | |US English | ||
− | |12 | + | |{{yes|12}} |
− | |digits, command, read, | + | |{{yes|digits, command, read, dialog}} |
|1 | |1 | ||
+ | |no | ||
+ | |{{yes|reverb}} | ||
+ | |{{yes|human}} | ||
+ | |{{some|quasi-fixed}} | ||
+ | |{{yes|head}} | ||
+ | |{{yes|car}} | ||
+ | |{{yes|low}} | ||
|{{no}} | |{{no}} | ||
− | |reverb | + | |{{no}} |
+ | |{{yes}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !PDA | ||
+ | |2003 | ||
+ | |office | ||
+ | |{{some|1.6 - 3}} | ||
+ | |{{some|11 - 16}} | ||
+ | |{{some|1 - 4}} | ||
+ | |{{no}} | ||
+ | |{{yes|free}} | ||
+ | |[http://www.speech.cs.cmu.edu/databases/pda/ download] | ||
+ | [http://www.sapaworkshops.org/2004/papers/52.pdf paper] | ||
+ | [http://www.cs.cmu.edu/afs/cs/user/robust/www/Thesis/mseltzer_phdthesis.pdf paper] | ||
+ | |{{some|1.6 - 3}} | ||
+ | |{{some|11 - 16}} | ||
+ | |US English | ||
+ | |{{some|1 - 2}} | ||
+ | |{{some|read}} | ||
+ | |1 | ||
+ | |no | ||
+ | |{{yes|reverb}} | ||
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | | | + | |{{yes|stationary background}} |
− | |{{ | + | |{{yes|low}} |
+ | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
|{{yes}} | |{{yes}} | ||
Line 393: | Line 516: | ||
|public spaces | |public spaces | ||
|{{dunno}} | |{{dunno}} | ||
− | |8 | + | |{{no|8}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://research.nii.ac.jp/src/en/CENSREC-1.html download] | + | |[http://research.nii.ac.jp/src/en/CENSREC-1.html download] |
− | | | + | [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15046/1/425.pdf paper] |
− | |214 | + | |{{dunno}} |
+ | |{{yes|214}} | ||
|Japanese | |Japanese | ||
− | |0.01 | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |simulated phone | + | |{{some|simulated phone}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{no}} | |{{no}} | ||
− | |various real environments | + | |{{some|various real environments (rescaled)}} |
+ | |{{yes|low}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{n/s}} | |{{n/s}} | ||
Line 419: | Line 544: | ||
|2004 | |2004 | ||
|car | |car | ||
− | | | + | |{{yes|40}} |
− | |16 | + | |{{some|16}} |
− | |7 | + | |{{yes|7}} |
− | |4 | + | |{{yes|4}} |
− | |free | + | |{{yes|free}} |
− | |[http://www.isle.illinois.edu/sst/AVICAR/ download] | + | |[http://www.isle.illinois.edu/sst/AVICAR/ download] |
− | | | + | [http://www.isca-speech.org/archive/interspeech_2004/i04_2489.html paper] |
− | | | + | |{{yes|40}} |
+ | |{{some|87}} | ||
|US English, non-native English | |US English, non-native English | ||
+ | |{{some|1}} | ||
+ | |{{some|read}} | ||
|1 | |1 | ||
− | | | + | |no |
− | + | |{{yes|reverb}} | |
− | |{{ | ||
− | |||
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |car | + | |{{yes|moving car, windows open or closed}} |
+ | |{{yes|low}} | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
Line 446: | Line 573: | ||
|2004 | |2004 | ||
|meeting | |meeting | ||
− | |1.5 | + | |{{some|1.5}} |
− | |16 | + | |{{some|16}} |
− | |16 | + | |{{yes|16}} |
− | |3 | + | |{{yes|3}} |
− | |free | + | |{{yes|free}} |
− | |[http://www.idiap.ch/dataset/av16-3/ download] | + | |[http://www.idiap.ch/dataset/av16-3/ download] |
− | |1.5 | + | [http://publications.idiap.ch/index.php/publications/show/353 paper] |
− | |12 | + | |{{some|1.5}} |
+ | |{{some|12}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{n/s}} | |{{n/s}} | ||
− | | | + | |{{yes|spontaneous}} |
− | |1 | + | |1 - 3 |
|full | |full | ||
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{yes|various}} | |{{yes|various}} | ||
− | |{{yes|walk}} | + | |{{yes|head, walk}} |
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{no}} | |{{no}} | ||
− | |{{ | + | |{{some|partial}} |
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
Line 473: | Line 602: | ||
|2004 | |2004 | ||
|meeting | |meeting | ||
− | |72 | + | |{{yes|72}} |
− | |16 | + | |{{some|16}} |
− | |6 | + | |{{yes|6}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{no|2.8 k$}} |
− | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=ICSI purchase] [ | + | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=ICSI purchase] |
− | |72 | + | [http://www1.icsi.berkeley.edu/Speech/mr/ info] |
− | |53 | + | [http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1198793 paper] |
− | |US English | + | |{{yes|72}} |
− | |13 | + | |{{some|53}} |
+ | |US English, other English | ||
+ | |{{yes|13}} | ||
+ | |{{yes|meeting}} | ||
+ | |3 - 10 | ||
|meeting | |meeting | ||
− | | | + | |{{yes|reverb}} |
− | |||
− | |reverb | ||
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | | | + | |{{yes|meeting}} |
+ | |{{no|high}} | ||
|{{some|headset, lapel}} | |{{some|headset, lapel}} | ||
|{{no}} | |{{no}} | ||
|{{yes}} | |{{yes}} | ||
|{{yes}} | |{{yes}} | ||
− | |{{ | + | |{{some|ad-hoc}} |
|- | |- | ||
!NIST Meeting Pilot Corpus Speech | !NIST Meeting Pilot Corpus Speech | ||
|2004 | |2004 | ||
|meeting | |meeting | ||
− | |15 | + | |{{yes|15}} |
− | |16 | + | |{{some|16}} |
− | |7 | + | |{{yes|7}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{no|5.5 k$}} |
− | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=NIST%20Meeting purchase] | + | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=NIST%20Meeting purchase] |
− | |15 | + | [http://www.lrec-conf.org/proceedings/lrec2004/summaries/137.htm paper] |
− | |61 | + | |{{yes|15}} |
+ | |{{some|61}} | ||
|US English | |US English | ||
− | |6 | + | |{{some|6}} |
− | |meeting | + | |{{yes|meeting}} |
− | |3 | + | |3 - 9 |
|meeting | |meeting | ||
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{yes|various}} | |{{yes|various}} | ||
− | |{{yes|walk}} | + | |{{yes|head, walk}} |
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{some|headset, lapel}} | |{{some|headset, lapel}} | ||
|{{no}} | |{{no}} | ||
Line 527: | Line 661: | ||
|2004 - 2007 | |2004 - 2007 | ||
|seminar, meeting | |seminar, meeting | ||
− | |60 | + | |{{yes|60}} |
− | |44 | + | |{{yes|44}} |
− | |79 | + | |{{yes|79 - 147}} |
− | |6 | + | |{{yes|6 - 9}} |
− | | | + | |{{no|3.5 k€}} |
− | |[http://catalog.elra.info/search.php purchase] | + | |[http://catalog.elra.info/search.php purchase] |
+ | [http://link.springer.com/article/10.1007%2Fs10579-007-9054-4 paper] | ||
|{{dunno}} | |{{dunno}} | ||
|{{dunno}} | |{{dunno}} | ||
|non-native English | |non-native English | ||
|{{dunno}} | |{{dunno}} | ||
+ | |{{yes|seminar, meeting}} | ||
+ | |3 - 20 | ||
|seminar, meeting | |seminar, meeting | ||
− | | | + | |{{yes|reverb}} |
− | |||
− | |reverb | ||
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |meeting (scenarized) | + | |{{some|meeting (scenarized)}} |
+ | |{{no|high}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{yes}} | |{{yes}} | ||
Line 555: | Line 691: | ||
|public spaces, domestic, office, car | |public spaces, domestic, office, car | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 | + | |{{some|16}} |
− | |3 | + | |{{some|3}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{no|75 k€ per lang}} |
− | |[http://catalog.elra.info/search.php purchase] | + | |[http://catalog.elra.info/search.php purchase] |
+ | [http://www.lrec-conf.org/proceedings/lrec2002/sumarios/177.htm paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |600 per lang | + | |{{yes|600 per lang}} |
|various | |various | ||
|{{dunno}} | |{{dunno}} | ||
− | |command, read, spontaneous | + | |{{yes|command, read, spontaneous}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | + | |{{yes|reverb}} | |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |various real environments | + | |{{yes|various real environments}} |
+ | |{{some|medium}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
Line 582: | Line 720: | ||
|car | |car | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://research.nii.ac.jp/src/en/CENSREC-2.html download] | + | |[http://research.nii.ac.jp/src/en/CENSREC-2.html download] |
+ | [http://www.isca-speech.org/archive/interspeech_2006/i06_1726.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |214 | + | |{{yes|214}} |
|Japanese | |Japanese | ||
− | |0.01 | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | + | |{{yes|reverb}} | |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |car | + | |{{yes|car}} |
+ | |{{yes|low}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
Line 609: | Line 749: | ||
|car | |car | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{some|21 k¥}} |
− | |[http://research.nii.ac.jp/src/en/CENSREC-3.html purchase] | + | |[http://research.nii.ac.jp/src/en/CENSREC-3.html purchase] |
+ | [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15050/1/429.pdf paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |311 | + | |{{yes|311}} |
|Japanese | |Japanese | ||
− | |0.05 | + | |{{no|0.05}} |
− | |read | + | |{{some|read}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | + | |{{yes|reverb}} | |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |car | + | |{{yes|car}} |
+ | |{{yes|low}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
Line 636: | Line 778: | ||
|public spaces, domestic, office, car | |public spaces, domestic, office, car | ||
|{{dunno}} | |{{dunno}} | ||
− | |8 | + | |{{no|8}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free given TIDigits | + | |{{some|free given TIDigits (0.5 k$)}} |
− | |[http:// | + | |[http://catalog.elra.info/product_info.php?cPath=37_40&products_id=1015 purchase] (incl. HTK) |
+ | [http://aurora.hsnr.de/aurora-5/reports.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |225 | + | |{{yes|225}} |
|US English | |US English | ||
− | |0.01 | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |no, simulated rir, real rir | + | |{{some|no, simulated rir, real rir}} |
|{{no|loudspeaker}} | |{{no|loudspeaker}} | ||
− | |{{ | + | |{{no|fixed}} |
|{{no}} | |{{no}} | ||
− | |various real environments | + | |{{some|various real environments (rescaled)}} |
+ | |{{yes|low}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{no}} | |{{no}} | ||
Line 662: | Line 806: | ||
|2006 | |2006 | ||
|meeting | |meeting | ||
− | |100 | + | |{{yes|100}} |
− | |16 | + | |{{some|16}} |
− | |16 | + | |{{yes|16}} |
− | |6 | + | |{{yes|6}} |
− | |free | + | |{{yes|free}} |
− | |[http://groups.inf.ed.ac.uk/ami/ download] | + | |[http://groups.inf.ed.ac.uk/ami/ download] |
+ | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4538700 paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |189 | + | |{{yes|189}} |
− | |UK English | + | |UK English, other English |
− | |8 | + | |{{some|8}} |
− | |meeting | + | |{{yes|meeting}} |
− | |4 (18% overlap) | + | |most often 4 |
− | | | + | |meeting (18% overlap) |
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{some|headset, lapel}} | |{{some|headset, lapel}} | ||
|{{yes}} | |{{yes}} | ||
Line 688: | Line 834: | ||
!PASCAL SSC | !PASCAL SSC | ||
|2006 | |2006 | ||
− | | | + | |speech overlap |
− | |8.8 | + | |{{some|8.8}} |
− | |25 | + | |{{yes|25}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[ | + | |[http://staffwww.dcs.shef.ac.uk/people/M.Cooke/SpeechSeparationChallenge.htm download] |
− | |8.8 | + | [http://www.sciencedirect.com/science/article/pii/S0885230809000205 paper] |
− | |34 | + | |{{some|8.8}} |
+ | |{{some|34}} | ||
|UK English | |UK English | ||
− | |0.05 | + | |{{no|0.05}} |
− | |command | + | |{{no|command}} |
|2 | |2 | ||
|full | |full | ||
Line 707: | Line 854: | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
+ | |{{n/s}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{n/s}} | |{{n/s}} | ||
Line 716: | Line 864: | ||
|2007 | |2007 | ||
|airplane | |airplane | ||
− | |21 | + | |{{yes|21}} |
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{some|0.05 k€}} |
− | |[http://catalog.elra.info/product_info.php?products_id=1088&language=en purchase] | + | |[http://catalog.elra.info/product_info.php?products_id=1088&language=en purchase] |
− | |21 | + | [http://cvsp.cs.ntua.gr/projects/pub/HIWIRE/WebHome/HIWIRE_db_description_paper.pdf paper] |
− | |81 | + | |{{yes|21}} |
+ | |{{some|81}} | ||
|non-native English | |non-native English | ||
− | |0.1 | + | |{{no|0.1}} |
− | |command | + | |{{no|command}} |
|1 | |1 | ||
+ | |no | ||
|{{no}} | |{{no}} | ||
+ | |{{yes|human}} | ||
+ | |{{n/s}} | ||
|{{no}} | |{{no}} | ||
+ | |{{some|airplane (rescaled)}} | ||
+ | |{{yes|low}} | ||
+ | |{{yes|original}} | ||
+ | |{{n/s}} | ||
+ | |{{yes}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !NOIZEUS | ||
+ | |2007 | ||
+ | |public spaces | ||
+ | |{{no|0.6}} | ||
+ | |{{no|8}} | ||
+ | |{{no|1}} | ||
+ | |{{no}} | ||
+ | |{{yes|free}} | ||
+ | |[http://ecs.utdallas.edu/loizou/speech/noizeus/ download] | ||
+ | [http://www.sciencedirect.com/science/article/pii/S0167639306001920 paper] | ||
+ | |{{no|0.6}} | ||
+ | |{{no|6}} | ||
+ | |US English | ||
+ | |{{no|0.1}} | ||
+ | |{{some|read}} | ||
+ | |1 | ||
+ | |no | ||
+ | |{{some|simulated phone}} | ||
|{{yes|human}} | |{{yes|human}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |{{ | + | |{{no}} |
− | | | + | |{{some|various real environments (rescaled)}} |
+ | |{{yes|low}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |{{ | + | |{{no}} |
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
Line 743: | Line 922: | ||
|2007 | |2007 | ||
|car | |car | ||
− | |40 | + | |{{yes|40}} |
− | |25 | + | |{{yes|25}} |
− | |5 | + | |{{yes|5}} |
− | |2 | + | |{{yes|2}} |
− | | | + | |{{no|25 k$}} |
− | |[http://crss.utdallas.edu/ download] | + | |[http://crss.utdallas.edu/ download] |
− | |40 | + | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4290175 paper] |
− | |25 | + | |{{yes|40}} |
+ | |{{some|25}} | ||
|US English | |US English | ||
− | |2.4 | + | |{{some|2.4}} |
− | |command, | + | |{{yes|command, dialog}} |
− | |1 | + | |1 - 2 |
− | | | + | |dialog |
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |car | + | |{{yes|car}} |
+ | |{{yes|low}} | ||
|{{some|headset (low quality)}} | |{{some|headset (low quality)}} | ||
|{{no}} | |{{no}} | ||
Line 767: | Line 948: | ||
|{{no}} | |{{no}} | ||
|- | |- | ||
− | !SASSEC, SiSEC | + | !SASSEC, SiSEC under- determined |
|2007 - 2011 | |2007 - 2011 | ||
|cocktail party | |cocktail party | ||
− | |0.3 | + | |{{no|0.3}} |
− | |16 | + | |{{some|16}} |
− | |2 | + | |{{some|2}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures download] | + | |[http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures download] |
− | |0.3 | + | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
− | |16 | + | |{{no|0.3}} |
+ | |{{some|16}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |read | + | |{{some|read}} |
− | |3 | + | |3 - 4 |
|full | |full | ||
− | |simulated rir, real rir, reverb | + | |{{yes|simulated rir, real rir, reverb}} |
|{{no|no, loudspeaker}} | |{{no|no, loudspeaker}} | ||
|{{no|fixed}} | |{{no|fixed}} | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
+ | |{{n/s}} | ||
|{{yes|original, spatial image}} | |{{yes|original, spatial image}} | ||
|{{yes}} | |{{yes}} | ||
Line 796: | Line 979: | ||
!MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData | !MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData | ||
|2007 - 2014 | |2007 - 2014 | ||
− | | | + | |speech overlap |
− | |10 | + | |{{yes|10}} |
− | |16 | + | |{{some|16}} |
− | |8 | + | |{{yes|8 - 40}} |
− | |{{ | + | |{{some|partial}} |
− | | | + | |{{some|1.5 k$}} |
− | |[https://catalog.ldc.upenn.edu/LDC2014S03 purchase] | + | |[https://catalog.ldc.upenn.edu/LDC2014S03 purchase] |
+ | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=1566470 paper] | ||
+ | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6639033 paper] | ||
+ | [http://www.cstr.ed.ac.uk/corpora/2012_MMA/ info] | ||
+ | [http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=8J_nG0wAAAAJ&citation_for_view=8J_nG0wAAAAJ:08ZZubdj9fEC video] | ||
+ | [http://reverb2014.dereverberation.com/tools/REVERB_TOOLS_FOR_ASR_ver2.0.tgz HTK] | ||
+ | [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz Kaldi] | ||
+ | [http://reverb2014.dereverberation.com/result_se.html results] | ||
+ | [http://reverb2014.dereverberation.com/result_asr.html results] | ||
|{{dunno}} | |{{dunno}} | ||
− | |45 | + | |{{some|45}} |
|UK English | |UK English | ||
− | |10 | + | |{{yes|10}} |
− | |read | + | |{{some|read}} |
− | |1 | + | |1 - 2 |
|full | |full | ||
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{yes|various}} | |{{yes|various}} | ||
− | |{{yes|walk}} | + | |{{yes|head, walk}} |
− | |stationary background | + | |{{yes|stationary background}} |
+ | |{{no|high}} | ||
|{{some|headset, lapel}} | |{{some|headset, lapel}} | ||
|{{yes}} | |{{yes}} | ||
Line 825: | Line 1,017: | ||
|public spaces, domestic, office, car | |public spaces, domestic, office, car | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] | + | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] |
+ | [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |214 | + | |{{yes|214}} |
|Japanese | |Japanese | ||
− | |0.01 | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |real rir | + | |{{some|real rir}} |
|{{some|dummy}} | |{{some|dummy}} | ||
|{{no|fixed}} | |{{no|fixed}} | ||
|{{no}} | |{{no}} | ||
− | |various real environments | + | |{{some|various real environments (rescaled)}} |
+ | |{{yes|low}} | ||
|{{yes|original}} | |{{yes|original}} | ||
|{{no}} | |{{no}} | ||
Line 852: | Line 1,046: | ||
|public spaces, domestic, office, car | |public spaces, domestic, office, car | ||
|{{dunno}} | |{{dunno}} | ||
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] | + | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] |
+ | [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
− | |10 | + | |{{some|10}} |
|Japanese | |Japanese | ||
− | |0.01 | + | |{{no|0.01}} |
− | |digits | + | |{{no|digits}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | + | |{{yes|reverb}} | |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |various real environments | + | |{{yes|various real environments}} |
+ | |{{yes|low}} | ||
|{{some|headset}} | |{{some|headset}} | ||
|{{no}} | |{{no}} | ||
Line 878: | Line 1,074: | ||
|2008 | |2008 | ||
|domestic | |domestic | ||
− | |6 | + | |{{some|6}} |
− | |48 | + | |{{yes|48}} |
− | |16 | + | |{{yes|16}} |
− | |2 | + | |{{yes|2}} |
− | |free | + | |{{yes|free}} |
− | |[http://shine.fbk.eu/resources/dicit-acoustic-woz-data download] | + | |[http://shine.fbk.eu/resources/dicit-acoustic-woz-data download] |
− | |1 | + | [http://www.lrec-conf.org/proceedings/lrec2008/summaries/584.html paper] |
+ | |{{some|1}} | ||
|{{dunno}} | |{{dunno}} | ||
|Italian | |Italian | ||
|{{dunno}} | |{{dunno}} | ||
− | |command | + | |{{no|command}} |
|4 | |4 | ||
− | |{{ | + | |no |
− | + | |{{yes|reverb}} | |
|{{yes|human}} | |{{yes|human}} | ||
|{{yes|various}} | |{{yes|various}} | ||
− | |{{yes|walk}} | + | |{{yes|head, walk}} |
− | |domestic (scenarized) | + | |{{some|domestic (scenarized)}} |
+ | |{{some|medium}} | ||
|{{some|headset, tv}} | |{{some|headset, tv}} | ||
|{{yes}} | |{{yes}} | ||
Line 904: | Line 1,102: | ||
!SiSEC head-geometry | !SiSEC head-geometry | ||
|2008 | |2008 | ||
− | | | + | |speech overlap |
− | |1.9 | + | |{{some|1.9}} |
− | |16 | + | |{{some|16}} |
− | |2 | + | |{{some|2}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions download] | + | |[http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions download] |
− | |1.9 | + | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
+ | |{{some|1.9}} | ||
|{{dunno}} | |{{dunno}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |read | + | |{{some|read}} |
|2 | |2 | ||
|full | |full | ||
− | |real rir | + | |{{some|real rir}} |
|{{no|loudspeaker}} | |{{no|loudspeaker}} | ||
|{{yes|various}} | |{{yes|various}} | ||
|{{no}} | |{{no}} | ||
|{{no}} | |{{no}} | ||
+ | |{{n/s}} | ||
|{{yes|original, spatial image}} | |{{yes|original, spatial image}} | ||
|{{yes}} | |{{yes}} | ||
Line 931: | Line 1,131: | ||
!COSINE | !COSINE | ||
|2009 | |2009 | ||
− | | | + | |dialog |
− | |38 | + | |{{yes|38}} |
− | |48 | + | |{{yes|48}} |
− | |20 | + | |{{yes|20}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://melodi.ee.washington.edu/cosine/ download] | + | |[http://melodi.ee.washington.edu/cosine/ download] |
− | |11 | + | [http://www.sciencedirect.com/science/article/pii/S0885230811000143 paper] |
− | |91 | + | |{{yes|11}} |
+ | |{{some|91}} | ||
|US English, non-native English | |US English, non-native English | ||
− | |5 | + | |{{some|5}} |
− | | | + | |{{yes|spontaneous}} |
− | |2 | + | |2 - 7 |
− | | | + | |dialog |
− | |reverb | + | |{{yes|reverb}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{yes|various}} | |{{yes|various}} | ||
− | |{{yes|walk}} | + | |{{yes|head, walk}} |
− | |various real environments | + | |{{yes|various real environments}} |
+ | |{{yes|low}} | ||
|{{some|headset, throat mic}} | |{{some|headset, throat mic}} | ||
|{{no}} | |{{no}} | ||
Line 959: | Line 1,161: | ||
|2010 | |2010 | ||
|public spaces | |public spaces | ||
− | |0.3 | + | |{{no|0.3}} |
− | |16 | + | |{{some|16}} |
− | |2 | + | |{{yes|2 - 4}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise download] | + | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise download] |
− | |0.3 | + | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
− | |6 | + | |{{no|0.3}} |
+ | |{{no|6}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |read | + | |{{some|read}} |
− | |1 | + | |1 - 3 |
|full | |full | ||
− | |no, reverb (other room) | + | |{{some|no, reverb (other room)}} |
|{{no|loudspeaker}} | |{{no|loudspeaker}} | ||
|{{yes|various}} | |{{yes|various}} | ||
|{{no}} | |{{no}} | ||
− | |various real environments | + | |{{some|various real environments (rescaled)}} |
+ | |{{yes|low}} | ||
|{{yes|original, spatial image}} | |{{yes|original, spatial image}} | ||
|{{yes}} | |{{yes}} | ||
Line 986: | Line 1,190: | ||
|2010 - 2011 | |2010 - 2011 | ||
|cocktail party | |cocktail party | ||
− | |0.2 | + | |{{no|0.2}} |
− | |16 | + | |{{some|16}} |
− | |2 | + | |{{yes|2 - 4}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions download] | + | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions download] |
− | |0.2 | + | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
+ | |{{no|0.2}} | ||
|{{dunno}} | |{{dunno}} | ||
|{{n/s}} | |{{n/s}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |read | + | |{{some|read}} |
− | | | + | |{{dunno}} |
− | |full | + | |full (2 at a time) |
− | |reverb | + | |{{yes|reverb}} |
|{{no|loudspeaker}} | |{{no|loudspeaker}} | ||
|{{yes|various}} | |{{yes|various}} | ||
|{{some|simulated}} | |{{some|simulated}} | ||
|{{no}} | |{{no}} | ||
+ | |{{n/s}} | ||
|{{yes|original, spatial image}} | |{{yes|original, spatial image}} | ||
|{{yes}} | |{{yes}} | ||
Line 1,013: | Line 1,219: | ||
|2011 - 2012 | |2011 - 2012 | ||
|domestic | |domestic | ||
− | |70 | + | |{{yes|70}} |
− | |16 | + | |{{yes|16 - 48}} |
− | |2 | + | |{{some|2}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{yes|free}} |
− | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html download] | + | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/chime2_task1.html download] |
− | |12 | + | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] |
− | |34 | + | [http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/chime2_task1.html#tools HTK] |
+ | [http://spandh.dcs.shef.ac.uk/projects/chime/PCC/results.html results] | ||
+ | [http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/track1_results.html results] | ||
+ | |{{yes|12}} | ||
+ | |{{some|34}} | ||
|UK English | |UK English | ||
− | |0.05 | + | |{{no|0.05}} |
− | |command | + | |{{no|command}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |real rir | + | |{{some|real rir}} |
|{{some|dummy}} | |{{some|dummy}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{some|simulated head}} | |{{some|simulated head}} | ||
− | |domestic | + | |{{yes|domestic (added without rescaling)}} |
+ | |{{yes|low}} | ||
|{{yes}} | |{{yes}} | ||
|{{yes}} | |{{yes}} | ||
Line 1,040: | Line 1,251: | ||
|2012 | |2012 | ||
|domestic | |domestic | ||
− | |78 | + | |{{yes|78}} |
− | |16 | + | |{{some|16}} |
− | |2 | + | |{{some|2}} |
|{{no}} | |{{no}} | ||
− | |free given WSJ0 | + | |{{some|free given WSJ0 (1.5 k$)}} |
− | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html download] | + | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/chime2_task2.html download] |
− | |33 | + | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] |
− | |101 | + | [http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/chime2_task2.html#tools HTK] |
+ | [http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/WSJ0public/CHiME2012-WSJ0-Kaldi_0.03.tar.gz Kaldi] | ||
+ | [http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/track2_results.html results] | ||
+ | |{{yes|33}} | ||
+ | |{{yes|101}} | ||
|US English | |US English | ||
− | |11 | + | |{{yes|11}} |
− | |read | + | |{{some|read}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |real rir | + | |{{some|real rir}} |
|{{some|dummy}} | |{{some|dummy}} | ||
|{{no|fixed}} | |{{no|fixed}} | ||
|{{no}} | |{{no}} | ||
− | |domestic | + | |{{yes|domestic (added without rescaling)}} |
+ | |{{yes|low}} | ||
|{{yes}} | |{{yes}} | ||
|{{yes}} | |{{yes}} | ||
Line 1,066: | Line 1,282: | ||
!ETAPE | !ETAPE | ||
|2012 | |2012 | ||
− | |TV/radio debates, outdoor interviews | + | |TV/radio debates, outdoor interviews |
− | |42 | + | |{{yes|42}} |
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
− | |1 | + | |{{some|1}} |
|{{dunno}} | |{{dunno}} | ||
− | |[ | + | |[http://www.afcp-parole.org/etape.html download] |
− | |32 | + | [http://www.lrec-conf.org/proceedings/lrec2012/summaries/495.html paper] |
− | |347 | + | |{{yes|32}} |
+ | |{{yes|347}} | ||
|French | |French | ||
− | |16 | + | |{{yes|16}} |
− | | | + | |{{yes|spontaneous}} |
− | |1 or more (up to 10% overlap) | + | |1 or more |
− | | | + | |dialog (up to 10% overlap) |
− | |some | + | |{{yes|reverb (some)}} |
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |various real environments | + | |{{yes|various real environments}} |
+ | |{{no|high}} | ||
|{{no}} | |{{no}} | ||
|{{n/s}} | |{{n/s}} | ||
Line 1,091: | Line 1,309: | ||
|{{yes}} | |{{yes}} | ||
|- | |- | ||
− | !GALE | + | !GALE |
|2013 | |2013 | ||
− | |TV | + | |TV dialog |
− | |120 | + | |{{yes|120 - 251 per lang}} |
− | |16 | + | |{{some|16}} |
− | |1 | + | |{{no|1}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{no|3.5 - 7 k$ per lang}} |
− | |[https://catalog.ldc.upenn.edu/ | + | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=GALE purchase] |
− | |108 | + | |{{yes|108 - 234 per lang}} |
|{{dunno}} | |{{dunno}} | ||
− | |Mandarin | + | |Mandarin, Arabic |
|{{dunno}} | |{{dunno}} | ||
− | | | + | |{{yes|spontaneous}} |
|1 or more | |1 or more | ||
− | | | + | |dialog |
|{{no}} | |{{no}} | ||
|{{yes|human}} | |{{yes|human}} | ||
|{{some|quasi-fixed}} | |{{some|quasi-fixed}} | ||
|{{yes|head}} | |{{yes|head}} | ||
− | |||
|{{no}} | |{{no}} | ||
|{{n/s}} | |{{n/s}} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
|{{no}} | |{{no}} | ||
|{{n/s}} | |{{n/s}} | ||
Line 1,148: | Line 1,340: | ||
|2013 | |2013 | ||
|domestic, office | |domestic, office | ||
− | |25 | + | |{{yes|25}} |
− | |16 | + | |{{some|16}} |
− | |8 | + | |{{yes|8}} |
|{{no}} | |{{no}} | ||
− | |free given WSJCAM0 | + | |{{some|free given WSJCAM0 (1.75 k$)}} |
− | |[http://reverb2014.dereverberation.com/ purchase] | + | |[http://reverb2014.dereverberation.com/ purchase] |
− | |25 | + | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6701894 paper] |
− | |130 | + | [http://reverb2014.dereverberation.com/tools/REVERB_TOOLS_FOR_ASR_ver2.0.tgz HTK] |
+ | [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz Kaldi] | ||
+ | [http://reverb2014.dereverberation.com/result_se.html results] | ||
+ | [http://reverb2014.dereverberation.com/result_asr.html results] | ||
+ | |{{yes|25}} | ||
+ | |{{yes|130}} | ||
|UK English | |UK English | ||
− | |10 | + | |{{yes|10}} |
− | |read | + | |{{some|read}} |
|1 | |1 | ||
− | |{{ | + | |no |
− | |real rir | + | |{{some|real rir}} |
|{{no|loudspeaker}} | |{{no|loudspeaker}} | ||
− | |{{ | + | |{{yes|various}} |
|{{no}} | |{{no}} | ||
− | | | + | |{{some|random noise}} |
+ | |{{no|high}} | ||
|{{yes|original, spatial image}} | |{{yes|original, spatial image}} | ||
|{{yes}} | |{{yes}} | ||
Line 1,171: | Line 1,369: | ||
|{{no}} | |{{no}} | ||
|{{yes}} | |{{yes}} | ||
+ | |- | ||
+ | !Sheffield Wargames Corpus | ||
+ | |2013 | ||
+ | |cocktail party | ||
+ | |{{some|7}} | ||
+ | |{{yes|48}} | ||
+ | |{{yes|92}} | ||
+ | |{{yes|3}} | ||
+ | |{{yes|free}} | ||
+ | |[http://mini.dcs.shef.ac.uk/data-2/ download] | ||
+ | [http://www.isca-speech.org/archive/interspeech_2013/i13_1116.html paper] | ||
+ | |{{dunno}} | ||
+ | |{{no|9}} | ||
+ | |UK English | ||
+ | |{{dunno}} | ||
+ | |{{yes|spontaneous}} | ||
+ | |4 | ||
+ | |multiple dialogs | ||
+ | |{{yes|reverb}} | ||
+ | |{{yes|human}} | ||
+ | |{{yes|various}} | ||
+ | |{{yes|head, walk}} | ||
+ | |{{yes|background music}} | ||
+ | |{{some|medium}} | ||
+ | |{{some|headset}} | ||
+ | |{{yes}} | ||
+ | |{{yes}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
|- | |- | ||
!DIRHA | !DIRHA | ||
|2014 | |2014 | ||
|domestic | |domestic | ||
− | | | + | |{{yes|11}} |
− | |48 | + | |{{yes|48}} |
− | |40 | + | |{{yes|40}} |
|{{no}} | |{{no}} | ||
− | |free | + | |{{some|free (partial avail.)}} |
− | |[http://shine.fbk.eu/resources/dirha-ii-simulated-corpus download] | + | |[http://shine.fbk.eu/resources/dirha-ii-simulated-corpus download] |
− | | | + | [http://www.lrec-conf.org/proceedings/lrec2014/summaries/650.html paper] |
− | | | + | |{{some|4}} |
− | | | + | |{{some|90}} |
− | | | ||
|various | |various | ||
+ | |{{some|3.8}} | ||
+ | |{{yes|command, read, spontaneous}} | ||
|1 or more | |1 or more | ||
|simulated | |simulated | ||
− | |real rir | + | |{{some|real rir}} |
|{{no|loudspeaker}} | |{{no|loudspeaker}} | ||
|{{yes|various}} | |{{yes|various}} | ||
|{{no}} | |{{no}} | ||
− | |domestic ( | + | |{{yes|domestic (added without rescaling)}} |
+ | |{{yes|low}} | ||
|{{yes}} | |{{yes}} | ||
|{{yes}} | |{{yes}} | ||
Line 1,198: | Line 1,427: | ||
|{{no}} | |{{no}} | ||
|{{yes}} | |{{yes}} | ||
+ | |- | ||
+ | !CHiME 3 | ||
+ | |2015 | ||
+ | |public spaces | ||
+ | |{{yes|48}} | ||
+ | |{{some|16}} | ||
+ | |{{yes|6}} | ||
+ | |{{no}} | ||
+ | |{{some|free given WSJ0 (1.5 k$)}} | ||
+ | |[http://spandh.dcs.shef.ac.uk/chime_challenge/download.html download] | ||
+ | [https://hal.inria.fr/hal-01211376 paper] | ||
+ | |{{yes|28}} | ||
+ | |{{yes|113}} | ||
+ | |US English | ||
+ | |{{yes|11}} | ||
+ | |{{some|read}} | ||
+ | |1 | ||
+ | |no | ||
+ | |{{yes|simulated, reverb}} | ||
+ | |{{yes|human}} | ||
+ | |{{yes|various}} | ||
+ | |{{yes|head}} | ||
+ | |{{yes|various real environments}} | ||
+ | |{{yes|low}} | ||
+ | |{{some|headset}} | ||
+ | |{{no}} | ||
+ | |{{yes}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
|} | |} | ||
− | = | + | <span id="speech_attributes"></span> |
+ | '''General attributes''': | ||
+ | * year of release | ||
+ | * scenario: car, cocktail party, domestic, lecture, meeting, office, public space, TV... | ||
+ | * total duration (h) (multiple channels counted only once) | ||
+ | * sampling rate (kHz) | ||
+ | * number of distant or noisy microphones | ||
+ | * number of video cameras | ||
+ | * cost for non-members of ELRA and LDC (cost for members is lower or free) | ||
+ | * links: download data, reference papers, software baselines, evaluation results... | ||
+ | '''Speech attributes''': | ||
+ | * duration of speech (h) (overlapping speech counted only once) | ||
+ | * number of unique speakers | ||
+ | * language | ||
+ | * number of unique words (differs from assumed vocabulary size, which is somewhat arbitrary) | ||
+ | * speaking style: digits, command, read, spontaneous... | ||
+ | * number of speakers present in the room | ||
+ | * type of speaker overlap: no overlap, simulated overlap, dialog, meeting, full overlap... | ||
+ | '''Channel attributes''': | ||
+ | * channel type: none, simulated room impulse response, convolution by a recorded room impulse response, reverberant recording... | ||
+ | * speaker radiation: loudspeaker, dummy head with mouth simulator, human... | ||
+ | * speaker location: at a fixed position in the room, at a quasi-fixed position (e.g., seated), at different positions... | ||
+ | * speaker movements: no movement, head movements, walking... | ||
+ | '''Noise attributes''': | ||
+ | * noise type: stationary background noise (e.g., air-conditioning), car noise, meeting noises, domestic noises, outdoor noises... | ||
+ | '''Available ground truth''': | ||
+ | * reference speech signal: original (at the mouth), headset or lapel (slightly differs from the signal at the mouth), spatial image (at the microphones)... | ||
+ | * speaker location and orientation | ||
+ | * words uttered | ||
+ | * paralinguistic attributes: nodding, gaze, communication intent, emotion... (excluding speaker attributes such as age, gender, or native language) | ||
+ | * noise events: type and time of individual noise events | ||
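The "duration of speech" attribute counts overlapping speech only once. A minimal sketch of that counting rule is given below, assuming that speech activity is available as a list of (start, end) times in seconds pooled over all speakers. The function name and the segment format are illustrative assumptions; the figures in the table were not necessarily computed this way.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # hypothetical format: (start, end) in seconds


def speech_duration_hours(segments: List[Segment]) -> float:
    """Total speech time in hours, counting overlapping speech only once."""
    total = 0.0
    cur_start = cur_end = None
    # Sort by start time so that overlapping segments become adjacent.
    for start, end in sorted(segments):
        if end <= start:
            continue  # skip empty or malformed segments
        if cur_end is None or start > cur_end:
            # Gap before this segment: close the current merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlap with the current interval: extend it instead of adding.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total / 3600.0


# Two speakers overlap between 900 s and 1800 s; that span is counted once.
example = [(0.0, 1800.0), (900.0, 2700.0), (3600.0, 5400.0)]
print(speech_duration_hours(example))  # 1.25
```

The same merging rule applies to the "total duration" attribute if each channel is treated as a segment covering the same recording time, since multiple channels are counted only once.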
− | + | == [[Impulse response datasets]] == | |
+ | The table below provides a list of impulse response (IR) datasets with detailed attributes. The meaning of each attribute is detailed [[#ir_attributes|below]]. | ||
− | + | Disclaimer: Only datasets that are '''publicly available''' and include some '''reverberation''' (not only HRTFs) are listed. | |
− | |||
− | + | {| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | |
− | + | |- | |
− | + | !style="width: 40px" rowspan="2" class="unsortable"|Datasets | |
− | + | !colspan="7" |General attributes | |
− | + | !colspan="8" |Channel | |
− | + | !style="width: 40px" rowspan="2" |Room noise | |
− | + | |- | |
− | + | !scope="col" width="40px" | rel. year | |
− | + | !scope="col" width="40px" | envir. | |
− | + | !scope="col" width="40px" | total IRs | |
− | + | !scope="col" width="40px" | sam. rate (kHz) | |
− | + | !scope="col" width="40px" | mics | |
− | + | !scope="col" width="40px" | cost | |
− | + | !scope="col" width="40px" class="unsortable" | links | |
− | + | !scope="col" width="40px" | chan. type | |
− | + | !scope="col" width="40px" | rooms | |
− | + | !scope="col" width="40px" | speak. radiat. | |
− | + | !scope="col" width="40px" | speak. loc. | |
− | + | !scope="col" width="40px" | speak. moves | |
− | + | !scope="col" width="40px" | mic. direc. | |
− | + | !scope="col" width="40px" | mic. loc. | |
+ | !scope="col" width="40px" | mic. moves | ||
+ | |- | ||
+ | !RWCP Real Environment Acoustic Database | ||
+ | |2001 | ||
+ | |varechoic room, office | ||
+ | |{{some|364}} | ||
+ | |{{yes|16 - 48}} | ||
+ | |{{yes|84}} | ||
+ | |{{yes|free}} | ||
+ | |[http://research.nii.ac.jp/src/en/RWCP-SSD.html download] | ||
+ | [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/356.htm paper] | ||
+ | |{{yes|real}} | ||
+ | |{{some|7}} | ||
+ | |{{some|dummy}} | ||
+ | |{{no|9 (far)}} | ||
+ | |{{yes}} | ||
+ | |omni | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !SASSEC, SiSEC under- determined | ||
+ | |2007 - 2011 | ||
+ | |office | ||
+ | |{{dunno}} | ||
+ | |{{some|16}} | ||
+ | |{{some|2}} | ||
+ | |{{yes|free}} | ||
+ | |[http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures download] | ||
+ | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | ||
+ | |{{yes|simulated, real}} | ||
+ | |{{some|4}} | ||
+ | |{{no|no, loudspeaker}} | ||
+ | |{{dunno}} | ||
+ | |{{no}} | ||
+ | |omni | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !SiSEC head-geometry | ||
+ | |2008 | ||
+ | |office | ||
+ | |{{no|38}} | ||
+ | |{{some|16}} | ||
+ | |{{some|2}} | ||
+ | |{{some|free (partial avail.)}} | ||
+ | |[http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions download] | ||
+ | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | ||
+ | |{{yes|real}} | ||
+ | |{{no|1}} | ||
+ | |{{no|loudspeaker}} | ||
+ | |{{some|19 (far)}} | ||
+ | |{{no}} | ||
+ | |binaural | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !Aachen Impulse Response | ||
+ | |2009 - 2012 | ||
+ | |various | ||
+ | |{{some|214}} | ||
+ | |{{yes|48}} | ||
+ | |{{some|2}} | ||
+ | |{{yes|free}} | ||
+ | |[http://www.ind.rwth-aachen.de/de/forschung/tools-downloads/aachen-impulse-response-database/ download] | ||
+ | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=5201259 paper] | ||
+ | |{{yes|real}} | ||
+ | |{{some|8}} | ||
+ | |{{no|loudspeaker}} | ||
+ | |{{some|13 (far)}} | ||
+ | |{{no}} | ||
+ | |omni, binaural, phone | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !CAMIL | ||
+ | |2010 - 2012 | ||
+ | |office | ||
+ | |{{yes|32400}} | ||
+ | |{{some|16}} | ||
+ | |{{some|2}} | ||
+ | |{{yes|free}} | ||
+ | |[https://team.inria.fr/perception/the-camil-dataset/ download] | ||
+ | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637612 paper] | ||
+ | |{{yes|real}} | ||
+ | |{{no|1}} | ||
+ | |{{no|loudspeaker}} | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |binaural | ||
+ | |{{yes|16200 (close)}} | ||
+ | |{{yes}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !CHiME 2 Grid | ||
+ | |2012 | ||
+ | |domestic | ||
+ | |{{some|242}} | ||
+ | |{{yes|16 - 48}} | ||
+ | |{{some|2}} | ||
+ | |{{yes|free}} | ||
+ | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html download] | ||
+ | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] | ||
+ | |{{yes|real}} | ||
+ | |{{no|1}} | ||
+ | |{{some|dummy}} | ||
+ | |{{yes|121 (close)}} | ||
+ | |{{no}} | ||
+ | |binaural | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |{{yes}} | ||
+ | |- | ||
+ | !AVASM | ||
+ | |2013 | ||
+ | |office | ||
+ | |{{some|864}} | ||
+ | |{{some|16}} | ||
+ | |{{some|2}} | ||
+ | |{{yes|free}} | ||
+ | |[http://perception.inrialpes.fr/~Deleforge/AVASM_Dataset/ download] | ||
+ | [http://www.eurasip.org/Proceedings/Eusipco/Eusipco2014/HTML/papers/1569923293.pdf paper] | ||
+ | |{{yes|real}} | ||
+ | |{{no|1}} | ||
+ | |{{no|loudspeaker}} | ||
+ | |{{yes|432 (close)}} | ||
+ | |{{no}} | ||
+ | |binaural | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |{{no}} | ||
+ | |- | ||
+ | !DIRHA | ||
+ | |2014 | ||
+ | |domestic | ||
+ | |{{yes|9200}} | ||
+ | |{{yes|48}} | ||
+ | |{{yes|40}} | ||
+ | |{{some|free (partial avail.)}} | ||
+ | |[http://shine.fbk.eu/resources/dirha-ii-simulated-corpus download] | ||
+ | [http://www.lrec-conf.org/proceedings/lrec2014/summaries/650.html paper] | ||
+ | |{{yes|real}} | ||
+ | |{{some|5}} | ||
+ | |{{no|loudspeaker}} | ||
+ | |{{some|57 (far)}} | ||
+ | |{{no}} | ||
+ | |omni | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |{{yes}} | ||
+ | |- | ||
+ | !ACE | ||
+ | |2015 | ||
+ | |office, meeting, lecture, lobby | ||
+ | |{{some|700}} | ||
+ | |{{yes|48}} | ||
+ | |{{yes|50}} | ||
+ | |{{yes|free}} | ||
+ | |[http://www.ace-challenge.org download] | ||
+ | [http://www.ace-challenge.org paper] | ||
+ | |{{yes|real}} | ||
+ | |{{some|7}} | ||
+ | |{{no|loudspeaker}} | ||
+ | |{{no|fixed}} | ||
+ | |{{no}} | ||
+ | |omni, laptop, mobile, cruciform, linear, spherical | ||
+ | |{{some|2 (near, far)}} | ||
+ | |{{no}} | ||
+ | |{{yes|ambient, live babble, fan}} | ||
+ | |} | ||
− | + | <span id="ir_attributes"></span> | |
+ | '''General attributes''': | ||
+ | * year of release | ||
+ | * recording environment: car, domestic, lecture, meeting, office, public space... | ||
+ | * total IRs: total number of single-channel impulse responses | ||
+ | * sampling rate (kHz) | ||
+ | * number of microphones | ||
+ | * cost | ||
+ | * links: download data, reference papers, software baselines, evaluation results... | ||
+ | '''Channel attributes''': | ||
+ | * channel type: simulated or real impulse response | ||
+ | * number of rooms | ||
+ | * speaker radiation: loudspeaker, mouth simulator | ||
+ | * speaker location: at a fixed position in the room, or number of different positions (closely spaced or far) | ||
+ | * speaker movements: no movement, moves while recording | ||
+ | * microphone directivity: omnidirectional, cardioid, binaural... | ||
+ | * microphone location: at a fixed position in the room, or number of different positions (closely spaced or far) | ||
+ | * microphone movements: no movement, moves while recording | ||
+ | '''Noise attributes''': | ||
+ | * room noise: background noise recorded in the same room as the impulse responses | ||
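The "total IRs" attribute counts single-channel impulse responses, so one measurement with a two-microphone (e.g., binaural) setup contributes two IRs. The sketch below illustrates this counting under the assumption that every position is measured with every channel in every room. The function name is hypothetical, and the product only matches some rows of the table (for instance CAMIL, CHiME 2 Grid, AVASM and ACE); other rows do not follow this simple product.

```python
def total_single_channel_irs(positions: int, channels: int, rooms: int = 1) -> int:
    """Number of single-channel IRs, assuming a full rooms x positions x channels grid."""
    if positions < 0 or channels < 0 or rooms < 0:
        raise ValueError("counts must be non-negative")
    return rooms * positions * channels


# Values taken from the table above (positions, microphones, rooms):
print(total_single_channel_irs(16200, 2))        # CAMIL: 32400
print(total_single_channel_irs(121, 2))          # CHiME 2 Grid: 242
print(total_single_channel_irs(432, 2))          # AVASM: 864
print(total_single_channel_irs(2, 50, rooms=7))  # ACE: 700
```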
− | + | == [[Text datasets]] == | |
− | |||
− | + | == [[Other datasets]] == | |
+ | This section lists all other relevant datasets that have not been annotated or made publicly available yet. | ||
− | * | + | Speech datasets: |
− | + | * [http://www.iarpa.gov/index.php/research-programs/babel BABEL] (not yet available) | |
− | + | * [https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=HUB4 Broadcast news, HUB4] (no noise and 4.5% speaker overlap, less than ETAPE) | |
− | + | * [http://www.isca-speech.org/archive/interspeech_2004/i04_2789.html CIAIR In-Car Speech Database] (availability unknown) | |
− | + | * [http://bme.ccny.cuny.edu/faculty/parra/bss/ Dyrholm/Sawada/Parra] (about 1 min long) | |
− | + | * [http://www.ee.columbia.edu/~dpwe/pubs/EllisSC14-proximity.pdf NEMISIG] (unavailable) | |
− | + | * [http://cs.uef.fi/odyssey2014/program/pdfs/21.pdf NFI-FRITS] (unavailable) | |
− | + | * [http://www.darpa.mil/Our_Work/I2O/Programs/Robust_Automatic_Transcription_of_Speech_%28RATS%29.aspx RATS] (not yet available) | |
− | : | + | * Rich Transcription (RT) (dataset gathered from other sets, e.g. CHIL, ICSI, ISL, AMI...) |
− | + | * [http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=8J_nG0wAAAAJ&citation_for_view=8J_nG0wAAAAJ:08ZZubdj9fEC Settlers of Catan] (unannotated, [http://meetingdiarisation.wordpress.com/2013/05/09/ready-for-recording-settlers-of-cattan-with-the-dmma-2-and-dmma-3/ more info]) | |
− | + | * [http://scholar.google.co.uk/citations?view_op=view_citation&hl=en&user=8J_nG0wAAAAJ&citation_for_view=8J_nG0wAAAAJ:08ZZubdj9fEC Flying MEMS microphone array] (unannotated, [http://meetingdiarisation.wordpress.com/2014/08/11/flying-digital-mems-microphone-array-dmma-3/ more info]) | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | * | ||
− | |||
− | |||
− | |||
− | == [ | ||
− | |||
− | == [ | ||
== Contribute a dataset == | == Contribute a dataset == | ||
To contribute a new dataset, please | To contribute a new dataset, please | ||
− | * [[ | + | * [[Main_Page#Contribute|create an account]] and log in |
− | * go to the | + | * go to the section above corresponding to your type of dataset; if the table does not exist yet, you may create it |
− | * click on the "Edit" link at the top of the | + | * click on the "Edit" link at the top of the table and add a new line for your dataset (the lines are ordered by year of release) |
+ | * fill in all columns as completely as possible, following the detailed list of attributes below the table | ||
* click on the "Save page" link at the bottom of the page to save your modifications | * click on the "Save page" link at the bottom of the page to save your modifications | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the [[Main_Page#Working group contacts|resources sharing working group]]. | We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the [[Main_Page#Working group contacts|resources sharing working group]]. | ||
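When adding a new line to a table, the row has to follow the same wikitext layout as the existing rows: a `|-` separator, a `!` header cell with the dataset name, then one `|` cell per column, optionally wrapped in one of the templates used in the table (`yes`, `some`, `no`, `n/s`, `dunno`). The helper below is a hedged sketch for drafting such a row offline before pasting it into the editor. The function names and the example dataset are hypothetical, and the choice of template for each cell remains an editorial judgment that the code does not make.

```python
from typing import List, Tuple

# Templates that appear in the existing tables.
TEMPLATES = {"yes", "some", "no", "n/s", "dunno"}


def cell(value: str, template: str = "") -> str:
    """Format one table cell, optionally wrapped in a table template."""
    if not template:
        return "|" + value
    if template not in TEMPLATES:
        raise ValueError("unknown template: " + template)
    if value:
        return "|{{" + template + "|" + value + "}}"
    return "|{{" + template + "}}"


def dataset_row(name: str, cells: List[Tuple[str, str]]) -> str:
    """Build the wikitext for one row: separator, name header, then cells."""
    lines = ["|-", "!" + name]
    lines += [cell(value, template) for value, template in cells]
    return "\n".join(lines)


# Hypothetical dataset with only the first few columns filled in.
print(dataset_row("Example corpus", [
    ("2015", ""),       # year of release
    ("domestic", ""),   # scenario
    ("12", "yes"),      # total duration (h)
    ("", "dunno"),      # unknown attribute
]))
```

A complete row needs one cell for every column of the table concerned, in the column order given by the table header.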
− | == Contribute a | + | == Contribute a software baseline == |
− | To contribute a new | + | To contribute a new software baseline, please |
− | * [[ | + | * [[Main_Page#Contribute|create an account]] and login |
− | * go to | + | * fill an entry for your software on the [[Software]] page, if not done yet |
− | * click on the "Edit" link | + | * go to the section above corresponding to the dataset for which your baseline was designed |
+ | * click on the "Edit" link at the top of the table and add a link to your software in the corresponding "links" cell | ||
* click on the "Save page" link at the bottom of the page to save your modifications | * click on the "Save page" link at the bottom of the page to save your modifications | ||
− | Please | + | We currently cannot provide storage space for large software. Please upload your software at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the [[Main_Page#Working group contacts|resources sharing working group]]. |
− | |||
− | |||
− | |||
− | |||
− | + | == Contribute an evaluation result == | |
+ | To contribute a new research result, please | ||
+ | * [[Main_Page#Contribute|create an account]] and login | ||
+ | * go to the section above corresponding to the dataset for which this result was obtained | ||
+ | * click on the "Edit" link at the top of the table and add a link to your result in the corresponding "links" cell | ||
+ | * make sure that the link (e.g., a paper or another webpage) contains the following information: authors, link to a paper/report containing objective evaluation results, link to derived data (output transcriptions, intermediary data, etc) | ||
+ | * click on the "Save page" link at the bottom of the page to save your modifications | ||
− | + | In order to save storage space, please do not upload the paper on this wiki, but link it as much as possible from your institutional archive, from another public archive (e.g., arxiv) or from the publisher website (e.g., ieexplore). |
Latest revision as of 11:38, 3 November 2015
Speech datasets
The table below aims to provide a list of speech datasets with detailed attributes and links to software baselines and evaluation results. Each dataset may be used for one or more applications: automatic speech recognition, speaker identification and verification, source localization, speech enhancement and separation... The meaning of each attribute is detailed below.
Disclaimer: Only datasets that are publicly available, (at least partially) annotated, suitable for research on robustness, and longer than 5 min are listed. Other relevant datasets are listed below.
If you would like to refer to this table, please cite J. Le Roux and E. Vincent, "A categorization of robust speech processing datasets", Mitsubishi Electric Research Laboratories Technical Report, TR2014-116, Aug. 2014.
Column groups: General attributes (rel. year to links), Speech (speak. time to overl. type), Channel (chan. type to speak. moves), Noise (noise type, avg. SNR), Ground truth (ref. signal to noise events).
Datasets | rel. year | use case | total time (h) | sam. rate (kHz) | dist. or noisy mics | video cams | cost (non-memb) | links | speak. time (h) | uniq. speak. | lang. | uniq. words (k) | speak. style | speak. / rec. | overl. type | chan. type | speak. radiat. | speak. loc. | speak. moves | noise type | avg. SNR | ref. signal | speak. loc., orient. | words | non-verb. traits | noise events |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ShATR | 1994 | meeting | 0.6 | 48 | 3 | no | free | download | 0.6 | 5 | UK English | 1 | spontaneous | 5 | multiple dialogs | reverb | human | quasi-fixed | head | meeting | high | headset | yes | yes | no | yes |
LLSEC | 1996 | dialog | 1.4 | 16 | 4 | no | free | download | ? | 12 | N/S | N/S | read, spontaneous | 2 | dialog | reverb | human | quasi-fixed | head | hallway, restaurant (scenarized) | medium | no | yes | no | no | no |
MicArray | 1996 | office | 0.2 | 16 | 9 - 16 | no | free | download | 0.2 | 14 | US English | 0.07 | digits, command | 1 | no | reverb | human | quasi-fixed | head | stationary background | medium | headset | no | yes | no | no |
RWCP Spoken Dialog Corpus | 1996 - 1997 | dialog | 10 | 16 | 2 | no | free | download | 10 | 39 | Japanese | ? | spontaneous | 1 - 2 | dialog | reverb (low) | human | quasi-fixed | head | stationary background | high | no | no | yes | no | no |
SUSAS | 1999 | stress | ? | 8 | 1 | no | 0.5 k$ | download | ? | 36 | US English | 0.035 | command | 1 | no | reverb | human | quasi-fixed | head | stationary background | high | no | no | yes | yes | no |
Aurora-2 | 2000 | public spaces | 33 | 8 - 16 | 1 | no | free given TIDigits (0.5 k$) | purchase (incl. HTK) | 33 | 214 | US English | 0.01 | digits | 1 | no | simulated phone | human | N/S | no | various real environments (rescaled) | low | original | N/S | yes | no | yes |
SPINE1, SPINE2 | 2000 - 2001 | military | 38 | 16 | 2 | no | 7.4 k$ | purchase | ? | 100 | US English | 1 | command, spontaneous | 1 - 2 | no | simulated radio | human | quasi-fixed | head | military (rescaled) | low | no | no | yes | no | no |
Aurora-3 (subset of SpeechDat-Car) | 2000 - 2003 | car | ? | 16 | 4 | no | 1 k€ | purchase (incl. HTK) | ? | 730 | various | 0.01 | digits | 1 | no | reverb | human | quasi-fixed | head | car | low | headset | no | yes | no | no |
RWCP Meeting Speech Corpus | 2001 | meeting | 3.5 | 16 - 48 | 1 | 3 | free | download | 3.5 | ? | Japanese | ? | spontaneous | 1 - 5 | meeting | reverb (low) | human | quasi-fixed | head | stationary background | high | headset | no | yes | no | no |
RWCP Real Environment Speech Database | 2001 | domestic, office | ? | 16 - 48 | 84 | no | free | download | ? | 5 | US English, Japanese | ? | read | 1 | no | real rir, reverb | loudspeaker | various | no, pivoting arm | various (sum of events) | medium | original | yes | yes | no | yes |
SpeechDat-Car | 2001 - 2011 | car | ? | 16 | 4 | no | 39 - 182 k€ per lang | purchase | ? | 300 per lang | various | ? | digits, command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | car | low | headset | no | yes | no | no |
Aurora-4 | 2002 | public spaces | ? | 8 - 16 | 1 | no | free given WSJ0 (1.5 k$) | purchase | ? | 101 | US English | 10 | read | 1 | no | simulated phone | human | N/S | no | various real environments (rescaled) | low | original | N/S | yes | no | yes |
TED | 2002 | seminar | 47 | 16 | 1 | no | 0.5 k$ | purchase | 47 | 188 | non-native English | ? | lecture | 1 or more | seminar | reverb | human | quasi-fixed | head | stationary background | high | lapel | no | partial | no | no |
CUAVE | 2002 | speech overlap | 3 | 44 | 1 | 1 | free | download | 3 | 36 | US English | 0.01 | digits | 1 - 2 | full | reverb | human | quasi-fixed | head | stationary background | high | no | no | yes | no | no |
CU-Move Microphone Array Data | 2002 - 2011 | car | 286 | 44 | 6 - 8 | no | 25 k$ | purchase | 286 | 172 | US English | 12 | digits, command, read, dialog | 1 | no | reverb | human | quasi-fixed | head | car | low | no | no | yes | no | no |
PDA | 2003 | office | 1.6 - 3 | 11 - 16 | 1 - 4 | no | free | download | 1.6 - 3 | 11 - 16 | US English | 1 - 2 | read | 1 | no | reverb | human | quasi-fixed | head | stationary background | low | headset | no | yes | no | no |
CENSREC-1 (Aurora-2J) | 2003 | public spaces | ? | 8 | 1 | no | free | download | ? | 214 | Japanese | 0.01 | digits | 1 | no | simulated phone | human | N/S | no | various real environments (rescaled) | low | original | N/S | yes | no | yes |
AVICAR | 2004 | car | 40 | 16 | 7 | 4 | free | download | 40 | 87 | US English, non-native English | 1 | read | 1 | no | reverb | human | quasi-fixed | head | moving car, windows open or closed | low | no | no | yes | no | no |
AV16.3 | 2004 | meeting | 1.5 | 16 | 16 | 3 | free | download | 1.5 | 12 | N/S | N/S | spontaneous | 1 - 3 | full | reverb | human | various | head, walk | stationary background | high | no | partial | no | no | no |
ICSI Meeting Corpus | 2004 | meeting | 72 | 16 | 6 | no | 2.8 k$ | purchase | 72 | 53 | US English, other English | 13 | meeting | 3 - 10 | meeting | reverb | human | quasi-fixed | head | meeting | high | headset, lapel | no | yes | yes | ad-hoc |
NIST Meeting Pilot Corpus Speech | 2004 | meeting | 15 | 16 | 7 | no | 5.5 k$ | purchase | 15 | 61 | US English | 6 | meeting | 3 - 9 | meeting | reverb | human | various | head, walk | stationary background | high | headset, lapel | no | yes | no | no |
CHIL Meetings | 2004 - 2007 | seminar, meeting | 60 | 44 | 79 - 147 | 6 - 9 | 3.5 k€ | purchase | ? | ? | non-native English | ? | seminar, meeting | 3 - 20 | seminar, meeting | reverb | human | quasi-fixed | head | meeting (scenarized) | high | headset | yes | yes | yes | no |
SPEECON | 2004 - 2011 | public space, domestic, office, car | ? | 16 | 3 | no | 75 k€ per lang | purchase | ? | 600 per lang | various | ? | command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | various real environments | medium | headset | no | yes | no | no |
CENSREC-2 | 2005 | car | ? | 16 | 1 | no | free | download | ? | 214 | Japanese | 0.01 | digits | 1 | no | reverb | human | quasi-fixed | head | car | low | headset | no | yes | no | no |
CENSREC-3 | 2005 | car | ? | 16 | 1 | no | 21 k¥ | purchase | ? | 311 | Japanese | 0.05 | read | 1 | no | reverb | human | quasi-fixed | head | car | low | headset | no | yes | no | no |
Aurora-5 | 2006 | public spaces, domestic, office, car | ? | 8 | 1 | no | free given TIDigits (0.5 k$) | purchase (incl. HTK) | ? | 225 | US English | 0.01 | digits | 1 | no | no, simulated rir, real rir | loudspeaker | fixed | no | various real environments (rescaled) | low | original | no | yes | no | yes |
AMI | 2006 | meeting | 100 | 16 | 16 | 6 | free | download | ? | 189 | UK English, other English | 8 | meeting | most often 4 | meeting (18% overlap) | reverb | human | quasi-fixed | head | stationary background | high | headset, lapel | yes | yes | yes | no |
PASCAL SSC | 2006 | speech overlap | 8.8 | 25 | 1 | no | free | download | 8.8 | 34 | UK English | 0.05 | command | 2 | full | no | human | N/S | no | no | N/S | original | N/S | yes | no | no |
HIWIRE | 2007 | airplane | 21 | 16 | 1 | no | 0.05 k€ | purchase | 21 | 81 | non-native English | 0.1 | command | 1 | no | no | human | N/S | no | airplane (rescaled) | low | original | N/S | yes | no | no |
NOIZEUS | 2007 | public spaces | 0.6 | 8 | 1 | no | free | download | 0.6 | 6 | US English | 0.1 | read | 1 | no | simulated phone | human | N/S | no | various real environments (rescaled) | low | original | N/S | no | no | no |
UT-Drive | 2007 | car | 40 | 25 | 5 | 2 | 25 k$ | download | 40 | 25 | US English | 2.4 | command, dialog | 1 - 2 | dialog | reverb | human | quasi-fixed | head | car | low | headset (low quality) | no | partial | no | no |
SASSEC, SiSEC under-determined | 2007 - 2011 | cocktail party | 0.3 | 16 | 2 | no | free | download | 0.3 | 16 | N/S | N/S | read | 3 - 4 | full | simulated rir, real rir, reverb | no, loudspeaker | fixed | no | no | N/S | original, spatial image | yes | no | no | no |
MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData | 2007 - 2014 | speech overlap | 10 | 16 | 8 - 40 | partial | 1.5 k$ | purchase | ? | 45 | UK English | 10 | read | 1 - 2 | full | reverb | human | various | head, walk | stationary background | high | headset, lapel | yes | yes | no | no |
CENSREC-4 (Simulated) | 2008 | public spaces, domestic, office, car | ? | 16 | 1 | no | free | download | ? | 214 | Japanese | 0.01 | digits | 1 | no | real rir | dummy | fixed | no | various real environments (rescaled) | low | original | no | yes | no | yes |
CENSREC-4 (Real) | 2008 | public spaces, domestic, office, car | ? | 16 | 1 | no | free | download | ? | 10 | Japanese | 0.01 | digits | 1 | no | reverb | human | quasi-fixed | head | various real environments | low | headset | no | yes | no | yes |
DICIT | 2008 | domestic | 6 | 48 | 16 | 2 | free | download | 1 | ? | Italian | ? | command | 4 | no | reverb | human | various | head, walk | domestic (scenarized) | medium | headset, tv | yes | yes | no | yes |
SiSEC head-geometry | 2008 | speech overlap | 1.9 | 16 | 2 | no | free | download | 1.9 | ? | N/S | N/S | read | 2 | full | real rir | loudspeaker | various | no | no | N/S | original, spatial image | yes | no | no | no |
COSINE | 2009 | dialog | 38 | 48 | 20 | no | free | download | 11 | 91 | US English, non-native English | 5 | spontaneous | 2 - 7 | dialog | reverb | human | various | head, walk | various real environments | low | headset, throat mic | no | yes | no | no |
SiSEC real-world noise | 2010 | public spaces | 0.3 | 16 | 2 - 4 | no | free | download | 0.3 | 6 | N/S | N/S | read | 1 - 3 | full | no, reverb (other room) | loudspeaker | various | no | various real environments (rescaled) | low | original, spatial image | yes | no | no | no |
SiSEC dynamic | 2010 - 2011 | cocktail party | 0.2 | 16 | 2 - 4 | no | free | download | 0.2 | ? | N/S | N/S | read | ? | full (2 at a time) | reverb | loudspeaker | various | simulated | no | N/S | original, spatial image | yes | no | no | no |
CHiME 1, CHiME 2 Grid | 2011 - 2012 | domestic | 70 | 16 - 48 | 2 | no | free | download | 12 | 34 | UK English | 0.05 | command | 1 | no | real rir | dummy | quasi-fixed | simulated head | domestic (added without rescaling) | low | yes | yes | yes | no | no |
CHiME 2 WSJ0 | 2012 | domestic | 78 | 16 | 2 | no | free given WSJ0 (1.5 k$) | download | 33 | 101 | US English | 11 | read | 1 | no | real rir | dummy | fixed | no | domestic (added without rescaling) | low | yes | yes | yes | no | no |
ETAPE | 2012 | TV/radio debates, outdoor interviews | 42 | 16 | 1 | 1 | ? | download | 32 | 347 | French | 16 | spontaneous | 1 or more | dialog (up to 10% overlap) | reverb (some) | human | quasi-fixed | head | various real environments | high | no | N/S | yes | no | yes |
GALE | 2013 | TV dialog | 120 - 251 per lang | 16 | 1 | no | 3.5 - 7 k$ per lang | purchase | 108 - 234 per lang | ? | Mandarin, Arabic | ? | spontaneous | 1 or more | dialog | no | human | quasi-fixed | head | no | N/S | no | N/S | yes | no | no |
REVERB SimData | 2013 | domestic, office | 25 | 16 | 8 | no | free given WSJCAM0 (1.75 k$) | purchase | 25 | 130 | UK English | 10 | read | 1 | no | real rir | loudspeaker | various | no | random noise | high | original, spatial image | yes | yes | no | yes |
Sheffield Wargames Corpus | 2013 | cocktail party | 7 | 48 | 92 | 3 | free | download | ? | 9 | UK English | ? | spontaneous | 4 | multiple dialogs | reverb | human | various | head, walk | background music | medium | headset | yes | yes | no | no |
DIRHA | 2014 | domestic | 11 | 48 | 40 | no | free (partial avail.) | download | 4 | 90 | various | 3.8 | command, read, spontaneous | 1 or more | simulated | real rir | loudspeaker | various | no | domestic (added without rescaling) | low | yes | yes | yes | no | yes |
CHiME 3 | 2015 | public spaces | 48 | 16 | 6 | no | free given WSJ0 (1.5 k$) | download | 28 | 113 | US English | 11 | read | 1 | no | simulated, reverb | human | various | head | various real environments | low | headset | no | yes | no | no |
General attributes:
- year of release
- scenario: car, cocktail party, domestic, lecture, meeting, office, public space, TV...
- total duration (h) (multiple channels counted only once)
- sampling rate (kHz)
- number of distant or noisy microphones
- number of video cameras
- cost for non-members of ELRA and LDC (cost for members is lower or free)
- links: download data, reference papers, software baselines, evaluation results...
Speech attributes:
- duration of speech (h) (overlapping speech counted only once)
- number of unique speakers
- language
- number of unique words (differs from assumed vocabulary size, which is somewhat arbitrary)
- speaking style: digits, command, read, spontaneous...
- number of speakers present in the room
- type of speaker overlap: no overlap, simulated overlap, dialogue, meeting, full overlap...
Channel attributes:
- channel type: none, simulated room impulse response, convolution by a recorded room impulse response, reverberant recording...
- speaker radiation: loudspeaker, dummy head with mouth simulator, human...
- speaker location: at a fixed position in the room, at a quasi-fixed position (e.g., seated), at different positions...
- speaker movements: no movement, head movements, walking...
Noise attributes:
- noise type: stationary background noise (e.g., air-conditioning), car noise, meeting noises, domestic noises, outdoor noises...
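The channel and noise entries used in the table ("real rir", "rescaled", "added without rescaling") can be illustrated with a minimal sketch. It assumes that "rescaled" means the noise is scaled to reach a chosen signal-to-noise ratio before being added, which is the usual convention but is not spelled out on this page. The signals, the toy impulse response, and all function names below are hypothetical stand-ins, not data or code from any listed dataset.

```python
import math
import random


def convolve(signal, impulse_response):
    """Direct-form convolution, truncated to the length of the input signal."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(impulse_response))):
            acc += impulse_response[k] * signal[n - k]
        out[n] = acc
    return out


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def rescale_noise(speech, noise, target_snr_db):
    """Scale the noise so that the speech-to-noise power ratio equals target_snr_db."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_snr_db / 10)))
    return [gain * v for v in noise]


def measure_snr_db(speech, noise):
    """Speech-to-noise power ratio in dB."""
    return 10 * math.log10(power(speech) / power(noise))


random.seed(0)
dry = [random.gauss(0, 1) for _ in range(2000)]        # stand-in for dry speech
rir = [math.exp(-k / 10.0) * random.gauss(0, 1) for k in range(50)]  # toy decaying IR
rir[0] = 1.0                                            # direct path
noise = [random.gauss(0, 0.1) for _ in range(2000)]    # stand-in for recorded noise

# "real rir" / "simulated rir": speech convolved with an impulse response.
reverberant = convolve(dry, rir)

# "rescaled": noise scaled to a target SNR before mixing.
scaled = rescale_noise(reverberant, noise, target_snr_db=5.0)
rescaled_mixture = [s + n for s, n in zip(reverberant, scaled)]

# "added without rescaling": noise added at its recorded level.
unscaled_mixture = [s + n for s, n in zip(reverberant, noise)]
```

In the rescaled case the resulting SNR is fixed by construction, whereas in the unscaled case it depends on the recorded levels of the speech and the noise.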
Available ground truth:
- reference speech signal: original (at the mouth), headset or lapel (slightly differs from the signal at the mouth), spatial image (at the microphones)...
- speaker location and orientation
- words uttered
- paralinguistic attributes: nodding, gaze, communication intent, emotion... (excluding speaker attributes such as age, gender, or native language)
- noise events: type and time of individual noise events
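As an illustration of how the attributes above can be used to select a dataset, here is a minimal sketch that parses a few rows and filters them. The four rows are copied from the table, restricted to five of its columns; the `Dataset` class, the field names, and the helper functions are hypothetical and not part of this wiki.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    year: int
    use_case: str
    mics: str  # kept as text because some rows give ranges such as "9 - 16"
    cost: str


# name | rel. year | use case | dist. or noisy mics | cost (non-memb)
ROWS = """\
ShATR | 1994 | meeting | 3 | free
ICSI Meeting Corpus | 2004 | meeting | 6 | 2.8 k$
AMI | 2006 | meeting | 16 | free
CHiME 3 | 2015 | public spaces | 6 | free given WSJ0 (1.5 k$)
"""


def parse(text):
    """Turn pipe-separated lines into Dataset records."""
    records = []
    for line in text.strip().splitlines():
        name, year, use_case, mics, cost = [cell.strip() for cell in line.split("|")]
        records.append(Dataset(name, int(year), use_case, mics, cost))
    return records


def free_meeting_sets(datasets):
    """Meeting datasets that are free of charge, ordered by release year."""
    selected = [d for d in datasets if d.use_case == "meeting" and d.cost == "free"]
    return sorted(selected, key=lambda d: d.year)


datasets = parse(ROWS)
names = [d.name for d in free_meeting_sets(datasets)]
print(names)
```

Entries such as "free given WSJ0 (1.5 k$)" are deliberately not treated as free here, since they require purchasing another corpus.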
Impulse response datasets
The table below provides a list of impulse response (IR) datasets with detailed attributes. The meaning of each attribute is detailed below.
Disclaimer: Only datasets that are publicly available and include some reverberation (not only HRTFs) are listed.
Column groups: General attributes (rel. year to links), Channel (chan. type to mic. moves), Room noise.
Datasets | rel. year | envir. | total IRs | sam. rate (kHz) | mics | cost | links | chan. type | rooms | speak. radiat. | speak. loc. | speak. moves | mic. direc. | mic. loc. | mic. moves | room noise |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
RWCP Real Environment Acoustic Database | 2001 | varechoic room, office | 364 | 16 - 48 | 84 | free | download | real | 7 | dummy | 9 (far) | yes | omni | fixed | no | no |
SASSEC, SiSEC under-determined | 2007 - 2011 | office | ? | 16 | 2 | free | download | simulated, real | 4 | no, loudspeaker | ? | no | omni | fixed | no | no |
SiSEC head-geometry | 2008 | office | 38 | 16 | 2 | free (partial avail.) | download | real | 1 | loudspeaker | 19 (far) | no | binaural | fixed | no | no |
Aachen Impulse Response | 2009 - 2012 | various | 214 | 48 | 2 | free | download | real | 8 | loudspeaker | 13 (far) | no | omni, binaural, phone | fixed | no | no |
CAMIL | 2010 - 2012 | office | 32400 | 16 | 2 | free | download | real | 1 | loudspeaker | fixed | no | binaural | 16200 (close) | yes | no |
CHiME 2 Grid | 2012 | domestic | 242 | 16 - 48 | 2 | free | download | real | 1 | dummy | 121 (close) | no | binaural | fixed | no | yes |
AVASM | 2013 | office | 864 | 16 | 2 | free | download | real | 1 | loudspeaker | 432 (close) | no | binaural | fixed | no | no |
DIRHA | 2014 | domestic | 9200 | 48 | 40 | free (partial avail.) | download | real | 5 | loudspeaker | 57 (far) | no | omni | fixed | no | yes |
ACE | 2015 | office, meeting, lecture, lobby | 700 | 48 | 50 | free | download | real | 7 | loudspeaker | fixed | no | omni, laptop, mobile, cruciform, linear, spherical | 2 (near, far) | no | ambient, live babble, fan |
General attributes:
- year of release
- recording environment: car, domestic, lecture, meeting, office, public space...
- total IRs: total number of single-channel impulse responses
- sampling rate (kHz)
- number of microphones
- cost
- links: download data, reference papers, software baselines, evaluation results...
Channel attributes:
- channel type: simulated or real impulse response
- number of rooms
- speaker radiation: loudspeaker, mouth simulator
- speaker location: at a fixed position in the room, or number of different positions (closely spaced or far)
- speaker movements: no movement, moves while recording
- microphone directivity: omnidirectional, cardioid, binaural...
- microphone location: at a fixed position in the room, or number of different positions (closely spaced or far)
- microphone movements: no movement, moves while recording
Noise attributes:
- room noise: background noise recorded in the same room as the impulse responses
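As a worked check of the "total IRs" attribute, the sketch below verifies that, for four binaural entries of the table, the total equals the number of microphones times the number of speaker positions times the number of microphone positions. This is an observation about these rows only, not a rule stated on this page: a "fixed" location is assumed to count as one position, and other rows (for example RWCP, with 364 IRs and 84 microphones) do not follow this product.

```python
# (name, total IRs, microphones, speaker positions, microphone positions)
# Values copied from the table above; "fixed" is counted as 1 position (assumption).
ENTRIES = [
    ("SiSEC head-geometry", 38, 2, 19, 1),
    ("CAMIL", 32400, 2, 1, 16200),
    ("CHiME 2 Grid", 242, 2, 121, 1),
    ("AVASM", 864, 2, 432, 1),
]


def expected_total(mics, speaker_positions, mic_positions):
    """One single-channel IR per (microphone, speaker position, microphone position)."""
    return mics * speaker_positions * mic_positions


checks = {
    name: expected_total(mics, spk, mic) == total
    for name, total, mics, spk, mic in ENTRIES
}
print(checks)
```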
Text datasets
Other datasets
This section lists all other relevant datasets that have not been annotated or made publicly available yet.
Speech datasets:
- BABEL (not yet available)
- Broadcast news, HUB4 (no noise and 4.5% speaker overlap, less than ETAPE)
- CIAIR In-Car Speech Database (availability unknown)
- Dyrholm/Sawada/Parra (about 1 min long)
- NEMISIG (unavailable)
- NFI-FRITS (unavailable)
- RATS (not yet available)
- Rich Transcription (RT) (dataset gathered from other sets, e.g. CHIL, ICSI, ISL, AMI...)
- Settlers of Catan (unannotated, more info)
- Flying MEMS microphone array (unannotated, more info)
Contribute a dataset
To contribute a new dataset, please
- create an account and log in
- go to the section above corresponding to your type of dataset; if the table does not exist yet, you may create it
- click on the "Edit" link at the top of the table and add a new line for your dataset (the lines are ordered by year of release)
- fill in all columns as completely as possible, following the detailed list of attributes below the table
- click on the "Save page" link at the bottom of the page to save your modifications
We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.
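The step of adding a new line in year order can be sketched as follows. The sketch assumes the standard MediaWiki table syntax, in which a row starts with `|-` and cells on one line are separated by `||`; the actual page source may instead place one cell per line. The "MyCorpus" entry and the helper names are hypothetical, and only the first three columns are shown.

```python
def wikitable_row(cells):
    """Format one table row in MediaWiki syntax (row marker, then '||'-separated cells)."""
    return "|-\n| " + " || ".join(str(cell) for cell in cells)


def release_year(cells):
    """First year of the 'rel. year' cell, which may be a range such as '2000 - 2001'."""
    return int(str(cells[1]).split("-")[0].strip())


def insert_by_year(rows, new_row):
    """Insert new_row before the first row with a later release year."""
    year = release_year(new_row)
    index = len(rows)
    for i, row in enumerate(rows):
        if release_year(row) > year:
            index = i
            break
    return rows[:index] + [new_row] + rows[index:]


# Existing rows (name, rel. year, use case) taken from the table above.
rows = [
    ["Sheffield Wargames Corpus", "2013", "cocktail party"],
    ["DIRHA", "2014", "domestic"],
    ["CHiME 3", "2015", "public spaces"],
]
new_row = ["MyCorpus", "2014", "office"]  # hypothetical contribution

updated = insert_by_year(rows, new_row)
print("\n".join(wikitable_row(row) for row in updated))
```

A new entry with the same release year as existing ones is placed after them, which keeps the table ordered by year.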
Contribute a software baseline
To contribute a new software baseline, please
- create an account and log in
- fill in an entry for your software on the Software page, if not already done
- go to the section above corresponding to the dataset for which your baseline was designed
- click on the "Edit" link at the top of the table and add a link to your software in the corresponding "links" cell
- click on the "Save page" link at the bottom of the page to save your modifications
We currently cannot provide storage space for large software. Please upload your software at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.
Contribute an evaluation result
To contribute a new evaluation result, please
- create an account and log in
- go to the section above corresponding to the dataset for which this result was obtained
- click on the "Edit" link at the top of the table and add a link to your result in the corresponding "links" cell
- make sure that the link (e.g., a paper or another webpage) contains the following information: authors, link to a paper/report containing objective evaluation results, link to derived data (output transcriptions, intermediary data, etc.)
- click on the "Save page" link at the bottom of the page to save your modifications
In order to save storage space, please do not upload the paper to this wiki. Instead, link to it wherever possible from your institutional archive, from another public archive (e.g., arXiv) or from the publisher website (e.g., IEEE Xplore).