Difference between revisions of "Datasets"
Line 47: | Line 47: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://spandh.dcs.shef.ac.uk/projects/shatrweb/ download] | |[http://spandh.dcs.shef.ac.uk/projects/shatrweb/ download] | ||
− | |||
[http://spandh.dcs.shef.ac.uk/projects/shatrweb/papers/ioa94.html paper] | [http://spandh.dcs.shef.ac.uk/projects/shatrweb/papers/ioa94.html paper] | ||
|{{no|0.6}} | |{{no|0.6}} | ||
Line 76: | Line 75: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html download] | |[https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html download] | ||
− | |||
|{{dunno}} | |{{dunno}} | ||
|{{some|12}} | |{{some|12}} | ||
Line 104: | Line 102: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://research.nii.ac.jp/src/en/RWCP-SP96.html download] | |[http://research.nii.ac.jp/src/en/RWCP-SP96.html download] | ||
− | |||
[http://scitation.aip.org/content/asa/journal/jasa/100/4/10.1121/1.416338 paper] | [http://scitation.aip.org/content/asa/journal/jasa/100/4/10.1121/1.416338 paper] | ||
|{{yes|10}} | |{{yes|10}} | ||
Line 133: | Line 130: | ||
|{{some|free given TIDigits (0.5 k$)}} | |{{some|free given TIDigits (0.5 k$)}} | ||
|[http://aurora.hsnr.de/download.html download] | |[http://aurora.hsnr.de/download.html download] | ||
− | |||
[http://www.isca-speech.org/archive_open/asr2000/asr0_181.html paper] | [http://www.isca-speech.org/archive_open/asr2000/asr0_181.html paper] | ||
|{{yes|33}} | |{{yes|33}} | ||
Line 162: | Line 158: | ||
|{{no|7.4 k$}} | |{{no|7.4 k$}} | ||
|[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=SPINE purchase] | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=SPINE purchase] | ||
− | |||
[http://dl.acm.org/citation.cfm?id=1289199 paper] | [http://dl.acm.org/citation.cfm?id=1289199 paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 219: | Line 214: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://research.nii.ac.jp/src/en/RWCP-SP01.html download] | |[http://research.nii.ac.jp/src/en/RWCP-SP01.html download] | ||
− | |||
[http://id.nii.ac.jp/1001/00057420/ paper] | [http://id.nii.ac.jp/1001/00057420/ paper] | ||
|{{some|3.5}} | |{{some|3.5}} | ||
Line 248: | Line 242: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://research.nii.ac.jp/src/en/RWCP-SSD.html download] | |[http://research.nii.ac.jp/src/en/RWCP-SSD.html download] | ||
− | |||
[http://www.lrec-conf.org/proceedings/lrec2000/html/summary/356.htm paper] | [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/356.htm paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 305: | Line 298: | ||
|{{some|free given WSJ0 (1.5 k$)}} | |{{some|free given WSJ0 (1.5 k$)}} | ||
|[http://aurora.hsnr.de/download.html download] | |[http://aurora.hsnr.de/download.html download] | ||
− | |||
[http://aurora.hsnr.de/aurora-4/reports.html paper] | [http://aurora.hsnr.de/aurora-4/reports.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 362: | Line 354: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://www.clemson.edu/ces/speech/cuave.htm download] | |[http://www.clemson.edu/ces/speech/cuave.htm download] | ||
− | |||
[http://asp.eurasipjournals.com/content/2002/11/208541 paper] | [http://asp.eurasipjournals.com/content/2002/11/208541 paper] | ||
|{{some|3}} | |{{some|3}} | ||
Line 391: | Line 382: | ||
|{{no|25 k$}} | |{{no|25 k$}} | ||
|[http://crss.utdallas.edu/ purchase] | |[http://crss.utdallas.edu/ purchase] | ||
− | |||
[http://www.isca-speech.org/archive/eurospeech_2001/e01_2023.html paper] | [http://www.isca-speech.org/archive/eurospeech_2001/e01_2023.html paper] | ||
|{{yes|286}} | |{{yes|286}} | ||
Line 420: | Line 410: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://research.nii.ac.jp/src/en/CENSREC-1.html download] | |[http://research.nii.ac.jp/src/en/CENSREC-1.html download] | ||
− | |||
[http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15046/1/425.pdf paper] | [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15046/1/425.pdf paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 449: | Line 438: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://www.isle.illinois.edu/sst/AVICAR/ download] | |[http://www.isle.illinois.edu/sst/AVICAR/ download] | ||
− | |||
[http://www.isca-speech.org/archive/interspeech_2004/i04_2489.html paper] | [http://www.isca-speech.org/archive/interspeech_2004/i04_2489.html paper] | ||
|{{yes|29}} | |{{yes|29}} | ||
Line 478: | Line 466: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://www.idiap.ch/dataset/av16-3/ download] | |[http://www.idiap.ch/dataset/av16-3/ download] | ||
− | |||
[http://publications.idiap.ch/index.php/publications/show/353 paper] | [http://publications.idiap.ch/index.php/publications/show/353 paper] | ||
|{{some|1.5}} | |{{some|1.5}} | ||
Line 507: | Line 494: | ||
|{{no|2.8 k$}} | |{{no|2.8 k$}} | ||
|[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=ICSI purchase] | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=ICSI purchase] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1198793 paper] | [http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1198793 paper] | ||
|{{yes|72}} | |{{yes|72}} | ||
Line 536: | Line 522: | ||
|{{no|5.5 k$}} | |{{no|5.5 k$}} | ||
|[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=NIST%20Meeting purchase] | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=NIST%20Meeting purchase] | ||
− | |||
[http://www.lrec-conf.org/proceedings/lrec2004/summaries/137.htm paper] | [http://www.lrec-conf.org/proceedings/lrec2004/summaries/137.htm paper] | ||
|{{yes|15}} | |{{yes|15}} | ||
Line 565: | Line 550: | ||
|{{no|3.5 k€}} | |{{no|3.5 k€}} | ||
|[http://catalog.elra.info/search.php purchase] | |[http://catalog.elra.info/search.php purchase] | ||
− | |||
[http://link.springer.com/article/10.1007%2Fs10579-007-9054-4 paper] | [http://link.springer.com/article/10.1007%2Fs10579-007-9054-4 paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 594: | Line 578: | ||
|{{no|75 k€ per lang}} | |{{no|75 k€ per lang}} | ||
|[http://catalog.elra.info/search.php purchase] | |[http://catalog.elra.info/search.php purchase] | ||
− | |||
[http://www.lrec-conf.org/proceedings/lrec2002/sumarios/177.htm paper] | [http://www.lrec-conf.org/proceedings/lrec2002/sumarios/177.htm paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 623: | Line 606: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://research.nii.ac.jp/src/en/CENSREC-2.html download] | |[http://research.nii.ac.jp/src/en/CENSREC-2.html download] | ||
− | |||
[http://www.isca-speech.org/archive/interspeech_2006/i06_1726.html paper] | [http://www.isca-speech.org/archive/interspeech_2006/i06_1726.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 652: | Line 634: | ||
|{{some|21 k¥}} | |{{some|21 k¥}} | ||
|[http://research.nii.ac.jp/src/en/CENSREC-3.html purchase] | |[http://research.nii.ac.jp/src/en/CENSREC-3.html purchase] | ||
− | |||
[http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15050/1/429.pdf paper] | [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15050/1/429.pdf paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 681: | Line 662: | ||
|{{some|free given TIDigits (0.5 k$)}} | |{{some|free given TIDigits (0.5 k$)}} | ||
|[http://aurora.hsnr.de/download.html download] | |[http://aurora.hsnr.de/download.html download] | ||
− | |||
[http://aurora.hsnr.de/aurora-5/reports.html paper] | [http://aurora.hsnr.de/aurora-5/reports.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 710: | Line 690: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://groups.inf.ed.ac.uk/ami/ download] | |[http://groups.inf.ed.ac.uk/ami/ download] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4538700 paper] | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4538700 paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 738: | Line 717: | ||
|{{no}} | |{{no}} | ||
|{{yes|free}} | |{{yes|free}} | ||
− | |[ | + | |[http://staffwww.dcs.shef.ac.uk/people/M.Cooke/SpeechSeparationChallenge.htm download] |
[http://www.sciencedirect.com/science/article/pii/S0885230809000205 paper] | [http://www.sciencedirect.com/science/article/pii/S0885230809000205 paper] | ||
|{{some|8.8}} | |{{some|8.8}} | ||
Line 767: | Line 746: | ||
|{{some|0.05 k€}} | |{{some|0.05 k€}} | ||
|[http://catalog.elra.info/product_info.php?products_id=1088&language=en purchase] | |[http://catalog.elra.info/product_info.php?products_id=1088&language=en purchase] | ||
− | |||
[http://cvsp.cs.ntua.gr/projects/pub/HIWIRE/WebHome/HIWIRE_db_description_paper.pdf paper] | [http://cvsp.cs.ntua.gr/projects/pub/HIWIRE/WebHome/HIWIRE_db_description_paper.pdf paper] | ||
|{{yes|21}} | |{{yes|21}} | ||
Line 796: | Line 774: | ||
|{{no|25 k$}} | |{{no|25 k$}} | ||
|[http://crss.utdallas.edu/ download] | |[http://crss.utdallas.edu/ download] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4290175 paper] | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4290175 paper] | ||
|{{yes|40}} | |{{yes|40}} | ||
Line 825: | Line 802: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures download] | |[http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures download] | ||
− | |||
[http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | ||
|{{no|0.3}} | |{{no|0.3}} | ||
Line 854: | Line 830: | ||
|{{some|1.5 k$}} | |{{some|1.5 k$}} | ||
|[https://catalog.ldc.upenn.edu/LDC2014S03 purchase] | |[https://catalog.ldc.upenn.edu/LDC2014S03 purchase] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=1566470 paper] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6639033 paper] | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=1566470 paper] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6639033 paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 883: | Line 858: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://research.nii.ac.jp/src/en/CENSREC-4.html download] | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] | ||
− | |||
[http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] | [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 912: | Line 886: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://research.nii.ac.jp/src/en/CENSREC-4.html download] | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] | ||
− | |||
[http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] | [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] | ||
|{{dunno}} | |{{dunno}} | ||
Line 941: | Line 914: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://shine.fbk.eu/resources/dicit-acoustic-woz-data download] | |[http://shine.fbk.eu/resources/dicit-acoustic-woz-data download] | ||
− | |||
[http://www.lrec-conf.org/proceedings/lrec2008/summaries/584.html paper] | [http://www.lrec-conf.org/proceedings/lrec2008/summaries/584.html paper] | ||
|{{some|1}} | |{{some|1}} | ||
Line 969: | Line 941: | ||
|{{no}} | |{{no}} | ||
|{{yes|free}} | |{{yes|free}} | ||
− | |[http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions download | + | |[http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions download] |
[http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | ||
|{{some|1.9}} | |{{some|1.9}} | ||
Line 998: | Line 970: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://melodi.ee.washington.edu/cosine/ download] | |[http://melodi.ee.washington.edu/cosine/ download] | ||
− | |||
[http://www.sciencedirect.com/science/article/pii/S0885230811000143 paper] | [http://www.sciencedirect.com/science/article/pii/S0885230811000143 paper] | ||
|{{yes|11}} | |{{yes|11}} | ||
Line 1,027: | Line 998: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise download] | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise download] | ||
− | |||
[http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | ||
|{{no|0.3}} | |{{no|0.3}} | ||
Line 1,055: | Line 1,025: | ||
|{{no}} | |{{no}} | ||
|{{yes|free}} | |{{yes|free}} | ||
− | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions download | + | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions download] |
[http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] | ||
|{{no|0.2}} | |{{no|0.2}} | ||
Line 1,084: | Line 1,054: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html download] | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html download] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] | ||
|{{yes|12}} | |{{yes|12}} | ||
Line 1,113: | Line 1,082: | ||
|{{some|free given WSJ0 (1.5 k$)}} | |{{some|free given WSJ0 (1.5 k$)}} | ||
|[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html download] | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html download] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] | ||
|{{yes|33}} | |{{yes|33}} | ||
Line 1,141: | Line 1,109: | ||
|{{some|1}} | |{{some|1}} | ||
|{{dunno}} | |{{dunno}} | ||
− | |[ | + | |[http://www.afcp-parole.org/etape.html download] |
[http://www.lrec-conf.org/proceedings/lrec2012/summaries/495.html paper] | [http://www.lrec-conf.org/proceedings/lrec2012/summaries/495.html paper] | ||
|{{yes|32}} | |{{yes|32}} | ||
Line 1,170: | Line 1,138: | ||
|{{no|3.5 k$}} | |{{no|3.5 k$}} | ||
|[https://catalog.ldc.upenn.edu/LDC2013S04 purchase] | |[https://catalog.ldc.upenn.edu/LDC2013S04 purchase] | ||
− | |||
|{{yes|108}} | |{{yes|108}} | ||
|{{dunno}} | |{{dunno}} | ||
Line 1,198: | Line 1,165: | ||
|{{no|7 k$}} | |{{no|7 k$}} | ||
|[https://catalog.ldc.upenn.edu/LDC2013S02 purchase] | |[https://catalog.ldc.upenn.edu/LDC2013S02 purchase] | ||
− | |||
|{{yes|234}} | |{{yes|234}} | ||
|{{dunno}} | |{{dunno}} | ||
Line 1,226: | Line 1,192: | ||
|{{some|free given WSJCAM0 (1.75 k$)}} | |{{some|free given WSJCAM0 (1.75 k$)}} | ||
|[http://reverb2014.dereverberation.com/ purchase] | |[http://reverb2014.dereverberation.com/ purchase] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6701894 paper] | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6701894 paper] | ||
|{{yes|25}} | |{{yes|25}} | ||
Line 1,255: | Line 1,220: | ||
|{{yes|free}} | |{{yes|free}} | ||
|[http://shine.fbk.eu/resources/dirha-ii-simulated-corpus download] | |[http://shine.fbk.eu/resources/dirha-ii-simulated-corpus download] | ||
− | |||
[http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6843271 paper] | [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6843271 paper] | ||
|{{some|1.3}} | |{{some|1.3}} |
Revision as of 22:48, 8 August 2014
This page lists datasets with detailed attributes and links to corresponding research results (papers, numerical results, output transcriptions, intermediary data, etc.). Each dataset may be used for one or more applications, such as automatic speech recognition, speaker identification and verification, source localization, or speech enhancement and separation.
Disclaimer: Only publicly available datasets with a total duration longer than 5 min are listed.
Column groups: General attributes (year to links), Speech (speak. time to overl. type), Channel (chan. type to speak. moves), Noise (noise type), Ground truth (ref. signal to noise events).

Dataset | year | use case | total time (h) | sam. rate (kHz) | dist. or noisy mics | video cams | cost | links | speak. time (h) | uniq. speak. | lang. | uniq. words (k) | speak. style | speak. / rec. | overl. type | chan. type | speak. radiat. | speak. loc. | speak. moves | noise type | ref. signal | speak. loc., orient. | words | non-verb. traits | noise events |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ShATR | 1994 | meeting | 0.6 | 48 | 3 | no | free | download | 0.6 | 5 | UK English | 1 | spontaneous | 5 | multiple dialogs | reverb | human | quasi-fixed | head | meeting | headset | yes | yes | no | yes |
LLSEC | 1996 | dialog | 1.4 | 16 | 4 | no | free | download | ? | 12 | N/S | N/S | read, spontaneous | 2 | dialog | reverb | human | quasi-fixed | head | hallway, restaurant (scenarized) | no | yes | no | no | no |
RWCP Spoken Dialog Corpus | 1996 - 1997 | dialog | 10 | 16 | 2 | no | free | download | 10 | 39 | Japanese | ? | spontaneous | 1 - 2 | dialog | reverb (low) | human | quasi-fixed | head | stationary background | no | no | yes | no | no |
Aurora-2 | 2000 | public spaces | 33 | 8 - 16 | 1 | no | free given TIDigits (0.5 k$) | download | 33 | 214 | US English | 0.01 | digits | 1 | no | simulated phone | human | N/S | no | various real environments | original | N/S | yes | no | yes |
SPINE1, SPINE2 | 2000 - 2001 | military | 38 | 16 | 2 | no | 7.4 k$ | purchase | ? | 100 | US English | 1 | command, spontaneous | 1 - 2 | no | simulated radio | human | quasi-fixed | head | military | no | no | yes | no | no |
Aurora-3 (subset of SpeechDat-Car) | 2000 - 2003 | car | ? | 16 | 4 | no | 1 k€ | purchase | ? | ? | various | ? | digits, command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
RWCP Meeting Speech Corpus | 2001 | meeting | 3.5 | 16 - 48 | 1 | 3 | free | download | 3.5 | ? | Japanese | ? | spontaneous | 1 - 5 | meeting | reverb (low) | human | quasi-fixed | head | stationary background | headset | no | yes | no | no |
RWCP Real Environment Speech and Acoustic Database | 2001 | domestic, office | ? | 16 - 48 | 30 | no | free | download | ? | 5 | Japanese | ? | read | 1 | no | real rir, reverb | loudspeaker | various | no, pivoting arm | stationary background | original | yes | yes | no | yes |
SpeechDat-Car | 2001 - 2011 | car | ? | 16 | 4 | no | 39 - 182 k€ per lang | purchase | ? | 300 per lang | various | ? | digits, command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
Aurora-4 | 2002 | public spaces | ? | 8 - 16 | 1 | no | free given WSJ0 (1.5 k$) | download | ? | 101 | US English | 10 | read | 1 | no | simulated phone | human | N/S | no | various real environments | original | N/S | yes | no | yes |
TED | 2002 | seminar | 47 | 16 | 1 | no | 0.5 k$ | purchase | 47 | 188 | non-native English | ? | lecture | 1 or more | seminar | reverb | human | quasi-fixed | head | stationary background | lapel | no | partial | no | no |
CUAVE | 2002 | cocktail party | 3 | 44 | 1 | 1 | free | download | 3 | 36 | US English | 0.01 | digits | 1 - 2 | full | reverb | human | quasi-fixed | head | stationary background | no | no | yes | no | no |
CU-Move Microphone Array Data | 2002 - 2011 | car | 286 | 44 | 6 - 8 | no | 25 k$ | purchase | 286 | 172 | US English | 12 | digits, command, read, dialog | 1 | no | reverb | human | quasi-fixed | head | car | no | no | yes | no | no |
CENSREC-1 (Aurora-2J) | 2003 | public spaces | ? | 8 | 1 | no | free | download | ? | 214 | Japanese | 0.01 | digits | 1 | no | simulated phone | human | N/S | no | various real environments | original | N/S | yes | no | yes |
AVICAR | 2004 | car | 29 | 16 | 7 | 4 | free | download | 29 | 86 | US English, non-native English | 1 | read | 1 | no | reverb | human | quasi-fixed | head | car | no | no | yes | no | no |
AV16.3 | 2004 | meeting | 1.5 | 16 | 16 | 3 | free | download | 1.5 | 12 | N/S | N/S | spontaneous | 1 - 3 | full | reverb | human | various | walk | stationary background | no | yes | no | no | no |
ICSI Meeting Corpus | 2004 | meeting | 72 | 16 | 6 | no | 2.8 k$ | purchase | 72 | 53 | US English | 13 | meeting | 3 - 10 | meeting | reverb | human | quasi-fixed | head | stationary background | headset, lapel | no | yes | yes | no |
NIST Meeting Pilot Corpus Speech | 2004 | meeting | 15 | 16 | 7 | no | 5.5 k$ | purchase | 15 | 61 | US English | 6 | meeting | 3 - 9 | meeting | reverb | human | various | walk | stationary background | headset, lapel | no | yes | no | no |
CHIL Meetings | 2004 - 2007 | seminar, meeting | 60 | 44 | 79 - 147 | 6 - 9 | 3.5 k€ | purchase | ? | ? | non-native English | ? | seminar, meeting | 3 - 20 | seminar, meeting | reverb | human | quasi-fixed | head | meeting (scenarized) | headset | yes | yes | yes | no |
SPEECON | 2004 - 2011 | public space, domestic, office, car | ? | 16 | 3 | no | 75 k€ per lang | purchase | ? | 600 per lang | various | ? | command, read, spontaneous | 1 | no | reverb | human | quasi-fixed | head | various real environments | headset | no | yes | no | no |
CENSREC-2 | 2005 | car | ? | 16 | 1 | no | free | download | ? | 214 | Japanese | 0.01 | digits | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
CENSREC-3 | 2005 | car | ? | 16 | 1 | no | 21 k¥ | purchase | ? | 311 | Japanese | 0.05 | read | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
Aurora-5 | 2006 | public spaces, domestic, office, car | ? | 8 | 1 | no | free given TIDigits (0.5 k$) | download | ? | 225 | US English | 0.01 | digits | 1 | no | no, simulated rir, real rir | loudspeaker | N/S | no | various real environments | original | no | yes | no | yes |
AMI | 2006 | meeting | 100 | 16 | 16 | 6 | free | download | ? | 189 | UK English | 8 | meeting | 4 | meeting (18% overlap) | reverb | human | quasi-fixed | head | stationary background | headset, lapel | yes | yes | yes | no |
PASCAL SSC | 2006 | cocktail party | 8.8 | 25 | 1 | no | free | download | 8.8 | 34 | UK English | 0.05 | command | 2 | full | no | human | N/S | no | no | original | N/S | yes | no | no |
HIWIRE | 2007 | airplane | 21 | 16 | 1 | no | 0.05 k€ | purchase | 21 | 81 | non-native English | 0.1 | command | 1 | no | no | human | N/S | head | airplane | original | N/S | yes | no | no |
UT-Drive | 2007 | car | 40 | 25 | 5 | 2 | 25 k$ | download | 40 | 25 | US English | 2.4 | command, dialog | 1 - 2 | dialog | reverb | human | quasi-fixed | head | car | headset (low quality) | no | partial | no | no |
SASSEC, SiSEC underdetermined | 2007 - 2011 | cocktail party | 0.3 | 16 | 2 | no | free | download | 0.3 | 16 | N/S | N/S | read | 3 - 4 | full | simulated rir, real rir, reverb | no, loudspeaker | fixed | no | no | original, spatial image | yes | no | no | no |
MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData | 2007 - 2014 | cocktail party | 10 | 16 | 8 - 40 | no | 1.5 k$ | purchase | ? | 45 | UK English | 10 | read | 1 - 2 | full | reverb | human | various | walk | stationary background | headset, lapel | yes | yes | no | no |
CENSREC-4 (Simulated) | 2008 | public spaces, domestic, office, car | ? | 16 | 1 | no | free | download | ? | 214 | Japanese | 0.01 | digits | 1 | no | real rir | dummy | fixed | no | various real environments | original | no | yes | no | yes |
CENSREC-4 (Real) | 2008 | public spaces, domestic, office, car | ? | 16 | 1 | no | free | download | ? | 10 | Japanese | 0.01 | digits | 1 | no | reverb | human | quasi-fixed | head | various real environments | headset | no | yes | no | yes |
DICIT | 2008 | domestic | 6 | 48 | 16 | 2 | free | download | 1 | ? | Italian | ? | command | 4 | no | reverb | human | various | walk | domestic (scenarized) | headset, tv | yes | yes | no | yes |
SiSEC head-geometry | 2008 | cocktail party | 1.9 | 16 | 2 | no | free | download | 1.9 | ? | N/S | N/S | read | 2 | full | real rir | loudspeaker | various | no | no | original, spatial image | yes | no | no | no |
COSINE | 2009 | dialog | 38 | 48 | 20 | no | free | download | 11 | 91 | US English, non-native English | 5 | spontaneous | 2 - 7 | dialog | reverb | human | various | walk | various real environments | headset, throat mic | no | yes | no | no |
SiSEC real-world noise | 2010 | public spaces | 0.3 | 16 | 2 - 4 | no | free | download | 0.3 | 6 | N/S | N/S | read | 1 - 3 | full | no, reverb (other room) | loudspeaker | various | no | various real environments | original, spatial image | yes | no | no | no |
SiSEC dynamic | 2010 - 2011 | cocktail party | 0.2 | 16 | 2 - 4 | no | free | download | 0.2 | ? | N/S | N/S | read | many but only 2 simultaneous | full | reverb | loudspeaker | various | simulated | no | original, spatial image | yes | no | no | no |
CHiME 1, CHiME 2 Grid | 2011 - 2012 | domestic | 70 | 16 - 48 | 2 | no | free | download | 12 | 34 | UK English | 0.05 | command | 1 | no | real rir | dummy | quasi-fixed | simulated head | domestic | yes | yes | yes | no | no |
CHiME 2 WSJ0 | 2012 | domestic | 78 | 16 | 2 | no | free given WSJ0 (1.5 k$) | download | 33 | 101 | US English | 11 | read | 1 | no | real rir | dummy | fixed | no | domestic | yes | yes | yes | no | no |
ETAPE | 2012 | TV/radio debates, outdoor interviews | 42 | 16 | 1 | 1 | ? | download | 32 | 347 | French | 16 | spontaneous | 1 or more | dialog (up to 10% overlap) | reverb (some) | human | quasi-fixed | head | various real environments | no | N/S | yes | no | yes |
GALE (Chinese broadcast conversation) | 2013 | TV dialog | 120 | 16 | 1 | no | 3.5 k$ | purchase | 108 | ? | Mandarin | ? | spontaneous | 1 or more | dialog | no | human | quasi-fixed | head | no | no | N/S | yes | no | no |
GALE (Arabic broadcast conversation) | 2013 | TV dialog | 251 | 16 | 1 | no | 7 k$ | purchase | 234 | ? | Arabic | ? | spontaneous | 1 or more | dialog | no | human | quasi-fixed | head | no | no | N/S | yes | no | no |
REVERB SimData | 2013 | domestic, office | 25 | 16 | 8 | no | free given WSJCAM0 (1.75 k$) | purchase | 25 | 130 | UK English | 10 | read | 1 | no | real rir | loudspeaker | fixed | no | stationary background | original, spatial image | yes | yes | no | yes |
DIRHA | 2014 | domestic | 3.8 | 48 | 40 | no | free | download | 1.3 | 30 | various | ? | command, read, spontaneous | 1 or more | simulated | real rir | loudspeaker | various | no | domestic (sum of events) | yes | yes | yes | no | yes |
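The table above can be filtered programmatically once its rows are split into named cells. The following is a minimal sketch, assuming the rows are available as pipe-delimited text exactly as shown on this page; the `COLUMNS` list and the `parse_row` helper are illustrative names, not part of the wiki.

```python
# Column names in table order; "dataset" is the unnamed first column.
COLUMNS = [
    "dataset", "year", "use case", "total time (h)", "sam. rate (kHz)",
    "dist. or noisy mics", "video cams", "cost", "links",
    "speak. time (h)", "uniq. speak.", "lang.", "uniq. words (k)",
    "speak. style", "speak. / rec.", "overl. type",
    "chan. type", "speak. radiat.", "speak. loc.", "speak. moves",
    "noise type",
    "ref. signal", "speak. loc., orient.", "words", "non-verb. traits",
    "noise events",
]


def parse_row(line):
    """Split one pipe-delimited table row into a dict keyed by column name."""
    text = line.strip()
    if text.endswith("|"):
        text = text[:-1]  # drop the single trailing delimiter
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    return dict(zip(COLUMNS, cells))


# Three rows copied from the table above.
ROWS = [
    "ShATR | 1994 | meeting | 0.6 | 48 | 3 | no | free | download | 0.6 | 5 | UK English | 1 | spontaneous | 5 | multiple dialogs | reverb | human | quasi-fixed | head | meeting | headset | yes | yes | no | yes |",
    "AV16.3 | 2004 | meeting | 1.5 | 16 | 16 | 3 | free | download | 1.5 | 12 | N/S | N/S | spontaneous | 1 - 3 | full | reverb | human | various | walk | stationary background | no | yes | no | no | no |",
    "ICSI Meeting Corpus | 2004 | meeting | 72 | 16 | 6 | no | 2.8 k$ | purchase | 72 | 53 | US English | 13 | meeting | 3 - 10 | meeting | reverb | human | quasi-fixed | head | stationary background | headset, lapel | no | yes | yes | no |",
]

records = [parse_row(row) for row in ROWS]

# Select the meeting datasets that are free of charge.
free_meeting = [
    r["dataset"] for r in records
    if r["use case"] == "meeting" and r["cost"] == "free"
]
print(free_meeting)
```

Cells are kept as strings because many entries are ranges ("1 - 3"), unknown ("?") or not specified ("N/S"); any numeric conversion would need to handle those cases explicitly.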
Automatic speech recognition
1st CHiME Challenge (2011)
Artificially distorted version of the small-vocabulary GRID audio-visual corpus (audio only). Binaural reverberated speech, with the speaker situated in front of the microphones, and additive household noises impinging from different directions. Clean-training, noisy-training, development and evaluation sets are available; see:
- J. Barker, E. Vincent, N. Ma, H. Christensen, P. Green, "The PASCAL CHiME speech separation and recognition challenge", Computer Speech & Language, Volume 27, Issue 3, May 2013, Pages 621-633.
Available from Computer Speech and Language here
Corpus available here (no cost)
Resources
Baselines
- See the paper above for results for a wide range of techniques.
AURORA 5 (2007)
Artificially distorted version of the TI-DIGITS digit corpus. It includes sets with additive noise only and sets combining additive noise with reverberant speech, over a variable SNR range. Various mixed training sets are provided, but no evaluation set; see:
- G. Hirsch "Aurora-5 Experimental Framework for the Performance Evaluation of Speech Recognition in Case of a Hands-free Speech Input in Noisy Environments", Niederrhein University of Applied Sciences, 2007.
Paper available online here (no cost)
Corpus available from LDC here
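As an illustration of what "additive noise at a given SNR" means, the sketch below scales a noise signal so that the ratio of average speech power to average noise power matches a target value in dB. It is a generic textbook formulation, not the procedure used to build this corpus, whose own tooling may measure signal levels differently; the function name `mix_at_snr` is hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target SNR (dB).

    Both inputs are sequences of float samples. Power is the plain mean of
    squared samples over the whole signal (an assumption of this sketch).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]

    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")

    # Solve 10*log10(p_speech / (gain^2 * p_noise)) = snr_db for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy signals: a 440 Hz tone as "speech" and a 50 Hz tone as "noise" at 8 kHz.
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [0.3 * math.sin(2 * math.pi * 50 * t / 8000) for t in range(8000)]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Lower `snr_db` values give a larger noise gain and hence a harder recognition condition.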
Resources
- Training recipe for HTK is provided with the corpora.
Baselines
- Reproducible baseline: The above cited paper includes a baseline for the ETSI Advanced Front-End.
AURORA 4 (2002)
Artificially distorted version of the 5k-word Wall Street Journal corpus (WSJ0), with stationary and non-stationary noises added. A second set of recordings uses a distant, mismatched microphone. Clean-training, mixed-training, noisy-training and test sets are available, but no evaluation set; see:
- G. Hirsch "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task", ETSI STQ Aurora DSR Working Group, 2002.
Paper available with the corpus.
Corpora available from ELRA here and here
Resources
- Training recipe for HTK available here. Note that this recipe is for the Wall Street Journal corpus (WSJ0), which is the clean-speech version of AURORA 4. Small changes are needed in the feature extraction scripts to account for different file extensions.
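One way to make such a change is to rewrite the source-file extensions in the script file passed to the HTK feature extraction tool, where each line holds a source path followed by a target path. The sketch below assumes that layout and paths without spaces; the function name and the extensions in the example are hypothetical, so check the actual file names in your copy of the corpus.

```python
from pathlib import PurePosixPath


def swap_source_extension(scp_lines, new_ext):
    """Replace the extension of the source path on each 'source target' line.

    Lines that do not contain exactly two whitespace-separated fields
    (e.g. blank lines) are returned unchanged.
    """
    out = []
    for line in scp_lines:
        parts = line.split()
        if len(parts) != 2:
            out.append(line)
            continue
        source, target = parts
        new_source = PurePosixPath(source).with_suffix(new_ext)
        out.append(f"{new_source} {target}")
    return out


# Hypothetical paths and extensions, for illustration only.
original = ["audio/utt001.aaa features/utt001.mfc", ""]
updated = swap_source_extension(original, ".bbb")
print(updated)
```

Only the source column is touched, so the feature files keep the names expected by the rest of the recipe.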
Speaker identification and verification
Speech enhancement and separation
Other applications
Contribute a dataset
To contribute a new dataset, please
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your dataset (the datasets are ordered by year of collection)
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- name of the dataset and year of collection
- authors, institution, contact information
- link to the dataset and to side resources (lexicon, language model, etc.)
- short description (nature of the data, license, etc.) and link to a paper/report describing the dataset, if any
- at least 1 research result obtained for this dataset (see below)
We currently cannot provide storage space for large datasets. Please upload the dataset to a stable URL on your institution's website or elsewhere and provide only its URL. If this is not possible, please contact the resources sharing working group.
Contribute a research result
To contribute a new research result, please
- create an account and log in
- go to the wiki page and the section corresponding to the dataset for which this result was obtained
- click on the "Edit" link on the right of the section header and add a new item for your result
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- authors, paper/report title, means of publication
- link to the pdf of the paper
- link to derived data (output transcriptions, intermediary data, etc.)
- code and instructions to reproduce experiments (if available)
In order to save storage space, please do not upload the paper to this wiki; instead, link to it whenever possible in your institutional archive, another public archive (e.g., arXiv) or the publisher's website (e.g., IEEE Xplore).
We currently cannot provide storage space for large datasets. Please upload the derived data to a stable URL on your institution's website or elsewhere and provide only its URL. If this is not possible, please contact the resources sharing working group.