Difference between revisions of "Datasets"
Line 5: | Line 5: | ||
{| class="wikitable sortable" style="font-size:85%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | {| class="wikitable sortable" style="font-size:85%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | ||
|- | |- | ||
− | !style="width: | + | !style="width: 50em" rowspan="2" |Datasets |
− | !colspan=" | + | !colspan="8" |General attributes |
!colspan="7" |Speech | !colspan="7" |Speech | ||
!colspan="4" |Channel | !colspan="4" |Channel | ||
Line 16: | Line 16: | ||
!scope="col" width="50px" | total duration | !scope="col" width="50px" | total duration | ||
!scope="col" width="50px" | sampling rate | !scope="col" width="50px" | sampling rate | ||
− | !scope="col" width="50px" | | + | !scope="col" width="50px" | degraded channels |
!scope="col" width="50px" | cameras | !scope="col" width="50px" | cameras | ||
!scope="col" width="50px" | cost | !scope="col" width="50px" | cost | ||
− | !scope="col" width="50px" | | + | !scope="col" width="50px" | links |
− | |||
− | |||
!scope="col" width="50px" | speech duration | !scope="col" width="50px" | speech duration | ||
!scope="col" width="50px" | unique speakers | !scope="col" width="50px" | unique speakers | ||
Line 48: | Line 46: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://spandh.dcs.shef.ac.uk/projects/shatrweb/ | + | |[http://spandh.dcs.shef.ac.uk/projects/shatrweb/ download] [mailto:g.brown@dcs.shef.ac.uk email] [http://spandh.dcs.shef.ac.uk/projects/shatrweb/papers/ioa94.html paper] |
− | |||
− | |||
|37 min | |37 min | ||
|5 | |5 | ||
Line 77: | Line 73: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html | + | |[https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html download] [mailto:jpc@ll.mit.edu email] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|12 | |12 | ||
Line 106: | Line 100: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://research.nii.ac.jp/src/en/RWCP-SP96.html | + | |[http://research.nii.ac.jp/src/en/RWCP-SP96.html download] [mailto:src@nii.ac.jp email] [http://scitation.aip.org/content/asa/journal/jasa/100/4/10.1121/1.416338 paper] |
− | |||
− | |||
|10 h | |10 h | ||
|39 | |39 | ||
Line 135: | Line 127: | ||
|{{no}} | |{{no}} | ||
|TIDigits | |TIDigits | ||
− | |http://aurora.hsnr.de/download.html | + | |[http://aurora.hsnr.de/download.html download] [mailto:hans-guenter.hirsch@hs-niederrhein.de email] [http://www.isca-speech.org/archive_open/asr2000/asr0_181.html paper] |
− | |||
− | |||
|33 h | |33 h | ||
|214 | |214 | ||
Line 164: | Line 154: | ||
|{{no}} | |{{no}} | ||
|2 x ($800 (audio) + $500 (transcripts)) + 3 x ($1000 (audio) + $600 (transcripts)) | |2 x ($800 (audio) + $500 (transcripts)) + 3 x ($1000 (audio) + $600 (transcripts)) | ||
− | |https://catalog.ldc.upenn.edu/ | + | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=SPINE purchase] [mailto:jdwright@ldc.upenn.edu email] [http://dl.acm.org/citation.cfm?id=1289199 paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|100 | |100 | ||
Line 193: | Line 181: | ||
|{{no}} | |{{no}} | ||
|5 x 200 (Academics) / 5 x 1,000 (Companies) | |5 x 200 (Academics) / 5 x 1,000 (Companies) | ||
− | |http://catalog.elra.info/index.php?cPath=37_40 | + | |[http://catalog.elra.info/index.php?cPath=37_40 purchase] [http://aurora.hsnr.de/aurora-3/reports.html papers] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|{{dunno}} | |{{dunno}} | ||
Line 222: | Line 208: | ||
|3 | |3 | ||
|free | |free | ||
− | |http://research.nii.ac.jp/src/en/RWCP-SP01.html | + | |[http://research.nii.ac.jp/src/en/RWCP-SP01.html download] [mailto:src@nii.ac.jp email] [http://id.nii.ac.jp/1001/00057420/ paper] |
− | |||
− | |||
|3.5 h | |3.5 h | ||
|{{dunno}} | |{{dunno}} | ||
Line 251: | Line 235: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://research.nii.ac.jp/src/en/RWCP-SSD.html | + | |[http://research.nii.ac.jp/src/en/RWCP-SSD.html download] [mailto:s-nakamura@is.naist.jp email] [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/356.htm paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|5 | |5 | ||
Line 280: | Line 262: | ||
|{{no}} | |{{no}} | ||
|1.1 Million for all 10 languages. Each costs 39k to 182k | |1.1 Million for all 10 languages. Each costs 39k to 182k | ||
− | |http://catalog.elra.info/ | + | |[http://catalog.elra.info/search.php purchase] [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/373.htm paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|300/language | |300/language | ||
Line 309: | Line 289: | ||
|{{no}} | |{{no}} | ||
|WSJ0 | |WSJ0 | ||
− | |http://aurora.hsnr.de/download.html | + | |[http://aurora.hsnr.de/download.html download] [mailto:hans-guenter.hirsch@hs-niederrhein.de email] [http://aurora.hsnr.de/aurora-4/reports.html paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|101 | |101 | ||
Line 338: | Line 316: | ||
|{{no}} | |{{no}} | ||
|$275 (audio) + $250 (transcripts) | |$275 (audio) + $250 (transcripts) | ||
− | |https://catalog.ldc.upenn.edu/LDC2002S04 | + | |[https://catalog.ldc.upenn.edu/LDC2002S04 purchase] [http://perso.limsi.fr/lamel/icslp94ted.pdf paper] |
− | |||
− | |||
|47 h | |47 h | ||
|188 | |188 | ||
Line 367: | Line 343: | ||
|1 | |1 | ||
|free | |free | ||
− | |http://www.clemson.edu/ces/speech/cuave.htm | + | |[http://www.clemson.edu/ces/speech/cuave.htm download] [mailto:ksampat@clemson.edu email] [http://asp.eurasipjournals.com/content/2002/11/208541 paper] |
− | |||
− | |||
|3 h | |3 h | ||
|36 | |36 | ||
Line 396: | Line 370: | ||
|{{no}} | |{{no}} | ||
|$25k with UT-Drive | |$25k with UT-Drive | ||
− | |http://crss.utdallas.edu/ | + | |[http://crss.utdallas.edu/ purchase] [mailto:john.hansen@utdallas.edu email] [http://www.isca-speech.org/archive/eurospeech_2001/e01_2023.html paper] |
− | |||
− | |||
|286 h | |286 h | ||
|172 | |172 | ||
Line 425: | Line 397: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://research.nii.ac.jp/src/en/CENSREC-1.html | + | |[http://research.nii.ac.jp/src/en/CENSREC-1.html download] [mailto:s-nakamura@is.naist.jp email] [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15046/1/425.pdf paper] |
− | |||
− | |||
| | | | ||
|214 | |214 | ||
Line 454: | Line 424: | ||
|4 | |4 | ||
|free | |free | ||
− | |http://www.isle.illinois.edu/sst/AVICAR/ | + | |[http://www.isle.illinois.edu/sst/AVICAR/ download] [mailto:jhasegaw@illinois.edu email] [http://www.isca-speech.org/archive/interspeech_2004/i04_2489.html paper] |
− | |||
− | |||
|29 h | |29 h | ||
|86 | |86 | ||
Line 483: | Line 451: | ||
|3 | |3 | ||
|free | |free | ||
− | |http://www.idiap.ch/dataset/av16-3/ | + | |[http://www.idiap.ch/dataset/av16-3/ download] [mailto:odobez@idiap.ch email] [http://publications.idiap.ch/index.php/publications/show/353 paper] |
− | |||
− | |||
|1.5 h | |1.5 h | ||
|12 | |12 | ||
Line 512: | Line 478: | ||
|{{no}} | |{{no}} | ||
|$1900 (audio) + $900 (transcripts) | |$1900 (audio) + $900 (transcripts) | ||
− | |https://catalog.ldc.upenn.edu/ | + | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=ICSI purchase] [mailto:mrcontact@icsi.berkeley.edu email] [http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1198793 paper] |
− | |||
− | |||
|72 h | |72 h | ||
|53 | |53 | ||
Line 541: | Line 505: | ||
|{{no}} (released but not currently available for download) | |{{no}} (released but not currently available for download) | ||
|$4000 (audio) + $1500 (transcripts) | |$4000 (audio) + $1500 (transcripts) | ||
− | |https://catalog.ldc.upenn.edu/ | + | |[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=NIST%20Meeting purchase] [mailto:john.garofolo@nist.gov email] [http://www.lrec-conf.org/proceedings/lrec2004/summaries/137.htm paper] |
− | |||
− | |||
|15 h | |15 h | ||
|61 | |61 | ||
Line 570: | Line 532: | ||
|6 to 9 | |6 to 9 | ||
|3 500 | |3 500 | ||
− | |http://catalog.elra.info/search.php | + | |[http://catalog.elra.info/search.php purchase] [mailto:choukri@elda.org email] [http://link.springer.com/article/10.1007%2Fs10579-007-9054-4 paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|{{dunno}} | |{{dunno}} | ||
Line 599: | Line 559: | ||
|{{no}} | |{{no}} | ||
|29 x 75000 for all languages | |29 x 75000 for all languages | ||
− | |http://catalog.elra.info/ | + | |[http://catalog.elra.info/search.php purchase] [mailto:diskra@appen.com email] [http://www.lrec-conf.org/proceedings/lrec2002/sumarios/177.htm paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|600/language | |600/language | ||
Line 628: | Line 586: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://research.nii.ac.jp/src/en/CENSREC-2.html | + | |[http://research.nii.ac.jp/src/en/CENSREC-2.html download] [mailto:src@nii.ac.jp email] [http://www.isca-speech.org/archive/interspeech_2006/i06_1726.html paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|214 | |214 | ||
Line 657: | Line 613: | ||
|{{no}} | |{{no}} | ||
|free except phonetically balanced training set: JPY 21000 (Universities) / JPY 105000 (Companies) | |free except phonetically balanced training set: JPY 21000 (Universities) / JPY 105000 (Companies) | ||
− | |http://research.nii.ac.jp/src/en/CENSREC-3.html | + | |[http://research.nii.ac.jp/src/en/CENSREC-3.html purchase] [mailto:src@nii.ac.jp email] [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15050/1/429.pdf paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|18 (+293 in training) | |18 (+293 in training) | ||
Line 686: | Line 640: | ||
|{{no}} | |{{no}} | ||
|TIDigits | |TIDigits | ||
− | |http://aurora.hsnr.de/download.html | + | |[http://aurora.hsnr.de/download.html download] [mailto:hans-guenter.hirsch@hs-niederrhein.de email] [http://aurora.hsnr.de/aurora-5/reports.html paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|225 | |225 | ||
Line 715: | Line 667: | ||
|6 | |6 | ||
|free | |free | ||
− | |http://groups.inf.ed.ac.uk/ami/ | + | |[http://groups.inf.ed.ac.uk/ami/ download] [mailto:amicorpus@amiproject.org email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4538700 paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|189 | |189 | ||
Line 744: | Line 694: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | | | + | |[mailto:m.cooke@ikerbasque.org email] [http://www.sciencedirect.com/science/article/pii/S0885230809000205 paper] |
− | |||
− | |||
|18.5 min (+ 8.5h clean training data) | |18.5 min (+ 8.5h clean training data) | ||
|34 | |34 | ||
Line 773: | Line 721: | ||
|{{no}} | |{{no}} | ||
|50 | |50 | ||
− | |http://catalog.elra.info/product_info.php?products_id=1088&language=en | + | |[http://catalog.elra.info/product_info.php?products_id=1088&language=en purchase] [mailto:segura@ugr.es email] [http://cvsp.cs.ntua.gr/projects/pub/HIWIRE/WebHome/HIWIRE_db_description_paper.pdf paper] |
− | |||
− | |||
|21 h | |21 h | ||
|81 | |81 | ||
Line 802: | Line 748: | ||
|2 | |2 | ||
|$25k with CU-Move | |$25k with CU-Move | ||
− | |http://crss.utdallas.edu/ | + | |[http://crss.utdallas.edu/ download] [mailto:john.hansen@utdallas.edu email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4290175 paper] |
− | |||
− | |||
|40 h | |40 h | ||
|25 (more exist but not included in latest release 3.0) | |25 (more exist but not included in latest release 3.0) | ||
Line 831: | Line 775: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures | + | |[http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures download] [mailto:araki.shoko@lab.ntt.co.jp email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
− | |||
− | |||
|19 min | |19 min | ||
|16 | |16 | ||
Line 860: | Line 802: | ||
|{{no}} | |{{no}} | ||
|$1 500 | |$1 500 | ||
− | |https://catalog.ldc.upenn.edu/LDC2014S03 | + | |[https://catalog.ldc.upenn.edu/LDC2014S03 purchase] [mailto:mike.lincoln@quoratetechnology.com email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=1566470 paper] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6639033 paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|45 | |45 | ||
Line 889: | Line 829: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://research.nii.ac.jp/src/en/CENSREC-4.html | + | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] [mailto:src@nii.ac.jp email] [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|214 | |214 | ||
Line 918: | Line 856: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://research.nii.ac.jp/src/en/CENSREC-4.html | + | |[http://research.nii.ac.jp/src/en/CENSREC-4.html download] [mailto:src@nii.ac.jp email] [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper] |
− | |||
− | |||
|{{dunno}} | |{{dunno}} | ||
|10 | |10 | ||
Line 947: | Line 883: | ||
|2 | |2 | ||
|free | |free | ||
− | |http://shine.fbk.eu/resources/dicit-acoustic-woz-data | + | |[http://shine.fbk.eu/resources/dicit-acoustic-woz-data download] [mailto:omologo@fbk.eu email] [http://www.lrec-conf.org/proceedings/lrec2008/summaries/584.html paper] |
− | |||
− | |||
|1 h | |1 h | ||
|{{dunno}} | |{{dunno}} | ||
Line 976: | Line 910: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions | + | |[http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions download] [mailto:hendrik.kayser@uni-oldenburg.de email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
− | |||
− | |||
|1.9 h | |1.9 h | ||
|{{dunno}} | |{{dunno}} | ||
Line 1,005: | Line 937: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://melodi.ee.washington.edu/cosine/ | + | |[http://melodi.ee.washington.edu/cosine/ download] [mailto:cosine@melodi.ee.washington.edu email] [http://www.sciencedirect.com/science/article/pii/S0885230811000143 paper] |
− | |||
− | |||
|11 h | |11 h | ||
|91 | |91 | ||
Line 1,034: | Line 964: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise | + | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise download] [mailto:ito.nobutaka@lab.ntt.co.jp email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
− | |||
− | |||
|20 min | |20 min | ||
|6 | |6 | ||
Line 1,063: | Line 991: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions | + | |[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions download] [mailto:francesco.nesta@gmail.com email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper] |
− | |||
− | |||
|11 min | |11 min | ||
|{{dunno}} | |{{dunno}} | ||
Line 1,092: | Line 1,018: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html | + | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html download] [mailto:emmanuel.vincent@inria.fr email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] |
− | |||
− | |||
|12 h | |12 h | ||
|34 | |34 | ||
Line 1,121: | Line 1,045: | ||
|{{no}} | |{{no}} | ||
|WSJ0 | |WSJ0 | ||
− | |http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html | + | |[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html download] [mailto:francesco.nesta@gmail.com email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper] |
− | |||
− | |||
|33 h | |33 h | ||
|101 | |101 | ||
Line 1,150: | Line 1,072: | ||
|1 | |1 | ||
|{{dunno}} | |{{dunno}} | ||
− | | | + | |[mailto:guillaume.gravier@irisa.fr email] [http://www.lrec-conf.org/proceedings/lrec2012/summaries/495.html paper] |
− | |||
− | |||
|32 h | |32 h | ||
|347 | |347 | ||
Line 1,179: | Line 1,099: | ||
|{{no}} | |{{no}} | ||
|$2000 (audio) + $1500 (transcripts) | |$2000 (audio) + $1500 (transcripts) | ||
− | |https://catalog.ldc.upenn.edu/LDC2013S04 | + | |[https://catalog.ldc.upenn.edu/LDC2013S04 purchase] [mailto:strassel@ldc.upenn.edu email] |
− | |||
− | |||
|108 h | |108 h | ||
|{{dunno}} | |{{dunno}} | ||
Line 1,208: | Line 1,126: | ||
|{{no}} | |{{no}} | ||
|2 x [$2000 (audio) + $1500 (transcripts)] | |2 x [$2000 (audio) + $1500 (transcripts)] | ||
− | |https://catalog.ldc.upenn.edu/LDC2013S02 | + | |[https://catalog.ldc.upenn.edu/LDC2013S02 purchase] [mailto:strassel@ldc.upenn.edu email] |
− | |||
− | |||
|234 h | |234 h | ||
|{{dunno}} | |{{dunno}} | ||
Line 1,237: | Line 1,153: | ||
|{{no}} | |{{no}} | ||
|WSJCAM0 | |WSJCAM0 | ||
− | |http://reverb2014.dereverberation.com/ | + | |[http://reverb2014.dereverberation.com/ purchase] [mailto:REVERB-challenge@lab.ntt.co.jp email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6701894 paper] |
− | |||
− | |||
|25 h | |25 h | ||
|130 | |130 | ||
Line 1,266: | Line 1,180: | ||
|{{no}} | |{{no}} | ||
|free | |free | ||
− | |http://shine.fbk.eu/resources/dirha-ii-simulated-corpus | + | |[http://shine.fbk.eu/resources/dirha-ii-simulated-corpus download] [mailto:mravanelli@fbk.eu email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6843271 paper] |
− | |||
− | |||
|1.3 h | |1.3 h | ||
|30 | |30 |
Revision as of 22:26, 6 August 2014
This page lists datasets with detailed attributes and links to corresponding research results (papers, numerical results, output transcriptions, intermediary data, etc.). Each dataset may be used for one or more applications, including automatic speech recognition, speaker identification and verification, source localization, and speech enhancement and separation.
Disclaimer: Only publicly available datasets with a total duration longer than 5 min are listed.
Column groups: General attributes (release to links), Speech (speech duration to speaker overlap), Channel (channel type to speaker movements), Noise (noise type), Ground truth (speech signal to noise events).
Datasets | release | scenario | total duration | sampling rate (Hz) | degraded channels | cameras | cost | links | speech duration | unique speakers | language | unique words | speaking style | simultaneous speakers | speaker overlap | channel type | radiation | speaker location | speaker movements | noise type | speech signal | speaker location and orientation | words | nonverbal traits | noise events |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ShATR | 1994 | meeting | 37 min | 48000 | 3 (distant) | no | free | download email paper | 37 min | 5 | UK English | 1k | colloquial | 5 | multiple conversations | reverb | human | quasi-fixed | head | meeting | headset | yes | yes | no | yes |
LLSEC | 1996 | conversation | 1.4 h | 16000 | 4 (distant) | no | free | download email | ? | 12 | N/S | N/S | read/colloquial | 2 | conversation | reverb | human | quasi-fixed | head | hallway, restaurant | no | yes | no | no | no |
RWCP Spoken Dialog Corpus | 1996-1997 | conversation | 10 h | 16000 | 2 (close but cross-talk) | no | free | download email paper | 10 h | 39 | Japanese | ? | colloquial | 1 or 2 | conversation | reverb | human | quasi-fixed | head | stationary background noise | no | no | yes | no | no |
Aurora-2 | 2000 | public spaces | 33 h | 8000-16000 | 1 (close) | no | TIDigits | download email paper | 33 h | 214 | US English | 11 | digits | 1 | no | no (simulated telephone channel) | human | N/S | no | various real environments | original | N/S | yes | no | yes |
SPINE1/SPINE2 | 2000-2001 | military | 38 h | 16000 | 2 (close) | no | 2 x ($800 (audio) + $500 (transcripts)) + 3 x ($1000 (audio) + $600 (transcripts)) | purchase email paper | ? | 100 | US English | 1k | command/colloquial | 1 or 2 | no | no (simulated transmission channels) | human | quasi-fixed | head | military (pre-recorded noise played in sound booth while recording speech) | no | no | yes | no | no |
Aurora-3 (subset of SpeechDat-Car) | 2000-2003 | car | ? | 16000 | 3 (+1 GSM) (distant) | no | 5 x 200 (Academics) / 5 x 1,000 (Companies) | purchase papers | ? | ? | Finnish, German, Spanish, Danish, Italian | ? | command (read/digits/keywords/spontaneous) | 1 | no | reverb | human | quasi-fixed | head | car | close-talk | no | yes | no | no |
RWCP Meeting Speech Corpus | 2001 | meeting | 3.5 h | 16000-48000 | 1 (distant) | 3 | free | download email paper | 3.5 h | ? | Japanese | ? | colloquial | 1 to 5 | meeting | low reverb | human | quasi-fixed | head | stationary background noise | headset | no | yes | no | no |
RWCP Real Environment Speech and Acoustic Database | 2001 | domestic/office | ? | 16000-48000 | 30 (distant) | no | free | download email paper | ? | 5 | Japanese | ? | read | 1 | no | real rir/reverb | loudspeaker | various | no/pivoting arm | stationary background noise | original | yes | yes | no | yes |
SpeechDat-Car | 2001-2011 | car | ? | 16000 | 3 (+1 GSM) (distant) | no | 1.1 Million for all 10 languages. Each costs 39k to 182k | purchase paper | ? | 300/language | Multiple | ? | command (read/digits/keywords/spontaneous) | 1 | no | reverb | human | quasi-fixed | head | car | close-talk | no | yes | no | no |
Aurora-4 | 2002 | public spaces | ? | 8000-16000 | 1 (close) | no | WSJ0 | download email paper | ? | 101 | US English | 10k | read | 1 | no | no (simulated telephone channel) | human | N/S | no | various real environments | original | N/S | yes | no | yes |
TED | 2002 | seminar | 47 h | 16000 | 1 (distant) | no | $275 (audio) + $250 (transcripts) | purchase paper | 47 h | 188 | English (mostly non-native) | ? | lecture | 1 or more | seminar | reverb | human | quasi-fixed | head | stationary background noise | lapel | no | yes (partial) | no | no |
CUAVE | 2002 | cocktail party | 3 h | 44100 | 1 (distant) | 1 | free | download email paper | 3 h | 36 | US English | 10 | digits | 1 or 2 | full | reverb | human | quasi-fixed | head | stationary background noise | no | no | yes | no | no |
CU-Move ("Microphone Array Data"; downsampled data with more speakers but fewer channels exist) | 2002-2011 | car | 286 h | 44100 | 6 to 8 (distant) | no | $25k with UT-Drive | purchase email paper | 286 h | 172 | US English | 12k | command/digits/read/dialogue | 1 | no | reverb | human | quasi-fixed | head | car | no | no | yes | no | no |
CENSREC-1 (Aurora-2J) | 2003 | public spaces | ? | 8000 | 1 (close) | no | free | download email paper | | 214 | Japanese | 11 | digits | 1 | no | various microphones and simulated channels | human | N/S | no | various real environments | original | N/S | yes | no | yes |
AVICAR | 2004 | car | 29 h | 16000 | 7 (distant) | 4 | free | download email paper | 29 h | 86 | US/non-native English | 1k | read | 1 | no | reverb | human | quasi-fixed | head | car | no | no | yes | no | no |
AV16.3 | 2004 | meeting | 1.5 h | 16000 | 16 (distant) | 3 | free | download email paper | 1.5 h | 12 | N/S | N/S | colloquial | 1 to 3 | full | reverb | human | various | walk | stationary background noise | no | yes | no | no | no |
ICSI Meeting Corpus | 2004 | meeting | 72 h | 16000 | 6 (distant) | no | $1900 (audio) + $900 (transcripts) | purchase email paper | 72 h | 53 | US English | 13k | meeting | 3 to 10 | meeting | reverb | human | quasi-fixed | head | stationary background noise | headset (some lapel) | no | yes | yes | no |
NIST Meeting Pilot Corpus Speech | 2004 | meeting | 15 h | 16000 | 7 (distant) | no (released but not currently available for download) | $4000 (audio) + $1500 (transcripts) | purchase email paper | 15 h | 61 | US English | 6k | meeting | 3 to 9 | meeting | reverb | human | various | walk | stationary background noise | headset+lapel | no | yes | no | no |
CHIL Meetings | 2004-2007 | seminar/meeting | 60 h | 44100 | 79 to 147 (distant) | 6 to 9 | 3 500 | purchase email paper | ? | ? | non-native English | ? | lecture/meeting | 3 to 20 | seminar/meeting | reverb | human | quasi-fixed | head | meeting (scenarized) | headset | yes | yes | yes | no |
SPEECON | 2004-2011 | public space/domestic/office/car | ? | 16000 | 3 (distant) | no | 29 x 75000 for all languages | purchase email paper | ? | 600/language | Multiple | ? | command/read/spontaneous | 1 | no | reverb | human | quasi-fixed | head | various real environments | headset | no | yes | no | no |
CENSREC-2 | 2005 | car | ? | 16000 | 1 (distant) | no | free | download email paper | ? | 214 | Japanese | 11 | digits | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
CENSREC-3 | 2005 | car | ? | 16000 | 1 (distant) | no | free except phonetically balanced training set: JPY 21000 (Universities) / JPY 105000 (Companies) | purchase email paper | ? | 18 (+293 in training) | Japanese | 50 in evaluation; unknown but larger in phonetically-balanced utterances of training set | read | 1 | no | reverb | human | quasi-fixed | head | car | headset | no | yes | no | no |
Aurora-5 | 2006 | public spaces/domestic/office/car | ? | 8000 | 1 (distant) | no | TIDigits | download email paper | ? | 225 | US English | 11 | digits | 1 | no | real rir/simu/no + simulated telephone channel | loudspeaker | N/S | no | various real environments | original | no | yes | no | yes |
AMI | 2006 | meeting | 100 h | 16000 | 16 (distant) | 6 | free | download email paper | ? | 189 | UK English | 8k | meeting | 4 (18% overlap) | meeting | reverb | human | quasi-fixed | head | stationary background noise | headset+lapel | yes | yes | yes | no |
PASCAL SSC | 2006 | cocktail party | 18.5 min (+ 8.5h clean training data) | 25000 | 1 (mixing console) | no | free | email paper | 18.5 min (+ 8.5h clean training data) | 34 | UK English | 51 | command | 2 | full | no | human | N/S | no | no | original | N/S | yes | no | no |
HIWIRE | 2007 | airplane | 21 h | 16000 | 1 (close) | no | 50 | purchase email paper | 21 h | 81 | non-native English | 133 | command | 1 | no | no | human | N/S | head | airplane | original | N/S | yes | no | no |
UT-Drive | 2007 | car | 40 h | 25000 | 5 (distant) | 2 | $25k with CU-Move | download email paper | 40 h | 25 (more exist but not included in latest release 3.0) | US English | 2.4k (but transcription is incomplete) | command/conversation | 1 to 2 | conversation | reverb | human | quasi-fixed | head | car | headset (but problem w/ recording quality) | no | yes (partial) | no | no |
SASSEC/SiSEC underdetermined | 2007-2011 | cocktail party | 19 min | 16000 | 2 (distant) | no | free | download email paper | 19 min | 16 | N/S | N/S | read | 3 or 4 | full | reverb/real rir/simu | no | fixed | no | no | original+spatial image | yes | no | no | no |
MC-WSJ-AV/PASCAL SSC2/2012_MMA/REVERB RealData | 2007-2014 | cocktail party | 10 h | 16000 | 8 to 40 (distant) | no | $1 500 | purchase email paper paper | ? | 45 | UK English | 10k | read | 1 or 2 | full | reverb | human | various | walk | stationary background noise | headset+lapel | yes | yes | no | no |
CENSREC-4 (Simulated) | 2008 | public spaces/domestic/office/car | ? | 16000 | 1 (distant) | no | free | download email paper | ? | 214 | Japanese | 11 | digits | 1 | no | real rir | mouth simulator | fixed | no | various real environments | original | no | yes | no | yes |
CENSREC-4 (Real) | 2008 | public spaces/domestic/office/car | ? | 16000 | 1 (distant) | no | free | download email paper | ? | 10 | Japanese | 11 | digits | 1 | no | reverb | human | quasi-fixed | head | various real environments | headset | no | yes | no | yes |
DICIT | 2008 | domestic | 6 h | 48000 | 16 (distant) | 2 | free | download email paper | 1 h | ? | Italian | ? | command | 4 | no | reverb | human | various | walk | domestic (scenarized) | headset+tv | yes | yes | no | yes |
SiSEC head-geometry | 2008 | cocktail party | 1.9 h | 16000 | 2 (distant) | no | free | download email paper | 1.9 h | ? | N/S | N/S | read | 2 | full | real rir | loudspeaker | various | no | no | original+spatial image | yes | no | no | no |
COSINE | 2009 | conversation | 38 h | 48000 | 20 (distant) | no | free | download email paper | 11 h | 91 | US/non-native English | 5k | colloquial | 2 to 7 | conversation | reverb | human | various | walk | various real environments | headset+throat mic | no | yes | no | no |
SiSEC real-world noise | 2010 | public spaces | 20 min | 16000 | 2 to 4 (distant) | no | free | download email paper | 20 min | 6 | N/S | N/S | read | 1 or 3 | full | no | loudspeaker | various | no | various real environments | original+spatial image | yes | no | no | no |
SiSEC dynamic | 2010-2011 | cocktail party | 11 min | 16000 | 2 to 4 (distant) | no | free | download email paper | 11 min | ? | N/S | N/S | read | Many but only 2 simultaneous | simu | reverb | loudspeaker | various | simu | no | original+spatial image | yes | no | no | no |
CHiME 1/CHiME 2 Grid | 2011-2012 | domestic | 70 h with some overlap | 16000 | 2 (distant) | no | free | download email paper | 12 h | 34 | UK English | 51 | command | 1 | no | real rir | dummy | quasi-fixed | simu | domestic | yes | yes | yes | no | no |
CHiME 2 WSJ0 | 2012 | domestic | 78 h with some overlap | 16000 | 2 (distant) | no | WSJ0 | download email paper | 33 h | 101 | US English | 11k | read | 1 | no | real rir | dummy | fixed | no | domestic | yes | yes | yes | no | no |
ETAPE | 2012 | debates, outdoor interviews, and other TV/radio broadcasts selected for large speaker overlap and/or noise | 42 h | 16000 | 1 (mixing console) | 1 | ? | email paper | 32 h | 347 | French | 16k | colloquial | 1 or more (7% overlap on average, up to 10% in debates) | conversation | some reverb | human | quasi-fixed | head | various real environments | no | N/S | yes | no | yes |
GALE (Chinese broadcast conversation) | 2013 | conversation (TV Broadcast) | 120 h | 16000 | 1 (mixing console) | no | $2000 (audio) + $1500 (transcripts) | purchase email | 108 h | ? | Mandarin | ? | colloquial | 1 or more | conversation | no | human | quasi-fixed | head | no | no | N/S | yes | no | no |
GALE (Arabic broadcast conversation) | 2013 | conversation (TV Broadcast) | 251 h | 16000 | 1 (mixing console) | no | 2 x [$2000 (audio) + $1500 (transcripts)] | purchase email | 234 h | ? | Arabic | ? | colloquial | 1 or more | conversation | no | human | quasi-fixed | head | no | no | N/S | yes | no | no |
REVERB SimData | 2013 | domestic/office | 25 h | 16000 | 8 (distant) | no | WSJCAM0 | purchase email paper | 25 h | 130 | UK English | 10k | read | 1 | no | real rir | loudspeaker | fixed | no | experimental room | original+spatial image | yes | yes | no | yes |
DIRHA | 2014 | domestic | 3.8 h | 48000 | 40 (distant) | no | free | download email paper | 1.3 h | 30 | Italian, German, Greek, Portuguese | various | various | 1 or more | simu | real rir | loudspeaker | various | no | domestic (sum of individual noises) | yes | yes | yes | no | yes |
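The table above can also be read programmatically. The following is a minimal sketch in Python, assuming each row is available as one pipe-delimited line exactly as shown; the names `COLUMNS`, `parse_row` and `duration_minutes` are hypothetical helpers introduced for illustration, not part of this wiki. It splits rows into named cells and converts duration strings such as "37 min" or "47 h" to minutes, which is enough to filter by cost or to check the 5 min threshold from the disclaimer.

```python
import re

# Column names copied from the table header; "dataset" stands for the first column.
COLUMNS = [
    "dataset", "release", "scenario", "total duration", "sampling rate",
    "degraded channels", "cameras", "cost", "links", "speech duration",
    "unique speakers", "language", "unique words", "speaking style",
    "simultaneous speakers", "speaker overlap", "channel type", "radiation",
    "speaker location", "speaker movements", "noise type", "speech signal",
    "speaker location and orientation", "words", "nonverbal traits",
    "noise events",
]


def parse_row(line):
    """Split one pipe-delimited table row into a dict keyed by column name."""
    text = line.strip()
    if text.endswith("|"):
        text = text[:-1]  # drop the single trailing delimiter
    cells = [cell.strip() for cell in text.split("|")]
    return dict(zip(COLUMNS, cells))


def duration_minutes(text):
    """Convert strings such as '37 min' or '47 h' to minutes; None if unknown."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*(h|min)\b", text)
    if match is None:
        return None  # e.g. '?' or an empty cell
    value = float(match.group(1))
    return value * 60 if match.group(2) == "h" else value


# Two rows copied verbatim from the table, used here as sample input.
ROWS = [
    "ShATR | 1994 | meeting | 37 min | 48000 | 3 (distant) | no | free | download email paper | 37 min | 5 | UK English | 1k | colloquial | 5 | multiple conversations | reverb | human | quasi-fixed | head | meeting | headset | yes | yes | no | yes |",
    "TED | 2002 | seminar | 47 h | 16000 | 1 (distant) | no | $275 (audio) + $250 (transcripts) | purchase paper | 47 h | 188 | English (mostly non-native) | ? | lecture | 1 or more | seminar | reverb | human | quasi-fixed | head | stationary background noise | lapel | no | yes (partial) | no | no |",
]

datasets = [parse_row(row) for row in ROWS]

# Names of the sample datasets whose cost cell is exactly "free".
free_names = [d["dataset"] for d in datasets if d["cost"] == "free"]

# Sample datasets whose total duration is known and exceeds 5 minutes.
long_enough = [
    d["dataset"]
    for d in datasets
    if (duration_minutes(d["total duration"]) or 0) > 5
]
```

Cells holding "?" or free-form notes are returned as plain strings, so any further interpretation (for example of the cost column) is left to the caller.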
Automatic speech recognition
1st CHiME Challenge (2011)
Artificially distorted version of the small vocabulary GRID audio-visual corpus (audio only). Binaural reverberated speech with speaker situated in front of the microphones. Additive household noises impinging from different directions. Clean-training, noisy-training, development and evaluation sets available, see
- J. Barker, E. Vincent, N. Ma, H. Christensen, P. Green, "The PASCAL CHiME speech separation and recognition challenge", Computer Speech & Language, Volume 27, Issue 3, May 2013, Pages 621-633.
Available from Computer Speech and Language here
Corpus available here (no cost)
Resources
Baselines
- See the paper above for results for a wide range of techniques.
AURORA 5 (2007)
Artificially distorted version of the TIDigits digit corpus. Includes sets with additive noise only and sets with additive noise plus reverberation. Variable SNR range. Various mixed training sets, no evaluation set, see
- G. Hirsch, "Aurora-5 Experimental Framework for the Performance Evaluation of Speech Recognition in Case of a Hands-free Speech Input in Noisy Environments", Niederrhein University of Applied Sciences, 2007.
Paper available online here (no cost)
Corpus available from LDC here
Resources
- Training recipe for HTK is provided with the corpora.
Baselines
- Reproducible baseline: The above cited paper includes a baseline for the ETSI Advanced Front-End.
AURORA 4 (2002)
Artificially distorted version of the 5K word Wall Street Journal corpus (WSJ0). Stationary and non-stationary noises added. Second recordings with distant mismatched microphone. Clean-training, mixed-training, noisy-training and test sets available. No evaluation set, see
- G. Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task", ETSI STQ Aurora DSR Working Group, 2002.
Paper available with the corpus.
Corpora available from ELRA here and here
Resources
- Training recipe for HTK available here. Note that this recipe is for the Wall Street Journal corpus (WSJ0), which is the clean speech version of AURORA 4. Small changes are needed in the feature extraction scripts to account for different file extensions.
Speaker identification and verification
Speech enhancement and separation
Other applications
Contribute a dataset
To contribute a new dataset, please
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your dataset (the datasets are ordered by year of collection)
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- name of the dataset and year of collection
- authors, institution, contact information
- link to the dataset and to side resources (lexicon, language model, etc.)
- short description (nature of the data, license, etc.) and link to a paper/report describing the dataset, if any
- at least 1 research result obtained for this dataset (see below)
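As an informal aid, a draft entry can be checked against the list above before editing the wiki. The sketch below is in Python; the field names, the `missing_fields` helper and the sample values are all hypothetical and only mirror the required items, they are not a format defined by this wiki.

```python
# Hypothetical field names mirroring the required information listed above.
REQUIRED_FIELDS = [
    "name", "year", "authors", "institution", "contact",
    "dataset_url", "description", "research_result",
]


def missing_fields(entry):
    """Return the required fields that are absent or blank in a draft entry."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(entry.get(field, "")).strip()
    ]


# Made-up draft used only to illustrate the check.
draft = {
    "name": "Example Corpus",
    "year": 2014,
    "authors": "A. Author",
    "institution": "Example University",
    "contact": "author@example.org",
    "dataset_url": "http://example.org/corpus",
    "description": "Short description of the data and its license.",
}

todo = missing_fields(draft)  # fields still to be filled in before saving
```

In this made-up draft, `todo` contains only `"research_result"`, since no research result has been added yet.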
We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.
Contribute a research result
To contribute a new research result, please
- create an account and log in
- go to the wiki page and the section corresponding to the dataset for which this result was obtained
- click on the "Edit" link on the right of the section header and add a new item for your result
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- authors, paper/report title, means of publication
- link to the pdf of the paper
- link to derived data (output transcriptions, intermediary data, etc.)
- code and instructions to reproduce experiments (if available)
In order to save storage space, please do not upload the paper to this wiki. Instead, link to it wherever possible from your institutional archive, another public archive (e.g., arXiv), or the publisher's website (e.g., IEEE Xplore).
We currently cannot provide storage space for large datasets. Please upload the derived data at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.