Difference between revisions of "Datasets"

Revision as of 23:26, 6 August 2014

This page lists datasets with detailed attributes and links to corresponding research results (papers, numerical results, output transcriptions, intermediary data, etc.). Each dataset may be used for one or more applications, such as automatic speech recognition, speaker identification and verification, source localization, and speech enhancement and separation.

Disclaimer: Only publicly available datasets with a total duration longer than 5 min are listed.

Datasets	General attributes								Speech							Channel				Noise	Ground truth
Datasets	release	scenario	total duration	sampling rate (Hz)	degraded channels	cameras	cost	links	speech duration	unique speakers	language	unique words	speaking style	simultaneous speakers	speaker overlap	channel type	radiation	speaker location	speaker movements	noise type	speech signal	speaker location and orientation	words	nonverbal traits	noise events
ShATR	1994	meeting	37 min	48000	3 (distant)	no	free	download email paper	37 min	5	UK English	1k	colloquial	5	multiple conversations	reverb	human	quasi-fixed	head	meeting	headset	yes	yes	no	yes
LLSEC	1996	conversation	1.4 h	16000	4 (distant)	no	free	download email	?	12	N/S	N/S	read/colloquial	2	conversation	reverb	human	quasi-fixed	head	hallway, restaurant	no	yes	no	no	no
RWCP Spoken Dialog Corpus	1996-1997	conversation	10 h	16000	2 (close but cross-talk)	no	free	download email paper	10 h	39	Japanese	?	colloquial	1 or 2	conversation	reverb	human	quasi-fixed	head	stationary background noise	no	no	yes	no	no
Aurora-2	2000	public spaces	33 h	8000-16000	1 (close)	no	TIDigits	download email paper	33 h	214	US English	11	digits	1	no	no (simulated telephone channel)	human	N/S	no	various real environments	original	N/S	yes	no	yes
SPINE1/SPINE2	2000-2001	military	38 h	16000	2 (close)	no	2 x ($800 (audio) + $500 (transcripts)) + 3 x ($1000 (audio) + $600 (transcripts))	purchase email paper	?	100	US English	1k	command/colloquial	1 or 2	no	no (simulated transmission channels)	human	quasi-fixed	head	military (pre-recorded noise played in sound booth while recording speech)	no	no	yes	no	no
Aurora-3 (subset of SpeechDat-Car)	2000-2003	car	?	16000	3 (+1 GSM) (distant)	no	5 x 200 (Academics) / 5 x 1,000 (Companies)	purchase papers	?	?	Finnish, German, Spanish, Danish, Italian	?	command (read/digits/keywords/spontaneous)	1	no	reverb	human	quasi-fixed	head	car	close-talk	no	yes	no	no
RWCP Meeting Speech Corpus	2001	meeting	3.5 h	16000-48000	1 (distant)	3	free	download email paper	3.5 h	?	Japanese	?	colloquial	1 to 5	meeting	low reverb	human	quasi-fixed	head	stationary background noise	headset	no	yes	no	no
RWCP Real Environment Speech and Acoustic Database	2001	domestic/office	?	16000-48000	30 (distant)	no	free	download email paper	?	5	Japanese	?	read	1	no	real rir/reverb	loudspeaker	various	no/pivoting arm	stationary background noise	original	yes	yes	no	yes
SpeechDat-Car	2001-2011	car	?	16000	3 (+1 GSM) (distant)	no	1.1 Million € for all 10 languages. Each costs 39k € to 182k €	purchase paper	?	300/language	Multiple	?	command (read/digits/keywords/spontaneous)	1	no	reverb	human	quasi-fixed	head	car	close-talk	no	yes	no	no
Aurora-4	2002	public spaces	?	8000-16000	1 (close)	no	WSJ0	download email paper	?	101	US English	10k	read	1	no	no (simulated telephone channel)	human	N/S	no	various real environments	original	N/S	yes	no	yes
TED	2002	seminar	47 h	16000	1 (distant)	no	$275 (audio) + $250 (transcripts)	purchase paper	47 h	188	English (mostly non-native)	?	lecture	1 or more	seminar	reverb	human	quasi-fixed	head	stationary background noise	lapel	no	yes (partial)	no	no
CUAVE	2002	cocktail party	3 h	44100	1 (distant)	1	free	download email paper	3 h	36	US English	10	digits	1 or 2	full	reverb	human	quasi-fixed	head	stationary background noise	no	no	yes	no	no
CU-Move ("Microphone Array Data"; downsampled data with more speakers but fewer channels exists)	2002-2011	car	286 h	44100	6 to 8 (distant)	no	$25k with UT-Drive	purchase email paper	286 h	172	US English	12k	command/digits/read/dialogue	1	no	reverb	human	quasi-fixed	head	car	no	no	yes	no	no
CENSREC-1 (Aurora-2J)	2003	public spaces	?	8000	1 (close)	no	free	download email paper		214	Japanese	11	digits	1	no	various microphones and simulated channels	human	N/S	no	various real environments	original	N/S	yes	no	yes
AVICAR	2004	car	29 h	16000	7 (distant)	4	free	download email paper	29 h	86	US/non-native English	1k	read	1	no	reverb	human	quasi-fixed	head	car	no	no	yes	no	no
AV16.3	2004	meeting	1.5 h	16000	16 (distant)	3	free	download email paper	1.5 h	12	N/S	N/S	colloquial	1 to 3	full	reverb	human	various	walk	stationary background noise	no	yes	no	no	no
ICSI Meeting Corpus	2004	meeting	72 h	16000	6 (distant)	no	$1900 (audio) + $900 (transcripts)	purchase email paper	72 h	53	US English	13k	meeting	3 to 10	meeting	reverb	human	quasi-fixed	head	stationary background noise	headset (some lapel)	no	yes	yes	no
NIST Meeting Pilot Corpus Speech	2004	meeting	15 h	16000	7 (distant)	no (released but not currently available for download)	$4000 (audio) + $1500 (transcripts)	purchase email paper	15 h	61	US English	6k	meeting	3 to 9	meeting	reverb	human	various	walk	stationary background noise	headset+lapel	no	yes	no	no
CHIL Meetings	2004-2007	seminar/meeting	60 h	44100	79 to 147 (distant)	6 to 9	3 500	purchase email paper	?	?	non-native English	?	lecture/meeting	3 to 20	seminar/meeting	reverb	human	quasi-fixed	head	meeting (scenarized)	headset	yes	yes	yes	no
SPEECON	2004-2011	public space/domestic/office/car	?	16000	3 (distant)	no	29 x 75000 € for all languages	purchase email paper	?	600/language	Multiple	?	command/read/spontaneous	1	no	reverb	human	quasi-fixed	head	various real environments	headset	no	yes	no	no
CENSREC-2	2005	car	?	16000	1 (distant)	no	free	download email paper	?	214	Japanese	11	digits	1	no	reverb	human	quasi-fixed	head	car	headset	no	yes	no	no
CENSREC-3	2005	car	?	16000	1 (distant)	no	free except phonetically balanced training set: JPY 21000 (Universities) / JPY 105000 (Companies)	purchase email paper	?	18 (+293 in training)	Japanese	50 in evaluation; unknown but larger in phonetically-balanced utterances of training set	read	1	no	reverb	human	quasi-fixed	head	car	headset	no	yes	no	no
Aurora-5	2006	public spaces/domestic/office/car	?	8000	1 (distant)	no	TIDigits	download email paper	?	225	US English	11	digits	1	no	real rir/simu/no + simulated telephone channel	loudspeaker	N/S	no	various real environments	original	no	yes	no	yes
AMI	2006	meeting	100 h	16000	16 (distant)	6	free	download email paper	?	189	UK English	8k	meeting	4 (18% overlap)	meeting	reverb	human	quasi-fixed	head	stationary background noise	headset+lapel	yes	yes	yes	no
PASCAL SSC	2006	cocktail party	18.5 min (+ 8.5h clean training data)	25000	1 (mixing console)	no	free	email paper	18.5 min (+ 8.5h clean training data)	34	UK English	51	command	2	full	no	human	N/S	no	no	original	N/S	yes	no	no
HIWIRE	2007	airplane	21 h	16000	1 (close)	no	50	purchase email paper	21 h	81	non-native English	133	command	1	no	no	human	N/S	head	airplane	original	N/S	yes	no	no
UT-Drive	2007	car	40 h	25000	5 (distant)	2	$25k with CU-Move	download email paper	40 h	25 (more exist but not included in latest release 3.0)	US English	2.4k (but transcription is incomplete)	command/conversation	1 to 2	conversation	reverb	human	quasi-fixed	head	car	headset (but problem w/ recording quality)	no	yes (partial)	no	no
SASSEC/SiSEC underdetermined	2007-2011	cocktail party	19 min	16000	2 (distant)	no	free	download email paper	19 min	16	N/S	N/S	read	3 or 4	full	reverb/real rir/simu	no	fixed	no	no	original+spatial image	yes	no	no	no
MC-WSJ-AV/PASCAL SSC2/2012_MMA/REVERB RealData	2007-2014	cocktail party	10 h	16000	8 to 40 (distant)	no	$1 500	purchase email paper paper	?	45	UK English	10k	read	1 or 2	full	reverb	human	various	walk	stationary background noise	headset+lapel	yes	yes	no	no
CENSREC-4 (Simulated)	2008	public spaces/domestic/office/car	?	16000	1 (distant)	no	free	download email paper	?	214	Japanese	11	digits	1	no	real rir	mouth simulator	fixed	no	various real environments	original	no	yes	no	yes
CENSREC-4 (Real)	2008	public spaces/domestic/office/car	?	16000	1 (distant)	no	free	download email paper	?	10	Japanese	11	digits	1	no	reverb	human	quasi-fixed	head	various real environments	headset	no	yes	no	yes
DICIT	2008	domestic	6 h	48000	16 (distant)	2	free	download email paper	1 h	?	Italian	?	command	4	no	reverb	human	various	walk	domestic (scenarized)	headset+tv	yes	yes	no	yes
SiSEC head-geometry	2008	cocktail party	1.9 h	16000	2 (distant)	no	free	download email paper	1.9 h	?	N/S	N/S	read	2	full	real rir	loudspeaker	various	no	no	original+spatial image	yes	no	no	no
COSINE	2009	conversation	38 h	48000	20 (distant)	no	free	download email paper	11 h	91	US/non-native English	5k	colloquial	2 to 7	conversation	reverb	human	various	walk	various real environments	headset+throat mic	no	yes	no	no
SiSEC real-world noise	2010	public spaces	20 min	16000	2 to 4 (distant)	no	free	download email paper	20 min	6	N/S	N/S	read	1 or 3	full	no	loudspeaker	various	no	various real environments	original+spatial image	yes	no	no	no
SiSEC dynamic	2010-2011	cocktail party	11 min	16000	2 to 4 (distant)	no	free	download email paper	11 min	?	N/S	N/S	read	Many but only 2 simultaneous	simu	reverb	loudspeaker	various	simu	no	original+spatial image	yes	no	no	no
CHiME 1/CHiME 2 Grid	2011-2012	domestic	70 h with some overlap	16000	2 (distant)	no	free	download email paper	12 h	34	UK English	51	command	1	no	real rir	dummy	quasi-fixed	simu	domestic	yes	yes	yes	no	no
CHiME 2 WSJ0	2012	domestic	78 h with some overlap	16000	2 (distant)	no	WSJ0	download email paper	33 h	101	US English	11k	read	1	no	real rir	dummy	fixed	no	domestic	yes	yes	yes	no	no
ETAPE	2012	debates, outdoor interviews, and other TV/radio broadcasts selected for large speaker overlap and/or noise	42 h	16000	1 (mixing console)	1	?	email paper	32 h	347	French	16k	colloquial	1 or more (7% overlap on average, up to 10% in debates)	conversation	some reverb	human	quasi-fixed	head	various real environments	no	N/S	yes	no	yes
GALE (Chinese broadcast conversation)	2013	conversation (TV Broadcast)	120 h	16000	1 (mixing console)	no	$2000 (audio) + $1500 (transcripts)	purchase email	108 h	?	Mandarin	?	colloquial	1 or more	conversation	no	human	quasi-fixed	head	no	no	N/S	yes	no	no
GALE (Arabic broadcast conversation)	2013	conversation (TV Broadcast)	251 h	16000	1 (mixing console)	no	2 x [$2000 (audio) + $1500 (transcripts)]	purchase email	234 h	?	Arabic	?	colloquial	1 or more	conversation	no	human	quasi-fixed	head	no	no	N/S	yes	no	no
REVERB SimData	2013	domestic/office	25 h	16000	8 (distant)	no	WSJCAM0	purchase email paper	25 h	130	UK English	10k	read	1	no	real rir	loudspeaker	fixed	no	experimental room	original+spatial image	yes	yes	no	yes
DIRHA	2014	domestic	3.8 h	48000	40 (distant)	no	free	download email paper	1.3 h	30	Italian, German, Greek, Portuguese	various	various	1 or more	simu	real rir	loudspeaker	various	no	domestic (sum of individual noises)	yes	yes	yes	no	yes
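
The disclaimer above restricts the list to datasets with a total duration longer than 5 min. As a rough illustration of how that rule could be checked against the "total duration" column, here is a minimal Python sketch. The function names and the parsing rule (a leading number followed by "min" or "h", with "?" treated as unknown) are assumptions made for this example and are not part of the wiki.

```python
import re

# Threshold taken from the disclaimer: total duration must exceed 5 minutes.
MIN_DURATION_MIN = 5.0

# Matches a leading number and unit, e.g. "37 min", "1.4 h", "70 h with some overlap".
_DURATION = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(min|h)\b")


def duration_minutes(cell):
    """Convert a 'total duration' cell to minutes, or None if it is unknown."""
    match = _DURATION.match(cell)
    if match is None:
        return None  # e.g. "?" means the duration is not documented
    value = float(match.group(1))
    unit = match.group(2)
    return value * 60.0 if unit == "h" else value


def meets_threshold(cell):
    """Return True only when the duration is known and longer than the threshold."""
    minutes = duration_minutes(cell)
    return minutes is not None and minutes > MIN_DURATION_MIN


if __name__ == "__main__":
    # Sample cells copied from the table above.
    for cell in ["37 min", "1.4 h", "?", "18.5 min (+ 8.5h clean training data)"]:
        print(repr(cell), duration_minutes(cell), meets_threshold(cell))
```

Only the leading figure of a cell is parsed, so annotations such as "with some overlap" are ignored, and rows with an unknown duration are reported as not verifiable rather than rejected outright.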

Automatic speech recognition

1st CHiME Challenge (2011)

Artificially distorted version of the small-vocabulary GRID audio-visual corpus (audio only). Binaural reverberant speech with the speaker situated in front of the microphones. Additive household noises impinging from different directions. Clean-training, noisy-training, development and evaluation sets are available; see

J. Barker, E. Vincent, N. Ma, H. Christensen, P. Green, "The PASCAL CHiME speech separation and recognition challenge", Computer Speech & Language, Volume 27, Issue 3, May 2013, Pages 621-633.

Available from Computer Speech and Language here

Corpus available here (no cost)

Resources

Training recipe of the challenge for HTK here.

Baselines

See the paper above for results for a wide range of techniques.

AURORA 5 (2007)

Artificially distorted version of the TIDigits digit corpus. Sets with additive noise only and with additive noise plus reverberation. Variable SNR range. Various mixed-training sets but no evaluation set; see

G. Hirsch, "Aurora-5 Experimental Framework for the Performance Evaluation of Speech Recognition in Case of a Hands-free Speech Input in Noisy Environments", Niederrhein University of Applied Sciences, 2007.

Paper available online here (no cost)

Corpus available from LDC here

Resources

A training recipe for HTK is provided with the corpus.

Baselines

Reproducible baseline: The paper cited above includes a baseline for the ETSI Advanced Front-End.

AURORA 4 (2002)

Artificially distorted version of the 5k-word Wall Street Journal corpus (WSJ0). Stationary and non-stationary noises added. Second set of recordings with a distant, mismatched microphone. Clean-training, mixed-training, noisy-training and test sets available. No evaluation set; see

G. Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task", ETSI STQ Aurora DSR Working Group, 2002.

Paper available with the corpus.

Corpora available from ELRA here and here

Resources

Training recipe for HTK available here. Note that this recipe is for the Wall Street Journal corpus (WSJ0), which is the clean-speech version of AURORA 4. Small changes are needed in the feature extraction scripts to account for the different file extensions.
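
One way to adapt such a recipe is to rewrite the file list passed to feature extraction so that source files carry the other extension. The Python sketch below assumes the recipe uses a script list with one "source target" pair per line. The extensions `.wv1` and `.wav` and both function names are hypothetical placeholders; check the actual extensions used by your copies of WSJ0 and AURORA 4 before relying on it.

```python
from pathlib import PurePosixPath


def retarget_scp_line(line, src_ext, new_src_ext):
    """Swap the extension of the source file in one 'source target' line."""
    source, target = line.split()
    source_path = PurePosixPath(source)
    if source_path.suffix != src_ext:
        return line.strip()  # leave entries with other extensions untouched
    return f"{source_path.with_suffix(new_src_ext)} {target}"


def retarget_scp(lines, src_ext=".wv1", new_src_ext=".wav"):
    """Rewrite every non-empty line of a script list (extensions are assumed)."""
    return [
        retarget_scp_line(line, src_ext, new_src_ext)
        for line in lines
        if line.strip()
    ]


if __name__ == "__main__":
    # Hypothetical entries; real paths depend on the corpus layout.
    example = ["data/utt001.wv1 feats/utt001.mfc", "data/utt002.wv1 feats/utt002.mfc"]
    for entry in retarget_scp(example):
        print(entry)
```

Only the source path is changed; the target feature file names stay as the recipe expects them, so later training steps need no modification.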

Speaker identification and verification

Speech enhancement and separation

Other applications

Contribute a dataset

To contribute a new dataset, please

create an account and log in
go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
click on the "Edit" link at the top of the page and add a new section for your dataset (the datasets are ordered by year of collection)
click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

name of the dataset and year of collection
authors, institution, contact information
link to the dataset and to side resources (lexicon, language model, etc.)
short description (nature of the data, license, etc.) and link to a paper/report describing the dataset, if any
at least 1 research result obtained for this dataset (see below)

We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.

Contribute a research result

To contribute a new research result, please

create an account and log in
go to the wiki page and the section corresponding to the dataset for which this result was obtained
click on the "Edit" link on the right of the section header and add a new item for your result
click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

authors, paper/report title, means of publication
link to the pdf of the paper
link to derived data (output transcriptions, intermediary data, etc)
code and instructions to reproduce experiments (if available)

To save storage space, please do not upload the paper to this wiki. Instead, link to it wherever possible from your institutional archive, from another public archive (e.g., arXiv) or from the publisher's website (e.g., IEEE Xplore).

We currently cannot provide storage space for large datasets. Please upload the derived data at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.

@@ Line 5: / Line 5: @@
 {| class="wikitable sortable" style="font-size:85%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
 |-
-!style="width: 50px" rowspan="2" |Datasets
+!style="width: 50em" rowspan="2" |Datasets
-!colspan="10" |Data
+!colspan="8" |General attributes
 !colspan="7" |Speech
 !colspan="4" |Channel
@@ Line 16: / Line 16: @@
 !scope="col" width="50px" | total duration
 !scope="col" width="50px" | sampling rate
-!scope="col" width="50px" | mixture channels
+!scope="col" width="50px" | degraded channels
 !scope="col" width="50px" | cameras
 !scope="col" width="50px" | cost
-!scope="col" width="50px" | download
+!scope="col" width="50px" | links
-!scope="col" width="50px" | email
-!scope="col" width="50px" | reference paper
 !scope="col" width="50px" | speech duration
 !scope="col" width="50px" | unique speakers
@@ Line 48: / Line 46: @@
 |{{no}}
 |free
-|http://spandh.dcs.shef.ac.uk/projects/shatrweb/
+|[http://spandh.dcs.shef.ac.uk/projects/shatrweb/ download] [mailto:g.brown@dcs.shef.ac.uk email] [http://spandh.dcs.shef.ac.uk/projects/shatrweb/papers/ioa94.html paper]
-|g.brown@dcs.shef.ac.uk
-|Malcolm Crawford, Guy J. Brown, Martin Cooke and Phil Green, "Design, collection and analysis of a multi-simultaneous-speaker corpus," Proceedings of The Institute of Acoustics, 16(5):183-190.
 |37 min
 |5
@@ Line 77: / Line 73: @@
 |{{no}}
 |free
-|https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html
+|[https://www.ll.mit.edu/mission/cybersec/HLT/corpora/SpeechCorpora.html download] [mailto:jpc@ll.mit.edu email]
-|jpc@ll.mit.edu
-|{{dunno}}
 |{{dunno}}
 |12
@@ Line 106: / Line 100: @@
 |{{no}}
 |free
-|http://research.nii.ac.jp/src/en/RWCP-SP96.html
+|[http://research.nii.ac.jp/src/en/RWCP-SP96.html download] [mailto:src@nii.ac.jp email] [http://scitation.aip.org/content/asa/journal/jasa/100/4/10.1121/1.416338 paper]
-|src@nii.ac.jp
-|Kazuyo Tanaka, Satoru Hayamizu, Yoichi Yamashita, Kiyohiro Shikano, Shuichi Itahashi and Ryuichi Oka, "Design and data collection for a spoken dialog database in the Real World Computing (RWC) program," J. Acoust. Soc. Am. 100, 2759 (1996)
 |10 h
 |39
@@ Line 135: / Line 127: @@
 |{{no}}
 |TIDigits
-|http://aurora.hsnr.de/download.html
+|[http://aurora.hsnr.de/download.html download] [mailto:hans-guenter.hirsch@hs-niederrhein.de email] [http://www.isca-speech.org/archive_open/asr2000/asr0_181.html paper]
-|hans-guenter.hirsch@hs-niederrhein.de
-|Hans-Günter Hirsch, David Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions,", Proc. Interspeech 2000
 |33 h
 |214
@@ Line 164: / Line 154: @@
 |{{no}}
 |2 x ($800 (audio) + $500 (transcripts)) + 3 x ($1000 (audio) + $600 (transcripts))
-|https://catalog.ldc.upenn.edu/LDC2000S87
+|[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=SPINE purchase] [mailto:jdwright@ldc.upenn.edu email] [http://dl.acm.org/citation.cfm?id=1289199 paper]
-|jdwright@ldc.upenn.edu
-|T.H. Crystal et al., "Speech in noisy environments (SPINE) adds new dimension to speech recognition R&D", Proc. HLT 2002
 |{{dunno}}
 |100
@@ Line 193: / Line 181: @@
 |{{no}}
 |5 x 200 (Academics) / 5 x 1,000 (Companies)
-|http://catalog.elra.info/index.php?cPath=37_40
+|[http://catalog.elra.info/index.php?cPath=37_40 purchase] [http://aurora.hsnr.de/aurora-3/reports.html papers]
-|
-|
 |{{dunno}}
 |{{dunno}}
@@ Line 222: / Line 208: @@
 |3
 |free
-|http://research.nii.ac.jp/src/en/RWCP-SP01.html
+|[http://research.nii.ac.jp/src/en/RWCP-SP01.html download] [mailto:src@nii.ac.jp email] [http://id.nii.ac.jp/1001/00057420/ paper]
-|src@nii.ac.jp
-|Kazuyo Tanaka, Katunobu Itou, Masanori Ihara, Ryuichi Oka, "Constructing a Meeting Speech Corpus", IPSJ, 37-15, 2001
 |3.5 h
 |{{dunno}}
@@ Line 251: / Line 235: @@
 |{{no}}
 |free
-|http://research.nii.ac.jp/src/en/RWCP-SSD.html
+|[http://research.nii.ac.jp/src/en/RWCP-SSD.html download] [mailto:s-nakamura@is.naist.jp email] [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/356.htm paper]
-|s-nakamura@is.naist.jp
-|Satoshi Nakamura, Kazuo Hiyane, Futoshi Asano, Takanobu Nishiura, and Takeshi Yamada, "Acoustical Sound Database in Real Environments for Sound Scene Understanding and Hands-Free Speech Recognition," LREC 2000.
 |{{dunno}}
 |5
@@ Line 280: / Line 262: @@
 |{{no}}
 |1.1 Million  for all 10 languages. Each costs 39k  to 182k
-|http://catalog.elra.info/index.php?cPath=37_41
+|[http://catalog.elra.info/search.php purchase] [http://www.lrec-conf.org/proceedings/lrec2000/html/summary/373.htm paper]
-|
-|A. Moreno et al., "SPEECHDAT-CAR. A Large Speech Database for Automotive Environments," Proc. LREC 2000
 |{{dunno}}
 |300/language
@@ Line 309: / Line 289: @@
 |{{no}}
 |WSJ0
-|http://aurora.hsnr.de/download.html
+|[http://aurora.hsnr.de/download.html download] [mailto:hans-guenter.hirsch@hs-niederrhein.de email] [http://aurora.hsnr.de/aurora-4/reports.html paper]
-|hans-guenter.hirsch@hs-niederrhein.de
-|N. Parihar and J. Picone, "Aurora Working Group: DSR Front End LVCSR Evaluation AU/384/02," Tech. Rep., Inst. for Signal and Information Process, Mississippi State University, 2002
 |{{dunno}}
 |101
@@ Line 338: / Line 316: @@
 |{{no}}
 |$275 (audio) + $250 (transcripts)
-|https://catalog.ldc.upenn.edu/LDC2002S04
+|[https://catalog.ldc.upenn.edu/LDC2002S04 purchase] [http://perso.limsi.fr/lamel/icslp94ted.pdf paper]
-|
-|L. Lamel, F. Schiel, A. Fourcin, J. Mariani, and H. Tillman, "The translingual English database (TED)," Proc. ICSLP, 1994
 |47 h
 |188
@@ Line 367: / Line 343: @@
 |1
 |free
-|http://www.clemson.edu/ces/speech/cuave.htm
+|[http://www.clemson.edu/ces/speech/cuave.htm download] [mailto:ksampat@clemson.edu email] [http://asp.eurasipjournals.com/content/2002/11/208541 paper]
-|ksampat@clemson.edu
-|Eric K Patterson, Sabri Gurbuz, Zekeriya Tufekci and John N Gowdy, "Moving-Talker, Speaker-Independent Feature Study, and Baseline Results Using the CUAVE Multimodal Speech Corpus," EURASIP Journal on Advances in Signal Processing 2002, 2002:208541
 |3 h
 |36
@@ Line 396: / Line 370: @@
 |{{no}}
 |$25k with UT-Drive
-|http://crss.utdallas.edu/
+|[http://crss.utdallas.edu/ purchase] [mailto:john.hansen@utdallas.edu email] [http://www.isca-speech.org/archive/eurospeech_2001/e01_2023.html paper]
-|john.hansen@utdallas.edu
-|John H.L. Hansen, Pongtep Angkititrakul, Jay Plucienkowski, Stephen Gallant, Umit Yapanel, Bryan Pellom, Wayne Ward, and Ron Cole, ""CU-Move" : Analysis & Corpus Development for Interactive In-Vehicle Speech Systems", Interspeech 2001
 |286 h
 |172
@@ Line 425: / Line 397: @@
 |{{no}}
 |free
-|http://research.nii.ac.jp/src/en/CENSREC-1.html
+|[http://research.nii.ac.jp/src/en/CENSREC-1.html download]  [mailto:s-nakamura@is.naist.jp email] [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15046/1/425.pdf paper]
-|
-|S. Nakamura, K. Takeda, K. Yamamoto, T. Yamada, S. Kuroiwa, N. Kitaoka, T. Nishiura, A. Sasou, M. Mizumachi, C. Miyajima, M. Fujimoto, and T. Endo, "Aurora-2J, an evaluation framework for Japanese noisy speech recognition," IEICE Transactions on Information and Systems, vol. E88-D, no. 3:pp. 535-544, 2005
 |
 |214
@@ Line 454: / Line 424: @@
 |4
 |free
-|http://www.isle.illinois.edu/sst/AVICAR/
+|[http://www.isle.illinois.edu/sst/AVICAR/ download] [mailto:jhasegaw@illinois.edu email] [http://www.isca-speech.org/archive/interspeech_2004/i04_2489.html paper]
-|jhasegaw@illinois.edu
-|Bowon Lee, Mark Hasegawa-Johnson, Camille Goudeseune, Suketu Kamdar, Sarah Borys, Ming Liu, Thomas Huang, "AVICAR: Audio-Visual Speech Corpus in a Car Environment", Proc. Interspeech, 2004
 |29 h
 |86
@@ Line 483: / Line 451: @@
 |3
 |free
-|http://www.idiap.ch/dataset/av16-3/
+|[http://www.idiap.ch/dataset/av16-3/ download] [mailto:odobez@idiap.ch email] [http://publications.idiap.ch/index.php/publications/show/353 paper]
-|odobez@idiap.ch
-|"AV16.3: an Audio-Visual Corpus for Speaker Localization and Tracking", by Guillaume Lathoud, Jean-Marc Odobez and Daniel Gatica-Perez, in Proceedings of the MLMI'04 Workshop, 2004.
 |1.5 h
 |12
@@ Line 512: / Line 478: @@
 |{{no}}
 |$1900 (audio) + $900 (transcripts)
-|https://catalog.ldc.upenn.edu/LDC2004S02
+|[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=ICSI purchase] [mailto:mrcontact@icsi.berkeley.edu email] [http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1198793 paper]
-|mrcontact@icsi.berkeley.edu
-|A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, C. Wooters, "The ICSI meeting corpus," Proc. ICASSP, Apr. 2003
 |72 h
 |53
@@ Line 541: / Line 505: @@
 |{{no}} (released but not currently available for download)
 |$4000 (audio) + $1500 (transcripts)
-|https://catalog.ldc.upenn.edu/LDC2004S09
+|[https://catalog.ldc.upenn.edu/search?q%5Bname_cont%5D=NIST%20Meeting purchase] [mailto:john.garofolo@nist.gov email] [http://www.lrec-conf.org/proceedings/lrec2004/summaries/137.htm paper]
-|john.garofolo@nist.gov
-|John S. Garofolo, Christophe D. Laprun, Martial Michel, Vincent M. Stanford and Elham Tabassi, "The NIST Meeting Room Pilot Corpus," Proc. LREC, 2004
 |15 h
 |61
@@ Line 570: / Line 532: @@
 |6 to 9
 |3 500
-|http://catalog.elra.info/search.php
+|[http://catalog.elra.info/search.php purchase] [mailto:choukri@elda.org email] [http://link.springer.com/article/10.1007%2Fs10579-007-9054-4 paper]
-|choukri@elda.org
-|D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S. Chu, A. Tyagi, J. Casas, J. Turmo, L. Cristoforetti, F. Tobia, A. Pnevmatikakis, V. Mylonakis, F. Talantzis, S. Burger, R. Stiefelhagen, K. Bernardin, C. Rochet, The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms, in LANGUAGE RESOURCES AND EVALUATION, vol. 41, n. 3-4, 2007, pp. 389-407
 |{{dunno}}
 |{{dunno}}
@@ Line 599: / Line 559: @@
 |{{no}}
 |29 x 75000  for all languages
-|http://catalog.elra.info/index.php?cPath=37
+|[http://catalog.elra.info/search.php purchase] [mailto:diskra@appen.com email] [http://www.lrec-conf.org/proceedings/lrec2002/sumarios/177.htm paper]
-|diskra@appen.com
-|Dorota Iskra, Beate Grosskopf, Krzysztof Marasek, Henk van den Heuvel, Frank Diehl, Andreas Kiessling, "SPEECON - Speech Databases for Consumer Devices: Database Specification and Validation", LREC p. 329-333, 2002.
 |{{dunno}}
 |600/language
@@ Line 628: / Line 586: @@
 |{{no}}
 |free
-|http://research.nii.ac.jp/src/en/CENSREC-2.html
+|[http://research.nii.ac.jp/src/en/CENSREC-2.html download] [mailto:src@nii.ac.jp email] [http://www.isca-speech.org/archive/interspeech_2006/i06_1726.html paper]
-|src@nii.ac.jp
-|S. Nakamura, M. Fujimoto, and K. Takeda, "CENSREC2: Corpus and evaluation environments for in car continuous digit speech recognition," Proc. ICSLP 2006
 |{{dunno}}
 |214
@@ Line 657: / Line 613: @@
 |{{no}}
 |free except phonetically balanced training set: JPY 21000 (Universities) / JPY 105000 (Companies)
-|http://research.nii.ac.jp/src/en/CENSREC-3.html
+|[http://research.nii.ac.jp/src/en/CENSREC-3.html purchase] [mailto:src@nii.ac.jp email] [http://ir.nul.nagoya-u.ac.jp/jspui/bitstream/2237/15050/1/429.pdf paper]
-|src@nii.ac.jp
-|M. Fujimoto, K. Takeda, and S. Nakamura, "CENSREC-3: An evaluation framework for Japanese speech recognition in real driving-car environments," IEICE Transactions on Information and Systems, vol. E89-D, no. 11:pp. 2783-2793, 2006
 |{{dunno}}
 |18 (+293 in training)
@@ Line 686: / Line 640: @@
 |{{no}}
 |TIDigits
-|http://aurora.hsnr.de/download.html
+|[http://aurora.hsnr.de/download.html download] [mailto:hans-guenter.hirsch@hs-niederrhein.de email] [http://aurora.hsnr.de/aurora-5/reports.html paper]
-|hans-guenter.hirsch@hs-niederrhein.de
-|Hans-Günter Hirsch, "Aurora-5 experimental framework for the performance evaluation of speech recognition in case of a hands-free speech input in noisy environments,", Tech Report, Niederrhein Univ. of Applied Sciences, 2007
 |{{dunno}}
 |225
@@ Line 715: / Line 667: @@
 |6
 |free
-|http://groups.inf.ed.ac.uk/ami/
+|[http://groups.inf.ed.ac.uk/ami/ download] [mailto:amicorpus@amiproject.org email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4538700 paper]
-|amicorpus@amiproject.org
-|Steve Renals, Thomas Hain, and Hervé Bourlard. Interpretation of multiparty meetings: The AMI and AMIDA projects. In IEEE Workshop on Hands-Free Speech Communication and Microphone Arrays, 2008. HSCMA 2008, pages 115-118, 2008
 |{{dunno}}
 |189
@@ Line 744: / Line 694: @@
 |{{no}}
 |free
-|
+|[mailto:m.cooke@ikerbasque.org email] [http://www.sciencedirect.com/science/article/pii/S0885230809000205 paper]
-|m.cooke@ikerbasque.org
-|Martin Cooke, John R. Hershey, Steven J. Rennie, "Monaural speech separation and recognition challenge," Computer, Speech and Language, 2010
 |18.5 min (+ 8.5h clean training data)
 |34
@@ Line 773: / Line 721: @@
 |{{no}}
 |50
-|http://catalog.elra.info/product_info.php?products_id=1088&language=en
+|[http://catalog.elra.info/product_info.php?products_id=1088&language=en purchase] [mailto:segura@ugr.es email] [http://cvsp.cs.ntua.gr/projects/pub/HIWIRE/WebHome/HIWIRE_db_description_paper.pdf paper]
-|segura@ugr.es
-|J.C. Segura, T. Ehrette, A. Potamianos, D. Fohr, I. Illina, P.-A. Breton, V. Clot, R. Gemello, M. Matassoni, P. Maragos, "The HIWIRE database, a noisy and non-native English speech corpus for cockpit communication"
 |21 h
 |81
@@ Line 802: / Line 748: @@
 |2
 |$25k with CU-Move
-|http://crss.utdallas.edu/
+|[http://crss.utdallas.edu/ download] [mailto:john.hansen@utdallas.edu email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=4290175 paper]
-|john.hansen@utdallas.edu
-|P. Angkititrakul, M. Petracca, A. Sathyanarayana, J.H.L. Hansen, "UTDrive: Driver Behavior and Speech Interactive Systems for In-Vehicle Environments," Intelligent Vehicles Symposium, 2007
 |40 h
 |25 (more exist but not included in latest release 3.0)
@@ Line 831: / Line 775: @@
 |{{no}}
 |free
-|http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures
+|[http://sisec2011.wiki.irisa.fr/tiki-index.php?page=Underdetermined+speech+and+music+mixtures download] [mailto:araki.shoko@lab.ntt.co.jp email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper]
-|araki.shoko@lab.ntt.co.jp
-|The Signal Separation Evaluation Campaign (2007-2010): Achievements and Remaining Challenges, Emmanuel Vincent; Shoko Araki; Fabian J. Theis; Guido Nolte; Pau Bofill; Hiroshi Sawada; Alexey Ozerov; B. Vikrham Gowreesunker; Dominik Lutter; Ngoc Duong, Signal Processing, Elsevier, 2012, 92, pp. 1928-1936
 |19 min
 |16
@@ Line 860: / Line 802: @@
 |{{no}}
 |$1 500
-|https://catalog.ldc.upenn.edu/LDC2014S03
+|[https://catalog.ldc.upenn.edu/LDC2014S03 purchase] [mailto:mike.lincoln@quoratetechnology.com email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=1566470 paper] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6639033 paper]
-|mike.lincoln@quoratetechnology.com
-|M. Lincoln, I. McCowan, J. Vepa, and H. K. Maganti, The multi-channel wall street journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments, in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2005. + E. Zwyssig, F. Faubel, S. Renals and M. Lincoln, "Recognition of overlapping speech using digital MEMS microphone arrays", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013
 |{{dunno}}
 |45
@@ Line 889: / Line 829: @@
 |{{no}}
 |free
-|http://research.nii.ac.jp/src/en/CENSREC-4.html
+|[http://research.nii.ac.jp/src/en/CENSREC-4.html download] [mailto:src@nii.ac.jp email] [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper]
-|src@nii.ac.jp
-|T. Nishiura et al., "Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments - Newest Part of the CENSREC Series", Proc. LREC 2008
 |{{dunno}}
 |214
@@ Line 918: / Line 856: @@
 |{{no}}
 |free
-|http://research.nii.ac.jp/src/en/CENSREC-4.html
+|[http://research.nii.ac.jp/src/en/CENSREC-4.html download] [mailto:src@nii.ac.jp email] [http://www.lrec-conf.org/proceedings/lrec2008/summaries/468.html paper]
-|src@nii.ac.jp
-|T. Nishiura et al., "Evaluation Framework for Distant-talking Speech Recognition under Reverberant Environments: Newest Part of the CENSREC Series", Proc. LREC 2008
 |{{dunno}}
 |10
@@ Line 947: / Line 883: @@
 |2
 |free
-|http://shine.fbk.eu/resources/dicit-acoustic-woz-data
+|[http://shine.fbk.eu/resources/dicit-acoustic-woz-data download] [mailto:omologo@fbk.eu email] [http://www.lrec-conf.org/proceedings/lrec2008/summaries/584.html paper]
-|omologo@fbk.eu
-|Alessio Brutti, Luca Cristoforetti, Walter Kellermann, Lutz Marquardt and Maurizio Omologo, WOZ Acoustic Data Collection for Interactive TV, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), 2008.
 |1 h
 |{{dunno}}
@@ Line 976: / Line 910: @@
 |{{no}}
 |free
-|http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions
+|[http://sisec2008.wiki.irisa.fr/tiki-index.php?page=Head-geometry%20mixtures%20of%20two%20speech%20sources%20in%20real%20environments,%20impinging%20from%20many%20directions download] [mailto:hendrik.kayser@uni-oldenburg.de email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper]
-|hendrik.kayser@uni-oldenburg.de
-|The Signal Separation Evaluation Campaign (2007-2010): Achievements and Remaining Challenges, Emmanuel Vincent; Shoko Araki; Fabian J. Theis; Guido Nolte; Pau Bofill; Hiroshi Sawada; Alexey Ozerov; B. Vikrham Gowreesunker; Dominik Lutter; Ngoc Duong, Signal Processing, Elsevier, 2012, 92, pp. 1928-1936
 |1.9 h
 |{{dunno}}
@@ Line 1,005: / Line 937: @@
 |{{no}}
 |free
-|http://melodi.ee.washington.edu/cosine/
+|[http://melodi.ee.washington.edu/cosine/ download] [mailto:cosine@melodi.ee.washington.edu email] [http://www.sciencedirect.com/science/article/pii/S0885230811000143 paper]
-|cosine@melodi.ee.washington.edu
-|Alex Stupakov, Evan Hanusa, Deepak Vijaywargi, Dieter Fox, and Jeff Bilmes. The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments. Computer Speech and Language, 26:52-66, 2011.
 |11 h
 |91
@@ Line 1,034: / Line 964: @@
 |{{no}}
 |free
-|http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise
+|[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Source+separation+in+the+presence+of+real-world+background+noise download] [mailto:ito.nobutaka@lab.ntt.co.jp email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper]
-|ito.nobutaka@lab.ntt.co.jp
-|The Signal Separation Evaluation Campaign (2007-2010): Achievements and Remaining Challenges, Emmanuel Vincent; Shoko Araki; Fabian J. Theis; Guido Nolte; Pau Bofill; Hiroshi Sawada; Alexey Ozerov; B. Vikrham Gowreesunker; Dominik Lutter; Ngoc Duong, Signal Processing, Elsevier, 2012, 92, pp. 1928-1936
 |20 min
 |6
@@ Line 1,063: / Line 991: @@
 |{{no}}
 |free
-|http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions
+|[http://sisec2010.wiki.irisa.fr/tiki-index.php?page=Determined+convolutive+mixtures+under+dynamic+conditions download] [mailto:francesco.nesta@gmail.com email] [http://www.sciencedirect.com/science/article/pii/S0165168411003604 paper]
-|francesco.nesta@gmail.com
-|The Signal Separation Evaluation Campaign (2007-2010): Achievements and Remaining Challenges, Emmanuel Vincent; Shoko Araki; Fabian J. Theis; Guido Nolte; Pau Bofill; Hiroshi Sawada; Alexey Ozerov; B. Vikrham Gowreesunker; Dominik Lutter; Ngoc Duong, Signal Processing, Elsevier, 2012, 92, pp. 1928-1936
 |11 min
 |{{dunno}}
@@ Line 1,092: / Line 1,018: @@
 |{{no}}
 |free
-|http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html
+|[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task1.html download] [mailto:emmanuel.vincent@inria.fr email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper]
-|emmanuel.vincent@inria.fr
-|Vincent, E., Barker, J., Watanabe, S., Le Roux, J., Nesta, F. and Matassoni, M., "The second CHiME Speech Separation and Recognition Challenge: Datasets, tasks and baselines", in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, Vancouver
 |12 h
 |34
@@ Line 1,121: / Line 1,045: @@
 |{{no}}
 |WSJ0
-|http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html
+|[http://spandh.dcs.shef.ac.uk/chime_challenge/chime2_task2.html download] [mailto:francesco.nesta@gmail.com email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6637622 paper]
-|francesco.nesta@gmail.com
-|Vincent, E., Barker, J., Watanabe, S., Le Roux, J., Nesta, F. and Matassoni, M., "The second CHiME Speech Separation and Recognition Challenge: Datasets, tasks and baselines", in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013, Vancouver
 |33 h
 |101
@@ Line 1,150: / Line 1,072: @@
 |1
 |{{dunno}}
-|{{dunno}}
+|[mailto:guillaume.gravier@irisa.fr email] [http://www.lrec-conf.org/proceedings/lrec2012/summaries/495.html paper]
-|guillaume.gravier@irisa.fr
-|Guillaume Gravier, Gilles Adda, Niklas Paulsson, Matthieu Carré, Aude Giraudel, Olivier Galibert, The ETAPE corpus for the evaluation of speech-based TV content processing in the French language, LREC 2012.
 |32 h
 |347
@@ Line 1,179: / Line 1,099: @@
 |{{no}}
 |$2000 (audio) + $1500 (transcripts)
-|https://catalog.ldc.upenn.edu/LDC2013S04
+|[https://catalog.ldc.upenn.edu/LDC2013S04 purchase] [mailto:strassel@ldc.upenn.edu email]
-|strassel@ldc.upenn.edu
-|
 |108 h
 |{{dunno}}
@@ Line 1,208: / Line 1,126: @@
 |{{no}}
 |2 x [$2000 (audio) + $1500 (transcripts)]
-|https://catalog.ldc.upenn.edu/LDC2013S02
+|[https://catalog.ldc.upenn.edu/LDC2013S02 purchase] [mailto:strassel@ldc.upenn.edu email]
-|strassel@ldc.upenn.edu
-|
 |234 h
 |{{dunno}}
@@ Line 1,237: / Line 1,153: @@
 |{{no}}
 |WSJCAM0
-|http://reverb2014.dereverberation.com/
+|[http://reverb2014.dereverberation.com/ purchase] [mailto:REVERB-challenge@lab.ntt.co.jp email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6701894 paper]
-|REVERB-challenge@lab.ntt.co.jp
-|Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Emanuel Habets, Reinhold Haeb-Umbach, Volker Leutnant, Armin Sehr, Walter Kellermann, Roland Maas, Sharon Gannot, Bhiksha Raj, "The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech", Proc. WASPAA 2013
 |25 h
 |130
@@ Line 1,266: / Line 1,180: @@
 |{{no}}
 |free
-|http://shine.fbk.eu/resources/dirha-ii-simulated-corpus
+|[http://shine.fbk.eu/resources/dirha-ii-simulated-corpus download] [mailto:mravanelli@fbk.eu email] [http://ieeexplore.ieee.org/xpl/login.jsp?arnumber=6843271 paper]
-|mravanelli@fbk.eu
-|Alessio Brutti, Mirco Ravanelli, Piergiorgio Svaizer, Maurizio Omologo, A speech event detection and localization task for multiroom environments, HSCMA 2014.
 |1.3 h
 |30