Datasets

This page lists datasets with detailed attributes and links to corresponding research results (papers, numerical results, output transcriptions, intermediary data, etc.). Each dataset may be used for one or more applications, such as automatic speech recognition, speaker identification and verification, source localization, and speech enhancement and separation.

Disclaimer: Only publicly available datasets with a total duration longer than 5 min are listed.

Datasets	General attributes								Speech							Channel				Noise	Ground truth
Datasets	rel. year	use case	total time (h)	sam. rate (kHz)	dist. or noisy mics	video cams	cost	links	speak. time (h)	uniq. speak.	lang.	uniq. words (k)	speak. style	speak. / rec.	overl. type	chan. type	speak. radiat.	speak. loc.	speak. moves	noise type	ref. signal	speak. loc., orient.	words	non-verb. traits	noise events
ShATR	1994	meeting	0.6	48	3	no	free	download paper	0.6	5	UK English	1	spontaneous	5	multiple dialogs	reverb	human	quasi-fixed	head	meeting	headset	yes	yes	no	yes
LLSEC	1996	dialog	1.4	16	4	no	free	download	?	12	N/S	N/S	read, spontaneous	2	dialog	reverb	human	quasi-fixed	head	hallway, restaurant (scenarized)	no	yes	no	no	no
RWCP Spoken Dialog Corpus	1996 - 1997	dialog	10	16	2	no	free	download paper	10	39	Japanese	?	spontaneous	1 - 2	dialog	reverb (low)	human	quasi-fixed	head	stationary background	no	no	yes	no	no
Aurora-2	2000	public spaces	33	8 - 16	1	no	free given TIDigits (0.5 k$)	download paper	33	214	US English	0.01	digits	1	no	simulated phone	human	N/S	no	various real environments	original	N/S	yes	no	yes
SPINE1, SPINE2	2000 - 2001	military	38	16	2	no	7.4 k$	purchase paper	?	100	US English	1	command, spontaneous	1 - 2	no	simulated radio	human	quasi-fixed	head	military	no	no	yes	no	no
Aurora-3 (subset of SpeechDat-Car)	2000 - 2003	car	?	16	4	no	1 k€	purchase papers	?	?	various	?	digits, command, read, spontaneous	1	no	reverb	human	quasi-fixed	head	car	headset	no	yes	no	no
RWCP Meeting Speech Corpus	2001	meeting	3.5	16 - 48	1	3	free	download paper	3.5	?	Japanese	?	spontaneous	1 - 5	meeting	reverb (low)	human	quasi-fixed	head	stationary background	headset	no	yes	no	no
RWCP Real Environment Speech and Acoustic Database	2001	domestic, office	?	16 - 48	30	no	free	download paper	?	5	Japanese	?	read	1	no	real rir, reverb	loudspeaker	various	no, pivoting arm	stationary background	original	yes	yes	no	yes
SpeechDat-Car	2001 - 2011	car	?	16	4	no	39 - 182 k€ per lang	purchase paper	?	300 per lang	various	?	digits, command, read, spontaneous	1	no	reverb	human	quasi-fixed	head	car	headset	no	yes	no	no
Aurora-4	2002	public spaces	?	8 - 16	1	no	free given WSJ0 (1.5 k$)	download paper	?	101	US English	10	read	1	no	simulated phone	human	N/S	no	various real environments	original	N/S	yes	no	yes
TED	2002	seminar	47	16	1	no	0.5 k$	purchase paper	47	188	non-native English	?	lecture	1 or more	seminar	reverb	human	quasi-fixed	head	stationary background	lapel	no	partial	no	no
CUAVE	2002	cocktail party	3	44	1	1	free	download paper	3	36	US English	0.01	digits	1 - 2	full	reverb	human	quasi-fixed	head	stationary background	no	no	yes	no	no
CU-Move Microphone Array Data	2002 - 2011	car	286	44	6 - 8	no	25 k$	purchase paper	286	172	US English	12	digits, command, read, dialog	1	no	reverb	human	quasi-fixed	head	car	no	no	yes	no	no
CENSREC-1 (Aurora-2J)	2003	public spaces	?	8	1	no	free	download paper	?	214	Japanese	0.01	digits	1	no	simulated phone	human	N/S	no	various real environments	original	N/S	yes	no	yes
AVICAR	2004	car	29	16	7	4	free	download paper	29	86	US English, non-native English	1	read	1	no	reverb	human	quasi-fixed	head	car	no	no	yes	no	no
AV16.3	2004	meeting	1.5	16	16	3	free	download paper	1.5	12	N/S	N/S	spontaneous	1 - 3	full	reverb	human	various	walk	stationary background	no	yes	no	no	no
ICSI Meeting Corpus	2004	meeting	72	16	6	no	2.8 k$	purchase paper	72	53	US English	13	meeting	3 - 10	meeting	reverb	human	quasi-fixed	head	stationary background	headset, lapel	no	yes	yes	no
NIST Meeting Pilot Corpus Speech	2004	meeting	15	16	7	no	5.5 k$	purchase paper	15	61	US English	6	meeting	3 - 9	meeting	reverb	human	various	walk	stationary background	headset, lapel	no	yes	no	no
CHIL Meetings	2004 - 2007	seminar, meeting	60	44	79 - 147	6 - 9	3.5 k€	purchase paper	?	?	non-native English	?	seminar, meeting	3 - 20	seminar, meeting	reverb	human	quasi-fixed	head	meeting (scenarized)	headset	yes	yes	yes	no
SPEECON	2004 - 2011	public space, domestic, office, car	?	16	3	no	75 k€ per lang	purchase paper	?	600 per lang	various	?	command, read, spontaneous	1	no	reverb	human	quasi-fixed	head	various real environments	headset	no	yes	no	no
CENSREC-2	2005	car	?	16	1	no	free	download paper	?	214	Japanese	0.01	digits	1	no	reverb	human	quasi-fixed	head	car	headset	no	yes	no	no
CENSREC-3	2005	car	?	16	1	no	21 k¥	purchase paper	?	311	Japanese	0.05	read	1	no	reverb	human	quasi-fixed	head	car	headset	no	yes	no	no
Aurora-5	2006	public spaces, domestic, office, car	?	8	1	no	free given TIDigits (0.5 k$)	download paper	?	225	US English	0.01	digits	1	no	no, simulated rir, real rir	loudspeaker	N/S	no	various real environments	original	no	yes	no	yes
AMI	2006	meeting	100	16	16	6	free	download paper	?	189	UK English	8	meeting	4	meeting (18% overlap)	reverb	human	quasi-fixed	head	stationary background	headset, lapel	yes	yes	yes	no
PASCAL SSC	2006	cocktail party	8.8	25	1	no	free	download paper	8.8	34	UK English	0.05	command	2	full	no	human	N/S	no	no	original	N/S	yes	no	no
HIWIRE	2007	airplane	21	16	1	no	0.05 k€	purchase paper	21	81	non-native English	0.1	command	1	no	no	human	N/S	head	airplane	original	N/S	yes	no	no
UT-Drive	2007	car	40	25	5	2	25 k$	download paper	40	25	US English	2.4	command, dialog	1 - 2	dialog	reverb	human	quasi-fixed	head	car	headset (low quality)	no	partial	no	no
SASSEC, SiSEC underdetermined	2007 - 2011	cocktail party	0.3	16	2	no	free	download paper	0.3	16	N/S	N/S	read	3 - 4	full	simulated rir, real rir, reverb	no, loudspeaker	fixed	no	no	original, spatial image	yes	no	no	no
MC-WSJ-AV, PASCAL SSC2, 2012_MMA, REVERB RealData	2007 - 2014	cocktail party	10	16	8 - 40	no	1.5 k$	purchase paper paper HTK Kaldi results results	?	45	UK English	10	read	1 - 2	full	reverb	human	various	walk	stationary background	headset, lapel	yes	yes	no	no
CENSREC-4 (Simulated)	2008	public spaces, domestic, office, car	?	16	1	no	free	download paper	?	214	Japanese	0.01	digits	1	no	real rir	dummy	fixed	no	various real environments	original	no	yes	no	yes
CENSREC-4 (Real)	2008	public spaces, domestic, office, car	?	16	1	no	free	download paper	?	10	Japanese	0.01	digits	1	no	reverb	human	quasi-fixed	head	various real environments	headset	no	yes	no	yes
DICIT	2008	domestic	6	48	16	2	free	download paper	1	?	Italian	?	command	4	no	reverb	human	various	walk	domestic (scenarized)	headset, tv	yes	yes	no	yes
SiSEC head-geometry	2008	cocktail party	1.9	16	2	no	free	download paper	1.9	?	N/S	N/S	read	2	full	real rir	loudspeaker	various	no	no	original, spatial image	yes	no	no	no
COSINE	2009	dialog	38	48	20	no	free	download paper	11	91	US English, non-native English	5	spontaneous	2 - 7	dialog	reverb	human	various	walk	various real environments	headset, throat mic	no	yes	no	no
SiSEC real-world noise	2010	public spaces	0.3	16	2 - 4	no	free	download paper	0.3	6	N/S	N/S	read	1 - 3	full	no, reverb (other room)	loudspeaker	various	no	various real environments	original, spatial image	yes	no	no	no
SiSEC dynamic	2010 - 2011	cocktail party	0.2	16	2 - 4	no	free	download paper	0.2	?	N/S	N/S	read	many but only 2 simultaneous	full	reverb	loudspeaker	various	simulated	no	original, spatial image	yes	no	no	no
CHiME 1, CHiME 2 Grid	2011 - 2012	domestic	70	16 - 48	2	no	free	download paper HTK results results	12	34	UK English	0.05	command	1	no	real rir	dummy	quasi-fixed	simulated head	domestic	yes	yes	yes	no	no
CHiME 2 WSJ0	2012	domestic	78	16	2	no	free given WSJ0 (1.5 k$)	download paper HTK Kaldi results	33	101	US English	11	read	1	no	real rir	dummy	fixed	no	domestic	yes	yes	yes	no	no
ETAPE	2012	TV/radio debates, outdoor interviews	42	16	1	1	?	download paper	32	347	French	16	spontaneous	1 or more	dialog (up to 10% overlap)	reverb (some)	human	quasi-fixed	head	various real environments	no	N/S	yes	no	yes
GALE (Chinese broadcast conversation)	2013	TV dialog	120	16	1	no	3.5 k$	purchase	108	?	Mandarin	?	spontaneous	1 or more	dialog	no	human	quasi-fixed	head	no	no	N/S	yes	no	no
GALE (Arabic broadcast conversation)	2013	TV dialog	251	16	1	no	7 k$	purchase	234	?	Arabic	?	spontaneous	1 or more	dialog	no	human	quasi-fixed	head	no	no	N/S	yes	no	no
REVERB SimData	2013	domestic, office	25	16	8	no	free given WSJCAM0 (1.75 k$)	purchase paper HTK Kaldi results results	25	130	UK English	10	read	1	no	real rir	loudspeaker	fixed	no	stationary background	original, spatial image	yes	yes	no	yes
DIRHA	2014	domestic	3.8	48	40	no	free	download paper	1.3	30	various	?	command, read, spontaneous	1 or more	simulated	real rir	loudspeaker	various	no	domestic (sum of events)	yes	yes	yes	no	yes
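The table above can also be filtered programmatically once it is saved as tab-separated text. The following is a minimal sketch, assuming the two header rows are kept as shown and that "?" marks an unknown value; the function names `load_table`, `parse_hours` and `free_datasets` are hypothetical, and the sample only reproduces a few columns and rows of the table.

```python
import csv
import io


def load_table(tsv_text):
    """Parse the tab-separated table into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    # rows[0] is the group-header row; rows[1] holds the column names.
    header, body = rows[1], rows[2:]
    return [dict(zip(header, row)) for row in body]


def parse_hours(value):
    """Return a duration as a float, or None when it is unknown ('?')."""
    try:
        return float(value)
    except ValueError:
        return None


def free_datasets(records, min_hours=0.0):
    """Names of datasets whose cost is exactly 'free' and whose total time
    is known and at least min_hours. Entries such as 'free given TIDigits
    (0.5 k$)' are deliberately excluded by the exact match."""
    selected = []
    for rec in records:
        hours = parse_hours(rec["total time (h)"])
        if rec["cost"] == "free" and hours is not None and hours >= min_hours:
            selected.append(rec["Datasets"])
    return selected


# Shortened excerpt of the table (four columns only), for illustration.
SAMPLE = "\n".join([
    "Datasets\tGeneral attributes\t\t",
    "Datasets\trel. year\ttotal time (h)\tcost",
    "ShATR\t1994\t0.6\tfree",
    "Aurora-2\t2000\t33\tfree given TIDigits (0.5 k$)",
    "SPINE1, SPINE2\t2000 - 2001\t38\t7.4 k$",
    "CENSREC-2\t2005\t?\tfree",
    "AMI\t2006\t100\tfree",
])

records = load_table(SAMPLE)
print(free_datasets(records, min_hours=1.0))  # prints ['AMI']
```

Datasets with an unknown total time are skipped rather than guessed, so a stricter or looser policy would need a different check.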

Automatic speech recognition

1st CHiME Challenge (2011)

Artificially distorted version of the small-vocabulary GRID audio-visual corpus (audio only). Binaural reverberated speech with the speaker situated in front of the microphones. Additive household noises impinging from different directions. Clean-training, noisy-training, development and evaluation sets are available; see

J. Barker, E. Vincent, N. Ma, H. Christensen, P. Green, "The PASCAL CHiME speech separation and recognition challenge", Computer Speech & Language, Volume 27, Issue 3, May 2013, Pages 621-633.

Available from Computer Speech and Language here

Corpus available here (no cost)

Resources

Training recipe of the challenge for HTK here.

Baselines

See the paper above for results for a wide range of techniques.

AURORA 5 (2007)

Artificially distorted version of the TI-DIGITS digits corpus. Two kinds of sets: speech with additive noise, and reverberant speech with additive noise. Variable SNR range. Various mixed training sets are provided, but no evaluation set; see

G. Hirsch, "Aurora-5 Experimental Framework for the Performance Evaluation of Speech Recognition in Case of a Hands-free Speech Input in Noisy Environments", Niederrhein University of Applied Sciences, 2007.

Paper available online here (no cost)

Corpus available from LDC here

Resources

Training recipe for HTK is provided with the corpus.

Baselines

Reproducible baseline: The above cited paper includes a baseline for the ETSI Advanced Front-End.

AURORA 4 (2002)

Artificially distorted version of the 5K-word Wall Street Journal corpus (WSJ0). Stationary and non-stationary noises added. A second set of recordings uses a distant, mismatched microphone. Clean-training, mixed-training, noisy-training and test sets are available, but no evaluation set; see

G. Hirsch, "Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task", ETSI STQ Aurora DSR Working Group, 2002.

Paper available with the corpus.

Corpora available from ELRA here and here

Resources

Training recipe for HTK available here. Note that this recipe is for the Wall Street Journal corpus (WSJ0), which is the clean-speech version of AURORA 4. Small changes are needed in the feature extraction scripts to account for the different file extensions.
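The kind of change meant here can be sketched as a rewrite of the source paths in a feature extraction script file, assuming the file uses one "source target" pair per line. The extensions `.old` and `.new`, the file names, and the helper names below are placeholders chosen for illustration, not the actual WSJ0 or AURORA 4 naming.

```python
# Placeholder extensions: substitute the ones actually used by the
# recipe and by the corpus at hand.
OLD_EXT = ".old"
NEW_EXT = ".new"


def swap_extension(path, old_ext, new_ext):
    """Replace old_ext with new_ext if the path ends with it; else return it unchanged."""
    if path.endswith(old_ext):
        return path[: -len(old_ext)] + new_ext
    return path


def rewrite_script_lines(lines, old_ext, new_ext):
    """Swap the extension of the first (source) field on each 'source target' line."""
    out = []
    for line in lines:
        fields = line.split()
        if fields:
            fields[0] = swap_extension(fields[0], old_ext, new_ext)
        out.append(" ".join(fields))
    return out


# Hypothetical script content.
example = [
    "corpus/utt001.old feats/utt001.mfc",
    "corpus/utt002.old feats/utt002.mfc",
]
for line in rewrite_script_lines(example, OLD_EXT, NEW_EXT):
    print(line)
```

Only the source field is touched, so the feature file names that the rest of the recipe expects stay the same.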

Speaker identification and verification

Speech enhancement and separation

Other applications

Contribute a dataset

To contribute a new dataset, please

create an account and log in
go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
click on the "Edit" link at the top of the page and add a new section for your dataset (the datasets are ordered by year of collection)
click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

name of the dataset and year of collection
authors, institution, contact information
link to the dataset and to side resources (lexicon, language model, etc.)
short description (nature of the data, license, etc) and link to a paper/report describing the dataset, if any
at least 1 research result obtained for this dataset (see below)
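Before editing the page, the checklist above can be verified with a small record type. This is only a sketch: the class `DatasetEntry` and its field names are hypothetical, and treating the paper link and the side resources as optional is an assumption.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetEntry:
    """Hypothetical container mirroring the checklist above."""
    name: str = ""
    year: str = ""
    authors: str = ""
    institution: str = ""
    contact: str = ""
    dataset_url: str = ""
    description: str = ""
    paper_url: str = ""  # optional ("if any")
    side_resource_urls: list = field(default_factory=list)  # assumed optional
    research_results: list = field(default_factory=list)  # at least one expected

    def missing(self):
        """Names of required fields that are still empty."""
        optional = {"paper_url", "side_resource_urls"}
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and not getattr(self, f.name)
        ]


entry = DatasetEntry(name="Example corpus", year="2014")
print(entry.missing())
```

An empty list from `missing()` means every required item has been filled in.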

We currently cannot provide storage space for large datasets. Please upload the dataset at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.

Contribute a research result

To contribute a new research result, please

create an account and log in
go to the wiki page and the section corresponding to the dataset for which this result was obtained
click on the "Edit" link on the right of the section header and add a new item for your result
click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

authors, paper/report title, means of publication
link to the pdf of the paper
link to derived data (output transcriptions, intermediary data, etc)
code and instructions to reproduce experiments (if available)

In order to save storage space, please do not upload the paper on this wiki. Instead, link to it wherever possible from your institutional archive, from another public archive (e.g., arXiv) or from the publisher's website (e.g., IEEE Xplore).

We currently cannot provide storage space for large datasets. Please upload the derived data at a stable URL on the website of your institution or elsewhere and provide its URL only. If this is not possible, please contact the resources sharing working group.
