Revision as of 14:48, 19 September 2015 by Ramon.astudillo (talk | contribs) (Automatic speech recognition)

Software

This page provides software grouped by application.

Automatic speech recognition

Table columns (the first column is the ASR engine name), by group:

  • General attributes: release / update, actively developed, licence, platforms, links, extensions
  • Programming: language, hardware optimization
  • Implemented techniques: VAD, acoustic features, feature normalization, acoustic models, model adaptation, decoding techniques, training techniques, online ASR, robust ASR
  • Reproducible research: recipes, reproducible results

CMU Sphinx 1986-* (Sphinx 4.1.0, pocketsphinx 0.8) Yes BSD-like Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx) website

paper paper mail-list forum github

Java (Sphinx4), C (pocketsphinx) No Yes MFCC, PLP CMN, Mel-Spectrum subtraction GMM, Streams MLLR, MAP alignment, N-best, lattice rescoring Baum-Welch Yes AURORA4 (WSJ0)
HTK 1993-2009 (3.4.1) Yes proprietary Windows, Linux, OSX website

book mail-list

official

ATK uncertain features diagonal uncertainty decoding full uncertainty decoding

C No Yes MFCC, PLP VTLN, CMN GMM (Full Cov.), Tied-Mix, Streams HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP alignment, N-best, lattice rescoring Baum-Welch, MMI, MPE, MWE Yes AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, CHIME-2-II, REVERB ETSI-AFE-AURORA2 paper (see AURORA2 purch.)
Kaldi 2009-* (continuous updates) Yes Apache 2.0 Windows (not maintained as of 2014), Linux, OSX website

paper mail-list forum SVN

uncertain features

diagonal uncertainty decoding Matlab conversion tools DNN Uncertainty Decoding

C++ BLAS, LAPACK, GPU (for DNNs) Yes MFCC, PLP VTLN, CMVN GMM (Full Cov.), SGMM, DNN HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform alignment, N-best, lattice rescoring (uses OpenFST) Baum-Welch, MMI (boosted), MC, feature-based Yes AURORA4 (WSJ0), CHIME-2 Weniger2014-REVERB Paper Code
Spraak 2008-* (1.1.374) Yes proprietary Windows (limited), Linux, OSX website

paper mail-list forum SVN

Missing Data Techniques (MDT) C, Python No Yes Flexible preprocessing script language -- examples for MFCC, PLP VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2] GMM (Tied-Mix), Exemplar based [3], NN, CRF, ... (flexible using the preprocessing script) [4] CMLLR, eigenvoices, GMM-weight based (NMF) [5] -- (all have Matlab dependencies); MAP alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7] Viterbi Yes AURORA4, [8]
Julius 1997-* (4.3.1) Yes proprietary Windows, Linux, OSX website

book book online mail-list forum CVS

htk2Julius grammar

phoneme seg.

C No Yes MFCC VTLN, CMVN GMM (Tied-Mix) alignment, two-pass decoder Baum-Welch Yes (low latency)
RWTH 2001-* (0.6.1) Yes non-commercial Windows, Linux, OSX website

paper wiki forum

C BLAS, LAPACK, GPU (CUDA), OpenMP Yes MFCC, PLP, Gammatone, Tandem (MLP) VTLN, CMVN, PCA, LDA GMM (Tied covariance), DNN MLLR, CMLLR, BIC alignment, lattice rescoring, system fusion Baum-Welch, MPE Yes

Speaker identification and verification

Table columns (the first column is the software name), by group:

  • General attributes: release / update, actively developed, licence, platforms, links, extensions
  • Programming: language, hardware optimization
  • Implemented techniques: VAD, acoustic features, feature normalization, UBM, subspace projection, subspace normalization, scoring, diarization, robust recognition
  • Reproducible research: recipes, reproducible results

BECARS 2002-2005 (1.1.9) No CeCILL Windows, Linux download

paper

C No No MFCC Gaussianization GMM MMI-weighted LLR No
ALIZE 2005-* Yes LGPL Windows, Linux, OSX download

mail-list linkedin paper

C++, Perl, Bash No Yes MFCC, LFCC CMVN GMM JFA, i-vector whitening, length norm, LDA, WCCN cosine, Mahalanobis, SVM, PLDA, Z/T norm Yes
LIUM SpkDiarization 2009-2013 No GPL Windows, Linux, OSX download

paper

Python extension Java No Yes MFCC, LFCC CMVN GMM i-vector cosine, Mahalanobis Yes
MSR Identity Toolbox 2013 No proprietary Windows, Linux, OSX download

paper

Matlab No Yes MFCC CMVN, Gaussianization GMM i-vector whitening, length norm, LDA PLDA No
SPEAR 2014-* Yes GPL Windows, Linux, OSX download

paper

Python SGE grid Yes MFCC, LFCC CMVN GMM ISV, JFA, i-vector whitening, length norm, LDA, WCCN PLDA, Z/T norm, score fusion Yes

Speech enhancement and separation

Table columns (the first column is the software name), by group:

  • General attributes: release / update, actively developed, licence, platforms, links, extensions
  • Programming: language, hardware optimization
  • Implemented techniques: spatial model, spectral model, estimation algorithm, online separation
  • Reproducible research: public recipes, reproducible results

BTK 2005-* Yes proprietary Linux, OSX download

papers

C++, Python BLAS DS, SD, MVDR, MN beamforming;

Zelinski, McCowan, Lefkimmiatis post-filters

none GCC-PHAT localization Yes
MESSL 2006-2009 No proprietary Windows, Linux, OSX download

paper

Matlab No IPD/ILD clustering none EM No
BeamformIt 2006-2014 (3.51) Yes ICSI Open Source Speech Tools Windows, Linux, OSX download

paper thesis

C++ No weighted DS beamforming none GCC-PHAT localization Yes NIST RT06 (included), AMI
ManyEars 2007-2014 (1.1.2) Yes GPL Windows, Linux, OSX download

paper

C No geometric ICA Wiener post-filter (noise only) GCC-PHAT localization No
HARK 2010-* (2.1.2) Yes non-commercial Windows, Linux, OSX download

paper

C++ BLAS DS, weighted DS, LCMV, GJ, max SNR beamforming;

geometric ICA

Wiener post-filter (noise only) MUSIC localization; MCRA noise estimation Yes
FASST 2012-* (2.0) Yes QPL Windows, Linux, OSX download

paper

C++, Matlab, Python OpenMP full-rank spatial covariance model NMF, source-filter NMF, harmonic NMF, smooth NMF EM and multiplicative updates No

Other applications

Contribute software

To contribute new software, please:

  • create an account and log in
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • name of the software and year of the latest version
  • authors, institution, contact information
  • link to the software, ideally including a short demo, and to the external libraries needed
  • short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
  • whether running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) is included or requires wrapping by the user

To save storage space, please do not upload the software to this wiki. Instead, link to it wherever possible, either in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on the website of your institution. If this is not possible, please contact the resources sharing working group.