Software
This page provides software grouped by application.
Automatic speech recognition
ASR engines | General attributes | Programming | Implemented techniques | Reproducible research | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | VAD | acoustic features | feature normalization | acoustic models | model adaptation | decoding techniques | training techniques | online ASR | robust ASR recipes | reproducible results | |
CMU Sphinx | 1986-* (Sphinx 4.1.0, pocketsphinx 0.8) | Yes | BSD-like | Windows, Linux, OSX (Sphinx4) / Raspberry Pi, iPhone, Android (pocketsphinx) | website | Java (Sphinx4), C (pocketsphinx) | No | Yes | MFCC, PLP | CMN, Mel-Spectrum subtraction | GMM, Streams | MLLR, MAP | alignment, N-best, lattice rescoring | Baum-Welch | Yes | AURORA4 (WSJ0) | ||
HTK | 1993-2009 (3.4.1) | Yes | proprietary | Windows, Linux, OSX | website | official ATK, uncertain features, diagonal uncertainty decoding, full uncertainty decoding | C | No | Yes | MFCC, PLP | VTLN, CMN | GMM (Full Cov.), Tied-Mix, Streams | HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP | alignment, N-best, lattice rescoring | Baum-Welch, MMI, MPE, MWE | Yes | AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, CHIME-2-II, REVERB | ETSI-AFE-AURORA2 paper (see AURORA2 purch.) |
Kaldi | 2009-* (continuous updates) | Yes | Apache 2.0 | Windows (not maintained as of 2014), Linux, OSX | website | uncertain features, diagonal uncertainty decoding, Matlab conversion tools, DNN Uncertainty Decoding | C++ | BLAS, LAPACK, GPU (for DNNs) | Yes | MFCC, PLP | VTLN, CMVN | GMM (Full Cov.), SGMM, DNN | HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform | alignment, N-best, lattice rescoring (uses OpenFST) | Baum-Welch, MMI (boosted), MC, feature-based | Yes | AURORA4 (WSJ0), CHIME-2 | Weniger2014-REVERB Paper Code |
Spraak | 2008-* (1.1.374) | Yes | proprietary | Windows (limited), Linux, OSX | website | Missing Data Techniques (MDT) | C, Python | No | Yes | Flexible preprocessing script language -- examples for MFCC, PLP | VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2] | GMM (Tied-Mix), Exemplar based [3], NN, CRF, ... (flexible using the preprocessing script) [4] | CMLLR, eigenvoices, GMM-weight based (NMF) [5] -- (all have Matlab dependencies); MAP | alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7] | Viterbi | Yes | AURORA4, [8] | |
Julius | 1997-* (4.3.1) | Yes | proprietary | Windows, Linux, OSX | website | htk2Julius grammar | C | No | Yes | MFCC | VTLN, CMVN | GMM (Tied-Mix) | alignment, two-pass decoder | Baum-Welch | Yes (low latency) | |||
RWTH | 2001-* (0.6.1) | Yes | non-commercial | Windows, Linux, OSX | website | C | BLAS, LAPACK, GPU (CUDA), OpenMP | Yes | MFCC, PLP, Gammatone, Tandem (MLP) | VTLN, CMVN, PCA, LDA | GMM (Tied covariance), DNN | MLLR, CMLLR, BIC | alignment, lattice rescoring, system fusion | Baum-Welch, MPE | Yes |
Speaker identification and verification
Software | General attributes | Programming | Implemented techniques | Reproducible research | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | VAD | acoustic features | feature normalization | UBM | subspace projection | subspace normalization | scoring | diarization | robust recognition recipes | reproducible results | |
BECARS | 2002-2005 (1.1.9) | No | CeCILL | Windows, Linux | download | C | No | No | MFCC | Gaussianization | GMM | MMI-weighted LLR | No | |||||
ALIZE | 2005-* | Yes | LGPL | Windows, Linux, OSX | download | C++, Perl, Bash | No | Yes | MFCC, LFCC | CMVN | GMM | JFA, i-vector | whitening, length norm, LDA, WCCN | cosine, Mahalanobis, SVM, PLDA, Z/T norm | Yes | |||
LIUM SpkDiarization | 2009-2013 | No | GPL | Windows, Linux, OSX | download | Python extension | Java | No | Yes | MFCC, LFCC | CMVN | GMM | i-vector | cosine, Mahalanobis | Yes | |||
MSR Identity Toolbox | 2013 | No | proprietary | Windows, Linux, OSX | download | Matlab | No | Yes | MFCC | CMVN, Gaussianization | GMM | i-vector | whitening, length norm, LDA | PLDA | No | |||
SIDEKIT | 2014-* | Yes | LGPL | Windows, Linux, OSX | [9] | [11] | Python | Yes (multiprocessing, threading) | Yes | MFCC, LFCC, FB, bottleneck | CMS, CMVN, Gaussianization, RASTA | GMM, DNN | i-vector, JFA, LFA, SVM | whitening, length norm, LDA, WCCN, EFR, SphericalNorm | PLDA, Cosine, Mahalanobis, 2 Covariance, Dot-product | Yes | No | Yes |
SPEAR | 2014-* | Yes | GPL | Windows, Linux, OSX | download | Python | SGE grid | Yes | MFCC, LFCC | CMVN | GMM | ISV, JFA, i-vector | whitening, length norm, LDA, WCCN | PLDA, Z/T norm, score fusion | Yes |
Speech enhancement and separation
Software | General attributes | Programming | Implemented techniques | Reproducible research | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | spatial model | spectral model | estimation algorithm | online separation | public recipes | reproducible results | |
BTK | 2005-* | Yes | proprietary | Linux, OSX | download | C++, Python | BLAS | DS, SD, MVDR, MN beamforming; Zelinski, McCowan, Lefkimmiatis post-filters | none | GCC-PHAT localization | Yes | |||
MESSL | 2006-2009 | No | proprietary | Windows, Linux, OSX | download | Matlab | No | IPD/ILD clustering | none | EM | No | |||
BeamformIt | 2006-2014 (3.51) | Yes | ICSI Open Source Speech Tools | Windows, Linux, OSX | download | C++ | No | weighted DS beamforming | none | GCC-PHAT localization | Yes | NIST RT06 (included), AMI | ||
ManyEars | 2007-2014 (1.1.2) | Yes | GPL | Windows, Linux, OSX | download | C | No | geometric ICA | Wiener post-filter (noise only) | CC-PHAT localization | No | |||
HARK | 2010-* (2.1.2) | Yes | non-commercial | Windows, Linux, OSX | download | C++ | BLAS | DS, weighted DS, LCMV, GJ, max SNR beamforming; geometric ICA | Wiener post-filter (noise only) | MUSIC localization; MCRA noise estimation | Yes | |||
FASST | 2012-* (2.0) | Yes | QPL | Windows, Linux, OSX | download | C++, Matlab, Python | OpenMP | full-rank spatial covariance model | NMF, source-filter NMF, harmonic NMF, smooth NMF | EM and multiplicative updates | No |
Other applications
Contribute software
To contribute new software, please:
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- name of the software and year of the latest version
- authors, institution, contact information
- link to the software, ideally including a short demo, and to the external libraries needed
- short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
- whether scripts for running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) are included or must be written by the user
To save storage space, please do not upload the software to this wiki. Instead, link to it wherever possible in a public repository (e.g., bitbucket, github, sourceforge) or at a stable URL on the website of your institution. If this is not possible, please contact the resources sharing working group.