Software

From rosp

Revision as of 22:25, 1 September 2015

This page provides software grouped by application.

Automatic speech recognition

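ASR engines

Columns, by group:

  • General attributes: release / update, actively developed, licence, platforms, links, extensions
  • Programming: language, hardware optimization
  • Implemented ASR techniques: VAD, acoustic features, feature normalization / compensation, acoustic models, model adaptation / compensation, decoding techniques, training techniques, online ASR
  • Reproducible research: robust ASR training recipes, reproducible results

CMU Sphinx

  • release / update: 1986-* (Sphinx 4.1.0, pocketsphinx 0.8)
  • actively developed: Yes
  • licence: BSD-like
  • platforms: Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)
  • links: website, paper, paper, mail-list, forum, github
  • extensions: -
  • language: Java (Sphinx4), C (pocketsphinx)
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: MFCC, PLP
  • feature normalization / compensation: CMN, Mel-Spectrum subtraction
  • acoustic models: GMM, Streams
  • model adaptation / compensation: MLLR, MAP
  • decoding techniques: alignment, N-best, lattice rescoring
  • training techniques: Baum-Welch
  • online ASR: Yes
  • robust ASR training recipes: AURORA4 (WSJ0)
  • reproducible results: -

HTK

  • release / update: 1993-2009 (3.4.1)
  • actively developed: Yes
  • licence: proprietary
  • platforms: Windows, Linux, OSX
  • links: website, book, mail-list
  • extensions: official, ATK, uncertain features, diagonal uncertainty decoding, full uncertainty decoding
  • language: C
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: MFCC, PLP
  • feature normalization / compensation: VTLN, CMN
  • acoustic models: GMM (Full Cov.), Tied-Mix, Streams
  • model adaptation / compensation: HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP
  • decoding techniques: alignment, N-best, lattice rescoring
  • training techniques: Baum-Welch, MMI, MPE, MWE
  • online ASR: Yes
  • robust ASR training recipes: AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, CHIME-2-II, REVERB
  • reproducible results: ETSI-AFE-AURORA2 paper (see AURORA2 purch.)

Kaldi

  • release / update: 2009-* (continuous updates)
  • actively developed: Yes
  • licence: Apache 2.0
  • platforms: Windows (not maintained as of 2014), Linux, OSX
  • links: website, paper, mail-list, forum, SVN
  • extensions: uncertain features, diagonal uncertainty decoding, Matlab conversion tools, DNN Uncertainty Decoding
  • language: C++
  • hardware optimization: BLAS, LAPACK, GPU (for DNNs)
  • VAD: Yes
  • acoustic features: MFCC, PLP
  • feature normalization / compensation: VTLN, CMVN
  • acoustic models: GMM (Full Cov.), SGMM, DNN
  • model adaptation / compensation: HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform
  • decoding techniques: alignment, N-best, lattice rescoring (uses OpenFST)
  • training techniques: Baum-Welch, MMI (boosted), MC, feature-based
  • online ASR: Yes
  • robust ASR training recipes: AURORA4 (WSJ0), CHIME-2
  • reproducible results: Weniger2014-REVERB Paper Code

Spraak

  • release / update: 2008-* (1.1.374)
  • actively developed: Yes
  • licence: proprietary
  • platforms: Windows (limited), Linux, OSX
  • links: website, paper, mail-list, forum, SVN
  • extensions: Missing Data Techniques (MDT)
  • language: C, Python
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: Flexible preprocessing script language -- examples for MFCC, PLP
  • feature normalization / compensation: VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2]
  • acoustic models: GMM (Tied-Mix), Exemplar based [3], NN, CRF, ... (flexible using the preprocessing script) [4]
  • model adaptation / compensation: CMLLR, eigenvoices, GMM-weight based (NMF) [5] -- (all have Matlab dependencies); MAP
  • decoding techniques: alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7]
  • training techniques: Viterbi
  • online ASR: Yes
  • robust ASR training recipes: AURORA4, [8]
  • reproducible results: -

Julius

  • release / update: 1997-* (4.3.1)
  • actively developed: Yes
  • licence: proprietary
  • platforms: Windows, Linux, OSX
  • links: website, book, book online, mail-list, forum, CVS
  • extensions: htk2Julius grammar, phoneme seg.
  • language: C
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: MFCC
  • feature normalization / compensation: VTLN, CMVN
  • acoustic models: GMM (Tied-Mix)
  • model adaptation / compensation: -
  • decoding techniques: alignment, two-pass decoder
  • training techniques: Baum-Welch
  • online ASR: Yes (low latency)
  • robust ASR training recipes: -
  • reproducible results: -

RWTH

  • release / update: 2001-* (0.6.1)
  • actively developed: Yes
  • licence: non-commercial
  • platforms: Windows, Linux, OSX
  • links: website, paper, wiki, forum
  • extensions: -
  • language: C
  • hardware optimization: BLAS, LAPACK, GPU (CUDA), OpenMP
  • VAD: Yes
  • acoustic features: MFCC, PLP, Gammatone, Tandem (MLP)
  • feature normalization / compensation: VTLN, CMVN, PCA, LDA
  • acoustic models: GMM (Tied covariance), DNN
  • model adaptation / compensation: MLLR, CMLLR, BIC
  • decoding techniques: alignment, lattice rescoring, system fusion
  • training techniques: Baum-Welch, MPE
  • online ASR: Yes
  • robust ASR training recipes: -
  • reproducible results: -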

Speaker identification and verification

Speech enhancement and separation

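Software

Columns, by group:

  • General attributes: release / update, actively developed, licence, platforms, links, extensions
  • Programming: language, hardware optimization
  • Implemented techniques: spatial model, spectral model, estimation algorithm, online separation
  • Reproducible research: public recipes, reproducible results

BTK

  • release / update: 2005-*
  • actively developed: Yes
  • licence: proprietary
  • platforms: Linux, OSX
  • links: website, papers
  • extensions: -
  • language: Python, C++
  • hardware optimization: BLAS
  • spatial model: DS, SD, MVDR, MN beamforming; Zelinski, McCowan, Lefkimmiatis post-filters
  • spectral model: none
  • estimation algorithm: GCC-PHAT localization
  • online separation: Yes
  • public recipes: -
  • reproducible results: -

BeamformIt

  • release / update: 2006-2014 (3.51)
  • actively developed: Yes
  • licence: ICSI Open Source Speech Tools
  • platforms: Windows, Linux, OSX
  • links: website, paper, thesis
  • extensions: -
  • language: C++
  • hardware optimization: No
  • spatial model: weighted DS beamforming
  • spectral model: none
  • estimation algorithm: GCC-PHAT localization
  • online separation: Yes
  • public recipes: NIST RT06 (included), AMI
  • reproducible results: -

HARK

  • release / update: 2010-* (2.1.2)
  • actively developed: Yes
  • licence: non-commercial
  • platforms: Windows, Linux, OSX
  • links: website, paper
  • extensions: -
  • language: Python, C++
  • hardware optimization: BLAS
  • spatial model: DS, weighted DS, LCMV, GJ, max SNR beamforming; geometrically constrained ICA
  • spectral model: Wiener post-filter (noise only)
  • estimation algorithm: MUSIC localization; MCRA noise estimation
  • online separation: Yes
  • public recipes: -
  • reproducible results: -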

Other applications

Contribute software

To contribute new software, please:

  • create an account and log in
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
  • click on the "Save page" link at the bottom of the page to save your modifications
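
As a rough sketch of the editing step above, a new row in the speech enhancement table could look like the MediaWiki markup below. It assumes the page uses standard MediaWiki table syntax and a {{yes|...}} template, with one cell per column in the order given by the table header. MyTool, its URL, and every value shown are hypothetical placeholders.

```
<!-- Hypothetical example row: replace every value with your own software's details -->
|-
!MyTool
|2015-* (1.0)
|{{yes|Yes}}
|BSD
|Linux
|[http://example.org/mytool website]
|
|Python
|No
|DS beamforming
|none
|GCC-PHAT localization
|{{yes|Yes}}
|
|
```

Here `|-` starts a new row, `!` marks the software name as a header cell, and each following `|` line fills one column; a bare `|` leaves a cell empty (extensions, public recipes, and reproducible results in this sketch).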

Please make sure to provide the following information:

  • name of the software and year of the latest version
  • authors, institution, contact information
  • link to the software, ideally including a short demo, and to the external libraries needed
  • short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
  • whether running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) is included or requires wrapping by the user

To save storage space, please do not upload the software to this wiki. Instead, link to it wherever possible in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website. If this is not possible, please contact the resources sharing working group.