Difference between revisions of "Software"

From rosp

Revision as of 11:25, 3 September 2014

This page lists software, grouped by application.

Automatic Speech Recognition

ASR engines | release / update | actively developed | training recipes | reproducible results | licence | platforms | language | VAD | acoustic features | feature normalization / compensation | acoustic models | model adaptation / compensation | decoding techniques | training techniques | hardware optimization | online ASR | links | forum / mailing list | online repository | extensions
HTK 1993-2009 (3.4.1) No AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHiME-1, CHiME-2-I, REVERB ETSI-AFE-AURORA2 paper (see AURORA2 purch.) limited [1] Windows, Linux, OSX C Yes MFCC, PLP VTLN, CMS GMM (Full Cov.), Tied-Mix, Streams HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP alignment, N-best, lattice rescoring Baum-Welch No Yes Website Book (needs registration) mail-lists (low activity) No official, ATK, Uncertainty Decoding
CMU Sphinx 1986-* (Sphinx 4.1.0, pocketsphinx 0.8) Yes AURORA4 (WSJ0) limited Copyright, allows modification Sphinx4 (Windows, Linux, OSX) pocketsphinx (Raspberry Pi, iPhone, Android) Sphinx4 (Java), pocketsphinx (C) Yes MFCC, PLP CMN, Mel-Spectrum subtraction GMM, Streams MLLR, MAP alignment, N-best, lattice rescoring Baum-Welch No Yes Website Sphinx4 pocketsphinx mail-lists forums GitHub
Kaldi 2009-* (continuous updates) Yes AURORA4 (WSJ0), CHiME-2 Weniger2014-REVERB Paper Code Apache 2.0 Windows (not maintained as of 2014), Linux, OSX C++ Yes MFCC, PLP VTLN, CMVN GMM (Full Cov.), SGMM, DNN HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform Uses OpenFST, alignment, N-best, lattice rescoring Baum-Welch, MMI (boosted), MC, feature-based, sequence training BLAS, LAPACK, GPU (for DNNs) Yes Website paper mail-lists forums SVN
Spraak 2012 (1.1) No AURORA4 Academic/commercial Windows (limited), Linux C, Python Yes MFCC, PLP VTLN, CMN, MIDA, MDT Techniques GMM, Tied-Mix, Exemplar based CMLLR alignment, N-best, lattice rescoring, parallel lattices Baum-Welch No unclear Website paper mail-lists forums SVN (needs registration)

Speaker identification and verification

Speech enhancement and separation

Other applications

Contribute software

To contribute new software, please:

  • create an account and log in
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • name of the software and year of the latest version
  • authors, institution, contact information
  • link to the software, ideally including a short demo, and to the external libraries needed
  • short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
  • whether scripts for running well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) are included or must be written by the user
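
As an illustration only, the information requested above can be collected in a small structure and rendered as wiki text before pasting it into the edit box. This is a minimal sketch: the wiki does not prescribe a template, so the class `SoftwareEntry`, its field names, the function `to_wikitext`, the section layout, and the `example.org` URLs are all hypothetical. Only the heading (`=== ... ===`), bullet (`*`), and external-link (`[url label]`) markup are standard MediaWiki syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SoftwareEntry:
    """Hypothetical container for the information requested in the list above."""
    name: str
    year: int            # year of the latest version
    authors: str
    institution: str
    contact: str
    url: str             # public repository or stable institutional URL
    description: str     # functionalities, inputs/outputs, language, OS, license
    libraries: List[str] = field(default_factory=list)   # external libraries needed
    paper_url: Optional[str] = None                       # paper/report, if any
    baselines: List[str] = field(default_factory=list)    # baselines that run out of the box


def to_wikitext(entry: SoftwareEntry) -> str:
    """Render one entry as a MediaWiki section (assumed layout, not an official one)."""
    lines = [
        f"=== {entry.name} ({entry.year}) ===",
        f"* Authors: {entry.authors}, {entry.institution} ({entry.contact})",
        f"* Software: [{entry.url} link]",
        f"* Description: {entry.description}",
    ]
    if entry.libraries:
        lines.append("* External libraries: " + ", ".join(entry.libraries))
    if entry.paper_url:
        lines.append(f"* Paper/report: [{entry.paper_url} link]")
    if entry.baselines:
        lines.append("* Baselines included: " + ", ".join(entry.baselines))
    else:
        lines.append("* Baselines included: none (requires wrapping by the user)")
    return "\n".join(lines)


# Placeholder values; replace them with the details of the actual software.
entry = SoftwareEntry(
    name="ExampleASR",
    year=2014,
    authors="A. Author",
    institution="Example University",
    contact="author@example.org",
    url="https://example.org/exampleasr",
    description="GMM-based recognizer; C++; Linux; Apache 2.0",
    baselines=["Aurora-4"],
)
print(to_wikitext(entry))
```

The printed text can be pasted as a new section on the relevant application page. Optional items (external libraries, paper link) are omitted from the output when they are not set.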

To save storage space, please do not upload the software to this wiki. Instead, link to it in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website. If this is not possible, please contact the resources sharing working group.