Difference between revisions of "Software"

From rosp
m (Automatic speech recognition)
m (Automatic speech recognition)
Line 18: Line 18:
  !scope="col" width="40px" | extensions
  !scope="col" width="40px" | language
+ !scope="col" width="40px" | hardware optimization
  !scope="col" width="40px" | VAD
  !scope="col" width="40px" | acoustic features
Line 25: Line 26:
  !scope="col" width="40px" | decoding techniques
  !scope="col" width="40px" | training techniques
- !scope="col" width="40px" | hardware optimization
  !scope="col" width="40px" | online ASR
  !scope="col" width="40px" | training recipes
Line 33: Line 33:
  |1986-* (Sphinx 4.1.0, pocketsphinx 0.8)
  |{{yes|Yes}}
- |{{some|limited [https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms Copyright, allows modif.]}}
+ |{{some|[https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms proprietary], allows modif.}}
  |{{yes|Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)}}
  |[http://cmusphinx.sourceforge.net/ website]
Line 58: Line 58:
  |1993-2009 (3.4.1)
  |{{no|No}}
- |{{no|limited [http://htk.eng.cam.ac.uk/docs/license.shtml]}}
+ |{{no|[http://htk.eng.cam.ac.uk/docs/license.shtml proprietary]}}
  |{{yes|Windows, Linux, OSX}}
  |[http://htk.eng.cam.ac.uk/download.shtml website]
Line 104: Line 104:
  |2012 (1.1)
  |{{no|No}}
- |{{some|[http://www.spraak.org/obtaining-spraak/license Academic/commercial]}}
+ |{{no|[http://www.spraak.org/obtaining-spraak/license proprietary]}}
  |{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux}}
  |[http://www.spraak.org/ website]

Revision as of 12:10, 3 September 2014

This page lists software, grouped by application.

Automatic speech recognition

Column groups: General attributes, Programming, Implemented ASR techniques, Reproducible research.

{|
!scope="col" | ASR engines
!scope="col" | release / update
!scope="col" | actively developed
!scope="col" | licence
!scope="col" | platforms
!scope="col" | links
!scope="col" | extensions
!scope="col" | language
!scope="col" | hardware optimization
!scope="col" | VAD
!scope="col" | acoustic features
!scope="col" | feature normalization / compensation
!scope="col" | acoustic models
!scope="col" | model adaptation / compensation
!scope="col" | decoding techniques
!scope="col" | training techniques
!scope="col" | online ASR
!scope="col" | training recipes
!scope="col" | reproducible results
|-
|CMU Sphinx
|1986-* (Sphinx 4.1.0, pocketsphinx 0.8)
|Yes
|proprietary, allows modif.
|Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)
|website, paper, paper, mail-list, forum, github
|
|Java (Sphinx4), C (pocketsphinx)
|No
|Yes
|MFCC, PLP
|CMN, Mel-Spectrum subtraction
|GMM, Streams
|MLLR, MAP
|alignment, N-best, lattice rescoring
|Baum-Welch
|Yes
|AURORA4 (WSJ0)
|
|-
|HTK
|1993-2009 (3.4.1)
|No
|proprietary
|Windows, Linux, OSX
|website, book, mail-list
|official, ATK, uncertainty decoding
|C
|No
|Yes
|MFCC, PLP
|VTLN, CMS
|GMM (Full Cov.), Tied-Mix, Streams
|HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP
|alignment, N-best, lattice rescoring
|Baum-Welch
|Yes
|AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, REVERB
|ETSI-AFE-AURORA2 paper (see AURORA2 purch.)
|-
|Kaldi
|2009-* (continuous updates)
|Yes
|Apache 2.0
|Windows (not maintained as of 2014), Linux, OSX
|website, paper, mail-list, forum, SVN
|
|C++
|BLAS, LAPACK, GPU (for DNNs)
|Yes
|MFCC, PLP
|VTLN, CMVN
|GMM (Full Cov.), SGMM, DNN
|HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform
|Uses OpenFST, alignment, N-best, lattice rescoring
|Baum-Welch, MMI (boosted), MC, feature-based, sequence training
|Yes
|AURORA4 (WSJ0), CHIME-2
|Weniger2014-REVERB Paper Code
|-
|Spraak
|2012 (1.1)
|No
|proprietary
|Windows (limited), Linux
|website, paper, mail-list, forum, SVN
|
|C, Python
|No
|Yes
|MFCC, PLP
|VTLN, CMN, MIDA, MDT Techniques
|GMM, Tied-Mix, Exemplar based
|CMLLR
|alignment, N-best, lattice rescoring, parallel lattices
|Baum-Welch
|?
|AURORA4
|
|}

Speaker identification and verification

Speech enhancement and separation

Other applications

Contribute software

To contribute new software, please:

  • create an account and log in
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • name of the software and year of the latest version
  • authors, institution, contact information
  • link to the software, ideally including a short demo, and to the external libraries needed
  • short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
  • whether scripts for running well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) are included or must be written by the user

To save storage space, please do not upload the software to this wiki. Instead, link to it in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website. If this is not possible, please contact the resources sharing working group.
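
As an illustration of the information requested above, a new section might be sketched in wikitext roughly as follows. The software name, author, e-mail address, URLs, heading level, and field labels are all hypothetical placeholders, not a layout prescribed by this wiki. Only the basic MediaWiki syntax (section headings, bullet lists, bold text, external links) is standard.

```mediawiki
== ExampleToolkit (2014) ==
<!-- Hypothetical entry: every name and URL below is a placeholder. -->
* '''Authors:''' Jane Doe, Example University (jane.doe@example.org)
* '''Software:''' [http://example.org/exampletoolkit download], [http://example.org/exampletoolkit/demo short demo]
* '''External libraries:''' [http://example.org/somelib SomeLib]
* '''Description:''' feature extraction and decoding; input: WAV files; output: text transcripts; language: C++; OS: Linux; license: Apache 2.0
* '''Paper/report:''' [http://example.org/exampletoolkit/report.pdf technical report]
* '''Baselines:''' Aurora-4 scripts included; CHiME must be set up by the user
```

The software itself is only linked from an external URL here, not uploaded to the wiki.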