Difference between revisions of "Software"
From rosp
Revision as of 12:10, 3 September 2014
This page provides software grouped by application.
Automatic speech recognition
Column groups: General attributes, Programming, Implemented ASR techniques, Reproducible research.

| ASR engine | release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | VAD | acoustic features | feature normalization / compensation | acoustic models | model adaptation / compensation | decoding techniques | training techniques | online ASR | training recipes | reproducible results |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CMU Sphinx | 1986-* (Sphinx 4.1.0, pocketsphinx 0.8) | Yes | proprietary, allows modif. | Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx) | website | | Java (Sphinx4), C (pocketsphinx) | No | Yes | MFCC, PLP | CMN, Mel-Spectrum subtraction | GMM, Streams | MLLR, MAP | alignment, N-best, lattice rescoring | Baum-Welch | Yes | AURORA4 (WSJ0) | |
| HTK | 1993-2009 (3.4.1) | No | proprietary | Windows, Linux, OSX | website | official, ATK, uncertainty decoding | C | No | Yes | MFCC, PLP | VTLN, CMS | GMM (Full Cov.), Tied-Mix, Streams | HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP | alignment, N-best, lattice rescoring | Baum-Welch | Yes | AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, REVERB | ETSI-AFE-AURORA2 paper (see AURORA2 purch.) |
| Kaldi | 2009-* (continuous updates) | Yes | Apache 2.0 | Windows (not maintained as of 2014), Linux, OSX | website | | C++ | BLAS, LAPACK, GPU (for DNNs) | Yes | MFCC, PLP | VTLN, CMVN | GMM (Full Cov.), SGMM, DNN | HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform | Uses OpenFST, alignment, N-best, lattice rescoring | Baum-Welch, MMI (boosted), MC, feature-based, sequence training | Yes | AURORA4 (WSJ0), CHIME-2 | Weniger2014-REVERB (paper, code) |
| Spraak | 2012 (1.1) | No | proprietary | Windows (limited), Linux | website | | C, Python | No | Yes | MFCC, PLP | VTLN, CMN, MIDA, MDT techniques | GMM, Tied-Mix, Exemplar based | CMLLR | alignment, N-best, lattice rescoring, parallel lattices | Baum-Welch | ? | AURORA4 | |
Speaker identification and verification
Speech enhancement and separation
Other applications
Contribute software
To contribute new software, please:
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- name of the software and year of the latest version
- authors, institution, contact information
- link to the software, ideally including a short demo, and to the external libraries needed
- short description (functionality, inputs and outputs, programming language, operating system, license, etc.) and a link to a paper or report describing the software, if any
- whether running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc) is included or requires wrapping by the user
To save storage space, please do not upload the software to this wiki. Instead, link to it wherever possible from a public repository (e.g., bitbucket, github, sourceforge) or from a stable URL on the website of your institution. If this is not possible, please contact the resources sharing working group.