Difference between revisions of "Software"
Line 100: | Line 100: | ||
|[http://kaldi.sourceforge.net/data_prep.html AURORA4 (WSJ0)], [http://spandh.dcs.shef.ac.uk/chime_challenge/WSJ0public/CHiME2012-WSJ0-Kaldi_0.03.tar.gz CHIME-2] | |[http://kaldi.sourceforge.net/data_prep.html AURORA4 (WSJ0)], [http://spandh.dcs.shef.ac.uk/chime_challenge/WSJ0public/CHiME2012-WSJ0-Kaldi_0.03.tar.gz CHIME-2] | ||
|Weniger2014-REVERB [http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf Paper] [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz Code] | |Weniger2014-REVERB [http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf Paper] [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz Code] | ||
+ | |- | ||
+ | !Spraak | ||
+ | |2008-* (1.1.374) | ||
+ | |{{yes|Yes}} | ||
+ | |{{no|[http://www.spraak.org/obtaining-spraak/license proprietary]}} | ||
+ | |{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux, OSX}} | ||
+ | |[http://www.spraak.org/ website] | ||
+ | [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf paper] | ||
+ | [http://www.spraak.org/mailing-lists mail-list] | ||
+ | [http://sourceforge.net/p/kaldi/discussion/ forum] | ||
+ | [http://www.spraak.org/documentation/doxygen/doc/html/spr__svn.html SVN] | ||
+ | |Missing Data Techniques (MDT) | ||
+ | |C, Python | ||
+ | |{{no|No}} | ||
+ | |{{yes|Yes}} | ||
+ | |Flexible preprocessing script language -- examples for MFCC, PLP | ||
+ | |VTLN, CMN, [http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html MIDA], [http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html MDT Techniques], Parametric HistEq [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/xzhang/ICASSP2010/zhang.pdf&auto&xz:icassp10], Noise normalization [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10] | ||
+ | |GMM (Tied-Mix), Exemplar based [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/dtw/paper_dtw.pdf&auto&kd:icassp11a], NN, CRF, ... (flexible using the preprocessing script) [http://lib.ugent.be/catalog/pug01:4382368] | ||
+ | |CMLLR, eigenvoices, GMM-weight based (NMF) [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/xzhang/SPEECOM2013/manuscript_zhang.pdf&auto&xzhang13] -- (all have Matlab dependencies); MAP | ||
+ | |alignment, lattice rescoring, SCRF rescoring (using SCARF) [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/scarf/paper_scarf.pdf&auto&kd:icassp11b], phone lattice rescoring [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/duchato/eurospeech09/flavor.pdf&auto&jd:intersp09] | ||
+ | |Viterbi | ||
+ | |{{yes|Yes}} | ||
+ | |[http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html AURORA4], [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10] | ||
|- | |- | ||
!Spraak | !Spraak |
Revision as of 18:44, 17 November 2014
This page provides software grouped by application.
Automatic speech recognition
ASR engines | General attributes | | | | | | Programming | | Implemented ASR techniques | | | | | | | | Reproducible research |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
| release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | VAD | acoustic features | feature normalization / compensation | acoustic models | model adaptation / compensation | decoding techniques | training techniques | online ASR | robust ASR training recipes | reproducible results
CMU Sphinx | 1986-* (Sphinx 4.1.0, pocketsphinx 0.8) | Yes | BSD-like | Windows, Linux, OSX (Sphinx4) / Raspberry Pi, iPhone, Android (pocketsphinx) | website | | Java (Sphinx4), C (pocketsphinx) | No | Yes | MFCC, PLP | CMN, Mel-Spectrum subtraction | GMM, Streams | MLLR, MAP | alignment, N-best, lattice rescoring | Baum-Welch | Yes | AURORA4 (WSJ0) |
HTK | 1993-2009 (3.4.1) | Yes | proprietary | Windows, Linux, OSX | website | official, ATK, FE uncertainty decoding | C | No | Yes | MFCC, PLP | VTLN, CMN | GMM (Full Cov.), Tied-Mix, Streams | HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP | alignment, N-best, lattice rescoring | Baum-Welch, MMI, MPE, MWE | Yes | AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, CHIME-2-II, REVERB | ETSI-AFE-AURORA2 paper (see AURORA2 purch.)
Kaldi | 2009-* (continuous updates) | Yes | Apache 2.0 | Windows (not maintained as of 2014), Linux, OSX | website | | C++ | BLAS, LAPACK, GPU (for DNNs) | Yes | MFCC, PLP | VTLN, CMVN | GMM (Full Cov.), SGMM, DNN | HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform | alignment, N-best, lattice rescoring (using OpenFST) | Baum-Welch, MMI (boosted), MC, feature-based | Yes | AURORA4 (WSJ0), CHIME-2 | Weniger2014-REVERB Paper Code
Spraak | 2008-* (1.1.374) | Yes | proprietary | Windows (limited), Linux, OSX | website | Missing Data Techniques (MDT) | C, Python | No | Yes | Flexible preprocessing script language -- examples for MFCC, PLP | VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2] | GMM (Tied-Mix), Exemplar based [3], NN, CRF, ... (flexible using the preprocessing script) [4] | CMLLR, eigenvoices, GMM-weight based (NMF) [5] -- (all have Matlab dependencies); MAP | alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7] | Viterbi | Yes | AURORA4, [8] |
Speaker identification and verification
Speech enhancement and separation
Other applications
Contribute software
To contribute new software, please:
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
- click on the "Save page" link at the bottom of the page to save your modifications
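For the automatic speech recognition table, a new entry might look roughly like the wikitext sketch below. It mirrors the cell order of the table's column headers and the `{{yes|…}}`, `{{no|…}}` and `{{some|…}}` templates visible in the existing rows; the engine name `ExampleASR`, its version, its attribute values, and the `example.org` URLs are hypothetical placeholders, and the exact template usage on this wiki is an assumption.

```
|-
!ExampleASR <!-- hypothetical engine name -->
|2014-* (0.1) <!-- release / update (hypothetical) -->
|{{yes|Yes}} <!-- actively developed -->
|{{yes|[http://example.org/licence BSD]}} <!-- licence (placeholder URL) -->
|{{some|Linux, OSX}} <!-- platforms -->
|[http://example.org/ website] <!-- links (placeholder URL) -->
| <!-- extensions: leave empty if none -->
|C++ <!-- language -->
|{{no|No}} <!-- hardware optimization -->
|{{yes|Yes}} <!-- VAD -->
|MFCC <!-- acoustic features -->
|CMN <!-- feature normalization / compensation -->
|GMM <!-- acoustic models -->
|MAP <!-- model adaptation / compensation -->
|N-best <!-- decoding techniques -->
|Baum-Welch <!-- training techniques -->
|{{no|No}} <!-- online ASR -->
|AURORA4 (WSJ0) <!-- robust ASR training recipes -->
| <!-- reproducible results: leave empty if none -->
```

Keeping one cell per line, in header order, makes it easier to spot a missing or shifted cell when reviewing the diff of an edit.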
Please make sure to provide the following information:
- name of the software and year of the latest version
- authors, institution, contact information
- link to the software, ideally including a short demo, and to the external libraries needed
- short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper or report describing the software, if any
- whether a recipe for running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) is included or must be written by the user
To save storage space, please do not upload the software to this wiki. Instead, link to it in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website whenever possible. If this is not possible, please contact the resources sharing working group.