Difference between revisions of "Software"
Revision as of 20:01, 29 August 2014
This page provides software grouped by application.
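ASR Engines | Release/update | Actively Developed | Corpora Training-Recipes | Reproducible Results | Licence | Platforms | Language | VAD | Acoustic features | Feature normalization/compensation | Acoustic models | Model adaptation/compensation | Decoding techniques | Training techniques | Hardware Optimization | Online ASR | Links | Forums/Mail-Lists | Online Repository | Extensions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
HTK | 1993-2009 (3.4.1) | No | [AURORA2 (purch.)](http://catalog.elra.info/product_info.php?cPath=37_40&products_id=693), [AURORA3 (purch.)](http://catalog.elra.info/index.php?cPath=37_40), [AURORA4 (WSJ0)](http://www.keithv.com/software/htk/), [CHIME-1](http://spandh.dcs.shef.ac.uk/projects/chime/PCC/data/pcchome.tar.gz), [CHIME-2-I](ftp://ftp.dcs.shef.ac.uk/share/spandh/chime_challenge/grid/eval_tools_grid.tgz), [REVERB](http://reverb2014.dereverberation.com/tools/REVERB_TOOLS_FOR_ASR_ver2.0.tgz) | ETSI-AFE-AURORA2 [paper](http://aurora.hsnr.de/download/Aurora2_afe_v1_1.pdf) (see AURORA2 purch.) | [limited](http://htk.eng.cam.ac.uk/docs/license.shtml) | Windows, Linux, OSX | C | Yes | MFCC, PLP | VTLN, CMS | GMM (Full Cov.), Tied-Mix, Streams | HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP | alignment, N-best, lattice rescoring | Baum-Welch | No | Yes | [Website](http://htk.eng.cam.ac.uk/download.shtml), [Book](http://htk.eng.cam.ac.uk/docs/docs.shtml) (need registration) | [mail-lists](http://htk.eng.cam.ac.uk/mailing/subscribe_mail.shtml) (low activity) | No | [official](http://htk.eng.cam.ac.uk/extensions/index.shtml), [ATK](http://htk.eng.cam.ac.uk/develop/atk.shtml), [Uncertainty Decoding](https://github.com/ramon-astudillo/custom_fe) |
Sphinx4 | 1986-2011 (4.1.0) | Yes | [AURORA4 (WSJ0)](http://www.keithv.com/software/sphinx/) | | limited ([Copyright, allows modif.](https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms)) | Windows, Linux, OSX | Java | Yes | MFCC, PLP | CMN, Mel-Spectrum subtraction | GMM, Streams | MLLR, MAP | alignment, N-best, lattice rescoring | Baum-Welch | No | Yes | [Website](http://cmusphinx.sourceforge.net/), [paper](http://www.researchgate.net/publication/228770826_Sphinx-4_A_flexible_open_source_framework_for_speech_recognition/file/79e4150c20aeb37c52.pdf) | [mail-lists](http://sourceforge.net/p/cmusphinx/mailman/), [forums](http://sourceforge.net/p/cmusphinx/discussion/) | [GitHub](https://github.com/cmusphinx/sphinx4) | |
Kaldi | 2009-* (continuous updates) | Yes | [AURORA4 (WSJ0)](http://kaldi.sourceforge.net/data_prep.html), CHIME-2 | Weniger2014-REVERB [Paper](http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf) [Code](http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz) | Apache 2.0 | Windows (not maintained as of 2014), Linux, OSX | C++ | Yes | MFCC, PLP | VTLN, CMVN | GMM (Full Cov.), SGMM, DNN | HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform | Uses OpenFST, alignment, N-best, lattice rescoring | Baum-Welch, MMI (boosted), MC, feature-based, sequence training | BLAS, LAPACK, GPU (for DNNs) | Yes | [Website](http://kaldi.sourceforge.net/about.html), [paper](http://homepages.inf.ed.ac.uk/aghoshal/pubs/asru11-kaldi.pdf) | [mail-lists](http://sourceforge.net/p/kaldi/mailman/kaldi-users/), [forums](http://sourceforge.net/p/kaldi/discussion/) | [SVN](http://kaldi.sourceforge.net/install.html) | |
Spraak | 2012 (1.1) | No | [AURORA4](http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html) | | [Academic/commercial](http://www.spraak.org/obtaining-spraak/license) | Windows ([limited](http://www.spraak.org/obtaining-spraak/system-requirements)), Linux | C, Python | Yes | MFCC, PLP | VTLN, CMN, [MIDA](http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html), [MDT Techniques](http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html) | GMM, Tied-Mix, Exemplar based | CMLLR | alignment, N-best, lattice rescoring, parallel lattices | Baum-Welch | No | unclear | [Website](http://www.spraak.org/), [paper](http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf) | [mail-lists](http://www.spraak.org/mailing-lists), [forums](http://sourceforge.net/p/kaldi/discussion/) | [SVN (needs registration)](http://www.spraak.org/documentation/doxygen/doc/html/spr__svn.html) | |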
Contents
Speaker identification and verification
Speech enhancement and separation
Other applications
Contribute software
To contribute new software, please:
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- name of the software and year of the latest version
- authors, institution, contact information
- link to the software, ideally including a short demo, and to the external libraries needed
- short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
- whether scripts for running well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) are included or must be written by the user
To save storage space, please do not upload the software to this wiki. Instead, link to it wherever possible in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website. If this is not possible, please contact the resources sharing working group.