Difference between revisions of "Software"
From rosp
(→Speaker identification and verification) |
|||
(31 intermediate revisions by 3 users not shown) | |||
Line 2: | Line 2: | ||
== [[Automatic speech recognition]] == | == [[Automatic speech recognition]] == | ||
− | |||
{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | {| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | ||
|- | |- | ||
!style="width: 40px" rowspan="2" class="unsortable"| ASR engines | !style="width: 40px" rowspan="2" class="unsortable"| ASR engines | ||
− | !colspan="6" |General | + | !colspan="6" |General attributes |
!colspan="2" |Programming | !colspan="2" |Programming | ||
− | !colspan="8" |Implemented | + | !colspan="8" |Implemented techniques |
!colspan="2" |Reproducible research | !colspan="2" |Reproducible research | ||
|- | |- | ||
Line 15: | Line 14: | ||
!scope="col" width="40px" | licence | !scope="col" width="40px" | licence | ||
!scope="col" width="40px" | platforms | !scope="col" width="40px" | platforms | ||
− | !scope="col" width="40px" | links | + | !scope="col" width="40px" | links |
!scope="col" width="40px" | extensions | !scope="col" width="40px" | extensions | ||
!scope="col" width="40px" | language | !scope="col" width="40px" | language | ||
Line 21: | Line 20: | ||
!scope="col" width="40px" | VAD | !scope="col" width="40px" | VAD | ||
!scope="col" width="40px" | acoustic features | !scope="col" width="40px" | acoustic features | ||
− | !scope="col" width="40px" | feature normalization | + | !scope="col" width="40px" | feature normalization |
!scope="col" width="40px" | acoustic models | !scope="col" width="40px" | acoustic models | ||
− | !scope="col" width="40px" | model adaptation | + | !scope="col" width="40px" | model adaptation |
!scope="col" width="40px" | decoding techniques | !scope="col" width="40px" | decoding techniques | ||
!scope="col" width="40px" | training techniques | !scope="col" width="40px" | training techniques | ||
!scope="col" width="40px" | online ASR | !scope="col" width="40px" | online ASR | ||
− | !scope="col" width="40px" | robust ASR | + | !scope="col" width="40px" | robust ASR recipes |
!scope="col" width="40px" | reproducible results | !scope="col" width="40px" | reproducible results | ||
|- | |- | ||
!CMU Sphinx | !CMU Sphinx | ||
− | |1986-* (Sphinx 4.1.0, pocketsphinx 0.8) | + | |1986-* (Sphinx 4.1.0, pocketsphinx 0.8) |
|{{yes|Yes}} | |{{yes|Yes}} | ||
|{{yes|[https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms BSD-like]}} | |{{yes|[https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms BSD-like]}} | ||
|{{yes|Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)}} | |{{yes|Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)}} | ||
− | |[http://cmusphinx.sourceforge.net/ website] | + | |[http://cmusphinx.sourceforge.net/ website] |
[http://www.cs.cmu.edu/~rsingh/homepage/papers/icassp03-sphinx4_2.pdf paper] | [http://www.cs.cmu.edu/~rsingh/homepage/papers/icassp03-sphinx4_2.pdf paper] | ||
[https://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100185.pdf paper] | [https://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100185.pdf paper] | ||
Line 56: | Line 55: | ||
|- | |- | ||
!HTK | !HTK | ||
− | |1993-2009 (3.4.1) | + | |1993-2009 (3.4.1) |
|{{yes|Yes}} | |{{yes|Yes}} | ||
|{{no|[http://htk.eng.cam.ac.uk/docs/license.shtml proprietary]}} | |{{no|[http://htk.eng.cam.ac.uk/docs/license.shtml proprietary]}} | ||
Line 63: | Line 62: | ||
[http://htk.eng.cam.ac.uk/docs/docs.shtml book] | [http://htk.eng.cam.ac.uk/docs/docs.shtml book] | ||
[http://htk.eng.cam.ac.uk/mailing/subscribe_mail.shtml mail-list] | [http://htk.eng.cam.ac.uk/mailing/subscribe_mail.shtml mail-list] | ||
− | |[http://htk.eng.cam.ac.uk/extensions/index.shtml official] | + | |[http://htk.eng.cam.ac.uk/extensions/index.shtml official] |
+ | [http://htk.eng.cam.ac.uk/develop/atk.shtml ATK] | ||
+ | [https://github.com/ramon-astudillo/custom_fe uncertain features] | ||
+ | [http://www.astudillo.com/ramon/research/stft-up/ diagonal uncertainty decoding] | ||
+ | [http://full-ud-htk.gforge.inria.fr/ full uncertainty decoding] | ||
|C | |C | ||
|{{no|No}} | |{{no|No}} | ||
Line 87: | Line 90: | ||
[http://sourceforge.net/p/kaldi/discussion/ forum] | [http://sourceforge.net/p/kaldi/discussion/ forum] | ||
[http://kaldi.sourceforge.net/install.html SVN] | [http://kaldi.sourceforge.net/install.html SVN] | ||
− | | | + | |[https://github.com/ramon-astudillo/custom_fe uncertain features] |
+ | [http://ud-kaldi.gforge.inria.fr/ diagonal uncertainty decoding] | ||
+ | [http://kaldi-to-matlab.gforge.inria.fr/ Matlab conversion tools] | ||
+ | [https://github.com/makladios/Kaldi_Matlab_DNN_UP DNN Uncertainty Decoding] | ||
|C++ | |C++ | ||
|{{yes|BLAS, LAPACK, GPU (for DNNs)}} | |{{yes|BLAS, LAPACK, GPU (for DNNs)}} | ||
Line 95: | Line 101: | ||
|GMM (Full Cov.), SGMM, DNN | |GMM (Full Cov.), SGMM, DNN | ||
|HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform | |HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform | ||
− | |alignment, N-best, lattice rescoring ( | + | |alignment, N-best, lattice rescoring (uses OpenFST) |
|Baum-Welch, MMI (boosted), MC, feature-based | |Baum-Welch, MMI (boosted), MC, feature-based | ||
|{{yes|Yes}} | |{{yes|Yes}} | ||
− | |[http://kaldi.sourceforge.net/data_prep.html AURORA4 (WSJ0)], [http://spandh.dcs.shef.ac.uk/chime_challenge/WSJ0public/CHiME2012-WSJ0-Kaldi_0.03.tar.gz CHIME-2] | + | |[http://kaldi.sourceforge.net/data_prep.html AURORA4 (WSJ0)], [http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/WSJ0public/CHiME2012-WSJ0-Kaldi_0.03.tar.gz CHIME-2] |
|Weniger2014-REVERB [http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf Paper] [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz Code] | |Weniger2014-REVERB [http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf Paper] [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz Code] | ||
|- | |- | ||
Line 104: | Line 110: | ||
|2008-* (1.1.374) | |2008-* (1.1.374) | ||
|{{yes|Yes}} | |{{yes|Yes}} | ||
− | |{{no|[http://www.spraak.org/obtaining-spraak/license | + | |{{no|[http://www.spraak.org/obtaining-spraak/license proprietary]}} |
|{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux, OSX}} | |{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux, OSX}} | ||
|[http://www.spraak.org/ website] | |[http://www.spraak.org/ website] | ||
[http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf paper] | [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf paper] | ||
− | |||
[http://www.spraak.org/mailing-lists mail-list] | [http://www.spraak.org/mailing-lists mail-list] | ||
[http://sourceforge.net/p/kaldi/discussion/ forum] | [http://sourceforge.net/p/kaldi/discussion/ forum] | ||
Line 114: | Line 119: | ||
|Missing Data Techniques (MDT) | |Missing Data Techniques (MDT) | ||
|C, Python | |C, Python | ||
− | |{{no|No}} | + | |{{no|No}} |
|{{yes|Yes}} | |{{yes|Yes}} | ||
− | |Flexible preprocessing script language -- examples for MFCC, PLP | + | |Flexible preprocessing script language -- examples for MFCC, PLP |
|VTLN,CMN, [http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html MIDA], [http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html MDT Techniques], Parametric HistEq [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/xzhang/ICASSP2010/zhang.pdf&auto&xz:icassp10], Noise normalization [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10] | |VTLN,CMN, [http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html MIDA], [http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html MDT Techniques], Parametric HistEq [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/xzhang/ICASSP2010/zhang.pdf&auto&xz:icassp10], Noise normalization [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10] | ||
|GMM (Tied-Mix), Exemplar based [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/dtw/paper_dtw.pdf&auto&kd:icassp11a], NN, CRF, ... (flexible using the preprocessing script) [http://lib.ugent.be/catalog/pug01:4382368] | |GMM (Tied-Mix), Exemplar based [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/dtw/paper_dtw.pdf&auto&kd:icassp11a], NN, CRF, ... (flexible using the preprocessing script) [http://lib.ugent.be/catalog/pug01:4382368] | ||
Line 122: | Line 127: | ||
|alignment, lattice rescoring, SCRF rescoring (using SCARF) [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/scarf/paper_scarf.pdf&auto&kd:icassp11b], phone lattice rescoring [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/duchato/eurospeech09/flavor.pdf&auto&jd:intersp09] | |alignment, lattice rescoring, SCRF rescoring (using SCARF) [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/scarf/paper_scarf.pdf&auto&kd:icassp11b], phone lattice rescoring [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/duchato/eurospeech09/flavor.pdf&auto&jd:intersp09] | ||
|Viterbi | |Viterbi | ||
− | |{{yes|Yes} | + | |{{yes|Yes}} |
|[http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html AURORA4], [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10] | |[http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html AURORA4], [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10] | ||
+ | |- | ||
+ | !Julius | ||
+ | |1997-* (4.3.1) | ||
+ | |{{yes|Yes}} | ||
+ | |{{no|[http://julius.sourceforge.jp/LICENSE.txt proprietary]}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://julius.sourceforge.jp/en_index.php website] | ||
+ | [http://jaist.dl.sourceforge.jp/julius/47534/Juliusbook-4.1.5.pdf book] | ||
+ | [http://julius.sourceforge.jp/juliusbook/en/ book online] | ||
+ | [mailto:julius-info@lists.sourceforge.jp mail-list] | ||
+ | [http://julius.sourceforge.jp/forum/ forum] | ||
+ | [http://julius.sourceforge.jp/en_index.php?q=index-en.html#get_by_cvs CVS] | ||
+ | |[http://prdownloads.sourceforge.jp/julius/23332/slf2dfa-1.0.tar.gz htk2Julius grammar] | ||
+ | [http://sourceforge.jp/projects/julius/downloads/32570/julius4-segmentation-kit-v1.0.tar.gz phoneme seg.] | ||
+ | |C | ||
+ | |{{no|No}} | ||
+ | |{{yes|Yes}} | ||
+ | | MFCC | ||
+ | |VTLN, CMVN | ||
+ | |GMM (Tied-Mix) | ||
+ | | | ||
+ | |alignment, two-pass decoder | ||
+ | |Baum-Welch | ||
+ | |{{yes|Yes (low latency)}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !RWTH | ||
+ | |2001-* (0.6.1) | ||
+ | |{{yes|Yes}} | ||
+ | |{{some|[http://www-i6.informatik.rwth-aachen.de/rwth-asr/rwth-asr-license.html non-commercial]}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://www-i6.informatik.rwth-aachen.de/rwth-asr/ website] | ||
+ | [http://www.isca-speech.org/archive/interspeech_2009/i09_2111.html paper] | ||
+ | [http://www-i6.informatik.rwth-aachen.de/rwth-asr/manual/ wiki] | ||
+ | [http://www-i6.informatik.rwth-aachen.de/rwth-asr/manual/index.php/Special:AWCforum/%3Faction%3Dsc/id3 forum] | ||
+ | | | ||
+ | |C | ||
+ | |{{yes|BLAS, LAPACK, GPU (CUDA), OpenMP}} | ||
+ | |{{yes|Yes}} | ||
+ | | MFCC, PLP, Gammatone, Tandem (MLP) | ||
+ | |VTLN, CMVN, PCA, LDA | ||
+ | |GMM (Tied covariance), DNN | ||
+ | |MLLR, CMLLR, BIC | ||
+ | |alignment, lattice rescoring, system fusion | ||
+ | |Baum-Welch, MPE | ||
+ | |{{yes|Yes}} | ||
| | | | ||
|} | |} | ||
== [[Speaker identification and verification]] == | == [[Speaker identification and verification]] == | ||
+ | {| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | ||
+ | |- | ||
+ | !style="width: 40px" rowspan="2" class="unsortable"| Software | ||
+ | !colspan="6" |General attributes | ||
+ | !colspan="2" |Programming | ||
+ | !colspan="8" |Implemented techniques | ||
+ | !colspan="2" |Reproducible research | ||
+ | |- | ||
+ | !scope="col" width="40px" | release / update | ||
+ | !scope="col" width="40px" | actively developed | ||
+ | !scope="col" width="40px" | licence | ||
+ | !scope="col" width="40px" | platforms | ||
+ | !scope="col" width="40px" | links | ||
+ | !scope="col" width="40px" | extensions | ||
+ | !scope="col" width="40px" | language | ||
+ | !scope="col" width="40px" | hardware optimization | ||
+ | !scope="col" width="40px" | VAD | ||
+ | !scope="col" width="40px" | acoustic features | ||
+ | !scope="col" width="40px" | feature normalization | ||
+ | !scope="col" width="40px" | UBM | ||
+ | !scope="col" width="40px" | subspace projection | ||
+ | !scope="col" width="40px" | subspace normalization | ||
+ | !scope="col" width="40px" | scoring | ||
+ | !scope="col" width="40px" | diarization | ||
+ | !scope="col" width="40px" | robust recognition recipes | ||
+ | !scope="col" width="40px" | reproducible results | ||
+ | |- | ||
+ | !BECARS | ||
+ | |2002-2005 (1.1.9) | ||
+ | |{{no|No}} | ||
+ | |{{yes|CeCILL}} | ||
+ | |{{some|Windows, Linux}} | ||
+ | |[http://perso.telecom-paristech.fr/~chollet/becars/ download] | ||
+ | [http://isca-speech.org/archive_open/archive_papers/odyssey_04/ody4_145.pdf paper] | ||
+ | | | ||
+ | |C | ||
+ | |{{no|No}} | ||
+ | |{{no|No}} | ||
+ | |MFCC | ||
+ | |Gaussianization | ||
+ | |GMM | ||
+ | | | ||
+ | | | ||
+ | |MMI-weighted LLR | ||
+ | |{{no|No}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !ALIZE | ||
+ | |2005-* | ||
+ | |{{yes|Yes}} | ||
+ | |{{yes|LGPL}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://alize.univ-avignon.fr/ download] | ||
+ | [https://listes.univ-avignon.fr/wws/info/dev-alize mail-list] | ||
+ | [http://www.linkedin.com/groups?mostPopular=&gid=2323703&trk=myg_ugrp_ovr linkedin] | ||
+ | [http://alize.univ-avignon.fr/doc/publi/05_Interspeech_Bonastre.pdf paper] | ||
+ | | | ||
+ | |C++, Perl, Bash | ||
+ | |{{no|No}} | ||
+ | |{{yes|Yes}} | ||
+ | |MFCC, LFCC | ||
+ | |CMVN | ||
+ | |GMM | ||
+ | |JFA, i-vector | ||
+ | |whitening, length norm, LDA, WCCN | ||
+ | |cosine, Mahalanobis, SVM, PLDA, Z/T norm | ||
+ | |{{yes|Yes}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !LIUM SpkDiarization | ||
+ | |2009-2013 | ||
+ | |{{no|No}} | ||
+ | |{{yes|GPL}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://lium3.univ-lemans.fr/diarization/ download] | ||
+ | [http://lium3.univ-lemans.fr/diarization/lib/exe/fetch.php/toolkit-interspeech2013.pdf paper] | ||
+ | |[https://code.google.com/p/voiceid/ Python extension] | ||
+ | |Java | ||
+ | |{{no|No}} | ||
+ | |{{yes|Yes}} | ||
+ | |MFCC, LFCC | ||
+ | |CMVN | ||
+ | |GMM | ||
+ | |i-vector | ||
+ | | | ||
+ | |cosine, Mahalanobis | ||
+ | |{{yes|Yes}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !MSR Identity Toolbox | ||
+ | |2013 | ||
+ | |{{no|No}} | ||
+ | |{{no|[http://research.microsoft.com/en-us/downloads/2476c44a-1f63-4fe0-b805-8c2de395bb2c/MSR-LA%20No%20distrib-OK%20to%20modify.txt proprietary]}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://research.microsoft.com/en-us/downloads/2476c44a-1f63-4fe0-b805-8c2de395bb2c/ download] | ||
+ | [http://research.microsoft.com/pubs/211317/MSR%20Identity%20Toolbox%20v1_1.pdf paper] | ||
+ | | | ||
+ | |Matlab | ||
+ | |{{no|No}} | ||
+ | |{{yes|Yes}} | ||
+ | |MFCC | ||
+ | |CMVN, Gaussianization | ||
+ | |GMM | ||
+ | |i-vector | ||
+ | |whitening, length norm, LDA | ||
+ | |PLDA | ||
+ | |{{no|No}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !SIDEKIT | ||
+ | |2014-* | ||
+ | |{{yes|Yes}} | ||
+ | |{{yes|LGPL}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://lium.univ-lemans.fr/sidekit] | ||
+ | [http://lium.univ-lemans.fr/sidekit] | ||
+ | |[http://lium.univ-lemans.fr/sidekit/s4d] | ||
+ | |Python | ||
+ | |{{yes|Yes (multiprocessing, threading)}} | ||
+ | |{{yes|Yes}} | ||
+ | |MFCC, LFCC, FB, bottleneck | ||
+ | |CMS, CMVN, Gaussianization, RASTA | ||
+ | |GMM, DNN | ||
+ | |i-vector, JFA, LFA, SVM | ||
+ | |whitening, length norm, LDA, WCCN, EFR, SphericalNorm | ||
+ | |PLDA, Cosine, Mahalanobis, 2 Covariance, Dot-product | ||
+ | |{{yes|Yes}} | ||
+ | |{{no|No}} | ||
+ | |{{yes|Yes}} | ||
+ | |- | ||
+ | !SPEAR | ||
+ | |2014-* | ||
+ | |{{yes|Yes}} | ||
+ | |{{yes|GPL}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[https://pypi.python.org/pypi/bob.bio.spear download] | ||
+ | [http://publications.idiap.ch/downloads/papers/2014/Khoury_ICASSP_2014.pdf paper] | ||
+ | | | ||
+ | |Python | ||
+ | |{{yes|SGE grid}} | ||
+ | |{{yes|Yes}} | ||
+ | |MFCC, LFCC | ||
+ | |CMVN | ||
+ | |GMM | ||
+ | |ISV, JFA, i-vector | ||
+ | |whitening, length norm, LDA, WCCN | ||
+ | |PLDA, Z/T norm, score fusion | ||
+ | |{{yes|Yes}} | ||
+ | | | ||
+ | |} | ||
== [[Speech enhancement and separation]] == | == [[Speech enhancement and separation]] == | ||
+ | {| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;" | ||
+ | |- | ||
+ | !style="width: 40px" rowspan="2" class="unsortable"| Software | ||
+ | !colspan="6" |General attributes | ||
+ | !colspan="2" |Programming | ||
+ | !colspan="4" |Implemented techniques | ||
+ | !colspan="2" |Reproducible research | ||
+ | |- | ||
+ | !scope="col" width="40px" | release / update | ||
+ | !scope="col" width="40px" | actively developed | ||
+ | !scope="col" width="40px" | licence | ||
+ | !scope="col" width="40px" | platforms | ||
+ | !scope="col" width="40px" | links | ||
+ | !scope="col" width="40px" | extensions | ||
+ | !scope="col" width="40px" | language | ||
+ | !scope="col" width="40px" | hardware optimization | ||
+ | !scope="col" width="40px" | spatial model | ||
+ | !scope="col" width="40px" | spectral model | ||
+ | !scope="col" width="40px" | estimation algorithm | ||
+ | !scope="col" width="40px" | online separation | ||
+ | !scope="col" width="40px" | public recipes | ||
+ | !scope="col" width="40px" | reproducible results | ||
+ | |- | ||
+ | !BTK | ||
+ | |2005-* | ||
+ | |{{yes|Yes}} | ||
+ | |{{no|proprietary}} | ||
+ | |{{some|Linux, OSX}} | ||
+ | |[http://distantspeechrecognition.sourceforge.net/ download] | ||
+ | [http://distantspeechrecognition.sourceforge.net/manual.htm#_References papers] | ||
+ | | | ||
+ | |C++, Python | ||
+ | |{{yes|BLAS}} | ||
+ | |DS, SD, MVDR, MN beamforming; | ||
+ | Zelinski, McCowan, Lefkimmiatis post-filters | ||
+ | |none | ||
+ | |GCC-PHAT localization | ||
+ | |{{yes|Yes}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !MESSL | ||
+ | |2006-2009 | ||
+ | |{{no|No}} | ||
+ | |{{no|proprietary}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[https://github.com/mim/messl download] | ||
+ | [https://www.ee.columbia.edu/~ronw/pubs/taslp09-messl.pdf paper] | ||
+ | | | ||
+ | |Matlab | ||
+ | |{{no|No}} | ||
+ | |IPD/ILD clustering | ||
+ | |none | ||
+ | |EM | ||
+ | |{{no|No}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !BeamformIt | ||
+ | |2006-2014 (3.51) | ||
+ | |{{yes|Yes}} | ||
+ | |{{yes|ICSI Open Source Speech Tools}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://www.xavieranguera.com/beamformit/ download] | ||
+ | [http://www.xavieranguera.com/papers/transactions_taslp_2007.pdf paper] | ||
+ | [http://nlp.lsi.upc.edu/papers/thesis_xanguera.pdf thesis] | ||
+ | | | ||
+ | |C++ | ||
+ | |{{no|No}} | ||
+ | |weighted DS beamforming | ||
+ | |none | ||
+ | |GCC-PHAT localization | ||
+ | |{{yes|Yes}} | ||
+ | |NIST RT06 (included), [https://github.com/kaldi-asr/kaldi/tree/master/egs/ami/ AMI] | ||
+ | |- | ||
+ | !ManyEars | ||
+ | |2007-2014 (1.1.2) | ||
+ | |{{yes|Yes}} | ||
+ | |{{yes|GPL}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://sourceforge.net/projects/manyears/ download] | ||
+ | [http://link.springer.com/article/10.1007/s10514-012-9316-x# paper] | ||
+ | | | ||
+ | |C | ||
+ | |{{no|No}} | ||
+ | |geometric ICA | ||
+ | |Wiener post-filter (noise only) | ||
+ | |CC-PHAT localization | ||
+ | |{{no|No}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !HARK | ||
+ | |2010-* (2.1.2) | ||
+ | |{{yes|Yes}} | ||
+ | |{{some|[http://www.hark.jp/HARK_License_Agreement.pdf non-commercial]}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://www.hark.jp/ download] | ||
+ | [http://www.tandfonline.com/doi/abs/10.1163/016918610X493561#.VeYVmLPcI_s paper] | ||
+ | | | ||
+ | |C++ | ||
+ | |{{yes|BLAS}} | ||
+ | |DS, weighted DS, LCMV, GJ, max SNR beamforming; | ||
+ | geometric ICA | ||
+ | |Wiener post-filter (noise only) | ||
+ | |MUSIC localization; MCRA noise estimation | ||
+ | |{{yes|Yes}} | ||
+ | | | ||
+ | | | ||
+ | |- | ||
+ | !FASST | ||
+ | |2012-* (2.0) | ||
+ | |{{yes|Yes}} | ||
+ | |{{yes|QPL}} | ||
+ | |{{yes|Windows, Linux, OSX}} | ||
+ | |[http://bass-db.gforge.inria.fr/fasst/ download] | ||
+ | [https://hal.inria.fr/hal-00626962v2/document paper] | ||
+ | | | ||
+ | |C++, Matlab, Python | ||
+ | |{{yes|OpenMP}} | ||
+ | |full-rank spatial covariance model | ||
+ | |NMF, source-filter NMF, harmonic NMF, smooth NMF | ||
+ | |EM and multiplicative updates | ||
+ | |{{no|No}} | ||
+ | | | ||
+ | | | ||
+ | |} | ||
== [[Other applications]] == | == [[Other applications]] == | ||
Line 135: | Line 468: | ||
== Contribute software == | == Contribute software == | ||
To contribute new software, please | To contribute new software, please | ||
− | * [[ | + | * [[Main_Page#Contribute|create an account]] and log in
* go to the wiki page above corresponding to your application; if it does not exist yet, you may create it | * go to the wiki page above corresponding to your application; if it does not exist yet, you may create it | ||
* click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version) | * click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version) |
Latest revision as of 12:08, 17 March 2016
This page provides software grouped by application.
Automatic speech recognition
Column groups: General attributes (release / update through extensions), Programming (language, hardware optimization), Implemented techniques (VAD through online ASR), Reproducible research (robust ASR recipes, reproducible results).
ASR engines | release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | VAD | acoustic features | feature normalization | acoustic models | model adaptation | decoding techniques | training techniques | online ASR | robust ASR recipes | reproducible results |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CMU Sphinx | 1986-* (Sphinx 4.1.0, pocketsphinx 0.8) | Yes | BSD-like | Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx) | website | | Java (Sphinx4), C (pocketsphinx) | No | Yes | MFCC, PLP | CMN, Mel-Spectrum subtraction | GMM, Streams | MLLR, MAP | alignment, N-best, lattice rescoring | Baum-Welch | Yes | AURORA4 (WSJ0) | |
HTK | 1993-2009 (3.4.1) | Yes | proprietary | Windows, Linux, OSX | website | official, ATK, uncertain features, diagonal uncertainty decoding, full uncertainty decoding | C | No | Yes | MFCC, PLP | VTLN, CMN | GMM (Full Cov.), Tied-Mix, Streams | HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP | alignment, N-best, lattice rescoring | Baum-Welch, MMI, MPE, MWE | Yes | AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, CHIME-2-II, REVERB | ETSI-AFE-AURORA2 paper (see AURORA2 purch.) |
Kaldi | 2009-* (continuous updates) | Yes | Apache 2.0 | Windows (not maintained as of 2014), Linux, OSX | website | uncertain features, diagonal uncertainty decoding, Matlab conversion tools, DNN Uncertainty Decoding | C++ | BLAS, LAPACK, GPU (for DNNs) | Yes | MFCC, PLP | VTLN, CMVN | GMM (Full Cov.), SGMM, DNN | HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform | alignment, N-best, lattice rescoring (uses OpenFST) | Baum-Welch, MMI (boosted), MC, feature-based | Yes | AURORA4 (WSJ0), CHIME-2 | Weniger2014-REVERB Paper Code |
Spraak | 2008-* (1.1.374) | Yes | proprietary | Windows (limited), Linux, OSX | website | Missing Data Techniques (MDT) | C, Python | No | Yes | Flexible preprocessing script language -- examples for MFCC, PLP | VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2] | GMM (Tied-Mix), Exemplar based [3], NN, CRF, ... (flexible using the preprocessing script) [4] | CMLLR, eigenvoices, GMM-weight based (NMF) [5] -- (all have Matlab dependencies); MAP | alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7] | Viterbi | Yes | AURORA4, [8] | |
Julius | 1997-* (4.3.1) | Yes | proprietary | Windows, Linux, OSX | website | htk2Julius grammar | C | No | Yes | MFCC | VTLN, CMVN | GMM (Tied-Mix) | | alignment, two-pass decoder | Baum-Welch | Yes (low latency) | | |
RWTH | 2001-* (0.6.1) | Yes | non-commercial | Windows, Linux, OSX | website | | C | BLAS, LAPACK, GPU (CUDA), OpenMP | Yes | MFCC, PLP, Gammatone, Tandem (MLP) | VTLN, CMVN, PCA, LDA | GMM (Tied covariance), DNN | MLLR, CMLLR, BIC | alignment, lattice rescoring, system fusion | Baum-Welch, MPE | Yes | | |
Speaker identification and verification
Column groups: General attributes (release / update through extensions), Programming (language, hardware optimization), Implemented techniques (VAD through diarization), Reproducible research (robust recognition recipes, reproducible results).
Software | release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | VAD | acoustic features | feature normalization | UBM | subspace projection | subspace normalization | scoring | diarization | robust recognition recipes | reproducible results |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BECARS | 2002-2005 (1.1.9) | No | CeCILL | Windows, Linux | download | | C | No | No | MFCC | Gaussianization | GMM | | | MMI-weighted LLR | No | | |
ALIZE | 2005-* | Yes | LGPL | Windows, Linux, OSX | download | | C++, Perl, Bash | No | Yes | MFCC, LFCC | CMVN | GMM | JFA, i-vector | whitening, length norm, LDA, WCCN | cosine, Mahalanobis, SVM, PLDA, Z/T norm | Yes | | |
LIUM SpkDiarization | 2009-2013 | No | GPL | Windows, Linux, OSX | download | Python extension | Java | No | Yes | MFCC, LFCC | CMVN | GMM | i-vector | | cosine, Mahalanobis | Yes | | |
MSR Identity Toolbox | 2013 | No | proprietary | Windows, Linux, OSX | download | | Matlab | No | Yes | MFCC | CMVN, Gaussianization | GMM | i-vector | whitening, length norm, LDA | PLDA | No | | |
SIDEKIT | 2014-* | Yes | LGPL | Windows, Linux, OSX | http://lium.univ-lemans.fr/sidekit | http://lium.univ-lemans.fr/sidekit/s4d | Python | Yes (multiprocessing, threading) | Yes | MFCC, LFCC, FB, bottleneck | CMS, CMVN, Gaussianization, RASTA | GMM, DNN | i-vector, JFA, LFA, SVM | whitening, length norm, LDA, WCCN, EFR, SphericalNorm | PLDA, Cosine, Mahalanobis, 2 Covariance, Dot-product | Yes | No | Yes |
SPEAR | 2014-* | Yes | GPL | Windows, Linux, OSX | download | | Python | SGE grid | Yes | MFCC, LFCC | CMVN | GMM | ISV, JFA, i-vector | whitening, length norm, LDA, WCCN | PLDA, Z/T norm, score fusion | Yes | | |
Speech enhancement and separation
Column groups: General attributes (release / update through extensions), Programming (language, hardware optimization), Implemented techniques (spatial model through online separation), Reproducible research (public recipes, reproducible results).
Software | release / update | actively developed | licence | platforms | links | extensions | language | hardware optimization | spatial model | spectral model | estimation algorithm | online separation | public recipes | reproducible results |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
BTK | 2005-* | Yes | proprietary | Linux, OSX | download | | C++, Python | BLAS | DS, SD, MVDR, MN beamforming; Zelinski, McCowan, Lefkimmiatis post-filters | none | GCC-PHAT localization | Yes | | |
MESSL | 2006-2009 | No | proprietary | Windows, Linux, OSX | download | | Matlab | No | IPD/ILD clustering | none | EM | No | | |
BeamformIt | 2006-2014 (3.51) | Yes | ICSI Open Source Speech Tools | Windows, Linux, OSX | download | | C++ | No | weighted DS beamforming | none | GCC-PHAT localization | Yes | NIST RT06 (included), AMI | |
ManyEars | 2007-2014 (1.1.2) | Yes | GPL | Windows, Linux, OSX | download | | C | No | geometric ICA | Wiener post-filter (noise only) | CC-PHAT localization | No | | |
HARK | 2010-* (2.1.2) | Yes | non-commercial | Windows, Linux, OSX | download | | C++ | BLAS | DS, weighted DS, LCMV, GJ, max SNR beamforming; geometric ICA | Wiener post-filter (noise only) | MUSIC localization; MCRA noise estimation | Yes | | |
FASST | 2012-* (2.0) | Yes | QPL | Windows, Linux, OSX | download | | C++, Matlab, Python | OpenMP | full-rank spatial covariance model | NMF, source-filter NMF, harmonic NMF, smooth NMF | EM and multiplicative updates | No | | |
Other applications
Contribute software
To contribute new software, please
- create an account and log in
- go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
- click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
- click on the "Save page" link at the bottom of the page to save your modifications
Please make sure to provide the following information:
- name of the software and year of the latest version
- authors, institution, contact information
- link to the software, ideally including a short demo, and to the external libraries needed
- short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
- whether running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) is included or requires wrapping by the user
In order to save storage space, please do not upload the software to this wiki. Instead, link to it wherever possible in a public repository (e.g., bitbucket, github, sourceforge) or at a stable URL on the website of your institution. If this is not possible, please contact the resources sharing working group.
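
As an illustration of the contribution steps above, a new row for the automatic speech recognition table on this page might look like the following sketch. The toolkit name `MyToolkit`, its version, and the `example.org` URLs are hypothetical placeholders, and every cell value is invented for illustration. The cell order is assumed to follow the table's column headers, and the `{{yes|...}}` and `{{no|...}}` templates are the ones already used in the existing rows.

```mediawiki
|-
!MyToolkit
|2016-* (1.0) <!-- release / update (hypothetical) -->
|{{yes|Yes}} <!-- actively developed -->
|{{yes|[http://example.org/LICENSE BSD-like]}} <!-- licence (placeholder URL) -->
|{{yes|Windows, Linux, OSX}} <!-- platforms -->
|[http://example.org/ website] <!-- links (placeholder URL) -->
| <!-- extensions: left empty -->
|C++ <!-- language -->
|{{no|No}} <!-- hardware optimization -->
|{{yes|Yes}} <!-- VAD -->
|MFCC <!-- acoustic features -->
|CMVN <!-- feature normalization -->
|GMM <!-- acoustic models -->
|MLLR <!-- model adaptation -->
|alignment, N-best <!-- decoding techniques -->
|Baum-Welch <!-- training techniques -->
|{{no|No}} <!-- online ASR -->
| <!-- robust ASR recipes: left empty -->
| <!-- reproducible results: left empty -->
```

Cells with nothing to report are left empty rather than omitted, so that the remaining cells stay aligned with their column headers.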