Difference between revisions of "Software"

From rosp
(Speaker identification and verification)
 
(31 intermediate revisions by 3 users not shown)
Line 2: Line 2:
  
 
== [[Automatic speech recognition]] ==
 
== [[Automatic speech recognition]] ==
 
 
{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
 
{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
 
|-
 
|-
 
!style="width: 40px" rowspan="2" class="unsortable"| ASR engines
 
!style="width: 40px" rowspan="2" class="unsortable"| ASR engines
!colspan="6" |General attributes2
+
!colspan="6" |General attributes
 
!colspan="2" |Programming
 
!colspan="2" |Programming
!colspan="8" |Implemented ASR techniques
+
!colspan="8" |Implemented techniques
 
!colspan="2" |Reproducible research
 
!colspan="2" |Reproducible research
 
|-
 
|-
Line 15: Line 14:
 
!scope="col" width="40px" | licence
 
!scope="col" width="40px" | licence
 
!scope="col" width="40px" | platforms
 
!scope="col" width="40px" | platforms
!scope="col" width="40px" | links  
+
!scope="col" width="40px" | links
 
!scope="col" width="40px" | extensions
 
!scope="col" width="40px" | extensions
 
!scope="col" width="40px" | language
 
!scope="col" width="40px" | language
Line 21: Line 20:
 
!scope="col" width="40px" | VAD
 
!scope="col" width="40px" | VAD
 
!scope="col" width="40px" | acoustic features
 
!scope="col" width="40px" | acoustic features
!scope="col" width="40px" | feature normalization / compensation
+
!scope="col" width="40px" | feature normalization
 
!scope="col" width="40px" | acoustic models
 
!scope="col" width="40px" | acoustic models
!scope="col" width="40px" | model adaptation / compensation
+
!scope="col" width="40px" | model adaptation
 
!scope="col" width="40px" | decoding techniques
 
!scope="col" width="40px" | decoding techniques
 
!scope="col" width="40px" | training techniques
 
!scope="col" width="40px" | training techniques
 
!scope="col" width="40px" | online ASR
 
!scope="col" width="40px" | online ASR
!scope="col" width="40px" | robust ASR training recipes  
+
!scope="col" width="40px" | robust ASR recipes
 
!scope="col" width="40px" | reproducible results
 
!scope="col" width="40px" | reproducible results
 
|-
 
|-
 
!CMU Sphinx
 
!CMU Sphinx
|1986-* (Sphinx 4.1.0, pocketsphinx 0.8)
+
|1986-* (Sphinx 4.1.0, pocketsphinx 0.8)
 
|{{yes|Yes}}
 
|{{yes|Yes}}
 
|{{yes|[https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms BSD-like]}}
 
|{{yes|[https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms BSD-like]}}
 
|{{yes|Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)}}
 
|{{yes|Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)}}
|[http://cmusphinx.sourceforge.net/ website]  
+
|[http://cmusphinx.sourceforge.net/ website]
 
[http://www.cs.cmu.edu/~rsingh/homepage/papers/icassp03-sphinx4_2.pdf paper]
 
[http://www.cs.cmu.edu/~rsingh/homepage/papers/icassp03-sphinx4_2.pdf paper]
 
[https://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100185.pdf paper]
 
[https://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100185.pdf paper]
Line 56: Line 55:
 
|-
 
|-
 
!HTK
 
!HTK
|1993-2009 (3.4.1)  
+
|1993-2009 (3.4.1)
 
|{{yes|Yes}}
 
|{{yes|Yes}}
 
|{{no|[http://htk.eng.cam.ac.uk/docs/license.shtml proprietary]}}
 
|{{no|[http://htk.eng.cam.ac.uk/docs/license.shtml proprietary]}}
Line 63: Line 62:
 
[http://htk.eng.cam.ac.uk/docs/docs.shtml book]
 
[http://htk.eng.cam.ac.uk/docs/docs.shtml book]
 
[http://htk.eng.cam.ac.uk/mailing/subscribe_mail.shtml mail-list]
 
[http://htk.eng.cam.ac.uk/mailing/subscribe_mail.shtml mail-list]
|[http://htk.eng.cam.ac.uk/extensions/index.shtml official], [http://htk.eng.cam.ac.uk/develop/atk.shtml ATK], [https://github.com/ramon-astudillo/custom_fe FE uncertainty decoding]
+
|[http://htk.eng.cam.ac.uk/extensions/index.shtml official]
 +
[http://htk.eng.cam.ac.uk/develop/atk.shtml ATK]
 +
[https://github.com/ramon-astudillo/custom_fe uncertain features]
 +
[http://www.astudillo.com/ramon/research/stft-up/ diagonal uncertainty decoding]
 +
[http://full-ud-htk.gforge.inria.fr/ full uncertainty decoding]
 
|C
 
|C
 
|{{no|No}}
 
|{{no|No}}
Line 87: Line 90:
 
[http://sourceforge.net/p/kaldi/discussion/ forum]
 
[http://sourceforge.net/p/kaldi/discussion/ forum]
 
[http://kaldi.sourceforge.net/install.html SVN]
 
[http://kaldi.sourceforge.net/install.html SVN]
|
+
|[https://github.com/ramon-astudillo/custom_fe uncertain features]
 +
[http://ud-kaldi.gforge.inria.fr/ diagonal uncertainty decoding]
 +
[http://kaldi-to-matlab.gforge.inria.fr/ Matlab conversion tools]
 +
[https://github.com/makladios/Kaldi_Matlab_DNN_UP DNN Uncertainty Decoding]
 
|C++
 
|C++
 
|{{yes|BLAS, LAPACK, GPU (for DNNs)}}
 
|{{yes|BLAS, LAPACK, GPU (for DNNs)}}
Line 95: Line 101:
 
|GMM (Full Cov.), SGMM, DNN
 
|GMM (Full Cov.), SGMM, DNN
 
|HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform
 
|HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform
|alignment, N-best, lattice rescoring (using OpenFST)
+
|alignment, N-best, lattice rescoring (uses OpenFST)
 
|Baum-Welch, MMI (boosted), MC, feature-based
 
|Baum-Welch, MMI (boosted), MC, feature-based
 
|{{yes|Yes}}
 
|{{yes|Yes}}
|[http://kaldi.sourceforge.net/data_prep.html AURORA4 (WSJ0)], [http://spandh.dcs.shef.ac.uk/chime_challenge/WSJ0public/CHiME2012-WSJ0-Kaldi_0.03.tar.gz CHIME-2]
+
|[http://kaldi.sourceforge.net/data_prep.html AURORA4 (WSJ0)], [http://spandh.dcs.shef.ac.uk/chime_challenge/chime2013/WSJ0public/CHiME2012-WSJ0-Kaldi_0.03.tar.gz CHIME-2]
 
|Weniger2014-REVERB [http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf Paper] [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz  Code]
 
|Weniger2014-REVERB [http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf Paper] [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz  Code]
 
|-
 
|-
Line 104: Line 110:
 
|2008-* (1.1.374)
 
|2008-* (1.1.374)
 
|{{yes|Yes}}
 
|{{yes|Yes}}
|{{no|[http://www.spraak.org/obtaining-spraak/license proprietary2]}}
+
|{{no|[http://www.spraak.org/obtaining-spraak/license proprietary]}}
 
|{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux, OSX}}
 
|{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux, OSX}}
 
|[http://www.spraak.org/ website]
 
|[http://www.spraak.org/ website]
 
[http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf paper]
 
[http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf paper]
[http://www.spraak.org/mailing-lists2 mail-list]
 
 
[http://www.spraak.org/mailing-lists mail-list]
 
[http://www.spraak.org/mailing-lists mail-list]
 
[http://sourceforge.net/p/kaldi/discussion/ forum]
 
[http://sourceforge.net/p/kaldi/discussion/ forum]
Line 114: Line 119:
 
|Missing Data Techniques (MDT)
 
|Missing Data Techniques (MDT)
 
|C, Python
 
|C, Python
|{{no|No}}  
+
|{{no|No}}
 
|{{yes|Yes}}
 
|{{yes|Yes}}
|Flexible preprocessing script language -- examples for MFCC, PLP  
+
|Flexible preprocessing script language -- examples for MFCC, PLP
 
|VTLN,CMN, [http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html MIDA], [http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html MDT Techniques], Parametric HistEq [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/xzhang/ICASSP2010/zhang.pdf&auto&xz:icassp10], Noise normalization [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10]
 
|VTLN,CMN, [http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html MIDA], [http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html MDT Techniques], Parametric HistEq [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/xzhang/ICASSP2010/zhang.pdf&auto&xz:icassp10], Noise normalization [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10]
 
|GMM (Tied-Mix), Exemplar based [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/dtw/paper_dtw.pdf&auto&kd:icassp11a], NN, CRF, ... (flexible using the preprocessing script) [http://lib.ugent.be/catalog/pug01:4382368]
 
|GMM (Tied-Mix), Exemplar based [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/dtw/paper_dtw.pdf&auto&kd:icassp11a], NN, CRF, ... (flexible using the preprocessing script) [http://lib.ugent.be/catalog/pug01:4382368]
Line 122: Line 127:
 
|alignment, lattice rescoring, SCRF rescoring (using SCARF) [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/scarf/paper_scarf.pdf&auto&kd:icassp11b], phone lattice rescoring [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/duchato/eurospeech09/flavor.pdf&auto&jd:intersp09]
 
|alignment, lattice rescoring, SCRF rescoring (using SCARF) [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/scarf/paper_scarf.pdf&auto&kd:icassp11b], phone lattice rescoring [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/duchato/eurospeech09/flavor.pdf&auto&jd:intersp09]
 
|Viterbi
 
|Viterbi
|{{yes|Yes}
+
|{{yes|Yes}}
 
|[http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html AURORA4], [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10]
 
|[http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html AURORA4], [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10]
 +
|-
 +
!Julius
 +
|1997-* (4.3.1)
 +
|{{yes|Yes}}
 +
|{{no|[http://julius.sourceforge.jp/LICENSE.txt proprietary]}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://julius.sourceforge.jp/en_index.php website]
 +
[http://jaist.dl.sourceforge.jp/julius/47534/Juliusbook-4.1.5.pdf book]
 +
[http://julius.sourceforge.jp/juliusbook/en/ book online]
 +
[mailto:julius-info@lists.sourceforge.jp mail-list]
 +
[http://julius.sourceforge.jp/forum/ forum]
 +
[http://julius.sourceforge.jp/en_index.php?q=index-en.html#get_by_cvs CVS]
 +
|[http://prdownloads.sourceforge.jp/julius/23332/slf2dfa-1.0.tar.gz htk2Julius grammar]
 +
[http://sourceforge.jp/projects/julius/downloads/32570/julius4-segmentation-kit-v1.0.tar.gz phoneme seg.]
 +
|C
 +
|{{no|No}}
 +
|{{yes|Yes}}
 +
| MFCC
 +
|VTLN,CMVN
 +
|GMM (Tied-Mix)
 +
|
 +
|alignment, two-pass decoder
 +
|Baum-Welch
 +
|{{yes|Yes (low latency)}}
 +
|
 +
|
 +
|-
 +
!RWTH
 +
|2001-* (0.6.1)
 +
|{{yes|Yes}}
 +
|{{some|[http://www-i6.informatik.rwth-aachen.de/rwth-asr/rwth-asr-license.html non-commercial]}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://www-i6.informatik.rwth-aachen.de/rwth-asr/ website]
 +
[http://www.isca-speech.org/archive/interspeech_2009/i09_2111.html paper]
 +
[http://www-i6.informatik.rwth-aachen.de/rwth-asr/manual/ wiki]
 +
[http://www-i6.informatik.rwth-aachen.de/rwth-asr/manual/index.php/Special:AWCforum/%3Faction%3Dsc/id3 forum]
 +
|
 +
|C
 +
|{{yes|BLAS, LAPACK, GPU (CUDA), OpenMP}}
 +
|{{yes|Yes}}
 +
| MFCC, PLP, Gammatone, Tandem (MLP)
 +
|VTLN, CMVN, PCA, LDA
 +
|GMM (Tied covariance), DNN
 +
|MLLR, CMLLR, BIC
 +
|alignment, lattice rescoring, system fusion
 +
|Baum-Welch, MPE
 +
|{{yes|Yes}}
 
|
 
|
 
|}
 
|}
  
 
== [[Speaker identification and verification]] ==
 
== [[Speaker identification and verification]] ==
 +
{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
 +
|-
 +
!style="width: 40px" rowspan="2" class="unsortable"| Software
 +
!colspan="6" |General attributes
 +
!colspan="2" |Programming
 +
!colspan="8" |Implemented techniques
 +
!colspan="2" |Reproducible research
 +
|-
 +
!scope="col" width="40px" | release / update
 +
!scope="col" width="40px" | actively developed
 +
!scope="col" width="40px" | licence
 +
!scope="col" width="40px" | platforms
 +
!scope="col" width="40px" | links
 +
!scope="col" width="40px" | extensions
 +
!scope="col" width="40px" | language
 +
!scope="col" width="40px" | hardware optimization
 +
!scope="col" width="40px" | VAD
 +
!scope="col" width="40px" | acoustic features
 +
!scope="col" width="40px" | feature normalization
 +
!scope="col" width="40px" | UBM
 +
!scope="col" width="40px" | subspace projection
 +
!scope="col" width="40px" | subspace normalization
 +
!scope="col" width="40px" | scoring
 +
!scope="col" width="40px" | diarization
 +
!scope="col" width="40px" | robust recognition recipes
 +
!scope="col" width="40px" | reproducible results
 +
|-
 +
!BECARS
 +
|2002-2005 (1.1.9)
 +
|{{no|No}}
 +
|{{yes|CeCILL}}
 +
|{{some|Windows, Linux}}
 +
|[http://perso.telecom-paristech.fr/~chollet/becars/ download]
 +
[http://isca-speech.org/archive_open/archive_papers/odyssey_04/ody4_145.pdf paper]
 +
|
 +
|C
 +
|{{no|No}}
 +
|{{no|No}}
 +
|MFCC
 +
|Gaussianization
 +
|GMM
 +
|
 +
|
 +
|MMI-weighted LLR
 +
|{{no|No}}
 +
|
 +
|
 +
|-
 +
!ALIZE
 +
|2005-*
 +
|{{yes|Yes}}
 +
|{{yes|LGPL}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://alize.univ-avignon.fr/ download]
 +
[https://listes.univ-avignon.fr/wws/info/dev-alize mail-list]
 +
[http://www.linkedin.com/groups?mostPopular=&gid=2323703&trk=myg_ugrp_ovr linkedin]
 +
[http://alize.univ-avignon.fr/doc/publi/05_Interspeech_Bonastre.pdf paper]
 +
|
 +
|C++, Perl, Bash
 +
|{{no|No}}
 +
|{{yes|Yes}}
 +
|MFCC, LFCC
 +
|CMVN
 +
|GMM
 +
|JFA, i-vector
 +
|whitening, length norm, LDA, WCCN
 +
|cosine, Mahalanobis, SVM, PLDA, Z/T norm
 +
|{{yes|Yes}}
 +
|
 +
|
 +
|-
 +
!LIUM SpkDiarization
 +
|2009-2013
 +
|{{no|No}}
 +
|{{yes|GPL}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://lium3.univ-lemans.fr/diarization/ download]
 +
[http://lium3.univ-lemans.fr/diarization/lib/exe/fetch.php/toolkit-interspeech2013.pdf paper]
 +
|[https://code.google.com/p/voiceid/ Python extension]
 +
|Java
 +
|{{no|No}}
 +
|{{yes|Yes}}
 +
|MFCC, LFCC
 +
|CMVN
 +
|GMM
 +
|i-vector
 +
|
 +
|cosine, Mahalanobis
 +
|{{yes|Yes}}
 +
|
 +
|
 +
|-
 +
!MSR Identity Toolbox
 +
|2013
 +
|{{no|No}}
 +
|{{no|[http://research.microsoft.com/en-us/downloads/2476c44a-1f63-4fe0-b805-8c2de395bb2c/MSR-LA%20No%20distrib-OK%20to%20modify.txt proprietary]}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://research.microsoft.com/en-us/downloads/2476c44a-1f63-4fe0-b805-8c2de395bb2c/ download]
 +
[http://research.microsoft.com/pubs/211317/MSR%20Identity%20Toolbox%20v1_1.pdf paper]
 +
|
 +
|Matlab
 +
|{{no|No}}
 +
|{{yes|Yes}}
 +
|MFCC
 +
|CMVN, Gaussianization
 +
|GMM
 +
|i-vector
 +
|whitening, length norm, LDA
 +
|PLDA
 +
|{{no|No}}
 +
|
 +
|
 +
|-
 +
!SIDEKIT
 +
|2014-*
 +
|{{yes|Yes}}
 +
|{{yes|LGPL}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://lium.univ-lemans.fr/sidekit]
 +
[http://lium.univ-lemans.fr/sidekit]
 +
|[http://lium.univ-lemans.fr/sidekit/s4d]
 +
|Python
 +
|{{yes|Yes multiprocessing, threading}}
 +
|{{yes|Yes}}
 +
|MFCC, LFCC, FB, bottleneck
 +
|CMS, CMVN, Gaussianization, RASTA
 +
|GMM, DNN
 +
|i-vector, JFA, LFA, SVM
 +
|whitening, length norm, LDA, WCCN, EFR, SphericalNorm
 +
|PLDA, Cosine, Mahalanobis, 2 Covariance, Dot-product
 +
|{{yes|Yes}}
 +
|{{no|No}}
 +
|{{yes|Yes}}
 +
|-
 +
!SPEAR
 +
|2014-*
 +
|{{yes|Yes}}
 +
|{{yes|GPL}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[https://pypi.python.org/pypi/bob.bio.spear download]
 +
[http://publications.idiap.ch/downloads/papers/2014/Khoury_ICASSP_2014.pdf paper]
 +
|
 +
|Python
 +
|{{yes|SGE grid}}
 +
|{{yes|Yes}}
 +
|MFCC, LFCC
 +
|CMVN
 +
|GMM
 +
|ISV, JFA, i-vector
 +
|whitening, length norm, LDA, WCCN
 +
|PLDA, Z/T norm, score fusion
 +
|{{yes|Yes}}
 +
|
 +
|}
  
 
== [[Speech enhancement and separation]] ==
 
== [[Speech enhancement and separation]] ==
 +
{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
 +
|-
 +
!style="width: 40px" rowspan="2" class="unsortable"| Software
 +
!colspan="6" |General attributes
 +
!colspan="2" |Programming
 +
!colspan="4" |Implemented techniques
 +
!colspan="2" |Reproducible research
 +
|-
 +
!scope="col" width="40px" | release / update
 +
!scope="col" width="40px" | actively developed
 +
!scope="col" width="40px" | licence
 +
!scope="col" width="40px" | platforms
 +
!scope="col" width="40px" | links
 +
!scope="col" width="40px" | extensions
 +
!scope="col" width="40px" | language
 +
!scope="col" width="40px" | hardware optimization
 +
!scope="col" width="40px" | spatial model
 +
!scope="col" width="40px" | spectral model
 +
!scope="col" width="40px" | estimation algorithm
 +
!scope="col" width="40px" | online separation
 +
!scope="col" width="40px" | public recipes
 +
!scope="col" width="40px" | reproducible results
 +
|-
 +
!BTK
 +
|2005-*
 +
|{{yes|Yes}}
 +
|{{no|proprietary}}
 +
|{{some|Linux, OSX}}
 +
|[http://distantspeechrecognition.sourceforge.net/ download]
 +
[http://distantspeechrecognition.sourceforge.net/manual.htm#_References papers]
 +
|
 +
|C++, Python
 +
|{{yes|BLAS}}
 +
|DS, SD, MVDR, MN beamforming;
 +
Zelinski, McCowan, Lefkimmiatis post-filters
 +
|none
 +
|GCC-PHAT localization
 +
|{{yes|Yes}}
 +
|
 +
|
 +
|-
 +
!MESSL
 +
|2006-2009
 +
|{{no|No}}
 +
|{{no|proprietary}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[https://github.com/mim/messl download]
 +
[https://www.ee.columbia.edu/~ronw/pubs/taslp09-messl.pdf paper]
 +
|
 +
|Matlab
 +
|{{no|No}}
 +
|IPD/ILD clustering
 +
|none
 +
|EM
 +
|{{no|No}}
 +
|
 +
|
 +
|-
 +
!BeamformIt
 +
|2006-2014 (3.51)
 +
|{{yes|Yes}}
 +
|{{yes|ICSI Open Source Speech Tools}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://www.xavieranguera.com/beamformit/ download]
 +
[http://www.xavieranguera.com/papers/transactions_taslp_2007.pdf paper]
 +
[http://nlp.lsi.upc.edu/papers/thesis_xanguera.pdf thesis]
 +
|
 +
|C++
 +
|{{no|No}}
 +
|weighted DS beamforming
 +
|none
 +
|GCC-PHAT localization
 +
|{{yes|Yes}}
 +
|NIST RT06 (included), [https://github.com/kaldi-asr/kaldi/tree/master/egs/ami/ AMI]
 +
|-
 +
!ManyEars
 +
|2007-2014 (1.1.2)
 +
|{{yes|Yes}}
 +
|{{yes|GPL}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://sourceforge.net/projects/manyears/ download]
 +
[http://link.springer.com/article/10.1007/s10514-012-9316-x# paper]
 +
|
 +
|C
 +
|{{no|No}}
 +
|geometric ICA
 +
|Wiener post-filter (noise only)
 +
|CC-PHAT localization
 +
|{{no|No}}
 +
|
 +
|
 +
|-
 +
!HARK
 +
|2010-* (2.1.2)
 +
|{{yes|Yes}}
 +
|{{some|[http://www.hark.jp/HARK_License_Agreement.pdf non-commercial]}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://www.hark.jp/ download]
 +
[http://www.tandfonline.com/doi/abs/10.1163/016918610X493561#.VeYVmLPcI_s paper]
 +
|
 +
|C++
 +
|{{yes|BLAS}}
 +
|DS, weighted DS, LCMV, GJ, max SNR beamforming;
 +
geometric ICA
 +
|Wiener post-filter (noise only)
 +
|MUSIC localization; MCRA noise estimation
 +
|{{yes|Yes}}
 +
|
 +
|
 +
|-
 +
!FASST
 +
|2012-* (2.0)
 +
|{{yes|Yes}}
 +
|{{yes|QPL}}
 +
|{{yes|Windows, Linux, OSX}}
 +
|[http://bass-db.gforge.inria.fr/fasst/ download]
 +
[https://hal.inria.fr/hal-00626962v2/document paper]
 +
|
 +
|C++, Matlab, Python
 +
|{{yes|OpenMP}}
 +
|full-rank spatial covariance model
 +
|NMF, source-filter NMF, harmonic NMF, smooth NMF
 +
|EM and multiplicative updates
 +
|{{no|No}}
 +
|
 +
|
 +
|}
  
 
== [[Other applications]] ==
 
== [[Other applications]] ==
Line 135: Line 468:
 
== Contribute software ==
 
== Contribute software ==
 
To contribute new software, please
 
To contribute new software, please
* [[Special:UserLogin|create an account]] and login
+
* [[Main_Page#Contribute|create an account]] and login
 
* go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
 
* go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
 
* click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
 
* click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)

Latest revision as of 12:08, 17 March 2016

This page provides software grouped by application.

Automatic speech recognition

{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
|-
! rowspan="2" class="unsortable" | ASR engines
! colspan="6" | General attributes
! colspan="2" | Programming
! colspan="8" | Implemented techniques
! colspan="2" | Reproducible research
|-
! release / update !! actively developed !! licence !! platforms !! links !! extensions !! language !! hardware optimization !! VAD !! acoustic features !! feature normalization !! acoustic models !! model adaptation !! decoding techniques !! training techniques !! online ASR !! robust ASR recipes !! reproducible results
|-
! CMU Sphinx
| 1986-* (Sphinx 4.1.0, pocketsphinx 0.8) || Yes || BSD-like || Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx) || website, paper, paper, mail-list, forum, github || || Java (Sphinx4), C (pocketsphinx) || No || Yes || MFCC, PLP || CMN, Mel-Spectrum subtraction || GMM, Streams || MLLR, MAP || alignment, N-best, lattice rescoring || Baum-Welch || Yes || AURORA4 (WSJ0)
|
|-
! HTK
| 1993-2009 (3.4.1) || Yes || proprietary || Windows, Linux, OSX || website, book, mail-list || official, ATK, uncertain features, diagonal uncertainty decoding, full uncertainty decoding || C || No || Yes || MFCC, PLP || VTLN, CMN || GMM (Full Cov.), Tied-Mix, Streams || HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP || alignment, N-best, lattice rescoring || Baum-Welch, MMI, MPE, MWE || Yes || AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, CHIME-2-II, REVERB || ETSI-AFE-AURORA2 paper (see AURORA2 purch.)
|-
! Kaldi
| 2009-* (continuous updates) || Yes || Apache 2.0 || Windows (not maintained as of 2014), Linux, OSX || website, paper, mail-list, forum, SVN || uncertain features, diagonal uncertainty decoding, Matlab conversion tools, DNN Uncertainty Decoding || C++ || BLAS, LAPACK, GPU (for DNNs) || Yes || MFCC, PLP || VTLN, CMVN || GMM (Full Cov.), SGMM, DNN || HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform || alignment, N-best, lattice rescoring (uses OpenFST) || Baum-Welch, MMI (boosted), MC, feature-based || Yes || AURORA4 (WSJ0), CHIME-2 || Weniger2014-REVERB Paper Code
|-
! Spraak
| 2008-* (1.1.374) || Yes || proprietary || Windows (limited), Linux, OSX || website, paper, mail-list, forum, SVN || Missing Data Techniques (MDT) || C, Python || No || Yes || Flexible preprocessing script language -- examples for MFCC, PLP || VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2] || GMM (Tied-Mix), Exemplar based [3], NN, CRF, ... (flexible using the preprocessing script) [4] || CMLLR, eigenvoices, GMM-weight based (NMF) [5] -- (all have Matlab dependencies); MAP || alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7] || Viterbi || Yes || AURORA4, [8]
|
|-
! Julius
| 1997-* (4.3.1) || Yes || proprietary || Windows, Linux, OSX || website, book, book online, mail-list, forum, CVS || htk2Julius grammar, phoneme seg. || C || No || Yes || MFCC || VTLN, CMVN || GMM (Tied-Mix) || || alignment, two-pass decoder || Baum-Welch || Yes (low latency)
|
|
|-
! RWTH
| 2001-* (0.6.1) || Yes || non-commercial || Windows, Linux, OSX || website, paper, wiki, forum || || C || BLAS, LAPACK, GPU (CUDA), OpenMP || Yes || MFCC, PLP, Gammatone, Tandem (MLP) || VTLN, CMVN, PCA, LDA || GMM (Tied covariance), DNN || MLLR, CMLLR, BIC || alignment, lattice rescoring, system fusion || Baum-Welch, MPE || Yes
|
|
|}

Speaker identification and verification

{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
|-
! rowspan="2" class="unsortable" | Software
! colspan="6" | General attributes
! colspan="2" | Programming
! colspan="8" | Implemented techniques
! colspan="2" | Reproducible research
|-
! release / update !! actively developed !! licence !! platforms !! links !! extensions !! language !! hardware optimization !! VAD !! acoustic features !! feature normalization !! UBM !! subspace projection !! subspace normalization !! scoring !! diarization !! robust recognition recipes !! reproducible results
|-
! BECARS
| 2002-2005 (1.1.9) || No || CeCILL || Windows, Linux || download, paper || || C || No || No || MFCC || Gaussianization || GMM || || || MMI-weighted LLR || No
|
|
|-
! ALIZE
| 2005-* || Yes || LGPL || Windows, Linux, OSX || download, mail-list, linkedin, paper || || C++, Perl, Bash || No || Yes || MFCC, LFCC || CMVN || GMM || JFA, i-vector || whitening, length norm, LDA, WCCN || cosine, Mahalanobis, SVM, PLDA, Z/T norm || Yes
|
|
|-
! LIUM SpkDiarization
| 2009-2013 || No || GPL || Windows, Linux, OSX || download, paper || Python extension || Java || No || Yes || MFCC, LFCC || CMVN || GMM || i-vector || || cosine, Mahalanobis || Yes
|
|
|-
! MSR Identity Toolbox
| 2013 || No || proprietary || Windows, Linux, OSX || download, paper || || Matlab || No || Yes || MFCC || CMVN, Gaussianization || GMM || i-vector || whitening, length norm, LDA || PLDA || No
|
|
|-
! SIDEKIT
| 2014-* || Yes || LGPL || Windows, Linux, OSX || [9], [10] || [11] || Python || Yes multiprocessing, threading || Yes || MFCC, LFCC, FB, bottleneck || CMS, CMVN, Gaussianization, RASTA || GMM, DNN || i-vector, JFA, LFA, SVM || whitening, length norm, LDA, WCCN, EFR, SphericalNorm || PLDA, Cosine, Mahalanobis, 2 Covariance, Dot-product || Yes || No || Yes
|-
! SPEAR
| 2014-* || Yes || GPL || Windows, Linux, OSX || download, paper || || Python || SGE grid || Yes || MFCC, LFCC || CMVN || GMM || ISV, JFA, i-vector || whitening, length norm, LDA, WCCN || PLDA, Z/T norm, score fusion || Yes
|
|
|}

Speech enhancement and separation

{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
|-
! rowspan="2" class="unsortable" | Software
! colspan="6" | General attributes
! colspan="2" | Programming
! colspan="4" | Implemented techniques
! colspan="2" | Reproducible research
|-
! release / update !! actively developed !! licence !! platforms !! links !! extensions !! language !! hardware optimization !! spatial model !! spectral model !! estimation algorithm !! online separation !! public recipes !! reproducible results
|-
! BTK
| 2005-* || Yes || proprietary || Linux, OSX || download, papers || || C++, Python || BLAS || DS, SD, MVDR, MN beamforming; Zelinski, McCowan, Lefkimmiatis post-filters || none || GCC-PHAT localization || Yes
|
|
|-
! MESSL
| 2006-2009 || No || proprietary || Windows, Linux, OSX || download, paper || || Matlab || No || IPD/ILD clustering || none || EM || No
|
|
|-
! BeamformIt
| 2006-2014 (3.51) || Yes || ICSI Open Source Speech Tools || Windows, Linux, OSX || download, paper, thesis || || C++ || No || weighted DS beamforming || none || GCC-PHAT localization || Yes || NIST RT06 (included), AMI
|
|-
! ManyEars
| 2007-2014 (1.1.2) || Yes || GPL || Windows, Linux, OSX || download, paper || || C || No || geometric ICA || Wiener post-filter (noise only) || CC-PHAT localization || No
|
|
|-
! HARK
| 2010-* (2.1.2) || Yes || non-commercial || Windows, Linux, OSX || download, paper || || C++ || BLAS || DS, weighted DS, LCMV, GJ, max SNR beamforming; geometric ICA || Wiener post-filter (noise only) || MUSIC localization; MCRA noise estimation || Yes
|
|
|-
! FASST
| 2012-* (2.0) || Yes || QPL || Windows, Linux, OSX || download, paper || || C++, Matlab, Python || OpenMP || full-rank spatial covariance model || NMF, source-filter NMF, harmonic NMF, smooth NMF || EM and multiplicative updates || No
|
|
|}

Other applications

Contribute software

To contribute new software, please

  • create an account and log in
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • name of the software and year of the latest version
  • authors, institution, contact information
  • link to the software, ideally including a short demo, and to the external libraries needed
  • short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
  • whether running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) is included or requires wrapping by the user
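
If the software is also to be listed in one of the summary tables on this page, a new row might look roughly like the sketch below. It is only an illustration: ExampleTool, its version, licence, URL and all cell values are placeholders rather than a real entry. The cell order is assumed to follow the 14 column headers of the "Speech enhancement and separation" table, and the {{yes|...}} and {{no|...}} templates are assumed to behave as they do in the existing rows.

```wikitext
|-
!ExampleTool
|2016 (1.0)
|{{yes|Yes}}
|{{yes|GPL}}
|{{yes|Linux}}
|[http://example.org/exampletool download]
|
|Python
|{{no|No}}
|DS beamforming
|none
|GCC-PHAT localization
|{{no|No}}
|
|
```

Cells left empty (extensions, public recipes, reproducible results) still need their own "|" line so that the remaining cells stay aligned with the headers.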

To save storage space, please do not upload the software to this wiki. Instead, link to it in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website. If this is not possible, please contact the resources sharing working group.