Difference between revisions of "Software"

From rosp

Revision as of 20:36, 27 November 2014

This page provides software grouped by application.

Automatic speech recognition

ASR engines are compared by general attributes, programming, implemented ASR techniques and reproducible research. For each engine the table lists: release / update, actively developed, licence, platforms, links, extensions, language, hardware optimization, VAD, acoustic features, feature normalization / compensation, acoustic models, model adaptation / compensation, decoding techniques, training techniques, online ASR, robust ASR training recipes, and reproducible results.
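
CMU Sphinx
  • release / update: 1986-* (Sphinx 4.1.0, pocketsphinx 0.8)
  • actively developed: Yes
  • licence: BSD-like
  • platforms: Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)
  • links: website, paper, paper, mail-list, forum, github
  • language: Java (Sphinx4), C (pocketsphinx)
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: MFCC, PLP
  • feature normalization / compensation: CMN, Mel-Spectrum subtraction
  • acoustic models: GMM, Streams
  • model adaptation / compensation: MLLR, MAP
  • decoding techniques: alignment, N-best, lattice rescoring
  • training techniques: Baum-Welch
  • online ASR: Yes
  • robust ASR training recipes: AURORA4 (WSJ0)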
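
HTK
  • release / update: 1993-2009 (3.4.1)
  • actively developed: Yes
  • licence: proprietary
  • platforms: Windows, Linux, OSX
  • links: website, book, mail-list
  • extensions: official, ATK, FE uncertainty decoding
  • language: C
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: MFCC, PLP
  • feature normalization / compensation: VTLN, CMN
  • acoustic models: GMM (Full Cov.), Tied-Mix, Streams
  • model adaptation / compensation: HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP
  • decoding techniques: alignment, N-best, lattice rescoring
  • training techniques: Baum-Welch, MMI, MPE, MWE
  • online ASR: Yes
  • robust ASR training recipes: AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHiME-1, CHiME-2-I, CHiME-2-II, REVERB
  • reproducible results: ETSI-AFE-AURORA2 paper (see AURORA2 purch.)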
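
Kaldi
  • release / update: 2009-* (continuous updates)
  • actively developed: Yes
  • licence: Apache 2.0
  • platforms: Windows (not maintained as of 2014), Linux, OSX
  • links: website, paper, mail-list, forum, SVN
  • language: C++
  • hardware optimization: BLAS, LAPACK, GPU (for DNNs)
  • VAD: Yes
  • acoustic features: MFCC, PLP
  • feature normalization / compensation: VTLN, CMVN
  • acoustic models: GMM (Full Cov.), SGMM, DNN
  • model adaptation / compensation: HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform
  • decoding techniques: alignment, N-best, lattice rescoring (uses OpenFST)
  • training techniques: Baum-Welch, MMI (boosted), MC, feature-based
  • online ASR: Yes
  • robust ASR training recipes: AURORA4 (WSJ0), CHiME-2
  • reproducible results: Weniger2014-REVERB (paper, code)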
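
Spraak
  • release / update: 2008-* (1.1.374)
  • actively developed: Yes
  • licence: proprietary
  • platforms: Windows (limited), Linux, OSX
  • links: website, paper, mail-list, forum, SVN
  • extensions: Missing Data Techniques (MDT)
  • language: C, Python
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: flexible preprocessing script language, with examples for MFCC and PLP
  • feature normalization / compensation: VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2]
  • acoustic models: GMM (Tied-Mix), exemplar-based [3], NN, CRF, ... (flexible via the preprocessing script) [4]
  • model adaptation / compensation: CMLLR, eigenvoices, GMM-weight based (NMF) [5] (all have Matlab dependencies); MAP
  • decoding techniques: alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7]
  • training techniques: Viterbi
  • online ASR: Yes
  • robust ASR training recipes: AURORA4, [8]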
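
Julius
  • release / update: 1997-* (4.3.1)
  • actively developed: Yes
  • licence: proprietary
  • platforms: Windows, Linux, OSX
  • links: website, book, book online, mail-list, forum, CVS
  • extensions: htk2Julius grammar, phoneme segmentation
  • language: C
  • hardware optimization: No
  • VAD: Yes
  • acoustic features: MFCC
  • feature normalization / compensation: VTLN, CMVN
  • acoustic models: GMM (Tied-Mix)
  • decoding techniques: alignment, two-pass decoder
  • training techniques: Baum-Welch
  • online ASR: Yes (low latency)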
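
RWTH
  • release / update: 2001-* (0.6.1)
  • actively developed: Yes
  • licence: non-commercial
  • platforms: Windows, Linux, OSX
  • links: website, paper, wiki, forum
  • language: C
  • hardware optimization: BLAS, LAPACK, GPU (CUDA), OpenMP
  • VAD: Yes
  • acoustic features: MFCC, PLP, Gammatone, Tandem (MLP)
  • feature normalization / compensation: VTLN, CMVN, PCA, LDA
  • acoustic models: GMM (Tied covariance), DNN
  • model adaptation / compensation: MLLR, CMLLR, BIC
  • decoding techniques: alignment, lattice rescoring, system fusion
  • training techniques: Baum-Welch, MPE
  • online ASR: Yes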
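Several engines in the table above list CMN or CMVN under feature normalization / compensation. As a rough illustration of what that column refers to, the following is a minimal per-utterance cepstral mean and variance normalization sketch in plain Python. It is not taken from any of the listed toolkits; the function name cmvn and the eps parameter are hypothetical choices for this example.

```python
from statistics import fmean, pstdev


def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization (illustrative sketch).

    frames: list of feature vectors (e.g. MFCCs), one list of floats per frame.
    Returns new frames in which every dimension has zero mean and (up to eps)
    unit variance over the utterance. eps avoids division by zero when a
    dimension is constant. This is a hypothetical helper, not a toolkit API.
    """
    if not frames:
        return []
    dims = len(frames[0])
    # Mean and population standard deviation of each dimension over all frames.
    means = [fmean([f[d] for f in frames]) for d in range(dims)]
    stds = [pstdev([f[d] for f in frames]) for d in range(dims)]
    # CMN alone would only subtract the mean; CMVN also divides by the std.
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dims)] for f in frames]


if __name__ == "__main__":
    print(cmvn([[1.0, 10.0], [3.0, 10.0]]))
```

Toolkits differ in where the statistics come from (per utterance, per speaker, or a sliding window, which matters for online ASR); this sketch uses per-utterance statistics only.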

Speaker identification and verification

Speech enhancement and separation

Other applications

Contribute software

To contribute new software, please

  • create an account and log in
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • name of the software and year of the latest version
  • authors, institution, contact information
  • link to the software, ideally including a short demo, and to the external libraries needed
  • short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
  • whether running on well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) is included or requires wrapping by the user

To save storage space, please do not upload the software to this wiki. Instead, link to it from a public repository (e.g., Bitbucket, GitHub, SourceForge) or from a stable URL on your institution's website whenever possible. If this is not possible, please contact the resources sharing working group.