Difference between revisions of "Software"

Revision as of 20:36, 27 November 2014

This page provides software grouped by application.

Automatic speech recognition

ASR engines	General attributes						Programming		Implemented ASR techniques								Reproducible research
ASR engines	release / update	actively developed	licence	platforms	links	extensions	language	hardware optimization	VAD	acoustic features	feature normalization / compensation	acoustic models	model adaptation / compensation	decoding techniques	training techniques	online ASR	robust ASR training recipes	reproducible results
CMU Sphinx	1986-* (Sphinx 4.1.0, pocketsphinx 0.8)	Yes	BSD-like	Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)	website paper paper mail-list forum github		Java (Sphinx4), C (pocketsphinx)	No	Yes	MFCC, PLP	CMN, Mel-Spectrum subtraction	GMM, Streams	MLLR, MAP	alignment, N-best, lattice rescoring	Baum-Welch	Yes	AURORA4 (WSJ0)
HTK	1993-2009 (3.4.1)	Yes	proprietary	Windows, Linux, OSX	website book mail-list	official, ATK, FE uncertainty decoding	C	No	Yes	MFCC, PLP	VTLN, CMN	GMM (Full Cov.), Tied-Mix, Streams	HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP	alignment, N-best, lattice rescoring	Baum-Welch, MMI, MPE, MWE	Yes	AURORA2 (purch.), AURORA3 (purch.), AURORA4 (WSJ0), CHIME-1, CHIME-2-I, CHIME-2-II, REVERB	ETSI-AFE-AURORA2 paper (see AURORA2 purch.)
Kaldi	2009-* (continuous updates)	Yes	Apache 2.0	Windows (not maintained as of 2014), Linux, OSX	website paper mail-list forum SVN		C++	BLAS, LAPACK, GPU (for DNNs)	Yes	MFCC, PLP	VTLN, CMVN	GMM (Full Cov.), SGMM, DNN	HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform	alignment, N-best, lattice rescoring (uses OpenFST)	Baum-Welch, MMI (boosted), MC, feature-based	Yes	AURORA4 (WSJ0), CHIME-2	Weniger2014-REVERB Paper Code
Spraak	2008-* (1.1.374)	Yes	proprietary	Windows (limited), Linux, OSX	website paper mail-list forum SVN	Missing Data Techniques (MDT)	C, Python	No	Yes	Flexible preprocessing script language -- examples for MFCC, PLP	VTLN, CMN, MIDA, MDT Techniques, Parametric HistEq [1], Noise normalization [2]	GMM (Tied-Mix), Exemplar based [3], NN, CRF, ... (flexible using the preprocessing script) [4]	CMLLR, eigenvoices, GMM-weight based (NMF) [5] -- (all have Matlab dependencies); MAP	alignment, lattice rescoring, SCRF rescoring (using SCARF) [6], phone lattice rescoring [7]	Viterbi	Yes	AURORA4, [8]
Julius	1997-* (4.3.1)	Yes	proprietary	Windows, Linux, OSX	website book book online mail-list forum CVS	htk2Julius grammar phoneme seg.	C	No	Yes	MFCC	VTLN, CMVN	GMM (Tied-Mix)		alignment, two-pass decoder	Baum-Welch	Yes (low latency)
RWTH	2001-* (0.6.1)	Yes	non-commercial	Windows, Linux, OSX	website paper wiki forum		C	BLAS, LAPACK, GPU (CUDA), OpenMP	Yes	MFCC, PLP, Gammatone, Tandem (MLP)	VTLN, CMVN, PCA, LDA	GMM (Tied covariance), DNN	MLLR, CMLLR, BIC	alignment, lattice rescoring, system fusion	Baum-Welch, MPE	Yes

Speaker identification and verification

Speech enhancement and separation

Contribute software

To contribute new software, please:

1. Create an account and log in.
2. Go to the wiki page above corresponding to your application; if it does not exist yet, you may create it.
3. Click the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version).
4. Click the "Save page" link at the bottom of the page to save your modifications.

Please make sure to provide the following information:

- name of the software and year of the latest version
- authors, institution, and contact information
- link to the software, ideally including a short demo, and links to any external libraries it needs
- short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and a link to a paper or report describing the software, if any
- whether scripts for running well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) are included or must be written by the user

To save storage space, please do not upload the software to this wiki. Instead, link to it in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website. If this is not possible, please contact the resources sharing working group.
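
As a rough illustration, the information requested above could be collected in a small record and rendered as a wikitext section. This is only a sketch: the `SoftwareEntry` class, its field names, the bullet layout, and the example values are all hypothetical and are not a format prescribed by this wiki. Only the `== heading ==`, `*` bullet, and `[url label]` syntax are standard MediaWiki markup.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwareEntry:
    """Hypothetical record holding the fields requested on this page."""
    name: str
    year: int                      # year of the latest version
    authors: str
    institution: str
    contact: str
    url: str                       # link to a public repository or stable URL
    description: str
    paper_url: str = ""            # optional paper/report link
    libraries: List[str] = field(default_factory=list)
    baselines_included: bool = False


def to_wikitext(entry: SoftwareEntry) -> str:
    """Render the entry as a MediaWiki section (layout is an assumption)."""
    lines = [
        f"== {entry.name} ({entry.year}) ==",
        f"* Authors: {entry.authors}, {entry.institution} ({entry.contact})",
        f"* Software: [{entry.url} link]",
    ]
    if entry.libraries:
        lines.append("* External libraries: " + ", ".join(entry.libraries))
    lines.append(f"* Description: {entry.description}")
    if entry.paper_url:
        lines.append(f"* Paper: [{entry.paper_url} paper]")
    baselines = ("included" if entry.baselines_included
                 else "requires wrapping by the user")
    lines.append(f"* Baselines: {baselines}")
    return "\n".join(lines)


if __name__ == "__main__":
    # All values below are placeholders, not a real tool.
    example = SoftwareEntry(
        name="ExampleASR",
        year=2014,
        authors="A. Author",
        institution="Example University",
        contact="author@example.org",
        url="https://example.org/exampleasr",
        description="Toy recognizer, C++, Linux, BSD license",
    )
    print(to_wikitext(example))
```

The output can be pasted into the edit box of the relevant application page; sections would still need to be placed by hand according to the year of the latest version.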

@@ Line 15: / Line 15: @@
 !scope="col" width="40px" | licence
 !scope="col" width="40px" | platforms
 !scope="col" width="40px" | links
 !scope="col" width="40px" | extensions
 !scope="col" width="40px" | language
@@ Line 27: / Line 27: @@
 !scope="col" width="40px" | training techniques
 !scope="col" width="40px" | online ASR
 !scope="col" width="40px" | robust ASR training recipes
 !scope="col" width="40px" | reproducible results
 |-
 !CMU Sphinx
 |1986-* (Sphinx 4.1.0, pocketsphinx 0.8)
 |{{yes|Yes}}
 |{{yes|[https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms BSD-like]}}
 |{{yes|Windows, Linux, OSX (Sphinx4) / Raspberry-pi, iPhone, Android (pocketsphinx)}}
 |[http://cmusphinx.sourceforge.net/ website]
 [http://www.cs.cmu.edu/~rsingh/homepage/papers/icassp03-sphinx4_2.pdf paper]
 [https://www.cs.cmu.edu/~awb/papers/ICASSP2006/0100185.pdf paper]
@@ Line 56: / Line 56: @@
 |-
 !HTK
 |1993-2009 (3.4.1)
 |{{yes|Yes}}
 |{{no|[http://htk.eng.cam.ac.uk/docs/license.shtml proprietary]}}
@@ Line 95: / Line 95: @@
 |GMM (Full Cov.), SGMM, DNN
 |HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform
-|alignment, N-best, lattice rescoring (using OpenFST)
+|alignment, N-best, lattice rescoring (uses OpenFST)
 |Baum-Welch, MMI (boosted), MC, feature-based
 |{{yes|Yes}}
@@ Line 104: / Line 104: @@
 |2008-* (1.1.374)
 |{{yes|Yes}}
-|{{no|[http://www.spraak.org/obtaining-spraak/license proprietary2]}}
+|{{no|[http://www.spraak.org/obtaining-spraak/license proprietary]}}
 |{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux, OSX}}
 |[http://www.spraak.org/ website]
 [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf paper]
-[http://www.spraak.org/mailing-lists mail-list]
 [http://www.spraak.org/mailing-lists mail-list]
 [http://sourceforge.net/p/kaldi/discussion/ forum]
@@ Line 114: / Line 113: @@
 |Missing Data Techniques (MDT)
 |C, Python
 |{{no|No}}
 |{{yes|Yes}}
 |Flexible preprocessing script language -- examples for MFCC, PLP
 |VTLN,CMN, [http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html MIDA], [http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html MDT Techniques], Parametric HistEq [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/xzhang/ICASSP2010/zhang.pdf&auto&xz:icassp10], Noise normalization [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10]
 |GMM (Tied-Mix), Exemplar based [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/dtw/paper_dtw.pdf&auto&kd:icassp11a], NN, CRF, ... (flexible using the preprocessing script) [http://lib.ugent.be/catalog/pug01:4382368]
@@ Line 122: / Line 121: @@
 |alignment, lattice rescoring, SCRF rescoring (using SCARF) [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/icassp11/scarf/paper_scarf.pdf&auto&kd:icassp11b], phone lattice rescoring [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/duchato/eurospeech09/flavor.pdf&auto&jd:intersp09]
 |Viterbi
-|{{yes|Yes}
+|{{yes|Yes}}
 |[http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html AURORA4], [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/krisdm/intersp10/mb_vs_fe.pdf&auto&kd:intersp10]
+|-
+!Julius
+|1997-* (4.3.1)
+|{{yes|Yes}}
+|{{no|[http://julius.sourceforge.jp/LICENSE.txt proprietary]}}
+|{{yes|Windows, Linux, OSX}}
+|[http://julius.sourceforge.jp/en_index.php website]
+[http://jaist.dl.sourceforge.jp/julius/47534/Juliusbook-4.1.5.pdf book]
+[http://julius.sourceforge.jp/juliusbook/en/ book online]
+[mailto:julius-info@lists.sourceforge.jp mail-list]
+[http://julius.sourceforge.jp/forum/ forum]
+[http://julius.sourceforge.jp/en_index.php?q=index-en.html#get_by_cvs CVS]
+|[http://prdownloads.sourceforge.jp/julius/23332/slf2dfa-1.0.tar.gz htk2Julius grammar]
+[http://sourceforge.jp/projects/julius/downloads/32570/julius4-segmentation-kit-v1.0.tar.gz phoneme seg.]
+|C
+|{{no|No}}
+|{{yes|Yes}}
+| MFCC
+|VTLN,CMVN
+|GMM (Tied-Mix)
+|
+|alignment, two-pass decoder
+|Baum-Welch
+|{{yes|Yes (low latency)}}
+|
+|
+|-
+!RWTH
+|2001-* (0.6.1)
+|{{yes|Yes}}
+|{{some|[http://www-i6.informatik.rwth-aachen.de/rwth-asr/rwth-asr-license.html non-commercial]}}
+|{{yes|Windows, Linux, OSX}}
+|[http://www-i6.informatik.rwth-aachen.de/rwth-asr/ website]
+[http://www.isca-speech.org/archive/interspeech_2009/i09_2111.html paper]
+[http://www-i6.informatik.rwth-aachen.de/rwth-asr/manual/ wiki]
+[http://www-i6.informatik.rwth-aachen.de/rwth-asr/manual/index.php/Special:AWCforum/%3Faction%3Dsc/id3 forum]
+|
+|C
+|{{yes|BLAS, LAPACK, GPU (CUDA), OpenMP}}
+|{{yes|Yes}}
+| MFCC, PLP, Gammatone, Tandem (MLP)
+|VTLN, CMVN, PCA, LDA
+|GMM (Tied covariance), DNN
+|MLLR, CMLLR, BIC
+|alignment, lattice rescoring, system fusion
+|Baum-Welch, MPE
+|{{yes|Yes}}
 |
 |}
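
One of the changes above repairs an unbalanced `{{yes|Yes}` cell, a mistake that is easy to make when typing table rows by hand. A minimal sketch of generating a row in the same layout follows. The helper names `cell` and `row` are hypothetical, and the `yes`/`no`/`some` status templates are assumed to behave as they appear in this diff.

```python
from typing import List, Optional, Tuple


def cell(value: str, status: Optional[str] = None) -> str:
    """Render one table cell, optionally wrapped in a status template.

    `status` is expected to be 'yes', 'no', or 'some', the template
    names seen in the diff; None produces a plain cell.
    """
    if status is None:
        return "|" + value
    # Building the template in one place keeps the braces balanced.
    return "|{{" + status + "|" + value + "}}"


def row(name: str, cells: List[Tuple[str, Optional[str]]]) -> str:
    """Render a full row: row separator, header cell, then data cells."""
    lines = ["|-", "!" + name]
    lines.extend(cell(value, status) for value, status in cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Values copied from the first cells of the Julius row in this diff.
    print(row("Julius", [
        ("1997-* (4.3.1)", None),
        ("Yes", "yes"),
        ("Windows, Linux, OSX", "yes"),
    ]))
```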
