Software

From rosp

Revision as of 21:01, 29 August 2014

This page provides software grouped by application.

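Automatic speech recognition

{| class="wikitable sortable" style="font-size:72%; border:gray solid 1px; text-align:center; width:auto; table-layout:fixed;"
|-
!style="width: 40px" rowspan="2" class="unsortable"|ASR Engines
!scope="col" width="40px" | Release/update
!scope="col" width="40px" | Actively Developed
!scope="col" width="40px" | Corpora Training-Recipes
!scope="col" width="40px" | Reproducible Results
!scope="col" width="40px" | Licence
!scope="col" width="40px" | Platforms
!scope="col" width="40px" | Language
!scope="col" width="40px" | VAD
!scope="col" width="40px" | Acoustic features
!scope="col" width="40px" | Feature normalization/compensation
!scope="col" width="40px" | Acoustic models
!scope="col" width="40px" | Model adaptation/compensation
!scope="col" width="40px" | decoding techniques
!scope="col" width="40px" | training techniques
!scope="col" width="40px" | Hardware Optimization
!scope="col" width="40px" | Online ASR
!scope="col" width="40px" | Links
!scope="col" width="40px" | Forums/Mail-Lists
!scope="col" width="40px" | Online Repository
!scope="col" width="40px" | Extensions
|-
!HTK
|1993-2009 (3.4.1)
|{{no|No}}
|[http://catalog.elra.info/product_info.php?cPath=37_40&products_id=693 AURORA2 (purch.)], [http://catalog.elra.info/index.php?cPath=37_40 AURORA3 (purch.)], [http://www.keithv.com/software/htk/ AURORA4 (WSJ0)], [http://spandh.dcs.shef.ac.uk/projects/chime/PCC/data/pcchome.tar.gz CHIME-1], [ftp://ftp.dcs.shef.ac.uk/share/spandh/chime_challenge/grid/eval_tools_grid.tgz CHIME-2-I], [http://reverb2014.dereverberation.com/tools/REVERB_TOOLS_FOR_ASR_ver2.0.tgz REVERB]
|ETSI-AFE-AURORA2 [http://aurora.hsnr.de/download/Aurora2_afe_v1_1.pdf paper] (see AURORA2 purch.)
|{{no|limited [http://htk.eng.cam.ac.uk/docs/license.shtml]}}
|{{yes|Windows, Linux, OSX}}
|C
|{{yes|Yes}}
|MFCC, PLP
|VTLN, CMS
|GMM (Full Cov.), Tied-Mix, Streams
|HLDA, MLLR (w/ reg. trees), CMLLR (w/ adaptive training), MAP
|alignment, N-best, lattice rescoring
|Baum-Welch
|{{no|No}}
|{{yes|Yes}}
|[http://htk.eng.cam.ac.uk/download.shtml Website] [http://htk.eng.cam.ac.uk/docs/docs.shtml Book] (need registration)
|[http://htk.eng.cam.ac.uk/mailing/subscribe_mail.shtml mail-lists] (low activity)
|{{no|No}}
|[http://htk.eng.cam.ac.uk/extensions/index.shtml official], [http://htk.eng.cam.ac.uk/develop/atk.shtml ATK], [https://github.com/ramon-astudillo/custom_fe Uncertainty Decoding]
|-
!Sphinx4
|1986-2011 (4.1.0)
|{{yes|Yes}}
|[http://www.keithv.com/software/sphinx/ AURORA4 (WSJ0)]
|
|{{some|limited [https://raw.githubusercontent.com/cmusphinx/sphinx4/master/license.terms Copyright, allows modif.]}}
|{{yes|Windows, Linux, OSX}}
|Java
|{{yes|Yes}}
|MFCC, PLP
|CMN, Mel-Spectrum subtraction
|GMM, Streams
|MLLR, MAP
|alignment, N-best, lattice rescoring
|Baum-Welch
|{{no|No}}
|{{yes|Yes}}
|[http://cmusphinx.sourceforge.net/ Website] [http://www.researchgate.net/publication/228770826_Sphinx-4_A_flexible_open_source_framework_for_speech_recognition/file/79e4150c20aeb37c52.pdf paper]
|[http://sourceforge.net/p/cmusphinx/mailman/ mail-lists] [http://sourceforge.net/p/cmusphinx/discussion/ forums]
|{{yes|[https://github.com/cmusphinx/sphinx4 Github]}}
|
|-
!Kaldi
|2009-* (continuous updates)
|{{yes|Yes}}
|[http://kaldi.sourceforge.net/data_prep.html AURORA4 (WSJ0)], CHIME-2
|Weniger2014-REVERB [http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf Paper] [http://www.mmk.ei.tum.de/~wen/REVERB_2014/kaldi_baseline.tar.gz Code]
|{{yes|Apache 2.0}}
|{{some|Windows (not maintained as of 2014), Linux, OSX}}
|C++
|{{yes|Yes}}
|MFCC, PLP
|VTLN, CMVN
|GMM (Full Cov.), SGMM, DNN
|HLDA, STC, MLLT, MLLR, CMLLR (w/ reg. trees), Exponential transform
|Uses OpenFST, alignment, N-best, lattice rescoring
|Baum-Welch, MMI (boosted), MC, feature-based, sequence training
|{{yes|BLAS, LAPACK, GPU (for DNNs)}}
|{{yes|Yes}}
|[http://kaldi.sourceforge.net/about.html Website] [http://homepages.inf.ed.ac.uk/aghoshal/pubs/asru11-kaldi.pdf paper]
|[http://sourceforge.net/p/kaldi/mailman/kaldi-users/ mail-lists] [http://sourceforge.net/p/kaldi/discussion/ forums]
|{{yes|[http://kaldi.sourceforge.net/install.html SVN]}}
|
|-
!Spraak
|2012 (1.1)
|{{no|No}}
|[http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__example.html AURORA4]
|
|{{some|[http://www.spraak.org/obtaining-spraak/license Academic/commercial]}}
|{{some|Windows ([http://www.spraak.org/obtaining-spraak/system-requirements limited]), Linux}}
|C, Python
|{{yes|Yes}}
|MFCC, PLP
|VTLN, CMN, [http://www.spraak.org/documentation/doxygen/doc/html/spr__tut__mida.html MIDA], [http://www.spraak.org/documentation/doxygen/doc/html/spr__mdt__intro.html MDT Techniques]
|GMM, Tied-Mix, Exemplar based
|CMLLR
|alignment, N-best, lattice rescoring, parallel lattices
|Baum-Welch
|{{no|No}}
|{{some|unclear}}
|[http://www.spraak.org/ Website] [http://www.esat.kuleuven.be/psi/spraak/cgi-bin/get_file.cgi?/wambacq/interspeech08/is2008_spraak_v3.pdf paper]
|[http://www.spraak.org/mailing-lists mail-lists] [http://sourceforge.net/p/kaldi/discussion/ forums]
|{{some|[http://www.spraak.org/documentation/doxygen/doc/html/spr__svn.html SVN (needs registration)]}}
|
|}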

Speaker identification and verification

Speech enhancement and separation

Other applications

Contribute software

To contribute new software, please:

  • create an account and log in
  • go to the wiki page above corresponding to your application; if it does not exist yet, you may create it
  • click on the "Edit" link at the top of the page and add a new section for your software (software is ordered by year of the latest version)
  • click on the "Save page" link at the bottom of the page to save your modifications

Please make sure to provide the following information:

  • name of the software and year of the latest version
  • authors, institution, contact information
  • link to the software, ideally including a short demo, and to the external libraries needed
  • short description (functionalities, inputs and outputs, programming language, operating system, license, etc.) and link to a paper/report describing the software, if any
  • whether recipes for well-known baselines (Aurora-2, Aurora-4, Switchboard, CHiME, etc.) are included or the user must write their own wrappers
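
For contributors adding an engine to the ASR comparison table, the information above maps onto one table row. The following is a minimal sketch, not an official tool of this wiki: it assumes the column order of the ASR engine table on this page, and the helper name `wikitable_row` and the example entry `MyRecognizer` are hypothetical. It only builds the MediaWiki row markup to paste into the edit box.

```python
# Column order assumed to match the ASR engine table on this page.
COLUMNS = [
    "Release/update", "Actively Developed", "Corpora Training-Recipes",
    "Reproducible Results", "Licence", "Platforms", "Language", "VAD",
    "Acoustic features", "Feature normalization/compensation",
    "Acoustic models", "Model adaptation/compensation",
    "decoding techniques", "training techniques", "Hardware Optimization",
    "Online ASR", "Links", "Forums/Mail-Lists", "Online Repository",
    "Extensions",
]


def wikitable_row(name, fields):
    """Return MediaWiki markup for one table row.

    `fields` maps column names to cell text; columns that are not
    given are emitted as empty cells so the row stays aligned.
    """
    unknown = set(fields) - set(COLUMNS)
    if unknown:
        raise KeyError(f"not a column of the table: {sorted(unknown)}")
    lines = ["|-", f"!{name}"]
    for col in COLUMNS:
        # The space after "|" keeps values starting with "-", "+" or "}"
        # from being read as row, caption, or end-of-table markup.
        lines.append(f"| {fields.get(col, '')}".rstrip())
    return "\n".join(lines)


# Hypothetical entry, for illustration only.
row = wikitable_row("MyRecognizer", {
    "Release/update": "2014 (0.1)",
    "Licence": "{{yes|Apache 2.0}}",
    "Language": "Python",
})
print(row)
```

The printed text can be pasted just before the closing `|}` of the table; cells left empty can be filled in later by other editors.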

To save storage space, please do not upload the software to this wiki. Instead, link to it wherever possible in a public repository (e.g., Bitbucket, GitHub, SourceForge) or at a stable URL on your institution's website. If this is not possible, please contact the resources sharing working group.