This Python package extracts bottleneck features, stacked bottleneck features, and phoneme/senone posteriors from audio files. The bottleneck features are tuned primarily for spoken language recognition, but they can also be used in other applications (e.g. speaker recognition, speech recognition). The package includes three neural networks, so three types of features can be extracted with it. Two networks are trained on English data only; the third is trained in a multilingual fashion on data from 17 languages. It is also possible to extract phoneme class posteriors and to create phoneme strings (one-best), lattices, and summed soft counts.
A detailed description of the tool and its components can be found at http://www.fit.vutbr.cz/~matejkap/software/BUT-Phonexia-BottleneckFeatureExtractor_20180301.tgz [87.7 MB].
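To illustrate what summed soft counts and a one-best phoneme string mean, the sketch below derives both from a matrix of per-frame posteriors using NumPy. It is a simplified illustration, not the package's own code or API: the function names, the toy labels, and the toy posterior values are all hypothetical, and the frame-wise argmax used here is an assumption standing in for whatever decoder the tool actually uses.

```python
import numpy as np


def summed_soft_counts(posteriors):
    """Sum per-frame posteriors over time, giving one soft count per class."""
    return posteriors.sum(axis=0)


def framewise_one_best(posteriors, labels):
    """Pick the most likely class in each frame and merge consecutive repeats.

    Simplified stand-in for a real decoder (assumption, not the tool's method).
    """
    best = posteriors.argmax(axis=1)
    # Keep a frame only if its class differs from the previous frame's class.
    keep = np.ones(len(best), dtype=bool)
    keep[1:] = best[1:] != best[:-1]
    return [labels[i] for i in best[keep]]


# Toy data: 4 frames x 3 hypothetical phoneme classes, each row sums to 1.
labels = ["a", "b", "sil"]
post = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
])

counts = summed_soft_counts(post)
one_best = framewise_one_best(post, labels)
```

Because each row of posteriors sums to 1, the soft counts sum to the number of frames, which is a quick sanity check on any posterior matrix.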
Licence:
The models (pretrained networks) are released for noncommercial use under the CC BY-NC-ND 4.0 license (https://creativecommons.org/licenses/by-nc-nd/4.0/), and the Python code is released under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0). For any other use, please contact Jan Cernocky or Phonexia.
Citation:
Anna Silnova, Pavel Matejka, Ondrej Glembek, Oldrich Plchot, Ondrej Novotny, Frantisek Grezl, Petr Schwarz, Lukas Burget, Jan “Honza” Cernocky, "BUT/Phonexia Bottleneck Feature Extractor", Submitted to Odyssey: The Speaker and Language Recognition Workshop 2018
Radek Fér, Pavel Matějka, František Grézl, Oldřich Plchot, Karel Veselý, Jan Černocký, "Multilingually Trained Bottleneck Features in Spoken Language Recognition", Computer Speech and Language, vol. 46, pp. 252-267, Amsterdam: Elsevier Science, 2017. ISSN 0885-2308. Available from: http://www.fit.vutbr.cz/research/groups/speech/publi/2017/fer_CSL2017.pdf