Unexpected rare events are potentially information-rich but still poorly processed by today's computing systems. The DIRAC project addressed this crucial machine weakness and developed methods for environment-adaptive autonomous artificial cognitive systems that detect, identify and classify possibly threatening rare events from information derived by multiple active information-seeking audio-visual sensors.
The project was coordinated by the Hebrew University of Jerusalem (scientific coordination) and Carl von Ossietzky University Oldenburg (administrative coordination). BUT mainly worked on out-of-vocabulary (OOV) word detection and handling.
This series of projects laid solid foundations in multiple research areas related to human-human interaction modeling, computer-enhanced human-human communication (especially in the context of face-to-face and remote meetings), social communication sensing, and social signal processing.
The projects were coordinated by the IDIAP research institute (scientific coordination) and the University of Edinburgh (administrative coordination). A notable output of M4/AMI/AMIDA is the AMI meeting corpus, a valuable resource for training ASR of spontaneous non-native English. The work of BUT concentrated on the ASR of meetings; the AMI ASR team was headed by Thomas Hain from the University of Sheffield.
The project aimed at studying, developing and assessing multimedia knowledge-based content analysis, knowledge extraction components, and metadata management sub-systems in the context of automated situation awareness, diagnosis and decision support. It focused on the extraction of structured knowledge from large multimedia collections recorded over networks of cameras and microphones deployed in real sites.
The project was coordinated by Thales Communications. BUT worked on both video and audio analysis; in audio, we applied know-how from speech recognition to the identification of rare audio events.
During the lifetime of the project, originally scheduled to last two years, the partners collected speech data for 18 languages or dialectal zones, including most of the languages spoken in the EU. SpeeCon devoted special attention to the environment of the recordings: at home, in the office, in public places, or in moving vehicles.
The project was coordinated by Siemens R&D; BUT, together with the Czech Technical University in Prague, was sub-contracted by Harman/Becker to collect the data for Czech. The Czech (as well as other) SpeeCon databases are currently available from ELRA.
The project focused on Spoken Language Resources, namely speech databases for fixed telephone networks, including associated annotations and pronunciation lexica. Speech from 2500 speakers was collected for Russian and from 1000 speakers each for Czech, Slovak, Polish and Hungarian.
The project was coordinated by Lernout & Hauspie; BUT worked together with the Czech Technical University in Prague and, as you might expect, we were responsible for Czech. This was the first EU-funded project at BUT, and we worked on it while still at the “old” Faculty of Electrical Engineering and Computer Science (the speech group moved to FIT only in 2002). The Czech (as well as other) SpeechDat-E databases are currently available from ELRA.
The MATERIAL Program seeks to develop methods for finding speech and text content in low-resource languages that is relevant to domain-contextualized English queries. Such methods must use minimal training data and be rapidly deployable to new languages and domains.
BUT’s task in MATERIAL is to work on automatic speech recognition in the MATERIAL target languages, supported by other technologies such as automatic language identification to filter out non-target speech data. We are part of the “FLAIR” team coordinated by Raytheon BBN Technologies. BUT's principal investigator in MATERIAL is Dr. Martin Karafiat.
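To make the filtering step concrete, here is a minimal sketch of LID-based pre-filtering. The `lid_model` object and its `posteriors` method are hypothetical stand-ins for any language-ID classifier that returns per-language posterior probabilities; this is an illustration, not the actual MATERIAL pipeline.

```python
# Minimal sketch of LID-based filtering of speech segments.
# `lid_model.posteriors` is a hypothetical stand-in for any language-ID
# classifier returning a per-language posterior distribution per segment.

def filter_target_language(segments, lid_model, target="sw", threshold=0.5):
    """Keep only the segments likely to be in the target language."""
    kept = []
    for seg in segments:
        posteriors = lid_model.posteriors(seg.audio)  # dict: language -> probability
        if posteriors.get(target, 0.0) >= threshold:
            kept.append(seg)
    return kept
```

The threshold trades missed target-language speech against non-target data leaking into ASR training or search.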
Text-independent speaker verification (SV) is currently the only bastion in the domain of speech data mining that resists the massive attack of deep neural networks (DNNs). End-to-end DNN approaches have already yielded very good performance in text-dependent SV, and DNNs have been very successful in the related domain of spoken language recognition. In this project, we will depart from existing DNN approaches to SV and advance towards full-DNN systems.
This project is financed by a Google Faculty Research Award; its principal investigator is Oldrich Plchot.
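To illustrate the direction, here is a minimal sketch of a speaker-embedding network in the spirit of x-vectors: a frame-level convolutional encoder, statistics pooling, an embedding layer, and a speaker-classification head used only during training. The layer sizes are arbitrary, and this is a generic illustration rather than the project's actual system.

```python
import torch
import torch.nn as nn

class SpeakerEmbeddingNet(nn.Module):
    """Minimal x-vector-style network: frame encoder -> stats pooling -> embedding."""
    def __init__(self, feat_dim=40, embed_dim=256, num_speakers=1000):
        super().__init__()
        self.frame_encoder = nn.Sequential(  # input: (batch, feat_dim, time)
            nn.Conv1d(feat_dim, 512, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2, padding=2), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        self.embedding = nn.Linear(2 * 1500, embed_dim)       # mean+std statistics
        self.classifier = nn.Linear(embed_dim, num_speakers)  # training-time head

    def forward(self, feats):  # feats: (batch, feat_dim, time)
        h = self.frame_encoder(feats)
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)  # statistics pooling
        emb = self.embedding(stats)  # fixed-length speaker embedding
        return emb, self.classifier(emb)
```

At verification time only the embedding is kept; two utterances are then compared, e.g. by cosine similarity or a PLDA backend.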
The goal of the Low Resource Languages for Emergent Incidents (LORELEI) Program is to dramatically advance the state of computational linguistics and human language technology to enable rapid, low-cost development of capabilities for low-resource languages. The program aims at providing situational awareness by identifying elements of information in foreign language and English sources, such as topics, names, events, sentiment and relationships.
BUT works on information mining from speech and concentrates on topic detection, sentiment analysis, and system training with little or no resources in the target language. We are part of the “ELISA” team coordinated by the University of Southern California (USC) in L.A.
Existing speech signal processing technologies are inadequate for most noisy or degraded speech signals that are important to military intelligence. The Robust Automatic Transcription of Speech (RATS) program is creating algorithms and software for performing the following tasks on potentially speech-containing signals received over communication channels that are extremely noisy and/or highly distorted: Speech Activity Detection (SAD), Language Identification (LID), Speaker Identification (SID) and Key Word Spotting (KWS).
BUT’s task in RATS is to work on robust techniques for SAD, LID and SID, especially using neural-network-based algorithms. We are part of the “RATS-Patrol” team coordinated by Raytheon BBN.
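The program description leaves the methods open; as an illustration of neural-network-based SAD, here is a minimal sketch of a frame-level speech/non-speech classifier over spliced acoustic features with simple majority smoothing. The architecture and context size are arbitrary choices, not the RATS system.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameSAD(nn.Module):
    """Minimal frame-level speech activity detector: MLP over spliced features."""
    def __init__(self, feat_dim=40, context=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim * (2 * context + 1), 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),  # one logit: speech vs. non-speech
        )

    def forward(self, spliced_feats):  # (num_frames, feat_dim * (2*context + 1))
        return torch.sigmoid(self.net(spliced_feats)).squeeze(-1)

def smooth(decisions, kernel=11):
    """Majority-vote (median) smoothing of binary per-frame decisions."""
    x = decisions.float().view(1, 1, -1)
    return (F.avg_pool1d(x, kernel, stride=1, padding=kernel // 2) > 0.5).view(-1)
```

Per-frame decisions on noisy channels are jittery, hence the smoothing step before segment boundaries are emitted.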
The Babel Program develops agile and robust speech recognition technology that can be rapidly applied to any human language in order to provide effective search capability for analysts to efficiently process massive amounts of real-world recorded speech. Today's transcription systems are built on technology that was originally developed for English, with markedly lower performance on non-English languages. These systems have often taken years to develop and cover only a small subset of the languages of the world. Babel intends to demonstrate the ability to generate a speech transcription system for any new language within one week to support keyword search performance for effective triage of massive amounts of speech recorded in challenging real-world situations.
BUT’s task in Babel is to develop algorithms and solutions for fast prototyping of recognizers in ever shorter times and with ever smaller amounts of training data (the “VLLP” condition provides only 3 hours). We are part of the “Babelon” team coordinated by Raytheon BBN.
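For reference, keyword search in Babel was scored with NIST's Term-Weighted Value; roughly (see the NIST evaluation plans for the authoritative definition):

```latex
\mathrm{TWV}(\theta) = 1 - P_{\mathrm{miss}}(\theta) - \beta \, P_{\mathrm{FA}}(\theta),
\qquad \beta = 999.9,
```

where $P_{\mathrm{miss}}$ and $P_{\mathrm{FA}}$ are the miss and false-alarm probabilities averaged over keywords at detection threshold $\theta$, and the Actual TWV (ATWV) is the TWV at the threshold the system actually chose.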
The IARPA Biometrics Exploitation Science & Technology (BEST) program sought to significantly advance the state of the science for biometric technologies. The overarching goals of the program were: (1) to significantly advance the Intelligence Community's (IC) ability to achieve high-confidence match performance, even when the features are derived from non-ideal data, and (2) to significantly relax the constraints currently required to acquire high-fidelity biometric signatures.
BUT was part of the PRISM team coordinated by the STAR laboratory of SRI International in Menlo Park, CA, USA. We worked on high-level features for speaker recognition (SRE). Notable achievements included advances in multinomial distributions for modeling discrete features in SRE and the definition of the PRISM data set.
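To give a flavor of this kind of model (one common formulation in the family, a subspace multinomial model, not necessarily the exact variant used in the project): counts of discrete events observed in a recording are modeled as multinomial, with category probabilities constrained to a low-dimensional subspace,

```latex
\phi_c(\mathbf{w}) =
  \frac{\exp\left(m_c + \mathbf{t}_c^{\top}\mathbf{w}\right)}
       {\sum_{c'} \exp\left(m_{c'} + \mathbf{t}_{c'}^{\top}\mathbf{w}\right)},
```

where $m_c$ is a global bias for category $c$, the vectors $\mathbf{t}_c$ span the subspace, and the low-dimensional vector $\mathbf{w}$, estimated per recording, serves as an input feature for the speaker-recognition backend.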
This project proposed to fill the gap of insufficient training data for language recognition (LRE) by using data acquired from public sources, namely radio broadcasts.
The project was financed by the U.S. Air Force European Office of Aerospace Research & Development (EOARD). This work helped NIST and LDC to generate data for the NIST 2009 language recognition evaluation. See the technical report for details.
In fast-developing big markets such as the Indian one, severe problems make the exploitation of speech difficult: a multitude of languages (some of them with limited or missing resources), highly noisy conditions (lots of business is simply done on the streets in Indian cities), and highly variable numbers of speakers in a conversation (from the usual two to whole families). These factors complicate the development of automatic speech recognition (ASR), speaker recognition (SR) and speaker diarization (determining who spoke when, SD). In the project, two established research institutes with significant track records in multi-lingual ASR, robust SR and SD, Brno University of Technology (BUT) and IIT Madras (IIT-M), team up with an important player on the Indian and global personal electronics markets, Samsung R&D Institute India-Bangalore (SRI-B), and work on significant advances in several speech technologies, notably in multi-lingual low-resource ASR.
Psychotherapy is an expert activity requiring continuous decision-making and continuous evaluation of the course of the psychotherapeutic process by the psychotherapist. In practice, however, psychotherapists suffer from a lack of immediate feedback to support these decisions. The project aims to create a tool for automated analysis of audio recordings of psychotherapeutic sessions that provides psychotherapists with timely feedback on how a session went. It builds on automatic speech recognition, natural language processing, machine learning, expert coding of the psychotherapeutic process, and self-assessment questionnaire methods. Its expected outcome is software providing psychotherapists with user-friendly and practically beneficial feedback, with the potential to improve psychotherapeutic care.
The project was proposed in cooperation between Brno University of Technology and Masaryk University and is funded by the Technology Agency of the Czech Republic within the ETA program. The PI of the project is Pavel Matejka.
The NEUREM3 project encompasses basic research in speech processing (SP) and natural language processing (NLP) with an accent on multi-linguality and multi-modality (speech and text processing with the support of visual information). Current deep machine learning methods are based on continuous vector representations that are created by the neural networks (NNs) themselves during training. Although the results of such NNs are often empirically excellent, our knowledge and understanding of these representations is insufficient. NEUREM3 has the ambition to fill this gap and to study neural representations of speech and text units of different scopes (from phonemes and letters to whole spoken and written documents), acquired both in isolated tasks and in multi-task setups. NEUREM3 will also improve NN architectures and training techniques so that they can be trained on incomplete or incoherent data.
The project is supported by the program "excellence in basic research" (EXPRO) of the Czech Science Foundation (GACR). We are working with partners from Charles University in Prague. The PI is Lukas Burget.
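As a toy illustration of representations of different scopes, the sketch below takes per-frame (or per-token) representations from an encoder and pools them into a single whole-document representation; the GRU encoder here is an arbitrary stand-in for any trained speech or text encoder.

```python
import torch
import torch.nn as nn

# Arbitrary stand-in encoder; in practice this would be a trained speech/text model.
encoder = nn.GRU(input_size=40, hidden_size=128, batch_first=True)

feats = torch.randn(1, 500, 40)  # one document: 500 frames of 40-dim features
unit_reps, _ = encoder(feats)    # (1, 500, 128): frame/token-scope representations
doc_rep = unit_reps.mean(dim=1)  # (1, 128): document-scope representation (mean pooling)
```

Studying how such vectors organize, e.g. by probing them with simple classifiers, is one way to approach the interpretability questions raised above.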
The project focuses on research and development of artificial intelligence technologies for automated reception and processing of emergency calls in the environment of the integrated rescue system by means of a voice chatbot (HCHB).
BUT is a member of the project consortium. We are responsible for R&D in speech data mining and for data processing; the BUT PI is Ondrej Glembek.
Speech data mining is becoming indispensable for units fighting criminality and terrorism. Current versions of the technology allow for successful deployment on data acquired from close-talk microphones. The goal of DRAPAK is to improve speech data mining from distant microphones in real environments and to generate relevant information in the corresponding operational scenarios. The output is a set of software tools to be tested by the Police of the Czech Republic and other state agencies.
DRAPAK is supported by the Ministry of Interior of the Czech Republic and is coordinated by BUT, which is responsible for the core speech and signal processing R&D. The project is tightly linked to the 2015 Frederick Jelinek workshop group “Far-Field Enhancement and Recognition in Mismatched Settings”. Our partner in the project is Phonexia, responsible for industrial R&D and for relations with security-oriented customers.
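The project pages do not prescribe particular far-field techniques, but a classic first step for distant-microphone processing is beamforming; below is a minimal delay-and-sum sketch, assuming the per-channel steering delays are known or estimated elsewhere (e.g. by cross-correlation).

```python
import numpy as np

def delay_and_sum(channels, delays, fs):
    """Minimal delay-and-sum beamformer.

    channels: list of 1-D numpy arrays, one waveform per microphone
    delays:   per-channel steering delays in seconds (estimated elsewhere)
    fs:       sampling rate in Hz
    """
    n = min(len(ch) for ch in channels)
    out = np.zeros(n)
    for ch, d in zip(channels, delays):
        shift = int(round(d * fs))      # integer-sample approximation of the delay
        out += np.roll(ch[:n], -shift)  # align channel (np.roll wraps at edges;
                                        # a real implementation would zero-pad)
    return out / len(channels)          # average the aligned channels
```

Aligning and averaging the channels reinforces the signal from the steered direction while averaging down uncorrelated noise and reverberation.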
The goals of this project are (1) to improve existing and design new neural network techniques for speech signal processing and speech data mining, mainly in the fields of remote sensing (microphone arrays), training on limited real data, language modeling, speaker recognition and detection of out-of-vocabulary words (OOVs), and (2) to prepare the research results for industrial adoption in the form of functioning software, consultations with the industrial partner and intensive transfer of know-how.
BUT is the prime contractor of this project, with Phonexia as the industrial partner. The project is sponsored by the Technology Agency of the Czech Republic under the "Zeta" program. As this program accentuates gender equality, the research team is composed in large part of female researchers and developers from the BUT Speech@FIT group and Phonexia.
The goal of this project is R&D in the field of meeting audio processing (including meetings, team briefings, customer relations, etc.) leading to the creation of a prototype intelligent meeting assistant that helps during a meeting (on-line), with the processing of meeting minutes (off-line), and with the subsequent storage and sharing of meeting-related materials.
BUT coordinates this project and is responsible for the core speech data mining R&D. We have partnered with Phonexia (prototype integration, production aspects of speech data mining, speech I/O), Lingea (terminology, natural language processing and translation) and Tovek (data mining from heterogeneous resources, use cases).
The past few years have witnessed substantial progress in the theory and algorithms of speaker recognition (SRE). ZAOM aimed at adapting SRE algorithms to the specific needs of police and intelligence services, in order to (1) provide precise but easy-to-understand visualization, so that responsible personnel obtain the timely information needed to cope with threats and to speed up investigations, and (2) make it possible to adapt systems to target user data and substantially improve their performance.
ZAOM was supported by the Ministry of Interior of the Czech Republic and was coordinated by BUT, which was responsible for the core speech and signal processing R&D. Phonexia was our industrial partner, responsible for the development part and for interaction with security-oriented customers. An important output of the project is our proposal of the Voice Biometry Standard.
The project aimed at the development of advanced speech recognition techniques and their deployment in functional applications: search in electronic dictionaries on mobile devices, dictation of translations, defense and security, dialogue systems, client-care systems (CRM, helpdesk, etc.) and audio-visual access to teaching materials.
BUT coordinated this project and partnered with Phonexia (security and defense applications), Lingea (electronic dictionaries) and Optimsys (interactive voice response (IVR) systems). The main output of BUT is the lecture browsing system now available at prednasky.com and superlectures.com.
The project aimed at bringing speech data mining technologies to the Czech national security community.
The project was supported by the Ministry of Interior of the Czech Republic. BUT was a member of a consortium that also included the University of West Bohemia in Pilsen and the Technical University of Liberec. In addition to advances in language recognition, speaker recognition and speech transcription, the project produced a very valuable Czech spontaneous speech database that still serves R&D in Czech ASR. It also started the tradition of annual meetings of Czech speech researchers with members of the national security community.
The project aimed at research, development and assessment of technologies for prototyping of speech recognition and search systems with only a few hours of transcribed training data, without the need for phonetic or linguistic expertise. These technologies were tested in the domain of electronic dictionaries.
The project was supported by the Ministry of Industry and Trade of the Czech Republic under the “TIP” program. It was coordinated by Lingea; BUT was responsible for the development of training paradigms requiring small amounts of training data. The project contributed to the definition of Subspace Gaussian Mixture Models (SGMMs) and allowed us to jump-start the work under IARPA Babel.
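For context on SGMMs: in Povey et al.'s formulation, all HMM states share a pool of $I$ full-covariance Gaussians, and each state $j$ is described mainly by a low-dimensional vector $\mathbf{v}_j$ (a rough outline, omitting sub-states and speaker subspaces):

```latex
p(\mathbf{x} \mid j) = \sum_{i=1}^{I} w_{ji}\,
  \mathcal{N}\!\left(\mathbf{x};\, \mathbf{M}_i \mathbf{v}_j,\, \boldsymbol{\Sigma}_i\right),
\qquad
w_{ji} = \frac{\exp(\mathbf{w}_i^{\top}\mathbf{v}_j)}
              {\sum_{i'=1}^{I}\exp(\mathbf{w}_{i'}^{\top}\mathbf{v}_j)},
```

where the projections $\mathbf{M}_i$, weight vectors $\mathbf{w}_i$ and covariances $\boldsymbol{\Sigma}_i$ are shared across all states; only the small $\mathbf{v}_j$ is state-specific, which is why SGMMs can be trained on very little transcribed data.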