The Open Language Interface for Voice Exploitation (OLIVE) speech processing system provides robust speech information extraction amid high levels of noise and distortion in real-world data.
AI algorithms underlying OLIVE enable the technology to:
- Detect the presence of speech, not just an open channel (speech activity detection);
- Find and/or track speakers of interest (speaker identification);
- Detect languages and dialects from a set of languages of interest (language and dialect identification); and
- Detect specific keywords and phrases (keyword spotting).
Graphical user interfaces in OLIVE enable close editing of audio files, enrollment of new speakers, scoring of segments, speech activity segmentation, and semi-supervised speaker diarization (separation and labeling of the individual speakers in a recording based on voice qualities).
Initially developed under the DARPA Robust Automatic Transcription of Speech (RATS) program, OLIVE is designed for easy integration into end-user applications. The technology is under continuous development and refinement based on user feedback.
Capabilities
- Automatic detection of speech, speaker, keywords, and languages of interest from live streaming input or file-based audio
- Automatic speaker segmentation of audio, labeling where each person speaks
- Functions with high accuracy in tactical communications with high noise and across multiple channels
Key technologies
Speech Activity Detection (SAD)
- Accurate on noisy operational audio
- Detect speech, not just an open channel
- Process hundreds of channels on low-powered hardware
Language Identification (LID)
- Detect languages/dialects
- Add new languages using collected audio
Speaker Identification (SID)
- Find/track speakers of interest across time and channels
- Add new speakers offline or live, with as little as 8 seconds of speech
Query by Example (QBE)
- Language-agnostic keyword spotting
- Enroll keywords/phrases offline or live with as little as a single example
Other technologies
Keyword Spotting (KWS)
Word detection in Spanish, Mandarin, Iraqi Arabic
Acoustic Event Detection (AED)
Detection of non-speech acoustic events including whistling, barking, vehicles, gunshots
Speaker Diarization (DIA)
Separation and labeling of unknown speakers in multi-speaker conversations
Forensic Speaker Identification
Close analysis speaker identification for forensic use
OLIVE GUIs
Batch processing/data triage
- Automatically discard files with no speech
- Find files most likely to contain target language, speaker or keywords
- Processing speed scales with available CPUs
Live streaming
- Live monitoring of incoming audio streams
- Save, search and review past audio
- On-the-fly enrollment of new speakers
- Up to 16 channels running SAD, SID, LID
Close waveform analysis
- Forensic analysis of speech
- Simple but powerful GUI for selecting and reviewing audio segments
- Run any plugin on any selected segments