Ajay Divakaran

Senior Technical Director, Vision and Learning Laboratory, Center for Vision Technologies

Ajay Divakaran, Ph.D., is technical director of the Vision and Learning Laboratory in SRI International’s Center for Vision Technologies. In this role, he is responsible for the proposal and execution of contract research projects in computer vision as well as multi-sensor systems that combine various modalities.

Divakaran is currently the principal investigator for a number of SRI research projects. His work includes social multimedia (video-audio-text) analytics, multimodal modeling and analysis of affective, cognitive, and physiological aspects of human behavior, interactive virtual reality-based training, applied machine learning, tracking of individuals in dense crowds and multi-camera tracking, and audio analysis for event detection in open-source video. He has developed several innovative technologies for multimodal systems in both commercial and government programs during the course of his career.

Prior to joining SRI in 2008, Divakaran worked at Mitsubishi Electric Research Labs for 10 years, where he was the lead inventor of the world’s first sports highlights playback-enabled DVR. He also oversaw a wide variety of product applications for machine learning.

Divakaran was named a Fellow of the IEEE in 2011 for his contributions to multimedia content analysis. He developed techniques for recognizing agitated speech as part of his work on automatic highlights extraction from broadcast sports video. As lead inventor of the MPEG-7 video standard motion activity descriptor, he established a sound experimental and theoretical framework for human perception of action in video sequences. He serves on the Technical Program Committees of key multimedia conferences, and served as an associate editor of IEEE Transactions on Multimedia from 2007 to 2010. He has authored two books and has more than 100 publications to his credit, as well as more than 50 issued patents.

Divakaran received his M.S. and Ph.D. degrees in electrical engineering from Rensselaer Polytechnic Institute. His B.E. in electronics and communication engineering is from the University of Jodhpur in India.

Ajay Divakaran talks about big data, social media influence and robotic navigation on The Dish TV

Recent publications

Towards Understanding Confusion and Affective States Under Communication Failures in Voice-Based Human-Machine Interaction

We present a series of two studies conducted to understand users’ affective states during voice-based human-machine interactions.
Broadening AI Ethics Narratives: An Indic Arts View

We investigate uncovering the unique socio-cultural perspectives embedded in human-made art, which, in turn, can be valuable in expanding the horizon of AI ethics.
Model-Free Generative Replay For Lifelong Reinforcement Learning: Application To Starcraft-2

We evaluate our proposed algorithms on three different scenarios comprising tasks from the Starcraft 2 and Minigrid domains.
Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models

Error maps can indicate when a correctly attended region may be processed incorrectly, leading to an incorrect answer, and hence improve users’ understanding of those cases.
Challenges in Procedural Multimodal Machine Comprehension: A Novel Way to Benchmark

We identify three critical biases stemming from the question-answer generation process and memorization capabilities of large deep models.
Comprehension Based Question Answering Using Bloom’s Taxonomy

Our experiments focus on zero-shot question answering, using the taxonomy to provide proximal context that is relevant to each question and helps the model answer it.
