Multi-modal data analytics


SRI develops multi-modal data analytics solutions using machine learning for a variety of applications. These include image and video search, activity recognition, fine-grained recognition, social media analytics and RF signal exploitation.


SRI has developed the Computer Vision AI Search Tool (CVAST) for rapidly building searchable image databases and user-annotated AI training databases. CVAST supports image ingestion, rapid object clustering and annotation of common scene object features. A flexible image/attribute database allows users to search for related features within a vast collection of image sets.

Under the DARPA Visual Media Reasoning (VMR) program, SRI created visual exploitation and indexing tools to rapidly extract mission-relevant visual intelligence from large quantities of diverse, ill-defined, unstructured imagery captured from multiple adversary sources. 

Activity recognition

Under the IARPA Automated Low-Level Analysis and Description of Diverse Intelligence Video (ALADDIN) and Deep Intermodal Video Analytics (DIVA) (TARDIS-V) programs, SRI's Center for Vision Technologies (CVT) has developed large-scale video content-retrieval systems. These systems index content using machine learning, automatically detecting concepts such as objects, events, actions and locations.
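The idea of indexing video by automatically detected concepts can be illustrated with a small inverted index. This is a minimal sketch under stated assumptions, not a description of the ALADDIN or DIVA systems: the function names `build_concept_index` and `search`, the `min_score` threshold, and the input format (clip IDs mapped to concept/confidence pairs from some upstream detector) are all hypothetical.

```python
from collections import defaultdict


def build_concept_index(detections, min_score=0.5):
    """Map each concept label to the clips in which it was detected.

    detections: dict of clip_id -> list of (concept, confidence) pairs,
    assumed to come from an upstream detector (hypothetical format).
    """
    index = defaultdict(dict)
    for clip_id, concepts in detections.items():
        for concept, score in concepts:
            if score >= min_score:
                # Keep the highest confidence seen for this concept in this clip.
                previous = index[concept].get(clip_id, 0.0)
                index[concept][clip_id] = max(score, previous)
    return index


def search(index, query_concepts):
    """Return clips containing every query concept, ranked by summed confidence."""
    if not query_concepts:
        return []
    postings = [index.get(concept, {}) for concept in query_concepts]
    common = set(postings[0])
    for posting in postings[1:]:
        common &= set(posting)
    scored = [(sum(p[clip] for p in postings), clip) for clip in common]
    # Highest score first; ties broken by clip ID for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [clip for _, clip in scored]


detections = {
    "clip_a": [("person", 0.9), ("pushup", 0.8)],
    "clip_b": [("person", 0.7), ("car", 0.95)],
    "clip_c": [("person", 0.6), ("pushup", 0.4)],
}
index = build_concept_index(detections)
print(search(index, ["person", "pushup"]))
```

A production system would likely use approximate scoring, temporal localization and far larger vocabularies; the sketch only shows how detected concepts can turn raw video into something queryable.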

Video recognition software for activities (pushup by subject)

Fine-grained recognition

SRI has developed state-of-the-art algorithms and systems for fine-grained classification of objects such as vehicles, natural objects and other classes.

Fine-grained recognition is a challenging task because small, localized differences between similar-looking objects determine the specific fine-grained label. At the same time, accurate recognition needs to discount spurious changes in appearance caused by occlusions, partial views and proximity to clutter objects in the scene.

SRI has developed a novel multi-task deep network architecture that jointly optimizes both localization of parts and fine-grained recognition by learning from training data. We have also developed memory- and computation-efficient algorithms for fine-grained recognition that can be easily embedded in mobile applications.
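One common way to optimize two tasks jointly is to minimize a weighted sum of their losses. The sketch below assumes that formulation purely for illustration; the source does not state which objective SRI's network uses. The function names, the squared-distance localization term and the `loc_weight` parameter are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Classification loss for one example: -log softmax(logits)[label]."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def part_localization_loss(pred_parts, true_parts):
    """Mean squared distance between predicted and annotated part centers."""
    total = 0.0
    for (px, py), (tx, ty) in zip(pred_parts, true_parts):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(true_parts)


def joint_loss(logits, label, pred_parts, true_parts, loc_weight=1.0):
    """Weighted sum of the recognition and part-localization losses (assumed form)."""
    recognition = softmax_cross_entropy(logits, label)
    localization = part_localization_loss(pred_parts, true_parts)
    return recognition + loc_weight * localization


# Toy example: two classes, one part whose predicted center is off by one unit.
loss = joint_loss(
    logits=[0.0, 0.0],
    label=0,
    pred_parts=[(0.0, 0.0)],
    true_parts=[(1.0, 0.0)],
    loc_weight=0.5,
)
print(loss)
```

Because both terms feed one objective, a shared network trained on it gets gradient signal from part locations and class labels at once, which is the general intuition behind joint optimization.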

Fine grain computer vision recognition

Social media analytics 

Under the DARPA Social Media in Strategic Communication (SMISC), Computational Simulation of Online Social Behavior (SocSim) (SBIR M3I system), ONR CEROSS and AFRL Multimedia-Enhanced Social Media Analytics (MESA) programs, CVT has developed social media content analytics for seamless multi-way, cross-platform retrieval between images, videos, text and users. The approach embeds users and content multimodally in the same geometric space. CVT has also developed a system that can detect the intent behind social media postings. Our work provides a framework for tracking the propagation of influence in social media.
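When users and content share one embedding space, cross-modal retrieval reduces to nearest-neighbor search in that space. The following is a minimal sketch assuming cosine similarity as the ranking measure and hand-written two-dimensional vectors standing in for learned embeddings; the `retrieve` function, the item tuple format and the sample IDs are hypothetical and not taken from the programs named above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(query_vec, items, k=3, modality=None):
    """Return the IDs of the k items closest to the query.

    items: list of (item_id, modality, vector) tuples living in one shared space.
    modality: optional filter, e.g. "image", "video", "text" or "user".
    """
    candidates = [
        (cosine(query_vec, vec), item_id)
        for item_id, item_modality, vec in items
        if modality is None or item_modality == modality
    ]
    # Most similar first; ties broken by ID for a stable order.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in candidates[:k]]


# Toy embeddings (illustrative values only).
items = [
    ("img_1", "image", [1.0, 0.0]),
    ("vid_1", "video", [0.9, 0.1]),
    ("txt_1", "text", [0.0, 1.0]),
    ("user_1", "user", [0.7, 0.7]),
]
print(retrieve([1.0, 0.0], items, k=2))
```

The same call serves any direction of retrieval (text to image, image to user, and so on), since the query is just a vector and the modality filter only restricts which items are ranked.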

Infographic: data, technologies, and use cases

