The 2012 SESAME Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) Systems

Citation

Murat Akbacak, Robert C. Bolles, Brian J. Burns, Mark Eliot, Aaron Heller, James A. Herson, Gregory K. Myers, Ramesh Nallapati, Stephanie Pancoast, Julien van Hout, Eric Yeh, Amirhossein Habibian, Dennis C. Koelma, Zhenyang Li, Masoud Mazloom, Silvia-Laura Pintea, Koen E. A. van de Sande, Arnold W. M. Smeulders, Cees G. M. Snoek, Sung Chun Lee, Ram Nevatia, Pramod Sharma, Chen Sun, and Remi Trichet. The 2012 SESAME Multimedia Event Detection (MED) and Multimedia Event Recounting (MER) Systems. Proceedings of the 10th TRECVID Workshop, 2012.

Abstract

The SESAME team submitted four runs for the MED12 pre-specified events, two runs for the ad hoc events, and a run for multimedia event recounting. The detection runs included combinations of low-level visual, motion, and audio features; high-level semantic visual concepts; and text-based modalities (automatic speech recognition [ASR] and video optical character recognition [OCR]). The individual types of features and concepts produced a total of 14 event classifiers. We combined the event detection results for these classifiers using three fusion methods, two of which relied on the particular set of detection scores that were available for each video clip. In addition, we applied three methods for selecting the detection threshold. Performance on the ad hoc events was comparable to that for the pre-specified events. Low-level visual features were the strongest performers across all training conditions and events. However, detectors based on visual concepts and on low-level, motion-based features were very competitive in performance.
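The abstract describes fusing the detection scores of 14 classifiers, where some fusion methods operate only on the scores actually available for a given clip. The sketch below is not the authors' implementation; it merely illustrates one simple form of such score-level late fusion (an unweighted mean over available modalities, followed by thresholding), with all function and modality names chosen for illustration.

```python
# Minimal sketch of score-level late fusion over whichever classifier
# scores are available for each video clip (illustrative, not the
# SESAME system's actual fusion method).
from typing import Dict, Optional


def fuse_scores(clip_scores: Dict[str, Optional[float]]) -> float:
    """Average the detection scores that are present for a clip.

    clip_scores maps a classifier name (e.g. 'visual', 'motion', 'asr')
    to its detection score, or None when that modality produced no score.
    """
    available = [s for s in clip_scores.values() if s is not None]
    if not available:
        return 0.0  # no evidence at all: lowest confidence
    return sum(available) / len(available)


def detect(clip_scores: Dict[str, Optional[float]], threshold: float = 0.5) -> bool:
    """Declare the event present when the fused score clears a threshold."""
    return fuse_scores(clip_scores) >= threshold
```

In practice a fusion method would typically weight modalities by their validation performance rather than average them uniformly; the point here is only that fusion must cope with clips for which some classifiers produce no score.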
