|Florian Metze, CMU, USA|
|Monday, June 24, 2013, 2:15 p.m., TU Hochhaus, 20th floor, Auditorium 1|
|HOST: Tim Polzehl|
Given the deluge of multimedia content becoming available over the Internet, it is increasingly important to be able to effectively examine and organize these large stores of information in ways that go beyond browsing or collaborative filtering. In this talk I present our Topic-Oriented Multimedia Summarization (TOMS) system, which uses natural language generation: given a set of automatically extracted features from a video, the system generates a paragraph of natural language that summarizes the important information in a video belonging to a certain topic and, for example, explains why a video was matched and retrieved. Possible features include visual semantic concepts, objects, and actions; environmental sounds; and transcripts from automatic speech recognition (ASR).
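To make the feature-to-text step concrete, here is a minimal, hypothetical sketch of template-based generation from extracted video features. The feature names, the template scheme, and the `generate_summary` function are illustrative assumptions for this announcement, not the actual TOMS implementation described in the talk.

```python
# Hypothetical sketch: turn automatically extracted video features into
# one natural-language summary paragraph via simple templates.
# Feature categories (objects, actions, sounds, ASR keywords) mirror the
# feature types mentioned in the abstract; everything else is assumed.

def generate_summary(topic, features):
    """Render a dict of extracted features as a short summary paragraph."""
    sentences = [f"This video appears to be about {topic}."]
    if features.get("objects"):
        sentences.append("It shows " + ", ".join(features["objects"]) + ".")
    if features.get("actions"):
        sentences.append("People can be seen " + " and ".join(features["actions"]) + ".")
    if features.get("sounds"):
        sentences.append("Audible sounds include " + ", ".join(features["sounds"]) + ".")
    if features.get("asr_keywords"):
        sentences.append("Spoken keywords from ASR include " + ", ".join(features["asr_keywords"]) + ".")
    return " ".join(sentences)

# Example input for one video (invented values):
features = {
    "objects": ["a bicycle", "a repair stand"],
    "actions": ["fixing a tire"],
    "sounds": ["clanking tools"],
    "asr_keywords": ["tube", "pump"],
}
print(generate_summary("repairing a bicycle", features))
```

A real system would of course rank and filter noisy detector outputs and use a richer generation model; the sketch only shows how heterogeneous feature streams can be merged into a single topic-oriented paragraph.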
We see this as a first step towards systems that can discriminate between visually similar but semantically different videos, compare two videos and provide textual output, or summarize a large number of videos at once. I also propose possible experimental designs for continuously evaluating and improving TOMS systems, and present results of a pilot evaluation of our initial system.