Abstract

In this paper, we address the temporal moment localization problem, namely, localizing a video moment described by a natural language query in an untrimmed video. This is a general yet challenging vision-language task, since it requires not only the localization of moments but also the multimodal comprehension of textual-temporal information (e.g., "first" and "leaving") that helps to distinguish the desired moment from the others, especially those with similar visual content. While existing studies treat the given language query as a single unit, we propose to decompose it into two components: the cues relevant to localizing the desired moment and the cues irrelevant to it. This allows us to flexibly adapt to arbitrary queries in an end-to-end framework. In our proposed model, a language-temporal attention network learns the word attention based on the temporal context information in the video. Our model can therefore automatically select "what words to listen to" for localizing the desired moment. We evaluate the proposed model on two public benchmark datasets: DiDeMo and Charades-STA. The experimental results verify its superiority over several state-of-the-art methods.
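To make the idea of selecting "what words to listen to" more concrete, the following is a minimal sketch of word-level attention conditioned on a temporal context vector. It is not the released ROLE implementation: the additive scoring form, the function names (`word_attention`, `matvec`, `rand_matrix`), the feature sizes, and the randomly initialised parameters are all assumptions made for illustration only.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def word_attention(word_feats, context, w_word, w_ctx, v):
    """Weight each query word by its relevance to a temporal context vector.

    Assumed additive scoring: score_t = v . tanh(w_word @ word_t + w_ctx @ context).
    The weights are a softmax over the scores, and the attended query is the
    weighted sum of the word features.
    """
    ctx_proj = matvec(w_ctx, context)
    scores = []
    for word in word_feats:
        word_proj = matvec(w_word, word)
        hidden = [math.tanh(a + b) for a, b in zip(word_proj, ctx_proj)]
        scores.append(sum(h * x for h, x in zip(hidden, v)))
    weights = softmax(scores)
    dim = len(word_feats[0])
    attended = [
        sum(w * word[i] for w, word in zip(weights, word_feats))
        for i in range(dim)
    ]
    return weights, attended


# Toy inputs with hypothetical sizes; nothing here is a trained parameter.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


query = ["the", "girl", "first", "leaves", "the", "room"]
word_feats = rand_matrix(len(query), 4)  # one 4-d embedding per word
context = rand_matrix(1, 3)[0]           # 3-d feature of a moment and its neighbours
w_word, w_ctx = rand_matrix(5, 4), rand_matrix(5, 3)
v = rand_matrix(1, 5)[0]

weights, attended = word_attention(word_feats, context, w_word, w_ctx, v)
for token, weight in zip(query, weights):
    print(f"{token:>7s}  {weight:.3f}")
```

With trained parameters, one would expect words such as "first" or "leaves" to receive larger weights when the context makes them discriminative; with the random values above, the printed weights only demonstrate the mechanics.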

Pipeline

Code & Data

  • Code:

1. MCN:

2. CTRL:

3. ROLE:

  • Dataset:

1. DiDeMo:

2. Charades-STA:

Copyright (C) 2018 Shandong University

 

This program is licensed under the GNU General Public License 3.0 (https://www.gnu.org/licenses/gpl-3.0.html). Any derivative work obtained under this license must be licensed under the GNU General Public License as published by the Free Software Foundation, either Version 3 of the License, or (at your option) any later version, if this derivative work is distributed to a third party.

 

The copyright for the program is owned by Shandong University. For commercial projects that require the ability to distribute the code of this program as part of a program that cannot be distributed under the GNU General Public License, please contact <mengliu.sdu@gmail.com> to purchase a commercial license.
