arxivst stuff from arxiv that you should probably bookmark

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

Abstract · Apr 11, 2017 18:00 ·

similarity fire text image entities flickr30 phrases phrase pit cs-cv

Arxiv Abstract

  • Liwei Wang
  • Yin Li
  • Svetlana Lazebnik

This paper investigates two-branch neural networks for image-text matching tasks. We propose two different network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. The second one, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our two-branch networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.

Read the paper (pdf) »