
Joint Modeling of Text and Acoustic-Prosodic Cues for Neural Parsing

Abstract · Apr 24, 2017 15:33

cues johnson disfluencies parsing speech prosodic constituent acoustic cs-cl cs-lg cs-sd

arXiv Abstract

  • Trang Tran
  • Shubham Toshniwal
  • Mohit Bansal
  • Kevin Gimpel
  • Karen Livescu
  • Mari Ostendorf

In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing a spoken utterance, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and word-based prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together improve parse F1 scores significantly over a strong text-only baseline. For this study with known sentence boundaries, error analysis shows that the main benefit of acoustic-prosodic features is in sentences with disfluencies and that attachment errors are most improved.
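The abstract sketches the overall shape of the model: a CNN over frame-level energy and pitch trajectories, whose output joins word embeddings and word-level prosodic features in an attention-based recurrent encoder. Below is a minimal, hypothetical PyTorch sketch of that general idea, not the authors' implementation; all module names, dimensions, pooling, and the particular attention form are assumptions for illustration only.

```python
# Hypothetical sketch (not the paper's code): CNN over per-word energy/pitch
# frames + attention-weighted BiLSTM over word embeddings and word-level
# prosodic features. Dimensions and layer choices are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProsodyCNN(nn.Module):
    """1-D convolution over per-word energy/pitch frames -> fixed-size vector."""
    def __init__(self, n_feats=2, n_filters=32, kernel_size=5):
        super().__init__()
        self.conv = nn.Conv1d(n_feats, n_filters, kernel_size, padding=kernel_size // 2)

    def forward(self, frames):            # frames: (words, n_feats, n_frames)
        h = F.relu(self.conv(frames))     # (words, n_filters, n_frames)
        return h.max(dim=2).values        # max-pool over time -> (words, n_filters)


class SpeechParserEncoder(nn.Module):
    """Encode an utterance from text + prosody; attention yields a context vector."""
    def __init__(self, vocab_size, emb_dim=100, word_pros_dim=4,
                 cnn_filters=32, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.prosody_cnn = ProsodyCNN(n_filters=cnn_filters)
        self.rnn = nn.LSTM(emb_dim + word_pros_dim + cnn_filters,
                           hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)

    def forward(self, word_ids, word_prosody, frame_prosody):
        # word_ids: (1, words); word_prosody: (1, words, word_pros_dim)
        # frame_prosody: (words, 2, n_frames) of energy + pitch per word
        cnn_vecs = self.prosody_cnn(frame_prosody).unsqueeze(0)   # (1, words, cnn_filters)
        x = torch.cat([self.embed(word_ids), word_prosody, cnn_vecs], dim=-1)
        states, _ = self.rnn(x)                                   # (1, words, 2*hidden)
        weights = F.softmax(self.attn(states), dim=1)             # attention over words
        context = (weights * states).sum(dim=1)                   # (1, 2*hidden)
        return states, context           # would feed a constituency-parsing decoder


# Toy usage with random tensors for a 6-word utterance.
enc = SpeechParserEncoder(vocab_size=1000)
words = torch.randint(0, 1000, (1, 6))
word_pros = torch.randn(1, 6, 4)          # e.g. pause/duration features per word
frames = torch.randn(6, 2, 50)            # 50 frames of energy/pitch per word
states, context = enc(words, word_pros, frames)
print(states.shape, context.shape)        # torch.Size([1, 6, 256]) torch.Size([1, 256])
```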

Read the paper (pdf) »