arxivst stuff from arxiv that you should probably bookmark

Automatic Evaluation of Generated Summaries

Post · Apr 25, 2017 17:04

nlp duc cs-cl cs-ai

Looking for an automated way to assess the quality of your generated summaries? The authors of this paper generate a bunch of questions from a source text, and then pose those same questions to the generated text, treating it as a small knowledge base. If the generated text answers the questions the source answers, it can be said to be a good representation of the source text. Seems reasonable to me. I might try it for this site.
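To make the idea concrete, here is a toy sketch of how I picture the scoring. It is not the authors' implementation: the questions are hand-written, the function names (`answers`, `coverage`) are my own, and a plain substring match stands in for a real question answering system.

```python
def answers(passage: str, expected: str) -> bool:
    """Naive stand-in for a QA system (an assumption, not the paper's method):
    the passage 'answers' a question if the expected answer string
    occurs in it, ignoring case."""
    return expected.lower() in passage.lower()


def coverage(source: str, summary: str, questions):
    """Return (score, missed): the fraction of source-answerable questions
    that the summary also answers, plus the questions it misses."""
    # Only questions the source itself can answer count as content points.
    answerable = [(q, a) for q, a in questions if answers(source, a)]
    if not answerable:
        return 0.0, []
    missed = [q for q, a in answerable if not answers(summary, a)]
    score = (len(answerable) - len(missed)) / len(answerable)
    return score, missed


# Toy inputs; the questions are written by hand for illustration.
source = (
    "The Apollo 11 mission landed on the Moon in 1969. "
    "Neil Armstrong was the first person to walk on the surface. "
    "The crew returned to Earth after eight days."
)
summary = "Apollo 11 landed on the Moon in 1969, and Neil Armstrong walked on it first."
questions = [
    ("When did Apollo 11 land?", "1969"),
    ("Who walked on the Moon first?", "Neil Armstrong"),
    ("How long was the mission?", "eight days"),
    ("Where did it land?", "the Moon"),
]

score, missed = coverage(source, summary, questions)
print(score)   # 0.75
print(missed)  # ['How long was the mission?']
```

The summary drops the mission length, so it answers three of the four questions. A real setup would need automatic question generation and a proper QA model in place of the substring check.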

Highlights From the Paper

  • Requires minimum manual efforts.
  • Clearly shows how a measure is calculated.
  • Pinpoints exactly the content differences of two text passages.
  • In general our scores and human scores correlate very well.


Arxiv Abstract

  • Ping Chen
  • Fei Wu
  • Tong Wang

Many Natural Language Processing and Computational Linguistics applications involve the generation of new texts based on some existing texts, such as summarization, text simplification and machine translation. However, there has been a serious problem haunting these applications for decades, that is, how to automatically and accurately assess quality of these applications. In this paper, we will present some preliminary results on one especially useful and challenging problem in NLP system evaluation: how to pinpoint content differences of two text passages (especially for large passages such as articles and books). Our idea is intuitive and very different from existing approaches. We treat one text passage as a small knowledge base, and ask it a large number of questions to exhaustively identify all content points in it. By comparing the correctly answered questions from two text passages, we will be able to compare their content precisely. The experiment using 2007 DUC summarization corpus clearly shows promising results.
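The last step of the abstract, comparing the sets of correctly answered questions, can be pictured with plain set operations. This is my own rough sketch under loud assumptions: the passages and questions are made up, `answered` and `content_diff` are hypothetical names, and substring matching again stands in for an actual QA system.

```python
def answered(passage: str, questions) -> set:
    """Questions whose expected answer string occurs in the passage.
    Substring matching is a placeholder for a real QA system."""
    text = passage.lower()
    return {q for q, a in questions if a.lower() in text}


def content_diff(passage_a: str, passage_b: str, questions) -> dict:
    """Split the questions into content shared by both passages
    and content found in only one of them."""
    a = answered(passage_a, questions)
    b = answered(passage_b, questions)
    return {"both": a & b, "only_a": a - b, "only_b": b - a}


# Made-up passages and hand-written questions, for illustration only.
passage_a = "The bridge opened in 1932 and spans the harbour. It carries rail and road traffic."
passage_b = "The bridge spans the harbour and was repainted in 2010."
questions = [
    ("When did the bridge open?", "1932"),
    ("What does the bridge span?", "the harbour"),
    ("What does it carry?", "rail and road traffic"),
    ("When was it repainted?", "2010"),
]

diff = content_diff(passage_a, passage_b, questions)
for label, qs in diff.items():
    print(label, sorted(qs))
```

Each question in `only_a` or `only_b` marks one content point that the other passage lacks, which is the kind of pinpointing the abstract describes.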

Read the paper (pdf) »