New State of the Art on Keyphrase Boundary Classification

Post · Apr 4, 2017 18:41

Keyphrase extraction and classification have uses ranging from chatbots to scientific analysis. We were looking into them just the other day as a way to automate tag generation. This latest paper sets a new state of the art on the SemEval-2017 Task 10 and ACL RD-TEC 2.0 datasets.

Highlights From the Paper

  • “When abundant labelled data is available for an auxiliary task, but little data for the target task, multi-task learning can act as a form of semi-supervised learning combined with a distant supervision signal.”
  • “Each task is associated with an independent classification function, but all tasks share the hidden layers. Note that for our experiments, we only consider one auxiliary task at a time.”
  • On the SemEval-2017 Task 10 dataset, the approach beats the state of the art with error reductions of up to 9.64%, mostly due to better identification and labelling of long keyphrases.
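
To make the shared-layer idea from the second highlight concrete, here is a minimal sketch in plain Python. It is not the authors' code: the paper uses deep recurrent networks, while this toy swaps in a single feed-forward tanh layer so it runs without dependencies. The class name `MultiTaskTagger`, the task names, the label sets and the feature vectors are all made up for illustration. What it shows is the structure: one hidden layer that every task updates, and one independent softmax classifier per task.

```python
import math
import random


class MultiTaskTagger:
    """Toy model: one shared tanh hidden layer, one softmax output layer per task.

    Assumption: a feed-forward layer stands in for the paper's recurrent network.
    """

    def __init__(self, input_dim, hidden_dim, task_labels, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.task_labels = task_labels
        # Shared weights: updated by training steps from every task.
        self.shared = init(hidden_dim, input_dim)
        # Task-specific weights: each head is updated only by its own task.
        self.heads = {}
        for task, labels in task_labels.items():
            self.heads[task] = init(len(labels), hidden_dim)

    def _hidden(self, features):
        return [math.tanh(sum(w * x for w, x in zip(row, features)))
                for row in self.shared]

    def predict_proba(self, task, features):
        hidden = self._hidden(features)
        scores = [sum(w * h for w, h in zip(row, hidden)) for row in self.heads[task]]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def train_step(self, task, features, label, lr=0.1):
        """One SGD step on cross-entropy loss; returns the loss before the update."""
        head = self.heads[task]
        hidden = self._hidden(features)
        probs = self.predict_proba(task, features)
        gold = self.task_labels[task].index(label)
        d_scores = [p - (1.0 if i == gold else 0.0) for i, p in enumerate(probs)]
        # Backpropagate into the shared layer before changing the head weights.
        d_hidden = [sum(d_scores[l] * head[l][j] for l in range(len(head)))
                    for j in range(len(hidden))]
        for l, row in enumerate(head):
            for j in range(len(row)):
                row[j] -= lr * d_scores[l] * hidden[j]
        for j, row in enumerate(self.shared):
            d_pre = d_hidden[j] * (1.0 - hidden[j] ** 2)  # derivative of tanh
            for i in range(len(row)):
                row[i] -= lr * d_pre * features[i]
        return -math.log(probs[gold])


# Illustrative usage: one target task plus one auxiliary task, as in the quote.
tagger = MultiTaskTagger(
    input_dim=3,
    hidden_dim=4,
    task_labels={"keyphrase": ["O", "Task", "Process"], "aux": ["O", "MWE"]},
)
tagger.train_step("aux", [1.0, 0.0, 0.5], "MWE")         # plentiful auxiliary data
tagger.train_step("keyphrase", [0.2, 1.0, 0.0], "Task")  # scarce target data
print(tagger.predict_proba("keyphrase", [0.2, 1.0, 0.0]))
```

The point of the design is that a step on the auxiliary task still moves `shared`, so the keyphrase head benefits from representations learned on data it never saw labels for, while its own output weights stay untouched until a keyphrase example comes along.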

arXiv Abstract

  • Isabelle Augenstein
  • Anders Søgaard

Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far under-explored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, and cast the task as a multi-task learning problem with deep recurrent neural networks. Our multi-task models perform significantly better than previous state of the art approaches on two scientific KBC datasets, particularly for long keyphrases.

