Keyphrase extraction and classification have uses ranging from chatbots to scientific analysis. We were just looking into them the other day as a way to automate tag generation. This latest paper achieves a new state of the art on the SemEval-2017 Task 10 and ACL RD-TEC 2.0 datasets.
Highlights From the Paper
- “When abundant labelled data is available for an auxiliary task, but little data for the target task, multi-task learning can act as a form of semi-supervised learning combined with a distant supervision signal.”
- “Each task is associated with an independent classification function, but all tasks share the hidden layers. Note that for our experiments, we only consider one auxiliary task at a time.”
- “SemEval-2017 Task 10 dataset - beats state of the art on this dataset with error reductions of up to 9.64%, mostly due to better identification and labelling of long keyphrases.”
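The architecture in the quotes above, with an independent classification function per task on top of shared hidden layers, is usually called hard parameter sharing. Here's a minimal sketch of the idea in plain Python, under some loud assumptions: the paper's model works on token sequences, while this toy classifies fixed-size feature vectors with a single shared tanh layer. The task names (`"keyphrase"`, `"auxiliary"`), the dimensions, the toy data and the simple interleaved training schedule are all hypothetical and not taken from the paper.

```python
import math
import random


def init_layer(rows, cols, rng):
    """Small random weight matrix (rows x cols) and a zero bias vector."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return weights, [0.0] * rows


def affine(weights, bias, x):
    """Compute W @ x + b with plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class HardSharingMTL:
    """One shared tanh hidden layer plus an independent softmax head per task."""

    def __init__(self, input_dim, hidden_dim, labels_per_task, seed=0):
        rng = random.Random(seed)
        self.shared_w, self.shared_b = init_layer(hidden_dim, input_dim, rng)
        self.heads = {task: init_layer(n, hidden_dim, rng)
                      for task, n in labels_per_task.items()}

    def _hidden(self, x):
        return [math.tanh(a) for a in affine(self.shared_w, self.shared_b, x)]

    def predict(self, x, task):
        """Class probabilities for x under the given task's own head."""
        head_w, head_b = self.heads[task]
        return softmax(affine(head_w, head_b, self._hidden(x)))

    def sgd_step(self, x, label, task, lr=0.1):
        """One cross-entropy SGD step: updates this task's head AND the shared layer."""
        hidden = self._hidden(x)
        head_w, head_b = self.heads[task]
        probs = softmax(affine(head_w, head_b, hidden))
        d_logits = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
        # Backpropagate into the shared layer before modifying the head weights.
        d_hidden = [sum(head_w[i][j] * d_logits[i] for i in range(len(d_logits)))
                    for j in range(len(hidden))]
        d_pre = [dh * (1.0 - h * h) for dh, h in zip(d_hidden, hidden)]
        # Only the head belonging to `task` is updated; other heads stay untouched.
        for i, d in enumerate(d_logits):
            head_b[i] -= lr * d
            for j, h in enumerate(hidden):
                head_w[i][j] -= lr * d * h
        # The shared layer is trained by every task.
        for j, d in enumerate(d_pre):
            self.shared_b[j] -= lr * d
            for k, xk in enumerate(x):
                self.shared_w[j][k] -= lr * d * xk
        return -math.log(probs[label])


# Hypothetical toy data: (feature vector, label) pairs for each task.
target_data = [([1, 0, 0, 0], 0), ([0, 1, 0, 0], 1), ([0, 0, 1, 0], 2)]
aux_data = [([1, 0, 0, 1], 0), ([0, 0, 1, 1], 1)]

model = HardSharingMTL(input_dim=4, hidden_dim=8,
                       labels_per_task={"keyphrase": 3, "auxiliary": 2})
for epoch in range(500):
    # One auxiliary task at a time; this sketch simply interleaves the two datasets.
    for x, y in target_data:
        model.sgd_step(x, y, "keyphrase")
    for x, y in aux_data:
        model.sgd_step(x, y, "auxiliary")
```

Because every `sgd_step` also updates `shared_w`, examples from the auxiliary task reshape the representation that the keyphrase head reads, while the keyphrase head itself only ever sees keyphrase gradients. The paper's actual training schedule and layer types may well differ from this interleaved toy loop.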