Building a Neural Machine Translation System Using Only Synthetic Parallel Data

Apr 2, 2017


  • Jaehong Park
  • Byunggook Na
  • Sungroh Yoon

Recent work has shown that synthetic parallel data generated by existing translation models can be an effective solution to various neural machine translation (NMT) issues. In this study, we construct NMT systems using only synthetic parallel data. As an effective alternative to real parallel data, we also present a new type of synthetic parallel corpus. The proposed pseudo parallel data differ from previous approaches in that real and synthetic sentences are mixed on both sides of the sentence pairs. Experiments on Czech-German and French-German translation demonstrate the efficacy of the proposed pseudo parallel corpus, which not only delivers balanced and competitive performance for bidirectional translation but also improves substantially with the aid of a real parallel corpus.
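The mixing idea described above can be illustrated with a minimal sketch. It rests on one possible reading of the abstract: pairs with a real source sentence and a synthetic target are combined with pairs that have a synthetic source and a real target, so that both sides of the corpus contain real and synthetic sentences. The function name `build_mixed_corpus` and the translator callables `src_to_tgt` and `tgt_to_src` are hypothetical placeholders for existing translation models, not the authors' implementation.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def build_mixed_corpus(
    real_src: List[str],
    real_tgt: List[str],
    src_to_tgt: Callable[[str], str],  # hypothetical existing source-to-target model
    tgt_to_src: Callable[[str], str],  # hypothetical existing target-to-source model
    seed: int = 0,
) -> List[Pair]:
    """Combine two kinds of synthetic pairs into one pseudo parallel corpus."""
    # Real source sentence paired with a synthetic (machine-translated) target.
    src_originated = [(s, src_to_tgt(s)) for s in real_src]
    # Synthetic (machine-translated) source paired with a real target sentence.
    tgt_originated = [(tgt_to_src(t), t) for t in real_tgt]

    # After concatenation, each side holds both real and synthetic sentences.
    mixed = src_originated + tgt_originated
    # Shuffle with a fixed seed so the two pair types are interleaved reproducibly.
    random.Random(seed).shuffle(mixed)
    return mixed
```

In this sketch the two pair types are simply concatenated and shuffled; the proportion of each type and any filtering of low-quality synthetic sentences are left open, since the abstract does not specify them.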

