Research in natural language inference is currently exclusive to English. Here, we take a step toward multilingual evaluation. To that end, we provide test data for four major languages. We experiment with a set of baselines based on cross-lingual embeddings and machine translation. While our best system scores an average accuracy of just over 75%, our main aim is to enable further research in multilingual inference.