
This is the download page for Version 3.0 of the aligned multilingual parallel corpus JRC-ACQUIS. The dataset contains resources for the following languages: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish.
News: Version 3.0 of the corpus includes documents from 2005 and 2006, which almost triples the total size compared to Version 2.2. Bulgarian has been added as the 22nd language. The corpus contains 463,792 texts and a total of over one billion words.
Pairwise alignments for all 231 language pairs are now available, produced with two alternative alignment tools: Vanilla and HunAlign.
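The figure of 231 follows from the 22 languages listed above: the number of unordered pairs is 22 × 21 / 2 = 231. A minimal Python sketch that enumerates the pairs is shown here for illustration; the names `LANGUAGES` and `language_pairs` are hypothetical and are not part of the corpus distribution.

```python
from itertools import combinations

# The 22 corpus languages, in the order given on this page.
LANGUAGES = [
    "Bulgarian", "Czech", "Danish", "German", "Greek", "English",
    "Spanish", "Estonian", "Finnish", "French", "Hungarian", "Italian",
    "Lithuanian", "Latvian", "Maltese", "Dutch", "Polish", "Portuguese",
    "Romanian", "Slovak", "Slovene", "Swedish",
]


def language_pairs(languages):
    """Return every unordered pair of distinct languages."""
    return list(combinations(languages, 2))


pairs = language_pairs(LANGUAGES)
print(len(LANGUAGES), len(pairs))  # 22 231
```

Each pair appears once only, so (Bulgarian, Czech) is listed but (Czech, Bulgarian) is not.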
See the news section for the history of changes.
Note: Some corrections have been made to the Bulgarian corpus. The online version was modified on 13/07/2007. Please replace your copy of the Bulgarian corpus if you downloaded it before that date.
By downloading these resources, you agree to the usage conditions.
1. JRC-ACQUIS Multilingual Parallel Corpus, Version 2.2
This multilingual parallel corpus has been compiled by the Language Technology team of the European Commission's Joint Research Centre (JRC).
