Where did you get the data from? And what languages are covered and by what ratio?

- JRC-Acquis
- ClueWeb 09
- Wikipedia
- Reuters RCV2
- Debian i18n