README update
Expanded README.md with corpus inventory (pair counts, licences, sources), data format description, dataset statistics, tooling documentation, and academic references.