Validate the homogeneity of the split

The README in the UD repo gives a per-text breakdown of the split, which at least looks similar to those of other treebanks. It would still be nice to assess the composition of each part of the split in terms of epoch, genre, and perhaps syntactic constructions, and to add the results to our doc here.
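For the syntactic side, one possible starting point is to compare the distribution of dependency relations between parts of the split. The sketch below is only an illustration, not an agreed-upon method: it assumes the files are in standard CoNLL-U format (DEPREL in the 8th column), and the function names `deprel_counts` and `js_divergence` are made up for this example. Jensen-Shannon divergence is just one convenient choice of distance; epoch and genre would need metadata that this sketch does not look at.

```python
import math
from collections import Counter


def deprel_counts(conllu_text):
    """Count dependency relation labels (DEPREL, 8th column) in CoNLL-U text."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence separators) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        counts[cols[7]] += 1
    return counts


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence (base 2) between two label distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both distributions must contain at least one token")
    div = 0.0
    for label in set(p_counts) | set(q_counts):
        p = p_counts[label] / p_total
        q = q_counts[label] / q_total
        m = (p + q) / 2
        if p > 0:
            div += 0.5 * p * math.log2(p / m)
        if q > 0:
            div += 0.5 * q * math.log2(q / m)
    return div
```

To use it, read each part of the split into a string (for instance from hypothetical `train.conllu`, `dev.conllu` and `test.conllu` files), call `deprel_counts` on each, and compare the pairs with `js_divergence`. Values close to 0 would suggest that the parts are similar with respect to dependency relations; what counts as "close enough" is something we would have to decide, ideally by comparing against other treebanks.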