First UD 2.9 release candidate

Check out the full diff for details.

Added

  • Approximately 350 sentences (mostly averbal) imported from the equivalent texts in the Base de Français Médiéval (BFM). Some of them, such as interjections, have little syntactic value, but they make the documents more consistent and realistic.
  • The new sentences have been added to the split by assigning each of them to the same part as the tree that precedes it.
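
The split assignment described above can be pictured with a minimal sketch. It assumes that "part" means the portion of the data split a tree belongs to, and that a new sentence simply inherits the part of the closest preceding tree. The function name `inherit_split`, the sentence ids, and the part names are all hypothetical; this is not the script used for the release.

```python
def inherit_split(ordered_ids, known_split):
    """Give every sentence the part of the closest preceding known tree.

    ordered_ids: sentence ids in document order (old and new mixed).
    known_split: part already assigned to the pre-existing sentences.
    A new sentence that precedes every known tree stays None here
    (an assumption: the release notes do not cover that case).
    """
    assigned = {}
    current = None
    for sent_id in ordered_ids:
        if sent_id in known_split:
            # Pre-existing tree: it keeps its part and becomes the reference.
            current = known_split[sent_id]
        assigned[sent_id] = current
    return assigned


# Hypothetical ids: "s1" and "s2" already exist, the "new_*" ones are imported.
order = ["s1", "new_a", "s2", "new_b", "new_c"]
known = {"s1": "train", "s2": "dev"}
print(inherit_split(order, known))
```

Keeping new sentences in the same part as their left neighbour avoids scattering a single passage across several parts of the split.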

Removed

The following sentences from the previous version have been removed:

  • 7111 (in Roland), which was actually incomplete
  • 11226 (in Graal), which was a duplicate of 11228
  • 14607 (in TroyesYvain), which has been merged with 14606

Changed

  • Synchronization with the Base de Français Médiéval: (almost) all tokens now have an XmlId attribute in MISC that links them to the corresponding word in the TXM release, which makes it possible to go back and forth between the two. Some token attributes have been changed to match their BFM equivalents.
  • A total of 2733 punctuation tokens have been added (with automatic heuristic attachments) in texts whose transcriptions had punctuation.
  • A few sentences from StAlexis that are not in the BFM version have been moved to StAlexis_extra.conllu.
  • A number of manual corrections to existing trees, including form corrections and resegmentations; see the full diff for details.
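
As an illustration of the XmlId attribute mentioned above, here is a minimal sketch that reads it from the MISC column (the tenth tab-separated field) of CoNLL-U token lines. The sample sentence and the value `w_example_1` are invented for the example and do not reflect the actual identifier format; the helper names `parse_misc` and `xml_ids` are hypothetical as well.

```python
def parse_misc(misc):
    """Turn a CoNLL-U MISC field ('Key=Value|Key=Value' or '_') into a dict."""
    if misc == "_":
        return {}
    pairs = (item.split("=", 1) for item in misc.split("|"))
    return {p[0]: (p[1] if len(p) > 1 else "") for p in pairs}


def xml_ids(conllu_text):
    """Yield (token id, form, XmlId or None) for every token line."""
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank separators and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        yield cols[0], cols[1], parse_misc(cols[9]).get("XmlId")


# Invented two-token sample; the XmlId value is a placeholder.
SAMPLE = (
    "# sent_id = example-1\n"
    "1\tCarles\t_\tPROPN\t_\t_\t0\troot\t_\tXmlId=w_example_1\n"
    "2\t.\t_\tPUNCT\t_\t_\t1\tpunct\t_\t_\n"
)

for tok_id, form, xml_id in xml_ids(SAMPLE):
    print(tok_id, form, xml_id)
```

Tokens without an XmlId come back as `None`, which covers the "(almost) all tokens" caveat.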