cwb-encode fails to process UTF-8 text containing combining accent marks
When an accent is encoded as a separate combining character after its base letter (decomposed Unicode form), cwb-encode fails. This issue has been observed only on Mac.
Solution
Normalize the accents when writing the CQP corpus source files.
At the beginning of the importer steps of the import modules, use: String s2 = java.text.Normalizer.normalize(s, java.text.Normalizer.Form.NFC);
This must be done before any other import step: tokenization, annotation, etc.
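A minimal sketch of the normalization step described above, to show what NFC does to a decomposed string. The class name AccentNormalizer and the helper toNfc are hypothetical and only for illustration; they are not part of the import modules or of CWB. Only java.text.Normalizer from the standard library is used.

```java
import java.text.Normalizer;

public class AccentNormalizer {

    // Hypothetical helper: returns the NFC (composed) form of the input.
    // Strings that are already in NFC are returned unchanged.
    public static String toNfc(String s) {
        if (s == null) {
            return null;
        }
        if (Normalizer.isNormalized(s, Normalizer.Form.NFC)) {
            return s;
        }
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "été" written with a combining acute accent (U+0301) after each "e"
        String decomposed = "e\u0301te\u0301";
        String composed = toNfc(decomposed);

        System.out.println(decomposed.length());              // 5 chars: e, U+0301, t, e, U+0301
        System.out.println(composed.length());                // 3 chars: é, t, é
        System.out.println(composed.equals("\u00e9t\u00e9")); // true
    }
}
```

In an import module, a call like toNfc would be applied to each source text as it is read, so that every later step only ever sees composed characters.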
Edited by Matthieu Decorde