Import: impossible to tokenize words containing point (.) characters
Some transcription conventions use point characters inside words, for example the following TXT input, where words are separated by spaces:
ḫr ḥm nỉ Ḥrw ‘nḫ-mst.pl nb.tỉ ‘nḫ-mst.pl nswt-bỉtỉ Ḫpr-kȝ-R‘
A) It is not possible to find TXT or XTZ import module parameter values that correctly tokenize words containing points.
This holds even after removing the point from the punctuation regex and from the sentence-segmentation parameters.
Since an import.xml file cannot be provided to share the example parameters, here is a screenshot of the parameter settings: import-txt-words-no-point.png
Here is the index of the “.\..” CQL query: import-txt-words-no-point-words-with-points.png
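To make the tokenization problem concrete, here is a minimal Python sketch (not TXM code; the regexes are illustrative assumptions, not TXM's actual parameters) contrasting a tokenizer that treats the point as punctuation with one that splits on whitespace only:

```python
import re

text = "ḫr ḥm nỉ Ḥrw ‘nḫ-mst.pl nb.tỉ ‘nḫ-mst.pl"

# Punctuation-aware tokenization (illustrative): the point is split off
# as a separate token, breaking words such as "‘nḫ-mst.pl" apart.
with_point_as_punct = re.findall(r"[^\s.]+|\.", text)

# Whitespace-only tokenization: the point stays inside the word,
# which is the behavior desired for these transcriptions.
whitespace_only = text.split()

print(with_point_as_punct)
print(whitespace_only)
```

With the first strategy “‘nḫ-mst.pl” comes out as three tokens (“‘nḫ-mst”, “.”, “pl”); with the second it remains a single token, which is what the transcription principles require.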
B) Points are always rendered in editions according to the default point formatting rules of the current language.
MD: when tokenization is correct, the rendering of points (in Edition and Concordance) is OK.
See edition screenshot: import-txt-words-no-point-edition.png
(from redmine: issue id 3389, created on 2023/05/15 by Serge Heiden)
- Uploads:
- import-txt-words-no-point.png
- import-txt-words-no-point-words-with-points.png
- import-txt-words-no-point-edition.png