Corpus, sample texts to their first n words
Help users sample a corpus at one of the following stages:
a)
- import
  - cut texts at the first n words, after tokenization
    - add a ‘Sampling’ section to the import parameters form
      - add a ‘Sample texts to [ ] first words’ parameter
      - add a ‘Cut at sentence boundaries’ option
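The import-time cut in a) could work roughly as follows. This is a minimal Python sketch, not TXM code. It assumes the tokenizer output is available as a list of tokens plus the index of each sentence's last token (both hypothetical representations). It also reads ‘Cut at sentence boundaries’ as "move the cut back to the end of the last complete sentence within the first n words", which is one possible interpretation among others.

```python
from typing import List, Sequence


def sample_first_words(tokens: Sequence[str], n: int,
                       sentence_ends: Sequence[int] = (),
                       cut_at_sentence: bool = False) -> List[str]:
    """Keep the first n tokens of a tokenized text (hypothetical helper).

    sentence_ends holds the index of the last token of each sentence.
    With cut_at_sentence, the cut moves back to the end of the last
    sentence that fits entirely within the first n tokens; if no whole
    sentence fits, the hard cut at n is kept.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    cut = min(n, len(tokens))
    # Only adjust when the text is actually longer than n tokens.
    if cut_at_sentence and cut < len(tokens):
        fitting = [end + 1 for end in sentence_ends if end + 1 <= cut]
        if fitting:
            cut = max(fitting)
    return list(tokens[:cut])
```

For example, with the 11 tokens of "The cat sleeps . It is warm . Birds sing ." and sentence ends at indices 3, 7 and 10, n = 6 keeps "The cat sleeps . It is", while adding the sentence option keeps only "The cat sleeps .".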
or
b)
- update corpus
  - add a new corpus command ‘Sample texts at n first words’ (working on the XML-TXM pivot)
    - add a ‘Number of words’ parameter
    - add a ‘Cut at sentence boundaries’ option
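The command in b) operates on the XML-TXM pivot rather than on raw tokens. The Python sketch below illustrates the idea under explicit assumptions: words are `w` elements and sentences are `s` elements (the tag names are parameters, and namespace prefixes are ignored), and the real XML-TXM pivot carries more structure than this sketch handles. The function name `sample_pivot` is hypothetical. After such a cut, the corpus would still need to be rebuilt ("update corpus").

```python
import xml.etree.ElementTree as ET


def _local(tag) -> str:
    # Strip a "{namespace}" prefix: "{http://www.tei-c.org/ns/1.0}w" -> "w".
    # Comments and processing instructions have non-string tags.
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def sample_pivot(xml_text: str, n: int, cut_at_sentence: bool = False,
                 word_tag: str = "w", sentence_tag: str = "s") -> str:
    """Keep only the first n word elements of a pivot-like XML text (sketch)."""
    root = ET.fromstring(xml_text)
    words = [el for el in root.iter() if _local(el.tag) == word_tag]
    cut = min(n, len(words))

    if cut_at_sentence and cut < len(words):
        position = {w: i for i, w in enumerate(words)}
        best = 0
        for s in root.iter():
            if _local(s.tag) != sentence_tag:
                continue
            inside = [position[el] for el in s.iter() if el in position]
            # A sentence fits if its last word lies before the cut.
            if inside and max(inside) < cut:
                best = max(best, max(inside) + 1)
        if best:  # otherwise no whole sentence fits: keep the hard cut
            cut = best

    # Map each element to its parent so that elements can be detached.
    parent = {child: p for p in root.iter() for child in p}
    for w in words[cut:]:
        parent[w].remove(w)

    # Drop sentence elements emptied by the cut.
    sentences = [el for el in root.iter() if _local(el.tag) == sentence_tag]
    for s in sentences:
        if s in parent and len(s) == 0 and not (s.text or "").strip():
            parent[s].remove(s)

    return ET.tostring(root, encoding="unicode")
```

Working on the parsed tree rather than on text offsets keeps the output well-formed, and removing the emptied `s` elements avoids leaving zero-length sentences in the sampled text.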
or
c)
- update corpus
  - add a new corpus command ‘Sample texts from sub-corpus’ (working on the XML-TXM pivot and keeping only the sub-corpus matches)
    - for example, with a sub-corpus built with the query
      <text> []{1,10000}
      and MatchingStrategy set to ‘longest’
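Option c) delegates the choice of what to keep to the sub-corpus query: in the example, the intent is that each match covers the beginning of a text, up to 10,000 tokens. The Python sketch below shows only the selection step, assuming the texts and the matches are both given as (first, last) corpus positions with both ends inclusive; this representation and the name `sample_from_matches` are hypothetical and not TXM's API.

```python
from typing import List, Sequence, Set, Tuple

# (first, last) corpus positions, both assumed inclusive.
Span = Tuple[int, int]


def sample_from_matches(tokens: Sequence[str], texts: Sequence[Span],
                        matches: Sequence[Span]) -> List[List[str]]:
    """For each text, keep only the tokens covered by a sub-corpus match."""
    covered: Set[int] = set()
    for first, last in matches:
        covered.update(range(first, last + 1))
    # Tokens are kept in corpus order; overlapping matches are not duplicated.
    return [[tokens[i] for i in range(first, last + 1) if i in covered]
            for first, last in texts]
```

A text with no match comes out empty; whether such texts should be dropped or kept empty is left open by the issue.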
(from redmine: issue id 3353, created on 2023/03/14 by Serge Heiden)