SubCorpus2Corpus utility
A utility to create a new corpus from the texts of a subcorpus. See https://groupes.renater.fr/wiki/txm-info/public/annotation/specs_manual_annotation/specs_store_annotation#outil_exporttextstocorpus
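Judging from the console output in the validation test below, the utility exports the source corpus under the new name, then deletes the text files that are not part of the subcorpus (the "Delete text files..." step) before rebuilding the indexes. The filtering step can be sketched in Python as follows. This is a hypothetical illustration, not the macro's Groovy code: it assumes the copied corpus stores one XML-TXM file per text, named `<text id>.xml`, and the function name `keep_subcorpus_texts` is made up for the example.

```python
from __future__ import annotations

from pathlib import Path


def keep_subcorpus_texts(txm_dir: Path, kept_ids: set[str]) -> list[str]:
    """Delete every XML-TXM file whose text id is not in kept_ids.

    Hypothetical layout (assumption): one file per text, named "<text id>.xml".
    Returns the sorted ids of the deleted texts.
    """
    deleted = []
    # Materialize the listing first so deleting files does not disturb the glob.
    for xml_file in list(txm_dir.glob("*.xml")):
        text_id = xml_file.stem  # "1-De Gaulle-1959.xml" -> "1-De Gaulle-1959"
        if text_id not in kept_ids:
            xml_file.unlink()
            deleted.append(text_id)
    return sorted(deleted)
```

Called with the ten ids listed in the `IDS (10)=[...]` line of the log, such a function would leave only those ten texts in the copied corpus directory.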
Validation test
- Create and select a subcorpus of De Gaulle's texts
- Run the SubCorpus2Corpus macro
- Set the parameters:
  - new_corpus_name = TESTCORPUS
  - properties_to_remove = frpos
- Run the macro with these parameters. The console output is:
    Compiling SubCorpus2CorpusMacro.groovy...
    Running the Groovy script SubCorpus2CorpusMacro.groovy…
    Get text ids from subcorpus...
    IDS (10)=[1-De Gaulle-1959, 1-De Gaulle-1960, 1-De Gaulle-1961, 1-De Gaulle-1962, 1-De Gaulle-1963, 1-De Gaulle-1964, 1-De Gaulle-1965, 1-De Gaulle-1966, 1-De Gaulle-1967, 1-De Gaulle-1968]
    Export corpus .../corpora/TESTCORPUS.txm...
    Delete previous corpus TESTCORPUS
    Partition "text@loc" deleted.
    Partition "text@id" deleted.
    Corpus "199x" deleted.
    Concordance "<[]{3,5}>" deleted.
    Corpus "TESTCORPUS" deleted.
    History "Query history" deleted.
    Load corpus...
    Delete text files...
    .../corpora/TESTCORPUS/txm/TESTCORPUS
    Delete properties frpos...
    Removing [frpos] to TESTCORPUS XML-TXM files...
    10 .........1 - Done
    Update corpus TESTCORPUS...
    Updating TESTCORPUS
    Compiling the "xtz" import module scripts...
    Starting the "xtz" import module...
    -- COMPILING - Building Search Engine indexes
    -- Scanning structures&properties to create for 10 texts...
    10 .........1 - Done
    -- Building CQP files 10/10...
    10 .........1
    -- Running cwb-encode...
    Word properties: id, frlemma, n
    Structures: lb:0+n, p:0+n, text:0+id+base+project+loc+annee, txmcorpus:0+lang
    10 .........1
    -- Running cwb-makeall...
    -- EDITION - Building editions
    -- Building 'default' edition of 10/10 texts...
    10 .........1
    Re-export corpus .../corpora/TESTCORPUS.txm...
    Done: .../corpora/TESTCORPUS.txm
    Done in 6289 ms.
- Load the corpus, then check that it contains exactly the texts of the subcorpus and that the frpos property is gone
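The "Delete properties frpos..." step, and the final check that frpos is gone, can be sketched as below. This is a hypothetical Python illustration, not the macro's Groovy code. It assumes each word property is stored in the XML-TXM files as a `txm:ana` element whose `type` attribute is `#` followed by the property name (for example `#frpos`), with the `txm` prefix bound to `http://textometrie.org/1.0`; the function names `remove_properties`, `has_property` and `strip_file` are made up for the example.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Assumed namespace URIs: txm for XML-TXM annotations, TEI for the document body.
TXM_NS = "http://textometrie.org/1.0"
TEI_NS = "http://www.tei-c.org/ns/1.0"
ANA_TAG = f"{{{TXM_NS}}}ana"

# Keep the usual prefixes when a tree is written back to disk.
ET.register_namespace("txm", TXM_NS)
ET.register_namespace("", TEI_NS)


def remove_properties(root: ET.Element, properties: set[str]) -> int:
    """Remove every txm:ana child whose type is "#<property>"; return how many."""
    types = {"#" + p for p in properties}
    removed = 0
    # Snapshot the elements so removals do not interfere with the traversal.
    for parent in list(root.iter()):
        doomed = [c for c in parent if c.tag == ANA_TAG and c.get("type") in types]
        for child in doomed:
            parent.remove(child)
            removed += 1
    return removed


def has_property(root: ET.Element, prop: str) -> bool:
    """True if any txm:ana element still carries the given property."""
    return any(a.get("type") == "#" + prop for a in root.iter(ANA_TAG))


def strip_file(path: str, properties: set[str]) -> int:
    """Rewrite one XML-TXM file in place without the given properties."""
    tree = ET.parse(path)
    removed = remove_properties(tree.getroot(), properties)
    tree.write(path, encoding="UTF-8", xml_declaration=True)
    return removed
```

After such a rewrite, re-encoding the corpus lists only the remaining word properties, which matches the `Word properties: id, frlemma, n` line of the log. Note that ElementTree does not round-trip every detail of a file (its default parser drops comments, for instance), so a real implementation might prefer a streaming rewrite.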