Codex palatinus graecus 23 - Ground Truth Dataset Medieval Greek Manuscripts
Dataset of HTR ground truth for the Codex palatinus graecus 23 (Palatine Anthology), Byzantine writing from the 10th century.
License
This work is licensed under CC BY 4.0. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
Dataset description
This dataset was produced by the Canada Research Chair on Digital Textualities, as part of the Anthologia graeca project.
A first batch of 50 pages (143-195) was initially transcribed to train a prototype transcription model. We then added 20 pages (196-215) to produce the first version of a transcription model for Greek manuscripts. The transcription of these 70 pages can be found in data/CPgr23.
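To get a quick overview of what the data/CPgr23 folder contains after cloning the repository, a minimal sketch such as the following can help. It makes no assumption about the transcription file format: it only counts the files found under the folder, grouped by extension. The function name `count_files_by_extension` is hypothetical and not part of this repository, and the script assumes it is run from the repository root.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(root):
    """Count regular files under `root` (recursively), grouped by lowercase suffix."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory")
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    data_dir = Path("data/CPgr23")
    if data_dir.is_dir():
        for suffix, n in sorted(count_files_by_extension(data_dir).items()):
            print(f"{suffix or '(no extension)'}: {n}")
    else:
        print(f"{data_dir} not found; run this from the repository root.")
```

The printed counts give a simple sanity check that the clone is complete before the files are loaded into a transcription tool.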
Transcription guidelines
To come.
Model description
A transcription model for Greek manuscripts was trained using this dataset. It can be found here: {placeholder}.
Images
This ground truth is based on images of the Codex palatinus graecus 23 digitized by the Universitätsbibliothek Heidelberg, which holds the first part of the manuscript (the second part is kept at the BnF as Supplementum graecum 384). The images were then imported into eScriptorium using IIIF. Find the manuscript here.
How to cite
This dataset was built and is maintained by Maxime Guénette (@mguenette), Mathilde Verstraete (@mverstraete), Alix Chagué (@achague), and Marcello Vitali-Rosati (@marviro). The digitization is not copyright-free, but the transcription is. However, properly annotating a corpus takes time, and this work deserves recognition. If you use any item from this ground truth corpus, please cite the dataset using the following information:
- Add the Zenodo reference.
Cite the Model
Cite the Dataset
Guénette, M., Verstraete, M., Chagué, A., & Vitali-Rosati, M. Codex palatinus graecus 23 - Ground Truth Dataset Medieval Greek Manuscripts [Data set]. https://gitlab.huma-num.fr/ecrinum/anthologia/htr_cpgr23
@misc{Guenette_Codex_palatinus_graecus,
  author = {Guénette, Maxime and Verstraete, Mathilde and Chagué, Alix and Vitali-Rosati, Marcello},
  title = {{Codex palatinus graecus 23 - Ground Truth Dataset Medieval Greek Manuscripts}},
  url = {https://gitlab.huma-num.fr/ecrinum/anthologia/htr_cpgr23}
}
Cite the Project
Funding
Infrastructure
This dataset project relied on the CREMMA infrastructure.