
Codex palatinus graecus 23 - Ground Truth Dataset Medieval Greek Manuscripts

HTR ground truth dataset for the Codex palatinus graecus 23 (Palatine Anthology), written in a Byzantine script of the 10th century.

License

This work is licensed under CC BY 4.0. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Dataset description

This dataset was produced by the Canada Research Chair on Digital Textualities, as part of the Anthologia graeca project.

A first batch of 50 pages (143-195) was initially transcribed to train a prototype transcription model. We then added 20 pages (196-215) to produce the first version of a transcription model for Greek manuscripts. The transcriptions of these 70 pages can be found in data/CPgr23.
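
To get a quick overview of what data/CPgr23 contains after cloning the repository, a minimal sketch along the following lines may help. It assumes the script is run from the repository root; the function name `count_files_by_extension` is hypothetical, and no assumption is made about the file format of the transcriptions.

```python
from collections import Counter
from pathlib import Path


def count_files_by_extension(root):
    """Count the files under `root` (recursively), grouped by file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a readable label.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    for ext, n in sorted(count_files_by_extension("data/CPgr23").items()):
        print(f"{ext}\t{n}")
```

The script only lists and counts files; it does not parse the transcriptions themselves.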

Transcription guidelines

To come.

Model description

A transcription model for Greek manuscripts was trained using this dataset. It can be found here: {placeholder}.

Images

This ground truth is based on images of the Codex palatinus graecus 23 digitized by the Universitätsbibliothek Heidelberg, which holds the first part of the manuscript (the second part is kept at the BnF as Supplementum graecum 384). The images were then imported into eScriptorium using IIIF. Find the manuscript here.

How to cite

This dataset was built and is maintained by Maxime Guénette (@mguenette), Mathilde Verstraete (@mverstraete), Alix Chagué (@achague), and Marcello Vitali-Rosati (@marviro). The digitized images are not copyright-free, but the transcription is. However, properly annotating a corpus takes time and is work that deserves recognition. If you use any item from this ground truth corpus, please cite the dataset using the following information:

  • Add the Zenodo reference.

Cite the Model

Cite the Dataset

Guénette, M., Verstraete, M., Chagué, A., & Vitali-Rosati, M. Codex palatinus graecus 23 - Ground Truth Dataset Medieval Greek Manuscripts [Data set]. https://gitlab.huma-num.fr/ecrinum/anthologia/htr_cpgr23

@misc{Guenette_Codex_palatinus_graecus,
author = {Guénette, Maxime and Verstraete, Mathilde and Chagué, Alix and Vitali-Rosati, Marcello},
title = {{Codex palatinus graecus 23 - Ground Truth Dataset Medieval Greek Manuscripts}},
url = {https://gitlab.huma-num.fr/ecrinum/anthologia/htr_cpgr23}
}

Cite the Project

Funding

Infrastructure

This dataset project relied on the CREMMA infrastructure.