Changes

Marie Bizais-Lilig · a2fba0a5
--- a/home/database/database-construction.md
+++ b/home/database/database-construction.md
+## Database sources
+
+The structure of Chinese words and their earliest textual attestations is adopted from _Computerlinguistische Datierung schriftsprachlicher chinesischer Texte_ [^1]. The information is parsed from a UTF-8 plain-text version of _Hanyu da cidian_ 漢語大詞典 (_HDC_) [^2] and enriched with information from other sources, such as _CBDB_ [^3] and a corpus of _Early Chinese Texts_ (see »Loewe corpus« in [^4]).
+
+Several sources were parsed to document word categories, the words within each category, and equivalence or synonymy relationships, mainly:
+- The _Erya_ 爾雅
+- The 
+
+## Biographical and bibliographical information
+
+A database of texts, comments and people of Chinese history relevant to the project was started separately and later merged into the semantic database.
+
+## References
+
+[^1]: Schalmey, T., 2022: _Computerlinguistische Datierung schriftsprachlicher chinesischer Texte_. Diss., Universität Trier. Forthcoming.
+
+[^2]: _HDC_ = Luo Zhufeng 羅竹風, ed., _Hanyu da cidian_ 漢語大詞典, 13 vols., Shanghai 上海: Cishu chubanshe 辭書出版社, 1986–1994.
+
+[^3]: _CBDB_ = Fuller, Michael A., ed., China Biographical Database Project, 2017, https://projects.iq.harvard.edu/cbdb
+
+[^4]: Schalmey, T., 2021: “Raw frequency data: Thoughts on ‘Reliable’ Learner's Vocabularies for Classical and Literary Chinese”, https://doi.org/10.5281/zenodo.5638881
+
+
 TODO

 How the information is added to the database