---
lang: en
description: Presentation of ISIDORE, the search engine for discovering publications, digital data and profiles of social sciences and humanities researchers from around the world.
---

# ISIDORE

## What is ISIDORE?

ISIDORE is a search engine for discovering publications, digital data and the profiles of researchers in the social sciences and humanities (SSH) from around the world.

The full text of several million documents (articles, theses and dissertations, reports, datasets, web pages, database records, descriptions of archival holdings, etc.) and event announcements (seminars, conferences, etc.) can be searched. In addition, ISIDORE links these millions of documents together by enriching them with scientific concepts created by SSH research communities.

It is accessible on the Web through the portal [isidore.science](https://isidore.science).

It also offers scientific social network features. As such, it belongs to the category of search engines and research assistants, and offers many features for organizing scientific monitoring.

Launched on December 8, 2010, ISIDORE is the result of a collaboration between the CNRS "very large facility" Adonis (2007-2013), the Center for Direct Scientific Communication (CCSD) and the companies Antidot, Mondéca and Sword. It is currently developed, updated and operated by the TGIR Huma-Num.

References on the history of ISIDORE:

- Yannick Maignien, "ISIDORE, de l'interconnexion de données à l'intégration de services", Hyper Article en Ligne - Sciences de l'Homme et de la Société, [10670/1.k9lck9](https://isidore.science/document/10670/1.k9lck9)
- Stéphane Pouyllau et al, "Bilan 2011 de la plateforme ISIDORE et perspectives 2012-2015", MoDyCo, Modèles, Dynamiques, Corpus - UMR 7114, [10670/1.bqexsj](https://isidore.science/document/10670/1.bqexsj)
- Philippe Bourdenet, "L'espace documentaire en restructuration : l'évolution des services des bibliothèques universitaires", Le serveur TEL (thèses-en-ligne), [10670/1.lnieuv](https://isidore.science/document/10670/1.lnieuv)

## How does ISIDORE work?

ISIDORE harvests metadata and full text, enriches them and then indexes them. This information is analyzed in order to enrich each document, to link it to the concepts of scientific vocabularies (thesauri, etc.) and to link it to the authors' identifiers (ORCID, IDRef, IDHAL, VIAF, etc.).

Several enrichments are performed:

- Semantic annotation: the words present in the metadata of the documents are compared to the entries of the vocabularies through an algorithm based on a morphological analysis of the terms. If an equivalence is found between a term from the document and an entry in one of the vocabularies, then the resource will be linked to that vocabulary entry. The vocabularies are multilingual and aligned with each other. Thus, the semantic annotation is multilingual.
- Disciplinary categorization: ISIDORE uses a semantic classifier that, after being trained on a reference corpus, categorizes all documents in ISIDORE into the SSH disciplines of the MORESS vocabulary. The classifier is trained with the help of the manual categorization completed by researchers in [HAL](https://hal.archives-ouvertes.fr/) when depositing their publications.
- Detection of the authors: ISIDORE detects the authors of the documents and enriches the author's name (first name and last name) with international (ORCID, VIAF, ISNI) and national (IDHAL, IDRef) author identifiers.

ISIDORE indexes, in its search engine:

- Document metadata;
- The full text (if it is available in open access);
- The semantic annotations;
- The disciplinary classification;
- Author enrichment and normalization.

More information is available on [the "Vocabularies" page](https://isidore.science/vocabularies) of ISIDORE.

### Can ISIDORE index multilingual documents and data?

Yes. Since 2015, documents and datasets in English, Spanish and French have been indexed, enriched and linked to scientific reference vocabularies by ISIDORE (metadata and full text). Full text in other languages is indexed in the language of the document, but in that case no enrichment is performed.
For more information, you can consult our blog post on the subject: [Isidore speaks English, sino también español et toujours en français](https://humanum.hypotheses.org/921).

### How often is ISIDORE updated?

ISIDORE is updated incrementally, on average once a month. Why this delay? In addition to harvesting and indexing documents, ISIDORE enriches them with concepts from scientific reference vocabularies (thesauri, taxonomies, etc.). This semantic enrichment is automatic and allows us to offer you reading suggestions that help you discover documents other than those you were looking for. This requires a certain amount of processing and computation time.
The documents associated with you, which are offered in your user account as documents to be claimed, are also updated monthly.

The ISIDORE change log is available at [https://isidore.science/releases](https://isidore.science/releases).

### What is the process for adding collections to ISIDORE?

Two scenarios:

- A research project, a team, a laboratory or a library can propose collections to be harvested by sending a simple e-mail to <isidore-sources@huma-num.fr>. The Huma-Num team studies the request and discusses it with the requester in order to fully understand how the metadata and the data to be indexed are described. Most often, a first harvest, indexing and enrichment are carried out so that the requester can see and analyze how their data will be indexed in ISIDORE. The discussion may then continue in order to fine-tune the indexing process.

- The Huma-Num team identifies a data warehouse or a digital library and contacts the data producer or the organization that distributes the data to propose harvesting and indexing in ISIDORE. A first harvest, indexing and enrichment are carried out so that the producer can see and analyze how their data will be indexed in ISIDORE. The discussion may then continue in order to fine-tune the indexing process.

## How to use ISIDORE?

ISIDORE offers several tools to search, discover, collect and organize the content it indexes:

### The isidore.science portal

The [isidore.science](https://isidore.science) portal is a website in three languages that provides a [relevance-ranked search engine](https://isidore.science) that can be used with several query methods.

- By default, ISIDORE searches for all the words in the user's query after removing stop words ("of", "the", etc.);
- It is possible to search for an exact sentence or group of words by enclosing it in quotation marks. For example, "direction of consciousness" searches for exactly this expression; in this case, "of" is not treated as a stop word.

#### Search operators

Several boolean search operators are available in ISIDORE. Note that the syntax matters: operators are always written in UPPERCASE (e.g. AND):

- AND: the intersection finds the documents that contain all the terms (or sets of terms) of the query.
    For example:
    - consciousness AND gender
    - "cold war" AND migration
- OR: the union finds the documents that contain both terms (or sets of terms), or one or the other.
    For example:
    - "semantic web" OR "web 3.0"
- EXCEPT (NOT): the exclusion reduces noise by excluding terms. For example:
    - revolution NOT French
- NEAR(n): the NEAR(n) operator (i.e. "close to") links terms by specifying a proximity value "n" between them. It works like an AND with n word(s) between the terms: the value "n" indicates the number of words that separate the two terms. NEAR also works without a value, in which case it is equivalent to NEAR(10), i.e. 10 words between the searched terms (standard spacing). For example:
    - house NEAR(4) nobility: searches for house and nobility with a proximity of 4 words

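As an illustration, the operator rules above can be captured in a small helper that assembles query strings. This is only a sketch: the function names `phrase`, `combine` and `near` are hypothetical and not part of ISIDORE; only the operator syntax comes from the description above.

```python
from typing import Optional


def phrase(text: str) -> str:
    """Wrap a multi-word expression in quotation marks for exact search."""
    return f'"{text}"' if " " in text else text


def combine(operator: str, *terms: str) -> str:
    """Join terms with a boolean operator, always written in UPPERCASE."""
    op = operator.upper()
    if op not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(phrase(term) for term in terms)


def near(left: str, right: str, distance: Optional[int] = None) -> str:
    """Build a proximity query; without a distance, plain NEAR is used."""
    op = "NEAR" if distance is None else f"NEAR({distance})"
    return f"{phrase(left)} {op} {phrase(right)}"


print(combine("and", "cold war", "migration"))
print(near("house", "nobility", 4))
```

The resulting strings can be pasted as-is into the search box of the portal.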
#### Sorting of search results

By default, in [isidore.science](https://isidore.science), the results are sorted by semantic relevance. It is possible to change the sorting of the search results to:

- sort by novelty
- sort by author's name in alphabetical order
- sort by author's name in reverse alphabetical order
- sort by ascending date
- sort by descending date

Very soon, two more options will also be available:

- sort by title in alphabetical order
- sort by title in reverse alphabetical order

### Advanced Search

An advanced search is available at [https://isidore.science/as](https://isidore.science/as) and is also accessible from the first page of the [portal](https://isidore.science/as).

### Personal space for researchers

Isidore.science offers researchers a personal space that allows them to:

- collect, classify and organize the documents found;
- gather all their scientific output in order to present it on a personal profile page;
- follow the output of colleagues;
- save and publish queries and their results for monitoring purposes;
- create bibliographies that can be exported to Zotero.

### The APIs of isidore.science

The [isidore.science search engine APIs](https://api.isidore.science) are available through the GET method over HTTP or HTTPS.
They provide a fast, accurate and reliable query service for ISIDORE data with advanced search features (auto-completion, spell checking, multi-criteria, boolean and faceted searches, sorting, aggregation of answers, etc.).

Each request to the engine is submitted by means of a URI pointing to a specific web service. The response is a stream in XML (default format) or JSON format.

The [isidore.science API web page](https://api.isidore.science) details all the commands available for the different services.

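The request style described above can be sketched as follows. The service path `resource/search` and the parameter names `q` and `output` are assumptions made for illustration; check them against the API web page before relying on them.

```python
from urllib.parse import urlencode

API_BASE = "https://api.isidore.science"  # base address given in this documentation


def build_search_uri(query: str, output: str = "json",
                     service: str = "resource/search") -> str:
    """Return a GET URI for a search request.

    The `service` path and the `q` / `output` parameter names are
    assumptions, not confirmed by this page.
    """
    params = urlencode({"q": query, "output": output})
    return f"{API_BASE}/{service}?{params}"


print(build_search_uri('"cold war" AND migration'))
```

The resulting URI can then be fetched with any HTTP client, since the services only use the GET method.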
### Enriched metadata for *Linked Open Data*

ISIDORE's metadata, ontologies and vocabularies are available in an [RDF (Resource Description Framework) triple store, or *TripleStore*](https://en.wikipedia.org/wiki/Resource_Description_Framework), which makes ISIDORE data part of the *Linked Open Data*. Web interfaces for querying with the SPARQL language and browsing the ISIDORE graph are available:

- A documented SPARQL query interface and presentation of the ISIDORE data model: https://isidore.science/sqe
- The basic Virtuoso software interface: https://isidore.science/sparql

In the ISIDORE *TripleStore*, the main vocabularies for structuring information are:

- RDF and RDFS
- Dublin Core Element Set
- Dublin Core TERMS
- SIOC
- FOAF
- OWL
- SKOS
- ORE
- DBPEDIA

(The complete list is available at <https://isidore.science/sparql?nsdecl>)


### Complementarity between ISIDORE and Zotero

#### Using the Zotero connector from ISIDORE to feed a bibliographic database

ISIDORE is compatible with Zotero. Once the user has installed [the Zotero connector](https://www.zotero.org/download/) in their browser, document references can be imported at two levels:

- On the page listing the results of a search;
- On the page displaying a document.

#### Using the ISIDORE search connector from Zotero

Zotero (Linux, macOS and Windows client) can use search engines to find or complete bibliographic references directly from the Zotero interface. We offer two ISIDORE connectors for Zotero that make it possible to query ISIDORE by author.

By adding ISIDORE to Zotero you can:

- complete references from a search on the author's name: this is the "ISIDORE, help me find what he/she has published" connector;
- find documents in which the author is cited: this is the "ISIDORE, what do you have on the author?" connector.

These [connectors and installation documentation are available on the TGIR Huma-Num GitLab](https://gitlab.huma-num.fr/spouyllau/ISIDORtero).

### Use of RSS feeds

ISIDORE can provide its search results as RSS feeds in order to feed scientific monitoring software (Zotero, for example), research notebooks, etc. The RSS feeds created in ISIDORE are updated, like all the contents of the search engine, approximately once a month during the general update of the ISIDORE contents. It is thus possible to follow, from Zotero, the updates to the ISIDORE documents that match your saved queries.

To do so, access your personal space (login required), and click "My queries" to see your saved queries:

![My Image](media/isidore.png)

For a saved query, click on the pictogram "Request RSS feed of the query" available on the right ![My Image](media/isidore-rss-001.png){: style="width:170px"} and copy the link with ![My Image](media/isidore-requeteRSS.png){: style="width:120px"}.

The copied link is in the form: `https://isidore.science/feed/lt3913`.

If your browser is equipped with a module for reading RSS feeds, this link can be used directly in your browser.
For our example, we will continue with Zotero.

In Zotero, choose: New feed > From URL:

![My Image](media/zot-001.png){: style="width:60%;margin-left:20%"}

Then add the URL of the feed provided by ISIDORE (N.B. when using Safari on macOS, take care to remove the "feed:" prefix from the URL). Paste it in the "URL" field of the Zotero RSS feed creation window, as in the example below:

![My Image](media/zot-002.png)

Then give your feed a title, for example:
"isidore.science - Query on ...".

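Outside Zotero, a saved-query feed can also be read by a short script. The sketch below uses only Python's standard library; it assumes the feed is standard RSS 2.0 (`channel` and `item` elements), and the sample document and its values are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Invented sample standing in for the content of a feed such as
# https://isidore.science/feed/lt3913 (the RSS 2.0 structure is an assumption).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>isidore.science - Query on cold war</title>
    <item>
      <title>Example document</title>
      <link>https://isidore.science/document/example</link>
    </item>
  </channel>
</rss>"""


def normalize_feed_url(url: str) -> str:
    """Drop the "feed:" prefix that Safari may add in front of the URL."""
    return url[len("feed:"):] if url.startswith("feed:") else url


def list_items(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs for every item of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


print(normalize_feed_url("feed:https://isidore.science/feed/lt3913"))
print(list_items(SAMPLE_FEED))
```

Because the feeds are refreshed about once a month, polling them more often than that brings no new results.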
## What can be found in ISIDORE?

### Organization of documents and data in ISIDORE

ISIDORE contains several million SSH documents that are harvested, enriched with scientific vocabularies and indexed. They are organized into:

- Research documents and data (archives, raw materials, photographs, films, datasets, statistics, etc.), identified in the ISIDORE ontology by: http://isidore.science/class/primaires
- Published documents and data (articles, books, dissertations and theses, reports, etc.), identified in the ISIDORE ontology by: http://isidore.science/class/secondaires
- Scientific events (conferences, study days, etc.), identified in the ISIDORE ontology by: http://isidore.science/class/evenementielles


For a large number of SSH disciplines, ISIDORE makes it possible to search documents from the main publication platforms worldwide, as well as a large number of digitized collections from national, university and municipal libraries.

For advanced search uses, the [ISIDORE advanced search](https://isidore.science/as) offers, for example, the possibility of searching for documents between two dates and by discipline or by collection.

The main publication platforms (journals and books) present in ISIDORE are:

- OpenEdition
- Cairn
- Persée
- Erudit
- Oapen
- Redalyc
- Scielo Books

The complete list of collections containing publications can be obtained by querying [the ISIDORE Triple Store](https://isidore.science/sqe) with the [following SPARQL request](https://isidore.science/sparql?query=SELECT+*+WHERE+%7B%0D%0A%3Fs+rdf%3Atype+%3Chttp%3A%2F%2Fisidore.science%2Fclass%2FCollection%3E.%0D%0A%3Fs+rdf%3Atype+%3Chttp%3A%2F%2Fisidore.science%2Fclass%2Fpublications%3E.%0D%0A%3Fs+dcterms%3Atitle+%3Ftitre%0D%0A%7D+ORDER+BY+ASC%28%3Ftitre%29&format=text%2Fhtml&debug=on&timeout=0) :

```
SELECT * WHERE {
 ?s rdf:type <http://isidore.science/class/Collection>.
 ?s rdf:type <http://isidore.science/class/publications>.
 ?s dcterms:title ?title
} ORDER BY ASC(?title)
```

The main digital libraries (municipal, national, etc.) present in ISIDORE are:

- Gallica
- Selene
- E-rara
- NuBIS
- Octaviana
- Burgerbibliothek
- Berkeley Library Digital Collections
- Argonnaute
- BNE
- Cornell
- Didómena

The complete list of collections containing archival holdings and book collections can be obtained by querying [the ISIDORE Triple Store](https://isidore.science/sqe) with the [following SPARQL request](https://isidore.science/sparql/?default-graph-uri=&query=SELECT+*+WHERE+%7B%0D%0A%3Fs+rdf%3Atype+%3Chttp%3A%2F%2Fisidore.science%2Fclass%2FCollection%3E.%0D%0A%3Fs+rdf%3Atype+%3Chttp%3A%2F%2Fisidore.science%2Fclass%2Fprimaires%3E.%0D%0A%3Fs+dcterms%3Atitle+%3Ftitre%0D%0A%7D+ORDER+BY+ASC%28%3Ftitre%29&format=text%2Fhtml&timeout=0&debug=on) :

```
SELECT * WHERE {
 ?s rdf:type <http://isidore.science/class/Collection>.
 ?s rdf:type <http://isidore.science/class/primaires>.
 ?s dcterms:title ?title
} ORDER BY ASC(?title)
```

### Indexing of the main data platforms in SSH

ISIDORE harvests and indexes the contents of many SSH data platforms, allowing researchers to group all their data in their user profile. We encourage researchers, for their research programs, to use platforms that offer open interoperability mechanisms and protocols for exposing documentary and scientific metadata.

The main data platforms (sources, archives but also publications) are harvested by ISIDORE.

The complete list of collections can be obtained by querying [the ISIDORE Triple Store](https://isidore.science/sqe) with the [following SPARQL request](https://isidore.science/sparql/?default-graph-uri=&query=SELECT+*+WHERE+%7B%0D%0A+%3Fs+rdf%3Atype+%3Chttp%3A%2F%2Fisidore.science%2Fclass%2FCollection%3E.%0D%0A+%3Fs+dcterms%3Atitle+%3Ftitre%0D%0A%7D+ORDER+BY+ASC%28%3Ftitre%29%0D%0A&format=text%2Fhtml&timeout=0&debug=on):

```
SELECT * WHERE {
 ?s rdf:type <http://isidore.science/class/Collection>.
 ?s dcterms:title ?title
} ORDER BY ASC(?title)
```

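Such a query can also be sent to the endpoint from a script. The sketch below only builds the request URL, using the `query` and `format` parameters visible in the links of this section; the JSON results MIME type is an assumption about what the Virtuoso endpoint accepts, and the function name is hypothetical.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "https://isidore.science/sparql"

# Query listing every collection, as in the block above.
QUERY = """SELECT * WHERE {
 ?s rdf:type <http://isidore.science/class/Collection>.
 ?s dcterms:title ?title
} ORDER BY ASC(?title)"""


def build_sparql_url(query: str,
                     result_format: str = "application/sparql-results+json") -> str:
    """Encode a SPARQL query as a GET request URL for the endpoint.

    The default `result_format` is an assumption; "text/html" is the value
    used by the links in this documentation.
    """
    params = urlencode({"query": query, "format": result_format})
    return f"{SPARQL_ENDPOINT}?{params}"


print(build_sparql_url(QUERY))
```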
Please feel free to report any new source to us.

#### Can data deposited and documented in NAKALA be referenced by ISIDORE?

Yes, data deposited and documented in [NAKALA (the data repository for SSH by Huma-Num)](https://documentation.huma-num.fr/nakala/) can be made accessible in ISIDORE. NAKALA offers as standard the [OAI-PMH](https://en.wikipedia.org/wiki/Open_Archives_Initiative_Protocol_for_Metadata_Harvesting) interoperability protocol, which allows document metadata to be harvested and therefore referenced, enriched and indexed by ISIDORE.

However, referencing by OAI-PMH harvesting is not automatic for the moment, in particular to allow users to prepare and organize their data and metadata. To be referenced, simply send a request by email to <isidore-sources@huma-num.fr>.

#### How will scientific articles and images deposited in the HAL, HAL-SHS and MédiHAL open archives be accessible in ISIDORE?

All the files (PDF, illustrations, photographs, audio and video) deposited and documented in the open archive HAL, including HAL-SHS, as well as MédiHAL, are automatically referenced in ISIDORE and indexed at the level of their metadata. All these documents and their records are thus accessible through the various query interfaces of ISIDORE.

#### Can the data deposited in the Didómena (EHESS) warehouse be referenced by ISIDORE?

Yes, [Didómena](https://didomena.ehess.fr) (the research data warehouse of EHESS) offers OAI-PMH interoperability. Note that harvesting is not automatic. For your collection to be referenced, please send us its OAI-PMH access point via <isidore-sources@huma-num.fr>.

#### Can data deposited in Calames (ABES) be referenced by ISIDORE?

Yes, descriptions of archival holdings cataloged in [Calames](http://calames.abes.fr) (the catalog of archives and manuscripts of French university libraries) are indexed in ISIDORE. However, the EAD-XML standard used in Calames does not always allow optimal documentary indexing, mainly as regards the richness of the metadata. This is due to the way the EAD-XML standard encodes information across the levels of description of the collections.

#### Can the data deposited in the Data.sciencespo warehouse be referenced by ISIDORE?

Yes, [Data.sciencespo](https://data.sciencespo.fr) (Dataverse) offers OAI-PMH interoperability for the data deposited and documented in it. These data are harvested automatically by ISIDORE.

#### Can the data deposited in the COCOON platform be referenced by ISIDORE?

Yes, [the COCOON platform](https://cocoon.huma-num.fr) offers OAI-PMH interoperability for the data deposited and documented in it. This platform is harvested automatically by ISIDORE.

#### Can files and documents deposited in the European Zenodo platform be referenced by ISIDORE?

Yes, it is possible for ISIDORE to reference the files and documents deposited and documented on the [Zenodo](https://zenodo.org) platform.

The referencing is based on OAI-PMH harvesting of a set of files and data (and thus their metadata) corresponding to one or more "communities" identifiers in Zenodo (see https://developers.zenodo.org/#sets).
We can also group several Zenodo identifiers in the same ISIDORE collection, allowing the depositors of several corpora deposited in Zenodo to group them in ISIDORE to give them more visibility.

To add your Zenodo repositories to ISIDORE, [please send us the OAI-PMH URL](mailto:isidore-sources@huma-num.fr?subject=%22Je%20souhaiterai%20faire%20moissonner%20mes%20dépôts%20Zenodo%22)
of your repository (see <https://developers.zenodo.org/#oai-pmh>).

#### Can files and documents deposited in a *Gallica Marque Blanche* platform be referenced by ISIDORE?

Yes, the data deposited and documented in [*Gallica Marque Blanche*](https://www.bnf.fr/fr/gallica-marque-blanche) offer interoperability in OAI-PMH with a dedicated "Set".

#### Can the Omeka farm powered by INIST-CNRS be referenced by ISIDORE?

Yes, it is possible for ISIDORE to reference the files and documents deposited and documented on the [Omeka farm powered by INIST-CNRS](https://www.inist.fr/realisations/omeka-pour-des-bases-de-donnees-valorisees/).

## How do I get data referenced by ISIDORE?

There are several ways to get data and documents referenced by ISIDORE:

- Expose your data via [an XML stream of standardized metadata using the OAI-PMH protocol](#how-to-report-data-in-isidore-with-metadata-and-oai-pmh-protocol), with metadata in Dublin Core format. This method is suited to documentary databases, corpora, scientific archives and document/data libraries. As an example, [a tool such as Omeka (Classic or S) offers the OAI-PMH protocol via modules](#can-a-website-using-omeka-classic-and-omeka-s-be-referenced-by-isidore).
- Embed Dublin Core metadata in your web pages using RDFa. This method is suited to research program websites presenting document or data corpora, scientific blogs (except Hypotheses.org), and web pages in general.

These two methods are also often implemented by data publication tools (CMS, etc.), for example:

### Can a web site using Drupal be indexed by ISIDORE?

Yes, it is possible to have web pages generated by the Drupal CMS indexed by ISIDORE.
There are two ways to do this, depending on the nature of the content of your pages:

- Either via the OAI-PMH protocol, for which several Drupal modules exist; see [OAI-PMH for Drupal](https://www.drupal.org/search/site/OAI-PMH?f%5B0%5D=ss_meta_type%3Amodule).
- Or via a Dublin Core metadata structure embedded with RDFa in the web pages generated by Drupal, together with a sitemap.xml. An article dedicated to this way of proceeding is available at the above address.

### Can a website using Omeka Classic and Omeka-S be referenced by ISIDORE?

Yes, Omeka *Classic* and Omeka S offer modules to expose metadata according to the OAI-PMH protocol:

- Module for [Omeka S](https://omeka.org/s/modules/OaiPmhRepository/)
- Module for [Omeka Classic](https://omeka.org/classic/docs/Plugins/OaiPmhRepository/)


### How to report data in ISIDORE with metadata and OAI-PMH protocol?

To report your data in ISIDORE using the OAI-PMH protocol, you just have to:

- Prepare your data and metadata using the Dublin Core Element Set or Dublin Core Terms documentary vocabulary, depending on the level of precision you want, and make them accessible via [the OAI-PMH protocol](https://en.wikipedia.org/wiki/Open_Archives_Initiative_Protocol_for_Metadata_Harvesting);
- Organize and document the *Sets* in your OAI-PMH repository;
- Write to <isidore-sources@huma-num.fr> and give the address of the repository to Huma-Num.

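Once a repository is exposed, its content can be checked with a standard OAI-PMH request before contacting Huma-Num. The sketch below builds a `ListRecords` URL using the `verb`, `metadataPrefix` and `set` arguments defined by OAI-PMH 2.0; the repository address `https://example.org/oai` is a placeholder and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH 2.0 ListRecords request, optionally limited to one Set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder repository address; "OuvColl" is a Set identifier used for illustration.
print(list_records_url("https://example.org/oai", set_spec="OuvColl"))
```

Opening the resulting URL in a browser shows the records exactly as a harvester would receive them.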
#### Document sets in OAI-PMH: *Sets*

The OAI-PMH protocol makes it possible, through the creation of *Sets*, to bring together a coherent set of records whose scope makes sense from a scientific or editorial point of view and which is left to the discretion of the data producer.

It also makes it possible to define a hierarchy of *Sets* with an inheritance mechanism, by specifying in the *Set* identifier (`setSpec`) the name of the parent *Set* and the child *Set*, separated by the `:` character. ISIDORE is able to use these *Sets* to limit harvesting to a set of records or to differentiate between different data sources within the same warehouse.
The producer will therefore have to specify the harvesting methods that seem appropriate in order to make the most of their resources within ISIDORE.
To do this, they must indicate the *Set* or *Sets* concerned, or a rule that identifies the *Sets* to be taken into account.

The *Sets* can present metadata of their own, in the Dublin Core Element Set. For example:

```xml
<set>
 <setSpec>OuvColl</setSpec>
 <setName>OuvColl</setName>
 <setDescription>
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
   <dc:description>Research works distributed on Cairn.info</dc:description>
  </oai_dc:dc>
 </setDescription>
</set>
```

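The hierarchy encoded in a `setSpec` can be recovered by splitting it on the `:` separator. The sketch below shows one way a harvester could decide whether a record falls under a given *Set*; the function names are hypothetical and this is not ISIDORE's own code.

```python
from typing import List


def set_ancestry(set_spec: str) -> List[str]:
    """Return the Set and all its ancestors, from the top-level Set down."""
    parts = set_spec.split(":")
    return [":".join(parts[: i + 1]) for i in range(len(parts))]


def in_set(record_sets: List[str], wanted: str) -> bool:
    """True if a record belongs to `wanted` directly or through a child Set."""
    return any(wanted in set_ancestry(spec) for spec in record_sets)


print(set_ancestry("SDV:SA:AEP"))
print(in_set(["SHS:ECO", "CIRAD"], "SHS"))
```

A harvesting rule such as "everything under `SHS`" can then be expressed as a single call to `in_set`.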
#### Records in OAI-PMH or *Records*

In the ISIDORE framework, each OAI-PMH "record" corresponds to a document.
The ISIDORE harvester thus exploits the metadata described according to the application profile defined by the Open Archives Initiative for the Dublin Core Element Set (also known as "simple" Dublin Core).
In addition, the harvester also collects the full-text document(s) whose URLs (beginning with `https://` or `http://`) are specified in the `<dc:identifier>` element.

We recommend that data producers provide records that are as metadata-rich as possible, since relevance ranking in ISIDORE favors the richest possible metadata. Fields such as:

```xml
<dcterms:description>
<dcterms:creator>
<dcterms:date>
```

are essential.

443
##### Example of a complete record according to the OAI-PMH protocol:
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459

```xml
<record>
 <header>
  <identifier>oai:halshs.archives-ouvertes.fr:halshs-00514304</identifier>
  <datestamp>2010-09-02T11:06:50Z</datestamp>
  <setSpec>halshs</setSpec>
  <setSpec>SHS:ECO</setSpec>
  <setSpec>SDV:BIO</setSpec>
  <setSpec>INFO:INFO_BT</setSpec>
  <setSpec>SDV:SA:AEP</setSpec>
  <setSpec>SDV:SA:STA</setSpec>
  <setSpec>CIRAD</setSpec>
  <setSpec>SHS</setSpec>
 </header>
 <metadata>
460
  <oai_dc:dc xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
461
462
463
464
465
  <dc:identifier>http://halshs.archives-ouvertes.fr/halshs-00514304/en/ </dc:identifier>
  <dc:identifier>http://halshs.archives-ouvertes.fr/docs/00/51/43/98/PDF/Regulation_GMO_pprint.pdf</dc:identifier>
  <dc:identifier>http://halshs.archives-ouvertes.fr/docs/00/51/43/98/PDF/ppt_nocmt_broader_regulation.pdf </dc:identifier>
  <dc:title>Broadening the scope of regulation: a prerequisite for a positive contribution of transgenic crop useto sustainable development</dc:title>
  <dc:creator>Fok, Michel</dc:creator>
466
  <dc:subject>[SHS:ECO] Humanities and Social Sciences/Economy and finances</dc:subject>
467
468
  <dc:subject>[SDV:BIO] Life Sciences/Biotechnology</dc:subject>
  <dc:subject>[INFO:INFO_BT] Computer Science/Biotechnology</dc:subject>
469
  <dc:subject>[SDV:SA:AEP] Life Sciences/Agricultural sciences/Agriculture, economy and politics</dc:subject>
470
471
472
473
474
475
476
477
478
479
  <dc:subject>[SDV:SA:STA] Life Sciences/Agricultural sciences/Sciences and technics of agriculture</dc:subject>
  <dc:subject>regulation</dc:subject>
  <dc:subject>coordination</dc:subject>
  <dc:subject>GMO</dc:subject>
  <dc:subject>biotechnology</dc:subject>
  <dc:subject>seed price</dc:subject>
  <dc:subject>research</dc:subject>
  <dc:subject>weed resistance</dc:subject>
  <dc:subject>pest complex shift</dc:subject>
  <dc:description>Ex-ante regulation of transgenic crop use generally prevails, before the authorization of commercial release.This kind of regulation addresses the concerns of biosafety and coexistence, under pressure of pros and/or cons of GMO. After fifteen years of large scale use of transgenic crops (notablysoybean and cotton) in various countries (USA, China, Brasil, India...), ecological and economic phenomena are observed and which could threaten the sustainable use of transgenic varieties. I advocate that the regulation scope must be extended so as to a) promote a systemic and coordinatedapproach of transgenic crop use, b) ensure seed purity with regard to the transgenic trait, c) maintain research on non-transgenic varieties, and d) warrant fair pricing of transgenic seeds.</dc:description>
  <dc:coverage>Montpellier</dc:coverage>
  <dc:coverage>France</dc:coverage>
  <dc:date>2010-08-29</dc:date>
  <dc:language>English</dc:language>
  <dc:type>proceeding with peer review</dc:type>
  <dc:source>Proceedings of Agro2010, the XIth ESA Congress</dc:source>
  <dc:source>Agro2010, the XIth ESA Congress</dc:source>
 </oai_dc:dc>
</metadata>
</record>
```
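
As an illustration of how a consumer might read such a record, the following minimal Python sketch parses a shortened copy of the `oai_dc` block above with the standard library and groups repeated elements (for example several `dc:subject`) into lists. The function name `dublin_core_fields` and the shortened sample are assumptions made for this example; this is not ISIDORE's actual harvester code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the Dublin Core Element Set, as declared in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the record shown above (illustrative sample only).
SAMPLE = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Fok, Michel</dc:creator>
  <dc:subject>regulation</dc:subject>
  <dc:subject>GMO</dc:subject>
  <dc:date>2010-08-29</dc:date>
  <dc:language>English</dc:language>
</oai_dc:dc>
"""


def dublin_core_fields(xml_text):
    """Return a dict mapping each dc element name to the list of its values."""
    root = ET.fromstring(xml_text)
    fields = defaultdict(list)
    for element in root:
        # ElementTree writes qualified tags as "{namespace}localname".
        if element.tag.startswith("{" + DC_NS + "}") and element.text:
            local_name = element.tag.split("}", 1)[1]
            fields[local_name].append(element.text.strip())
    return dict(fields)


record = dublin_core_fields(SAMPLE)
print(record["subject"])
print(record["date"])
```

Keeping every value in a list reflects the fact that any Dublin Core element may be repeated within a record.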

In addition to this description in the *Dublin Core Element Set*, each record can be described in one or more other metadata formats, the choice of which is left to the administrator of the OAI-PMH repository.

The ISIDORE harvester can use the *Dublin Core Terms* format and any XML schema that exposes the full text (including TEI or EAD), which improves indexing.

The data producer must scrupulously respect the specifications of the OAI-PMH protocol version 2.0, in particular as regards:

- Strict handling of the "datestamp" values in the *records*, so that updates stay synchronized between the producer and ISIDORE;
- Proper management of deleted records ([details in the OAI-PMH protocol documentation](http://www.openarchives.org/OAI/openarchivesprotocol.html#DeletedRecords));
- For a publisher's repository, or one of significant size, access to the OAI-PMH repository from the IP addresses of ISIDORE's OAI-PMH harvesters (ISIDORE reports its harvesting to the producer's IT department).
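
To show why datestamps and deleted records matter for synchronization, here is a minimal Python sketch of what a harvester could do with the record headers of a `ListRecords` response. The sample response is simplified (a real response also carries `responseDate` and `request` elements), and the `oai:example.org:…` identifiers and the `plan_sync` function are hypothetical; ISIDORE's own harvester may work differently.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's "{uri}" notation.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Simplified, hypothetical response: one live record and one deleted record.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2018-03-05</datestamp>
      </header>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:2</identifier>
        <datestamp>2018-03-07</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def plan_sync(response_xml):
    """Split harvested headers into records to update and records to delete."""
    root = ET.fromstring(response_xml)
    to_update, to_delete, latest = [], [], None
    for header in root.iter(OAI + "header"):
        identifier = header.findtext(OAI + "identifier")
        datestamp = header.findtext(OAI + "datestamp")
        if header.get("status") == "deleted":
            to_delete.append(identifier)
        else:
            to_update.append(identifier)
        # Datestamps of the same granularity sort chronologically as strings.
        if latest is None or datestamp > latest:
            latest = datestamp
    return to_update, to_delete, latest


to_update, to_delete, next_from = plan_sync(SAMPLE_RESPONSE)
print(to_update, to_delete, next_from)
```

The latest datestamp seen can then serve as the `from` argument of the next selective harvest, which only works if the repository updates datestamps reliably and keeps track of deletions.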

We advise producers to regularly validate the compliance of their repository using, for example, the [tools of the Open Archives Initiative](https://www.openarchives.org/pmh/tools/). Finally, we advise data producers to contact the Huma-Num team with any questions.

### How to report data in ISIDORE with RDFa metadata?

RDFa expresses a metadata structure that follows the principles of the Semantic Web (RDF stands for *[Resource Description Framework](https://en.wikipedia.org/wiki/Resource_Description_Framework)*) directly in the HTML code of web pages. The "a" in RDFa stands for "in attributes", i.e. the metadata is carried by attributes within the HTML code.

How can the metadata of a web page be expressed very simply using the [RDFa syntax](https://tcuvelier.developpez.com/tutoriels/web-semantique/rdfa/introduction/)? Take, for example, a blog post published with WordPress. While there are [plugins to do this](https://wordpress.org/plugins/search/RDFa/), they can become obsolete, which makes them difficult to maintain over time. Another solution is to implement RDFa directly in the HTML code of the WordPress theme you have chosen. To keep this easy and manageable over time, the simplest way is to place `<meta>` tags containing the metadata in the HTML header.

Expressing metadata according to the RDF model via the RDFa syntax allows machines (mainly search engines and indexers) to process information better because it becomes more explicit: for a machine, a string can be a title or a summary; if you don't tell the machine which one it is, it will not guess. So, at the very least, it is possible to use these tags to define an RDF structure that holds minimal metadata, for example with the Dublin Core Element Set.


#### How to do it practically?

First of all, the DOCTYPE of the web page must indicate that the page contains information using the RDF model, so the DOCTYPE will be:

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
```

The `<html>` tag must declare the addresses of the ontologies (via their *XML namespaces*) that are used to "type" the information. RDFa, which places metadata in the Semantic Web, requires at least the RDF and RDF Schema ontologies and the Dublin Core Element Set (dc). To refine the metadata, it is also possible to use the Dublin Core Terms (dcterms):

```xml
<html xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/">
```

To encode more information, it is possible to declare additional ontologies:

```xml
<html
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:cc="http://creativecommons.org/ns#">
```

In the example above, [foaf](http://www.foaf-project.org/) is used to encode information about a person or an object described by the metadata. The [CC](https://creativecommons.org) ontology is used to indicate which *Creative Commons* license applies to this content.
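
To make the role of these prefixes concrete: a prefixed name such as `dc:title` is shorthand for the namespace address followed by the local name. The short Python sketch below expands prefixed names using a table copied from the declarations above. The `expand` function is a hypothetical helper written for this illustration, not part of any RDFa library.

```python
# Prefix-to-namespace table copied from the xmlns declarations above.
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "cc": "http://creativecommons.org/ns#",
}


def expand(prefixed_name, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'dc:title' into a full property URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    if not local_name or prefix not in namespaces:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return namespaces[prefix] + local_name


print(expand("dc:title"))
print(expand("foaf:name"))
```

A prefix that is not declared in the `<html>` tag cannot be resolved, which is why every vocabulary used in the page must be listed there.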

The RDFa structure is expressed through tags in the `<head>` header of the HTML page. As a first step, a `<link>` tag defines the digital object to which the RDF-encoded information will be attached:

```xml
<link rel="dc:identifier" href="http://monblog.com/monbillet.html" />
```

This tag defines a container for the information that we are going to provide using the `<meta>` tags. This container is identified by a URI which is also a URL, i.e. the address of the page on the web.


The `<meta>` tags then define a set of metadata, which in our case is descriptive information about the blog post's web page:

```xml
<meta property="dc:title" content="The title of my post" />
<meta property="dc:creator" content="First name Last name of author 1" />
<meta property="dc:creator" content="First name Last name of author 2" />
<meta property="dcterms:created" content="2011-01-27" />
<meta property="dcterms:abstract" content="A descriptive summary of my page's content, in French" xml:lang="fr" />
<meta property="dcterms:abstract" content="A summary in English" xml:lang="en" />
<meta property="dc:subject" content="keyword 3" />
<meta property="dc:type" content="blog post" />
<meta property="dc:format" content="text/html" />
<meta property="dc:relation" content="A link to a complementary web page" />
```
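
To check locally which property/value pairs such a header exposes, a small Python sketch using the standard `html.parser` module can list them. The `MetaCollector` class and the sample header are assumptions made for this illustration. It is only a rough approximation: a real RDFa processor also resolves prefixes and handles many other attributes, and it does not reproduce ISIDORE's own extraction.

```python
from html.parser import HTMLParser

# Sample header reusing the tags shown above (shortened).
SAMPLE_HEAD = """
<head>
<link rel="dc:identifier" href="http://monblog.com/monbillet.html" />
<meta property="dc:title" content="The title of my post" />
<meta property="dc:creator" content="First name Last name of author 1" />
<meta property="dc:creator" content="First name Last name of author 2" />
<meta property="dcterms:created" content="2011-01-27" />
</head>
"""


class MetaCollector(HTMLParser):
    """Collect the dc:identifier link and the (property, content) pairs of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.identifier = None
        self.properties = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        attributes = dict(attrs)
        if tag == "link" and attributes.get("rel") == "dc:identifier":
            self.identifier = attributes.get("href")
        elif tag == "meta" and "property" in attributes:
            self.properties.append((attributes["property"], attributes.get("content")))


collector = MetaCollector()
collector.feed(SAMPLE_HEAD)
collector.close()
print(collector.identifier)
for name, value in collector.properties:
    print(name, "=", value)
```

Storing the pairs in a list rather than a dictionary keeps repeated properties, such as the two `dc:creator` values.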

Depending on the nature of the web page's content, it is of course possible to encode more precise, more refined and more complete information, for example by using the DC Terms vocabulary.

The DC Terms make it possible, for example, to include a precisely formatted bibliographic reference for the content:

```xml
<meta property="dcterms:bibliographicCitation" content="Put a bibliographic reference here" />
```

It is also possible to describe the entire text of a web page [using a property of the SIOC vocabulary](http://www.lespetitescases.net/rdfaiser-votre-blog-2-la-pratique).

Web pages can also be linked together (to define a corpus of authors, for example) by using the DC Terms property `dcterms:isPartOf`:

```xml
<meta property="dcterms:isPartOf" content="URL of another web page" />
```

#### Creating the Sitemap

Once the RDFa encoding has been added to the HTML pages, you still need to create an XML Sitemap file listing the pages you want ISIDORE to harvest, and submit the URL of this sitemap:

```xml
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
	<url>
		<loc>http://monsiteweb.com/</loc>
		<lastmod>2018-01-01</lastmod>
		<changefreq>monthly</changefreq>
		<priority>1.0</priority>
	</url>
	<url>
		<loc>http://monsiteweb.com/page1/</loc>
		<lastmod>2018-03-05</lastmod>
		<changefreq>weekly</changefreq>
		<priority>0.5</priority>
	</url>
</urlset>
```
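
When a site has many pages, such a file is usually generated rather than written by hand. The following Python sketch builds a sitemap from a list of pages using only the standard library. The `build_sitemap` function and the `PAGES` list are hypothetical and reuse the example addresses above; the optional `changefreq` and `priority` elements are left out for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# (URL, last modification date) pairs; hypothetical pages reusing the example above.
PAGES = [
    ("http://monsiteweb.com/", "2018-01-01"),
    ("http://monsiteweb.com/page1/", "2018-03-05"),
]


def build_sitemap(pages):
    """Return a sitemap document, as a string, for the given (url, lastmod) pairs."""
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for location, last_modified in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = location
        ET.SubElement(url, "{%s}lastmod" % SITEMAP_NS).text = last_modified
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```

The resulting string can be written to a file such as `sitemap.xml` at the root of the site (the file name is a common convention, not a requirement stated here).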

You can test how ISIDORE will extract your RDFa metadata using the "ISIDORE on demand" application, available at <https://rd.isidore.science/ondemand/fr/rdfa.html>.

## ISIDORE perimeter

### Why are some items not found in ISIDORE?

If you do not find all of your scientific output in [ISIDORE](https://isidore.science), there may be several explanations. Your articles may be published in journals that are not electronic, or that do not make their articles available even long after publication. Since its creation, [ISIDORE](https://isidore.science) has favored open access, because articles available in open access can be indexed more thoroughly. Many electronic journals have made this choice through portals such as OpenEdition Journals (formerly Revues.org), Érudit, Persée, Cairn.info, Redalyc and OApen, and articles from these journals are therefore collected and indexed by [ISIDORE](https://isidore.science).

It is also possible that your articles are published online, but on a website rather than an electronic publishing platform, or on a platform that does not allow indexing via the standard protocol (see the question and answer on OAI-PMH).

Other journals make their articles available, but only after an embargo period. In this case, [ISIDORE](https://isidore.science) indexes only the metadata of the article. If you connect via your university library, documentation center or BibCNRS, you may still have access to these articles.

The collections indexed by [ISIDORE](https://isidore.science) can be looked up by using the search engine itself and indicating that you want to search the collections.

It is also possible that your article is published as an image-only PDF, in which case [ISIDORE](https://isidore.science) can index its metadata but not its full text.

Lastly, it is possible that some of your articles are published in journals that are not classified as SSH.

In all these cases, you can deposit your articles in an open archive such as HAL (HAL-SHS in particular), which is also indexed by [ISIDORE](https://isidore.science), or contact your university library or documentation center.

If none of these cases matches your situation and you think there may be an error, you can e-mail us at isidore@huma-num.fr.

### Why are some books/chapters of books not reported in ISIDORE?

ISIDORE can identify documents of the type "book": more than 500,000 books and book chapters are reported in ISIDORE.

It should be noted that relatively few platforms publish books online in open access. In SSH, ISIDORE indexes, for example, the contents of book platforms such as:

- [OpenEdition Books](https://isidore.science/search/?collection=10670/3.szxq6s) (reported at the chapter level);
- [Scielo Books](https://isidore.science/search/?collection=10670/3.7oraz1) (Brazil);
- [OApen](https://isidore.science/search/?collection=10670/3.pwofj8) (Netherlands);
- [Érudit](https://isidore.science/s/collection?q=erudit) (Canada);
- ...

In addition, you can, in agreement with your publisher, deposit your book or book chapters in the open archive [HAL-SHS](https://halshs.archives-ouvertes.fr). They will then be indexed by ISIDORE as part of the indexing of HAL-SHS and recognized as books or book chapters.

### Why are some databases not reported in ISIDORE?

Harvesting by ISIDORE requires the metadata (documentary, scientific, etc.) to be exposed in a standardized way, either through the OAI-PMH protocol or through an XML Sitemap with RDFa metadata (see above).

If you know of any databases that are not present in ISIDORE, please let us know so that we can check with their publishers or data producers.

## ISIDORE training courses

Here we list training courses, functional presentations and online self-training courses on the use of ISIDORE. Do not hesitate to let us know about any training session you would like to organize:

- The *Urfist Méditerranée* offers [a new e-learning course on ISIDORE (in French only)](https://urfist.univ-cotedazur.fr/nouvelle-formation-en-ligne-une-initiation-a-isidore/) (March 2021)
- ["Isidore, my personal research assistant"](https://ig.hypotheses.org/2215) by Johanna Daniel (April 2020)