Persistent Identifier (PID)

In research data management (RDM), the term “persistent identifier (PID)” refers to a permanent digital identifier consisting of alphanumeric characters that is assigned to a data set or other digital object and refers directly to it. By entering the PID in a search engine, interested parties can reliably find the corresponding object. In addition, any amount of metadata can be linked to the persistent identifier and thus to the associated data set.

Duration: 6:50 mins

Content: The following short video from RWTH Aachen University provides a good introduction to the topic of persistent identifiers (PID). You will learn what PIDs are and how they are used in the context of research data management.

Dominik Schmitz, Daniela Hausen, Ute Trautwein-Bruns (2020): Persistent Identifier (PID). RWTH Aachen University. Available at DOI: 10.18154/RWTH-2019-10059.

License: CC BY 4.0, except avatars/illustrations (Shutterstock’s standard image license)

Transcript

Welcome to our series on research data management. Today's topic is persistent identifiers. This video explains what a persistent identifier is and what it can be used for. So let's start.

We refer again to our fake science example of squirrel research. Luise Leader is heading a project on squirrel research in Germany and Great Britain, comparing the data to her theoretical model. As a first attempt at working with her data she has chosen a small geographical area, the Eifel National Park. It is a little bit tedious but of course possible to select the control points from Germany that belong to this geographical area. And she's plotting her results. Well, there is something strange in this data. Of course there are more squirrels in summer than in winter, but where do all the squirrels in Rurberg come from from May onwards? That's nothing that she can easily explain. So she simply calls Frank Forscher, her partner in Germany. And he immediately knows what the situation is about: "Well, in May we put up another control point. That's the point. It doesn't show up in your data, which we've aggregated, but it's clear in my data." So that's clear from here onwards. But the next question Luise has is: "Well, I want to stick with my time period and geographical area. So tell me which of the stations, Rurberg North or South, was the one that was already available in April?" Frank can't answer this question so easily. He has to contact the rangers in the National Park.

In the meantime, the two of them talk about a recent paper that Frank Forscher has on his desk. It fits perfectly: “Squirrels in Germany. Why so many?”. Frank simply mentions the DOI of this journal article to Luise and she can immediately find the article herself. And indeed, the article is of interest to her. But much more interesting is: who is Josiah Carberry? Could that be a potential collaborator? Fortunately he has noted his ORCID iD within the paper. From there it is quite easy for Luise to find out his current affiliation and to get in contact with him. So they have a great telephone call and discuss some issues of squirrel research even though Josiah Carberry is only interested in red-brown squirrels.

Now Luise starts to think: "Well, would a persistent identifier, such as the DOI for journal articles or the ORCID for researchers, have helped me with the confusion within this data project?" And indeed it could. If we replaced the names of the locations of the control points by some kind of identifier, then we would have the chance to detect quite easily which of the stations, Rurberg North or South in the earlier slides, were available in April and in May. And further on we could add any additional information to some separate table that lists the ID and then possibly the geographic location by latitude and longitude, camera version, things like that. And if that had been available, it would have been much easier for Luise Leader to select the control points that belong to the geographical area of Eifel National Park.
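The separate table described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the IDs (`CP-0001` and so on), the coordinates, the camera versions, and the start dates are all invented, and the video does not say which Rurberg station was already running in April.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ControlPoint:
    """Supplementary information kept once per station, keyed by its ID."""
    name: str
    latitude: float
    longitude: float
    camera_version: str
    active_since: date


# Hypothetical lookup table: all IDs, coordinates, versions and dates are invented.
CONTROL_POINTS = {
    "CP-0001": ControlPoint("Rurberg North", 50.62, 6.38, "v1", date(2019, 1, 1)),
    "CP-0002": ControlPoint("Rurberg South", 50.60, 6.38, "v2", date(2019, 5, 1)),
    "CP-0003": ControlPoint("Elsewhere", 52.50, 13.40, "v1", date(2019, 1, 1)),
}


def active_on(day):
    """Return the IDs of all control points already operating on `day`."""
    return sorted(pid for pid, cp in CONTROL_POINTS.items() if cp.active_since <= day)


def inside_box(south, north, west, east):
    """Return the IDs of all control points inside a latitude/longitude box."""
    return sorted(
        pid for pid, cp in CONTROL_POINTS.items()
        if south <= cp.latitude <= north and west <= cp.longitude <= east
    )
```

With such a table, the measurement data only needs to carry the ID. Questions like "which stations existed in April?" or "which stations lie in this area?" become simple lookups, for example `active_on(date(2019, 4, 15))` or `inside_box(50.5, 50.7, 6.3, 6.5)`.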

Now, such an ID is something that every researcher can give to data that he or she produces. Now, what makes an ID persistent? If there's some central or at least community-based registration that takes care of these IDs, ensures that there's no duplication and ensures that the information is durably and reliably available, then we arrive at a persistent identifier. For example, the ePIC consortium is providing identifiers for data, even before data is published.

Now, let's summarize. What is a PID? A persistent identifier is a globally unique and persistent identification of a resource, which is not necessarily digital. What you need is some centralized, worldwide or at least community-wide registration: some organization that takes care of the identifiers and also of resolving any reference to a persistent identifier.
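The two duties of such a registration, keeping identifiers unique and resolving them, can be illustrated with a toy in-memory registry. This is a minimal sketch, not how any real PID service is implemented: the class `PidRegistry`, the prefix, and the example location are hypothetical, and a real registry would add durable storage and governance on top.

```python
class PidRegistry:
    """Toy in-memory registry; a real service would add durable storage."""

    def __init__(self, prefix):
        self.prefix = prefix      # hypothetical namespace of the registering body
        self._locations = {}      # PID -> current location of the resource

    def register(self, suffix, location):
        """Assign a new PID; refuse a suffix that is already taken."""
        pid = f"{self.prefix}/{suffix}"
        if pid in self._locations:
            raise ValueError(f"{pid} is already registered")
        self._locations[pid] = location
        return pid

    def resolve(self, pid):
        """Return the location currently recorded for a PID."""
        return self._locations[pid]
```

Registering `"squirrels-2019"` under the prefix `"example"` would yield the identifier `example/squirrels-2019`, and a second attempt to register the same suffix is rejected, which is the "no duplication" guarantee in miniature.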

If you have that available, you can refer to your data, for example from within your publication. You can name it to your funders. You can even use it to describe more complex structures within your data - for example you have different versions of your data available and relate them via PIDs. Or you have even more complex structures, where you have a hierarchy of data elements that refer to each other via persistent identifiers. That's for example the case with the climate report. Since such a persistent identifier resolves with only one click, in general it is very fast and easy to arrive at the place where the data is stored, even if it is not yet accessible.

Further on, we avoid so-called broken links. This is the case because we decouple the reference from the storage location, so even if the data has been moved from the researcher's laptop to the institutional archive, we can still find the data since the reference stays the same. And finally, you can associate any additional supplementary information with the data simply by adding it to some record, as we have done with the control points or as is done with authors, title, and year for a DOI.
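The decoupling of reference and storage location can be sketched as follows. Everything here is an assumption made for illustration: the identifier `example/squirrels-2019`, both locations, and the metadata values are invented, and a plain dictionary stands in for whatever record a real resolver keeps.

```python
# Hypothetical record store: PID -> current location plus descriptive metadata.
records = {
    "example/squirrels-2019": {
        "location": "file:///home/luise/squirrels.csv",
        "metadata": {"title": "Squirrel counts", "year": 2019},
    }
}


def resolve(pid):
    """Look up where the resource behind a PID is stored right now."""
    return records[pid]["location"]


def relocate(pid, new_location):
    """Record a move of the data; the PID itself does not change."""
    records[pid]["location"] = new_location


def annotate(pid, **fields):
    """Attach supplementary information to the record behind a PID."""
    records[pid]["metadata"].update(fields)
```

A publication that cites `example/squirrels-2019` keeps working after `relocate(...)` moves the data to an archive, because only the record changes and the citation never contained the storage location in the first place.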

If you have any more questions on persistent identifiers or need assistance in using them, please contact us via the service desk.

Citation

FAIR Data Austria (2021). “Persistent Identifier (PID)”. In: Research Data Management Open Educational Resources Collection. (https://fair-office.at/index.php/pid/?lang=en)

License: CC BY 4.0 unless otherwise stated.
