TransWikia.com

Given a DOI, how can I programmatically obtain all the author affiliations?

Academia Asked on October 21, 2021

Given a DOI, how can I programmatically obtain all the author affiliations? The coding part isn’t the issue, but finding a proper database/API is.

E.g. for DOI 10.1186/s12920-019-0598-0, the author affiliations are:

  1. Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9 Canada
  2. Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9 Canada
  3. School of Mathematics and Statistics, Hainan Normal University, Haikou, 571158 China

2 Answers

pybliometrics (https://github.com/pybliometrics-dev/pybliometrics): the example on its GitHub page is quite close to what you want to do.

Answered by BND on October 21, 2021

First of all, the main sources of citation data are:

  • Proprietary data sources:
    • Google Scholar
    • Scopus
    • Web of Science (WoS)
  • Open access data:
    • Crossref
    • MEDLINE (focusing on medical papers)

Some papers compare the comprehensiveness of these different sources, e.g. see {1,2}.


To extract the author affiliations given a DOI, there are a few options (search for "affiliations" on the links below):

  1. https://support.datacite.org/docs/api-get-doi
  2. https://www.nlm.nih.gov/bsd/licensee/elements_descriptions.html (MEDLINE database https://www.nlm.nih.gov/bsd/medline.html)
  3. https://github.com/CrossRef/rest-api-doc suggested by Anyon.

For option 3 (CrossRef API), one can use the https://github.com/CrossRef/rest-api-doc API via the Python library https://gitlab.com/crossref/crossref_commons_py:

# If testing in Docker
docker run --interactive --tty ubuntu:18.04 bash
apt update; apt install -y git nano wget htop python3 python3-pip unzip

# Requirements
pip3 install crossref-commons

# Python code (run inside a python3 session)
import crossref_commons.retrieval
crossref_commons.retrieval.get_publication_as_json('10.5621/sciefictstud.40.2.0382')  # affiliations are empty
crossref_commons.retrieval.get_publication_as_json('10.1148/radiol.2018180887')       # affiliations are present

though it seems that quite often authors have no affiliations on CrossRef.
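
To turn the returned record into a plain list of affiliations, something like the sketch below may help. It assumes the record follows the Crossref REST API layout, where each entry of `author` carries `given`, `family` and an `affiliation` list of `{"name": ...}` objects. The function name `affiliations_by_author` and the sample record are made up for illustration; the sample is not real data.

```python
from typing import List, Tuple


def affiliations_by_author(work: dict) -> List[Tuple[str, List[str]]]:
    """Return (author name, affiliation names) pairs from a Crossref work record."""
    # Accept either a full API response or just its "message" part.
    record = work.get("message", work)
    result = []
    for author in record.get("author", []):
        # Personal authors have given/family; organisations may only have "name".
        name = " ".join(
            part for part in (author.get("given"), author.get("family")) if part
        ) or author.get("name", "(unnamed)")
        affiliations = [
            aff["name"] for aff in author.get("affiliation", []) if aff.get("name")
        ]
        result.append((name, affiliations))
    return result


# Hand-written sample in the assumed shape of a Crossref record (not real data).
sample = {
    "author": [
        {"given": "Ada", "family": "Example",
         "affiliation": [{"name": "Example University"}]},
        {"given": "Bob", "family": "Sample", "affiliation": []},
    ]
}
print(affiliations_by_author(sample))
```

With crossref-commons installed, the same function could presumably be applied to the result of crossref_commons.retrieval.get_publication_as_json(doi). An empty list for an author means CrossRef simply has no affiliation on record for them.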

My guess is that MEDLINE (option 2) has more thorough metadata. I base this on what I see on the PubMed website, which relies on the MEDLINE database: for example, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6936069/ lists the author affiliations, whereas crossref_commons.retrieval.get_publication_as_json('10.1186/s12920-019-0598-0') does not, even though DOI 10.1186/s12920-019-0598-0 and PMC6936069 are the same article. Anyon's comment also questions how comprehensive CrossRef's author affiliations field is. The MEDLINE database can either be downloaded or accessed via an API (https://www.ncbi.nlm.nih.gov/home/develop/api/). See https://stackoverflow.com/a/62974197/395857 for how to access the MEDLINE database in Python.
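
Once a PubMed record has been fetched as XML (for instance through the NCBI E-utilities, after looking up the article by its DOI), the affiliations can be pulled out with the standard library. The sketch below assumes the usual PubMed XML layout, in which each `Author` element holds `AffiliationInfo/Affiliation` children. The function name `pubmed_affiliations` and the sample XML are invented for illustration; the fetching step is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def pubmed_affiliations(xml_text: str) -> List[Tuple[str, List[str]]]:
    """Return (author name, affiliations) pairs from a PubMed XML document."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for author in root.iter("Author"):
        # Individual authors have ForeName/LastName; groups use CollectiveName.
        name = " ".join(
            text
            for text in (author.findtext("ForeName"), author.findtext("LastName"))
            if text
        ) or (author.findtext("CollectiveName") or "(unnamed)")
        affiliations = [
            aff.text.strip()
            for aff in author.findall("AffiliationInfo/Affiliation")
            if aff.text
        ]
        result.append((name, affiliations))
    return result


# Hand-written sample in the assumed PubMed XML shape (not real data).
SAMPLE_XML = """
<PubmedArticleSet><PubmedArticle><MedlineCitation><Article><AuthorList>
  <Author><LastName>Example</LastName><ForeName>Ada</ForeName>
    <AffiliationInfo><Affiliation>Example University, Nowhere.</Affiliation></AffiliationInfo>
  </Author>
  <Author><LastName>Sample</LastName><ForeName>Bob</ForeName></Author>
</AuthorList></Article></MedlineCitation></PubmedArticle></PubmedArticleSet>
"""
print(pubmed_affiliations(SAMPLE_XML))
```

Parsing is kept separate from downloading so that the same function works on both API responses and the bulk MEDLINE files.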


If the affiliation cannot be found in the metadata and the PDF can be obtained from the DOI, one could use PDF-to-text extraction programs for scientific papers, such as:


To test pybliometrics, which BND refers to in their answer:

# If testing in Docker
docker run --interactive --tty ubuntu:18.04 bash
apt update; apt install -y git nano wget htop python3 python3-pip unzip

# Install pybliometrics
pip3 install pybliometrics

# Python code (run inside a python3 session)
from pybliometrics.scopus import AbstractRetrieval, AuthorRetrieval
from pybliometrics.scopus.utils import config

# Configure the Elsevier API key obtained at https://dev.elsevier.com/myapikey.html
config['Authentication']['APIKey'] = ''

# Retrieve the current affiliation of the first author
ab = AbstractRetrieval("10.1016/j.softx.2019.100263")
au1 = AuthorRetrieval(ab.authors[0].auid)
print(au1.affiliation_current)

Unfortunately pybliometrics relies on Elsevier Scopus's API, which isn't free: some institutions have subscribed to it, but fewer and fewer are willing to feed the Elsevier sharks.


References:

Answered by Franck Dernoncourt on October 21, 2021
