TransWikia.com

NCT search in PubMed via Entrez (python)

Bioinformatics Asked by Nutarelli Federico on December 5, 2020

This is my first post here, so please be patient.
I am currently working with Entrez (Biopython) to retrieve the number of articles for a given disease/indication. My data provide both the indication at ATC level 3 and the National Clinical Trial (NCT) identifier. Free-text searches are ambiguous (for instance, "short term insomnia" returns different results than "short-term insomnia"), so I would like to search either by MeSH term or by NCT id. I would also like to restrict the search to the years 2004 to 2013.
To summarize: the input is an NCT id (used as a secondary key, I guess); the output is the number of articles for that NCT id in 2004, in 2005, ..., and in 2013.
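To make the two search options concrete, here is a minimal sketch of how the query strings could be built. The helper name `build_term` is hypothetical, and "Acne Vulgaris" is only an illustrative MeSH heading; the field tags `[si]` (secondary source ID), `[mh]` (MeSH terms) and `[dp]` (publication date) are standard PubMed tags, and the `year:year[dp]` range is an alternative to passing `mindate`/`maxdate`.

```python
def build_term(value, field, year_from=None, year_to=None):
    """Build a PubMed query string for one field-tagged value.

    `field` is a PubMed field tag such as "si" or "mh".
    Values containing spaces are quoted so they are searched as a phrase.
    """
    if " " in value:
        term = f'"{value}"[{field}]'
    else:
        term = f"{value}[{field}]"
    # Optionally restrict to a publication-date range, e.g. 2004:2013[dp].
    if year_from is not None and year_to is not None:
        term += f" AND {year_from}:{year_to}[dp]"
    return term


# Example query strings (no network access needed to build them).
print(build_term("NCT00714714", "si", 2008, 2008))
print(build_term("Acne Vulgaris", "mh", 2004, 2013))
```

The resulting strings can be passed as the `term` argument of `Entrez.esearch`.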

For now I am focusing on the safest approach, which is to use NCT ids as secondary keys, but my code (shown below) does not seem to work as expected:

from Bio import Entrez

# NCBI asks for a contact address; set it once rather than on every iteration.
Entrez.email = "your.email@example.com"  # placeholder; use your own address

# The NCT ids below are only examples; the real list is longer.
id_list = ["NCT00714714[SI]", "NCT94839294[SI]", "NCT00714584[SI]"]
years = [2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013]

search_results = {}  # raw esearch result per (NCT id, year)
records = {}         # number of articles per (NCT id, year)

for indication in id_list:
    for year in years:
        # Restrict the search to a single publication year.
        handle = Entrez.esearch(db="pubmed",
                                term=indication,
                                mindate=year, maxdate=year, datetype="pdat",
                                usehistory="y")
        search_results[(indication, year)] = Entrez.read(handle)
        handle.close()
        records[(indication, year)] = int(search_results[(indication, year)]["Count"])
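The counting loop can also be separated from the network call, which makes it easier to check the bookkeeping without querying NCBI. This is only a sketch: `count_per_year` and `entrez_search` are hypothetical names, and `entrez_search` assumes Biopython is installed and `Entrez.email` has been set.

```python
def count_per_year(terms, years, search):
    """Return {(term, year): count} using the supplied search callable.

    `search(term, year)` must return a mapping with a "Count" entry,
    like the parsed result of Entrez.esearch.
    """
    counts = {}
    for term in terms:
        for year in years:
            result = search(term, year)
            counts[(term, year)] = int(result["Count"])
    return counts


def entrez_search(term, year):
    """Query PubMed for one term and one publication year (needs network access)."""
    from Bio import Entrez  # imported lazily so count_per_year works without Biopython

    handle = Entrez.esearch(db="pubmed", term=term,
                            mindate=str(year), maxdate=str(year),
                            datetype="pdat", retmax=0)  # only the count is needed
    try:
        return Entrez.read(handle)
    finally:
        handle.close()


def fake_search(term, year):
    """Offline stand-in for entrez_search with made-up counts, for testing only."""
    return {"Count": "2" if year == 2008 else "0"}


demo = count_per_year(["NCT00714714[SI]"], [2007, 2008], fake_search)
print(demo)
```

With a working connection, `count_per_year(id_list, years, entrez_search)` would fill the same dictionary from PubMed.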

Can anyone help me with this, please?

EDIT:

from Bio import Entrez
Entrez.email = "your.email@example.com"  # placeholder; use your own address
identifier = "NCT00714714[SI]"
handle = Entrez.esearch(db="pubmed", term=identifier, rettype="gb", retmode="xml")
record = Entrez.read(handle)

Even with a simple command like the one above, the result is:

{'Count': '0', 'RetMax': '0', 'RetStart': '0', 'IdList': [], 'TranslationSet': [], 'QueryTranslation': '(NCT00714714[SI])', 'ErrorList': {'FieldNotFound': [], 'PhraseNotFound': ['NCT00714714[SI]']}, 'WarningList': {'PhraseIgnored': [], 'QuotedPhraseNotFound': [], 'OutputMessage': ['No items found.']}}

This seems very strange to me, since this is an NCT id that I have in my data for acne in 2008.
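One way to investigate a result like this is to read the `ErrorList` programmatically and then try a few differently tagged variants of the same id. The sketch below works on the dictionary shown above; `unmatched_phrases` and `candidate_terms` are hypothetical helpers, and the list of variants is only an assumption about what might be worth trying, not a known fix.

```python
def unmatched_phrases(record):
    """Return the phrases that esearch reported as not found, if any."""
    errors = record.get("ErrorList", {})
    return list(errors.get("PhraseNotFound", []))


def candidate_terms(nct_id):
    """Return alternative query strings for one NCT id (untested suggestions)."""
    return [
        f"{nct_id}[si]",          # secondary source ID field
        f"{nct_id}[tiab]",        # title/abstract
        f"{nct_id}[All Fields]",  # any field
    ]


# The result reported above, abbreviated to the relevant keys.
record = {
    "Count": "0",
    "IdList": [],
    "QueryTranslation": "(NCT00714714[SI])",
    "ErrorList": {"FieldNotFound": [], "PhraseNotFound": ["NCT00714714[SI]"]},
}

missing = unmatched_phrases(record)
print(missing)
print(candidate_terms("NCT00714714"))
```

If one of the variants returns a non-zero count, that suggests the id is present in PubMed but not indexed in the field that was searched; if all of them return zero, no PubMed record may mention that trial at all.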
