TransWikia.com

Why doesn't an (Entrez eutils) einfo request for "gene" return the gene_nucleotide or gene_nucleotide_pos links?

Bioinformatics Asked by hepcat72 on September 10, 2020

I’m updating a galaxy tool wrapper for Entrez’s eutils suite and I’m trying to create a form with valid link selections (among other things) based on the "from" & "to" databases to reduce the size of the select list.
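For context, here is a minimal sketch of how a "from"/"to" link list might be pulled out of an einfo response. It assumes the einfo XML lists each link as a `<Link>` element with `<Name>` and `<DbTo>` children; the function names `parse_einfo_links` and `fetch_einfo_links` are made up for illustration and are not part of any Entrez library.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EINFO_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def parse_einfo_links(xml_text, db_to=None):
    """Return the link names listed in an einfo XML response.

    Assumes each link is a <Link> element with <Name> and <DbTo> children.
    If db_to is given, keep only links whose <DbTo> matches it.
    """
    root = ET.fromstring(xml_text)
    names = []
    for link in root.iter("Link"):
        name = link.findtext("Name")
        target = link.findtext("DbTo")
        if name and (db_to is None or target == db_to):
            names.append(name)
    return names


def fetch_einfo_links(db_from, db_to=None):
    """Fetch einfo for db_from over the network and parse its links."""
    query = urllib.parse.urlencode({"db": db_from})
    with urllib.request.urlopen(f"{EINFO_URL}?{query}") as response:
        return parse_einfo_links(response.read().decode("utf-8"), db_to)
```

Calling `fetch_einfo_links("gene", "nuccore")` would then give the candidate entries for the select list, limited to whatever einfo chooses to report.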

Einfo doesn’t return these links for the gene or nuccore databases. I’d found this resource, which seemed to show all the possible links, but it doesn’t show these links either. Yet while working on elink’s acheck feature, using some gene IDs in a test run, I saw 2 links that were not listed in either resource:

  • gene_nucleotide
  • gene_nucleotide_pos

I manually checked that both work as link names in elink queries: each returned valid results (i.e. no error).
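A check like that can be sketched as a small URL builder. This uses elink's `dbfrom`, `db`, `id`, `linkname` and `cmd` parameters; the helper name `build_elink_url` and the gene UIDs in the usage note are illustrative only, not the ones from my test run.

```python
import urllib.parse

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(db_from, db_to, uids, linkname=None, cmd="neighbor"):
    """Build an elink request URL, optionally restricted to one linkname."""
    params = {
        "dbfrom": db_from,
        "db": db_to,
        "id": ",".join(str(uid) for uid in uids),
        "cmd": cmd,
    }
    if linkname:
        # Only add the restriction when a linkname was actually supplied.
        params["linkname"] = linkname
    return f"{ELINK_URL}?{urllib.parse.urlencode(params)}"
```

For example, `build_elink_url("gene", "nuccore", [672, 7157], linkname="gene_nucleotide_pos")` produces a URL that can be pasted into a browser to see whether elink accepts the linkname.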

So I ran elink on the gene database using a few random UIDs to see if I would get back a comprehensive list of links (albeit possibly with empty results for some of them), but that list was also missing these 2 links. I’m concerned that my form interface won’t offer all of the possible links: the missing ones can produce valid results, and users who already know they exist may want to link via them.
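Pooling the link names seen across several UIDs can be sketched as below. This assumes the elink response (for example from `cmd=acheck`) reports each link's name in a `<LinkName>` element; the helper name `linknames_from_elink` is hypothetical.

```python
import xml.etree.ElementTree as ET


def linknames_from_elink(xml_text):
    """Collect the distinct <LinkName> values found anywhere in an elink response."""
    root = ET.fromstring(xml_text)
    return sorted({el.text for el in root.iter("LinkName") if el.text})
```

Because different UIDs can carry different links, the union over a handful of UIDs is only a lower bound on what exists, which matches what I observed.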

How do I get a comprehensive list of database links from Entrez and why doesn’t einfo return all possible links?

One Answer

I've been corresponding with NLM about this issue and I finally took the time to try out their suggestion. Personally, I found the suggestion hard to read between the lines, and it is not a discrete solution but rather a very time-consuming manual process that yields false positives. They offered it because, as they put it, getting a formal and comprehensive response to my query is not straightforward:

Collecting the linkname will be a difficult task, that will take time and coordinate/check with relevant groups maintaining individual databases. Your patience will be greatly apprecired. [sic]

It took me a bit to decipher their suggestion on how to manually find all links, so I will share what I've learned. There is (currently) no codified or formal means to obtain all possible link names and I infer that many links exist simply because they are utilized in the internal workings of their website. I could be wrong about that, but suffice it to say that there are many undocumented links and they change constantly.

You can get a list of filter items using each individual database's advanced search web interface, "most of which should be the linkname from that source database to other target database". (So this should result in all possible links among a series of false positives.)

Here's how you do that. Let's take the gene database as an example:

  1. Go to the advanced search web page for that database, e.g. https://www.ncbi.nlm.nih.gov/gene/advanced
  2. In the "Builder" section, select "Filter"
  3. Click "Show Index List" to the right of the search term field next to the select list where you selected "Filter" (you will see a multi-select list appear below the search field)
  4. Click "Next 200" until you find a selection in that new select list matching the database name followed by "all" and a number in parentheses, e.g. gene all (29994947), and click to select that item
  5. Click "Refresh index" at the lower right corner next to the multi-select list

The multi-select list will be repopulated and most of the items will be linknames (though you have to replace spaces with underscores in order for those linknames to work with the elink utility).
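The space-to-underscore conversion can be sketched as follows, assuming the list items have been copied out of the web page by hand into a Python list. The function name `filter_items_to_linknames` is made up, and the prefix check is only a rough heuristic of mine for dropping some false positives (an item such as "gene all" would still get through).

```python
import re


def filter_items_to_linknames(items, db_from):
    """Turn index-list items into candidate elink linknames.

    Strips a trailing "(count)" if present, replaces runs of whitespace with
    underscores, and keeps only items that start with the source database name.
    """
    prefix = db_from + "_"
    candidates = []
    for item in items:
        text = re.sub(r"\s*\(\d+\)\s*$", "", item)  # drop e.g. " (29994947)"
        name = "_".join(text.split())
        if name.startswith(prefix):
            candidates.append(name)
    return candidates
```

The results are still only candidates; each one would need to be tried against elink before being trusted.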

Doing this for the gene database and then scrolling through just links to nuccore (and its alias "nucleotide" - ignoring est and gss), you will find:

  • gene nuccore
  • gene nuccore clust
  • gene nuccore mgc
  • gene nuccore pos
  • gene nuccore refseqgene
  • gene nuccore refseqrna
  • gene nucleotide
  • gene nucleotide clust
  • gene nucleotide mgc
  • gene nucleotide mgc url
  • gene nucleotide pos

If you look only at the linknames in the documentation or via einfo (by supplying the gene database and looking at similar links to nuccore), you only get these links:

  • gene_nuccore
  • gene_nuccore_mgc
  • gene_nuccore_refseqgene
  • gene_nuccore_refseqrna
  • gene_nucleotide_mgc_url
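To make the gap explicit, the two lists above can be compared as sets. The sketch below uses only the names listed in this answer (with spaces already replaced by underscores); the variable names are mine.

```python
# Linknames found via the advanced-search "Filter" index for gene -> nuccore/nucleotide.
filter_linknames = {
    "gene_nuccore",
    "gene_nuccore_clust",
    "gene_nuccore_mgc",
    "gene_nuccore_pos",
    "gene_nuccore_refseqgene",
    "gene_nuccore_refseqrna",
    "gene_nucleotide",
    "gene_nucleotide_clust",
    "gene_nucleotide_mgc",
    "gene_nucleotide_mgc_url",
    "gene_nucleotide_pos",
}

# Linknames reported by the documentation / einfo for the same pair.
einfo_linknames = {
    "gene_nuccore",
    "gene_nuccore_mgc",
    "gene_nuccore_refseqgene",
    "gene_nuccore_refseqrna",
    "gene_nucleotide_mgc_url",
}

# Names that appear in the filter index but not in einfo.
undocumented = sorted(filter_linknames - einfo_linknames)
for name in undocumented:
    print(name)
```

That leaves six names that einfo does not report, including the two from the question (gene_nucleotide and gene_nucleotide_pos).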

So given this information and the ever-changing nature of these links, I will allow the user the option to enter a linkname manually if the select list I generate via einfo does not contain the link they need.
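The fallback logic might look roughly like this. It is a sketch, not the actual Galaxy wrapper code: the function name `resolve_linkname` is hypothetical, and the pattern used to sanity-check a typed linkname (lowercase words joined by underscores) is my own assumption rather than a rule published by NCBI.

```python
import re

# Assumed shape of a linkname: lowercase alphanumeric words joined by underscores.
LINKNAME_PATTERN = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)+$")


def resolve_linkname(selected, manual_entry=""):
    """Prefer a manually typed linkname; otherwise use the select-list value."""
    manual = manual_entry.strip()
    if not manual:
        return selected
    if not LINKNAME_PATTERN.match(manual):
        raise ValueError(f"{manual!r} does not look like a linkname")
    return manual
```

A manual entry that passes this check can still be rejected by elink, so the tool should also surface any error elink returns.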

Correct answer by hepcat72 on September 10, 2020
