TransWikia.com

Querying metadata (GDC) using a filter

Bioinformatics Asked by lab on July 10, 2021

I have been trying to query the GDC metadata from Python using the filter below. However, the code always produces a blank TSV file. Any help fixing it is appreciated!

import requests
import json

cases_endpt = 'https://api.gdc.cancer.gov/cases'

# The 'fields' parameter is passed as a comma-separated string of single names.
fields = [
    "sample_id",
    "sample_uuid",
    "sample_type",
    "sample_type_id",
    "tissue_type",
    "tumor_code",
    "tumor_code_id",
    "oct_embedded",
    "shortest_dimension",
    "intermediate_dimension",
    "longest_dimension",
    "is_ffpe",
    "pathology_report_uuid",
    "tumor_descriptor",
    "current_weight",
    "initial_weight",
    "composition",
    "time_between_clamping_and_freezing",
    "time_between_excision_and_freezing",
    "days_to_sample_procurement",
    "freezing_method",
    "preservation_method",
    "days_to_collection",
    "portions"
    ]

fields = ','.join(fields)

filters = {
    "op": "in",
    "content": {
        "field": "project_id",
        "value": ["TCGA-BRCA"]
    }
 }

params = {
    "filters": json.dumps(filters),
    "fields": fields,
    "format": "TSV",
    "size": "10000"
    }

response = requests.get(cases_endpt, params=params)

file = open("query10.tsv", "w")
file.write(response.text)
file.close()

One Answer

I think you are using an outdated version of the API; you need to change the filter's "field" value to cases.project.project_id:

import requests
import json

cases_endpt = "https://api.gdc.cancer.gov/cases"

fields = [
    "sample_id",
    "sample_uuid",
    "sample_type",
    "sample_type_id",
    "tissue_type",
    "tumor_code",
    "tumor_code_id",
    "oct_embedded",
    "shortest_dimension",
    "intermediate_dimension",
    "longest_dimension",
    "is_ffpe",
    "pathology_report_uuid",
    "tumor_descriptor",
    "current_weight",
    "initial_weight",
    "composition",
    "time_between_clamping_and_freezing",
    "time_between_excision_and_freezing",
    "days_to_sample_procurement",
    "freezing_method",
    "preservation_method",
    "days_to_collection",
    "portions",
]

fields = ",".join(fields)

filters = {
    "op": "in",
    "content": {"field": "cases.project.project_id",
                "value": ["TCGA-BRCA"]},
}

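# Note: "fields" is built above but not passed here, so the API returns
# its default set of fields.
params = {"filters": json.dumps(filters),
          "format": "TSV",
          "size": "10000"}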

response = requests.get(cases_endpt, params=params)
content = str(response.content, "utf-8")

with open("query10.tsv", "w") as file:
    file.write(content)
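The same request can be packaged as a small helper, which makes it easier to check for an empty result before writing the file. This is only a sketch: build_case_params and count_tsv_rows are hypothetical names, not part of the GDC API, and the dotted samples.* field paths are an assumption about how sample-level fields are addressed on the cases endpoint.

```python
import csv
import io
import json


def build_case_params(project_ids, fields=None, size=100):
    """Build query parameters for the GDC cases endpoint (illustrative helper)."""
    filters = {
        "op": "in",
        "content": {
            # Field name taken from the answer above.
            "field": "cases.project.project_id",
            "value": list(project_ids),
        },
    }
    params = {
        "filters": json.dumps(filters),
        "format": "TSV",
        "size": str(size),
    }
    if fields:
        # Assumption: nested fields are requested as dotted paths.
        params["fields"] = ",".join(fields)
    return params


def count_tsv_rows(text):
    """Return the number of data rows (header excluded) in TSV text."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    return max(len(rows) - 1, 0)


params = build_case_params(
    ["TCGA-BRCA"],
    fields=["samples.sample_id", "samples.sample_type"],
)
print(params["fields"])
print(count_tsv_rows("id\tsample\nabc\tPrimary Tumor\n"))
print(count_tsv_rows(""))
```

The resulting dictionary can be passed as requests.get(cases_endpt, params=params), and calling count_tsv_rows(response.text) before writing shows whether the query matched anything.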

Also, if you open the file with a with statement, you don't need to call close(); the file is closed automatically when the block exits.
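As a quick illustration of that point, the sketch below writes a file inside a with block and then checks that it has been closed. The temporary directory and the sample header line are placeholders for the example, not data from the GDC.

```python
import os
import tempfile

# Write to a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "query10.tsv")

    with open(path, "w") as file:
        file.write("case_id\tproject_id\n")

    # The file object is already closed here, without an explicit close().
    closed_after_block = file.closed

    # Read the file back to confirm the content was flushed to disk.
    with open(path) as file:
        saved_text = file.read()

print(closed_after_block)
```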

Correct answer by zorbax on July 10, 2021
