Entrez Direct: E-utilities on the UNIX Command Line
Install
|
Or install using Bioconda:
|
Cookbook
Origin: NCBI-Hackathons (Archive)
Best Practices for EDirect:
- Please keep queries to fewer than 50,000 expected hits (larger result sets simply won’t work)
- Please do not run from multiple processors on a compute farm
- Update to the latest version
For more information and documentation on EDirect, please see:
- Entrez Direct: E-utilities on the Unix Command Line
- Insiders Guide to Accessing NLM Data: EDirect Overview
All items below come with no explicit or implicit warranty.
All code is as-is and produced for the bioinformatics community, from the bioinformatics community.
EDirect Scripts
1. Protein
Description (optional):
Written by: Peter Cooper
Confirmed by: Ben Busby
Databases: Taxonomy
|
returns:
|
2. taxids of taxonomy
Description (optional): Note: Options for parsing nodes.dmp from NCBI Taxonomy are cited in issue #25, intentionally left open
Written by: Scott McGinnis (11/17/2017)
Confirmed by:
Databases: Taxonomy
|
returns:
|
3. SRA from BioProject
Description: Given an SRA Run ID (e.g. SRR532256) that is a member of a BioProject that has additional runs, retrieve all the other run IDs. This is a variant of the BioProject call below.
Written by: Rob Edwards (1/11/2018)
Confirmed by:
Databases: SRA, BioProject
|
returns SRA list:
|
3.1 Get all SRA for a BioProject
Description (optional):
Written by: Bob Sanders (3/22/2017)
Confirmed by:
Databases: SRA, BioProject
|
returns SRA list:
|
3.2 Get latitude and longitude for SRA Datasets (e.g. outbreaks and metagenomes)
Description (optional):
Written by: BB, Mike D, Rob Edwards (4/12/2017)
Confirmed by:
Databases: SRA, BioSample
|
returns:
|
returns nothing
3.3 SRA sizes
Description (optional): This retrieves the SRR ID and the size in bp of the run from a file (ids.txt) of SRR IDs. You can also change bases to size_MB to get the size of the dataset in MB. Question: Does the size in MB include the sequence identifiers (i.e. the size of the file) or just the sequences?
Written by: Rob Edwards (7/6/2017)
Confirmed by:
Databases: SRA
|
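A rough sketch of how such a lookup might be written, assuming EDirect is installed and that the run-info report is available as XML with one Row element per run. The function name `sra_run_sizes` is hypothetical, and the exact report format should be checked against the current EDirect documentation.

```shell
#!/bin/sh
# Sketch only: sra_run_sizes is a made-up wrapper, not an EDirect command.
# $1 is a file with one SRR accession per line (ids.txt in the text above).
sra_run_sizes() {
  # Post the accessions to the history server, request the run-info
  # report as XML (assumed format), then print run ID and size in bp.
  epost -db sra -input "$1" -format acc |
    efetch -format runinfo -mode xml |
    xtract -pattern Row -element Run bases
}

# Usage (needs network access): sra_run_sizes ids.txt
```

Replacing `bases` with `size_MB` in the last line would print the size in MB instead, as the description notes.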
returns:
|
4 Gene
4.1 Gene Aliases
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: gene
|
4.2 Genomic.fa Download
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by: Peter Cooper (NCBI) and Wayne Matten (NCBI) (12/29/2016, v6.00)
Databases: assembly
|
|
returns:
|
4.3 organellar contigs from genbank
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore
|
4.4 Get protein sequences from nucleotide accessions
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore, protein
|
4.5 taxonomy (KPCOFG) for taxids
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: taxonomy
|
returns:
|
5 Obtain UniProt IDs from gene symbols
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: gene, protein
|
6. Taxon IDs from genome accession numbers
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore
|
7. Convert article DOI to PMID
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by: Mike Davidson (NLM) (12/16/2016, v5.80)
Databases: pubmed
|
returns:
|
8. Access organism specific meta-data from NCBI genome database
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: genome, bioproject
|
9. Get the status of records from PubMed search
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by: Mike Davidson (NLM) (12/16/2016, v5.80)
Databases: pubmed
|
returns:
|
9.1 Conduct a PubMed search and retrieve the results as a list of PMIDs
Description (optional):
Written by: Mike Davidson (2/22/2017)
Confirmed by: Mike Davidson (NLM) (2/22/2017, v6.30)
Databases: pubmed
|
returns:
|
10. Sort the hits by sequence length in nucleotide database
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore
|
11. Getting meta data from assembly
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: assembly
|
12. Fetch HSPs from a BLAST hit in FASTA
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: nuccore
|
13. Get all Gene Ontology IDs for a given protein accession
Description (optional):
Written by: NCBI Folks (12/14/2016)
Confirmed by:
Databases: protein, biosystems
|
14. Get the ten most frequently-occurring authors for a set of articles
Description (optional): Searches PubMed for the string “traumatic brain injury athletes”, restricts results to those published in 2015 and 2016, retrieves the full XML records for each of the search results, extracts the last name and initials of every author on every record, sorts the authors by frequency of occurrence in the results set, and presents the top ten most frequently-occurring authors, along with the number of times that author appeared.
Written by: Mike Davidson (NLM) (12/15/2016)
Confirmed by: Mike Davidson (NLM) (12/16/2016)
Databases: pubmed
|
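The steps in the description could be sketched roughly as follows, assuming EDirect is installed and on the PATH. The function names `fetch_author_names` and `rank_top_ten` are made up for illustration, and the counting step uses plain `sort` and `uniq` rather than any EDirect helper.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical, not part of EDirect.

# Search PubMed, restrict by publication date, fetch the XML records and
# print "LastName Initials" for every author (needs network access).
fetch_author_names() {
  esearch -db pubmed -query "traumatic brain injury athletes" \
      -datetype PDAT -mindate 2015 -maxdate 2016 |
    efetch -format xml |
    xtract -pattern Author -sep " " -element LastName,Initials
}

# Count identical lines on stdin and print the ten most frequent as
# "count<TAB>name", highest count first.
rank_top_ten() {
  sort | uniq -c | sort -k1,1nr |
    awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print n "\t" $0 }' |
    head -n 10
}

# Usage: fetch_author_names | rank_top_ten
```

Keeping the ranking step separate makes it possible to try it on a local list of names before running the full search.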
15. Get the ten funding agencies who are most active in funding articles on a particular topic
Description (optional): Searches PubMed for the string “diabetes AND pregnancy”, restricts results to those published in 2014 through 2016, retrieves the full XML records for each of the search results, extracts the funding agencies for every grant on every record, sorts the agencies by frequency of occurrence in the results set, and presents the top ten most frequently-occurring agencies, along with the number of times that agency appeared.
Written by: Mike Davidson (2/17/2017)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed
|
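One possible form of this pipeline is sketched below, assuming EDirect is installed and that each grant in the PubMed XML carries an Agency element inside a Grant element. The wrapper name `top_funding_agencies` is hypothetical.

```shell
#!/bin/sh
# Sketch only: top_funding_agencies is a made-up wrapper name.
top_funding_agencies() {
  # Search with a publication-date range, fetch full XML, print one
  # agency per grant, then count and keep the ten most frequent.
  esearch -db pubmed -query "diabetes AND pregnancy" \
      -datetype PDAT -mindate 2014 -maxdate 2016 |
    efetch -format xml |
    xtract -pattern Grant -element Agency |
    sort | uniq -c | sort -k1,1nr | head -n 10
}

# Usage (needs network access): top_funding_agencies
```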
16. Look up the publication date for thousands of PMIDs (option one)
Description (optional): Takes a file which contains a list of PMIDs (table_of_pubmed_ids) and uses cat to access the contents of the file, epost to post the PMIDs to the history server, efetch to retrieve the records, and xtract to extract the PMID and publication date.
Written by: NCBI Folks (12/15/2016)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed
|
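A minimal sketch of the cat, epost, efetch, xtract chain described above, assuming EDirect is installed. The wrapper name `pubdates_via_cat` is hypothetical, and the choice of Year, Month and MedlineDate as the date fields is an assumption about which parts of PubDate are wanted.

```shell
#!/bin/sh
# Sketch only: pubdates_via_cat is a made-up wrapper name.
# $1 is a file with one PMID per line (table_of_pubmed_ids in the text).
pubdates_via_cat() {
  # Stream the PMIDs into epost, fetch the posted records as XML and
  # print the PMID followed by whatever PubDate parts are present.
  cat "$1" |
    epost -db pubmed |
    efetch -format xml |
    xtract -pattern PubmedArticle -element MedlineCitation/PMID \
      -block PubDate -sep " " -element Year,Month,MedlineDate
}

# Usage (needs network access): pubdates_via_cat table_of_pubmed_ids
```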
17. Look up the publication date for thousands of PMIDs (option two)
Description (optional): Takes a file which contains a list of PMIDs (table_of_pubmed_ids) and uses epost -input to access the contents of the file and post the PMIDs to the history server, efetch to retrieve the records, and xtract to extract the PMID and publication date.
Written by: Mike Davidson (2/17/2017)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed
|
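The same lookup with epost reading the file directly might look like the sketch below, assuming EDirect is installed. The wrapper name `pubdates_via_input` is hypothetical, and the date fields chosen are an assumption.

```shell
#!/bin/sh
# Sketch only: pubdates_via_input is a made-up wrapper name.
# $1 is a file with one PMID per line (table_of_pubmed_ids in the text).
pubdates_via_input() {
  # epost reads the file itself through -input, so no cat is needed.
  epost -db pubmed -input "$1" |
    efetch -format xml |
    xtract -pattern PubmedArticle -element MedlineCitation/PMID \
      -block PubDate -sep " " -element Year,Month,MedlineDate
}

# Usage (needs network access): pubdates_via_input table_of_pubmed_ids
```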
18. Find the first author for a set of PubMed records
Description (optional): Outputs the PMID and first author’s last name and initials for one or more PubMed records
Written by: Mike Davidson (2/17/2017)
Confirmed by: Mike Davidson (NLM) (v6.30, 2/17/2017)
Databases: pubmed
|
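A sketch of one way to pull out only the first author, assuming EDirect is installed and relying on xtract's `-position first` selector. The wrapper name `first_authors` is hypothetical.

```shell
#!/bin/sh
# Sketch only: first_authors is a made-up wrapper name.
# $1 is a single PMID or a comma-separated list of PMIDs.
first_authors() {
  # Fetch the records as XML, then for each article print the PMID and
  # the last name and initials of the first Author element only.
  efetch -db pubmed -id "$1" -format xml |
    xtract -pattern PubmedArticle -element MedlineCitation/PMID \
      -block Author -position first -sep " " -element LastName,Initials
}

# Usage (needs network access): first_authors <PMID>[,<PMID>...]
```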
19. Find the first author and any other authors who contributed equally for a set of PubMed records
Description (optional): Outputs the PMID and first author’s last name and initials for one or more PubMed records. If the record indicates equal contributors to the first author, the last name and initials for all equal contributors will also be output, separated by commas.
Written by: Mike Davidson (10/27/2017)
Confirmed by: Mike Davidson (NLM) (v7.40, 10/27/2017)
Databases: pubmed
|
20 Download GEO Data from a BioProject Accession
Description (optional):
Written by: NCBI Folks (12/16/2016)
Confirmed by:
Databases: gds
|
21 Extract all MeSH Headings from a given PMID
Description (optional): Retrieves the PMID of a PubMed record, followed by a pipe-delimited list of MeSH Descriptors for a PMID.
Written by: Mike Davidson (10/02/2017)
Confirmed by: Mike Davidson (NLM) (v7.30, 10/02/2017)
Databases: pubmed
|
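The following sketch shows one way this could be done, assuming EDirect is installed. The wrapper name `mesh_headings` is hypothetical, and using `-tab "|"` to get pipes between fields is an assumption about how the separator options behave, so the output should be checked on a known record.

```shell
#!/bin/sh
# Sketch only: mesh_headings is a made-up wrapper name.
# $1 is a single PMID or a comma-separated list of PMIDs.
mesh_headings() {
  # Print the PMID, then one DescriptorName per MeshHeading block,
  # asking xtract to use "|" instead of a tab between fields.
  efetch -db pubmed -id "$1" -format xml |
    xtract -pattern PubmedArticle -tab "|" -element MedlineCitation/PMID \
      -block MeshHeading -tab "|" -element DescriptorName
}

# Usage (needs network access): mesh_headings <PMID>[,<PMID>...]
```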
Extract all MeSH Headings and Subheadings from a given PMID
Description (optional): Retrieves the PMID of a PubMed record, followed by a pipe-delimited list of MeSH Descriptors and Qualifiers for a PMID. Each Descriptor is followed by any attached qualifiers, separated by “/”.
Written by: Mike Davidson (10/02/2017)
Confirmed by: Mike Davidson (NLM) (v7.30, 10/02/2017)
Databases: pubmed
|
Search for articles by authors affiliated with a specific institution by matching two partial affiliation strings.
Description (optional): Searching PubMed for two affiliation strings ANDed together (e.g. “translational medicine[AD] AND thomas jefferson[AD]”) will retrieve all records that have both strings listed somewhere in the record’s Affiliation data, but does not require both strings be listed on the same author’s affiliation. To generate a list of PMIDs where both strings are present in the same affiliation element, use the following script.
Written by: Mike Davidson (4/2/2018)
Confirmed by: Mike Davidson (NLM) (v8.10, 4/2/2018)
Databases: pubmed
|
Search for PMC articles citing a given PubMed article; retrieve title, source, ID
Description: Retrieve information about all PMC articles (which have free full text available) that cite a given PubMed article
Written by: Lukas Wagner (08/16/2018)
Databases: pubmed, pmc
|
Xtract
Example 1: acquire information from inside a tag
|
SRA507436 SRX2439829 SRP095511 SRS1874418 SRR5125027
SRA507436 SRX2439828 SRP095511 SRS1874418 SRR5125026
SRA507436 SRX2439827 SRP095511 SRS1874418 SRR5125025
SRA507436 SRX2439826 SRP095511 SRS1874418 SRR5125024
Example of a single Run
|
As a result, we can acquire the total spots with @total_spots
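As a small illustration of reading an attribute with the `@` prefix, the sketch below prints the accession and spot count for each run. It assumes the XML arriving on standard input contains RUN elements that carry `accession` and `total_spots` attributes; the function name `total_spots_per_run` is hypothetical.

```shell
#!/bin/sh
# Sketch only: total_spots_per_run is a made-up wrapper name.
# Reads SRA XML on stdin; "@name" tells xtract to read an attribute of
# the matched element rather than a child element.
total_spots_per_run() {
  xtract -pattern RUN -element @accession @total_spots
}

# Usage: <command producing SRA XML> | total_spots_per_run
```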