NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS

Tags: csv, life sciences, STRIDES, txt, xml

Description

PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal articles at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The PubMed Central (PMC) Article Datasets include full-text articles archived in PMC and made available under license terms that allow for text mining and other types of secondary analysis and reuse. The articles are organized on AWS by general license type:

  • The PMC Open Access (OA) Subset, which includes all articles in PMC with a machine-readable Creative Commons license
  • The Author Manuscript Dataset, which includes all articles collected under a funder policy in PMC and made available in machine-readable formats for text mining

These datasets collectively span more than half of PMC’s total collection of full-text articles. PMC enables access to these datasets to expand the impact of open access and publicly funded research; enable greater machine learning across the spectrum of scientific research; reach new audiences; and open new doors for discovery.

The bucket in this registry contains individual articles in NISO Z39.96-2015 JATS XML format, as well as in plain text extracted from the XML. The bucket is updated daily with new and updated articles. It also contains file lists with metadata for the articles in each dataset.
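Because the articles are stored as JATS XML, a few standard JATS elements (article-id, article-title, body, p) are enough to pull basic fields out of a file. The following is a minimal sketch using only the Python standard library. The SAMPLE_JATS fragment and the helper names text_of and summarize_article are hypothetical and exist only for illustration; real PMC files are far larger and may need extra handling that is not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS fragment used only for illustration.
# It is not copied from the bucket; real articles contain much more markup.
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmc">PMC0000000</article-id>
      <title-group>
        <article-title>An <italic>example</italic> title</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p></sec>
  </body>
</article>"""


def text_of(element):
    """Return the element's text, including nested inline markup, with whitespace collapsed."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def summarize_article(xml_text):
    """Extract the PMC identifier, title, and body paragraphs from a JATS document."""
    root = ET.fromstring(xml_text)

    # Pick the article-id whose pub-id-type attribute is "pmc", if present.
    pmcid = ""
    for article_id in root.iter("article-id"):
        if article_id.get("pub-id-type") == "pmc":
            pmcid = text_of(article_id)
            break

    title = text_of(root.find(".//article-title"))
    paragraphs = [text_of(p) for p in root.iterfind(".//body//p")]
    return {"pmcid": pmcid, "title": title, "paragraphs": paragraphs}


summary = summarize_article(SAMPLE_JATS)
print(summary["pmcid"], "-", summary["title"])
```

For text mining at scale, the plain-text files in the bucket may be simpler to consume than the XML; this sketch is only meant to show how the XML structure maps to fields.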

Update Frequency

Daily

License

PMC Copyright

Documentation

https://www.ncbi.nlm.nih.gov/pmc/tools/pmcaws

Managed By

National Library of Medicine (NLM)

Contact

pubmedcentral@ncbi.nlm.nih.gov

How to Cite

NIH NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS was accessed on DATE from https://registry.opendata.aws/ncbi-pmc.

Resources on AWS

  • Description: .xml and .txt files with the full text of articles; .txt and .csv file lists for metadata; all located in a public S3 bucket
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::pmc-oa-opendata
    AWS Region: us-east-1
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://pmc-oa-opendata/
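The unsigned CLI listing above can also be approximated without the AWS CLI, by calling the S3 REST ListObjectsV2 endpoint anonymously. The following is a minimal sketch using only the Python standard library. The function names build_list_url, parse_listing, and list_bucket are hypothetical helpers introduced here, and the sketch assumes the bucket continues to allow anonymous listing over HTTPS; pagination through continuation tokens is not handled.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BUCKET = "pmc-oa-opendata"
# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket=BUCKET, prefix="", delimiter="/", max_keys=100):
    """Build an unsigned ListObjectsV2 URL in virtual-hosted style."""
    query = urllib.parse.urlencode(
        {
            "list-type": "2",
            "prefix": prefix,
            "delimiter": delimiter,
            "max-keys": str(max_keys),
        }
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (common_prefixes, object_keys) from a ListBucketResult document."""
    root = ET.fromstring(xml_text)
    prefixes = [e.text for e in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [e.text for e in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_bucket(prefix=""):
    """Fetch one page of the listing over HTTPS (requires network access)."""
    with urllib.request.urlopen(build_list_url(prefix=prefix), timeout=30) as response:
        return parse_listing(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# prefixes, keys = list_bucket()
# print(prefixes, keys)
```

With the "/" delimiter, the first call returns the top-level "folders" as common prefixes; passing one of those back as the prefix argument lists one level deeper.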
