The Klarna Product-Page Dataset

commerce, computer vision, deep learning, graph, information retrieval, internet, machine learning, natural language processing

Description

A collection of 51,701 product pages from 8,175 e-commerce websites across 8 markets (US, GB, SE, NL, FI, NO, DE, AT). Each page has 5 manually labelled elements: the product price, name and image, and the add-to-cart and go-to-cart buttons. The dataset was collected between 2018 and 2019 and is made available both as MHTML files and as WebTraversalLibrary (WTL) format snapshots.
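MHTML stores a web page and its resources as a single MIME multipart/related message (RFC 2557), so Python's standard-library email package can read it. The sketch below pulls the main HTML document out of one snapshot. It assumes each file holds a single MHTML document whose page markup is the first text/html part; extract_html is a hypothetical helper for illustration, not part of the dataset's own tooling.

```python
import email
from email import policy


def extract_html(mhtml_bytes: bytes) -> str:
    """Return the first text/html part of an MHTML snapshot (hypothetical helper).

    Assumption: the page markup is the first text/html part of the
    multipart/related message; other parts (images, CSS) are skipped.
    """
    # MHTML is an ordinary MIME message, so the email parser can read it.
    msg = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding (e.g. quoted-printable)
            # and decodes the declared charset, returning a str.
            return part.get_content()
    raise ValueError("no text/html part found in MHTML snapshot")
```

Locating the labelled elements (price, name, image and the two buttons) inside that HTML depends on how the labels are stored, which is described in the project documentation.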

Update Frequency

The dataset is not expected to update frequently.

License

CC BY-NC-SA

Documentation

https://github.com/klarna/product-page-dataset

Managed By

Web Automation Research, Klarna

Contact

https://github.com/klarna/product-page-dataset/issues, stefan.magureanu@klarna.com, riccardo.risuleo@klarna.com, alexandra.hotti@klarna.com

How to Cite

The Klarna Product-Page Dataset was accessed on DATE from https://registry.opendata.aws/klarna_productpage_dataset.

Resources on AWS

  • Description
    Bucket containing the two versions of the dataset (one in MHTML format and one in WTL snapshot format) as tarballs.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::klarna-research-public-datasets/
    AWS Region
    eu-west-1
    AWS CLI Access (No AWS account required)
    aws s3 ls s3://klarna-research-public-datasets/ --no-sign-request
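Once a tarball has been downloaded anonymously, for example with `aws s3 cp s3://klarna-research-public-datasets/<key> . --no-sign-request` (where `<key>` is an object name reported by the `aws s3 ls` command), its pages can be streamed without unpacking the whole archive to disk. A minimal sketch, assuming the MHTML snapshots inside the archive carry a `.mhtml` extension; the function name and that extension are assumptions, not the archive's documented layout.

```python
import tarfile
from typing import Iterator, Tuple


def iter_mhtml_members(tar_path: str, suffix: str = ".mhtml") -> Iterator[Tuple[str, bytes]]:
    """Yield (member name, raw bytes) for each regular file ending in `suffix`.

    Hypothetical helper: the `.mhtml` suffix is an assumed naming convention.
    """
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression (or none) itself.
    with tarfile.open(tar_path, mode="r:*") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(suffix):
                fh = tar.extractfile(member)
                if fh is not None:
                    # Read in memory instead of extracting, which also avoids
                    # writing archive-controlled paths to the local disk.
                    yield member.name, fh.read()
```

Each yielded byte string can then be handed to an MHTML parser. Reading members one at a time keeps memory use bounded by the largest single page rather than the whole archive.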
