commerce, computer vision, deep learning, graph, information retrieval, internet, machine learning, natural language processing
A collection of 51,701 product pages from 8,175 e-commerce websites across 8 markets (US, GB, SE, NL, FI, NO, DE, AT), with five manually labelled elements: the product price, name, and image, and the add-to-cart and go-to-cart buttons. The dataset was collected between 2018 and 2019 and is made available as MHTML and as WebTraversalLibrary-format snapshots.
The dataset is not expected to update frequently.
CC BY-NC-SA
https://github.com/klarna/product-page-dataset
Web Automation Research, Klarna
https://github.com/klarna/product-page-dataset/issues, stefan.magureanu@klarna.com, riccardo.risuleo@klarna.com, alexandra.hotti@klarna.com
The Klarna Product-Page Dataset was accessed on DATE from https://registry.opendata.aws/klarna_productpage_dataset.
arn:aws:s3:::klarna-research-public-datasets/
eu-west-1
aws s3 ls --no-sign-request s3://klarna-research-public-datasets/
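
Once a snapshot has been downloaded, the MHTML variant can be read with standard tooling, because MHTML is a MIME multipart/related container. The following is a minimal sketch, assuming a snapshot stores its main document as a text/html part; the function name `extract_html` and the inline `SAMPLE` bytes are hypothetical and only illustrate the format. The layout of files inside the bucket and the encoding of the element labels are not described here, so consult the GitHub repository for those details.

```python
import email
from email import policy


def extract_html(mhtml_bytes: bytes) -> str:
    """Return the first text/html part of an MHTML (multipart/related) document."""
    # policy.default yields EmailMessage objects, whose get_content()
    # undoes the transfer encoding and decodes text using the declared charset.
    message = email.message_from_bytes(mhtml_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            return part.get_content()
    raise ValueError("no text/html part found")


# Hypothetical, hand-written MHTML used only to exercise the function.
# It is NOT taken from the dataset.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; type="text/html"; boundary="----demo"\r\n'
    b"\r\n"
    b"------demo\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"Content-Location: https://shop.example/item\r\n"
    b"\r\n"
    b'<html><body><span class=3D"price">9.99</span></body></html>\r\n'
    b"------demo--\r\n"
)

html = extract_html(SAMPLE)
print(html)
```

For a real snapshot, the same function could be called as `extract_html(pathlib.Path("page.mhtml").read_bytes())`, where `page.mhtml` is a placeholder file name. Only the standard library is used, so the sketch stays independent of any particular HTML parser.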