Gretel Synthetic Safety Alignment Dataset

ai safety, machine learning, natural language processing, synthetic data

Description

A dataset for aligning language models with safety and ethical guidelines. It contains 8,361 curated triplets, each consisting of a prompt, a response, and a safe response, across various risk categories. Each entry also includes safety scores, judge reasoning, and harm probability assessments, which makes the dataset useful for model alignment, testing, and benchmarking.
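
As an illustration of how the triplets might feed an alignment workflow, the sketch below turns each record into a preference-style pair, with the safe response treated as preferred. The field names `prompt`, `response`, and `safe_response` are assumptions inferred from the description above, not names confirmed by this page, and the helper `to_preference_pairs` is hypothetical. Check the dataset documentation for the actual column names before relying on it.

```python
from typing import Dict, Iterable, List

# Assumed field names, inferred from the dataset description.
# The actual column names may differ; see the dataset documentation.
PROMPT = "prompt"
RESPONSE = "response"
SAFE_RESPONSE = "safe_response"


def to_preference_pairs(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Convert (prompt, response, safe response) triplets into preference pairs.

    This sketch assumes the safe response is always the preferred one,
    which is a simplification: the original response may already be safe.
    """
    pairs = []
    for rec in records:
        # Skip records where any of the three fields is missing or empty.
        if not all(rec.get(key) for key in (PROMPT, RESPONSE, SAFE_RESPONSE)):
            continue
        pairs.append(
            {
                "prompt": rec[PROMPT],
                "chosen": rec[SAFE_RESPONSE],
                "rejected": rec[RESPONSE],
            }
        )
    return pairs
```

In practice, the per-entry safety scores could be used to drop pairs where the original response was already judged safe, since those pairs carry little preference signal.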

Update Frequency

Static dataset, version 1.0 (Released December 2024)

License

Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)

Documentation

https://huggingface.co/datasets/gretelai/gretel-safety-alignment-en-v1

Managed By

Gretel.ai

Contact

https://gretel.ai/discord

How to Cite

Gretel Synthetic Safety Alignment Dataset was accessed on DATE from https://registry.opendata.aws/gretel-synthetic-safety-alignment-en-v1.

@dataset{gretelai_gretel-safety-alignment-en-v1,
  title = {Gretel Synthetic Safety Alignment Dataset},
  year = {2024},
  month = {12},
  publisher = {Gretel},
  url = {https://huggingface.co/datasets/gretelai/gretel-safety-alignment-en-v1}
}

Resources on AWS

  • Description
    The dataset is available as three Parquet files: train (6,000 records), validation (1,200 records), and test (1,161 records). Each file contains 14 columns, including prompts, responses, safety scores, and probability assessments. The data was generated using Apache 2.0-licensed models, including ibm-granite/granite-3.0-8b, Qwen2.5-7B, and Mistral-Nemo-Instruct-2407.
    Resource type: S3 Bucket
    Amazon Resource Name (ARN): arn:aws:s3:::gretel-datasets-public/gretel-synthetic-safety-alignment-en-v1
    AWS Region: us-west-2
    AWS CLI Access (No AWS account required): aws s3 ls --no-sign-request s3://gretel-datasets-public/gretel-synthetic-safety-alignment-en-v1/
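
For readers without the AWS CLI, the unsigned listing above can be approximated with the Python standard library by calling the S3 REST listing endpoint directly. This is a minimal sketch, assuming the bucket permits anonymous listing over HTTPS in the same way it does for `--no-sign-request`. The bucket, region, and prefix come from the resource details above; the helper names (`build_list_url`, `parse_listing`, `list_objects`) are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Values taken from the resource details on this page.
BUCKET = "gretel-datasets-public"
REGION = "us-west-2"
PREFIX = "gretel-synthetic-safety-alignment-en-v1/"

# XML namespace used by S3 listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def build_list_url(bucket: str = BUCKET, region: str = REGION, prefix: str = PREFIX) -> str:
    """Build an unsigned ListObjectsV2 URL (virtual-hosted style)."""
    query = urllib.parse.urlencode({"list-type": "2", "prefix": prefix})
    return f"https://{bucket}.s3.{region}.amazonaws.com/?{query}"


def parse_listing(xml_text: str) -> list:
    """Extract (key, size in bytes) tuples from a listing response body."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext(f"{S3_NS}Key"), int(item.findtext(f"{S3_NS}Size")))
        for item in root.iter(f"{S3_NS}Contents")
    ]


def list_objects() -> list:
    """Fetch the first page of the listing (up to 1,000 keys).

    Assumes anonymous access is allowed; pagination is not handled here.
    """
    with urllib.request.urlopen(build_list_url(), timeout=30) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```

Calling `list_objects()` performs the network request and returns the object keys with their sizes. URL construction and XML parsing are kept in separate functions so they can be checked without network access.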
