Add Ethical Red Team dataset loader by 96528025 · Pull Request #1519 · Azure/PyRIT

96528025 · 2026-03-18T20:04:09Z

Summary

- add a remote dataset loader for the Hugging Face dataset srushtisingh/Ethical_redteam
- register the loader in the remote dataset package for automatic discovery
- add unit tests for dataset loading and custom config handling
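
As a rough illustration of what the loader does with each row, here is a minimal, self-contained sketch. `SeedRecord` and `rows_to_seeds` are hypothetical stand-ins, not PyRIT classes or functions; the field names mirror the ones visible in the diff (`value`, `data_type`, `dataset_name`), and skipping empty prompts is an assumption made for this example only.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class SeedRecord:
    """Hypothetical stand-in for a seed prompt; not the real PyRIT class."""

    value: str
    data_type: str
    dataset_name: str


def rows_to_seeds(rows: Iterable[Mapping[str, str]], dataset_name: str) -> list[SeedRecord]:
    """Convert raw dataset rows into seed records.

    Assumption: rows with a missing or blank "prompt" column are skipped.
    """
    seeds: list[SeedRecord] = []
    for row in rows:
        prompt = (row.get("prompt") or "").strip()
        if not prompt:
            continue
        seeds.append(SeedRecord(value=prompt, data_type="text", dataset_name=dataset_name))
    return seeds
```

The actual loader in this PR builds `SeedPrompt` objects and wraps them in a `SeedDataset`; the sketch only shows the row-to-record mapping.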

Testing

.venv/bin/python -m pytest tests/unit/datasets/test_ethical_redteam_dataset.py -q
.venv/bin/python -m pytest tests/unit/datasets/test_harmful_qa_dataset.py tests/unit/datasets/test_toxic_chat_dataset.py -q
.venv/bin/python -m ruff check pyrit/datasets/seed_datasets/remote/ethical_redteam_dataset.py tests/unit/datasets/test_ethical_redteam_dataset.py pyrit/datasets/seed_datasets/remote/__init__.py
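
For readers unfamiliar with how such unit tests avoid network access, a common pattern is to stub out the fetch step. The sketch below is an assumption about the general approach, not the tests in this PR; `DemoLoader` and `_fetch_rows` are hypothetical names.

```python
from unittest.mock import patch


class DemoLoader:
    """Hypothetical loader whose fetch step would normally hit the network."""

    def _fetch_rows(self) -> list[dict]:
        raise RuntimeError("network access is not available in unit tests")

    def load(self) -> list[str]:
        # Extract the prompt text from every fetched row.
        return [row["prompt"] for row in self._fetch_rows()]


def test_load_uses_stubbed_rows() -> None:
    fake_rows = [{"prompt": "first"}, {"prompt": "second"}]
    # Replace the fetch step so the test never leaves the process.
    with patch.object(DemoLoader, "_fetch_rows", return_value=fake_rows):
        assert DemoLoader().load() == ["first", "second"]
```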

96528025 · 2026-03-18T20:45:47Z

@microsoft-github-policy-service agree

romanlutz · 2026-03-19T13:34:42Z

You will need to rerun notebook 1 from the datasets docs to show the dataset in the list there.

romanlutz · 2026-03-19T13:35:17Z

pyrit/datasets/seed_datasets/remote/ethical_redteam_dataset.py

+    def __init__(
+        self,
+        *,
+        source: str = "srushtisingh/Ethical_redteam",


Probably needn't be configurable
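
One way to read this suggestion is to make the dataset ID a class-level constant instead of a constructor argument. The sketch below is hypothetical: `FixedSourceLoader`, `HF_DATASET_NAME`, and the `split` parameter are illustrative names and not taken from the PR.

```python
class FixedSourceLoader:
    """Hypothetical loader illustrating a non-configurable dataset source."""

    # The Hugging Face dataset ID is fixed for this loader.
    HF_DATASET_NAME = "srushtisingh/Ethical_redteam"

    def __init__(self, *, split: str = "train") -> None:
        # Assumption: only the split remains configurable.
        self.split = split

    @property
    def source_url(self) -> str:
        return f"https://huggingface.co/datasets/{self.HF_DATASET_NAME}"
```

With this shape, passing `source=...` to the constructor raises a `TypeError`, so callers cannot accidentally point the loader at a different dataset.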

romanlutz · 2026-03-19T13:35:49Z

pyrit/datasets/seed_datasets/remote/ethical_redteam_dataset.py

+                    "Ethical Red Team dataset from Hugging Face. "
+                    "Contains prompts intended for red-teaming and safety testing of language models."
+                ),
+                source=f"https://huggingface.co/datasets/{self.source}",


Authors/groups missing.
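
A small check like the following could catch this kind of omission in a unit test. It is only a sketch: `SeedMetadata` and `missing_attribution` are hypothetical, and the real author and group names are deliberately not filled in here.

```python
from dataclasses import dataclass, field


@dataclass
class SeedMetadata:
    """Hypothetical container for the metadata attached to each seed."""

    description: str
    source: str
    authors: list[str] = field(default_factory=list)
    groups: list[str] = field(default_factory=list)


def missing_attribution(meta: SeedMetadata) -> list[str]:
    """Return the names of attribution fields that are still empty."""
    return [name for name in ("authors", "groups") if not getattr(meta, name)]
```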

romanlutz · 2026-03-19T13:36:19Z

pyrit/datasets/seed_datasets/remote/ethical_redteam_dataset.py

+
+        logger.info(f"Successfully loaded {len(seed_prompts)} prompts from Ethical Red Team dataset")
+
+        return SeedDataset(seeds=seed_prompts, dataset_name=self.dataset_name)


A maintainer should run the integration test and make sure it works and looks as expected.

romanlutz · 2026-03-19T13:39:09Z

pyrit/datasets/seed_datasets/remote/ethical_redteam_dataset.py

+            SeedPrompt(
+                value=item["prompt"],
+                data_type="text",
+                dataset_name=self.dataset_name,


Sadly the dataset is not annotated with harm categories... That would be really useful.
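
To illustrate why the annotation would matter, here is a hypothetical sketch of filtering seeds by harm category. `AnnotatedSeed` and `filter_by_harm_category` are made-up names for this example; the point is that prompts without categories never match a category filter.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedSeed:
    """Hypothetical seed with optional harm-category labels."""

    value: str
    harm_categories: list[str] = field(default_factory=list)


def filter_by_harm_category(seeds: list[AnnotatedSeed], category: str) -> list[AnnotatedSeed]:
    """Keep only seeds explicitly labeled with the given category."""
    return [seed for seed in seeds if category in seed.harm_categories]
```

Since every prompt from this dataset would carry an empty category list, such a filter would return nothing for it.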

Add Ethical Red Team dataset loader

5a3a99e

romanlutz reviewed Mar 19, 2026

96528025 wants to merge 1 commit into Azure:main from 96528025:codex/add-ethical-redteam-dataset


2 participants

