academic-paper

Constitutional AI: Harmlessness from AI Feedback

Publisher	Anthropic
Author	Yuntao Bai et al.
URL	https://arxiv.org/abs/2212.08073
Access date	2026-01-15
Published	2022-12-15
Source ID	`src_i9j0k1l2`

Excerpt

We propose a method for training AI assistants to be harmless without human feedback labels for harms. The method involves both supervised learning and reinforcement learning from AI feedback.

Citing claims (1)

Constitutional AI Reduces Need for Human Harm Labels