academic-paper

Constitutional AI: Harmlessness from AI Feedback

Publisher: Anthropic
Author: Yuntao Bai et al.
URL: https://arxiv.org/abs/2212.08073
Access date: 2026-01-15
Published: 2022-12-15
Source ID: src_i9j0k1l2

Excerpt

We propose a method for training AI assistants to be harmless without human feedback labels for harms. The method involves both supervised learning and reinforcement learning from AI feedback.

Citing claims (1)