Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned