Group of hooded hacker figures on a red background holding computers

Red Teaming

Red teaming is a process of testing or evaluating the security of an organization or system by simulating an malicious activity from an adversary. In the context of AI, it refers to crafting prompts to make the AI perform task that it should not be able to do.

Key Points

  • Red Teaming is a computer security term that has been co-opted by the AI community to refer to the practice of trying to break AI.
  • Red teaming a Large Language Model (LLM) based AI like ChatGPT, might be asking it for instructions on how to make a bomb, or asking an image generator like Midjourney to produce offensive or copyrighted artwork.
  • While useful for highlighting issues, most LLMs are highly susceptible to these kind of attacks becuase of how they work, so results from this work should not be surprising if the pre-training data set contains this information.

Learn more

Red teaming is a process of testing or evaluating the security of an organization or system by simulating an malicious activity from an adversary. In the context of LLMs, it refers to crafting prompts to make the AI perform task that it should not be able to do.

Normally an internal team or external contractors is hired by the organisation owning the AI, to try various adversarial tactics. When these attacks are discovered it is possible to then train various ensure the AI is fit for public use ahead of wide release.

Subscribe to our Newsletter and stay up to date!

Subscribe to our newsletter for the latest news and work updates straight to your inbox.

Oops! There was an error sending the email, please try again.

Awesome! Now check your inbox and click the link to confirm your subscription.