- Red Teaming is a computer security term that has been co-opted by the AI community to refer to the practice of trying to break AI.
- Red teaming a Large Language Model (LLM) based AI like ChatGPT, might be asking it for instructions on how to make a bomb, or asking an image generator like Midjourney to produce offensive or copyrighted artwork.
- While useful for highlighting issues, most LLMs are highly susceptible to these kind of attacks becuase of how they work, so results from this work should not be surprising if the pre-training data set contains this information.
Red teaming is a process of testing or evaluating the security of an organization or system by simulating an malicious activity from an adversary. In the context of LLMs, it refers to crafting prompts to make the AI perform task that it should not be able to do.
Normally an internal team or external contractors is hired by the organisation owning the AI, to try various adversarial tactics. When these attacks are discovered it is possible to then train various ensure the AI is fit for public use ahead of wide release.