Decomposition Jailbreak

Dropped

2024 · With Palisade Research

Study of how breaking harmful requests into benign-looking subtasks bypasses model refusals.

Decomposition attack diagram
Technical:
  • 4-role async pipeline: Surrogate → Decomposer → Target → Composer
  • Tree-based task decomposition with configurable depth
  • LLM-as-a-Judge evaluation with Elo scoring
  • HarmBench test suite
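
As a rough illustration of the Elo scoring item above: pairwise verdicts from a judge model can be folded into per-variant ratings with the standard Elo update. This is a minimal sketch, not the project's actual code. The K-factor of 32, the starting rating of 1000, the variant names, and the `(a, b, score_a)` verdict format are all assumptions made for the example, and the verdicts are hard-coded placeholders rather than real judge output.

```python
from collections import defaultdict

K = 32            # assumed K-factor; the project's value is not stated
INITIAL = 1000.0  # assumed starting rating for every variant


def expected_score(r_a, r_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings, a, b, score_a):
    """Apply one pairwise verdict: score_a is 1.0 (A wins), 0.5 (tie), 0.0 (B wins)."""
    e_a = expected_score(ratings[a], ratings[b])
    delta = K * (score_a - e_a)
    ratings[a] += delta
    ratings[b] -= delta  # zero-sum: B loses exactly what A gains


def rate(verdicts):
    """Fold a sequence of (a, b, score_a) judge verdicts into ratings."""
    ratings = defaultdict(lambda: INITIAL)
    for a, b, score_a in verdicts:
        update_elo(ratings, a, b, score_a)
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical verdicts; in practice these would come from the judge model.
    verdicts = [
        ("variant_a", "variant_b", 1.0),
        ("variant_b", "variant_c", 0.5),
        ("variant_a", "variant_c", 1.0),
    ]
    for name, rating in sorted(rate(verdicts).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.1f}")
```

Elo ratings depend on the order in which comparisons are applied, so a real evaluation would likely shuffle the verdicts or average over several orderings.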
Why dropped: Results were hard to measure, and the scope kept expanding, as each finding raised more questions. Similar research was also published while we were working, most notably "Adversaries Can Misuse Combinations of Safe Models".

Predicting AI Releases via Side Channels

Abandoned

January 2025

Attempt to predict OpenAI releases by analyzing Twitter activity of their red team members. Hypothesis: intensive testing before launches reduces social media engagement.
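
One simple way to test that hypothesis is to compare average daily posting volume in a window just before a release against the earlier baseline. The sketch below is an assumed formulation, not the analysis actually run. The function names, the 14-day default window, and the input format (a list of `date` objects, one per tweet) are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def daily_counts(tweet_dates, start, end):
    """Count tweets per calendar day over [start, end], including zero days."""
    counts = {}
    d = start
    while d <= end:
        counts[d] = 0
        d += timedelta(days=1)
    for t in tweet_dates:
        if start <= t <= end:
            counts[t] += 1
    return counts


def engagement_ratio(counts, release, window=14):
    """Mean daily tweets in the `window` days before `release`,
    divided by the mean over all earlier days.
    A ratio below 1.0 is consistent with the hypothesis."""
    cutoff = release - timedelta(days=window)
    pre = [c for d, c in counts.items() if cutoff <= d < release]
    base = [c for d, c in counts.items() if d < cutoff]
    if not pre or not base or mean(base) == 0:
        return None  # not enough data to compare
    return mean(pre) / mean(base)
```

A single ratio says little on its own. Checking it against many release dates and against randomly chosen non-release dates would show whether any dip stands out from ordinary week-to-week variation.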

AI Release Prediction analysis
Why abandoned: Weak signal, Twitter API restrictions, and a lack of free time for side projects like this.