Graveyard — Reworr

Decomposition Jailbreak

Dropped

2024 · With Palisade Research

Study of how breaking harmful requests into benign-looking subtasks bypasses model refusals.

Technical:

4-role async pipeline: Surrogate → Decomposer → Target → Composer
Tree-based task decomposition with configurable depth
LLM-as-a-Judge evaluation with Elo scoring
HarmBench test suite
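
The Elo scoring in the list above can be sketched roughly as follows. This is an illustration of standard Elo applied to pairwise judge verdicts, not the project's actual code: the K-factor of 32, the starting rating of 1000, and the names `expected_score`, `update_elo`, and `rate` are all assumptions, and the verdicts are passed in as plain tuples instead of coming from an LLM judge.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (A, B) ratings after one comparison.

    score_a is 1.0 if the judge preferred A, 0.0 if it preferred B,
    and 0.5 for a tie. k=32 is an assumed K-factor.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate(verdicts, initial: float = 1000.0, k: float = 32.0) -> dict:
    """Fold a sequence of (name_a, name_b, score_a) verdicts into ratings."""
    ratings = {}
    for name_a, name_b, score_a in verdicts:
        ra = ratings.get(name_a, initial)
        rb = ratings.get(name_b, initial)
        ratings[name_a], ratings[name_b] = update_elo(ra, rb, score_a, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical verdicts: each tuple is one pairwise judge decision.
    demo = [("variant_a", "variant_b", 1.0), ("variant_b", "variant_c", 0.5)]
    print(rate(demo))
```

Ratings built this way depend on the order of comparisons, so a real evaluation would probably shuffle the verdicts or average over several passes.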

Why dropped: Results were hard to measure, and the scope kept expanding as each finding raised more questions. Similar research was published during our work, most notably "Adversaries Can Misuse Combinations of Safe Models".

Partial Writeup

Predicting AI Releases via Side Channels

Abandoned

January 2025

Attempt to predict OpenAI releases by analyzing the Twitter activity of its red team members. Hypothesis: intensive testing before a launch reduces their social media engagement, so a dip in activity could signal an upcoming release.
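
One way to test a hypothesis of this kind is to compare average activity in the days just before a known release against an earlier baseline. The sketch below is illustrative only and is not the method used in the project: the function name `pre_release_ratio`, the 7-day window, and the input format (a list of daily post counts) are assumptions, and the sample numbers are made up.

```python
from statistics import mean


def pre_release_ratio(daily_counts, release_index, window=7):
    """Ratio of mean activity in the `window` days before a release
    to mean activity over all earlier days.

    A ratio well below 1.0 would be consistent with the hypothesis;
    a ratio near 1.0 suggests no visible dip.
    """
    if release_index < window or release_index > len(daily_counts):
        raise ValueError("release_index must leave room for a full window")
    pre = daily_counts[release_index - window:release_index]
    baseline = daily_counts[:release_index - window]
    if not baseline:
        raise ValueError("no baseline days before the pre-release window")
    base_mean = mean(baseline)
    if base_mean == 0:
        raise ValueError("baseline activity is zero; ratio is undefined")
    return mean(pre) / base_mean


if __name__ == "__main__":
    # Made-up data: 14 baseline days, 7 quieter days, then release day.
    counts = [10] * 14 + [5] * 7 + [10]
    print(pre_release_ratio(counts, release_index=21))
```

A single ratio says little on its own. With only a handful of releases and noisy posting habits, any dip would need to be checked against windows that did not precede a release.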

Why abandoned: Weak signal, Twitter API restrictions, and no free time for projects like this.

LessWrong Post