BenGER Benchathon 2026
Multi-agent AI · Legal NLP · Python · 2026
Built a multi-agent AI pipeline for the BenGER Benchathon 2026, a competition testing whether AI systems can solve German legal exam cases (Staatsexamen-level) as well as trained jurists.
The core idea was to decompose complex legal exam questions into structured sub-problems, route each to a specialized agent, and recombine the reasoning into a final answer. The pipeline scored 18/18*, the highest result in the competition, achieved without any legal training.
The pipeline handles the full chain: parsing the case facts, identifying the relevant legal norms, applying structured legal subsumption, and generating a coherent exam-style answer. Each step runs as an independent agent with its own prompt strategy and tool access.
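The chain described above can be sketched as a list of stages that pass a shared state object along. This is only an illustration, not the competition code: `CaseState`, the stage functions, and `run_pipeline` are hypothetical names, and each stage is a plain stub where a real agent would call a model with its own prompt and tools.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CaseState:
    """Shared state handed from one stage to the next (hypothetical structure)."""
    case_text: str
    facts: list[str] = field(default_factory=list)
    norms: list[str] = field(default_factory=list)
    subsumption: list[str] = field(default_factory=list)
    answer: str = ""


def parse_facts(state: CaseState) -> CaseState:
    # Stub: treats every sentence as one fact; a real agent would use an LLM.
    state.facts = [s.strip() for s in state.case_text.split(".") if s.strip()]
    return state


def identify_norms(state: CaseState) -> CaseState:
    # Stub: a fixed placeholder norm stands in for real norm research.
    state.norms = ["§ 433 BGB"]
    return state


def apply_subsumption(state: CaseState) -> CaseState:
    # Stub: pairs every norm with every fact instead of doing legal reasoning.
    state.subsumption = [
        f"{norm} applied to: {fact}" for norm in state.norms for fact in state.facts
    ]
    return state


def write_answer(state: CaseState) -> CaseState:
    # Stub: joins the subsumption steps into one text.
    state.answer = "\n".join(state.subsumption)
    return state


STAGES: list[Callable[[CaseState], CaseState]] = [
    parse_facts,
    identify_norms,
    apply_subsumption,
    write_answer,
]


def run_pipeline(case_text: str) -> CaseState:
    """Run all stages in order, each one enriching the shared state."""
    state = CaseState(case_text=case_text)
    for stage in STAGES:
        state = stage(state)
    return state


result = run_pipeline("A sold B a car. B did not pay.")
print(result.answer)
```

Keeping every stage behind the same `CaseState -> CaseState` signature is one way to let each agent be swapped or tested on its own.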
Pipeline
12 agents across 4 phases. Concurrent execution, web-grounded verification, self-correcting critique loops. 15–20 min per case.
Analyze
Parse & classify the exam case
- Case Facts
- Norm Research
- Classification
Argue
Adversarial problem-finding
- Plaintiff Agent
- Defendant Agent
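A minimal sketch of how two adversarial agents might run side by side, assuming they are independent of each other. The source does not say which agents run concurrently, so this pairing is an assumption. `run_agent` and `argue` are hypothetical names, and the sleep call merely simulates the latency of a model request.

```python
import asyncio


async def run_agent(role: str, case_summary: str) -> dict:
    """Hypothetical stand-in for one LLM-backed agent call."""
    await asyncio.sleep(0.01)  # simulates the latency of a model request
    return {"role": role, "issues": [f"{role} argument on: {case_summary}"]}


async def argue(case_summary: str) -> list[str]:
    # Assumption: neither side needs the other's output, so both run at once.
    plaintiff, defendant = await asyncio.gather(
        run_agent("plaintiff", case_summary),
        run_agent("defendant", case_summary),
    )
    # Merge both issue lists for a later synthesis step to consume.
    return plaintiff["issues"] + defendant["issues"]


issues = asyncio.run(argue("unpaid purchase price"))
print(issues)
```

`asyncio.gather` returns results in the order the awaitables were passed, so the plaintiff's issues always come first regardless of which call finishes earlier.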
Synthesize
Formal legal opinion in exam style
- Schema Generator
- Gutachtenstil Writer
Refine
Critique-revise loops until convergence
- Content Review
- Method Review
- Form Review
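A critique-revise loop of this kind could look roughly like the sketch below. It is an illustration under stated assumptions: the three reviewers are stubs that only check for a marker string, `revise` fixes one finding per round, "convergence" is taken to mean that no reviewer reports a finding, and `max_rounds` is a hypothetical safety cap. All names are invented for this example.

```python
from typing import Callable

Reviewer = Callable[[str], list[str]]


def make_reviewer(marker: str) -> Reviewer:
    """Build a stub reviewer that complains until `marker` appears in the draft."""
    def review(draft: str) -> list[str]:
        return [] if marker in draft else [marker]
    return review


# Stand-ins for content, method, and form review (markers are hypothetical).
REVIEWERS: list[Reviewer] = [
    make_reviewer("norm cited"),
    make_reviewer("subsumption shown"),
    make_reviewer("result stated"),
]


def revise(draft: str, findings: list[str]) -> str:
    # Stub reviser: addresses only the first finding in each round.
    return f"{draft} [{findings[0]}]"


def refine(draft: str, max_rounds: int = 5) -> tuple[str, int]:
    """Revise until no reviewer has findings; return the draft and revision count."""
    for revisions_done in range(max_rounds):
        findings = [f for review in REVIEWERS for f in review(draft)]
        if not findings:
            return draft, revisions_done  # converged
        draft = revise(draft, findings)
    return draft, max_rounds  # cap reached without a clean review


final_draft, revisions = refine("Draft opinion")
print(revisions, final_draft)
```

The round cap keeps the loop from running forever if reviewers and reviser never agree, which matters when each round costs a model call.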
* Scored by LLM-based evaluation, not human legal experts