
BenGER Benchathon 2026

Multi-agent AI · Legal NLP · Python · 2026

Built a multi-agent AI pipeline for the BenGER Benchathon 2026 — a competition testing whether AI systems can solve German legal exam cases (Staatsexamen-level) as well as trained jurists.

The core idea was to decompose complex legal exam questions into structured sub-problems, route each to a specialized agent, and recombine the reasoning into a final answer. The pipeline scored 18/18 under LLM-based evaluation (not graded by human legal experts), the highest result in the competition, achieved without any legal training.

The pipeline handles the full chain: parsing the case facts, identifying the relevant legal norms, applying a structured legal subsumption logic, and generating a coherent exam-style answer. Each step runs as an independent agent with its own prompt strategy and tool access.
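
A minimal sketch of that chain, assuming each agent can be modelled as a plain function that reads a shared state and returns new entries. The function names, the `State` dictionary, and the string-building bodies are hypothetical placeholders; the actual agents call an LLM with their own prompts and tools, which is not shown here.

```python
from typing import Callable

# Hypothetical types: a shared state passed from agent to agent.
State = dict[str, str]
Agent = Callable[[State], State]

def parse_facts(state: State) -> State:
    # Placeholder for the fact-parsing agent.
    return {"facts": state["case_text"].strip()}

def identify_norms(state: State) -> State:
    # Placeholder for the norm-identification agent.
    return {"norms": f"norms relevant to: {state['facts']}"}

def subsume(state: State) -> State:
    # Placeholder for the subsumption agent.
    return {"subsumption": f"apply [{state['norms']}] to [{state['facts']}]"}

def write_answer(state: State) -> State:
    # Placeholder for the exam-style answer writer.
    return {"answer": f"Gutachten based on: {state['subsumption']}"}

CHAIN: list[Agent] = [parse_facts, identify_norms, subsume, write_answer]

def run_chain(case_text: str) -> State:
    """Run every agent in order, merging each result into the shared state."""
    state: State = {"case_text": case_text}
    for agent in CHAIN:
        state.update(agent(state))
    return state
```

Keeping each step behind the same `State -> State` signature is one way to let agents be swapped or reordered without touching the others.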

Pipeline

12 agents across 4 phases. Concurrent execution, web-grounded verification, self-correcting critique loops. 15–20 min per case.

18/18 *
CASE FACTS → NORM RESEARCH → CLASSIFICATION → PLAINTIFF → DEFENDANT → SCHEMA GEN → GUTACHTENSTIL → CONTENT REVIEW → METHOD REVIEW → FORM REVIEW
PHASE 01 · PARALLEL

Analyze

Parse & classify the exam case

  • Case Facts
  • Norm Research
  • Classification
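
A parallel phase like this one could be sketched with `asyncio.gather`. This is an illustration only: the three coroutines are hypothetical stand-ins for the Case Facts, Norm Research, and Classification agents, and it assumes each of them works directly from the raw case text so that none has to wait for another.

```python
import asyncio

async def case_facts(case_text: str) -> str:
    await asyncio.sleep(0)  # placeholder for an LLM call
    return f"facts({case_text})"

async def norm_research(case_text: str) -> str:
    await asyncio.sleep(0)  # placeholder for an LLM call plus web lookup
    return f"norms({case_text})"

async def classification(case_text: str) -> str:
    await asyncio.sleep(0)  # placeholder for an LLM call
    return f"area({case_text})"

async def analyze(case_text: str) -> dict[str, str]:
    # gather() schedules all three coroutines concurrently and returns
    # their results in the order the coroutines were passed in.
    facts, norms, area = await asyncio.gather(
        case_facts(case_text),
        norm_research(case_text),
        classification(case_text),
    )
    return {"facts": facts, "norms": norms, "classification": area}

def run_analyze(case_text: str) -> dict[str, str]:
    """Synchronous entry point for the analyze phase."""
    return asyncio.run(analyze(case_text))
```
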
PHASE 02 · PARALLEL

Argue

Adversarial problem-finding

  • Plaintiff Agent
  • Defendant Agent
PHASE 03 · SEQUENTIAL

Synthesize

Formal legal opinion in exam style

  • Schema Generator
  • Gutachtenstil Writer
PHASE 04 · ITERATIVE

Refine

Critique-revise loops until convergence

  • Content Review
  • Method Review
  • Form Review
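
One way to picture a critique-revise loop is shown below. It is a sketch under stated assumptions, not the pipeline's actual logic: the three reviewers are toy rule checks standing in for LLM reviewers, `toy_revise` and the issue labels are hypothetical, "convergence" is taken to mean that no reviewer reports an issue, and `max_rounds` is an assumed safety cap.

```python
from typing import Callable

Reviewer = Callable[[str], list[str]]
Reviser = Callable[[str, list[str]], str]

def content_review(draft: str) -> list[str]:
    # Toy check: the draft should cite some norm.
    return [] if "§" in draft else ["add-norm"]

def method_review(draft: str) -> list[str]:
    # Toy check: the draft should state a result ("Ergebnis").
    return [] if "Ergebnis" in draft else ["add-result"]

def form_review(draft: str) -> list[str]:
    # Toy check: the draft should end with a period.
    return [] if draft.endswith(".") else ["add-period"]

def toy_revise(draft: str, issues: list[str]) -> str:
    # Fixes only the first reported issue, so several rounds are needed.
    fixes = {
        "add-norm": " under § X",
        "add-result": "; Ergebnis: claim exists",
        "add-period": ".",
    }
    return draft + fixes[issues[0]]

def refine(draft: str, reviewers: list[Reviewer], revise: Reviser,
           max_rounds: int = 5) -> tuple[str, int]:
    """Revise until no reviewer objects or max_rounds revisions were made.

    Returns the final draft and the number of revisions applied.
    """
    for round_no in range(max_rounds):
        issues = [issue for review in reviewers for issue in review(draft)]
        if not issues:
            return draft, round_no  # converged
        draft = revise(draft, issues)
    return draft, max_rounds  # cap reached; draft may still have issues

REVIEWERS = [content_review, method_review, form_review]
final, rounds = refine("A might have a claim", REVIEWERS, toy_revise)
```

The cap matters because a loop that waits only for reviewers to agree has no guarantee of terminating.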

* Scored by LLM-based evaluation, not human legal experts
