
AI Agents - Reply Code Challenge 2026

I built a ReAct-based multi-agent system for the Reply Code Challenge 2026, finishing 57th globally out of 2000 teams.


AI Agents - Reply Code Challenge 2026


Team: spaghetti-overflow

This repository and all the work contained within were created through the collaborative effort of our team members: Andrea Botticella, Elia Innocenti, Renato Mignone, and Simone Romano. See the Authors section for contact information and links to their profiles.

This repository contains the full learning path, sandbox training materials, and challenge solution workspace for the Reply Code Challenge 2026 - AI Agents track.

Documentation Design Notice

All documentation in this repository is written and structured for AI agent readability and token efficiency. Every file follows these rules: no emojis, no decorative padding, no redundant prose, no information duplicated across files. Each README is scoped strictly to its own folder. Cross-folder navigation is handled through explicit pointer lines only. The primary entry point for any AI agent working in this repository is AI_Agent.md in this root directory.


What this repository is for

This repository contains the complete implementation for the Reply Code Challenge 2026 - AI Agents track, which finished with a global ranking of 57 out of 2000 teams (top 3%).

Challenge Details:

  • Date: April 16th, 2026 (6-hour timed event)
  • Theme: Monitor. Adapt. Defend.
  • Objective: Design an LLM-powered multi-agent system to detect fraudulent financial transactions in real-time by analyzing behavioral anomalies across complex datasets.

Technical Achievement:

  • Built a ReAct-based agentic AI system using LangChain and Langfuse for real-time fraud detection
  • Implemented adaptive parameter optimization including recursion limits (90→70), token reduction (1800→1200), and temperature tuning
  • Improved detection quality through a stricter z-score threshold (3.0→3.5), balance-impact scoring, and economic-aware risk assessment
  • Optimized for efficiency: reduced model invocations, eliminated redundant fallback models, lower token overhead while maintaining detection quality
  • Managed a constrained token budget across multiple competition datasets while balancing quality, cost, and latency
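The token budget management mentioned above can be illustrated with a small, hypothetical helper (the real system's accounting lives in the challenge code; the class and method names here are illustrative, not from the repository):

```python
from dataclasses import dataclass, field

@dataclass
class TokenBudget:
    """Tracks cumulative token spend against a hard cap (hypothetical helper)."""
    limit: int
    spent: int = 0
    history: list = field(default_factory=list)

    def charge(self, tokens: int, label: str = "") -> None:
        """Record a model invocation's token cost."""
        self.spent += tokens
        self.history.append((label, tokens))

    def remaining(self) -> int:
        return max(self.limit - self.spent, 0)

    def can_afford(self, tokens: int) -> bool:
        """Check before invoking a model, to skip calls that would bust the cap."""
        return self.spent + tokens <= self.limit

budget = TokenBudget(limit=1200)
budget.charge(800, "dataset1-pass1")
print(budget.remaining())       # 400
print(budget.can_afford(500))   # False
```

Checking `can_afford` before each invocation is what lets a system trade detection passes for cost when the budget runs low.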

The repository is organized into three phases:

  1. Learning (00_AI_Agents_Learning/) - Four progressive tutorials covering agentic AI fundamentals, tools, multi-agent orchestration, and resource management
  2. Pre-Challenge Training (01_AI_Agents_Training/) - Sandbox environment with practice datasets, used for iterating before April 16th
  3. Challenge (02_AI_Agents_Challenge/) - Official competition solution achieving top-3% global ranking; final submitted code in 01_Implementation/01_Implementation_Code/

Results & Technical Architecture

Final Ranking: 57 / 2000 teams (Top 3%)

Agentic System Design: The solution implements a multi-agent ReAct orchestrator with specialized components:

  • Data Analyst Agent - Extracts pattern signatures from transaction history, user personas, and behavioral context
  • Anomaly Detection Engine - Applies heuristic baseline + LLM-based decision making with economic impact awareness
  • Review Agent - Secondary review pass for uncertain/disputed cases to improve precision-recall balance
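The three-stage flow above can be sketched as a plain-Python pipeline. This is a hedged illustration of the control flow only: the real stages are LangChain ReAct agents, and the feature names and score weights here are invented for the example:

```python
def data_analyst(txn: dict) -> dict:
    """Extract a pattern signature (simplified to two illustrative features)."""
    return {
        "amount_ratio": txn["amount"] / max(txn["avg_amount"], 1e-9),
        "new_channel": txn["channel"] not in txn["known_channels"],
    }

def anomaly_engine(sig: dict) -> float:
    """Heuristic baseline: combine signals into a fraud score in [0, 1]."""
    score = 0.0
    if sig["amount_ratio"] > 3.5:   # echoes the tuned outlier threshold
        score += 0.6
    if sig["new_channel"]:
        score += 0.3
    return min(score, 1.0)

def review_agent(score: float) -> str:
    """Secondary pass: only uncertain scores get a deeper (LLM) review."""
    if score >= 0.8:
        return "fraud"
    if score <= 0.3:
        return "legit"
    return "needs_review"   # would be escalated to the LLM reviewer

txn = {"amount": 5000, "avg_amount": 120, "channel": "web",
       "known_channels": {"atm", "pos"}}
print(review_agent(anomaly_engine(data_analyst(txn))))  # fraud
```

Routing only the uncertain middle band to a second (more expensive) review pass is what buys the precision-recall balance without paying LLM cost on every transaction.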

Key Optimizations:

| Metric | Initial | Optimized | Outcome |
|---|---|---|---|
| Recursion limit | 90 | 70 (adaptive by dataset) | 30% token overhead reduction |
| Token usage | 1800 (max) | 1200 (max) | Cost efficiency maintained |
| Fallback models | 3 duplicates | 1 unique | Failure cascade prevention |
| Z-score threshold | 3.0 | 3.5 | Enhanced outlier detection |
| Model invocations | Full batch | Adaptive calibration | Budget-aware selection |
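The z-score threshold in the table can be shown with a minimal stdlib sketch (the real scaler operates on richer behavioral features; this version flags only transaction amounts, and the sample data is invented):

```python
import statistics

def zscore_flags(amounts: list, threshold: float = 3.5) -> list:
    """Flag amounts whose z-score exceeds the tuned threshold (3.0 -> 3.5)."""
    mean = statistics.fmean(amounts)
    sd = statistics.pstdev(amounts)
    if sd == 0:
        return [False] * len(amounts)   # no spread, nothing is an outlier
    return [abs(a - mean) / sd > threshold for a in amounts]

history = [100.0] * 15 + [5000.0]
print(zscore_flags(history))   # only the 5000.0 entry is flagged
```

Note that with a population z-score the maximum attainable value is bounded by the sample size, so a 3.5 cutoff needs a reasonably long transaction history to fire at all; that is one reason to pair it with other signals rather than use it alone.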

Detection Quality Improvements:

  • Enhanced system prompt with explicit fraud signals (behavioral anomalies, economic misalignment, channel anomalies, location contradictions)
  • Implemented balance-impact scorer detecting transactions causing >50% balance drops
  • Added whitelisted legitimate patterns (salary, recurring utilities, subscriptions) for false positive reduction
  • Calibrated ranking to prioritize high-value fraud detection
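The balance-impact scorer from the list above reduces to a simple ratio check. A hedged sketch (function names are illustrative; the submitted code combines this with the other signals):

```python
def balance_impact(amount: float, balance_before: float) -> float:
    """Fraction of the available balance consumed by one transaction."""
    if balance_before <= 0:
        return 1.0   # any spend from a non-positive balance is maximal impact
    return min(amount / balance_before, 1.0)

def high_impact(amount: float, balance_before: float, cutoff: float = 0.5) -> bool:
    """Flag transactions draining more than `cutoff` (50%) of the balance."""
    return balance_impact(amount, balance_before) > cutoff

print(high_impact(600, 1000))   # True  (60% of balance)
print(high_impact(200, 1000))   # False (20% of balance)
```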

Technology Stack:

  • LangChain (agentic orchestration), OpenRouter API (LLM access), Langfuse (observability & tracing)
  • Environment: Python 3.10+, Jupyter for experimentation, Makefile for reproducible setup
  • Infrastructure: Full .env-based credential management, token budget tracking, submission session logging

Repository Structure

AI_Agents_Reply_Challenge/
  AI_Agent.md                    - Primary entry point for AI agents
  README.md                      - This file
  Makefile                       - Run 'make' to set up the entire environment
  .env.example                   - Credential template (safe to commit, copy to .env and fill in)
  .env                           - Your real credentials (not committed, excluded by .gitignore)
  .gitignore                     - Excludes .env, .venv, __pycache__, build artifacts
  .venv/                         - Root virtual environment (created by 'make setup')

  .scripts/                      - Environment setup scripts and dependency manifest
    requirements.txt             - All Python dependencies for the entire project
    check_setup.py               - Verifies imports and .env credentials (run via 'make check')
    utils.py                     - Shared data loader utility for parsing dataset schemas

  00_AI_Agents_Learning/         - Tutorial notebooks (start here if new to the stack)
    README.md                    - Setup, credential config, notebook order
    Notebooks/                   - Four Jupyter notebooks to run in sequence
    TXT/                         - Source instructions used to build the notebooks

  01_AI_Agents_Training/         - Pre-challenge sandbox training (April 10-15, 2026)
    README.md                    - Problem domain, file schemas, submission interface
    GUIDE.md                     - Step-by-step sandbox workflow
    00_Sandbox_Sample_Material/  - Official organizer-provided materials and training datasets
    01_Sandbox_Implementations/  - Pre-challenge iterative implementation
    resources/                   - Screenshots of the sandbox challenge interface

  02_AI_Agents_Challenge/        - Official competition workspace (April 16th, 2026)
    README.md                    - Challenge status, solution overview, and structure
    00_How_It_Works/             - Official rules, API docs, and model reference
      README.md                  - Competition rules, timeline, scoring, prizes
      submission_guide.md        - Challenge-day fast path: generation, validation, upload order
      challenge_day_checklist.md - 60-second pre-submit go/no-go checklist
      api_guidelines.md          - Langfuse integration code and best practices
      model_whitelist.md         - All whitelisted OpenRouter model IDs
    01_Implementation/           - Challenge day workspace
      README.md                  - Architecture design and optimization notes
      01_Implementation_Code/    - FINAL SUBMITTED CODE
        Dataset1_Implementation/ - Submission for Dataset 1
        Dataset2_Implementation/ - Submission for Dataset 2
        Dataset3_Implementation/ - Submission for Dataset 3
        Dataset4_Implementation/ - Submission for Dataset 4
        Dataset5_Implementation/ - Submission for Dataset 5
      00_Training_Material/      - Official challenge training materials

Getting Started

Prerequisites:

  • Python 3.10 to 3.13 (Python 3.14 is incompatible with Langfuse; do not use it)
  • GNU Make (pre-installed on Linux and macOS)
  • An OpenRouter API key (free at openrouter.ai)
  • Langfuse credentials provided by the challenge organizers on challenge day
  • For sandbox training: sandbox keys available on the challenge platform under "View my Keys"

One-command setup (from the repo root):

make

This creates the root .venv/, installs all dependencies from .scripts/requirements.txt, and registers the Jupyter kernel.

Then configure credentials:

cp .env.example .env
# Edit .env and fill in your real values
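For a sense of what the credential check looks for, here is a stdlib-only sketch. The real logic lives in .scripts/check_setup.py; the variable names below are assumptions based on the stack (OpenRouter + Langfuse), not copied from the script:

```python
import os

# Hypothetical required variables; the actual list is defined in check_setup.py.
REQUIRED = ["OPENROUTER_API_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY"]

def missing_credentials(env: dict) -> list:
    """Return the names of required variables that are absent or empty."""
    return [k for k in REQUIRED if not env.get(k)]

# In practice you would pass os.environ after loading .env:
print(missing_credentials({"OPENROUTER_API_KEY": "sk-or-..."}))
# ['LANGFUSE_PUBLIC_KEY', 'LANGFUSE_SECRET_KEY']
```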

Verify everything is working:

make check

Launch Jupyter:

make jupyter
# or activate manually: source .venv/bin/activate && jupyter lab 00_AI_Agents_Learning/Notebooks/

Learning Path

The 00_AI_Agents_Learning section contains four progressive tutorials. Run them in order:

| # | Notebook | Concepts |
|---|---|---|
| 01 | Basic Agent Creation | LangChain, OpenRouter, system prompts |
| 02 | Tools and Function Calling | @tool decorator, automatic tool selection |
| 03 | Multi-Agent Systems | Orchestrator pattern, Agents as Tools |
| 04 | Resource Management | Langfuse tracing, session IDs, cost tracking |
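The "Agents as Tools" pattern from notebook 03 can be previewed without any LLM: the orchestrator simply exposes each specialist agent as a callable and routes requests to it. A hedged pure-Python stand-in (in the notebooks the routing decision is made by the LLM, not by the keyword check used here; all names and the exchange rate are invented):

```python
def currency_agent(query: str) -> str:
    """Toy specialist: converts 'N EUR' to USD at a fixed illustrative rate."""
    amount = float(query.split()[0])
    return f"{amount * 1.1:.2f} USD"

def greeting_agent(query: str) -> str:
    """Toy specialist for small talk."""
    return "Hello! How can I help?"

# Agents registered as tools, keyed by a tool name.
TOOLS = {"convert": currency_agent, "greet": greeting_agent}

def orchestrator(query: str) -> str:
    """Route a request to a specialist agent, treating agents as tools.
    A real ReAct orchestrator lets the LLM pick the tool; keyword routing
    stands in for that decision here."""
    name = "convert" if any(ch.isdigit() for ch in query) else "greet"
    return TOOLS[name](query)

print(orchestrator("100 EUR"))    # 110.00 USD
print(orchestrator("hi there"))   # Hello! How can I help?
```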

See 00_AI_Agents_Learning/README.md for full setup and usage instructions.


Challenge Overview

The competition uses 5 datasets of increasing complexity. They unlock in two stages:

| Stage | Datasets | Token Budget | Unlock Condition |
|---|---|---|---|
| 1 | 1, 2, 3 | $40 | Available at start |
| 2 | 4, 5 | $120 more | Submit eval solutions for all of 1-3 |

Every submission requires three elements: a Langfuse session ID, a UTF-8 output file, and (for evaluation datasets only) a source code zip. Training submissions are unlimited and show a score each time. Evaluation submissions are one per dataset and cannot be re-submitted.

See 02_AI_Agents_Challenge/00_How_It_Works/README.md for the full rules, scoring breakdown, prizes, and submission format. For rapid challenge-day operations, use 02_AI_Agents_Challenge/00_How_It_Works/submission_guide.md and 02_AI_Agents_Challenge/00_How_It_Works/challenge_day_checklist.md.


Tech Stack

| Library | Purpose |
|---|---|
| LangChain | Agent framework and tool abstractions |
| LangGraph | ReAct agent execution engine |
| langchain-openai | OpenAI-compatible model connector |
| OpenRouter | Unified LLM API gateway |
| Langfuse | Observability: token tracking, cost monitoring |
| ulid-py | Unique session ID generation |
| python-dotenv | .env file loading |

Makefile targets

| Target | What it does |
|---|---|
| make or make setup | Creates root .venv/, installs all deps, registers Jupyter kernel |
| make check | Verifies all imports work and .env has all required credentials filled in |
| make jupyter | Launches Jupyter Lab in the learning notebooks folder |
| make clean | Removes the root .venv/ (run make again to recreate) |