CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

CensorBot is a Python application that acts as a data sanitization tool for IT service companies. It uses a small LLM (like DeepSeek) to automatically detect and censor sensitive customer information in text inputs. Users input text containing customer data, and CensorBot returns a sanitized version with all sensitive information replaced by placeholders. This censored text can then be safely used with any external LLM service (Claude, GPT-4, etc.) without risking data breaches. The application uses NiceGUI for the frontend.

Key Architecture

Core Components

  • Frontend: NiceGUI-based web interface (to be implemented in src/main.py)
  • LLM Integration: src/lib/llm.py provides an async HTTP client for LLM API communication
    • Supports both streaming and non-streaming responses
    • Uses httpx for async HTTP requests
    • Expects an OpenAI-compatible chat completions API (a minimal call sketch follows this list)
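
For orientation, a non-streaming call against such an OpenAI-compatible endpoint could look like the sketch below. This is a minimal sketch under stated assumptions, not the project's code: the helper in src/lib/llm.py may be structured differently, and the function name chat_completion and the exact URL path are illustrative.

import os

import httpx


async def chat_completion(messages: list[dict]) -> str:
    # Non-streaming variant; the real get_response also supports streaming.
    async with httpx.AsyncClient(
        base_url=os.environ["BACKEND_BASE_URL"],  # may or may not include a /v1 prefix
        timeout=60.0,
    ) as client:
        response = await client.post(
            "chat/completions",
            headers={"Authorization": f"Bearer {os.environ['BACKEND_API_TOKEN']}"},
            json={"model": os.environ["BACKEND_MODEL"], "messages": messages},
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]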

Configuration

  • Environment Variables (via .env file; a loading sketch follows this list):
    • BACKEND_BASE_URL: Censoring LLM backend URL (e.g., DeepSeek API)
    • BACKEND_API_TOKEN: API authentication token for the censoring LLM
    • BACKEND_MODEL: Model to use for censoring (e.g., "deepseek-chat")
  • System Prompt: Located in src/prompt.md - defines the censoring LLM's behavior for identifying and redacting sensitive data
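
A minimal way to load these settings, assuming python-dotenv (the project may read its .env file differently):

import os

from dotenv import load_dotenv

load_dotenv()  # reads the nearest .env file

BACKEND_BASE_URL = os.environ["BACKEND_BASE_URL"]
BACKEND_API_TOKEN = os.environ["BACKEND_API_TOKEN"]
BACKEND_MODEL = os.getenv("BACKEND_MODEL", "deepseek-chat")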

Development Commands

Package Management (using uv)

# Install dependencies
uv sync

# Add a dependency
uv add <package>

# Run the application
uv run src/main.py

Running the Application

# Run the NiceGUI application (once implemented)
uv run python src/main.py

Important Implementation Notes

  1. LLM Integration: The get_response function in src/lib/llm.py is fully functional and:

    • Expects backend configuration via BACKEND_BASE_URL, BACKEND_API_TOKEN, and BACKEND_MODEL
    • Takes messages in OpenAI format with the roles "system", "assistant", and "user"
    • Returns an async generator in both streaming and non-streaming modes
    • Is used exclusively for the censoring functionality
  2. Security Focus: This application handles sensitive customer data. Always:

    • Ensure proper data sanitization before and after LLM processing
    • Never log or expose raw customer information
    • Keep API tokens secure and never commit them
  3. Frontend Development: When implementing the NiceGUI interface in src/main.py (a minimal UI sketch appears at the end of this file):

    • Provide input field for text containing sensitive data
    • Display censored output that users can copy
    • Use async handlers to integrate with the LLM backend
    • Implement proper error handling for API failures
    • Consider showing before/after comparison
    • Add copy-to-clipboard functionality for the censored text
  4. System Prompt: The src/prompt.md file should contain clear instructions for the censoring LLM on the following (an illustrative excerpt appears at the end of this file):

    • What constitutes customer information (names, addresses, phone numbers, emails, etc.)
    • How to censor/redact sensitive data (e.g., replace with placeholders like [CUSTOMER_NAME], [EMAIL], etc.)
    • Maintaining context while protecting privacy
    • Ensuring the output remains useful for the downstream processing LLM
  5. Usage Flow:

    • User pastes text with customer data into CensorBot
    • CensorBot uses small LLM to identify and replace sensitive information
    • User receives censored text with placeholders
    • User can copy censored text and use it with any external LLM service
    • No direct integration with external LLMs - CensorBot is a standalone sanitization tool
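
Below is a minimal UI sketch for the frontend described in note 3. It assumes NiceGUI's standard API (ui.textarea, ui.button, ui.notify, ui.run, and ui.clipboard.write in recent releases); censor_text is a hypothetical stand-in for the real call into get_response, not existing project code.

from nicegui import ui


async def censor_text(raw: str) -> str:
    # Hypothetical placeholder: the real handler would send the system prompt from
    # src/prompt.md plus the user text to get_response in src/lib/llm.py and
    # collect the generated output.
    return raw


@ui.page('/')
def index() -> None:
    source = ui.textarea(label='Text with customer data').classes('w-full')
    result = ui.textarea(label='Censored output').classes('w-full')

    async def on_censor() -> None:
        try:
            result.value = await censor_text(source.value or '')
        except Exception as exc:
            # Surface API failures instead of failing silently.
            ui.notify(f'Censoring failed: {exc}', type='negative')

    ui.button('Censor', on_click=on_censor)
    # Copy-to-clipboard; ui.clipboard.write is available in recent NiceGUI versions.
    ui.button('Copy result', on_click=lambda: ui.clipboard.write(result.value or ''))


if __name__ in {'__main__', '__mp_main__'}:
    ui.run()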
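
For note 4, an illustrative excerpt of the kind of instructions src/prompt.md could contain (the actual file is authoritative; this only sketches the tone and structure):

You are a data sanitization assistant for an IT service company. Rewrite the
user's text, replacing every piece of customer information with a placeholder
while leaving everything else unchanged:

  • Customer and contact names -> [CUSTOMER_NAME]
  • Email addresses -> [EMAIL]
  • Phone numbers -> [PHONE]
  • Postal addresses -> [ADDRESS]

Output only the censored text, with no commentary or explanation.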