# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

CensorBot is a Python application that acts as a data sanitization tool for IT service companies. It uses a small LLM (such as DeepSeek) to automatically detect and censor sensitive customer information in text inputs. Users input text containing customer data, and CensorBot returns a sanitized version with all sensitive information replaced by placeholders. This censored text can then be safely used with any external LLM service (Claude, GPT-4, etc.) without risking data breaches. The application uses NiceGUI for the frontend.

## Key Architecture

### Core Components

- **Frontend**: NiceGUI-based web interface (to be implemented in `src/main.py`)
- **LLM Integration**: `src/lib/llm.py` provides an async HTTP client for LLM API communication
  - Supports both streaming and non-streaming responses
  - Uses httpx for async HTTP requests
  - Expects an OpenAI-compatible chat completions API

### Configuration

- **Environment Variables** (via `.env` file):
  - `BACKEND_BASE_URL`: Censoring LLM backend URL (e.g., DeepSeek API)
  - `BACKEND_API_TOKEN`: API authentication token for the censoring LLM
  - `BACKEND_MODEL`: Model to use for censoring (e.g., "deepseek-chat")
- **System Prompt**: Located in `src/prompt.md`; defines the censoring LLM's behavior for identifying and redacting sensitive data

## Development Commands

### Package Management (using uv)

```bash
# Install dependencies
uv sync

# Add a dependency
uv add <package>

# Run the application
uv run src/main.py
```

### Running the Application

```bash
# Run the NiceGUI application (once implemented)
uv run python src/main.py
```

## Important Implementation Notes
1. **LLM Integration**: The `get_response` function in `src/lib/llm.py` is fully functional and expects:
   - Backend configuration with `BACKEND_BASE_URL`, `BACKEND_API_TOKEN`, and `BACKEND_MODEL`
   - Messages in OpenAI format with roles: "system", "assistant", "user"
   - Returns an async generator in both streaming and non-streaming modes
   - Used exclusively for the censoring functionality
2. **Security Focus**: This application handles sensitive customer data. Always:
   - Ensure proper data sanitization before and after LLM processing
   - Never log or expose raw customer information
   - Keep API tokens secure and never commit them
3. **Frontend Development**: When implementing the NiceGUI interface in `src/main.py`:
   - Provide an input field for text containing sensitive data
   - Display censored output that users can copy
   - Use async handlers to integrate with the LLM backend
   - Implement proper error handling for API failures
   - Consider showing a before/after comparison
   - Add copy-to-clipboard functionality for the censored text
4. **System Prompt**: The `src/prompt.md` file should contain clear instructions for the censoring LLM on:
   - What constitutes customer information (names, addresses, phone numbers, emails, etc.)
   - How to censor/redact sensitive data (e.g., replace with placeholders like [CUSTOMER_NAME], [EMAIL], etc.)
   - Maintaining context while protecting privacy
   - Ensuring the output remains useful for the downstream processing LLM
5. **Usage Flow**:
   - User pastes text with customer data into CensorBot
   - CensorBot uses a small LLM to identify and replace sensitive information
   - User receives censored text with placeholders
   - User can copy the censored text and use it with any external LLM service
   - No direct integration with external LLMs; CensorBot is a standalone sanitization tool
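The integration described in the notes above can be sketched as a small censoring helper. This is a minimal, hypothetical sketch, not the project's actual code: the real `get_response` signature lives in `src/lib/llm.py` and may differ (here it is assumed to take an OpenAI-format message list plus a `stream` flag and to yield text chunks), and `SYSTEM_PROMPT` is a stand-in for the contents of `src/prompt.md`.

```python
# Hypothetical sketch of wiring get_response into a censoring helper.
# Assumption: get_response(messages, stream=...) is an async generator
# yielding text chunks; adjust to match the real src/lib/llm.py.

SYSTEM_PROMPT = "Replace sensitive customer data with placeholders."  # stand-in for src/prompt.md


def build_messages(user_text: str) -> list[dict]:
    """Build an OpenAI-format message list for the censoring LLM."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


async def censor(user_text: str, get_response) -> str:
    """Collect the chunks yielded by get_response into one censored string."""
    chunks = []
    async for chunk in get_response(build_messages(user_text), stream=True):
        chunks.append(chunk)
    return "".join(chunks)
```

In the NiceGUI frontend, an async button handler could then `await censor(input_field.value, get_response)` and write the result to the output element, keeping the raw input out of any logs along the way.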