init

2025-08-29 21:33:33 +02:00
parent df4eeca9cb
commit 2b8271263d
36 changed files with 1439 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,77 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+CensorBot is a Python application that acts as a data sanitization tool for IT service companies. It uses a small LLM (such as DeepSeek) to automatically detect and censor sensitive customer information in text inputs. Users paste in text containing customer data, and CensorBot returns a sanitized version in which all sensitive information is replaced by placeholders. The censored text can then be used with any external LLM service (Claude, GPT-4, etc.) without sending the original customer data to that service. The application uses NiceGUI for the frontend.
+
+## Key Architecture
+
+### Core Components
+- **Frontend**: NiceGUI-based web interface (to be implemented in `src/main.py`)
+- **LLM Integration**: `src/lib/llm.py` provides an async HTTP client for communicating with the LLM API
+  - Supports both streaming and non-streaming responses
+  - Uses httpx for async HTTP requests
+  - Expects an OpenAI-compatible chat completions API
+
+### Configuration
+- **Environment Variables** (via `.env` file):
+  - `BACKEND_BASE_URL`: Base URL of the censoring LLM backend (e.g., the DeepSeek API)
+  - `BACKEND_API_TOKEN`: API authentication token for the censoring LLM
+  - `BACKEND_MODEL`: Model used for censoring (e.g., "deepseek-chat")
+- **System Prompt**: Located in `src/prompt.md`; defines how the censoring LLM identifies and redacts sensitive data
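A minimal sketch of reading and validating the three backend variables, assuming the `.env` file has already been loaded into the process environment (for example by python-dotenv, which this document does not confirm the project uses). `BackendConfig` and `load_backend_config` are hypothetical names. The idea is to fail fast with a clear message when a variable is missing, and to keep the token out of `repr()` output so it is not accidentally logged.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional

REQUIRED_VARS = ("BACKEND_BASE_URL", "BACKEND_API_TOKEN", "BACKEND_MODEL")


@dataclass(frozen=True)
class BackendConfig:
    base_url: str
    model: str
    # repr=False keeps the secret out of logs and tracebacks that print the config.
    api_token: str = field(repr=False)


def load_backend_config(environ: Optional[Mapping[str, str]] = None) -> BackendConfig:
    """Build the backend config from the environment (os.environ by default)."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return BackendConfig(
        # Strip a trailing slash so an endpoint path can be appended safely (a choice of this sketch).
        base_url=env["BACKEND_BASE_URL"].rstrip("/"),
        model=env["BACKEND_MODEL"],
        api_token=env["BACKEND_API_TOKEN"],
    )
```

Passing an explicit mapping instead of relying on `os.environ` keeps the loader easy to test without touching the real environment.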
+
+## Development Commands
+
+### Package Management (using uv)
+```bash
+# Install dependencies
+uv sync
+
+# Add a dependency
+uv add <package>
+
+# Run the application
+uv run src/main.py
+```
+
+### Running the Application
+```bash
+# Run the NiceGUI application (once implemented)
+uv run python src/main.py
+```
+
+## Important Implementation Notes
+
+1. **LLM Integration**: The `get_response` function in `src/lib/llm.py` is fully functional and expects:
+   - Backend configuration with `BACKEND_BASE_URL`, `BACKEND_API_TOKEN` and `BACKEND_MODEL`
+   - Messages in OpenAI format, each with a role of "system", "assistant", or "user"
+   - Returns an async generator in both streaming and non-streaming modes, so callers always consume the result with `async for`
+   - Used exclusively for the censoring functionality
+
+2. **Security Focus**: This application handles sensitive customer data. Always:
+   - Ensure proper data sanitization before and after LLM processing
+   - Never log or expose raw customer information
+   - Keep API tokens secure and never commit them
+
+3. **Frontend Development**: When implementing the NiceGUI interface in `src/main.py`:
+   - Provide an input field for text containing sensitive data
+   - Display the censored output so users can copy it
+   - Use async handlers to integrate with the LLM backend
+   - Implement proper error handling for API failures
+   - Consider showing a before/after comparison
+   - Add copy-to-clipboard functionality for the censored text
+
+4. **System Prompt**: The `src/prompt.md` file should contain clear instructions for the censoring LLM on:
+   - What constitutes customer information (names, addresses, phone numbers, emails, etc.)
+   - How to redact sensitive data (e.g., by replacing it with placeholders such as [CUSTOMER_NAME] or [EMAIL])
+   - Maintaining context while protecting privacy
+   - Ensuring the output remains useful for the downstream processing LLM
+
+5. **Usage Flow**:
+   - User pastes text with customer data into CensorBot
+   - CensorBot uses the small censoring LLM to identify and replace sensitive information
+   - User receives censored text with placeholders
+   - User can copy censored text and use it with any external LLM service
+   - There is no direct integration with external LLMs; CensorBot is a standalone sanitization tool