# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
CensorBot is a Python application that acts as a data sanitization tool for IT service companies. It uses a small LLM (like DeepSeek) to automatically detect and censor sensitive customer information in text inputs. Users input text containing customer data, and CensorBot returns a sanitized version with all sensitive information replaced by placeholders. This censored text can then be safely used with any external LLM service (Claude, GPT-4, etc.) without risking data breaches. The application uses NiceGUI for the frontend.
## Key Architecture

### Core Components

- **Frontend**: NiceGUI-based web interface (to be implemented in `src/main.py`)
- **LLM Integration**: `src/lib/llm.py` provides an async HTTP client for LLM API communication
  - Supports both streaming and non-streaming responses
  - Uses httpx for async HTTP requests
  - Expects an OpenAI-compatible chat completions API
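The integration follows the OpenAI chat completions convention. A rough sketch of such a streaming request with httpx (the endpoint path, payload shape, and the `stream_chat` name are illustrative, not taken from `src/lib/llm.py`):

```python
import json
import os

import httpx


async def stream_chat(messages: list[dict]) -> None:
    """Send an OpenAI-style chat completion request and print streamed tokens."""
    url = f"{os.environ['BACKEND_BASE_URL'].rstrip('/')}/chat/completions"
    headers = {"Authorization": f"Bearer {os.environ['BACKEND_API_TOKEN']}"}
    payload = {
        "model": os.environ["BACKEND_MODEL"],
        "messages": messages,
        "stream": True,
    }
    async with httpx.AsyncClient(timeout=60) as client:
        async with client.stream("POST", url, headers=headers, json=payload) as resp:
            resp.raise_for_status()
            async for line in resp.aiter_lines():
                # OpenAI-compatible servers stream Server-Sent Events lines.
                if line.startswith("data: ") and line != "data: [DONE]":
                    chunk = json.loads(line.removeprefix("data: "))
                    print(chunk["choices"][0]["delta"].get("content", ""), end="")
```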
### Configuration

- **Environment Variables** (via `.env` file):
  - `BACKEND_BASE_URL`: Censoring LLM backend URL (e.g., DeepSeek API)
  - `BACKEND_API_TOKEN`: API authentication token for the censoring LLM
  - `BACKEND_MODEL`: Model to use for censoring (e.g., "deepseek-chat")
- **System Prompt**: Located in `src/prompt.md` - defines the censoring LLM's behavior for identifying and redacting sensitive data
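A sample `.env` for reference (values are placeholders; the base URL shown is only an example and must match your provider):

```bash
# .env - keep out of version control
BACKEND_BASE_URL=https://api.deepseek.com/v1
BACKEND_API_TOKEN=sk-your-token-here
BACKEND_MODEL=deepseek-chat
```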
## Development Commands

### Package Management (using uv)

```bash
# Install dependencies
uv sync

# Add a dependency
uv add <package>

# Run the application
uv run src/main.py
```

### Running the Application

```bash
# Run the NiceGUI application (once implemented)
uv run python src/main.py
```
## Important Implementation Notes

- **LLM Integration**: The `get_response` function in `src/lib/llm.py` is fully functional and expects:
  - Backend configuration with `BACKEND_BASE_URL`, `BACKEND_API_TOKEN`, and `BACKEND_MODEL`
  - Messages in OpenAI format with roles: "system", "assistant", "user"
  - Returns async generators for both streaming and non-streaming modes
  - Used exclusively for the censoring functionality
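
  A minimal consumption sketch, assuming `get_response(messages, stream=...)` yields text chunks; the real signature and import path live in `src/lib/llm.py` and may differ:

  ```python
  # Hypothetical wrapper around get_response; verify the actual signature in src/lib/llm.py.
  from lib.llm import get_response  # import path assumes running from src/

  async def censor(text: str, system_prompt: str) -> str:
      messages = [
          {"role": "system", "content": system_prompt},
          {"role": "user", "content": text},
      ]
      chunks = []
      async for chunk in get_response(messages, stream=True):  # stream flag is an assumption
          chunks.append(chunk)
      return "".join(chunks)
  ```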
- **Security Focus**: This application handles sensitive customer data. Always:
  - Ensure proper data sanitization before and after LLM processing
  - Never log or expose raw customer information
  - Keep API tokens secure and never commit them
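
  For instance, load secrets from the environment at startup and keep them out of log output (a sketch; whether the project uses python-dotenv is an assumption):

  ```python
  import os

  from dotenv import load_dotenv  # python-dotenv; assumed, not confirmed by the repo

  load_dotenv()  # reads .env without exposing its contents
  api_token = os.environ["BACKEND_API_TOKEN"]
  # Log only non-sensitive settings; never the token or raw customer text.
  print(f"Censoring model: {os.environ.get('BACKEND_MODEL', 'unset')}")
  ```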
- **Frontend Development**: When implementing the NiceGUI interface in `src/main.py`:
  - Provide an input field for text containing sensitive data
  - Display the censored output so users can copy it
  - Use async handlers to integrate with the LLM backend
  - Implement proper error handling for API failures
  - Consider showing a before/after comparison
  - Add copy-to-clipboard functionality for the censored text
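
  A minimal NiceGUI sketch along those lines (layout, names, and the stubbed `censor` helper are illustrative; `ui.clipboard.write` needs a reasonably recent NiceGUI release):

  ```python
  from pathlib import Path

  from nicegui import ui

  SYSTEM_PROMPT = Path("src/prompt.md").read_text()

  async def censor(text: str) -> str:
      """Hypothetical helper: call the censoring LLM via get_response (src/lib/llm.py)."""
      raise NotImplementedError

  input_area = ui.textarea(label="Text with customer data").classes("w-full")
  output_area = ui.textarea(label="Censored output").classes("w-full")

  async def on_censor() -> None:
      try:
          output_area.value = await censor(input_area.value)
      except Exception as exc:  # surface API failures in the UI instead of crashing
          ui.notify(f"Censoring failed: {exc}", type="negative")

  ui.button("Censor", on_click=on_censor)
  ui.button("Copy", on_click=lambda: ui.clipboard.write(output_area.value))

  ui.run(title="CensorBot")
  ```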
- **System Prompt**: The `src/prompt.md` file should contain clear instructions for the censoring LLM on:
  - What constitutes customer information (names, addresses, phone numbers, emails, etc.)
  - How to censor/redact sensitive data (e.g., replace with placeholders like [CUSTOMER_NAME], [EMAIL], etc.)
  - Maintaining context while protecting privacy
  - Ensuring the output remains useful for the downstream processing LLM
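
  A possible shape for that prompt (illustrative wording, not the project's actual `src/prompt.md`):

  ```markdown
  You are a data sanitization assistant. Rewrite the user's text, replacing all
  customer information with placeholders and changing nothing else:

  - Names -> [CUSTOMER_NAME]
  - Email addresses -> [EMAIL]
  - Phone numbers -> [PHONE]
  - Postal addresses -> [ADDRESS]

  Keep sentence structure and technical details intact so the censored text stays
  useful for downstream LLMs. Output only the censored text.
  ```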
- **Usage Flow**:
  1. User pastes text with customer data into CensorBot
  2. CensorBot uses the small LLM to identify and replace sensitive information
  3. User receives the censored text with placeholders
  4. User copies the censored text and uses it with any external LLM service

  There is no direct integration with external LLMs: CensorBot is a standalone sanitization tool.
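
  A tiny before/after illustration of this flow (fictional data):

  ```text
  Input:    Hi, John Smith (john.smith@example.com, +1 555 0100) says his VPN is down.
  Censored: Hi, [CUSTOMER_NAME] ([EMAIL], [PHONE]) says his VPN is down.
  ```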