init
This commit is contained in:
77
CLAUDE.md
Normal file
77
CLAUDE.md
Normal file
@@ -0,0 +1,77 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## Project Overview
|
||||
|
||||
CensorBot is a Python application that acts as a data sanitization tool for IT service companies. It uses a small LLM (like DeepSeek) to automatically detect and censor sensitive customer information in text inputs. Users input text containing customer data, and CensorBot returns a sanitized version with all sensitive information replaced by placeholders. This censored text can then be safely used with any external LLM service (Claude, GPT-4, etc.) without risking data breaches. The application uses NiceGUI for the frontend.
|
||||
|
||||
## Key Architecture
|
||||
|
||||
### Core Components
|
||||
- **Frontend**: NiceGUI-based web interface (to be implemented in `src/main.py`)
|
||||
- **LLM Integration**: `src/lib/llm.py` provides async HTTP client for LLM API communication
|
||||
- Supports both streaming and non-streaming responses
|
||||
- Uses httpx for async HTTP requests
|
||||
- Expects OpenAI-compatible chat completions API
|
||||
|
||||
### Configuration
|
||||
- **Environment Variables** (via `.env` file):
|
||||
- `BACKEND_BASE_URL`: Censoring LLM backend URL (e.g., DeepSeek API)
|
||||
- `BACKEND_API_TOKEN`: API authentication token for the censoring LLM
|
||||
- `BACKEND_MODEL`: Model to use for censoring (e.g., "deepseek-chat")
|
||||
- **System Prompt**: Located in `src/prompt.md` - defines the censoring LLM's behavior for identifying and redacting sensitive data
|
||||
|
||||
## Development Commands
|
||||
|
||||
### Package Management (using uv)
|
||||
```bash
|
||||
# Install dependencies
|
||||
uv sync
|
||||
|
||||
# Add a dependency
|
||||
uv add <package>
|
||||
|
||||
# Run the application
|
||||
uv run src/main.py
|
||||
```
|
||||
|
||||
### Running the Application
|
||||
```bash
|
||||
# Run the NiceGUI application (once implemented)
|
||||
uv run python src/main.py
|
||||
```
|
||||
|
||||
## Important Implementation Notes
|
||||
|
||||
1. **LLM Integration**: The `get_response` function in `src/lib/llm.py` is fully functional and expects:
|
||||
- Backend configuration with `BACKEND_BASE_URL`, `BACKEND_API_TOKEN` and `BACKEND_MODEL`
|
||||
- Messages in OpenAI format with roles: "system", "assistant", "user"
|
||||
- Returns async generators for both streaming and non-streaming modes
|
||||
- Used exclusively for the censoring functionality
|
||||
|
||||
2. **Security Focus**: This application handles sensitive customer data. Always:
|
||||
- Ensure proper data sanitization before and after LLM processing
|
||||
- Never log or expose raw customer information
|
||||
- Keep API tokens secure and never commit them
|
||||
|
||||
3. **Frontend Development**: When implementing the NiceGUI interface in `src/main.py`:
|
||||
- Provide input field for text containing sensitive data
|
||||
- Display censored output that users can copy
|
||||
- Use async handlers to integrate with the LLM backend
|
||||
- Implement proper error handling for API failures
|
||||
- Consider showing before/after comparison
|
||||
- Add copy-to-clipboard functionality for the censored text
|
||||
|
||||
4. **System Prompt**: The `src/prompt.md` file should contain clear instructions for the censoring LLM on:
|
||||
- What constitutes customer information (names, addresses, phone numbers, emails, etc.)
|
||||
- How to censor/redact sensitive data (e.g., replace with placeholders like [CUSTOMER_NAME], [EMAIL], etc.)
|
||||
- Maintaining context while protecting privacy
|
||||
- Ensuring the output remains useful for the downstream processing LLM
|
||||
|
||||
5. **Usage Flow**:
|
||||
- User pastes text with customer data into CensorBot
|
||||
- CensorBot uses small LLM to identify and replace sensitive information
|
||||
- User receives censored text with placeholders
|
||||
- User can copy censored text and use it with any external LLM service
|
||||
- No direct integration with external LLMs - CensorBot is a standalone sanitization tool
|
||||
Reference in New Issue
Block a user