generated readme

2025-08-29 21:44:48 +02:00
parent 4b0ca5469a
commit 50e9df2a42
1 changed files with 177 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,178 @@
-# CensorBot
+# 🔒 CensorBot

+A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.
+
+## Overview
+
+CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.
+
+## Features
+
+- 🛡️ **Automatic Detection** - Identifies names, emails, phone numbers, addresses, SSNs, and more
+- 🔄 **Real-time Processing** - Stream-based censoring for immediate feedback
+- 🎯 **High Accuracy** - AI-powered detection understands context, not just patterns
+- 💼 **Enterprise Ready** - Designed for IT service companies handling customer data
+- 🌐 **Web Interface** - Clean, intuitive UI built with NiceGUI
+- 📝 **30+ Test Examples** - Comprehensive test suite covering various scenarios
+
+## Quick Start
+
+### Prerequisites
+
+- Python 3.8+
+- [uv](https://github.com/astral-sh/uv) package manager
+- An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)
+
+### Installation
+
+1. Clone the repository:
+```bash
+git clone https://github.com/yourusername/CensorBot.git
+cd CensorBot
+```
+
+2. Install dependencies:
+```bash
+uv sync
+```
+
+3. Configure environment variables:
+```bash
+cp .env.example .env
+# Edit .env with your API credentials
+```
+
+4. Run the application:
+```bash
+uv run python src/main.py
+```
+
+5. Open your browser to `http://localhost:8080`
+
+## Configuration
+
+Create a `.env` file with the following variables:
+
+```env
+# LLM Backend Configuration
+BACKEND_BASE_URL=https://api.deepseek.com  # Your LLM API endpoint
+BACKEND_API_TOKEN=your-api-token-here       # API authentication token
+BACKEND_MODEL=deepseek-chat                 # Model to use for censoring
+```
+
+## Usage
+
+1. **Paste Text**: Copy your text containing sensitive customer information into the input field
+2. **Process**: Click "Censor Data" to automatically detect and replace sensitive information
+3. **Copy Result**: Use the censored text safely with any external AI service
+
+### What Gets Censored
+
+- Personal names
+- Email addresses
+- Phone numbers
+- Physical addresses
+- Social Security Numbers
+- Credit card numbers
+- Bank account numbers
+- Driver's license numbers
+- Passport numbers
+- Medical record numbers
+- IP addresses
+- Usernames and passwords
+- Company names (in customer context)
+- Dates of birth
+
+## Project Structure
+
+```
+CensorBot/
+├── src/
+│   ├── main.py          # Main application with NiceGUI interface
+│   ├── prompt.md        # System prompt for the censoring LLM
+│   └── lib/
+│       └── llm.py       # LLM integration module
+├── examples/            # 30+ test cases with various sensitive data
+│   ├── 01_customer_support.txt
+│   ├── 02_medical_record.txt
+│   └── ...
+├── .env.example         # Environment variables template
+├── pyproject.toml       # Project dependencies
+└── CLAUDE.md           # AI assistant instructions
+```
+
+## Development
+
+### Running Tests
+
+Test the censoring with example files:
+```bash
+# The application loads a random example on startup
+uv run python src/main.py
+```
+
+### Adding Dependencies
+
+```bash
+uv add <package-name>
+```
+
+### Project Commands
+
+```bash
+# Install dependencies
+uv sync
+
+# Run the application
+uv run python src/main.py
+
+# Format code (if configured)
+uv run black src/
+
+# Type checking (if configured)
+uv run mypy src/
+```
+
+## Security Considerations
+
+- **Local Processing**: Use a local or self-hosted LLM for maximum security
+- **No Data Storage**: CensorBot doesn't store any processed text
+- **API Security**: Keep your API tokens secure and never commit them
+- **HTTPS Only**: Use HTTPS for API communications
+- **Regular Updates**: Keep dependencies updated for security patches
+
+## Use Cases
+
+- **IT Support Tickets**: Sanitize customer tickets before using AI for solutions
+- **Documentation**: Remove sensitive data from technical documentation
+- **Training Data**: Prepare datasets for ML training without privacy concerns
+- **Compliance**: Meet GDPR, HIPAA, and other privacy regulations
+- **Knowledge Base**: Create sanitized versions of customer interactions
+
+## Contributing
+
+Contributions are welcome! Please feel free to submit a Pull Request.
+
+1. Fork the repository
+2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
+3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
+4. Push to the branch (`git push origin feature/AmazingFeature`)
+5. Open a Pull Request
+
+## License
+
+This project is licensed under the MIT License - see the LICENSE file for details.
+
+## Acknowledgments
+
+- Built with [NiceGUI](https://nicegui.io/) for the web interface
+- Powered by [uv](https://github.com/astral-sh/uv) for fast Python package management
+- AI censoring via OpenAI-compatible APIs
+
+## Support
+
+For issues, questions, or suggestions, please open an issue on GitHub.
+
+---
+
+**⚠️ Important**: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.