generated readme

2025-08-29 21:44:48 +02:00
parent 4b0ca5469a
commit 50e9df2a42
1 changed files with 177 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,2 +1,178 @@
-# CensorBot
+# 🔒 CensorBot
 A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.
 ## Overview
 CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.
 ## Features
 - 🛡️ **Automatic Detection** - Identifies names, emails, phone numbers, addresses, SSNs, and more
 - 🔄 **Real-time Processing** - Stream-based censoring for immediate feedback
 - 🎯 **High Accuracy** - AI-powered detection understands context, not just patterns
 - 💼 **Enterprise Ready** - Designed for IT service companies handling customer data
 - 🌐 **Web Interface** - Clean, intuitive UI built with NiceGUI
 - 📝 **30+ Test Examples** - Comprehensive test suite covering various scenarios
 ## Quick Start
 ### Prerequisites
 - Python 3.8+
 - [uv](https://github.com/astral-sh/uv) package manager
 - An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)
 ### Installation
 1. Clone the repository:
 ```bash
 git clone https://github.com/yourusername/CensorBot.git
 cd CensorBot
 ```
 2. Install dependencies:
 ```bash
 uv sync
 ```
 3. Configure environment variables:
 ```bash
 cp .env.example .env
 # Edit .env with your API credentials
 ```
 4. Run the application:
 ```bash
 uv run python src/main.py
 ```
 5. Open your browser to `http://localhost:8080`
 ## Configuration
 Create a `.env` file with the following variables:
 ```env
 # LLM Backend Configuration
 BACKEND_BASE_URL=https://api.deepseek.com  # Your LLM API endpoint
 BACKEND_API_TOKEN=your-api-token-here       # API authentication token
 BACKEND_MODEL=deepseek-chat                 # Model to use for censoring
 ```
 ## Usage
 1. **Paste Text**: Copy your text containing sensitive customer information into the input field
 2. **Process**: Click "Censor Data" to automatically detect and replace sensitive information
 3. **Copy Result**: Use the censored text safely with any external AI service
 ### What Gets Censored
 - Personal names
 - Email addresses
 - Phone numbers
 - Physical addresses
 - Social Security Numbers
 - Credit card numbers
 - Bank account numbers
 - Driver's license numbers
 - Passport numbers
 - Medical record numbers
 - IP addresses
 - Usernames and passwords
 - Company names (in customer context)
 - Dates of birth
 ## Project Structure
 ```
 CensorBot/
 ├── src/
 │   ├── main.py          # Main application with NiceGUI interface
 │   ├── prompt.md        # System prompt for the censoring LLM
 │   └── lib/
 │       └── llm.py       # LLM integration module
 ├── examples/            # 30+ test cases with various sensitive data
 │   ├── 01_customer_support.txt
 │   ├── 02_medical_record.txt
 │   └── ...
 ├── .env.example         # Environment variables template
 ├── pyproject.toml       # Project dependencies
 └── CLAUDE.md           # AI assistant instructions
 ```
 ## Development
 ### Running Tests
 Test the censoring with example files:
 ```bash
 # The application loads a random example on startup
 uv run python src/main.py
 ```
 ### Adding Dependencies
 ```bash
 uv add <package-name>
 ```
 ### Project Commands
 ```bash
 # Install dependencies
 uv sync
 # Run the application
 uv run python src/main.py
 # Format code (if configured)
 uv run black src/
 # Type checking (if configured)
 uv run mypy src/
 ```
 ## Security Considerations
 - **Local Processing**: Use a local or self-hosted LLM for maximum security
 - **No Data Storage**: CensorBot doesn't store any processed text
 - **API Security**: Keep your API tokens secure and never commit them
 - **HTTPS Only**: Use HTTPS for API communications
 - **Regular Updates**: Keep dependencies updated for security patches
 ## Use Cases
 - **IT Support Tickets**: Sanitize customer tickets before using AI for solutions
 - **Documentation**: Remove sensitive data from technical documentation
 - **Training Data**: Prepare datasets for ML training without privacy concerns
 - **Compliance**: Meet GDPR, HIPAA, and other privacy regulations
 - **Knowledge Base**: Create sanitized versions of customer interactions
 ## Contributing
 Contributions are welcome! Please feel free to submit a Pull Request.
 1. Fork the repository
 2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
 3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
 4. Push to the branch (`git push origin feature/AmazingFeature`)
 5. Open a Pull Request
 ## License
 This project is licensed under the MIT License - see the LICENSE file for details.
 ## Acknowledgments
 - Built with [NiceGUI](https://nicegui.io/) for the web interface
 - Powered by [uv](https://github.com/astral-sh/uv) for fast Python package management
 - AI censoring via OpenAI-compatible APIs
 ## Support
 For issues, questions, or suggestions, please open an issue on GitHub.
 ---
 **⚠️ Important**: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.