# 🔒 CensorBot

A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.

## Overview

CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (such as DeepSeek) to process text locally, so sensitive data stays under your control and is censored before anything is sent to external AI services.

## Features

- 🛡️ **Automatic Detection** - Identifies names, emails, phone numbers, addresses, SSNs, and more
- 🔄 **Real-time Processing** - Stream-based censoring for immediate feedback
- 🎯 **High Accuracy** - AI-powered detection understands context, not just patterns
- 💼 **Enterprise Ready** - Designed for IT service companies handling customer data
- 🌐 **Web Interface** - Clean, intuitive UI built with NiceGUI
- 📝 **30+ Test Examples** - Comprehensive test suite covering various scenarios

## Quick Start

### Prerequisites

- Python 3.8+
- [uv](https://github.com/astral-sh/uv) package manager
- An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/CensorBot.git
   cd CensorBot
   ```

2. Install dependencies:

   ```bash
   uv sync
   ```

3. Configure environment variables:

   ```bash
   cp .env.example .env
   # Edit .env with your API credentials
   ```

4. Run the application:

   ```bash
   uv run python src/main.py
   ```

5. Open your browser to `http://localhost:8080`

## Configuration

Create a `.env` file with the following variables:

```env
# LLM Backend Configuration
BACKEND_BASE_URL=https://api.deepseek.com    # Your LLM API endpoint
BACKEND_API_TOKEN=your-api-token-here        # API authentication token
BACKEND_MODEL=deepseek-chat                  # Model to use for censoring
```

## Usage

1. **Paste Text**: Copy your text containing sensitive customer information into the input field
2. **Process**: Click "Censor Data" to automatically detect and replace sensitive information
3. **Copy Result**: Use the censored text safely with any external AI service

### What Gets Censored

- Personal names
- Email addresses
- Phone numbers
- Physical addresses
- Social Security Numbers
- Credit card numbers
- Bank account numbers
- Driver's license numbers
- Passport numbers
- Medical record numbers
- IP addresses
- Usernames and passwords
- Company names (in customer context)
- Dates of birth

## Project Structure

```
CensorBot/
├── src/
│   ├── main.py              # Main application with NiceGUI interface
│   ├── prompt.md            # System prompt for the censoring LLM
│   └── lib/
│       └── llm.py           # LLM integration module
├── examples/                # 30+ test cases with various sensitive data
│   ├── 01_customer_support.txt
│   ├── 02_medical_record.txt
│   └── ...
├── .env.example             # Environment variables template
├── pyproject.toml           # Project dependencies
└── CLAUDE.md                # AI assistant instructions
```

## Development

### Running Tests

Test the censoring with example files:

```bash
# The application loads a random example on startup
uv run python src/main.py
```

### Adding Dependencies

```bash
uv add <package-name>
```

### Project Commands

```bash
# Install dependencies
uv sync

# Run the application
uv run python src/main.py

# Format code (if configured)
uv run black src/

# Type checking (if configured)
uv run mypy src/
```

## Security Considerations

- **Local Processing**: Use a local or self-hosted LLM for maximum security
- **No Data Storage**: CensorBot doesn't store any processed text
- **API Security**: Keep your API tokens secure and never commit them
- **HTTPS Only**: Use HTTPS for API communications
- **Regular Updates**: Keep dependencies updated for security patches

## Use Cases

- **IT Support Tickets**: Sanitize customer tickets before using AI for solutions
- **Documentation**: Remove sensitive data from technical documentation
- **Training Data**: Prepare datasets for ML training with reduced privacy risk
- **Compliance**: Support compliance with GDPR, HIPAA, and other privacy regulations
- **Knowledge Base**: Create sanitized versions of customer interactions

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Built with [NiceGUI](https://nicegui.io/) for the web interface
- Powered by [uv](https://github.com/astral-sh/uv) for fast Python package management
- AI censoring via OpenAI-compatible APIs

## Support

For issues, questions, or suggestions, please open an issue on GitHub.

---

**⚠️ Important**: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.
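
### Example: Spot-Checking Censored Output

The review step recommended above can be partly automated. The following is a minimal sketch and is not part of CensorBot itself: a hypothetical `find_residual_pii` helper scans already-censored text with a few simple regular expressions and reports anything that still looks like an email address, a US Social Security Number, or an IPv4 address. The function name, the pattern set, and the bracketed placeholder style (`[NAME]`, `[EMAIL]`) are assumptions made for illustration. Regular expressions cannot catch context-dependent data such as personal names, so a check like this complements manual review rather than replacing it.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately simple patterns for illustration only.
# They are not CensorBot's detection logic, which is LLM-based.
RESIDUAL_PATTERNS: Dict[str, re.Pattern] = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def find_residual_pii(text: str) -> Dict[str, List[str]]:
    """Return, per category, every substring that still looks sensitive."""
    findings: Dict[str, List[str]] = {}
    for label, pattern in RESIDUAL_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings


if __name__ == "__main__":
    # Assumed placeholder style; the real output format may differ.
    censored = "Contact [NAME] at [EMAIL] or jane.doe@example.com from 10.0.0.5."
    for label, matches in find_residual_pii(censored).items():
        print(f"{label}: {matches}")
```

An empty result only means that none of these specific patterns matched. It does not mean the text is free of sensitive data.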