From 50e9df2a424e566309ea056250f8b412554b511e Mon Sep 17 00:00:00 2001 From: Alexander Thiess Date: Fri, 29 Aug 2025 21:44:48 +0200 Subject: [PATCH] generated readme --- README.md | 178 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 177 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index cddb841..a7f5716 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,178 @@ -# CensorBot +# 🔒 CensorBot +A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI. + +## Overview + +CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services. + +## Features + +- 🛡️ **Automatic Detection** - Identifies names, emails, phone numbers, addresses, SSNs, and more +- 🔄 **Real-time Processing** - Stream-based censoring for immediate feedback +- 🎯 **High Accuracy** - AI-powered detection understands context, not just patterns +- 💼 **Enterprise Ready** - Designed for IT service companies handling customer data +- 🌐 **Web Interface** - Clean, intuitive UI built with NiceGUI +- 📝 **30+ Test Examples** - Comprehensive test suite covering various scenarios + +## Quick Start + +### Prerequisites + +- Python 3.8+ +- [uv](https://github.com/astral-sh/uv) package manager +- An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM) + +### Installation + +1. Clone the repository: +```bash +git clone https://github.com/yourusername/CensorBot.git +cd CensorBot +``` + +2. Install dependencies: +```bash +uv sync +``` + +3. Configure environment variables: +```bash +cp .env.example .env +# Edit .env with your API credentials +``` + +4. Run the application: +```bash +uv run python src/main.py +``` + +5. Open your browser to `http://localhost:8080` + +## Configuration + +Create a `.env` file with the following variables: + +```env +# LLM Backend Configuration +BACKEND_BASE_URL=https://api.deepseek.com # Your LLM API endpoint +BACKEND_API_TOKEN=your-api-token-here # API authentication token +BACKEND_MODEL=deepseek-chat # Model to use for censoring +``` + +## Usage + +1. **Paste Text**: Copy your text containing sensitive customer information into the input field +2. **Process**: Click "Censor Data" to automatically detect and replace sensitive information +3. **Copy Result**: Use the censored text safely with any external AI service + +### What Gets Censored + +- Personal names +- Email addresses +- Phone numbers +- Physical addresses +- Social Security Numbers +- Credit card numbers +- Bank account numbers +- Driver's license numbers +- Passport numbers +- Medical record numbers +- IP addresses +- Usernames and passwords +- Company names (in customer context) +- Dates of birth + +## Project Structure + +``` +CensorBot/ +├── src/ +│ ├── main.py # Main application with NiceGUI interface +│ ├── prompt.md # System prompt for the censoring LLM +│ └── lib/ +│ └── llm.py # LLM integration module +├── examples/ # 30+ test cases with various sensitive data +│ ├── 01_customer_support.txt +│ ├── 02_medical_record.txt +│ └── ... +├── .env.example # Environment variables template +├── pyproject.toml # Project dependencies +└── CLAUDE.md # AI assistant instructions +``` + +## Development + +### Running Tests + +Test the censoring with example files: +```bash +# The application loads a random example on startup +uv run python src/main.py +``` + +### Adding Dependencies + +```bash +uv add +``` + +### Project Commands + +```bash +# Install dependencies +uv sync + +# Run the application +uv run python src/main.py + +# Format code (if configured) +uv run black src/ + +# Type checking (if configured) +uv run mypy src/ +``` + +## Security Considerations + +- **Local Processing**: Use a local or self-hosted LLM for maximum security +- **No Data Storage**: CensorBot doesn't store any processed text +- **API Security**: Keep your API tokens secure and never commit them +- **HTTPS Only**: Use HTTPS for API communications +- **Regular Updates**: Keep dependencies updated for security patches + +## Use Cases + +- **IT Support Tickets**: Sanitize customer tickets before using AI for solutions +- **Documentation**: Remove sensitive data from technical documentation +- **Training Data**: Prepare datasets for ML training without privacy concerns +- **Compliance**: Meet GDPR, HIPAA, and other privacy regulations +- **Knowledge Base**: Create sanitized versions of customer interactions + +## Contributing + +Contributions are welcome! Please feel free to submit a Pull Request. + +1. Fork the repository +2. Create your feature branch (`git checkout -b feature/AmazingFeature`) +3. Commit your changes (`git commit -m 'Add some AmazingFeature'`) +4. Push to the branch (`git push origin feature/AmazingFeature`) +5. Open a Pull Request + +## License + +This project is licensed under the MIT License - see the LICENSE file for details. + +## Acknowledgments + +- Built with [NiceGUI](https://nicegui.io/) for the web interface +- Powered by [uv](https://github.com/astral-sh/uv) for fast Python package management +- AI censoring via OpenAI-compatible APIs + +## Support + +For issues, questions, or suggestions, please open an issue on GitHub. + +--- + +**⚠️ Important**: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.