Files
CensorBot/README.md
2025-08-29 21:44:48 +02:00

179 lines
5.1 KiB
Markdown

# 🔒 CensorBot
A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.
## Overview
CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.
## Features
- 🛡️ **Automatic Detection** - Identifies names, emails, phone numbers, addresses, SSNs, and more
- 🔄 **Real-time Processing** - Stream-based censoring for immediate feedback
- 🎯 **High Accuracy** - AI-powered detection understands context, not just patterns
- 💼 **Enterprise Ready** - Designed for IT service companies handling customer data
- 🌐 **Web Interface** - Clean, intuitive UI built with NiceGUI
- 📝 **30+ Test Examples** - Comprehensive test suite covering various scenarios
## Quick Start
### Prerequisites
- Python 3.8+
- [uv](https://github.com/astral-sh/uv) package manager
- An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)
### Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/CensorBot.git
cd CensorBot
```
2. Install dependencies:
```bash
uv sync
```
3. Configure environment variables:
```bash
cp .env.example .env
# Edit .env with your API credentials
```
4. Run the application:
```bash
uv run python src/main.py
```
5. Open your browser to `http://localhost:8080`
## Configuration
Create a `.env` file with the following variables:
```env
# LLM Backend Configuration
BACKEND_BASE_URL=https://api.deepseek.com # Your LLM API endpoint
BACKEND_API_TOKEN=your-api-token-here # API authentication token
BACKEND_MODEL=deepseek-chat # Model to use for censoring
```
## Usage
1. **Paste Text**: Copy your text containing sensitive customer information into the input field
2. **Process**: Click "Censor Data" to automatically detect and replace sensitive information
3. **Copy Result**: Use the censored text safely with any external AI service
### What Gets Censored
- Personal names
- Email addresses
- Phone numbers
- Physical addresses
- Social Security Numbers
- Credit card numbers
- Bank account numbers
- Driver's license numbers
- Passport numbers
- Medical record numbers
- IP addresses
- Usernames and passwords
- Company names (in customer context)
- Dates of birth
## Project Structure
```
CensorBot/
├── src/
│ ├── main.py # Main application with NiceGUI interface
│ ├── prompt.md # System prompt for the censoring LLM
│ └── lib/
│ └── llm.py # LLM integration module
├── examples/ # 30+ test cases with various sensitive data
│ ├── 01_customer_support.txt
│ ├── 02_medical_record.txt
│ └── ...
├── .env.example # Environment variables template
├── pyproject.toml # Project dependencies
└── CLAUDE.md # AI assistant instructions
```
## Development
### Running Tests
Test the censoring with example files:
```bash
# The application loads a random example on startup
uv run python src/main.py
```
### Adding Dependencies
```bash
uv add <package-name>
```
### Project Commands
```bash
# Install dependencies
uv sync
# Run the application
uv run python src/main.py
# Format code (if configured)
uv run black src/
# Type checking (if configured)
uv run mypy src/
```
## Security Considerations
- **Local Processing**: Use a local or self-hosted LLM for maximum security
- **No Data Storage**: CensorBot doesn't store any processed text
- **API Security**: Keep your API tokens secure and never commit them
- **HTTPS Only**: Use HTTPS for API communications
- **Regular Updates**: Keep dependencies updated for security patches
## Use Cases
- **IT Support Tickets**: Sanitize customer tickets before using AI for solutions
- **Documentation**: Remove sensitive data from technical documentation
- **Training Data**: Prepare datasets for ML training without privacy concerns
- **Compliance**: Meet GDPR, HIPAA, and other privacy regulations
- **Knowledge Base**: Create sanitized versions of customer interactions
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Built with [NiceGUI](https://nicegui.io/) for the web interface
- Powered by [uv](https://github.com/astral-sh/uv) for fast Python package management
- AI censoring via OpenAI-compatible APIs
## Support
For issues, questions, or suggestions, please open an issue on GitHub.
---
**⚠️ Important**: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.