🔒 CensorBot
A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.
Overview
CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.
Features
- 🛡️ Automatic Detection - Identifies names, emails, phone numbers, addresses, SSNs, and more
- 🔄 Real-time Processing - Stream-based censoring for immediate feedback
- 🎯 High Accuracy - AI-powered detection understands context, not just patterns
- 💼 Enterprise Ready - Designed for IT service companies handling customer data
- 🌐 Web Interface - Clean, intuitive UI built with NiceGUI
- 📝 30+ Test Examples - Comprehensive test suite covering various scenarios
Quick Start
Prerequisites
- Python 3.8+
- uv package manager
- An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)
Installation
- Clone the repository:
git clone https://github.com/yourusername/CensorBot.git
cd CensorBot
- Install dependencies:
uv sync
- Configure environment variables:
cp .env.example .env
# Edit .env with your API credentials
- Run the application:
uv run python src/main.py
- Open your browser to
http://localhost:8080
Configuration
Create a .env file with the following variables:
# LLM Backend Configuration
BACKEND_BASE_URL=https://api.deepseek.com # Your LLM API endpoint
BACKEND_API_TOKEN=your-api-token-here # API authentication token
BACKEND_MODEL=deepseek-chat # Model to use for censoring
Usage
- Paste Text: Copy your text containing sensitive customer information into the input field
- Process: Click "Censor Data" to automatically detect and replace sensitive information
- Copy Result: Use the censored text safely with any external AI service
What Gets Censored
- Personal names
- Email addresses
- Phone numbers
- Physical addresses
- Social Security Numbers
- Credit card numbers
- Bank account numbers
- Driver's license numbers
- Passport numbers
- Medical record numbers
- IP addresses
- Usernames and passwords
- Company names (in customer context)
- Dates of birth
Project Structure
CensorBot/
├── src/
│ ├── main.py # Main application with NiceGUI interface
│ ├── prompt.md # System prompt for the censoring LLM
│ └── lib/
│ └── llm.py # LLM integration module
├── examples/ # 30+ test cases with various sensitive data
│ ├── 01_customer_support.txt
│ ├── 02_medical_record.txt
│ └── ...
├── .env.example # Environment variables template
├── pyproject.toml # Project dependencies
└── CLAUDE.md # AI assistant instructions
Development
Running Tests
Test the censoring with example files:
# The application loads a random example on startup
uv run python src/main.py
Adding Dependencies
uv add <package-name>
Project Commands
# Install dependencies
uv sync
# Run the application
uv run python src/main.py
# Format code (if configured)
uv run black src/
# Type checking (if configured)
uv run mypy src/
Security Considerations
- Local Processing: Use a local or self-hosted LLM for maximum security
- No Data Storage: CensorBot doesn't store any processed text
- API Security: Keep your API tokens secure and never commit them
- HTTPS Only: Use HTTPS for API communications
- Regular Updates: Keep dependencies updated for security patches
Use Cases
- IT Support Tickets: Sanitize customer tickets before using AI for solutions
- Documentation: Remove sensitive data from technical documentation
- Training Data: Prepare datasets for ML training without privacy concerns
- Compliance: Meet GDPR, HIPAA, and other privacy regulations
- Knowledge Base: Create sanitized versions of customer interactions
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with NiceGUI for the web interface
- Powered by uv for fast Python package management
- AI censoring via OpenAI-compatible APIs
Support
For issues, questions, or suggestions, please open an issue on GitHub.
⚠️ Important: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.