Files
CensorBot/README.md
2025-08-29 21:44:48 +02:00

5.1 KiB

🔒 CensorBot

A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.

Overview

CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.

Features

  • 🛡️ Automatic Detection - Identifies names, emails, phone numbers, addresses, SSNs, and more
  • 🔄 Real-time Processing - Stream-based censoring for immediate feedback
  • 🎯 High Accuracy - AI-powered detection understands context, not just patterns
  • 💼 Enterprise Ready - Designed for IT service companies handling customer data
  • 🌐 Web Interface - Clean, intuitive UI built with NiceGUI
  • 📝 30+ Test Examples - Comprehensive test suite covering various scenarios

Quick Start

Prerequisites

  • Python 3.8+
  • uv package manager
  • An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/CensorBot.git
cd CensorBot
  1. Install dependencies:
uv sync
  1. Configure environment variables:
cp .env.example .env
# Edit .env with your API credentials
  1. Run the application:
uv run python src/main.py
  1. Open your browser to http://localhost:8080

Configuration

Create a .env file with the following variables:

# LLM Backend Configuration
BACKEND_BASE_URL=https://api.deepseek.com  # Your LLM API endpoint
BACKEND_API_TOKEN=your-api-token-here       # API authentication token
BACKEND_MODEL=deepseek-chat                 # Model to use for censoring

Usage

  1. Paste Text: Copy your text containing sensitive customer information into the input field
  2. Process: Click "Censor Data" to automatically detect and replace sensitive information
  3. Copy Result: Use the censored text safely with any external AI service

What Gets Censored

  • Personal names
  • Email addresses
  • Phone numbers
  • Physical addresses
  • Social Security Numbers
  • Credit card numbers
  • Bank account numbers
  • Driver's license numbers
  • Passport numbers
  • Medical record numbers
  • IP addresses
  • Usernames and passwords
  • Company names (in customer context)
  • Dates of birth

Project Structure

CensorBot/
├── src/
│   ├── main.py          # Main application with NiceGUI interface
│   ├── prompt.md        # System prompt for the censoring LLM
│   └── lib/
│       └── llm.py       # LLM integration module
├── examples/            # 30+ test cases with various sensitive data
│   ├── 01_customer_support.txt
│   ├── 02_medical_record.txt
│   └── ...
├── .env.example         # Environment variables template
├── pyproject.toml       # Project dependencies
└── CLAUDE.md           # AI assistant instructions

Development

Running Tests

Test the censoring with example files:

# The application loads a random example on startup
uv run python src/main.py

Adding Dependencies

uv add <package-name>

Project Commands

# Install dependencies
uv sync

# Run the application
uv run python src/main.py

# Format code (if configured)
uv run black src/

# Type checking (if configured)
uv run mypy src/

Security Considerations

  • Local Processing: Use a local or self-hosted LLM for maximum security
  • No Data Storage: CensorBot doesn't store any processed text
  • API Security: Keep your API tokens secure and never commit them
  • HTTPS Only: Use HTTPS for API communications
  • Regular Updates: Keep dependencies updated for security patches

Use Cases

  • IT Support Tickets: Sanitize customer tickets before using AI for solutions
  • Documentation: Remove sensitive data from technical documentation
  • Training Data: Prepare datasets for ML training without privacy concerns
  • Compliance: Meet GDPR, HIPAA, and other privacy regulations
  • Knowledge Base: Create sanitized versions of customer interactions

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built with NiceGUI for the web interface
  • Powered by uv for fast Python package management
  • AI censoring via OpenAI-compatible APIs

Support

For issues, questions, or suggestions, please open an issue on GitHub.


⚠️ Important: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.