2025-08-29 21:54:22 +02:00
2025-08-29 21:33:33 +02:00
2025-08-29 21:54:22 +02:00
2025-08-29 21:40:40 +02:00
2025-08-29 19:50:16 +02:00
2025-08-29 21:33:33 +02:00
2025-08-29 21:33:33 +02:00
2025-08-29 19:50:16 +02:00
2025-08-29 21:33:33 +02:00
2025-08-29 21:44:48 +02:00
2025-08-29 21:33:33 +02:00

🔒 CensorBot

A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.

Overview

CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.

Features

  • 🛡️ Automatic Detection - Identifies names, emails, phone numbers, addresses, SSNs, and more
  • 🔄 Real-time Processing - Stream-based censoring for immediate feedback
  • 🎯 High Accuracy - AI-powered detection understands context, not just patterns
  • 💼 Enterprise Ready - Designed for IT service companies handling customer data
  • 🌐 Web Interface - Clean, intuitive UI built with NiceGUI
  • 📝 30+ Test Examples - Comprehensive test suite covering various scenarios

Quick Start

Prerequisites

  • Python 3.8+
  • uv package manager
  • An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/CensorBot.git
cd CensorBot
  1. Install dependencies:
uv sync
  1. Configure environment variables:
cp .env.example .env
# Edit .env with your API credentials
  1. Run the application:
uv run python src/main.py
  1. Open your browser to http://localhost:8080

Configuration

Create a .env file with the following variables:

# LLM Backend Configuration
BACKEND_BASE_URL=https://api.deepseek.com  # Your LLM API endpoint
BACKEND_API_TOKEN=your-api-token-here       # API authentication token
BACKEND_MODEL=deepseek-chat                 # Model to use for censoring

Usage

  1. Paste Text: Copy your text containing sensitive customer information into the input field
  2. Process: Click "Censor Data" to automatically detect and replace sensitive information
  3. Copy Result: Use the censored text safely with any external AI service

What Gets Censored

  • Personal names
  • Email addresses
  • Phone numbers
  • Physical addresses
  • Social Security Numbers
  • Credit card numbers
  • Bank account numbers
  • Driver's license numbers
  • Passport numbers
  • Medical record numbers
  • IP addresses
  • Usernames and passwords
  • Company names (in customer context)
  • Dates of birth

Project Structure

CensorBot/
├── src/
│   ├── main.py          # Main application with NiceGUI interface
│   ├── prompt.md        # System prompt for the censoring LLM
│   └── lib/
│       └── llm.py       # LLM integration module
├── examples/            # 30+ test cases with various sensitive data
│   ├── 01_customer_support.txt
│   ├── 02_medical_record.txt
│   └── ...
├── .env.example         # Environment variables template
├── pyproject.toml       # Project dependencies
└── CLAUDE.md           # AI assistant instructions

Development

Running Tests

Test the censoring with example files:

# The application loads a random example on startup
uv run python src/main.py

Adding Dependencies

uv add <package-name>

Project Commands

# Install dependencies
uv sync

# Run the application
uv run python src/main.py

# Format code (if configured)
uv run black src/

# Type checking (if configured)
uv run mypy src/

Security Considerations

  • Local Processing: Use a local or self-hosted LLM for maximum security
  • No Data Storage: CensorBot doesn't store any processed text
  • API Security: Keep your API tokens secure and never commit them
  • HTTPS Only: Use HTTPS for API communications
  • Regular Updates: Keep dependencies updated for security patches

Use Cases

  • IT Support Tickets: Sanitize customer tickets before using AI for solutions
  • Documentation: Remove sensitive data from technical documentation
  • Training Data: Prepare datasets for ML training without privacy concerns
  • Compliance: Meet GDPR, HIPAA, and other privacy regulations
  • Knowledge Base: Create sanitized versions of customer interactions

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Built with NiceGUI for the web interface
  • Powered by uv for fast Python package management
  • AI censoring via OpenAI-compatible APIs

Support

For issues, questions, or suggestions, please open an issue on GitHub.


⚠️ Important: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.

Description
No description provided
Readme MIT 108 KiB
Languages
Python 100%