gmarth/CensorBot

Fork 0

Go to file

Alexander Thiess 3a6b1cfee7 copy to clipboard

2025-08-29 21:54:22 +02:00

examples

init

2025-08-29 21:33:33 +02:00

src

copy to clipboard

2025-08-29 21:54:22 +02:00

.env.example

added example env

2025-08-29 21:40:40 +02:00

.gitignore

Initial commit

2025-08-29 19:50:16 +02:00

.python-version

init

2025-08-29 21:33:33 +02:00

CLAUDE.md

init

2025-08-29 21:33:33 +02:00

LICENSE

Initial commit

2025-08-29 19:50:16 +02:00

pyproject.toml

init

2025-08-29 21:33:33 +02:00

README.md

generated readme

2025-08-29 21:44:48 +02:00

uv.lock

init

2025-08-29 21:33:33 +02:00

README.md

🔒 CensorBot

A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.

Overview

CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.

Features

🛡️ Automatic Detection - Identifies names, emails, phone numbers, addresses, SSNs, and more
🔄 Real-time Processing - Stream-based censoring for immediate feedback
🎯 High Accuracy - AI-powered detection understands context, not just patterns
💼 Enterprise Ready - Designed for IT service companies handling customer data
🌐 Web Interface - Clean, intuitive UI built with NiceGUI
📝 30+ Test Examples - Comprehensive test suite covering various scenarios

Quick Start

Prerequisites

Python 3.8+
uv package manager
An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)

Installation

Clone the repository:

git clone https://github.com/yourusername/CensorBot.git
cd CensorBot

Install dependencies:

uv sync

Configure environment variables:

cp .env.example .env
# Edit .env with your API credentials

Run the application:

uv run python src/main.py

Open your browser to http://localhost:8080

Configuration

Create a .env file with the following variables:

# LLM Backend Configuration
BACKEND_BASE_URL=https://api.deepseek.com  # Your LLM API endpoint
BACKEND_API_TOKEN=your-api-token-here       # API authentication token
BACKEND_MODEL=deepseek-chat                 # Model to use for censoring

Usage

Paste Text: Copy your text containing sensitive customer information into the input field
Process: Click "Censor Data" to automatically detect and replace sensitive information
Copy Result: Use the censored text safely with any external AI service

What Gets Censored

Personal names
Email addresses
Phone numbers
Physical addresses
Social Security Numbers
Credit card numbers
Bank account numbers
Driver's license numbers
Passport numbers
Medical record numbers
IP addresses
Usernames and passwords
Company names (in customer context)
Dates of birth

Project Structure

CensorBot/
├── src/
│   ├── main.py          # Main application with NiceGUI interface
│   ├── prompt.md        # System prompt for the censoring LLM
│   └── lib/
│       └── llm.py       # LLM integration module
├── examples/            # 30+ test cases with various sensitive data
│   ├── 01_customer_support.txt
│   ├── 02_medical_record.txt
│   └── ...
├── .env.example         # Environment variables template
├── pyproject.toml       # Project dependencies
└── CLAUDE.md           # AI assistant instructions

Development

Running Tests

Test the censoring with example files:

# The application loads a random example on startup
uv run python src/main.py

Adding Dependencies

uv add <package-name>

Project Commands

# Install dependencies
uv sync

# Run the application
uv run python src/main.py

# Format code (if configured)
uv run black src/

# Type checking (if configured)
uv run mypy src/

Security Considerations

Local Processing: Use a local or self-hosted LLM for maximum security
No Data Storage: CensorBot doesn't store any processed text
API Security: Keep your API tokens secure and never commit them
HTTPS Only: Use HTTPS for API communications
Regular Updates: Keep dependencies updated for security patches

Use Cases

IT Support Tickets: Sanitize customer tickets before using AI for solutions
Documentation: Remove sensitive data from technical documentation
Training Data: Prepare datasets for ML training without privacy concerns
Compliance: Meet GDPR, HIPAA, and other privacy regulations
Knowledge Base: Create sanitized versions of customer interactions

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built with NiceGUI for the web interface
Powered by uv for fast Python package management
AI censoring via OpenAI-compatible APIs

Support

For issues, questions, or suggestions, please open an issue on GitHub.

⚠️ Important: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.