generated readme
This commit is contained in:
178
README.md
178
README.md
@@ -1,2 +1,178 @@
|
|||||||
# CensorBot
|
# 🔒 CensorBot
|
||||||
|
|
||||||
|
A secure data sanitization tool for IT service companies that automatically detects and censors sensitive customer information using AI.
|
||||||
|
|
||||||
|
## Overview
|
||||||
|
|
||||||
|
CensorBot is a Python application that helps protect customer privacy by automatically identifying and replacing sensitive information with placeholders. It uses a small, efficient LLM (like DeepSeek) to process text locally, ensuring that sensitive data never leaves your control before being sent to external AI services.
|
||||||
|
|
||||||
|
## Features
|
||||||
|
|
||||||
|
- 🛡️ **Automatic Detection** - Identifies names, emails, phone numbers, addresses, SSNs, and more
|
||||||
|
- 🔄 **Real-time Processing** - Stream-based censoring for immediate feedback
|
||||||
|
- 🎯 **High Accuracy** - AI-powered detection understands context, not just patterns
|
||||||
|
- 💼 **Enterprise Ready** - Designed for IT service companies handling customer data
|
||||||
|
- 🌐 **Web Interface** - Clean, intuitive UI built with NiceGUI
|
||||||
|
- 📝 **30+ Test Examples** - Comprehensive test suite covering various scenarios
|
||||||
|
|
||||||
|
## Quick Start
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
- Python 3.8+
|
||||||
|
- [uv](https://github.com/astral-sh/uv) package manager
|
||||||
|
- An OpenAI-compatible API endpoint (e.g., DeepSeek, local LLM)
|
||||||
|
|
||||||
|
### Installation
|
||||||
|
|
||||||
|
1. Clone the repository:
|
||||||
|
```bash
|
||||||
|
git clone https://github.com/yourusername/CensorBot.git
|
||||||
|
cd CensorBot
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Install dependencies:
|
||||||
|
```bash
|
||||||
|
uv sync
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Configure environment variables:
|
||||||
|
```bash
|
||||||
|
cp .env.example .env
|
||||||
|
# Edit .env with your API credentials
|
||||||
|
```
|
||||||
|
|
||||||
|
4. Run the application:
|
||||||
|
```bash
|
||||||
|
uv run python src/main.py
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Open your browser to `http://localhost:8080`
|
||||||
|
|
||||||
|
## Configuration
|
||||||
|
|
||||||
|
Create a `.env` file with the following variables:
|
||||||
|
|
||||||
|
```env
|
||||||
|
# LLM Backend Configuration
|
||||||
|
BACKEND_BASE_URL=https://api.deepseek.com # Your LLM API endpoint
|
||||||
|
BACKEND_API_TOKEN=your-api-token-here # API authentication token
|
||||||
|
BACKEND_MODEL=deepseek-chat # Model to use for censoring
|
||||||
|
```
|
||||||
|
|
||||||
|
## Usage
|
||||||
|
|
||||||
|
1. **Paste Text**: Copy your text containing sensitive customer information into the input field
|
||||||
|
2. **Process**: Click "Censor Data" to automatically detect and replace sensitive information
|
||||||
|
3. **Copy Result**: Use the censored text safely with any external AI service
|
||||||
|
|
||||||
|
### What Gets Censored
|
||||||
|
|
||||||
|
- Personal names
|
||||||
|
- Email addresses
|
||||||
|
- Phone numbers
|
||||||
|
- Physical addresses
|
||||||
|
- Social Security Numbers
|
||||||
|
- Credit card numbers
|
||||||
|
- Bank account numbers
|
||||||
|
- Driver's license numbers
|
||||||
|
- Passport numbers
|
||||||
|
- Medical record numbers
|
||||||
|
- IP addresses
|
||||||
|
- Usernames and passwords
|
||||||
|
- Company names (in customer context)
|
||||||
|
- Dates of birth
|
||||||
|
|
||||||
|
## Project Structure
|
||||||
|
|
||||||
|
```
|
||||||
|
CensorBot/
|
||||||
|
├── src/
|
||||||
|
│ ├── main.py # Main application with NiceGUI interface
|
||||||
|
│ ├── prompt.md # System prompt for the censoring LLM
|
||||||
|
│ └── lib/
|
||||||
|
│ └── llm.py # LLM integration module
|
||||||
|
├── examples/ # 30+ test cases with various sensitive data
|
||||||
|
│ ├── 01_customer_support.txt
|
||||||
|
│ ├── 02_medical_record.txt
|
||||||
|
│ └── ...
|
||||||
|
├── .env.example # Environment variables template
|
||||||
|
├── pyproject.toml # Project dependencies
|
||||||
|
└── CLAUDE.md # AI assistant instructions
|
||||||
|
```
|
||||||
|
|
||||||
|
## Development
|
||||||
|
|
||||||
|
### Running Tests
|
||||||
|
|
||||||
|
Test the censoring with example files:
|
||||||
|
```bash
|
||||||
|
# The application loads a random example on startup
|
||||||
|
uv run python src/main.py
|
||||||
|
```
|
||||||
|
|
||||||
|
### Adding Dependencies
|
||||||
|
|
||||||
|
```bash
|
||||||
|
uv add <package-name>
|
||||||
|
```
|
||||||
|
|
||||||
|
### Project Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Install dependencies
|
||||||
|
uv sync
|
||||||
|
|
||||||
|
# Run the application
|
||||||
|
uv run python src/main.py
|
||||||
|
|
||||||
|
# Format code (if configured)
|
||||||
|
uv run black src/
|
||||||
|
|
||||||
|
# Type checking (if configured)
|
||||||
|
uv run mypy src/
|
||||||
|
```
|
||||||
|
|
||||||
|
## Security Considerations
|
||||||
|
|
||||||
|
- **Local Processing**: Use a local or self-hosted LLM for maximum security
|
||||||
|
- **No Data Storage**: CensorBot doesn't store any processed text
|
||||||
|
- **API Security**: Keep your API tokens secure and never commit them
|
||||||
|
- **HTTPS Only**: Use HTTPS for API communications
|
||||||
|
- **Regular Updates**: Keep dependencies updated for security patches
|
||||||
|
|
||||||
|
## Use Cases
|
||||||
|
|
||||||
|
- **IT Support Tickets**: Sanitize customer tickets before using AI for solutions
|
||||||
|
- **Documentation**: Remove sensitive data from technical documentation
|
||||||
|
- **Training Data**: Prepare datasets for ML training without privacy concerns
|
||||||
|
- **Compliance**: Meet GDPR, HIPAA, and other privacy regulations
|
||||||
|
- **Knowledge Base**: Create sanitized versions of customer interactions
|
||||||
|
|
||||||
|
## Contributing
|
||||||
|
|
||||||
|
Contributions are welcome! Please feel free to submit a Pull Request.
|
||||||
|
|
||||||
|
1. Fork the repository
|
||||||
|
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
|
||||||
|
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
|
||||||
|
4. Push to the branch (`git push origin feature/AmazingFeature`)
|
||||||
|
5. Open a Pull Request
|
||||||
|
|
||||||
|
## License
|
||||||
|
|
||||||
|
This project is licensed under the MIT License - see the LICENSE file for details.
|
||||||
|
|
||||||
|
## Acknowledgments
|
||||||
|
|
||||||
|
- Built with [NiceGUI](https://nicegui.io/) for the web interface
|
||||||
|
- Powered by [uv](https://github.com/astral-sh/uv) for fast Python package management
|
||||||
|
- AI censoring via OpenAI-compatible APIs
|
||||||
|
|
||||||
|
## Support
|
||||||
|
|
||||||
|
For issues, questions, or suggestions, please open an issue on GitHub.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
**⚠️ Important**: This tool is designed to help protect privacy but should not be the only measure. Always review censored output and follow your organization's data protection policies.
|
||||||
|
|||||||
Reference in New Issue
Block a user