init
This commit is contained in:
1
.python-version
Normal file
1
.python-version
Normal file
@@ -0,0 +1 @@
|
|||||||
|
3.12
|
||||||
77
CLAUDE.md
Normal file
77
CLAUDE.md
Normal file
@@ -0,0 +1,77 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
## Project Overview
|
||||||
|
|
||||||
|
CensorBot is a Python application that acts as a data sanitization tool for IT service companies. It uses a small LLM (like DeepSeek) to automatically detect and censor sensitive customer information in text inputs. Users input text containing customer data, and CensorBot returns a sanitized version with all sensitive information replaced by placeholders. This censored text can then be safely used with any external LLM service (Claude, GPT-4, etc.) without risking data breaches. The application uses NiceGUI for the frontend.
|
||||||
|
|
||||||
|
## Key Architecture
|
||||||
|
|
||||||
|
### Core Components
|
||||||
|
- **Frontend**: NiceGUI-based web interface (to be implemented in `src/main.py`)
|
||||||
|
- **LLM Integration**: `src/lib/llm.py` provides async HTTP client for LLM API communication
|
||||||
|
- Supports both streaming and non-streaming responses
|
||||||
|
- Uses httpx for async HTTP requests
|
||||||
|
- Expects OpenAI-compatible chat completions API
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
- **Environment Variables** (via `.env` file):
|
||||||
|
- `BACKEND_BASE_URL`: Censoring LLM backend URL (e.g., DeepSeek API)
|
||||||
|
- `BACKEND_API_TOKEN`: API authentication token for the censoring LLM
|
||||||
|
- `BACKEND_MODEL`: Model to use for censoring (e.g., "deepseek-chat")
|
||||||
|
- **System Prompt**: Located in `src/prompt.md` - defines the censoring LLM's behavior for identifying and redacting sensitive data
|
||||||
|
|
||||||
|
## Development Commands
|
||||||
|
|
||||||
|
### Package Management (using uv)
|
||||||
|
```bash
|
||||||
|
# Install dependencies
|
||||||
|
uv sync
|
||||||
|
|
||||||
|
# Add a dependency
|
||||||
|
uv add <package>
|
||||||
|
|
||||||
|
# Run the application
|
||||||
|
uv run src/main.py
|
||||||
|
```
|
||||||
|
|
||||||
|
### Running the Application
|
||||||
|
```bash
|
||||||
|
# Run the NiceGUI application (once implemented)
|
||||||
|
uv run python src/main.py
|
||||||
|
```
|
||||||
|
|
||||||
|
## Important Implementation Notes
|
||||||
|
|
||||||
|
1. **LLM Integration**: The `get_response` function in `src/lib/llm.py` is fully functional and expects:
|
||||||
|
- Backend configuration with `BACKEND_BASE_URL`, `BACKEND_API_TOKEN` and `BACKEND_MODEL`
|
||||||
|
- Messages in OpenAI format with roles: "system", "assistant", "user"
|
||||||
|
- Returns async generators for both streaming and non-streaming modes
|
||||||
|
- Used exclusively for the censoring functionality
|
||||||
|
|
||||||
|
2. **Security Focus**: This application handles sensitive customer data. Always:
|
||||||
|
- Ensure proper data sanitization before and after LLM processing
|
||||||
|
- Never log or expose raw customer information
|
||||||
|
- Keep API tokens secure and never commit them
|
||||||
|
|
||||||
|
3. **Frontend Development**: When implementing the NiceGUI interface in `src/main.py`:
|
||||||
|
- Provide input field for text containing sensitive data
|
||||||
|
- Display censored output that users can copy
|
||||||
|
- Use async handlers to integrate with the LLM backend
|
||||||
|
- Implement proper error handling for API failures
|
||||||
|
- Consider showing before/after comparison
|
||||||
|
- Add copy-to-clipboard functionality for the censored text
|
||||||
|
|
||||||
|
4. **System Prompt**: The `src/prompt.md` file should contain clear instructions for the censoring LLM on:
|
||||||
|
- What constitutes customer information (names, addresses, phone numbers, emails, etc.)
|
||||||
|
- How to censor/redact sensitive data (e.g., replace with placeholders like [CUSTOMER_NAME], [EMAIL], etc.)
|
||||||
|
- Maintaining context while protecting privacy
|
||||||
|
- Ensuring the output remains useful for the downstream processing LLM
|
||||||
|
|
||||||
|
5. **Usage Flow**:
|
||||||
|
- User pastes text with customer data into CensorBot
|
||||||
|
- CensorBot uses small LLM to identify and replace sensitive information
|
||||||
|
- User receives censored text with placeholders
|
||||||
|
- User can copy censored text and use it with any external LLM service
|
||||||
|
- No direct integration with external LLMs - CensorBot is a standalone sanitization tool
|
||||||
1
examples/01_customer_support.txt
Normal file
1
examples/01_customer_support.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Hello, my name is Robert Johnson and I'm having issues with my account. My email is robert.j@techcorp.com and you can reach me at 555-123-4567. I live at 123 Main Street, Springfield, IL 62701. My account number is ACC-789456123.
|
||||||
1
examples/02_medical_record.txt
Normal file
1
examples/02_medical_record.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Patient Maria Garcia, DOB: 03/15/1985, MRN: MED-445566, visited our clinic on January 10, 2024. Her insurance ID is INS-778899-X. Contact phone: (312) 555-9876. Emergency contact: Juan Garcia at 312-555-6543.
|
||||||
1
examples/03_financial_transaction.txt
Normal file
1
examples/03_financial_transaction.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Customer Sarah Williams (SSN: 123-45-6789) requested a wire transfer from account 9876543210 to routing number 021000021. Her credit card ending in 4532 was used for verification. She can be contacted at sarah.w@finance.net or 202-555-0147.
|
||||||
1
examples/04_it_support.txt
Normal file
1
examples/04_it_support.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Username: jsmith2024, IP Address: 192.168.1.105. The user John Smith from DataTech Solutions reported that his password "SecurePass123!" was compromised. His employee ID is EMP-00456 and desk phone is ext. 3421.
|
||||||
1
examples/05_ecommerce_order.txt
Normal file
1
examples/05_ecommerce_order.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Order #ORD-2024-78945 for Michael Chen. Shipping to: 456 Oak Avenue, Apt 12B, San Francisco, CA 94102. Billing address same as shipping. Phone: 415-555-3698. Email: m.chen@webmail.com. Paid with Visa ending in 8901.
|
||||||
1
examples/06_legal_document.txt
Normal file
1
examples/06_legal_document.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
This agreement is between Jennifer Anderson (Driver's License: DL-456789123) residing at 789 Elm Street, Boston, MA 02134, and Thomas Brown (Passport: P12345678) of 321 Pine Road, Cambridge, MA 02139. Contact: jennifer@lawfirm.com, thomas.brown@corporate.org.
|
||||||
1
examples/07_travel_booking.txt
Normal file
1
examples/07_travel_booking.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Passenger: David Lee, Passport Number: A12345678, Date of Birth: 07/22/1990. Flight booking confirmation: ABC123XYZ. Contact email: david.lee@travel.com, Mobile: +1-650-555-2468. Frequent flyer number: FF-998877665.
|
||||||
1
examples/08_real_estate.txt
Normal file
1
examples/08_real_estate.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
I'm Lisa Martinez interested in the property at 567 Maple Drive. My current address is 890 Cedar Lane, Austin, TX 78701. You can reach me at 512-555-7890 or lisa.martinez@realestate.net. My pre-approval amount is $450,000 from Bank of Example, account ending in 3456.
|
||||||
1
examples/09_insurance_claim.txt
Normal file
1
examples/09_insurance_claim.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Policyholder: James Wilson, Policy #: POL-123456789, SSN: 987-65-4321. Claim for accident on 12/01/2023 at intersection of 5th Avenue and Main Street. Vehicle VIN: 1HGCM82633A123456. Contact: 303-555-4321, jwilson@insurance.co.
|
||||||
1
examples/10_educational_record.txt
Normal file
1
examples/10_educational_record.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Student Emma Thompson, ID: STU-20240156, enrolled in Computer Science program. Emergency contact: Mark Thompson (father) at 617-555-8765. Home address: 234 School Street, Newton, MA 02458. Email: emma.t@university.edu, DOB: 09/30/2002.
|
||||||
12
examples/11_multiparty_email.txt
Normal file
12
examples/11_multiparty_email.txt
Normal file
@@ -0,0 +1,12 @@
|
|||||||
|
From: alice.jones@company.com
|
||||||
|
To: bob.smith@client.org, carol.white@vendor.net
|
||||||
|
CC: manager@company.com
|
||||||
|
|
||||||
|
Hi Bob and Carol,
|
||||||
|
|
||||||
|
Following up on our call with Alice Jones (555-111-2222) and Bob Smith (555-333-4444). Alice's team at TechCorp (located at 100 Tech Way, Silicon Valley, CA 94000) will handle the implementation. Bob from ClientCo at 200 Business Blvd needs access by Friday.
|
||||||
|
|
||||||
|
Carol from VendorInc (Tax ID: 12-3456789) will provide licenses. Invoice to: Accounts Payable, 300 Finance Street, New York, NY 10001, Attn: David Brown.
|
||||||
|
|
||||||
|
Best regards,
|
||||||
|
Alice
|
||||||
1
examples/12_mixed_public_private.txt
Normal file
1
examples/12_mixed_public_private.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Our company newsletter features employee of the month Jessica Taylor from the Denver office. While we can't share personal details, Jessica (Employee ID: EMP-7890) has been with us for 5 years. For HR matters, contact her at 303-555-9999 or jessica.taylor@internal.company.com. Her public bio is available on our website, but her home address (456 Private Lane, Denver, CO 80201) and SSN (111-22-3333) remain confidential.
|
||||||
6
examples/13_technical_support.txt
Normal file
6
examples/13_technical_support.txt
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
Ticket #SUP-2024-5678
|
||||||
|
User: admin@smallbusiness.com
|
||||||
|
Server IP: 10.0.0.50 (Internal), 203.0.113.45 (Public)
|
||||||
|
Database connection string: Server=db.internal;User=dbadmin;Password=Str0ng!Pass#2024;Database=CustomerDB
|
||||||
|
Error log shows user 'john.doe' (ID: USR-445566) attempted login from IP 198.51.100.22 at 14:32:05 UTC.
|
||||||
|
Customer affected: SmallBiz LLC, Account #: BIZ-789456, Primary contact: owner@smallbusiness.com, Phone: 555-BIZHELP (555-249-4357).
|
||||||
3
examples/14_international_formats.txt
Normal file
3
examples/14_international_formats.txt
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
European customer Hans Müller (Passport: C01234567) from Hauptstraße 123, 10115 Berlin, Germany. Phone: +49 30 12345678, Email: hans.mueller@deutsch.de.
|
||||||
|
American partner: Jane Doe, 123 Main St, NY 10001, USA, +1-212-555-0199, jane@american.com, SSN: 555-12-9876.
|
||||||
|
Asian contact: 田中太郎 (Tanaka Taro), 〒100-0001 東京都千代田区, Japan, +81-3-1234-5678, tanaka@japanese.jp, Employee number: JP-00123.
|
||||||
5
examples/15_log_entries.txt
Normal file
5
examples/15_log_entries.txt
Normal file
@@ -0,0 +1,5 @@
|
|||||||
|
2024-01-15 10:30:45 ERROR: Authentication failed for user='mary.johnson@techfirm.com' from IP='192.168.100.55'
|
||||||
|
2024-01-15 10:30:46 INFO: Retry attempt with credentials: username='mary.johnson', password='MyP@ssw0rd123'
|
||||||
|
2024-01-15 10:30:47 SUCCESS: User authenticated. Session token: abc123xyz789, User ID: USR-998877
|
||||||
|
2024-01-15 10:30:48 INFO: Accessing customer record: John Adams, Account: 654321, Email: jadams@email.net
|
||||||
|
2024-01-15 10:30:49 DEBUG: SQL Query: SELECT * FROM users WHERE ssn='987-65-4321' AND dob='1980-05-15'
|
||||||
17
examples/16_form_submission.txt
Normal file
17
examples/16_form_submission.txt
Normal file
@@ -0,0 +1,17 @@
|
|||||||
|
REGISTRATION FORM SUBMISSION:
|
||||||
|
- First Name: Patricia
|
||||||
|
- Last Name: Robinson
|
||||||
|
- Date of Birth: 06/25/1988
|
||||||
|
- Social Security: 222-33-4444
|
||||||
|
- Email: patricia.r@emailprovider.com
|
||||||
|
- Phone: (424) 555-6789
|
||||||
|
- Address Line 1: 789 Sunset Boulevard
|
||||||
|
- Address Line 2: Suite 456
|
||||||
|
- City: Los Angeles
|
||||||
|
- State: CA
|
||||||
|
- ZIP: 90028
|
||||||
|
- Emergency Contact Name: Robert Robinson
|
||||||
|
- Emergency Contact Phone: 424-555-9876
|
||||||
|
- Insurance Provider: HealthCare Plus
|
||||||
|
- Policy Number: HCP-123456789
|
||||||
|
- Group Number: GRP-456
|
||||||
6
examples/17_chat_conversation.txt
Normal file
6
examples/17_chat_conversation.txt
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
[09:15] Mike Chen: Hey team, client ABC Corp (Tax ID: 98-7654321) needs the report
|
||||||
|
[09:16] Sarah Lin: Which contact? Is it still j.wright@abccorp.com?
|
||||||
|
[09:17] Mike Chen: No, new contact is Tom Baker, 555-0123, t.baker@abccorp.com
|
||||||
|
[09:18] Sarah Lin: Got it. Sending to their office at 500 Corporate Way, Suite 200
|
||||||
|
[09:19] Mike Chen: Include billing info: Account #ABC-2024-789, PO# 45678
|
||||||
|
[09:20] IT Support: FYI their VPN IP range is 203.0.113.0/24, firewall exceptions needed
|
||||||
1
examples/18_mixed_languages.txt
Normal file
1
examples/18_mixed_languages.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Customer François Dubois (Client ID: FR-12345) contacted us from Paris. Address: 24 Rue de la Paix, 75002 Paris, France. Téléphone: +33 1 23 45 67 89. His Spanish colleague María González (maria.gonzalez@empresa.es, +34 91 234 5678) from Calle Mayor 15, 28013 Madrid will handle the European coordination. Payment via IBAN: FR14 2004 1010 0505 0001 3M02 606.
|
||||||
4
examples/19_similar_looking_data.txt
Normal file
4
examples/19_similar_looking_data.txt
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
Order number: 555-123-4567 (looks like phone but it's an order number)
|
||||||
|
Real phone: 555-123-4567 (this is actually a phone number)
|
||||||
|
Reference code: john.smith@2024 (not an email)
|
||||||
|
Actual email: john.smith@example.com
|
||||||
5
examples/20_partial_information.txt
Normal file
5
examples/20_partial_information.txt
Normal file
@@ -0,0 +1,5 @@
|
|||||||
|
Customer J. Smith (only initial provided)
|
||||||
|
Phone: 555-1234 (missing area code)
|
||||||
|
Email: contact@ (incomplete email)
|
||||||
|
Address: Main Street (no number or city)
|
||||||
|
Card ending in: 1234
|
||||||
1
examples/21_ambiguous_context.txt
Normal file
1
examples/21_ambiguous_context.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
Smith from Accounting called about the Johnson report. Contact Building 5, Room 302, Extension 4567. Reference case #12345 for details about 123 Main Street property.
|
||||||
1
examples/22_public_vs_private.txt
Normal file
1
examples/22_public_vs_private.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
CEO John Anderson (public figure) discussed with customer John Anderson (private individual, ID: CUST-123, email: janderson@private.net). The CEO can be reached via our public line 1-800-COMPANY, while the customer's direct line is 555-987-6543.
|
||||||
3
examples/23_various_formats.txt
Normal file
3
examples/23_various_formats.txt
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
Phone: 555.123.4567 vs 555-123-4567 vs (555) 123-4567 vs +1 555 123 4567
|
||||||
|
SSN: 123-45-6789 vs 123456789 vs 123 45 6789
|
||||||
|
Date: 01/15/1990 vs 15-01-1990 vs January 15, 1990 vs 1990-01-15
|
||||||
3
examples/24_data_in_urls.txt
Normal file
3
examples/24_data_in_urls.txt
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
Website: https://user:password123@example.com/path/to/resource
|
||||||
|
File path: C:\Users\john.doe\Documents\SSN-123-45-6789.pdf
|
||||||
|
API endpoint: https://api.service.com/v1/users/12345/email/john@example.com
|
||||||
4
examples/25_false_positives.txt
Normal file
4
examples/25_false_positives.txt
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
Product codes: EMAIL-SCANNER-PRO, PHONE-HOLDER-XL
|
||||||
|
Book titles: "How to Protect Your SSN" by John Smith
|
||||||
|
Company names: Smith & Associates, 123 Solutions Inc.
|
||||||
|
Street names: Elizabeth Avenue, Johnson Boulevard
|
||||||
4
examples/26_encoded_data.txt
Normal file
4
examples/26_encoded_data.txt
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
Base64 email: am9obi5kb2VAZXhhbXBsZS5jb20= (john.doe@example.com)
|
||||||
|
URL encoded: john%40example.com
|
||||||
|
Hex phone: 0x5551234567
|
||||||
|
MD5 of SSN: 5d41402abc4b2a76b9719d911017c592
|
||||||
4
examples/27_data_in_code.txt
Normal file
4
examples/27_data_in_code.txt
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
Mathematical: The calculation 555-123-4567 equals -4135
|
||||||
|
Code variable: let customer_ssn = "123-45-6789";
|
||||||
|
Quote: She said "my email is jane@example.com"
|
||||||
|
JSON: {"name":"John Doe","phone":"555-0123","ssn":"111-22-3333"}
|
||||||
6
examples/28_minimal_context.txt
Normal file
6
examples/28_minimal_context.txt
Normal file
@@ -0,0 +1,6 @@
|
|||||||
|
123-45-6789
|
||||||
|
john.doe@example.com
|
||||||
|
555-123-4567
|
||||||
|
123 Main Street
|
||||||
|
Robert Johnson
|
||||||
|
CC: 4532-1111-2222-3333
|
||||||
4
examples/29_obfuscated_data.txt
Normal file
4
examples/29_obfuscated_data.txt
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
J*** D** called from 555-xxx-4567
|
||||||
|
Email: j****@example.com
|
||||||
|
SSN: ***-**-6789
|
||||||
|
Address: 1** Main Street, Spring*****, IL
|
||||||
4
examples/30_multiple_same_name.txt
Normal file
4
examples/30_multiple_same_name.txt
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
John Smith (Customer) - john.smith@customer.com - 555-111-2222
|
||||||
|
John Smith (Employee) - john.smith@company.com - 555-333-4444
|
||||||
|
John Smith (Vendor) - john.smith@vendor.com - 555-555-6666
|
||||||
|
Meeting with all three John Smiths scheduled for Tuesday.
|
||||||
9
pyproject.toml
Normal file
9
pyproject.toml
Normal file
@@ -0,0 +1,9 @@
|
|||||||
|
[project]
|
||||||
|
name = "censorbot"
|
||||||
|
version = "0.1.0"
|
||||||
|
description = "Add your description here"
|
||||||
|
readme = "README.md"
|
||||||
|
requires-python = ">=3.12"
|
||||||
|
dependencies = [
|
||||||
|
"nicegui>=2.23.3",
|
||||||
|
]
|
||||||
174
src/main.py
Normal file
174
src/main.py
Normal file
@@ -0,0 +1,174 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
CensorBot - Data Sanitization Tool
|
||||||
|
A NiceGUI-based application for removing sensitive customer information from text
|
||||||
|
"""
|
||||||
|
import asyncio
|
||||||
|
import os
|
||||||
|
import random
|
||||||
|
from typing import List
|
||||||
|
from dotenv import load_dotenv
|
||||||
|
|
||||||
|
from nicegui import ui
|
||||||
|
|
||||||
|
from lib import get_response, LLMBackend, LLMMessage
|
||||||
|
load_dotenv()
|
||||||
|
|
||||||
|
|
||||||
|
def get_random_example_text() -> str:
|
||||||
|
examples_dir = "examples"
|
||||||
|
|
||||||
|
# Get all .txt files
|
||||||
|
txt_files = [f for f in os.listdir(examples_dir) if f.endswith('.txt')]
|
||||||
|
|
||||||
|
if not txt_files:
|
||||||
|
raise FileNotFoundError("No .txt files found in examples directory")
|
||||||
|
|
||||||
|
# Pick random file
|
||||||
|
random_file = random.choice(txt_files)
|
||||||
|
file_path = os.path.join(examples_dir, random_file)
|
||||||
|
|
||||||
|
# Read and return content
|
||||||
|
with open(file_path, 'r', encoding='utf-8') as f:
|
||||||
|
return f.read()
|
||||||
|
|
||||||
|
|
||||||
|
async def main():
|
||||||
|
input_text: ui.textarea
|
||||||
|
output_text: ui.textarea
|
||||||
|
|
||||||
|
prompt: str
|
||||||
|
|
||||||
|
with open('src/prompt.md') as prompt_file:
|
||||||
|
prompt = prompt_file.read()
|
||||||
|
|
||||||
|
backend: LLMBackend = {'base_url': os.environ['BACKEND_BASE_URL'],
|
||||||
|
'api_token': os.environ['BACKEND_API_TOKEN'],
|
||||||
|
'model': os.environ['BACKEND_MODEL']}
|
||||||
|
|
||||||
|
async def censor_input():
|
||||||
|
messages: List[LLMMessage] = [
|
||||||
|
{'role': 'system', 'content': prompt},
|
||||||
|
{'role': 'user', 'content': input_text.value}
|
||||||
|
]
|
||||||
|
try:
|
||||||
|
# Stream the response with cancellation support
|
||||||
|
async for chunk in get_response(backend, messages, True): # type: ignore
|
||||||
|
# Check if task was cancelled
|
||||||
|
current_task = asyncio.current_task()
|
||||||
|
if current_task and current_task.cancelled():
|
||||||
|
break
|
||||||
|
|
||||||
|
if 'content' in chunk:
|
||||||
|
output_text.value += chunk['content']
|
||||||
|
print(chunk['content'])
|
||||||
|
|
||||||
|
# Small delay to allow UI updates and cancellation checks
|
||||||
|
await asyncio.sleep(0.01)
|
||||||
|
|
||||||
|
except asyncio.CancelledError:
|
||||||
|
ui.notify('Generation stopped by user', type='info')
|
||||||
|
# Save whatever content we have so far
|
||||||
|
return
|
||||||
|
|
||||||
|
# Application header
|
||||||
|
with ui.header(elevated=True).classes('q-pa-md'):
|
||||||
|
ui.label('🔒 CensorBot').classes('text-h4 text-weight-bold')
|
||||||
|
ui.label('Secure Data Sanitization for IT Service Companies').classes('text-subtitle1 text-grey-7')
|
||||||
|
|
||||||
|
# Main container
|
||||||
|
with ui.column().classes('w-full max-w-6xl mx-auto q-pa-lg q-gutter-md'):
|
||||||
|
|
||||||
|
# Input section
|
||||||
|
with ui.card().classes('w-full'):
|
||||||
|
ui.label('Original Text').classes('text-h6 text-weight-medium')
|
||||||
|
ui.label('Contains sensitive customer information').classes('text-caption text-grey-7')
|
||||||
|
|
||||||
|
input_text = ui.textarea(
|
||||||
|
placeholder='Paste your text here...\n\n'
|
||||||
|
'Example:\n'
|
||||||
|
'Customer John Smith called from 555-1234 about issue with account john@example.com',
|
||||||
|
value=get_random_example_text()
|
||||||
|
).classes('w-full').style('font-family: monospace').props('autogrow')
|
||||||
|
|
||||||
|
# Character count
|
||||||
|
char_count_label = ui.label('0 characters').classes('text-caption text-grey-6')
|
||||||
|
|
||||||
|
# Output section
|
||||||
|
with ui.card().classes('w-full'):
|
||||||
|
ui.label('Censored Text').classes('text-h6 text-weight-medium')
|
||||||
|
ui.label('Safe to use with external LLMs').classes('text-caption text-green-7')
|
||||||
|
|
||||||
|
output_text = ui.textarea(
|
||||||
|
placeholder='Censored text will appear here...\n\n'
|
||||||
|
'Example:\n'
|
||||||
|
'Customer [CUSTOMER_NAME] called from [PHONE_NUMBER] about issue with account [EMAIL]',
|
||||||
|
value=''
|
||||||
|
).classes('w-full').style('font-family: monospace; background-color: #f5f5f5').props('readonly autogrow')
|
||||||
|
|
||||||
|
# Copy button
|
||||||
|
with ui.row().classes('w-full justify-end q-gutter-sm'):
|
||||||
|
copy_button = ui.button('Copy to Clipboard', icon='content_copy').props('outline')
|
||||||
|
copy_button.disable()
|
||||||
|
|
||||||
|
# Action buttons
|
||||||
|
with ui.card().classes('w-full'):
|
||||||
|
with ui.row().classes('w-full justify-center q-gutter-md'):
|
||||||
|
clear_button = ui.button('Clear All', icon='clear').props('outline color=negative')
|
||||||
|
process_button = ui.button('Censor Data', icon='shield', on_click=censor_input).props('color=primary size=lg')
|
||||||
|
|
||||||
|
# Statistics section
|
||||||
|
with ui.expansion('Processing Statistics', icon='analytics').classes('w-full'):
|
||||||
|
with ui.row().classes('w-full q-gutter-md'):
|
||||||
|
with ui.column().classes('col'):
|
||||||
|
ui.label('Items Censored').classes('text-weight-medium')
|
||||||
|
stats_censored = ui.label('0').classes('text-h4 text-primary')
|
||||||
|
|
||||||
|
with ui.column().classes('col'):
|
||||||
|
ui.label('Processing Time').classes('text-weight-medium')
|
||||||
|
stats_time = ui.label('0.0s').classes('text-h4 text-primary')
|
||||||
|
|
||||||
|
with ui.column().classes('col'):
|
||||||
|
ui.label('Data Reduction').classes('text-weight-medium')
|
||||||
|
stats_reduction = ui.label('0%').classes('text-h4 text-primary')
|
||||||
|
|
||||||
|
# Event handlers (mockup only - no real functionality)
|
||||||
|
def update_char_count():
|
||||||
|
char_count_label.text = f'{len(input_text.value)} characters'
|
||||||
|
|
||||||
|
def mock_copy():
|
||||||
|
ui.notify('Text copied to clipboard (mockup)', type='positive')
|
||||||
|
|
||||||
|
def clear_all():
|
||||||
|
input_text.value = ''
|
||||||
|
output_text.value = ''
|
||||||
|
copy_button.disable()
|
||||||
|
stats_censored.text = '0'
|
||||||
|
stats_time.text = '0.0s'
|
||||||
|
stats_reduction.text = '0%'
|
||||||
|
update_char_count()
|
||||||
|
|
||||||
|
# Connect event handlers
|
||||||
|
input_text.on('input', update_char_count)
|
||||||
|
copy_button.on_click(mock_copy)
|
||||||
|
clear_button.on_click(clear_all)
|
||||||
|
|
||||||
|
# Footer
|
||||||
|
with ui.footer().classes('q-pa-md text-center'):
|
||||||
|
ui.label('CensorBot - Protecting Customer Privacy').classes('text-caption text-grey-6')
|
||||||
|
ui.label('⚠️ This is a mockup - no actual processing implemented yet').classes('text-caption text-orange')
|
||||||
|
|
||||||
|
|
||||||
|
# Run the application
|
||||||
|
if __name__ in {"__main__", "__mp_main__"}:
|
||||||
|
@ui.page('/')
|
||||||
|
async def _():
|
||||||
|
await main()
|
||||||
|
|
||||||
|
ui.run(
|
||||||
|
title='CensorBot - Data Sanitization Tool',
|
||||||
|
favicon='🔒',
|
||||||
|
show=False,
|
||||||
|
dark=False,
|
||||||
|
port=8080
|
||||||
|
)
|
||||||
43
src/prompt.md
Normal file
43
src/prompt.md
Normal file
@@ -0,0 +1,43 @@
|
|||||||
|
# Data Censoring Instructions
|
||||||
|
|
||||||
|
You are a data sanitization assistant. Your sole purpose is to identify and replace sensitive customer information with appropriate placeholders while maintaining the context and meaning of the text.
|
||||||
|
|
||||||
|
## What to Censor
|
||||||
|
|
||||||
|
Replace the following types of sensitive information:
|
||||||
|
|
||||||
|
1. **Personal Names**: Replace with `[NAME]` or `[CUSTOMER_NAME]`
|
||||||
|
2. **Email Addresses**: Replace with `[EMAIL]`
|
||||||
|
3. **Phone Numbers**: Replace with `[PHONE]`
|
||||||
|
4. **Physical Addresses**: Replace with `[ADDRESS]`
|
||||||
|
5. **Social Security Numbers**: Replace with `[SSN]`
|
||||||
|
6. **Credit Card Numbers**: Replace with `[CREDIT_CARD]`
|
||||||
|
7. **Bank Account Numbers**: Replace with `[ACCOUNT_NUMBER]`
|
||||||
|
8. **Driver's License Numbers**: Replace with `[LICENSE]`
|
||||||
|
9. **Passport Numbers**: Replace with `[PASSPORT]`
|
||||||
|
10. **Medical Record Numbers**: Replace with `[MRN]`
|
||||||
|
11. **IP Addresses**: Replace with `[IP_ADDRESS]`
|
||||||
|
12. **Usernames/User IDs**: Replace with `[USERNAME]`
|
||||||
|
13. **Passwords**: Replace with `[PASSWORD]`
|
||||||
|
14. **Company Names** (when context indicates it's customer data): Replace with `[COMPANY]`
|
||||||
|
15. **Dates of Birth**: Replace with `[DOB]`
|
||||||
|
|
||||||
|
## Rules
|
||||||
|
|
||||||
|
1. **Preserve Context**: Keep all non-sensitive text exactly as provided
|
||||||
|
2. **Maintain Structure**: Preserve formatting, punctuation, and spacing
|
||||||
|
3. **Be Consistent**: Use the same placeholder for the same entity throughout the text
|
||||||
|
4. **No Commentary**: Output ONLY the censored text, no explanations or additional text
|
||||||
|
5. **When in Doubt**: If something might be sensitive, censor it
|
||||||
|
|
||||||
|
## Example
|
||||||
|
|
||||||
|
Input:
|
||||||
|
"John Smith from Acme Corp called at 555-1234 about his account john.smith@acme.com. His credit card ending in 4567 was declined."
|
||||||
|
|
||||||
|
Output:
|
||||||
|
"[CUSTOMER_NAME] from [COMPANY] called at [PHONE] about his account [EMAIL]. His credit card ending in [CREDIT_CARD] was declined."
|
||||||
|
|
||||||
|
## Your Task
|
||||||
|
|
||||||
|
Censor the following text by replacing all sensitive information with appropriate placeholders. Output only the censored version:
|
||||||
Reference in New Issue
Block a user