A comprehensive AI-powered solution for automating finance, audit, and accounting tasks in CPA firm environments with enterprise-grade security controls.
- Invoice Processing: Automatically extract and validate data from invoices using NLP and pattern recognition
- Expense Categorization: ML-powered expense classification with GL account suggestions
- Audit Trail Automation: Comprehensive audit logging with anomaly detection
- Financial Reconciliation: Fuzzy matching for bank and book transaction reconciliation
- Fraud Detection: Multi-layered anomaly detection including Benford's Law analysis
- Data Encryption: AES-256 encryption for sensitive financial data
- Access Control: Role-based permissions with JWT authentication
- Audit Logging: Comprehensive activity tracking for compliance
- Secure API: Rate-limited REST API with HTTPS support
- Input Sanitization: Protection against injection attacks
- Data Retention: Configurable retention policies for regulatory compliance
- Installation
- Configuration
- Usage
- API Documentation
- Security Best Practices
- Architecture
- Testing
- Contributing
- Python 3.8 or higher
- pip package manager
- Virtual environment (recommended)
- Clone the repository
git clone https://github.com/HHR-CPA/vigilant-octo-engine.git
cd vigilant-octo-engine
- Create and activate virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Configure environment variables
cp .env.example .env
# Edit .env with your configuration
- Initialize secure storage directories
mkdir -p logs models secure_data
chmod 700 secure_data # Restrict access on Unix systems

Copy .env.example to .env and configure the following:
# Security Configuration
SECRET_KEY=your-secret-key-here-change-in-production
ENCRYPTION_KEY=your-encryption-key-here-change-in-production
# Database Configuration
DATABASE_URL=sqlite:///./cpa_finance.db
# Audit Logging
AUDIT_LOG_PATH=./logs/audit.log
AUDIT_LOG_RETENTION_DAYS=365
# API Configuration
API_HOST=0.0.0.0
API_PORT=8000
API_RATE_LIMIT=100/minute

- Generate secure random keys for production
- Never commit .env to version control
- Use PostgreSQL for production environments
- Enable HTTPS with valid SSL certificates
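
One way to follow the "generate secure random keys" advice is Python's standard `secrets` module. The sketch below is an assumption-laden illustration: the helper name `generate_env_keys` is hypothetical, and the README does not state what format `EncryptionManager` expects for `ENCRYPTION_KEY`, so check `src/security.py` before using the generated value there.

```python
import secrets
from typing import Dict


def generate_env_keys() -> Dict[str, str]:
    """Return fresh random values for the two secrets named in .env.

    Hypothetical helper: the key formats are assumptions, not taken
    from the project's own code.
    """
    return {
        # 32 random bytes, hex-encoded (64 characters).
        "SECRET_KEY": secrets.token_hex(32),
        # 32 random bytes, URL-safe base64 without padding (43 characters).
        "ENCRYPTION_KEY": secrets.token_urlsafe(32),
    }


if __name__ == "__main__":
    # Print lines that can be pasted into .env.
    for name, value in generate_env_keys().items():
        print(f"{name}={value}")
```

Both values come from the operating system's cryptographically secure random source, unlike the `random` module, which is not suitable for secrets.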
python src/api.py
The API will be available at http://localhost:8000
from src.invoice_processing import InvoiceProcessor
processor = InvoiceProcessor()
invoice_text = """
ACME Corp
Invoice #INV-2024-001
Date: 01/15/2024
Total: $1,250.00
"""
invoice = processor.extract_invoice_data(invoice_text)
is_valid, errors = processor.validate_invoice(invoice)
category = processor.categorize_expense(invoice)
print(f"Category: {category}")
print(f"Valid: {is_valid}")

from src.expense_categorization import ExpenseCategorizer
categorizer = ExpenseCategorizer()
category, confidence = categorizer.categorize(
    description="Microsoft Office 365 Subscription",
    vendor="Microsoft",
    amount=150.00
)
gl_account = categorizer.suggest_gl_account(category)
print(f"Category: {category} (Confidence: {confidence:.2%})")
print(f"GL Account: {gl_account}")

from src.anomaly_detection import AnomalyDetector
import pandas as pd
detector = AnomalyDetector()
transactions = pd.DataFrame({
    'amount': [100, 150, 120, 5000, 110],
    'vendor': ['A', 'B', 'A', 'C', 'B'],
    'date': ['2024-01-15', '2024-01-16', '2024-01-17', '2024-01-18', '2024-01-19']
})
results = detector.detect_transaction_anomalies(transactions)
anomalies = results[results['is_anomaly']]
print(f"Detected {len(anomalies)} anomalies")

from src.security import EncryptionManager, AccessControl
# Encryption
encryption = EncryptionManager()
sensitive_data = {"account": "123456", "balance": 50000}
encrypted = encryption.encrypt_dict(sensitive_data)
decrypted = encryption.decrypt_dict(encrypted)
# Authentication
access_control = AccessControl()
token = access_control.create_access_token({"user": "john", "role": "accountant"})
user_data = access_control.verify_token(token)

All API endpoints except /api/health and /api/auth/login require authentication using JWT tokens.
Login
curl -X POST http://localhost:8000/api/auth/login \
-H "Content-Type: application/json" \
-d '{"username": "demo", "password": "Demo123!"}'
Use Token
curl -X POST http://localhost:8000/api/invoice/process \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"invoice_text": "..."}'

| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| /api/auth/login | POST | Authenticate and get token | No |
| /api/invoice/process | POST | Process invoice | Yes |
| /api/expense/categorize | POST | Categorize expense | Yes |
| /api/audit/detect-anomalies | POST | Detect anomalies | Yes (Auditor) |
| /api/audit/generate-report | POST | Generate audit report | Yes (Auditor) |
| /api/reconcile/transactions | POST | Reconcile transactions | Yes |
| /api/health | GET | Health check | No |

Full API documentation available at http://localhost:8000/docs when server is running.
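
The login-then-bearer-token flow from the curl commands can also be sketched in Python with only the standard library. The function names here are illustrative and not part of the project. The endpoints and JSON fields are the ones shown in the curl examples. The name of the token field in the login response is not documented in this README, so the sketch stops short of parsing it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default from this README; adjust as needed


def build_login_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/auth/login."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_invoice_request(token: str, invoice_text: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST for /api/invoice/process."""
    body = json.dumps({"invoice_text": invoice_text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/invoice/process",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_login_request("demo", "Demo123!"))` returns the login response as a dictionary. Check http://localhost:8000/docs for the field that carries the token before passing it to `build_invoice_request`.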
- Environment Security
  - Use strong, randomly generated keys
  - Store secrets in a secure vault (e.g., AWS Secrets Manager, HashiCorp Vault)
  - Enable HTTPS with valid SSL certificates
  - Use PostgreSQL instead of SQLite
- Access Control
  - Implement multi-factor authentication (MFA)
  - Use role-based access control (RBAC)
  - Regularly rotate API keys and tokens
  - Monitor failed authentication attempts
- Data Protection
  - Encrypt data at rest and in transit
  - Implement data retention policies
  - Run regular security audits
  - Validate file uploads securely
- Network Security
  - Use firewall rules to restrict access
  - Implement rate limiting
  - Enable CORS only for trusted domains
  - Use VPN or private network for sensitive operations
- Audit & Compliance
  - Enable comprehensive audit logging
  - Review audit logs regularly
  - Maintain logs for the required retention period
  - Implement automated alerting for suspicious activities
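
As an illustration of monitoring failed authentication attempts and alerting on suspicious activity, here is a minimal sliding-window counter. It is a sketch, not the project's implementation: the class name `FailedLoginMonitor` and the thresholds (5 failures in 300 seconds) are assumptions chosen for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FailedLoginMonitor:
    """Flag a user once too many failed logins fall inside a time window.

    Hypothetical helper; thresholds are illustrative defaults.
    """

    def __init__(self, max_failures: int = 5, window_seconds: float = 300.0) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures: Dict[str, Deque[float]] = defaultdict(deque)

    def record_failure(self, username: str, timestamp: float) -> bool:
        """Record one failed attempt; return True if the user should be flagged."""
        attempts = self._failures[username]
        attempts.append(timestamp)
        # Drop attempts that are older than the window.
        while attempts and timestamp - attempts[0] > self.window_seconds:
            attempts.popleft()
        return len(attempts) >= self.max_failures
```

In a real deployment the timestamp would typically come from `time.time()`, and a `True` result would trigger an audit-log entry or an alert rather than being returned to the caller.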
- cryptography: Industry-standard encryption library
- python-jose: JWT implementation
- passlib: Password hashing with bcrypt
- scikit-learn: Machine learning algorithms
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- FastAPI: Modern, fast web framework
- uvicorn: ASGI server
- slowapi: Rate limiting
- python-json-logger: Structured logging
- SQLAlchemy: Database ORM with security features
- pydantic: Data validation using Python type annotations
- cerberus: Lightweight data validation
vigilant-octo-engine/
├── src/
│   ├── __init__.py
│   ├── api.py                      # REST API with security
│   ├── security.py                 # Encryption, access control
│   ├── audit_logging.py            # Audit trail management
│   ├── invoice_processing.py       # AI invoice automation
│   ├── expense_categorization.py   # ML expense categorization
│   └── anomaly_detection.py        # Fraud detection
├── tests/
│   ├── test_security.py
│   ├── test_invoice_processing.py
│   ├── test_expense_categorization.py
│   └── test_anomaly_detection.py
├── logs/                           # Audit logs
├── models/                         # AI models cache
├── secure_data/                    # Encrypted data storage
├── requirements.txt                # Dependencies
├── .env.example                    # Configuration template
└── README.md
An accompanying frontend lives in frontend/ for interactive use of the secured API.
cd frontend
npm install
npm run dev
Served at http://localhost:5173 with proxying of /api/* to the backend (configured in vite.config.ts).
- frontend/package.json: scripts (dev, build, test) and dependencies.
- frontend/src/types.ts: Typed interfaces mirroring backend responses.
- frontend/src/apiClient.ts: Thin fetch wrapper; always sends Content-Type: application/json and attaches JWT via Authorization header.
- frontend/src/AuthContext.tsx: In-memory auth state (token not persisted to localStorage for security).
- frontend/src/ProtectedRoute.tsx: Redirects unauthenticated users to /login.
- Pages: Invoice, Expense, Anomaly, Audit, Dashboard, Login under frontend/src/pages/.
- Components: Reusable UI in frontend/src/components/ (Navbar, LoadingSpinner, ErrorBoundary).
- Hooks: frontend/src/hooks/ (useAuth, useApi, usePolling) abstract auth & polling logic.
- Services: Thin domain wrappers in frontend/src/services/ (e.g. invoiceService.ts).
- Utils: Formatting helpers in frontend/src/utils/ (formatCurrency, parseDate).
- Configuration: ESLint (.eslintrc.cjs), Prettier (.prettierrc), EditorConfig (.editorconfig) and env files (.env.development, .env.production).

- Tokens are kept only in React state and never written to localStorage, which limits what an XSS payload can steal.
- CORS updated to allow http://localhost:5173 for development only (see src/api.py).
- Do not add arbitrary origins; review the allowed list before deployment.
Vitest + Testing Library for component and client tests:
npm run test
Example tests in frontend/src/__tests__/ validate API client request structure and protected routing.
The test environment uses jsdom (configured in vite.config.ts).
npm run build
Outputs production assets to frontend/dist/ (serve behind HTTPS; ensure secure headers).
npm run lint # ESLint (zero warnings policy for CI)
npm run type-check # TypeScript compile check without emit
npm run format # Prettier format all changed files
Frontend uses Vite-prefixed variables:
VITE_API_BASE_URL=http://localhost:8000/api # dev
VITE_API_BASE_URL=/api # production (reverse proxy)
VITE_APP_ENV=development|production
Never expose secrets: only non-sensitive config belongs in Vite-prefixed variables.
- Enable HTTPS & HSTS at reverse proxy layer.
- Add Content Security Policy (CSP) disallowing inline scripts; move any inline styles to CSS.
- Use Subresource Integrity (SRI) for third-party scripts (if any).
- Prefer ephemeral memory token storage (already implemented) and short JWT lifetimes with silent refresh.
- Implement backend rate limiting (already via slowapi) and enforce per-origin CORS.
If embedding in SharePoint, wrap the built assets in an SPFx web part or host them as a Teams tab:
- Acquire Azure AD token via MSAL and pass through to backend.
- Use Graph for user profile enrichment while keeping financial data strictly backend-bound.
- Store only minimal invoice metadata in SharePoint Lists; keep sensitive payloads encrypted server-side.
frontend/
├── .eslintrc.cjs
├── .prettierrc
├── .editorconfig
├── .env.development
├── .env.production
├── package.json
├── vite.config.ts
├── src/
│   ├── apiClient.ts
│   ├── AuthContext.tsx
│   ├── components/
│   │   ├── Navbar.tsx
│   │   ├── LoadingSpinner.tsx
│   │   └── ErrorBoundary.tsx
│   ├── hooks/
│   │   ├── useAuth.ts
│   │   ├── useApi.ts
│   │   └── usePolling.ts
│   ├── services/
│   │   ├── invoiceService.ts
│   │   ├── expenseService.ts
│   │   ├── anomalyService.ts
│   │   ├── reconciliationService.ts
│   │   └── auditService.ts
│   ├── utils/
│   │   ├── formatCurrency.ts
│   │   └── parseDate.ts
│   ├── pages/
│   │   ├── Invoice.tsx
│   │   ├── Expense.tsx
│   │   ├── Anomaly.tsx
│   │   ├── Audit.tsx
│   │   ├── Dashboard.tsx
│   │   └── Login.tsx
│   ├── __tests__/
│   │   ├── apiClient.test.ts
│   │   ├── ProtectedRoute.test.tsx
│   │   └── services.test.ts
│   ├── types.ts
│   ├── ProtectedRoute.tsx
│   ├── setupTests.ts
│   ├── App.tsx
│   └── main.tsx
└── index.html
- Add new API endpoint: implement backend route, then create typed wrapper in apiClient.ts and interface in types.ts.
- Keep mappings 1:1 with backend response fields; prefer explicit interfaces over any.
Run all tests:
pytest tests/ -v
Run specific test file:
pytest tests/test_security.py -v
Run with coverage:
pytest tests/ --cov=src --cov-report=html

- Extract data from PDF/image invoices
- Validate invoice information
- Categorize expenses automatically
- Suggest GL accounts for posting
- Categorize expenses using AI
- Identify tax-deductible expenses
- Detect policy violations
- Generate spending reports
- Detect duplicate transactions
- Identify unusual patterns
- Benford's Law analysis for fraud detection
- Comprehensive audit trail
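
This README does not show how `AnomalyDetector` carries out its Benford's Law analysis, so the following is a standalone sketch of the general technique using only the standard library. Benford's Law predicts that the leading digit d of naturally occurring amounts appears with probability log10(1 + 1/d). A chi-square statistic measures how far a set of transactions departs from that. The function names are illustrative and not part of the `src` package.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def leading_digit(amount: float) -> int:
    """Return the first non-zero digit of a non-zero, finite amount."""
    # Scientific notation always starts with the leading digit: 'd.ddd...e+xx'.
    return int(f"{abs(amount):.15e}"[0])


def benford_expected() -> Dict[int, float]:
    """Expected leading-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def benford_chi_square(amounts: Iterable[float]) -> float:
    """Chi-square statistic of observed leading digits against Benford's Law."""
    values = [a for a in amounts if a != 0]
    if not values:
        raise ValueError("need at least one non-zero amount")
    counts = Counter(leading_digit(a) for a in values)
    n = len(values)
    chi2 = 0.0
    for digit, probability in benford_expected().items():
        expected = n * probability
        chi2 += (counts.get(digit, 0) - expected) ** 2 / expected
    return chi2
```

With nine digit categories the statistic has 8 degrees of freedom, so a value above roughly 15.51 indicates a departure from Benford's distribution at the 5% significance level. Small samples and amounts constrained to a narrow range (for example, fixed subscription fees) can exceed that threshold without any fraud, so treat a high score as a prompt for review, not proof.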
- Match bank transactions with books
- Identify discrepancies
- Automated reconciliation suggestions
- Exception reporting
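
To make the fuzzy-matching idea behind bank and book reconciliation concrete, here is a small sketch built on the standard library's `difflib.SequenceMatcher`. It is not the project's reconciliation code: the function names, the `description` and `amount` field names, and the thresholds are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List, Tuple

Transaction = Dict[str, Any]  # assumed shape: {"description": str, "amount": float}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reconcile(
    bank: List[Transaction],
    book: List[Transaction],
    min_similarity: float = 0.6,
    amount_tolerance: float = 0.01,
) -> Tuple[List[Tuple[Transaction, Transaction]], List[Transaction], List[Transaction]]:
    """Pair each bank transaction with its best-matching book entry.

    Returns (matches, unmatched_bank, unmatched_book).
    """
    matches = []
    unmatched_bank = []
    remaining = list(book)  # book entries not yet matched
    for bank_txn in bank:
        best, best_score = None, min_similarity
        for book_txn in remaining:
            # Amounts must agree within the tolerance before text is compared.
            if abs(bank_txn["amount"] - book_txn["amount"]) > amount_tolerance:
                continue
            score = similarity(bank_txn["description"], book_txn["description"])
            if score >= best_score:
                best, best_score = book_txn, score
        if best is None:
            unmatched_bank.append(bank_txn)
        else:
            matches.append((bank_txn, best))
            remaining.remove(best)
    return matches, unmatched_bank, remaining
```

The two unmatched lists correspond to the discrepancies and exception reporting mentioned above: anything left in either list needs manual review. This greedy pass matches each bank transaction in order, which is simple but can mis-pair near-duplicates; an optimal assignment would need a different algorithm.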
We welcome contributions! Please follow these guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
This software is provided as-is for CPA firms to automate financial processes. Users are responsible for:
- Ensuring compliance with applicable regulations
- Implementing appropriate security measures
- Regular security audits and updates
- Data backup and disaster recovery
- Consulting with legal and compliance teams
For questions, issues, or feature requests:
- Open an issue on GitHub
- Contact: support@cpafirm.com
- Documentation: GitHub Wiki
- Integration with QuickBooks/Xero
- OCR for invoice scanning
- Advanced ML models (deep learning)
- Mobile app for expense submission
- Real-time anomaly alerts
- Multi-currency support
- Blockchain audit trail
- Automated tax form generation
Built with ❤️ for CPA firms seeking to automate and secure their financial operations.