System Architecture
Overview
TAXIA is a modular Graph-RAG system designed for Korean tax law question-answering with mandatory legal citations.
┌─────────────────────────────────────────────────────────────┐
│                     TAXIA Architecture                      │
└─────────────────────────────────────────────────────────────┘

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   CLI Tool   │   │   REST API   │   │  Python SDK  │
│              │   │  (FastAPI)   │   │              │
└──────┬───────┘   └──────┬───────┘   └──────┬───────┘
       │                  │                  │
       └──────────────────┴──────────────────┘
                          │
                  ┌───────▼──────┐
                  │ TaxiaEngine  │
                  │    (Core)    │
                  └───────┬──────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
 ┌──────▼──────┐ ┌────────▼────────┐ ┌──────▼──────┐
 │   Qdrant    │ │      Neo4j      │ │  LLM (API)  │
 │   Vector    │ │    Graph-RAG    │ │   Claude/   │
 │   Search    │ │  Relationships  │ │    GPT-4    │
 └─────────────┘ └─────────────────┘ └─────────────┘
Core Components
1. TaxiaEngine
The main orchestration layer. It coordinates:
- Query processing
- Document retrieval (vector + graph)
- LLM generation
- Citation validation
- Audit trail management
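The orchestration flow can be sketched as a minimal skeleton. Note that `retriever`, `llm`, `min_citations`, and the returned field names are illustrative assumptions for this sketch, not the actual TaxiaEngine API:

```python
import uuid

class TaxiaEngine:
    """Illustrative orchestration skeleton -- not the real implementation."""

    def __init__(self, retriever, llm, min_citations=2):
        self.retriever = retriever          # vector + graph retrieval
        self.llm = llm                      # LLM client (Claude/GPT)
        self.min_citations = min_citations  # citation policy

    def answer(self, query: str) -> dict:
        trace_id = str(uuid.uuid4())                 # audit trail
        docs = self.retriever(query)                 # document retrieval
        response, citations = self.llm(query, docs)  # LLM generation
        if len(citations) < self.min_citations:      # citation validation
            raise ValueError(f"answer must cite >= {self.min_citations} legal sources")
        return {"answer": response, "citations": citations, "trace_id": trace_id}
```

The key design point is that the engine owns the policy (citations, tracing) while retrieval and generation are injected dependencies.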
2. Vector Search (Qdrant)
High-performance vector similarity search:
- Embedding generation (OpenAI ada-002)
- Fast nearest-neighbor search
- Collection management
- Metadata filtering
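Conceptually, nearest-neighbor retrieval ranks documents by cosine similarity between the query embedding and each document embedding; Qdrant does this at scale with approximate indexes. A toy stdlib-only sketch of the ranking itself:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=2):
    """docs: list of (doc_id, embedding). Returns the k nearest doc ids."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```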
3. Graph-RAG (Neo4j)
Graph database for legal document relationships:
- Law → Enforcement Decree relationships
- Enforcement Decree → Enforcement Rules
- Cross-references between articles
- Temporal version tracking
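The Law → Decree → Rules hierarchy amounts to a reachability query over a directed graph. A miniature stdlib sketch (the article identifiers and the Cypher comment use a hypothetical schema, not the project's actual one):

```python
# Hypothetical miniature of the Law -> Decree -> Rules hierarchy.
GRAPH = {
    "소득세법 제12조": ["소득세법 시행령 제8조"],           # Law -> Enforcement Decree
    "소득세법 시행령 제8조": ["소득세법 시행규칙 제5조"],   # Decree -> Enforcement Rules
}

def related_documents(article, graph=GRAPH):
    """Collect everything reachable from an article (breadth-first)."""
    seen, queue = [], [article]
    while queue:
        node = queue.pop(0)
        for child in graph.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

# In Neo4j the same traversal would be a Cypher query, e.g. (schema assumed):
# MATCH (l:Law {id: $id})-[:DELEGATES_TO*1..2]->(d) RETURN d
```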
4. LLM Integration
Support for multiple LLM providers:
- Anthropic Claude (Recommended)
  - claude-3-5-sonnet-20241022
  - Best for Korean legal text
- OpenAI GPT
  - gpt-4-turbo-preview
  - Alternative option
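A small sketch of how provider selection might be configured (the table structure and `resolve_model` helper are illustrative; only the model names come from the list above):

```python
# Hypothetical provider table; model names are from the list above.
PROVIDERS = {
    "anthropic": {"model": "claude-3-5-sonnet-20241022"},
    "openai": {"model": "gpt-4-turbo-preview"},
}

def resolve_model(provider: str) -> str:
    """Map a provider name to its configured model, failing loudly on typos."""
    try:
        return PROVIDERS[provider]["model"]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {provider!r}") from None
```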
Data Flow
Query Processing
1. User Query
└─> TaxiaEngine.answer()
2. Vector Search
└─> Qdrant retrieval (top_k documents)
3. Graph Enhancement (optional)
└─> Neo4j related documents
4. Context Assembly
└─> Combine retrieved documents
5. LLM Generation
└─> Claude/GPT with context
└─> Mandatory citation requirement
6. Citation Validation
└─> Verify ≥2 legal sources
7. Response
└─> Answer + Citations + Trace ID
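The final response shape (step 7) can be modeled as a small record type that also enforces the citation rule from step 6. Field names here are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class TaxiaResponse:
    """Illustrative shape of step 7's output -- field names are assumptions."""
    answer: str
    citations: list  # >= 2 legal sources (step 6)
    trace_id: str    # audit trail identifier

    def __post_init__(self):
        if len(self.citations) < 2:
            raise ValueError("citation validation failed: need >= 2 legal sources")
```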
Document Indexing
1. Load Tax Law JSON
└─> Parse law/decree/rules
2. Generate Embeddings
└─> OpenAI ada-002
3. Store in Qdrant
└─> Vector + Metadata
4. Build Graph (optional)
└─> Create Neo4j relationships
└─> Law → Decree → Rules
└─> Cross-references
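Steps 1–3 of the indexing pipeline can be sketched end to end. The JSON field names (`id`, `text`) and the record layout are assumptions, and a hash-based stand-in replaces the real ada-002 embedding call:

```python
import hashlib
import json

def fake_embed(text: str, dim: int = 4):
    """Stand-in for OpenAI ada-002: deterministic pseudo-embedding from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dim]]

def index_documents(raw_json: str):
    """Steps 1-3 above: parse articles, embed, emit vector + metadata records."""
    articles = json.loads(raw_json)
    return [
        {"id": a["id"], "vector": fake_embed(a["text"]), "payload": {"text": a["text"]}}
        for a in articles
    ]
```

Each emitted record mirrors what a vector store expects: an id, a vector, and a metadata payload for filtering.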
Module Structure
src/taxia/
├── __init__.py # Public API
├── engine.py # TaxiaEngine core
├── types.py # Type definitions
├── config.py # Configuration
│
├── retrieval/
│ ├── vector.py # Qdrant integration
│ ├── graph.py # Neo4j integration
│ └── embeddings.py # Embedding generation
│
├── llm/
│ ├── anthropic.py # Claude integration
│ ├── openai.py # OpenAI integration
│ └── prompts.py # Prompt templates
│
├── indexing/
│ ├── loader.py # Document loading
│ ├── parser.py # JSON parsing
│ └── indexer.py # Indexing orchestration
│
├── api/
│ ├── server.py # FastAPI server
│ └── routes.py # API endpoints
│
└── cli/
├── main.py # CLI entry point
└── commands.py # CLI commands
Design Principles
1. Citations Required
Every answer MUST include ≥2 legal citations, drawn from:
- Law articles
- Enforcement decrees
- Enforcement rules
- Interpretations
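One way such a rule might be enforced is a post-generation check. The citation pattern below is a rough hypothetical for Korean statute references (e.g. "소득세법 제12조"), not the project's actual validator:

```python
import re

# Hypothetical pattern for Korean statute citations, e.g. "소득세법 제12조".
CITATION_RE = re.compile(r"\S+법(?:\s시행령|\s시행규칙)?\s제\d+조")

def validate_citations(answer: str, minimum: int = 2) -> bool:
    """Enforce the >= 2 distinct citation rule on a generated answer."""
    return len(set(CITATION_RE.findall(answer))) >= minimum
```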
2. Audit Trail
Complete traceability:
- Unique trace ID per query
- Full context logging
- Citation provenance
- Timestamp tracking
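A minimal sketch of such an audit record, combining all four properties (field names are illustrative):

```python
import uuid
from datetime import datetime, timezone

def make_trace(query: str, citations: list) -> dict:
    """Sketch of an audit record; field names are illustrative."""
    return {
        "trace_id": str(uuid.uuid4()),                        # unique per query
        "timestamp": datetime.now(timezone.utc).isoformat(),  # timestamp tracking
        "query": query,                                       # full context logging
        "citations": citations,                               # citation provenance
    }
```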
3. Modular Architecture
Pluggable components:
- LLM provider swap (Claude ↔ GPT)
- Vector DB swap (Qdrant → alternatives)
- Graph DB optional (Neo4j)
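Provider swapping typically hinges on a shared interface. A sketch with a `typing.Protocol` (class and method names are assumptions; the stubs stand in for real API calls):

```python
from typing import Protocol

class LLMProvider(Protocol):
    """Minimal interface a swappable provider would satisfy (illustrative)."""
    def generate(self, prompt: str) -> str: ...

class ClaudeProvider:
    def generate(self, prompt: str) -> str:
        return f"[claude] {prompt}"   # real code would call the Anthropic API

class GPTProvider:
    def generate(self, prompt: str) -> str:
        return f"[gpt] {prompt}"      # real code would call the OpenAI API

def run(provider: LLMProvider, prompt: str) -> str:
    return provider.generate(prompt)  # engine depends only on the interface
```

Because the engine depends only on the protocol, swapping Claude for GPT (or Qdrant for another vector store, by the same pattern) is a construction-time choice.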
4. Performance
Optimized for production:
- Async I/O for API calls
- Batch indexing support
- Connection pooling
- Caching strategies
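The async I/O point is worth a sketch: independent backend calls can be issued concurrently rather than serially. Here `asyncio.sleep` stands in for network round trips:

```python
import asyncio

async def call_api(name: str, delay: float) -> str:
    await asyncio.sleep(delay)        # stand-in for a network round trip
    return name

async def fan_out():
    """Issue the vector, graph, and LLM calls concurrently instead of serially."""
    return await asyncio.gather(
        call_api("qdrant", 0.01),
        call_api("neo4j", 0.01),
        call_api("llm", 0.01),
    )

results = asyncio.run(fan_out())
```

With `gather`, total latency approaches the slowest call instead of the sum of all three.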
Deployment Options
Development
# Local with demo data
pip install taxia-core
python -c "from taxia import TaxiaEngine; engine = TaxiaEngine()"
Production
# With full infrastructure
docker-compose up -d # Qdrant + Neo4j
taxia index ./koreantaxlaw
taxia server --host 0.0.0.0 --port 8000
Cloud Deployment
Compatible with:
- AWS (ECS, Lambda)
- GCP (Cloud Run, GKE)
- Azure (Container Instances, AKS)
Configuration
See Configuration Guide for details.
API Reference
Performance Considerations
Vector Search
- Collection size: up to 1M documents
- Query latency: < 50ms (p95)
- Embedding batch size: 100 docs
Graph-RAG
- Node count: ~10K (tax laws)
- Relationship count: ~50K
- Cypher query latency: < 100ms
LLM Generation
- Claude Sonnet: ~2-3s per query
- GPT-4 Turbo: ~3-4s per query
- Token usage: ~2-3K tokens/query
Security
- API key management via environment variables
- Rate limiting on API endpoints
- Input validation and sanitization
- Audit logging for compliance
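For key management, a fail-fast loader keeps secrets out of code. The variable name `ANTHROPIC_API_KEY` follows the usual provider convention but is an assumption here:

```python
import os

def load_api_key(var: str = "ANTHROPIC_API_KEY") -> str:
    """Read a provider key from the environment; fail fast if missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting TAXIA")
    return key
```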