# Configuration Guide

## Overview

TAXIA provides flexible configuration options through environment variables, configuration files, or Python code.

## Environment Variables

### LLM Configuration

```bash
# Anthropic Claude (recommended)
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_MODEL="claude-3-5-sonnet-20241022"

# OpenAI GPT
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4-turbo-preview"
```
### Vector Search Configuration

```bash
# Qdrant settings
export QDRANT_HOST="localhost"
export QDRANT_PORT=6333
export QDRANT_COLLECTION="taxia_documents"
```
### Graph-RAG Configuration

```bash
# Neo4j settings
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"
```
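Before starting the engine, it can help to fail fast when a required variable is missing. A minimal standard-library check, sketched here with the variable names used in this guide (the helper itself is illustrative, not part of the TAXIA API):

```python
import os

REQUIRED_VARS = ["ANTHROPIC_API_KEY"]

def check_env(required=REQUIRED_VARS):
    """Return the names of required environment variables that are not set."""
    return [name for name in required if not os.environ.get(name)]

missing = check_env()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Run this at startup and abort (or fall back to demo mode) when the list is non-empty.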
## Python Configuration

### Basic Configuration

```python
from taxia import TaxiaEngine, TaxiaConfig

config = TaxiaConfig(
    llm_provider="anthropic",
    llm_model="claude-3-5-sonnet-20241022",
    qdrant_host="localhost",
    qdrant_port=6333,
    enable_graph_rag=True,
)

engine = TaxiaEngine(config=config)
```
### Advanced Configuration

```python
config = TaxiaConfig(
    # LLM settings
    llm_provider="anthropic",
    llm_model="claude-3-5-sonnet-20241022",
    llm_temperature=0.1,
    llm_max_tokens=4000,

    # Vector search
    qdrant_host="localhost",
    qdrant_port=6333,
    qdrant_collection="taxia_docs",
    top_k=5,

    # Graph-RAG
    enable_graph_rag=True,
    neo4j_uri="bolt://localhost:7687",
    neo4j_user="neo4j",
    neo4j_password="password",

    # Logging
    log_level="INFO",
    enable_audit_trail=True,
)
```
## Configuration File

Create a `taxia.yaml` configuration file:

```yaml
llm:
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  temperature: 0.1
  max_tokens: 4000

vector_search:
  host: localhost
  port: 6333
  collection: taxia_documents
  top_k: 5

graph_rag:
  enabled: true
  uri: bolt://localhost:7687
  user: neo4j
  password: password

logging:
  level: INFO
  audit_trail: true
```

Load the configuration:

```python
from taxia import TaxiaEngine

engine = TaxiaEngine.from_config_file("taxia.yaml")
```
## Configuration Options Reference

### LLM Options

| Option | Type | Default | Description |
|---|---|---|---|
| `llm_provider` | str | `"anthropic"` | LLM provider: `"anthropic"` or `"openai"` |
| `llm_model` | str | `"claude-3-5-sonnet-20241022"` | Model name |
| `llm_temperature` | float | `0.1` | Sampling temperature for generation |
| `llm_max_tokens` | int | `4000` | Maximum tokens in the response |
### Vector Search Options

| Option | Type | Default | Description |
|---|---|---|---|
| `qdrant_host` | str | `"localhost"` | Qdrant server host |
| `qdrant_port` | int | `6333` | Qdrant server port |
| `qdrant_collection` | str | `"taxia_documents"` | Collection name |
| `top_k` | int | `5` | Number of documents to retrieve |
### Graph-RAG Options

| Option | Type | Default | Description |
|---|---|---|---|
| `enable_graph_rag` | bool | `False` | Enable Neo4j graph search |
| `neo4j_uri` | str | `"bolt://localhost:7687"` | Neo4j URI |
| `neo4j_user` | str | `"neo4j"` | Neo4j username |
| `neo4j_password` | str | `None` | Neo4j password |
### Logging Options

| Option | Type | Default | Description |
|---|---|---|---|
| `log_level` | str | `"INFO"` | Logging level |
| `enable_audit_trail` | bool | `True` | Enable audit trail logging |
## 💰 Cost & Performance Information

### Operational Costs

Understanding the cost structure helps you budget and optimize your TAXIA deployment.

#### LLM API Costs

**Anthropic Claude (Recommended)**
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical Query Cost |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.02 - $0.05 |
| Claude 3 Haiku | $0.25 | $1.25 | $0.002 - $0.01 |
| Claude 3 Opus | $15.00 | $75.00 | $0.10 - $0.30 |
**OpenAI GPT**
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical Query Cost |
|---|---|---|---|
| GPT-4 Turbo | $10.00 | $30.00 | $0.05 - $0.15 |
| GPT-4 | $30.00 | $60.00 | $0.15 - $0.45 |
| GPT-3.5 Turbo | $0.50 | $1.50 | $0.005 - $0.02 |
**Cost breakdown for a typical query:**

```
Average query = 2,000 input tokens + 500 output tokens

Claude 3.5 Sonnet:
- Input:  (2,000 / 1,000,000) × $3.00  = $0.006
- Output: (500 / 1,000,000)   × $15.00 = $0.0075
- Total:  ~$0.014 per query

Daily estimates:
- 1,000 queries/day: $14/day  ≈ $420/month
- 100 queries/day:   $1.40/day ≈ $42/month
```
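The arithmetic above can be wrapped in a small helper so you can plug in your own token counts and rates. The default rates below are the Claude 3.5 Sonnet prices from the table; the function itself is an illustration, not part of the TAXIA API:

```python
def query_cost(input_tokens, output_tokens,
               input_rate=3.00, output_rate=15.00):
    """Cost of one query in USD, with rates given per 1M tokens."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

# The typical query from the breakdown above:
cost = query_cost(2_000, 500)  # ≈ $0.0135, i.e. ~$0.014 per query
```

Swap in the Haiku or GPT rates from the tables to compare models on your own traffic profile.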
#### Infrastructure Costs

**Self-Hosted (Recommended for Production)**
| Component | Hardware | Monthly Cost |
|---|---|---|
| Qdrant | 4GB RAM, 50GB SSD | $0 (self-hosted) or $20-50 (cloud) |
| Neo4j (optional) | 4GB RAM, 50GB SSD | $0 (self-hosted) or $30-70 (cloud) |
| Application Server | 2GB RAM, 20GB SSD | $5-15 (cloud VPS) |
| Total Infrastructure | - | $0-135/month |
**Cloud-Hosted Options**
- AWS: ~$50-150/month (t3.medium + storage)
- Google Cloud: ~$45-130/month (e2-medium + storage)
- Azure: ~$55-140/month (B2s + storage)
- DigitalOcean: ~$30-80/month (droplets)
#### Total Cost Examples

**Scenario 1: Small Business (100 queries/day)**
- LLM API (Claude 3.5 Sonnet): $42/month
- Infrastructure (self-hosted): $0/month
- Total: ~$42/month ($0.014 per query)

**Scenario 2: Medium Business (1,000 queries/day)**
- LLM API (Claude 3.5 Sonnet): $420/month
- Infrastructure (cloud): $50/month
- Total: ~$470/month ($0.016 per query)

**Scenario 3: Enterprise (10,000 queries/day)**
- LLM API (Claude 3.5 Sonnet): $4,200/month
- Infrastructure (dedicated): $200/month
- Total: ~$4,400/month ($0.015 per query)

**Scenario 4: Education/Demo (Demo Mode)**
- LLM API: $0/month (no API calls in demo mode)
- Infrastructure: $0/month (no Qdrant/Neo4j needed)
- Total: $0/month
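All of these scenarios follow one simple formula, sketched here with this guide's round numbers (a 30-day month and ~$0.014 per Sonnet query; neither is an API guarantee):

```python
def monthly_cost(queries_per_day, per_query=0.014, infra=0.0):
    """Estimated monthly spend: 30 days of LLM calls plus fixed infrastructure."""
    return queries_per_day * per_query * 30 + infra

small  = monthly_cost(100)                # ≈ $42   (self-hosted)
medium = monthly_cost(1_000, infra=50)    # ≈ $470  (cloud)
large  = monthly_cost(10_000, infra=200)  # ≈ $4,400
```

Adjust `per_query` with your measured token counts before budgeting.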
#### Cost Optimization Strategies

**1. Use Caching**

Save 60-80% on repeated questions:

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_answer(question: str):
    return engine.answer(question)

# Repeated questions are served from the cache
result = cached_answer("법인세율은?")  # "What is the corporate tax rate?" — API call
result = cached_answer("법인세율은?")  # cached, no API call
```

Estimated savings: $300/month → $90/month (70% reduction)
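Cache hit rates improve further if near-identical questions map to one key. A light normalization step before the cache lookup might look like this (the helper names are illustrative, not part of the TAXIA API; `answer_fn` stands in for `engine.answer`):

```python
from functools import lru_cache

def normalize(question: str) -> str:
    """Collapse whitespace and case so near-duplicate phrasings share a cache entry."""
    return " ".join(question.split()).lower()

def make_cached_answer(answer_fn, maxsize=1000):
    """Wrap any answer function (e.g. engine.answer) with a normalizing LRU cache."""
    @lru_cache(maxsize=maxsize)
    def _cached(normalized):
        return answer_fn(normalized)
    return lambda question: _cached(normalize(question))
```

With this wrapper, `"Tax rate?"` and `"  tax rate? "` resolve to the same cached answer instead of two API calls.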
**2. Use Cheaper Models for Simple Questions**

```python
from taxia import TaxiaEngine, TaxiaConfig

# Use Haiku for simple questions
simple_config = TaxiaConfig(llm_model="claude-3-haiku-20240307")
simple_engine = TaxiaEngine(config=simple_config)

# Use Sonnet for complex questions
complex_config = TaxiaConfig(llm_model="claude-3-5-sonnet-20241022")
complex_engine = TaxiaEngine(config=complex_config)

def smart_answer(question):
    # Route on a crude complexity heuristic: question length
    if len(question) < 50:  # simple question
        return simple_engine.answer(question)
    else:  # complex question
        return complex_engine.answer(question)
```

Estimated savings: $420/month → $180/month (57% reduction)
**3. Reduce max_tokens**

```python
# Default: 4000 tokens
config = TaxiaConfig(llm_max_tokens=2000)  # reduce by 50%
engine = TaxiaEngine(config=config)
```

Estimated savings: $420/month → $300/month (28% reduction)
**4. Use Demo Mode for Testing**

```python
# Free for development/testing
test_engine = TaxiaEngine(demo_mode=True)

# Switch to production only when ready
prod_engine = TaxiaEngine(demo_mode=False)
```

Estimated savings during development: unlimited free testing
### Performance Metrics

Understanding performance characteristics helps you optimize and set expectations.

#### Response Time Breakdown

**Typical query (Claude 3.5 Sonnet + Qdrant):**

```
├── Vector Search (Qdrant):      50-200ms
├── LLM Processing (Anthropic):  1,500-3,000ms
├── Citation Extraction:         50-100ms
└── Response Formatting:         10-30ms
─────────────────────────────────────────
Total:                           1.6-3.3 seconds
```

**With Graph-RAG (Neo4j enabled):**

```
├── Vector Search (Qdrant):      50-200ms
├── Graph Query (Neo4j):         100-300ms
├── LLM Processing (Anthropic):  1,500-3,000ms
├── Citation Extraction:         50-100ms
└── Response Formatting:         10-30ms
─────────────────────────────────────────
Total:                           1.7-3.6 seconds
```
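To see where your own deployment falls in these ranges, time calls end to end. A minimal decorator using only the standard library (attach it to whatever answer function you use; nothing here is TAXIA-specific):

```python
import time

def timed(fn):
    """Report the wall-clock latency of each call in milliseconds."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{fn.__name__}: {elapsed_ms:.0f} ms")
        return result
    return wrapper

# Usage sketch: answer = timed(engine.answer); answer("...")
```

Logging these numbers per component (search, LLM, formatting) quickly shows which stage dominates your latency.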
#### Performance by Model
| Model | Avg Response Time | Throughput (queries/min) | Cost per Query |
|---|---|---|---|
| Claude 3.5 Sonnet | 2.5s | 24 | $0.014 |
| Claude 3 Haiku | 1.2s | 50 | $0.003 |
| GPT-4 Turbo | 3.0s | 20 | $0.08 |
| GPT-3.5 Turbo | 1.5s | 40 | $0.008 |
#### Performance Optimization

**1. Optimize Qdrant:**

```python
# Tune the HNSW index for your accuracy/latency trade-off
config = TaxiaConfig(
    qdrant_hnsw_ef=128,  # higher = more accurate but slower search
    qdrant_hnsw_m=16,    # higher = more memory, better recall
)
```

Expected improvement: 200ms → 100ms vector search
**2. Reduce top_k:**

```python
# Retrieve fewer documents
config = TaxiaConfig(top_k=3)  # instead of the default 5
```

Expected improvement: 2.5s → 2.2s total time
**3. Use SSD for Qdrant:**

Store Qdrant data on an SSD instead of an HDD.

Expected improvement: 200ms → 50ms vector search
**4. Parallel Processing:**

```python
import asyncio

async def batch_answer(questions):
    tasks = [engine.answer_async(q) for q in questions]
    return await asyncio.gather(*tasks)

# Process 10 questions in parallel
questions = ["Q1", "Q2", ..., "Q10"]
results = await batch_answer(questions)
```

Expected improvement: 10 × 2.5s = 25s → ~5s (5x faster)
#### Scalability Benchmarks

**Single instance performance:**
| Concurrent Users | Requests/sec | Avg Response Time | Success Rate |
|---|---|---|---|
| 10 | 4 req/s | 2.5s | 100% |
| 50 | 18 req/s | 2.8s | 100% |
| 100 | 32 req/s | 3.1s | 99.5% |
| 500 | 45 req/s | 11.2s | 95% (queuing) |
**Recommendations:**
- Fewer than 100 users: a single instance is sufficient
- 100-500 users: 2-3 instances with load balancing
- 500+ users: auto-scaling group (3-10 instances)
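The benchmark table translates into a rough capacity rule: divide your target request rate by a single instance's sustainable throughput and round up. In the sketch below, the ~30 req/s default is read off the 100-user row above and is an assumption, not a guarantee:

```python
import math

def instances_needed(target_rps, per_instance_rps=30):
    """Round up to the number of instances required for a target request rate."""
    return max(1, math.ceil(target_rps / per_instance_rps))

instances_needed(4)    # 1 — small team, single instance
instances_needed(100)  # 4 — enterprise peak load
```

Load-test your own deployment to replace the default throughput figure before committing to an instance count.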
### Monitoring & Cost Tracking

#### Track API Usage

```python
import logging

from taxia import TaxiaEngine

logger = logging.getLogger("taxia.costs")

class CostTracker:
    def __init__(self):
        self.total_queries = 0
        self.total_tokens = 0
        self.total_cost = 0.0

    def log_query(self, result):
        self.total_queries += 1
        # Estimate tokens (actual counts may vary)
        input_tokens = result.metadata.get("input_tokens", 2000)
        output_tokens = result.metadata.get("output_tokens", 500)
        # Calculate cost (Claude 3.5 Sonnet rates per 1M tokens)
        cost = (input_tokens / 1_000_000 * 3.0) + \
               (output_tokens / 1_000_000 * 15.0)
        self.total_tokens += input_tokens + output_tokens
        self.total_cost += cost
        logger.info(f"Query #{self.total_queries}: {cost:.4f} USD")

    def report(self):
        queries = max(self.total_queries, 1)  # avoid division by zero
        return {
            "total_queries": self.total_queries,
            "total_tokens": self.total_tokens,
            "total_cost": f"${self.total_cost:.2f}",
            "avg_cost_per_query": f"${self.total_cost / queries:.4f}",
        }

# Usage
tracker = CostTracker()
engine = TaxiaEngine()

result = engine.answer("법인세율은?")  # "What is the corporate tax rate?"
tracker.log_query(result)

# Daily report, e.g.:
print(tracker.report())
# {'total_queries': 145, 'total_cost': '$2.03', 'avg_cost_per_query': '$0.0140', ...}
```
#### Set Budget Alerts

```python
import logging

logger = logging.getLogger("taxia.costs")

class BudgetExceededError(Exception):
    pass

class BudgetAlert:
    def __init__(self, daily_budget=50.0):
        self.daily_budget = daily_budget
        self.daily_cost = 0.0

    def check_budget(self, query_cost):
        self.daily_cost += query_cost
        if self.daily_cost >= self.daily_budget:
            raise BudgetExceededError(
                f"Daily budget ${self.daily_budget} exceeded! "
                f"Current: ${self.daily_cost:.2f}"
            )
        if self.daily_cost >= self.daily_budget * 0.8:
            logger.warning(f"80% of daily budget used: ${self.daily_cost:.2f}")

# Usage
budget = BudgetAlert(daily_budget=50.0)
budget.check_budget(0.014)  # call once per query with its cost
```
### ROI Calculator

Calculate your return on investment.

**Example: Customer Support Use Case**

Before TAXIA:
- Support team: 3 people × $3,000/month = $9,000/month
- Handles: 50 queries/day = 1,500/month

After TAXIA:
- Support team: 1 person × $3,000/month = $3,000/month
- TAXIA costs: $470/month
- Handles: 400 queries/day = 12,000/month

Savings: $9,000 − $3,000 − $470 = $5,530/month
ROI: ($5,530 / $470) × 100% ≈ 1,177%
Payback period: less than 1 month

**Your ROI:**
1. Calculate your current support costs.
2. Estimate the share of queries TAXIA can handle (typically 70-90%).
3. Calculate TAXIA costs (LLM + infrastructure).
4. Savings = Current Costs − Reduced Costs − TAXIA Costs
5. ROI = (Savings / TAXIA Costs) × 100%
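The steps above reduce to a couple of lines of arithmetic, shown here with the numbers from the customer-support example (the function name is illustrative):

```python
def roi_percent(current_costs, reduced_costs, taxia_costs):
    """Monthly savings expressed as a percentage of TAXIA spend."""
    savings = current_costs - reduced_costs - taxia_costs
    return savings / taxia_costs * 100

# Customer support example: $9,000 before, $3,000 after, $470 TAXIA
roi_percent(9_000, 3_000, 470)  # ≈ 1,177%
```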
### Cost Comparison: Build vs. Buy

**Option 1: Build Custom RAG (DIY)**
- Development: 3 engineers × 3 months = $45,000
- Maintenance: 1 engineer at 20% time = $12,000/year
- Infrastructure: $100/month = $1,200/year
- Total Year 1: $58,200

**Option 2: Use TAXIA**
- Development: 1 engineer × 1 week = $2,500
- Monthly costs: $470/month = $5,640/year
- Total Year 1: $8,140

Savings: $50,060 in the first year (86% cost reduction)
## 📊 Cost & Performance Summary

### Quick Reference
| Scenario | Monthly Cost | Response Time | Best For |
|---|---|---|---|
| Demo Mode | $0 | N/A | Testing, POC |
| Small (100 q/day) | $42 | 2.5s | Startups, small teams |
| Medium (1K q/day) | $470 | 2.5s | Growing businesses |
| Large (10K q/day) | $4,400 | 2.5s | Enterprises |
### Optimization Checklist
- [ ] Enable caching for common questions (70% cost reduction)
- [ ] Use Haiku for simple queries (60% cost reduction)
- [ ] Reduce max_tokens if possible (30% cost reduction)
- [ ] Use SSD for Qdrant (2x faster)
- [ ] Monitor and set budget alerts
- [ ] Track ROI monthly
Need help optimizing? See our Performance Guide