# Configuration Guide

## Overview

TAXIA provides flexible configuration options through environment variables, configuration files, or Python code.

## Environment Variables

### LLM Configuration

```bash
# Anthropic Claude (recommended)
export ANTHROPIC_API_KEY="sk-ant-..."
export ANTHROPIC_MODEL="claude-3-5-sonnet-20241022"

# OpenAI GPT
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4-turbo-preview"
```
### Vector Search Configuration

```bash
# Qdrant settings
export QDRANT_HOST="localhost"
export QDRANT_PORT=6333
export QDRANT_COLLECTION="taxia_documents"
```
### Graph-RAG Configuration

```bash
# Neo4j settings
export NEO4J_URI="bolt://localhost:7687"
export NEO4J_USER="neo4j"
export NEO4J_PASSWORD="your-password"
```
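Before starting the engine, it can help to fail fast when a required variable is missing. A minimal standard-library check, sketched here with the variable names used in this guide (the helper itself is illustrative, not part of the TAXIA API):

```python
import os

REQUIRED_VARS = ["ANTHROPIC_API_KEY"]

def check_env(required=REQUIRED_VARS):
    """Return the names of required environment variables that are not set."""
    return [name for name in required if not os.environ.get(name)]

missing = check_env()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Run this at startup and abort (or fall back to demo mode) when the list is non-empty.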
## Python Configuration

### Basic Configuration

```python
from taxia import TaxiaEngine, TaxiaConfig

config = TaxiaConfig(
    llm_provider="anthropic",
    llm_model="claude-3-5-sonnet-20241022",
    qdrant_host="localhost",
    qdrant_port=6333,
    enable_graph_rag=True,
)

engine = TaxiaEngine(config=config)
```
### Advanced Configuration

```python
config = TaxiaConfig(
    # LLM settings
    llm_provider="anthropic",
    llm_model="claude-3-5-sonnet-20241022",
    llm_temperature=0.1,
    llm_max_tokens=4000,

    # Vector search
    qdrant_host="localhost",
    qdrant_port=6333,
    qdrant_collection="taxia_docs",
    top_k=5,

    # Graph-RAG
    enable_graph_rag=True,
    neo4j_uri="bolt://localhost:7687",
    neo4j_user="neo4j",
    neo4j_password="password",

    # Logging
    log_level="INFO",
    enable_audit_trail=True,
)
```
## Configuration File

Create a `taxia.yaml` configuration file:

```yaml
llm:
  provider: anthropic
  model: claude-3-5-sonnet-20241022
  temperature: 0.1
  max_tokens: 4000

vector_search:
  host: localhost
  port: 6333
  collection: taxia_documents
  top_k: 5

graph_rag:
  enabled: true
  uri: bolt://localhost:7687
  user: neo4j
  password: password

logging:
  level: INFO
  audit_trail: true
```

Load the configuration:

```python
from taxia import TaxiaEngine

engine = TaxiaEngine.from_config_file("taxia.yaml")
```
## Configuration Options Reference

### LLM Options

| Option | Type | Default | Description |
|---|---|---|---|
| `llm_provider` | str | `"anthropic"` | LLM provider: `"anthropic"` or `"openai"` |
| `llm_model` | str | `"claude-3-5-sonnet-20241022"` | Model name |
| `llm_temperature` | float | `0.1` | Sampling temperature for generation |
| `llm_max_tokens` | int | `4000` | Maximum tokens in the response |
### Vector Search Options

| Option | Type | Default | Description |
|---|---|---|---|
| `qdrant_host` | str | `"localhost"` | Qdrant server host |
| `qdrant_port` | int | `6333` | Qdrant server port |
| `qdrant_collection` | str | `"taxia_documents"` | Collection name |
| `top_k` | int | `5` | Number of documents to retrieve |
### Graph-RAG Options

| Option | Type | Default | Description |
|---|---|---|---|
| `enable_graph_rag` | bool | `False` | Enable Neo4j graph search |
| `neo4j_uri` | str | `"bolt://localhost:7687"` | Neo4j URI |
| `neo4j_user` | str | `"neo4j"` | Neo4j username |
| `neo4j_password` | str | `None` | Neo4j password |
### Logging Options

| Option | Type | Default | Description |
|---|---|---|---|
| `log_level` | str | `"INFO"` | Logging level |
| `enable_audit_trail` | bool | `True` | Enable audit trail logging |
## 💰 Cost & Performance Information

### Operational Costs

Understanding the cost structure helps you budget and optimize your TAXIA deployment.

#### LLM API Costs

**Anthropic Claude (Recommended)**
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical Query Cost |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.02 - $0.05 |
| Claude 3 Haiku | $0.25 | $1.25 | $0.002 - $0.01 |
| Claude 3 Opus | $15.00 | $75.00 | $0.10 - $0.30 |
**OpenAI GPT**
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Typical Query Cost |
|---|---|---|---|
| GPT-4 Turbo | $10.00 | $30.00 | $0.05 - $0.15 |
| GPT-4 | $30.00 | $60.00 | $0.15 - $0.45 |
| GPT-3.5 Turbo | $0.50 | $1.50 | $0.005 - $0.02 |
**Cost breakdown for a typical query:**

```
Average query = 2,000 input tokens + 500 output tokens

Claude 3.5 Sonnet:
- Input:  (2,000 / 1,000,000) × $3.00  = $0.006
- Output: (500 / 1,000,000)   × $15.00 = $0.0075
- Total:  ~$0.014 per query

Daily estimates:
- 1,000 queries/day: $14/day  ≈ $420/month
- 100 queries/day:   $1.40/day ≈ $42/month
```
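The arithmetic above can be wrapped in a small helper so you can plug in your own token counts and rates. The default rates below are the Claude 3.5 Sonnet prices from the table; the function itself is an illustration, not part of the TAXIA API:

```python
def query_cost(input_tokens, output_tokens,
               input_rate=3.00, output_rate=15.00):
    """Cost of one query in USD, with rates given per 1M tokens."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)

# The typical query from the breakdown above:
cost = query_cost(2_000, 500)  # ≈ $0.0135, i.e. ~$0.014 per query
```

Swap in the Haiku or GPT rates from the tables to compare models on your own traffic profile.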
#### Infrastructure Costs

**Self-Hosted (Recommended for Production)**
| Component | Hardware | Monthly Cost |
|---|---|---|
| Qdrant | 4GB RAM, 50GB SSD | $0 (self-hosted) or $20-50 (cloud) |
| Neo4j (optional) | 4GB RAM, 50GB SSD | $0 (self-hosted) or $30-70 (cloud) |
| Application Server | 2GB RAM, 20GB SSD | $5-15 (cloud VPS) |
| Total Infrastructure | - | $0-135/month |
**Cloud-Hosted Options**
- AWS: ~$50-150/month (t3.medium + storage)
- Google Cloud: ~$45-130/month (e2-medium + storage)
- Azure: ~$55-140/month (B2s + storage)
- DigitalOcean: ~$30-80/month (droplets)
#### Total Cost Examples

**Scenario 1: Small Business (100 queries/day)**
- LLM API (Claude 3.5 Sonnet): $42/month
- Infrastructure (self-hosted): $0/month
- Total: ~$42/month ($0.014 per query)

**Scenario 2: Medium Business (1,000 queries/day)**
- LLM API (Claude 3.5 Sonnet): $420/month
- Infrastructure (cloud): $50/month
- Total: ~$470/month ($0.016 per query)

**Scenario 3: Enterprise (10,000 queries/day)**
- LLM API (Claude 3.5 Sonnet): $4,200/month
- Infrastructure (dedicated): $200/month
- Total: ~$4,400/month ($0.015 per query)

**Scenario 4: Education/Demo (Demo Mode)**
- LLM API: $0/month (no API calls in demo mode)
- Infrastructure: $0/month (no Qdrant/Neo4j needed)
- Total: $0/month
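All of these scenarios follow one simple formula, sketched here with this guide's round numbers (a 30-day month and ~$0.014 per Sonnet query; neither is an API guarantee):

```python
def monthly_cost(queries_per_day, per_query=0.014, infra=0.0):
    """Estimated monthly spend: 30 days of LLM calls plus fixed infrastructure."""
    return queries_per_day * per_query * 30 + infra

small  = monthly_cost(100)                # ≈ $42   (self-hosted)
medium = monthly_cost(1_000, infra=50)    # ≈ $470  (cloud)
large  = monthly_cost(10_000, infra=200)  # ≈ $4,400
```

Adjust `per_query` with your measured token counts before budgeting.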
#### Cost Optimization Strategies

**1. Use Caching**

Save 60-80% on repeated questions:

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_answer(question: str):
    return engine.answer(question)

# Repeated questions are served from the cache
result = cached_answer("법인세율은?")  # "What is the corporate tax rate?" — API call
result = cached_answer("법인세율은?")  # cached, no API call
```

Estimated savings: $300/month → $90/month (70% reduction)
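Cache hit rates improve further if near-identical questions map to one key. A light normalization step before the cache lookup might look like this (the helper names are illustrative, not part of the TAXIA API; `answer_fn` stands in for `engine.answer`):

```python
from functools import lru_cache

def normalize(question: str) -> str:
    """Collapse whitespace and case so near-duplicate phrasings share a cache entry."""
    return " ".join(question.split()).lower()

def make_cached_answer(answer_fn, maxsize=1000):
    """Wrap any answer function (e.g. engine.answer) with a normalizing LRU cache."""
    @lru_cache(maxsize=maxsize)
    def _cached(normalized):
        return answer_fn(normalized)
    return lambda question: _cached(normalize(question))
```

With this wrapper, `"Tax rate?"` and `"  tax rate? "` resolve to the same cached answer instead of two API calls.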
**2. Use Cheaper Models for Simple Questions**

```python
from taxia import TaxiaEngine, TaxiaConfig

# Use Haiku for simple questions
simple_config = TaxiaConfig(llm_model="claude-3-haiku-20240307")
simple_engine = TaxiaEngine(config=simple_config)

# Use Sonnet for complex questions
complex_config = TaxiaConfig(llm_model="claude-3-5-sonnet-20241022")
complex_engine = TaxiaEngine(config=complex_config)

def smart_answer(question):
    # Route on a crude complexity heuristic: question length
    if len(question) < 50:  # simple question
        return simple_engine.answer(question)
    else:  # complex question
        return complex_engine.answer(question)
```

Estimated savings: $420/month → $180/month (57% reduction)
**3. Reduce max_tokens**

```python
# Default: 4000 tokens
config = TaxiaConfig(llm_max_tokens=2000)  # reduce by 50%
engine = TaxiaEngine(config=config)
```

Estimated savings: $420/month → $300/month (28% reduction)
**4. Use Demo Mode for Testing**

```python
# Free for development/testing
test_engine = TaxiaEngine(demo_mode=True)

# Switch to production only when ready
prod_engine = TaxiaEngine(demo_mode=False)
```

Estimated savings during development: unlimited free testing
### Performance Metrics

Understanding performance characteristics helps you optimize and set expectations.

#### Response Time Breakdown

**Typical query (Claude 3.5 Sonnet + Qdrant):**

```
├── Vector Search (Qdrant):      50-200ms
├── LLM Processing (Anthropic):  1,500-3,000ms
├── Citation Extraction:         50-100ms
└── Response Formatting:         10-30ms
─────────────────────────────────────────
Total:                           1.6-3.3 seconds
```

**With Graph-RAG (Neo4j enabled):**

```
├── Vector Search (Qdrant):      50-200ms
├── Graph Query (Neo4j):         100-300ms
├── LLM Processing (Anthropic):  1,500-3,000ms
├── Citation Extraction:         50-100ms
└── Response Formatting:         10-30ms
─────────────────────────────────────────
Total:                           1.7-3.6 seconds
```
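To see where your own deployment falls in these ranges, time calls end to end. A minimal decorator using only the standard library (attach it to whatever answer function you use; nothing here is TAXIA-specific):

```python
import time

def timed(fn):
    """Report the wall-clock latency of each call in milliseconds."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{fn.__name__}: {elapsed_ms:.0f} ms")
        return result
    return wrapper

# Usage sketch: answer = timed(engine.answer); answer("...")
```

Logging these numbers per component (search, LLM, formatting) quickly shows which stage dominates your latency.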
#### Performance by Model
| Model | Avg Response Time | Throughput (queries/min) | Cost per Query |
|---|---|---|---|
| Claude 3.5 Sonnet | 2.5s | 24 | $0.014 |
| Claude 3 Haiku | 1.2s | 50 | $0.003 |
| GPT-4 Turbo | 3.0s | 20 | $0.08 |
| GPT-3.5 Turbo | 1.5s | 40 | $0.008 |
#### Performance Optimization

**1. Optimize Qdrant:**

```python
# Tune the HNSW index for your accuracy/latency trade-off
config = TaxiaConfig(
    qdrant_hnsw_ef=128,  # higher = more accurate but slower search
    qdrant_hnsw_m=16,    # higher = more memory, better recall
)
```

Expected improvement: 200ms → 100ms vector search
**2. Reduce top_k:**

```python
# Retrieve fewer documents
config = TaxiaConfig(top_k=3)  # instead of the default 5
```

Expected improvement: 2.5s → 2.2s total time
**3. Use SSD for Qdrant:**

Store Qdrant data on an SSD instead of an HDD.

Expected improvement: 200ms → 50ms vector search
**4. Parallel Processing:**

```python
import asyncio

async def batch_answer(questions):
    tasks = [engine.answer_async(q) for q in questions]
    return await asyncio.gather(*tasks)

# Process 10 questions in parallel
questions = ["Q1", "Q2", ..., "Q10"]
results = await batch_answer(questions)
```

Expected improvement: 10 × 2.5s = 25s → ~5s (5x faster)
#### Scalability Benchmarks

**Single instance performance:**
| Concurrent Users | Requests/sec | Avg Response Time | Success Rate |
|---|---|---|---|
| 10 | 4 req/s | 2.5s | 100% |
| 50 | 18 req/s | 2.8s | 100% |
| 100 | 32 req/s | 3.1s | 99.5% |
| 500 | 45 req/s | 11.2s | 95% (queuing) |
**Recommendations:**
- Fewer than 100 users: a single instance is sufficient
- 100-500 users: 2-3 instances with load balancing
- 500+ users: auto-scaling group (3-10 instances)
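The benchmark table translates into a rough capacity rule: divide your target request rate by a single instance's sustainable throughput and round up. In the sketch below, the ~30 req/s default is read off the 100-user row above and is an assumption, not a guarantee:

```python
import math

def instances_needed(target_rps, per_instance_rps=30):
    """Round up to the number of instances required for a target request rate."""
    return max(1, math.ceil(target_rps / per_instance_rps))

instances_needed(4)    # 1 — small team, single instance
instances_needed(100)  # 4 — enterprise peak load
```

Load-test your own deployment to replace the default throughput figure before committing to an instance count.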
### Monitoring & Cost Tracking

#### Track API Usage

```python
import logging

from taxia import TaxiaEngine

logger = logging.getLogger("taxia.costs")

class CostTracker:
    def __init__(self):
        self.total_queries = 0
        self.total_tokens = 0
        self.total_cost = 0.0

    def log_query(self, result):
        self.total_queries += 1
        # Estimate tokens (actual counts may vary)
        input_tokens = result.metadata.get("input_tokens", 2000)
        output_tokens = result.metadata.get("output_tokens", 500)
        # Calculate cost (Claude 3.5 Sonnet rates per 1M tokens)
        cost = (input_tokens / 1_000_000 * 3.0) + \
               (output_tokens / 1_000_000 * 15.0)
        self.total_tokens += input_tokens + output_tokens
        self.total_cost += cost
        logger.info(f"Query #{self.total_queries}: {cost:.4f} USD")

    def report(self):
        queries = max(self.total_queries, 1)  # avoid division by zero
        return {
            "total_queries": self.total_queries,
            "total_tokens": self.total_tokens,
            "total_cost": f"${self.total_cost:.2f}",
            "avg_cost_per_query": f"${self.total_cost / queries:.4f}",
        }

# Usage
tracker = CostTracker()
engine = TaxiaEngine()

result = engine.answer("법인세율은?")  # "What is the corporate tax rate?"
tracker.log_query(result)

# Daily report, e.g.:
print(tracker.report())
# {'total_queries': 145, 'total_cost': '$2.03', 'avg_cost_per_query': '$0.0140', ...}
```
#### Set Budget Alerts

```python
import logging

logger = logging.getLogger("taxia.costs")

class BudgetExceededError(Exception):
    pass

class BudgetAlert:
    def __init__(self, daily_budget=50.0):
        self.daily_budget = daily_budget
        self.daily_cost = 0.0

    def check_budget(self, query_cost):
        self.daily_cost += query_cost
        if self.daily_cost >= self.daily_budget:
            raise BudgetExceededError(
                f"Daily budget ${self.daily_budget} exceeded! "
                f"Current: ${self.daily_cost:.2f}"
            )
        if self.daily_cost >= self.daily_budget * 0.8:
            logger.warning(f"80% of daily budget used: ${self.daily_cost:.2f}")

# Usage
budget = BudgetAlert(daily_budget=50.0)
budget.check_budget(0.014)  # call once per query with its cost
```
### ROI Calculator

Calculate your return on investment.

**Example: Customer Support Use Case**

Before TAXIA:
- Support team: 3 people × $3,000/month = $9,000/month
- Handles: 50 queries/day = 1,500/month

After TAXIA:
- Support team: 1 person × $3,000/month = $3,000/month
- TAXIA costs: $470/month
- Handles: 400 queries/day = 12,000/month

Savings: $9,000 − $3,000 − $470 = $5,530/month
ROI: ($5,530 / $470) × 100% ≈ 1,177%
Payback period: less than 1 month

**Your ROI:**
1. Calculate your current support costs.
2. Estimate the share of queries TAXIA can handle (typically 70-90%).
3. Calculate TAXIA costs (LLM + infrastructure).
4. Savings = Current Costs − Reduced Costs − TAXIA Costs
5. ROI = (Savings / TAXIA Costs) × 100%
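The steps above reduce to a couple of lines of arithmetic, shown here with the numbers from the customer-support example (the function name is illustrative):

```python
def roi_percent(current_costs, reduced_costs, taxia_costs):
    """Monthly savings expressed as a percentage of TAXIA spend."""
    savings = current_costs - reduced_costs - taxia_costs
    return savings / taxia_costs * 100

# Customer support example: $9,000 before, $3,000 after, $470 TAXIA
roi_percent(9_000, 3_000, 470)  # ≈ 1,177%
```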
### Cost Comparison: Build vs. Buy

**Option 1: Build Custom RAG (DIY)**
- Development: 3 engineers × 3 months = $45,000
- Maintenance: 1 engineer at 20% time = $12,000/year
- Infrastructure: $100/month = $1,200/year
- Total Year 1: $58,200

**Option 2: Use TAXIA**
- Development: 1 engineer × 1 week = $2,500
- Monthly costs: $470/month = $5,640/year
- Total Year 1: $8,140

Savings: $50,060 in the first year (86% cost reduction)
## 📊 Cost & Performance Summary

### Quick Reference
| Scenario | Monthly Cost | Response Time | Best For |
|---|---|---|---|
| Demo Mode | $0 | N/A | Testing, POC |
| Small (100 q/day) | $42 | 2.5s | Startups, small teams |
| Medium (1K q/day) | $470 | 2.5s | Growing businesses |
| Large (10K q/day) | $4,400 | 2.5s | Enterprises |
### Optimization Checklist
- [ ] Enable caching for common questions (70% cost reduction)
- [ ] Use Haiku for simple queries (60% cost reduction)
- [ ] Reduce max_tokens if possible (30% cost reduction)
- [ ] Use SSD for Qdrant (2x faster)
- [ ] Monitor and set budget alerts
- [ ] Track ROI monthly
Need help optimizing? See our Performance Guide