Data Distribution

Overview

TAXIA tax law data is distributed through Hugging Face for easy access and automatic downloading.

Dataset Structure

taxia-data/
├── 2015/
│   ├── income_tax_act.json
│   ├── income_tax_act_decree.json
│   ├── income_tax_act_rules.json
│   ├── corporate_tax_act.json
│   ├── corporate_tax_act_decree.json
│   ├── corporate_tax_act_rules.json
│   ├── vat_act.json
│   ├── vat_act_decree.json
│   ├── vat_act_rules.json
│   ├── tax_laws.json
│   ├── tax_laws_combined.json
│   ├── tax_laws_detailed.csv
│   └── tax_laws_articles.csv
│
├── 2016/ ... 2024/  (same structure)
│
└── 2025/
    └── (same structure)
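
As a rough illustration of the layout above, the expected file paths for any year can be generated from the three law names and the four combined files. This is only a sketch: the `expected_files` helper and the `taxia-data` root default are illustrative names, not part of the TAXIA API.

```python
from pathlib import PurePosixPath

# File stems taken from the directory tree above.
LAWS = ["income_tax_act", "corporate_tax_act", "vat_act"]
SUFFIXES = ["", "_decree", "_rules"]  # act, enforcement decree, enforcement rules
COMBINED = [
    "tax_laws.json",
    "tax_laws_combined.json",
    "tax_laws_detailed.csv",
    "tax_laws_articles.csv",
]
YEARS = range(2015, 2026)  # 2015 through 2025 inclusive


def expected_files(year, root="taxia-data"):
    """Return the relative paths the tree above implies for one year (hypothetical helper)."""
    base = PurePosixPath(root) / str(year)
    per_law = [base / f"{law}{suffix}.json" for law in LAWS for suffix in SUFFIXES]
    combined = [base / name for name in COMBINED]
    return per_law + combined


for path in expected_files(2025):
    print(path)
```

A list like this can be compared against a local copy to spot missing files.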

Data Coverage

Laws

Enforcement Decrees

Enforcement Rules

JSON Schema

Law Document Structure

{
  "law_name": "Corporate Tax Act",
  "law_name_ko": "법인세법",
  "year": 2025,
  "articles": [
    {
      "article_number": "Article 1",
      "title": "Purpose",
      "content": "This Act aims to establish tax obligations for corporations...",
      "clauses": [
        {
          "clause_number": "1",
          "content": "Resident corporations shall be taxed under this Act..."
        }
      ]
    }
  ]
}
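
A document in this shape can be walked with the standard `json` module alone. The sketch below flattens articles and their clauses into rows; `iter_provisions` is an illustrative helper rather than a TAXIA function, and it assumes `clauses` may be absent from an article.

```python
import json


def iter_provisions(law):
    """Yield (article_number, clause_number, text) tuples.

    clause_number is None for the article's own content.
    """
    for article in law.get("articles", []):
        number = article["article_number"]
        yield number, None, article.get("content", "")
        # "clauses" is treated as optional here (an assumption).
        for clause in article.get("clauses", []):
            yield number, clause["clause_number"], clause["content"]


sample = json.loads("""
{
  "law_name": "Corporate Tax Act",
  "law_name_ko": "법인세법",
  "year": 2025,
  "articles": [
    {
      "article_number": "Article 1",
      "title": "Purpose",
      "content": "This Act aims to establish tax obligations for corporations...",
      "clauses": [
        {"clause_number": "1",
         "content": "Resident corporations shall be taxed under this Act..."}
      ]
    }
  ]
}
""")

rows = list(iter_provisions(sample))
for article_number, clause_number, text in rows:
    print(article_number, clause_number, text[:40])
```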

Combined Document Format

{
  "law_type": "Corporate Tax Act",
  "law_type_ko": "법인세법",
  "category": "law",
  "year": 2025,
  "full_text": "...",
  "articles": [...],
  "metadata": {
    "last_updated": "2025-01-01",
    "revision_count": 42,
    "related_laws": ["Income Tax Act", "Value Added Tax Act"]
  }
}
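
The `metadata.related_laws` field makes simple cross-law lookups possible. The following sketch assumes that combined documents have been loaded into a Python list of dictionaries; how the combined file groups them is not specified here, and `related_to` is an illustrative helper name.

```python
def related_to(documents, law_name):
    """Return the combined documents whose metadata lists law_name as related."""
    return [
        doc
        for doc in documents
        if law_name in doc.get("metadata", {}).get("related_laws", [])
    ]


# Two illustrative records in the combined format shown above.
documents = [
    {
        "law_type": "Corporate Tax Act",
        "category": "law",
        "year": 2025,
        "metadata": {"related_laws": ["Income Tax Act", "Value Added Tax Act"]},
    },
    {
        "law_type": "Income Tax Act",
        "category": "law",
        "year": 2025,
        "metadata": {"related_laws": ["Corporate Tax Act"]},
    },
]

matches = related_to(documents, "Income Tax Act")
print([doc["law_type"] for doc in matches])
```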

Automatic Download

TAXIA automatically downloads required data on first use:

from taxia import TaxiaEngine

# Downloads data automatically to ~/.taxia/data/
engine = TaxiaEngine()
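
To see which years are already present in the local data directory, a small check like the one below can help. It is a sketch that assumes the local directory mirrors the per-year folder layout of the dataset; `missing_years` is an illustrative helper, not part of TAXIA.

```python
from pathlib import Path


def missing_years(data_dir="~/.taxia/data", years=range(2015, 2026)):
    """Return the years that have no subdirectory under data_dir.

    Assumes one folder per year (e.g. data_dir/2025/), as in the dataset tree.
    """
    root = Path(data_dir).expanduser()
    return [year for year in years if not (root / str(year)).is_dir()]


print(missing_years())
```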

Manual Download

from taxia.data import DataDownloader

downloader = DataDownloader()

# Download specific year
downloader.download_year(2025)

# Download all years
downloader.download_all()

# Custom directory
downloader.download_all(target_dir="./my_data")

Download from CLI

# Download a specific year (2025 is the latest)
taxia data download --year 2025

# Download all years
taxia data download --all

# Check download status
taxia data status

Data Updates

Version Control

Update Strategy

from taxia.data import DataDownloader

downloader = DataDownloader()

# Check for updates
if downloader.has_updates():
    print("Updates available!")
    downloader.update_all()

Local Data Usage

Using Local Files

If you have your own tax law data:

from taxia import TaxiaEngine

engine = TaxiaEngine()

# Index local directory
engine.index_documents("./my_koreantaxlaw/")

Data Format Requirements

Your JSON files should follow this schema:

{
  "law_name": "Your Law Name",
  "year": 2025,
  "articles": [
    {
      "article_number": "Article 1",
      "title": "Article Title",
      "content": "Article content..."
    }
  ]
}
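
Before indexing your own files, it can be useful to check them against this shape. The validator below is a minimal sketch: it treats every field shown in the example as required, which is an assumption, and `schema_errors` is an illustrative name rather than a TAXIA function.

```python
# Field names come from the example above; treating all of them as
# required is an assumption of this sketch.
REQUIRED_TOP = {"law_name": str, "year": int, "articles": list}
REQUIRED_ARTICLE = {"article_number": str, "title": str, "content": str}


def schema_errors(doc):
    """Return a list of human-readable problems; an empty list means the doc passes."""
    if not isinstance(doc, dict):
        return ["document: expected a JSON object"]
    errors = []
    for key, expected in REQUIRED_TOP.items():
        if not isinstance(doc.get(key), expected):
            errors.append(f"{key}: expected {expected.__name__}")
    articles = doc.get("articles")
    if isinstance(articles, list):
        for index, article in enumerate(articles):
            if not isinstance(article, dict):
                errors.append(f"articles[{index}]: expected an object")
                continue
            for key, expected in REQUIRED_ARTICLE.items():
                if not isinstance(article.get(key), expected):
                    errors.append(f"articles[{index}].{key}: expected {expected.__name__}")
    return errors


good = {
    "law_name": "Your Law Name",
    "year": 2025,
    "articles": [
        {"article_number": "Article 1", "title": "Article Title", "content": "Article content..."}
    ],
}
print(schema_errors(good))
print(schema_errors({"law_name": "Your Law Name", "year": "2025"}))
```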

Data Statistics

Total Coverage (2015-2025)

Category              Files per Year   Total Files
Laws                  3                33
Enforcement Decrees   3                33
Enforcement Rules     3                33
Combined Files        4                44
Total                 13               143
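
The totals follow from the per-category counts multiplied by the 11 years covered (2015 through 2025). A short sketch of that arithmetic, using only the category rows listed above:

```python
# Per-year file counts for each category, as listed in the table rows.
FILES_PER_YEAR = {
    "Laws": 3,
    "Enforcement Decrees": 3,
    "Enforcement Rules": 3,
    "Combined Files": 4,
}
YEARS = range(2015, 2026)  # 2015 through 2025 inclusive

# Total files per category across all years.
totals = {category: count * len(YEARS) for category, count in FILES_PER_YEAR.items()}
per_year = sum(FILES_PER_YEAR.values())
grand_total = per_year * len(YEARS)

for category, total in totals.items():
    print(f"{category}: {total}")
print(f"Files per year: {per_year}, all years: {grand_total}")
```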

File Sizes

Article Count

License

The tax law data is public domain:

  - Source: Korean National Law Information Center
  - Usage: Free for commercial and non-commercial use
  - Attribution: Not required but appreciated

Data Quality

Validation

All data is validated for:

  - JSON schema compliance
  - Article numbering consistency
  - Cross-reference accuracy
  - UTF-8 encoding
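
Two of these checks are easy to reproduce locally. The sketch below is not the project's validation pipeline; it shows one possible reading, in which "numbering consistency" means that every `article_number` parses as "Article N" (optionally with a branch suffix such as "Article 2-2") and that N never decreases. Both helper names are illustrative.

```python
import re

ARTICLE_RE = re.compile(r"^Article (\d+)")


def is_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def numbering_issues(articles):
    """Report article labels that do not parse or that go backwards."""
    issues = []
    previous = 0
    for article in articles:
        label = article.get("article_number", "")
        match = ARTICLE_RE.match(label)
        if match is None:
            issues.append(f"unparseable article number: {label!r}")
            continue
        number = int(match.group(1))
        if number < previous:
            issues.append(f"{label!r} appears after Article {previous}")
        previous = number
    return issues


articles = [
    {"article_number": "Article 1"},
    {"article_number": "Article 2"},
    {"article_number": "Article 2-2"},
]
print(numbering_issues(articles))
print(is_utf8("법인세법".encode("utf-8")))
```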

Known Issues

Contributing Data

To contribute data improvements:

  1. Fork the taxia-data repository
  2. Make corrections or additions
  3. Submit a pull request with a description of the changes