Data Distribution
Overview
TAXIA tax law data is distributed through Hugging Face for easy access and automatic downloading.
- Repository: xaikorea/taxia-data
- Format: JSON and CSV files organized by year
- Coverage: 2015-2025 (11 years)
- Total Files: 242 files
Dataset Structure
taxia-data/
├── 2015/
│   ├── income_tax_act.json
│   ├── income_tax_act_decree.json
│   ├── income_tax_act_rules.json
│   ├── corporate_tax_act.json
│   ├── corporate_tax_act_decree.json
│   ├── corporate_tax_act_rules.json
│   ├── vat_act.json
│   ├── vat_act_decree.json
│   ├── vat_act_rules.json
│   ├── tax_laws.json
│   ├── tax_laws_combined.json
│   ├── tax_laws_detailed.csv
│   └── tax_laws_articles.csv
│
├── 2016/ ... 2024/ (same structure)
│
└── 2025/
    └── (same structure)
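The layout above maps directly onto relative paths. A minimal sketch, assuming every year directory holds the same file names as the 2015 listing; the helper `expected_files` is hypothetical and not part of TAXIA:

```python
from pathlib import PurePosixPath

# File names taken from the 2015/ listing above.
LAWS = ["income_tax_act", "corporate_tax_act", "vat_act"]
SUFFIXES = ["", "_decree", "_rules"]  # act, enforcement decree, enforcement rules
COMBINED = [
    "tax_laws.json",
    "tax_laws_combined.json",
    "tax_laws_detailed.csv",
    "tax_laws_articles.csv",
]


def expected_files(year: int) -> list[str]:
    """Return the relative paths listed under one year directory."""
    per_law = [f"{law}{suffix}.json" for law in LAWS for suffix in SUFFIXES]
    return [str(PurePosixPath(str(year), name)) for name in per_law + COMBINED]


paths = expected_files(2015)
print(paths[0])  # 2015/income_tax_act.json
```

A list like this can be compared against a local download to spot missing files.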
Data Coverage
Laws
- Income Tax Act (소득세법)
- Corporate Tax Act (법인세법)
- Value Added Tax Act (부가가치세법)
Enforcement Decrees
- Each law has corresponding enforcement decrees
- Detailed implementation guidelines
- Calculation methods and rates
Enforcement Rules
- Administrative procedures
- Form specifications
- Filing requirements
JSON Schema
Law Document Structure
{
  "law_name": "Corporate Tax Act",
  "law_name_ko": "법인세법",
  "year": 2025,
  "articles": [
    {
      "article_number": "Article 1",
      "title": "Purpose",
      "content": "This Act aims to establish tax obligations for corporations...",
      "clauses": [
        {
          "clause_number": "1",
          "content": "Resident corporations shall be taxed under this Act..."
        }
      ]
    }
  ]
}
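To illustrate how this structure might be traversed, the sketch below parses the sample document with the standard `json` module and flattens it into one row per clause. The helper `iter_clauses` is illustrative only, and it treats `clauses` as optional because the minimal schema under Data Format Requirements omits it.

```python
import json

# Sample document copied from the structure above.
raw = """
{
  "law_name": "Corporate Tax Act",
  "law_name_ko": "법인세법",
  "year": 2025,
  "articles": [
    {
      "article_number": "Article 1",
      "title": "Purpose",
      "content": "This Act aims to establish tax obligations for corporations...",
      "clauses": [
        {"clause_number": "1",
         "content": "Resident corporations shall be taxed under this Act..."}
      ]
    }
  ]
}
"""


def iter_clauses(doc: dict):
    """Yield (article_number, clause_number, content) for every clause."""
    for article in doc.get("articles", []):
        # Articles without a "clauses" key simply contribute no rows.
        for clause in article.get("clauses", []):
            yield article["article_number"], clause["clause_number"], clause["content"]


doc = json.loads(raw)
for row in iter_clauses(doc):
    print(row)
```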
Combined Document Format
{
  "law_type": "Corporate Tax Act",
  "law_type_ko": "법인세법",
  "category": "law",
  "year": 2025,
  "full_text": "...",
  "articles": [...],
  "metadata": {
    "last_updated": "2025-01-01",
    "revision_count": 42,
    "related_laws": ["Income Tax Act", "Value Added Tax Act"]
  }
}
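The `metadata.related_laws` field makes simple cross-law lookups possible. The following sketch assumes that combined documents are available as a list of objects in the format above (whether `tax_laws_combined.json` holds a list or a single object is not stated here); `related_to` is a hypothetical helper.

```python
def related_to(documents: list[dict], law_name: str) -> list[str]:
    """Return the law_type of each document that lists law_name as related."""
    return [
        doc["law_type"]
        for doc in documents
        if law_name in doc.get("metadata", {}).get("related_laws", [])
    ]


# One entry in the combined format; "full_text" and "articles" are left
# empty because the sample above elides them.
documents = [
    {
        "law_type": "Corporate Tax Act",
        "law_type_ko": "법인세법",
        "category": "law",
        "year": 2025,
        "full_text": "",
        "articles": [],
        "metadata": {
            "last_updated": "2025-01-01",
            "revision_count": 42,
            "related_laws": ["Income Tax Act", "Value Added Tax Act"],
        },
    }
]

print(related_to(documents, "Income Tax Act"))  # ['Corporate Tax Act']
print(related_to(documents, "Customs Act"))     # []
```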
Automatic Download
TAXIA automatically downloads required data on first use:
from taxia import TaxiaEngine
# Downloads data automatically to ~/.taxia/data/
engine = TaxiaEngine()
Manual Download
from taxia.data import DataDownloader
downloader = DataDownloader()
# Download specific year
downloader.download_year(2025)
# Download all years
downloader.download_all()
# Custom directory
downloader.download_all(target_dir="./my_data")
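Because the data lives on Hugging Face, the same files can in principle be fetched without TAXIA by using the `huggingface_hub` library. This is a sketch under two assumptions: that the repository is published as a dataset repository (`repo_type="dataset"`) and that it follows the year-directory layout shown earlier. `year_patterns` and `download_years` are illustrative helpers, not part of TAXIA.

```python
from pathlib import Path


def year_patterns(years: list[int]) -> list[str]:
    """Build glob patterns that select whole year directories."""
    return [f"{year}/*" for year in years]


def download_years(years: list[int], target_dir: str = "./my_data") -> Path:
    """Fetch the selected year directories from the Hugging Face Hub."""
    # Imported lazily so year_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id="xaikorea/taxia-data",
        repo_type="dataset",  # assumption: the repository is a dataset repo
        allow_patterns=year_patterns(years),
        local_dir=target_dir,
    )
    return Path(local_path)


print(year_patterns([2024, 2025]))  # ['2024/*', '2025/*']
# download_years([2025])  # uncomment to download; requires network access
```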
Download from CLI
# Download a single year (2025 is the latest)
taxia data download --year 2025
# Download all years
taxia data download --all
# Check download status
taxia data status
Data Updates
Version Control
- Each year's data is versioned separately
- Updates published via Hugging Face
- Automatic version checking on startup
Update Strategy
from taxia.data import DataDownloader
downloader = DataDownloader()
# Check for updates
if downloader.has_updates():
    print("Updates available!")
    downloader.update_all()
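How `has_updates()` reaches its decision is not documented here. One plausible approach, shown purely as a sketch, is to compare the commit hash recorded at download time with the repository's current commit hash on the Hub. The state file and all three helpers below are hypothetical, and the remote lookup assumes a dataset-type repository.

```python
import json
from pathlib import Path
from typing import Optional


def read_local_revision(state_file: Path) -> Optional[str]:
    """Return the commit hash recorded at the last download, if any."""
    if not state_file.exists():
        return None
    return json.loads(state_file.read_text(encoding="utf-8")).get("revision")


def needs_update(local_revision: Optional[str], remote_revision: str) -> bool:
    """True when nothing was recorded yet or the hashes differ."""
    return local_revision != remote_revision


def fetch_remote_revision(repo_id: str = "xaikorea/taxia-data") -> Optional[str]:
    """Ask the Hub for the current commit hash (requires network access)."""
    from huggingface_hub import HfApi  # lazy import; optional dependency

    return HfApi().dataset_info(repo_id).sha


print(needs_update(None, "abc123"))      # True
print(needs_update("abc123", "abc123"))  # False
```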
Local Data Usage
Using Local Files
If you have your own tax law data:
from taxia import TaxiaEngine
engine = TaxiaEngine()
# Index local directory
engine.index_documents("./my_koreantaxlaw/")
Data Format Requirements
Your JSON files should follow this schema, a minimal subset of the Law Document Structure:
{
  "law_name": "Your Law Name",
  "year": 2025,
  "articles": [
    {
      "article_number": "Article 1",
      "title": "Article Title",
      "content": "Article content..."
    }
  ]
}
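Before indexing local files, it can help to check them against this schema. The sketch below is a hand-rolled check of the fields shown above, with field types inferred from the example values; `validate_law` is a hypothetical helper, not TAXIA's own validator.

```python
# Required keys and their expected Python types, inferred from the example.
REQUIRED_TOP = {"law_name": str, "year": int, "articles": list}
REQUIRED_ARTICLE = {"article_number": str, "title": str, "content": str}


def validate_law(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passes."""
    problems = []
    for key, expected in REQUIRED_TOP.items():
        if not isinstance(doc.get(key), expected):
            problems.append(f"'{key}' must be of type {expected.__name__}")
    articles = doc.get("articles")
    if isinstance(articles, list):
        for i, article in enumerate(articles):
            if not isinstance(article, dict):
                problems.append(f"articles[{i}] must be an object")
                continue
            for key, expected in REQUIRED_ARTICLE.items():
                if not isinstance(article.get(key), expected):
                    problems.append(
                        f"articles[{i}].{key} must be of type {expected.__name__}"
                    )
    return problems


good = {
    "law_name": "Your Law Name",
    "year": 2025,
    "articles": [
        {"article_number": "Article 1", "title": "Article Title", "content": "Article content..."}
    ],
}
print(validate_law(good))  # []
print(validate_law({"law_name": "X", "year": "2025", "articles": []}))
# ["'year' must be of type int"]
```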
Data Statistics
Total Coverage (2015-2025)
| Category | Files per Year | Total Files |
|---|---|---|
| Laws | 3 | 33 |
| Enforcement Decrees | 3 | 33 |
| Enforcement Rules | 3 | 33 |
| Combined Files | 4 | 44 |
| Total | 22 | 242 |
File Sizes
- Individual law files: 50-200 KB
- Combined files: 500 KB - 2 MB
- Total dataset size: ~150 MB
Article Count
- Approximate articles per law: 50-150
- Total articles across all years: more than 20,000
License
The tax law data is public domain:
- Source: Korean National Law Information Center
- Usage: Free for commercial and non-commercial use
- Attribution: Not required but appreciated
Data Quality
Validation
All data is validated for:
- JSON schema compliance
- Article numbering consistency
- Cross-reference accuracy
- UTF-8 encoding
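Two of the checks listed above can be approximated in a few lines. The exact checks the project runs are not specified, so this sketch makes its own assumptions: it reads "article numbering consistency" as "no article number appears twice", and it tests UTF-8 by attempting to decode the raw bytes. Both helper names are hypothetical.

```python
from collections import Counter


def duplicate_article_numbers(doc: dict) -> list[str]:
    """Return article numbers that appear more than once in a document."""
    counts = Counter(a["article_number"] for a in doc.get("articles", []))
    return sorted(number for number, n in counts.items() if n > 1)


def is_utf8(data: bytes) -> bool:
    """True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


doc = {
    "articles": [
        {"article_number": "Article 1"},
        {"article_number": "Article 2"},
        {"article_number": "Article 1"},
    ]
}
print(duplicate_article_numbers(doc))       # ['Article 1']
print(is_utf8("법인세법".encode("utf-8")))   # True
print(is_utf8("법인세법".encode("euc-kr")))  # False
```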
Known Issues
- Some historical amendments may be incomplete
- Minor formatting variations across years
- Ongoing improvements via community contributions
Contributing Data
To contribute data improvements:
- Fork the taxia-data repository
- Make corrections or additions
- Submit a pull request with a description of the changes