STANDARD
v1.1 SPECIFICATION
Why This Standard Exists
Company data is currently fragmented across thousands of local formats and proprietary silos. Central.Enterprises provides a unified global layer that serves as public infrastructure for economic development, academic research, and system interoperability. By normalizing data into a single, predictable format, we enable anyone to build reliable tools on top of the world's business registry.
Two-Tier Data Model
To balance public transparency with commercial viability and safety, we operate two strict data tiers that share identical structure (same rows, same columns).
1. OpenData Tier
Published for education, research, and public interest. Sensitive or proprietary fields (like direct contacts) are masked to ensure safety while maintaining structural integrity.
2. Premium (Pro) Tier
Designed for operational intelligence. Includes full enrichment values for all columns, such as verified direct emails, active social signals, and spending indicators.
Global Identity Strategy
Identifying a unique company globally is a complex challenge. We employ a tripartite strategy:
- Stable ID (sdc_id): An internal alphanumeric identifier for long-term tracking, ensuring references remain stable across merges or updates.
- Digital Identity (Domain): We treat a company's primary website domain as its most practical global key, enabling reproducible research on digital presence.
- Official Registry (Canonical): We normalize local government identifiers (CIF, EIN, CRN) into a standardized COUNTRY-NUMBER format to support cross-border economic analysis.
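As a rough illustration of the Official Registry key, the sketch below builds a COUNTRY-NUMBER value from a local identifier. The function name `canonical_reg_number` is hypothetical, and two behaviors are assumptions rather than part of the Standard: every non-alphanumeric character is treated as a delimiter, and letters are upper-cased.

```python
import re


def canonical_reg_number(country_code: str, reg_number: str) -> str:
    """Build a COUNTRY-NUMBER key from a local registry identifier."""
    # Assumption: "delimiters" means any character that is not a letter or digit.
    stripped = re.sub(r"[^0-9A-Za-z]", "", reg_number)
    if not stripped:
        # Nothing left to canonicalize; missing values are empty strings.
        return ""
    # Assumption: both parts are upper-cased for a stable comparison key.
    return f"{country_code.upper()}-{stripped.upper()}"


print(canonical_reg_number("US", "12-3456789"))  # US-123456789
```

The same call shape would apply to other registries (CIF, CRN), although real identifiers may need country-specific validation that this sketch does not attempt.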
Core Schema v1.1
The Standard enforces a fixed 37-column layout. The column order is immutable to ensure CSV parser compatibility.
| IDX | FIELD NAME | OPENDATA STATUS | DESCRIPTION |
|---|---|---|---|
| 1 | sdc_id | OPEN | Stable internal UUID. |
| 2 | name | OPEN | Legal registered name. |
| 3 | website | OPEN | Primary domain (normalized). |
| 4 | country_code | OPEN | ISO 3166-1 alpha-2. |
| 5 | province_region | OPEN | Administrative region. |
| 6 | city | OPEN | City or municipality. |
| 7 | reg_number | MASKED | Local identifier (e.g. B12345678). |
| 8 | reg_number_type | MASKED | Label of ID (e.g. "NIF"). |
| 9 | reg_number_country | MASKED | Country of registration. |
| 10 | reg_number_canonical | MASKED | Format: CC-NUMBER. |
| 11 | main_category | OPEN | Primary taxonomy classification. |
| 12 | categories | OPEN | Full taxonomy list (pipe-separated). |
| 13 | RSS | OPEN | News feed URL. |
| 14 | emails | MASKED | Verified contact emails. |
| 15 | phone | MASKED | Main switchboard number. |
| 16 | address | MASKED | Full postal address. |
| 17 | CentralRating | MASKED | Internal trust score. |
| 18 | description | MASKED | Company description text. |
| 19-20 | workday_timing / closed_on | MASKED | Operational hours data. |
| 21 | featured_image | MASKED | Primary brand image URL. |
| 22-23 | is_spending_on_ads / competitors | MASKED | Commercial signals. |
| 24-37 | Social Fields (LinkedIn...Medium) | MASKED | 14 platform-specific columns. |
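Because the column order is fixed, a consumer can reject a file before parsing any rows. The following is a minimal sketch of such a header check. The name `check_header` is hypothetical, and only columns 1 to 23 are compared by name, since the 14 social columns are not listed individually in the table.

```python
EXPECTED_COLUMN_COUNT = 37

# Names of columns 1-23 as listed in the schema table. Columns 24-37
# (social fields) are only counted, not compared by name.
KNOWN_LEADING_COLUMNS = [
    "sdc_id", "name", "website", "country_code", "province_region", "city",
    "reg_number", "reg_number_type", "reg_number_country",
    "reg_number_canonical", "main_category", "categories", "RSS", "emails",
    "phone", "address", "CentralRating", "description", "workday_timing",
    "closed_on", "featured_image", "is_spending_on_ads", "competitors",
]


def check_header(header: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(header) != EXPECTED_COLUMN_COUNT:
        problems.append(
            f"expected {EXPECTED_COLUMN_COUNT} columns, got {len(header)}"
        )
    # Compare position by position, because order is part of the contract.
    for idx, (expected, actual) in enumerate(
        zip(KNOWN_LEADING_COLUMNS, header), start=1
    ):
        if expected != actual:
            problems.append(f"column {idx}: expected {expected!r}, got {actual!r}")
    return problems
```

Returning a list of problems rather than raising on the first mismatch makes it easier to report every deviation in one pass.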
Technical Specifications
- Format: CSV (Comma Separated)
- Encoding: UTF-8
- Quoting: RFC 4180 Compliant
- Line Break: CRLF or LF
- True/False: output as true/false
- Empty Values: Empty string ""
- Dates: ISO 8601 (YYYY-MM-DD)
- Multivalue: Pipe separator |
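The conventions above map onto Python's standard `csv` module fairly directly. The sketch below is one possible reading, not a reference parser: the helper names `parse_bool` and `parse_multi` are hypothetical, the three-column sample is an excerpt rather than a full 37-column file, and treating an empty boolean as `None` is an assumption.

```python
import csv
import io


def parse_bool(value: str):
    """Map the true/false strings to bool; an empty string means no value."""
    if value == "":
        return None  # assumption: empty is "unknown or masked", not False
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_multi(value: str) -> list[str]:
    """Split a pipe-separated multivalue field; empty string gives []."""
    return value.split("|") if value else []


# Illustrative three-column excerpt using CRLF line breaks and full quoting.
sample = (
    'name,categories,is_spending_on_ads\r\n'
    '"Acme Corp","Tech|Software","true"\r\n'
)

# newline="" leaves line endings untouched so the csv module handles them.
reader = csv.DictReader(io.StringIO(sample, newline=""))
row = next(reader)

print(parse_multi(row["categories"]))        # ['Tech', 'Software']
print(parse_bool(row["is_spending_on_ads"]))  # True
```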
Normalization Rules: Websites are canonicalized (lowercase host). Country codes are strict ISO 3166-1 alpha-2. Registry numbers are stripped of delimiters for the canonical field. Builds are deterministic; the same input always yields the same SHA-256 hash.
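To make the website rule concrete, here is a small sketch that reduces an input to a lowercase host. The function name `normalize_website` is hypothetical, and returning only the host (dropping scheme, path, and port) is an assumption based on the bare-domain values used in this document's examples.

```python
from urllib.parse import urlsplit


def normalize_website(raw: str) -> str:
    """Reduce a URL or bare domain to its lowercase host."""
    raw = raw.strip()
    if not raw:
        return ""
    # urlsplit only recognizes a host when a scheme or leading "//" is
    # present, so bare domains get a "//" prefix first.
    parts = urlsplit(raw if "://" in raw else "//" + raw)
    # The hostname attribute is already lowercased by urlsplit.
    return parts.hostname or ""


print(normalize_website("HTTPS://Acme.COM/about"))  # acme.com
```

Whether a leading `www.` should be removed is not stated in the rules, so this sketch leaves it in place.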
Dataset Manifest & Provenance
Every distribution is accompanied by a dataset_manifest.json providing cryptographic proof of integrity and build metadata.
{
  "schema_version": "sdc-1.1",
  "tier": "opendata",
  "build_id": "2026-01-09-PUBLIC",
  "generated_at": "2026-01-09T12:00:00Z",
  "files": [
    {
      "filename": "companies_es.csv",
      "rows": 52000,
      "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
    }
  ],
  "maintainer": "Central.Enterprises Foundation",
  "license": "CC BY 4.0",
  "notes": "Masked fields are exported as empty strings."
}
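A consumer could use the manifest to confirm that downloaded files are intact. The sketch below assumes the data files sit in the same directory as the manifest. The names `sha256_of_file` and `verify_manifest` are hypothetical. The `rows` field is not checked, because the document does not say whether the count includes the header line.

```python
import hashlib
import json
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> list[str]:
    """Return a list of mismatches; an empty list means all hashes match."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for entry in manifest["files"]:
        # Assumption: files are stored next to the manifest.
        data_path = manifest_path.parent / entry["filename"]
        actual = sha256_of_file(data_path)
        if actual != entry["sha256"]:
            problems.append(
                f"{entry['filename']}: expected {entry['sha256']}, got {actual}"
            )
    return problems
```

Calling `verify_manifest(Path("dataset_manifest.json"))` in the download directory would then report any file whose contents differ from the published build.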
Provenance: OpenData values are populated only when the source provenance allows open redistribution. If the data source is restrictive, the value is masked in the OpenData tier but preserved in the Premium tier, so the OpenData tier contains only values that may be redistributed.
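The relationship between the two tiers can be sketched as a column-level mask: the row keeps all 37 positions, and every column not marked OPEN in the schema table becomes an empty string. The name `to_opendata` is hypothetical, and this sketch ignores per-value provenance checks, which would blank additional values in a real build.

```python
COLUMN_COUNT = 37

# 1-based indexes of the columns marked OPEN in the schema table.
OPEN_COLUMNS = {1, 2, 3, 4, 5, 6, 11, 12, 13}


def to_opendata(premium_row: list[str]) -> list[str]:
    """Blank every non-OPEN column while keeping the row width unchanged."""
    if len(premium_row) != COLUMN_COUNT:
        raise ValueError(
            f"expected {COLUMN_COUNT} fields, got {len(premium_row)}"
        )
    return [
        value if idx in OPEN_COLUMNS else ""
        for idx, value in enumerate(premium_row, start=1)
    ]
```

Masking by position rather than by dropping columns is what keeps both tiers readable with the same parser.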
Data Examples
CSV Header
sdc_id,name,website,country_code,province_region,city,reg_number,reg_number_type...[full list]...Medium
OpenData Row Example
"550e8400-e29b-41d4-a716-446655440000","Acme Corp","acme.com","US","California","San Francisco","","","","","Technology","Tech|Software","https://acme.com/rss","","","","","","","","","","","","","","","","","","","","","","","",""
Premium Row Example
"550e8400-e29b-41d4-a716-446655440000","Acme Corp","acme.com","US","California","San Francisco","12-3456789","EIN","US","US-123456789","Technology","Tech|Software","https://acme.com/rss","contact@acme.com","+1-555-0199","1 Market St","A+","Global leader in widgets...","09:00-17:00","Sun","https://acme.com/logo.png","true","WidgetCo|GadgetInc","acme-corp","@acme","acme","acmevideo",...
Current: v1.1 (Stable)
We adhere to semantic versioning. 1.0 -> 1.1 implies backward-compatible field additions. 2.0 would imply breaking changes.
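A consumer might use this versioning rule to decide whether a file is safe to read. The sketch below assumes version strings follow the `sdc-MAJOR.MINOR` shape seen in the manifest's `schema_version` field, and it treats two versions as compatible when their major numbers match. Both function names are hypothetical.

```python
def parse_schema_version(value: str) -> tuple[int, int]:
    """Parse a string such as 'sdc-1.1' into (major, minor)."""
    prefix = "sdc-"
    if not value.startswith(prefix):
        raise ValueError(f"unexpected schema version: {value!r}")
    major, minor = value[len(prefix):].split(".")
    return int(major), int(minor)


def is_compatible(supported: str, found: str) -> bool:
    """Assumption: only a major version change signals a breaking change."""
    return parse_schema_version(supported)[0] == parse_schema_version(found)[0]


print(is_compatible("sdc-1.0", "sdc-1.1"))  # True
print(is_compatible("sdc-1.1", "sdc-2.0"))  # False
```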