Base Model: Schema Generation System Specification
Vision
The vision of this project is to create a unified, technology-agnostic schema definition system that embodies the DRY (Don't Repeat Yourself) principle. By establishing a single, comprehensive base schema, we aim to eliminate redundancy and inconsistencies across various data representations and technologies.
Our goal is to provide a flexible and extensible foundation from which multiple specialized schemas can be generated. This approach ensures consistency across different platforms and reduces the maintenance burden associated with managing multiple schema definitions independently.
By centralizing the core data model in a base schema, we enable teams to focus on the unique aspects of each target platform or technology, while maintaining a single source of truth for the underlying data structure. This not only streamlines development processes but also enhances data integrity and interoperability across systems.
Ultimately, this vision supports more efficient, error-resistant, and adaptable data management practices, allowing organizations to respond more quickly to changing requirements and technological advancements.
Overview
This system lets you define a base schema model in YAML. From that base model it generates export formats for different technologies, including relational databases, full-text search engines, and NoSQL databases. Each exporter can also take additional metadata that customizes its data mappings and relationships.
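As a rough illustration of what a base model might look like, the sketch below parses a small YAML definition with PyYAML. The layout of the model (the `model`, `fields`, `type`, `required`, and `max_length` keys) and the helper `load_base_model` are hypothetical; this specification does not define them.

```python
import yaml  # PyYAML

# Hypothetical base model; the key names are illustrative assumptions,
# not a format defined by this specification.
BASE_SCHEMA_YAML = """
model: Customer
fields:
  name:
    type: string
    required: true
    max_length: 100
  email:
    type: string
    required: true
  age:
    type: integer
    required: false
"""


def load_base_model(text: str) -> dict:
    """Parse a YAML base model and check the minimal structure this sketch assumes."""
    model = yaml.safe_load(text)
    if not isinstance(model, dict) or "model" not in model or "fields" not in model:
        raise ValueError("base model needs 'model' and 'fields' keys")
    return model


base_model = load_base_model(BASE_SCHEMA_YAML)

# Collect the names of fields marked as required, in definition order.
required = [name for name, spec in base_model["fields"].items() if spec.get("required")]
print(base_model["model"], required)
```

`yaml.safe_load` is used rather than `yaml.load` so that a schema file cannot construct arbitrary Python objects.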
Features
- YAML Schema Definition:
  - Define the base schema model with core properties and validation rules.
- Python Parsing:
  - Use Python with PyYAML to parse the YAML schema.
  - The spec can be implemented in any language; the reference implementation will use Python.
- Exporters:
  - Generate various export formats:
    - Relational Databases: Create SQL schemas, including tables and relationships.
    - Full-Text Search Engines: Define index structures.
    - NoSQL Databases: Map objects to JSON columns or other structures.
    - File types: JSON Schema, Parquet, ORC
    - Wire protocol standards: OpenAPI, AsyncAPI, Protocol Buffers, Avro
- Metadata and Configuration:
  - Define additional metadata in YAML for each exporter, including data mappings, relationships, and additional fields such as GUIDs and 'created' and 'updated' timestamps.
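To make the interplay between the base model and exporter metadata more concrete, here is a minimal sketch of a SQL export step. It assumes the model and the metadata have already been parsed from YAML into dictionaries. The key names (`table`, `extra_fields`), the type mapping in `SQL_TYPES`, and the functions `column_definition` and `export_sql` are all hypothetical, and the generated DDL is generic rather than tuned to any particular database.

```python
# Hypothetical parsed base model and SQL exporter metadata
# (in practice both would be loaded from YAML files).
base_model = {
    "model": "Customer",
    "fields": {
        "name": {"type": "string", "required": True, "max_length": 100},
        "age": {"type": "integer", "required": False},
    },
}
sql_metadata = {
    "table": "customers",
    # Extra columns that exist only in the SQL export, not in the base model.
    "extra_fields": {
        "guid": "CHAR(36) PRIMARY KEY",
        "created": "TIMESTAMP NOT NULL",
        "updated": "TIMESTAMP NOT NULL",
    },
}

# Assumed mapping from base model types to generic SQL column types.
SQL_TYPES = {"string": "VARCHAR", "integer": "INTEGER", "boolean": "BOOLEAN"}


def column_definition(name: str, spec: dict) -> str:
    """Build one column definition from a base model field."""
    sql_type = SQL_TYPES[spec["type"]]
    if spec["type"] == "string":
        sql_type += f"({spec.get('max_length', 255)})"
    not_null = " NOT NULL" if spec.get("required") else ""
    return f"{name} {sql_type}{not_null}"


def export_sql(model: dict, metadata: dict) -> str:
    """Combine exporter-specific extra columns with the base model fields."""
    columns = [f"{n} {t}" for n, t in metadata.get("extra_fields", {}).items()]
    columns += [column_definition(n, s) for n, s in model["fields"].items()]
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {metadata['table']} (\n  {body}\n);"


ddl = export_sql(base_model, sql_metadata)
print(ddl)
```

Keeping the GUID and timestamp columns in the exporter metadata rather than in the base model means other exporters are free to omit them or represent them differently.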
Components
- Schema Definer:
  - A YAML-based system to define base models, capturing core attributes and validation rules.
- YAML Parser:
  - Python script using PyYAML to read and parse schema definitions.
- Exporters:
  - Modular components for each target format:
    - SQL Exporter: Converts the base model into SQL schema, adding relationships and extra fields as needed.
    - Full-Text Exporter: Structures data for full-text search engines.
    - JSON Exporter: Maps entire objects to JSON columns or other NoSQL structures.
- Metadata Configurator:
  - YAML files for each exporter, specifying mappings, relationships, and additional attributes like GUIDs, 'created', and 'updated' timestamps.
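One possible way to keep the exporters modular is a small registry that maps a target name to an exporter function. The sketch below is only an assumption about how the components could fit together: the `register` decorator, the `EXPORTERS` table, the target names, and the output shapes are hypothetical. It also assumes the base model's type names happen to match JSON Schema type names.

```python
import json
from typing import Callable, Dict

# Hypothetical registry: each exporter turns a parsed base model into text.
Exporter = Callable[[dict], str]
EXPORTERS: Dict[str, Exporter] = {}


def register(name: str) -> Callable[[Exporter], Exporter]:
    """Decorator that adds an exporter function to the registry under a target name."""
    def decorator(func: Exporter) -> Exporter:
        EXPORTERS[name] = func
        return func
    return decorator


@register("json_schema")
def export_json_schema(model: dict) -> str:
    # Assumes base model types ("string", "integer") are valid JSON Schema types.
    properties = {n: {"type": s["type"]} for n, s in model["fields"].items()}
    required = [n for n, s in model["fields"].items() if s.get("required")]
    schema = {
        "title": model["model"],
        "type": "object",
        "properties": properties,
        "required": required,
    }
    return json.dumps(schema, indent=2)


@register("full_text")
def export_full_text(model: dict) -> str:
    # In this sketch only string fields are treated as searchable.
    searchable = [n for n, s in model["fields"].items() if s["type"] == "string"]
    return json.dumps({"index": model["model"].lower(), "searchable_fields": searchable})


def export(model: dict, target: str) -> str:
    """Look up the exporter for a target format and run it."""
    if target not in EXPORTERS:
        raise ValueError(f"no exporter registered for {target!r}")
    return EXPORTERS[target](model)


# Hypothetical parsed base model (would normally come from the YAML parser).
base_model = {
    "model": "Customer",
    "fields": {
        "name": {"type": "string", "required": True},
        "age": {"type": "integer", "required": False},
    },
}
print(export(base_model, "json_schema"))
print(export(base_model, "full_text"))
```

With this layout, adding a new target format means writing one function and registering it; the schema definer and the parser stay unchanged.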