# Data Model: Relational Document System

## Core Entities

### Document

**Purpose**: Represents a markdown file with indexed properties

**Fields**:
- id (string): Unique identifier, typically a hash of the file path
- file_path (string): Absolute path to the markdown file
- content_hash (string): SHA-256 hash of the file content, used for change detection
- last_modified (integer): Unix timestamp of last file modification
- created_at (integer): Timestamp when document was first indexed
- updated_at (integer): Timestamp of last index update

### Property

**Purpose**: Individual key-value pairs extracted from YAML headers

**Fields**:
- id (string): Unique property identifier
- document_id (string): Foreign key to Document
- key (string): Property name from YAML header
- value (string): Serialized property value
- value_type (string): Data type (string, number, boolean, date, array, object)
- created_at (integer): Timestamp when property was created
- updated_at (integer): Timestamp of last property update

### Query

**Purpose**: Saved query definitions for reuse

**Fields**:
- id (string): Unique query identifier
- name (string): Human-readable query name
- definition (string): Query syntax definition
- created_at (integer): Query creation timestamp
- last_used (integer): Timestamp of last query execution
- use_count (integer): Number of times query has been executed

### Schema

**Purpose**: Metadata about property types and validation rules

**Fields**:
- property_key (string): Property name across documents
- detected_type (string): Most common data type for this property
- validation_rules (string): JSON-encoded validation rules
- document_count (integer): Number of documents containing this property
- created_at (integer): Timestamp when schema entry was created

## Relationships

### Document ↔ Property
- One-to-many: Each document has multiple properties
- Cascade delete: Properties are removed when their document is deleted

### Document ↔ Query
- Many-to-many: Queries can reference multiple documents
- Junction table: QueryResults stores execution history

### Property ↔ Schema
- Many-to-one: Multiple properties with the same key map to one schema entry

## Data Types

### Supported Property Types
- **string**: Text values (default type)
- **number**: Numeric values (integer or float)
- **boolean**: true/false values
- **date**: ISO 8601 date strings
- **array**: JSON-encoded arrays
- **object**: JSON-encoded objects (nested structures)

### Type Detection Logic
1. Parse YAML value using native YAML parser
2. Apply type detection rules:
   - Strings matching ISO 8601 format → date
   - Numeric strings without decimals → number (integer)
   - Numeric strings with decimals → number (float)
   - "true"/"false" (case insensitive) → boolean
   - Arrays/objects → respective types
   - Everything else → string

## Indexing Strategy

### Primary Indices
- documents.file_path (unique)
- properties.document_id (foreign key)
- properties.key (for property-based queries)
- queries.id (unique)

### Composite Indices
- properties(document_id, key) for fast document property lookup
- properties(key, value_type) for type-constrained queries
- queries(last_used) for recent query tracking

## Validation Rules

### Document Validation
- File must exist and be readable
- File must have a valid YAML header (--- delimiters)
- YAML must parse without errors
- File must be UTF-8 encoded

### Property Validation
- Property keys must be non-empty strings
- Property values must be serializable
- Array/object values must be valid JSON
- Date values must match ISO 8601 format

### Query Validation
- Query syntax must be valid according to the defined grammar
- Query must reference existing properties
- Query complexity must be within performance limits
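
As an illustration of the rules listed under Type Detection Logic, the following is a minimal Python sketch, not the system's actual implementation. It assumes the value has already been parsed by a YAML parser, so it may arrive as a native Python object or as a string. The function name `detect_type`, the regular expressions, and the handling of null values are all assumptions made for this example; the document itself only specifies the rule order and the resulting type names.

```python
import json
import re
from datetime import date, datetime

# Assumed patterns: the spec only says "ISO 8601 format" and "numeric strings".
ISO_8601 = re.compile(
    r"\d{4}-\d{2}-\d{2}"                                       # calendar date
    r"([T ]\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?"  # optional time
)
INTEGER = re.compile(r"[+-]?\d+")
FLOAT = re.compile(r"[+-]?\d+\.\d+")


def detect_type(value):
    """Return (value_type, serialized_value) for one parsed YAML value."""
    # Values the YAML parser already typed natively.
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return "boolean", "true" if value else "false"
    if isinstance(value, (int, float)):
        return "number", str(value)
    if isinstance(value, (date, datetime)):
        return "date", value.isoformat()
    if isinstance(value, (list, tuple)):
        return "array", json.dumps(value, default=str)
    if isinstance(value, dict):
        return "object", json.dumps(value, default=str)
    if value is None:
        # Assumption: null handling is not specified; store an empty string.
        return "string", ""

    # String values: apply the rules in the documented order.
    text = str(value)
    if ISO_8601.fullmatch(text):
        return "date", text
    if INTEGER.fullmatch(text) or FLOAT.fullmatch(text):
        return "number", text
    if text.lower() in ("true", "false"):
        return "boolean", text.lower()
    return "string", text
```

The returned pair maps onto the `value_type` and `value` fields of the Property entity. Checking `bool` before `int` matters in Python because `True` would otherwise be classified as a number.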