# Data Model: Relational Document System

## Core Entities

### Document

**Purpose**: Represents a markdown file whose YAML header properties are indexed

**Fields**:

- id (string): Unique identifier, typically a hash of the file path
- file_path (string): Absolute path to the markdown file
- content_hash (string): SHA-256 hash of the file content, used for change detection
- last_modified (integer): Unix timestamp of the last file modification
- created_at (integer): Unix timestamp when the document was first indexed
- updated_at (integer): Unix timestamp of the last index update
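
The Document fields can be populated directly from the file system. A minimal Python sketch using only the standard library, assuming `id` is the SHA-256 hex digest of the absolute path (the text only says "typically file path hash") and that timestamps are whole Unix seconds; `DocumentRecord`, `build_document`, and `needs_reindex` are hypothetical names.

```python
import hashlib
import os
import time
from dataclasses import dataclass


@dataclass
class DocumentRecord:
    id: str
    file_path: str
    content_hash: str
    last_modified: int
    created_at: int
    updated_at: int


def build_document(path: str) -> DocumentRecord:
    """Create a Document record for a markdown file (hypothetical helper)."""
    abs_path = os.path.abspath(path)
    with open(abs_path, "rb") as f:
        content = f.read()
    now = int(time.time())
    return DocumentRecord(
        # Assumption: id is the SHA-256 hex digest of the absolute path.
        id=hashlib.sha256(abs_path.encode("utf-8")).hexdigest(),
        file_path=abs_path,
        # Hash of the raw bytes; a different hash means the file changed.
        content_hash=hashlib.sha256(content).hexdigest(),
        last_modified=int(os.path.getmtime(abs_path)),
        created_at=now,
        updated_at=now,
    )


def needs_reindex(record: DocumentRecord, path: str) -> bool:
    """Return True when the file content no longer matches the stored hash."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() != record.content_hash
```

Comparing `content_hash` rather than `last_modified` avoids re-indexing a file whose timestamp changed while its content did not; a cheaper variant checks `last_modified` first and hashes only when it differs.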

### Property

**Purpose**: An individual key-value pair extracted from a document's YAML header

**Fields**:

- id (string): Unique property identifier
- document_id (string): Foreign key to Document
- key (string): Property name from the YAML header
- value (string): Serialized property value
- value_type (string): Data type (string, number, boolean, date, array, object)
- created_at (integer): Unix timestamp when the property was created
- updated_at (integer): Unix timestamp of the last property update
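
Because value is stored as a string, each parsed YAML value needs one canonical text form. A minimal sketch, assuming booleans are stored as true/false, dates in ISO 8601, and arrays/objects as JSON (matching the Data Types section), and that property ids are random UUIDs; `PropertyRecord`, `serialize_value`, and `make_property` are hypothetical names.

```python
import datetime
import json
import time
import uuid
from dataclasses import dataclass
from typing import Any


@dataclass
class PropertyRecord:
    id: str
    document_id: str
    key: str
    value: str
    value_type: str
    created_at: int
    updated_at: int


def serialize_value(value: Any) -> str:
    """Convert a parsed YAML value into the string stored in `value`."""
    if isinstance(value, bool):  # check before numbers: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, datetime.date):  # also matches datetime.datetime
        return value.isoformat()
    if isinstance(value, (list, dict)):
        # default=str turns dates nested inside arrays/objects into ISO strings
        return json.dumps(value, default=str)
    return str(value)


def make_property(document_id: str, key: str, value: Any, value_type: str) -> PropertyRecord:
    now = int(time.time())
    return PropertyRecord(
        id=str(uuid.uuid4()),  # assumption: random UUIDs as property ids
        document_id=document_id,
        key=key,
        value=serialize_value(value),
        value_type=value_type,
        created_at=now,
        updated_at=now,
    )
```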

### Query

**Purpose**: Saved query definitions for reuse

**Fields**:

- id (string): Unique query identifier
- name (string): Human-readable query name
- definition (string): Query text written in the query syntax
- created_at (integer): Unix timestamp of query creation
- last_used (integer): Unix timestamp of the last query execution
- use_count (integer): Number of times the query has been executed
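
Keeping last_used and use_count current only requires an update after each execution. A minimal sketch, assuming last_used stays unset (None) until the first run; `SavedQuery`, `record_use`, and `recent` are hypothetical names, and the definition text is treated as opaque because the query grammar is not specified.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SavedQuery:
    id: str
    name: str
    definition: str
    created_at: int = field(default_factory=lambda: int(time.time()))
    last_used: Optional[int] = None  # assumption: None until first execution
    use_count: int = 0


def record_use(query: SavedQuery, now: Optional[int] = None) -> None:
    """Update usage statistics after the query has been executed."""
    query.last_used = int(time.time()) if now is None else now
    query.use_count += 1


def recent(queries: List[SavedQuery], limit: int = 5) -> List[SavedQuery]:
    """Return the most recently executed queries, newest first."""
    used = [q for q in queries if q.last_used is not None]
    return sorted(used, key=lambda q: q.last_used, reverse=True)[:limit]
```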

### Schema

**Purpose**: Metadata about property types and validation rules

**Fields**:

- property_key (string): Property name; one entry per distinct key across all documents
- detected_type (string): Most common data type observed for this property
- validation_rules (string): JSON-encoded validation rules
- document_count (integer): Number of documents containing this property
- created_at (integer): Unix timestamp when the schema entry was created
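
detected_type and document_count can both be derived from existing Property rows. A minimal sketch, assuming properties arrive as (document_id, key, value_type) tuples and that a tie between types goes to the type seen first (the text does not say how ties are broken); `build_schema` is a hypothetical helper.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_schema(properties: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, int]]:
    """Map each property key to (detected_type, document_count)."""
    type_counts: Dict[str, Counter] = defaultdict(Counter)
    documents: Dict[str, Set[str]] = defaultdict(set)
    for document_id, key, value_type in properties:
        type_counts[key][value_type] += 1
        documents[key].add(document_id)  # count distinct documents, not rows
    schema = {}
    for key, counts in type_counts.items():
        # most_common orders equal counts by first occurrence (Python 3.7+)
        detected_type = counts.most_common(1)[0][0]
        schema[key] = (detected_type, len(documents[key]))
    return schema
```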

## Relationships

### Document ↔ Property

- One-to-many: Each document can have many properties; each property belongs to exactly one document
- Cascade delete: A document's properties are removed when the document is deleted
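
The cascade rule maps directly onto a foreign key declared with ON DELETE CASCADE. A sketch using SQLite through Python's sqlite3 module; SQLite is an assumption, since the text does not name a database, and the columns follow the Document and Property field lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys (and therefore cascades) only when this
# pragma is enabled, and it must be set on every new connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE documents (
    id TEXT PRIMARY KEY,
    file_path TEXT NOT NULL UNIQUE,
    content_hash TEXT NOT NULL,
    last_modified INTEGER NOT NULL,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL
);
CREATE TABLE properties (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    key TEXT NOT NULL,
    value TEXT,
    value_type TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL
);
""")

conn.execute("INSERT INTO documents VALUES ('d1', '/notes/a.md', 'abc', 0, 0, 0)")
conn.executemany(
    "INSERT INTO properties VALUES (?, 'd1', ?, ?, 'string', 0, 0)",
    [("p1", "title", "A"), ("p2", "status", "draft")],
)
conn.execute("DELETE FROM documents WHERE id = 'd1'")
remaining = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
print(remaining)  # 0: the document's properties were deleted with it
```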

### Document ↔ Query

- Many-to-many: A query can match many documents, and a document can appear in the results of many queries
- Junction table: QueryResults stores the execution history, linking each query execution to the documents it returned
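
The QueryResults columns are not specified, so the layout below is an assumption: one row per (query, document, execution time). The sketch uses SQLite via Python's sqlite3 module, trims documents and queries to the columns it needs, and stores a placeholder definition because the query grammar is not defined here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE documents (id TEXT PRIMARY KEY, file_path TEXT NOT NULL UNIQUE);
CREATE TABLE queries (id TEXT PRIMARY KEY, name TEXT NOT NULL, definition TEXT NOT NULL);
-- Hypothetical junction layout: one row per (execution, matched document).
CREATE TABLE query_results (
    query_id TEXT NOT NULL REFERENCES queries(id) ON DELETE CASCADE,
    document_id TEXT NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    executed_at INTEGER NOT NULL,
    PRIMARY KEY (query_id, document_id, executed_at)
);
""")
conn.executemany("INSERT INTO documents VALUES (?, ?)", [("d1", "/a.md"), ("d2", "/b.md")])
conn.execute("INSERT INTO queries VALUES ('q1', 'Drafts', 'placeholder definition')")
conn.executemany(
    "INSERT INTO query_results VALUES ('q1', ?, ?)",
    [("d1", 100), ("d1", 200), ("d2", 200)],
)

# Documents returned by the most recent execution of q1.
latest = conn.execute("""
    SELECT d.file_path
    FROM query_results r JOIN documents d ON d.id = r.document_id
    WHERE r.query_id = 'q1'
      AND r.executed_at = (SELECT MAX(executed_at) FROM query_results WHERE query_id = 'q1')
    ORDER BY d.file_path
""").fetchall()
print(latest)  # [('/a.md',), ('/b.md',)]
```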

### Property ↔ Schema

- Many-to-one: All properties sharing the same key map to one schema entry (Property.key matches Schema.property_key)
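
Because property_key is unique in Schema, the many-to-one link is a plain join on the key. A SQLite sketch (the database and the table name schema_entries are assumptions) that uses the join to flag property values whose type disagrees with the detected type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    key TEXT NOT NULL,
    value TEXT,
    value_type TEXT NOT NULL
);
-- property_key is unique, so many property rows point at one schema row.
CREATE TABLE schema_entries (
    property_key TEXT PRIMARY KEY,
    detected_type TEXT NOT NULL,
    validation_rules TEXT,
    document_count INTEGER NOT NULL,
    created_at INTEGER NOT NULL
);
INSERT INTO properties VALUES
    ('p1', 'd1', 'priority', '1', 'number'),
    ('p2', 'd2', 'priority', '2', 'number'),
    ('p3', 'd3', 'priority', 'high', 'string');
INSERT INTO schema_entries VALUES ('priority', 'number', NULL, 3, 0);
""")

# Many-to-one join on the key: find values whose type differs from the schema.
mismatches = conn.execute("""
    SELECT p.document_id, p.value
    FROM properties p JOIN schema_entries s ON s.property_key = p.key
    WHERE p.value_type <> s.detected_type
""").fetchall()
print(mismatches)  # [('d3', 'high')]
```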

## Data Types

### Supported Property Types

- **string**: Text values (default type)
- **number**: Numeric values (integer or float)
- **boolean**: true/false values
- **date**: ISO 8601 date strings
- **array**: JSON-encoded arrays
- **object**: JSON-encoded objects (nested structures)
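
Reading a property back means reversing the serialization for its value_type. A minimal sketch covering the six supported types; `deserialize_value` is a hypothetical helper. Note that Python's `fromisoformat` accepts only a subset of ISO 8601 before Python 3.11 (for example, a trailing Z is rejected).

```python
import datetime
import json
from typing import Any


def deserialize_value(text: str, value_type: str) -> Any:
    """Turn a stored (value, value_type) pair back into a Python value."""
    if value_type == "number":
        # Integers carry no decimal point or exponent; anything else is a float.
        return int(text) if text.lstrip("+-").isdigit() else float(text)
    if value_type == "boolean":
        return text.lower() == "true"
    if value_type == "date":
        # Plain YYYY-MM-DD dates first; date-times fall back to datetime.
        try:
            return datetime.date.fromisoformat(text)
        except ValueError:
            return datetime.datetime.fromisoformat(text)
    if value_type in ("array", "object"):
        return json.loads(text)
    return text  # "string" (and any unknown type) stays as text
```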

### Type Detection Logic

1. Parse the YAML value with a native YAML parser
2. Apply the type detection rules:
   - Strings matching ISO 8601 format → date
   - Numeric strings without decimals → number (integer)
   - Numeric strings with decimals → number (float)
   - "true"/"false" (case-insensitive) → boolean
   - Arrays/objects → array/object
   - Everything else → string
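
The detection rules can be expressed as a single function. A minimal sketch, assuming the input is whatever the YAML parser returned: values the parser already typed (booleans, numbers, dates, lists, mappings) are classified directly, and remaining strings go through the textual rules in the order listed. The regular expressions cover a simplified subset of ISO 8601, and `detect_type` is a hypothetical name.

```python
import datetime
import re
from typing import Any

# Simplified ISO 8601: date, optional time, optional UTC offset.
ISO_8601 = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2})?)?$"
)
INTEGER = re.compile(r"^[+-]?\d+$")
FLOAT = re.compile(r"^[+-]?\d*\.\d+$")


def detect_type(value: Any) -> str:
    """Classify a parsed YAML value as one of the supported property types."""
    if isinstance(value, bool):  # before numbers: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, datetime.date):  # parsers may already yield dates
        return "date"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    text = str(value).strip()
    if ISO_8601.match(text):
        try:
            datetime.date.fromisoformat(text[:10])  # rejects e.g. 2024-13-45
            return "date"
        except ValueError:
            pass
    if INTEGER.match(text) or FLOAT.match(text):  # integer or float → number
        return "number"
    if text.lower() in ("true", "false"):
        return "boolean"
    return "string"
```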

## Indexing Strategy

### Primary Indices

- documents.file_path (unique)
- properties.document_id (foreign key)
- properties.key (for property-based queries)
- queries.id (unique)
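
A sketch of the primary indices as SQLite DDL (SQLite and the index names are assumptions; tables are trimmed to the relevant columns). SQLite automatically indexes a non-INTEGER PRIMARY KEY such as queries.id, but it does not index foreign-key columns such as properties.document_id on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id TEXT PRIMARY KEY, file_path TEXT NOT NULL);
CREATE TABLE properties (id TEXT PRIMARY KEY, document_id TEXT NOT NULL,
                         key TEXT NOT NULL, value TEXT, value_type TEXT NOT NULL);
CREATE TABLE queries (id TEXT PRIMARY KEY, name TEXT NOT NULL, last_used INTEGER);

-- documents.file_path: at most one row per file
CREATE UNIQUE INDEX idx_documents_file_path ON documents(file_path);
-- properties.document_id: foreign-key columns are not indexed automatically
CREATE INDEX idx_properties_document_id ON properties(document_id);
-- properties.key: property-based lookups such as WHERE key = 'status'
CREATE INDEX idx_properties_key ON properties(key);
-- queries.id: already indexed through its PRIMARY KEY
""")

names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx_%' ORDER BY name"
)]
print(names)  # ['idx_documents_file_path', 'idx_properties_document_id', 'idx_properties_key']
```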

### Composite Indices

- properties(document_id, key) for looking up a specific property of a document
- properties(key, value_type) for type-constrained queries
- queries(last_used) for recent-query tracking (a single-column index)
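
Whether the planner actually uses these indices can be checked with EXPLAIN QUERY PLAN, whose detail strings name the chosen index. A SQLite sketch; the index names are hypothetical and the exact detail wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE properties (id TEXT PRIMARY KEY, document_id TEXT NOT NULL,
                         key TEXT NOT NULL, value TEXT, value_type TEXT NOT NULL);
CREATE TABLE queries (id TEXT PRIMARY KEY, name TEXT NOT NULL, last_used INTEGER);

CREATE INDEX idx_properties_doc_key ON properties(document_id, key);
CREATE INDEX idx_properties_key_type ON properties(key, value_type);
CREATE INDEX idx_queries_last_used ON queries(last_used);
""")


def plan(sql: str) -> str:
    """Return SQLite's description of how it will execute `sql`."""
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


print(plan("SELECT value FROM properties WHERE document_id = 'd1' AND key = 'status'"))
print(plan("SELECT document_id FROM properties WHERE key = 'due' AND value_type = 'date'"))
print(plan("SELECT id FROM queries ORDER BY last_used DESC LIMIT 10"))
```

Because a B-tree index can serve lookups on any leftmost prefix of its columns, properties(document_id, key) can also answer queries on document_id alone, which may make a separate single-column index on properties.document_id redundant.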

## Validation Rules

### Document Validation

- File must exist and be readable
- File must begin with a YAML header enclosed in --- delimiters
- YAML header must parse without errors
- File must be UTF-8 encoded
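
The file-level checks can run before any YAML parsing. A standard-library sketch, assuming the header must start on the first line; `extract_front_matter` and `DocumentValidationError` are hypothetical names. It returns the raw header text, leaving the parse itself (for example with PyYAML's yaml.safe_load) as the final check. Decoding with utf-8-sig tolerates a leading byte-order mark while still rejecting non-UTF-8 bytes.

```python
import os


class DocumentValidationError(ValueError):
    """Raised when a file fails one of the document validation rules."""


def extract_front_matter(path: str) -> str:
    """Check existence, encoding, and delimiters; return the raw YAML header."""
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        raise DocumentValidationError(f"not a readable file: {path}")
    with open(path, "rb") as f:
        raw = f.read()
    try:
        text = raw.decode("utf-8-sig")  # strips an optional BOM
    except UnicodeDecodeError as exc:
        raise DocumentValidationError("file is not UTF-8 encoded") from exc
    lines = text.splitlines()
    # The header must open on the first line and be closed by a second '---'.
    if not lines or lines[0].strip() != "---":
        raise DocumentValidationError("missing opening '---' delimiter")
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    raise DocumentValidationError("missing closing '---' delimiter")
```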

### Property Validation

- Property keys must be non-empty strings
- Property values must be serializable to a string
- Array/object values must be valid JSON
- Date values must match ISO 8601 format
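
The property rules can be checked on the stored (key, value, value_type) triple. A minimal sketch; `validate_property` is hypothetical, the ISO 8601 pattern is a simplified subset, and checking that the JSON decodes to the matching container type goes slightly beyond the stated rule.

```python
import datetime
import json
import re
from typing import List

# Simplified ISO 8601: date, optional time, optional UTC offset.
ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2})?)?$"
)


def validate_property(key: str, value: str, value_type: str) -> List[str]:
    """Return the rule violations for one stored property (empty if valid)."""
    errors: List[str] = []
    if not isinstance(key, str) or not key.strip():
        errors.append("key must be a non-empty string")
    if not isinstance(value, str):
        errors.append("value must be serialized to a string")
        return errors
    if value_type in ("array", "object"):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            errors.append(f"{value_type} value is not valid JSON")
        else:
            expected = list if value_type == "array" else dict
            if not isinstance(parsed, expected):
                errors.append(f"JSON does not decode to an {value_type}")
    elif value_type == "date":
        ok = bool(ISO_DATE.match(value))
        if ok:
            try:
                datetime.date.fromisoformat(value[:10])  # real calendar date?
            except ValueError:
                ok = False
        if not ok:
            errors.append("date value is not ISO 8601")
    return errors
```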

### Query Validation

- Query syntax must be valid according to the defined query grammar
- Query must reference only existing property keys
- Query complexity must be within performance limits
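
The query grammar is not given, so this sketch assumes a hypothetical parsed form: a flat list of (key, operator, value) conditions. Syntax validity is approximated by an operator whitelist, existing properties are the keys known to the Schema table, and complexity is a hypothetical cap on the number of conditions. `Condition`, `validate_query`, `ALLOWED_OPERATORS`, and `MAX_CONDITIONS` are all illustrative names.

```python
from dataclasses import dataclass
from typing import List, Set

ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">=", "contains"}  # hypothetical
MAX_CONDITIONS = 10  # hypothetical complexity limit


@dataclass
class Condition:
    key: str
    operator: str
    value: str


def validate_query(conditions: List[Condition], known_keys: Set[str]) -> List[str]:
    """Check a parsed query (hypothetical AST) against the three rules."""
    errors: List[str] = []
    for c in conditions:
        if c.operator not in ALLOWED_OPERATORS:  # stand-in for a grammar check
            errors.append(f"unsupported operator {c.operator!r}")
        if c.key not in known_keys:  # e.g. Schema.property_key values
            errors.append(f"unknown property {c.key!r}")
    if len(conditions) > MAX_CONDITIONS:
        errors.append(f"too many conditions ({len(conditions)} > {MAX_CONDITIONS})")
    return errors
```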