notex.nvim/specs/002-notex-is-a/data-model.md

2025-10-05 20:16:33 -04:00
# Data Model: Relational Document System
## Core Entities
### Document
**Purpose**: Represents a markdown file with indexed properties
**Fields**:
- id (string): Unique identifier, typically a hash of the file path
- file_path (string): Absolute path to markdown file
- content_hash (string): SHA-256 hash of file content for change detection
- last_modified (integer): Unix timestamp of last file modification
- created_at (integer): Timestamp when document was first indexed
- updated_at (integer): Timestamp of last index update
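
As a rough illustration of how these fields might be populated, the sketch below builds a Document record from a file on disk. The function names (`build_document_record`, `has_changed`) and the choice to truncate the path hash to 16 hex characters are assumptions made for this example, not part of the specification.

```python
import hashlib
import os
import time


def build_document_record(file_path):
    """Build a dict with the Document fields for one markdown file."""
    abs_path = os.path.abspath(file_path)
    with open(abs_path, "rb") as fh:
        content = fh.read()
    now = int(time.time())
    return {
        # Assumption: id is the SHA-256 of the absolute path, truncated to 16 hex chars.
        "id": hashlib.sha256(abs_path.encode("utf-8")).hexdigest()[:16],
        "file_path": abs_path,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "last_modified": int(os.path.getmtime(abs_path)),
        "created_at": now,
        "updated_at": now,
    }


def has_changed(record, file_path):
    """Return True if the file content no longer matches the stored content_hash."""
    with open(file_path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest() != record["content_hash"]
```

Comparing `content_hash` rather than `last_modified` avoids re-indexing a file that was only touched, at the cost of reading the file on every check.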
### Property
**Purpose**: Individual key-value pairs extracted from YAML headers
**Fields**:
- id (string): Unique property identifier
- document_id (string): Foreign key to Document
- key (string): Property name from YAML header
- value (string): Serialized property value
- value_type (string): Data type (string, number, boolean, date, array, object)
- created_at (integer): Timestamp when property was created
- updated_at (integer): Timestamp of last property update
### Query
**Purpose**: Saved query definitions for reuse
**Fields**:
- id (string): Unique query identifier
- name (string): Human-readable query name
- definition (string): Query syntax definition
- created_at (integer): Query creation timestamp
- last_used (integer): Timestamp of last query execution
- use_count (integer): Number of times query has been executed
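
The usage-tracking fields could be maintained with a single update per execution. The following is a minimal sketch assuming a SQLite-style store with a `queries` table; the table layout and the helper names are hypothetical.

```python
import sqlite3
import time


def create_queries_table(conn):
    """Create a queries table mirroring the Query fields (assumed column types)."""
    conn.execute(
        """
        CREATE TABLE queries (
            id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            definition TEXT NOT NULL,
            created_at INTEGER NOT NULL,
            last_used INTEGER,
            use_count INTEGER NOT NULL DEFAULT 0
        )
        """
    )


def record_query_execution(conn, query_id, now=None):
    """Bump use_count and set last_used for one saved query."""
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "UPDATE queries SET last_used = ?, use_count = use_count + 1 WHERE id = ?",
        (now, query_id),
    )
    if cur.rowcount == 0:
        raise KeyError(f"unknown query id: {query_id}")
    conn.commit()
```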
### Schema
**Purpose**: Metadata about property types and validation rules
**Fields**:
- property_key (string): Property name across documents
- detected_type (string): Most common data type for this property
- validation_rules (string): JSON-encoded validation rules
- document_count (integer): Number of documents containing this property
- created_at (integer): Timestamp when schema entry was created
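
One way to derive `detected_type` and `document_count` is to aggregate over all indexed properties. The sketch below assumes the properties are available as `(document_id, key, value_type)` tuples; `build_schema_entries` is a hypothetical helper, and ties between equally common types are broken by first occurrence.

```python
from collections import Counter, defaultdict


def build_schema_entries(properties):
    """Aggregate (document_id, key, value_type) tuples into schema entries."""
    type_counts = defaultdict(Counter)
    docs_per_key = defaultdict(set)
    for document_id, key, value_type in properties:
        type_counts[key][value_type] += 1
        docs_per_key[key].add(document_id)
    return {
        key: {
            "property_key": key,
            # Most common type observed for this key across all documents.
            "detected_type": counts.most_common(1)[0][0],
            # Number of distinct documents that contain this key.
            "document_count": len(docs_per_key[key]),
        }
        for key, counts in type_counts.items()
    }
```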
## Relationships
### Document ↔ Property
- One-to-many: Each document has multiple properties
- Cascade delete: Properties are removed when document is deleted
### Document ↔ Query
- Many-to-many: A query can match multiple documents, and a document can appear in the results of multiple queries
- Junction table: QueryResults links queries to documents and stores execution history
### Property ↔ Schema
- Many-to-one: Multiple properties with same key map to one schema entry
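
The Document ↔ Property relationship with cascade delete can be expressed directly in a relational schema. This is a minimal sketch assuming SQLite as the backing store (the specification does not name one); the columns are trimmed to those needed for the demonstration, and `demo_cascade_delete` is a hypothetical name.

```python
import sqlite3


def demo_cascade_delete():
    """Return the property count before and after deleting the owning document."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE documents (
            id TEXT PRIMARY KEY,
            file_path TEXT NOT NULL UNIQUE
        );
        CREATE TABLE properties (
            id TEXT PRIMARY KEY,
            document_id TEXT NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
            key TEXT NOT NULL,
            value TEXT,
            value_type TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO documents VALUES ('d1', '/notes/a.md')")
    conn.executemany(
        "INSERT INTO properties VALUES (?, ?, ?, ?, ?)",
        [
            ("p1", "d1", "status", "open", "string"),
            ("p2", "d1", "priority", "2", "number"),
        ],
    )
    before = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
    conn.execute("DELETE FROM documents WHERE id = 'd1'")
    after = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]
    conn.close()
    return before, after
```

Without the `PRAGMA foreign_keys = ON` line, SQLite would accept the schema but leave the orphaned property rows in place.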
## Data Types
### Supported Property Types
- **string**: Text values (default type)
- **number**: Numeric values (integer or float)
- **boolean**: true/false values
- **date**: ISO 8601 date strings
- **array**: JSON-encoded arrays
- **object**: JSON-encoded objects (nested structures)
### Type Detection Logic
1. Parse YAML value using native YAML parser
2. Apply type detection rules:
   - Strings matching ISO 8601 format → date
   - Numeric strings without decimals → number (integer)
   - Numeric strings with decimals → number (float)
   - "true"/"false" (case insensitive) → boolean
   - Arrays/objects → respective types
   - Everything else → string
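
The detection rules above can be sketched as a single function operating on an already-parsed YAML value. The function name `detect_type` and the exact regular expressions are assumptions; in particular, the ISO 8601 pattern here covers only the common `YYYY-MM-DD` form with an optional time and offset, not the full standard.

```python
import re
from datetime import date

_ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)?$"
)
_INTEGER = re.compile(r"^[+-]?\d+$")
_FLOAT = re.compile(r"^[+-]?\d+\.\d+$")


def _is_iso_date(text):
    if not _ISO_DATE.match(text):
        return False
    try:
        date.fromisoformat(text[:10])  # rejects impossible dates such as 2025-13-45
    except ValueError:
        return False
    return True


def detect_type(value):
    """Map a parsed YAML value to one of the supported property type names."""
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, date):  # also covers datetime, a subclass of date
        return "date"
    if isinstance(value, str):
        text = value.strip()
        if _is_iso_date(text):
            return "date"
        if _INTEGER.match(text) or _FLOAT.match(text):
            return "number"
        if text.lower() in ("true", "false"):
            return "boolean"
    return "string"
```

The date check runs before the numeric check so that the rule order in the list is preserved; a value that looks like a date but is not a real calendar date falls through to `string`.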
## Indexing Strategy
### Primary Indices
- documents.file_path (unique)
- properties.document_id (foreign key)
- properties.key (for property-based queries)
- queries.id (unique)
### Composite Indices
- properties(document_id, key) for fast document property lookup
- properties(key, value_type) for type-constrained queries
- queries(last_used) for recent query tracking
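
For illustration, the indices listed above might be declared as follows, again assuming a SQLite-style store. Table layouts are abbreviated and the `idx_*` names are hypothetical. In SQLite, `queries.id` needs no explicit index because a `TEXT PRIMARY KEY` column already receives an automatic unique index.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (id TEXT PRIMARY KEY, file_path TEXT NOT NULL);
CREATE TABLE properties (
    id TEXT PRIMARY KEY,
    document_id TEXT NOT NULL,
    key TEXT NOT NULL,
    value TEXT,
    value_type TEXT NOT NULL
);
CREATE TABLE queries (
    id TEXT PRIMARY KEY, name TEXT, definition TEXT, last_used INTEGER
);

-- Primary indices
CREATE UNIQUE INDEX idx_documents_file_path ON documents(file_path);
CREATE INDEX idx_properties_document_id ON properties(document_id);
CREATE INDEX idx_properties_key ON properties(key);

-- Composite indices
CREATE INDEX idx_properties_document_key ON properties(document_id, key);
CREATE INDEX idx_properties_key_type ON properties(key, value_type);
CREATE INDEX idx_queries_last_used ON queries(last_used);
"""


def create_schema(conn):
    """Create the abbreviated tables and all listed indices."""
    conn.executescript(SCHEMA)


def list_named_indices(conn):
    """Return the names of explicitly created indices (those prefixed idx)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'"
    ).fetchall()
    return {name for (name,) in rows}
```

Because a B-tree index on `(document_id, key)` can also serve lookups on `document_id` alone, an implementation may find the single-column `document_id` index redundant; whether to keep it is a judgment call that depends on the workload.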
## Validation Rules
### Document Validation
- File must exist and be readable
- File must have valid YAML header (--- delimiters)
- YAML must parse without errors
- File must be UTF-8 encoded
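
A minimal sketch of these document checks follows. It covers readability, UTF-8 decoding, and the `---` delimiters; parsing the YAML itself is left out because it would require a YAML library, which this example does not assume. Both function names are hypothetical.

```python
import os


def split_front_matter(text):
    """Return (header, body) if text opens with a '---' delimited block, else None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return None  # opening delimiter without a closing one


def validate_document(path):
    """Return a list of validation error messages (empty if the checks pass)."""
    if not os.path.isfile(path) or not os.access(path, os.R_OK):
        return ["file does not exist or is not readable"]
    with open(path, "rb") as fh:
        raw = fh.read()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    if split_front_matter(text) is None:
        return ["missing YAML header delimited by ---"]
    return []
```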
### Property Validation
- Property keys must be non-empty strings
- Property values must be serializable
- Array/object values must be valid JSON
- Date values must match ISO 8601 format
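
The property rules could be checked along these lines. This sketch assumes the value has already been serialized to a string, as in the Property entity; `validate_property` is a hypothetical name, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on older Python versions.

```python
import json
from datetime import datetime


def validate_property(key, value, value_type):
    """Return a list of validation error messages for one serialized property."""
    errors = []
    if not isinstance(key, str) or not key.strip():
        errors.append("key must be a non-empty string")
    if value_type in ("array", "object"):
        expected = list if value_type == "array" else dict
        try:
            if not isinstance(json.loads(value), expected):
                errors.append(f"value is not a JSON {value_type}")
        except (TypeError, ValueError):  # JSONDecodeError is a ValueError
            errors.append("value is not valid JSON")
    elif value_type == "date":
        try:
            datetime.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append("value is not an ISO 8601 date")
    return errors
```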
### Query Validation
- Query syntax must be valid according to defined grammar
- Query must reference existing properties
- Query complexity must be within performance limits