Gap Analysis

This document identifies systemic gaps in current personal data operations implementations, derived from evaluating 5 major systems against our principles.

Methodology: Cross-system analysis identifying principles that few or none of the evaluated systems adequately address.

Critical Gaps (P2 Scores Across All Systems)

These are must-have principles where nearly every evaluated system is weak: for each gap below, at most two of the five systems reach an adequate score (2).

GAP-1: Temporal Integrity (P2)

Scores: Obsidian (0), Roam (1), atproto (2), Solid (0), Notion (1)
Only adequate implementation: atproto

The Problem: Most personal knowledge systems treat time as an afterthought. Version history is external (git), limited (recent changes only), or non-existent. Understanding how knowledge evolved - "what did I think when?" - is core to reflection but unsupported.

Why This Matters:

Can't trace conceptual evolution (UC-1)
Can't do time-travel queries (R4)
Can't maintain provenance chains (R2)
Reflection (T7) is crippled

What's Missing:

Immutable or append-only storage models
First-class versioning at mnemegram level
Temporal indexing for "state at time T" queries
Automatic, precise timestamps on every change

Engineering Implications:

Need event sourcing or commit-based storage
Need temporal indexes in query layer
Need UI for time-travel exploration
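
To make the "state at time T" idea concrete, here is a minimal sketch of an append-only version store with a temporal lookup. It is an illustration under stated assumptions, not a description of any evaluated system: the names VersionedStore, write, and state_at are hypothetical, versions are assumed to arrive in time order, and everything is held in memory.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class VersionedStore:
    """Append-only store: every write adds a version, nothing is overwritten."""
    # mnemegram id -> parallel lists of timestamps and bodies, kept in time order
    _times: dict = field(default_factory=dict)
    _bodies: dict = field(default_factory=dict)

    def write(self, mnemegram_id: str, body: str, at: datetime) -> None:
        times = self._times.setdefault(mnemegram_id, [])
        if times and at < times[-1]:
            raise ValueError("versions must be appended in time order")
        times.append(at)
        self._bodies.setdefault(mnemegram_id, []).append(body)

    def state_at(self, mnemegram_id: str, t: datetime) -> Optional[str]:
        """Return the latest version written at or before t, or None."""
        times = self._times.get(mnemegram_id, [])
        i = bisect_right(times, t)  # binary search over the temporal index
        return self._bodies[mnemegram_id][i - 1] if i else None


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


store = VersionedStore()
store.write("note-1", "First draft", utc(2023, 1, 10))
store.write("note-1", "Revised view", utc(2023, 6, 1))
earlier = store.state_at("note-1", utc(2023, 3, 1))  # the first draft
```

The sorted timestamp list acts as the temporal index, so a time-travel query is a binary search rather than a scan of the full history.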

Who Gets It Right:

atproto: MST with signed commits, full history
Git (as external tool): Commit model with branching

GAP-2: Provenance Traceability (P12)

Scores: Obsidian (0), Roam (0), atproto (2), Solid (1), Notion (0)
Only adequate implementation: atproto

The Problem: Where did this mnemegram come from? What influenced it? What did it influence? These are basic questions, but almost no system tracks provenance automatically. It's left to manual tagging or external tools.

Why This Matters:

Can't trace information genealogy (UC-15)
Can't distinguish AI-generated vs human content (R55)
Can't link decisions to outcomes (R82)
Accountability (T5) is impossible

What's Missing:

Automatic capture of derivation chains
"Generated from" relationships
Source attribution for ingested content
Clear marking of AI vs human authorship

Engineering Implications:

Need graph model with derivation edges
Need metadata on all mnemegrams: created_by, derived_from, influenced_by
Need UI for provenance visualization
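
The metadata fields named above can be sketched as a small record type plus a walk over derivation edges. This is a hedged illustration only: the Mnemegram class, the "human:" / "ai:" prefix convention for created_by, and the helper names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Mnemegram:
    id: str
    created_by: str  # assumed convention: "human:<name>" or "ai:<tool>"
    derived_from: list = field(default_factory=list)
    influenced_by: list = field(default_factory=list)


def ancestry(graph: dict, start: str) -> list:
    """Follow derived_from edges and return each known ancestor id once."""
    seen = []
    stack = list(graph[start].derived_from)
    while stack:
        current = stack.pop()
        # Skip repeats and ids that are not stored in this graph.
        if current in seen or current not in graph:
            continue
        seen.append(current)
        stack.extend(graph[current].derived_from)
    return seen


def involves_ai(graph: dict, start: str) -> bool:
    """True if the mnemegram or any ancestor was authored by an AI agent."""
    ids = [start] + ancestry(graph, start)
    return any(graph[i].created_by.startswith("ai:") for i in ids)


graph = {
    "article": Mnemegram("article", "human:alice"),
    "summary": Mnemegram("summary", "ai:summarizer", derived_from=["article"]),
    "decision": Mnemegram("decision", "human:alice", derived_from=["summary"]),
}
chain = ancestry(graph, "decision")
```

Because the edges are captured at creation time, the question "what did this come from?" becomes a graph traversal instead of a manual tagging exercise.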

Who Gets It Right:

atproto: Cryptographic commits create provenance chain
Scientific notebooks: Lab notebooks have provenance by practice

GAP-3: Contextual Access Control (P10)

Scores: Obsidian (0), Roam (0), atproto (0), Solid (2), Notion (2)
Adequate implementations: Solid, Notion (Notion is vendor-locked)

The Problem: Most personal knowledge systems are either all-private (Obsidian) or all-shared (Roam workspace). Mnemegrams that exist in multiple contexts (work/personal) have no way to express "this colleague can see my work notes but not my personal journal."

Why This Matters:

Can't share subsets selectively (UC-2)
Can't partition work/personal (UC-8)
Can't do contextual AI access (UC-12)
Communion (T9) requires trust boundaries

What's Missing:

Fine-grained ACLs at mnemegram level
Context metadata (work, personal, family, etc.)
Capability-based delegation
Temporal access grants (expire after condition)

Engineering Implications:

Need ACL system or capability tokens
Need context as first-class metadata
Need policy evaluation engine
Need audit logs
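
A minimal sketch of how context metadata, expiring grants, policy evaluation, and an audit log might fit together is shown below. The Grant and PolicyEngine names are hypothetical, and the rule that a grant for any one of a mnemegram's contexts is sufficient is an assumption; a stricter design could require a grant for every context.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Grant:
    agent: str
    context: str                        # e.g. "work", "personal", "family"
    expires: Optional[datetime] = None  # None means the grant does not expire


@dataclass
class PolicyEngine:
    grants: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def can_read(self, agent: str, contexts: set, now: datetime) -> bool:
        """Deny by default; allow if an unexpired grant matches any context."""
        allowed = any(
            g.agent == agent
            and g.context in contexts
            and (g.expires is None or now < g.expires)
            for g in self.grants
        )
        # Every decision is recorded, whether access was allowed or denied.
        self.audit_log.append((now, agent, sorted(contexts), allowed))
        return allowed


now = datetime(2024, 5, 1, tzinfo=timezone.utc)
engine = PolicyEngine(grants=[
    Grant("colleague", "work"),
    Grant("ai-assistant", "work", expires=datetime(2024, 4, 1, tzinfo=timezone.utc)),
])
work_ok = engine.can_read("colleague", {"work"}, now)
journal_ok = engine.can_read("colleague", {"personal"}, now)
```

This mirrors the example in the problem statement: the colleague can read work notes but not the personal journal, and the expired grant for the AI assistant no longer applies.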

Who Gets It Right:

Solid: WAC (Web Access Control) with per-resource ACLs
Notion: Granular workspace permissions (but proprietary)

GAP-4: Proactive Surfacing (P11)

Scores: Obsidian (1), Roam (2), atproto (1), Solid (0), Notion (0)
Only adequate implementation: Roam

The Problem: Most systems are query-driven: you ask, system responds. But valuable knowledge often isn't retrieved because you don't know to ask. Proactive surfacing - "here's something relevant you didn't ask for" - is rare.

Why This Matters:

Can't discover non-obvious connections (R17)
Can't get relationship maintenance reminders (UC-10)
Can't surface patterns without explicit query (UC-17)
Orientation (T4) suffers

What's Missing:

Recommendation engines based on context
Anomaly detection ("this is unusual")
Time-based triggers ("you haven't reviewed X in 3 months")
Pattern recognition ("you always feel better after Y")

Engineering Implications:

Need background computation for recommendations
Need pattern detection algorithms
Need context awareness (what's the agent doing now?)
Need push notification system
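
The simplest of these mechanisms, a time-based trigger, can be sketched in a few lines. The function name stale_items and the 90-day default (standing in for "3 months") are assumptions for illustration; a real system would run this in a background process and feed the result to a notification channel.

```python
from datetime import datetime, timedelta, timezone


def stale_items(last_reviewed: dict, now: datetime,
                max_age: timedelta = timedelta(days=90)) -> list:
    """Return ids whose last review is older than max_age, oldest first."""
    overdue = [(ts, item) for item, ts in last_reviewed.items()
               if now - ts > max_age]
    return [item for ts, item in sorted(overdue)]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
last_reviewed = {
    "reading-list": datetime(2024, 5, 20, tzinfo=timezone.utc),
    "career-goals": datetime(2023, 11, 1, tzinfo=timezone.utc),
    "family-recipes": datetime(2024, 1, 15, tzinfo=timezone.utc),
}
to_surface = stale_items(last_reviewed, now)
```

Recommendation and pattern recognition need far more machinery, but they share this shape: a background job evaluates stored context against a rule or model and pushes the result without being asked.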

Who Gets It Right:

Roam: Linked references sidebar, query embeds
Modern email: "Smart compose" suggestions

Significant Gaps (Majority of Systems Weak)

GAP-5: Heterogeneous Integration (P13)

Scores: Obsidian (1), Roam (0), atproto (0), Solid (2), Notion (1)

The Problem: Personal knowledge spans text, images, locations, biometrics, communications. Most systems are text-first with media as attachments. Cross-type queries ("where was I when I wrote about X?") are impossible.

What's Missing:

Unified graph across data types
Entity resolution (same person across systems)
Spatiotemporal queries
Biometric-knowledge correlation

Engineering Implications:

Need polymorphic node types in graph
Need cross-type indexing
Need query language spanning modalities
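
As a rough illustration of a cross-type query, the sketch below answers "where was I when I wrote about X?" by joining notes against a location log on time. The Note and LocationFix types and the where_was_i function are hypothetical, and "most recent known place at or before the note's timestamp" is an assumed matching rule.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Note:
    written_at: datetime
    text: str


@dataclass(frozen=True)
class LocationFix:
    seen_at: datetime
    place: str


def where_was_i(notes: list, fixes: list, keyword: str) -> list:
    """Pair each note mentioning keyword with the latest earlier location fix."""
    ordered = sorted(fixes, key=lambda f: f.seen_at)
    times = [f.seen_at for f in ordered]
    results = []
    for note in notes:
        if keyword.lower() not in note.text.lower():
            continue
        i = bisect_right(times, note.written_at)
        place = ordered[i - 1].place if i else None  # None: no fix before the note
        results.append((note.text, place))
    return results


def at(day: int, hour: int) -> datetime:
    return datetime(2023, 9, day, hour, tzinfo=timezone.utc)


fixes = [LocationFix(at(1, 8), "Berlin"), LocationFix(at(3, 9), "New York")]
notes = [
    Note(at(2, 10), "Thoughts on event sourcing"),
    Note(at(4, 15), "More on event sourcing and CRDTs"),
    Note(at(4, 16), "Grocery list"),
]
answers = where_was_i(notes, fixes, "event sourcing")
```

The join only works because both record types carry comparable timestamps, which is the practical meaning of cross-type indexing here.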

GAP-6: Collective Possibility (P7)

Scores: Obsidian (0), Roam (1), atproto (2), Solid (1), Notion (2)

The Problem: Personal knowledge systems are designed for individuals. Multi-generational memory (UC-5), collaborative research (UC-14), and family archives all need multi-agent models, but most systems bolt collaboration on afterward.

What's Missing:

Clear attribution (who said what)
Consensus mechanisms
Personal vs shared mnemegram distinction
Generational handoff (beyond creator lifetime)

Engineering Implications:

Need multi-agent authorship model
Need assertions with agent attribution
Need merge/fork operations
Need permission models that include "family" or "team"

GAP-7: Friction Minimization (P5)

Scores: Obsidian (1), Roam (2), atproto (1), Solid (0), Notion (2)

The Problem: A comprehensive memex requires capturing everything, but manual entry doesn't scale. Automation and integration are treated as afterthoughts rather than core features, so data exhaust goes uncaptured.

What's Missing:

Passive capture from communication platforms
Automated biometric/location logging
Browser history integration
Low-friction mobile capture

Engineering Implications:

Need API integrations as first-class
Need background capture processes
Need mobile-first capture UX
Need tolerance for messy, unstructured input

Architectural Patterns Missing from All Systems

PATTERN-1: Event Sourcing for Personal Knowledge

No system uses event sourcing for personal knowledge management, yet it solves multiple gaps:

Temporal integrity (GAP-1): Full history by design
Provenance (GAP-2): Events form derivation chain
Accountability (T5): Every change is an event, so the log doubles as an audit trail

Why Not Adopted:

Query complexity (need materialized views)
Storage costs (every edit = new event)
Unfamiliar to PKM community

Potential:

Combine event stream with current-state cache
Use CRDTs for multi-device sync
Enable "replay" for understanding evolution

PATTERN-2: Capability-Based Sharing

Only atproto hints at this (with its token model), but no evaluated system offers full capability-based access control for personal knowledge.

What It Would Enable:

Granular delegation (R25)
Time-limited sharing
Revocable access without centralized server
"Give this AI access to these 10 mnemegrams for this task"

Why Not Adopted:

Complexity of implementation
UX challenges (managing capabilities)
PKM tools prioritize simplicity

Potential:

Use UCAN (User Controlled Authorization Networks)
Capabilities as shareable tokens
Eliminates need for centralized ACL server
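
To show what "capabilities as shareable tokens" could look like, here is a deliberately simplified sketch using an HMAC-signed token. It is not UCAN: UCAN relies on public-key signatures and delegation chains, whereas this example assumes the issuer and verifier share a secret. The claim names (aud, res, exp) and function names are hypothetical, and revocation is not covered.

```python
import base64
import hashlib
import hmac
import json


def issue(secret: bytes, audience: str, resources: list, expires_at: float) -> str:
    """Create a token granting audience access to resources until expires_at."""
    claims = {"aud": audience, "res": sorted(resources), "exp": expires_at}
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def allows(secret: bytes, token: str, audience: str,
           resource: str, now: float) -> bool:
    """Check signature, audience, resource, and expiry; deny on any failure."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # malformed token or invalid base64
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    return (claims["aud"] == audience
            and resource in claims["res"]
            and now < claims["exp"])


secret = b"example-only-secret"
token = issue(secret, "ai:research-helper", ["m-1", "m-2"], expires_at=2000.0)
granted = allows(secret, token, "ai:research-helper", "m-1", now=1000.0)
```

The token itself carries the grant ("this AI, these mnemegrams, until this time"), so checking it needs no lookup in a central ACL table.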

PATTERN-3: Federated Personal Data Servers

atproto and Solid both attempt this, but neither is designed for personal knowledge:

atproto: Social-first, not knowledge-first
Solid: Too complex, poor performance

What's Missing:

PDS designed for personal knowledge graph, not social posts
Efficient query federation
Standardized personal knowledge schemas (lexicons)

Potential:

Personal data server that speaks both atproto and Solid protocols
AppViews specialized for personal knowledge (not social feeds)
Local-first PDS with optional federation

PATTERN-4: Hybrid Storage (Immutable + Mutable)

atproto does this (content-addressed blocks + mutable pointers), but others don't:

Obsidian: Pure mutable (files on disk)
Roam: Mutable cloud database
Solid: Mutable pods

What Hybrid Enables:

Immutability for provenance/history
Mutability for "current state" UX
Content deduplication
Cryptographic verification

Potential:

IPFS or similar for immutable content
Mutable indexes pointing to immutable blocks
Best of both worlds
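
The "mutable indexes pointing to immutable blocks" idea can be sketched with a content-addressed dictionary and a table of named heads. This is an in-memory illustration under assumed names (HybridStore, put, current, history, verify); it does not model IPFS or atproto's actual data structures.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class HybridStore:
    blocks: dict = field(default_factory=dict)  # digest -> bytes, never overwritten
    heads: dict = field(default_factory=dict)   # name -> digests, newest last

    def put(self, name: str, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blocks.setdefault(digest, content)         # identical content stored once
        self.heads.setdefault(name, []).append(digest)  # move the mutable pointer
        return digest

    def current(self, name: str) -> bytes:
        """Mutable view: whatever the name points at right now."""
        return self.blocks[self.heads[name][-1]]

    def history(self, name: str) -> list:
        """Immutable view: every version the name has pointed at."""
        return [self.blocks[d] for d in self.heads[name]]

    def verify(self, digest: str) -> bool:
        """Recompute the hash to confirm the block was not altered."""
        return hashlib.sha256(self.blocks[digest]).hexdigest() == digest


store = HybridStore()
first = store.put("journal", b"Day one")
store.put("journal", b"Day one, revised")
store.put("copy-of-journal", b"Day one")  # deduplicated against the first block
```

Three writes produce only two blocks, and any block can be checked against its own address, which covers the deduplication and verification points in the list above.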

PATTERN-5: Context-Aware Indexing

Nobody does this well. Indexing is either:

Full-text (Obsidian, Notion)
Graph-based (Roam, limited)
Triple-store (Solid, slow)

What Context-Aware Would Enable:

"Find notes written when I was in NYC"
"Show me mnemegrams from anxious periods"
"What was I working on during Q3 2023?"

Missing Technical Pieces:

Spatiotemporal indexes
Emotional/mental state as index dimension
Multi-dimensional query language
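
One way to picture a multi-dimensional index is a map from (dimension, value) pairs to sets of mnemegram ids, queried by intersection. The sketch below is only that picture: the ContextIndex class and the place, mood, and quarter labels are hypothetical, and how a mental state would be captured in the first place is outside its scope.

```python
from collections import defaultdict


class ContextIndex:
    """Index mnemegram ids by arbitrary context dimensions."""

    def __init__(self) -> None:
        self._index = defaultdict(set)  # (dimension, value) -> set of ids

    def add(self, mnemegram_id: str, **context: str) -> None:
        for dimension, value in context.items():
            self._index[(dimension, value)].add(mnemegram_id)

    def query(self, **context: str) -> set:
        """Return ids matching every requested dimension (logical AND)."""
        if not context:
            return set()
        matches = [self._index.get((d, v), set()) for d, v in context.items()]
        return set.intersection(*matches)


index = ContextIndex()
index.add("n1", place="NYC", mood="anxious", quarter="2023-Q3")
index.add("n2", place="NYC", mood="calm", quarter="2023-Q3")
index.add("n3", place="Lisbon", mood="anxious", quarter="2023-Q4")

nyc_notes = index.query(place="NYC")
anxious_in_q3 = index.query(mood="anxious", quarter="2023-Q3")
```

Exact-match labels keep the example short; real spatiotemporal indexing would need range and proximity queries rather than fixed place names and quarter strings.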

Priority Gaps for Experimentation

Based on:

How many use cases affected
How many requirements unsatisfied
How many principles violated
Feasibility of improvement

HIGHEST PRIORITY

P1: Temporal Integrity + Provenance (GAP-1 + GAP-2)

Affects: 8 use cases, 15 requirements, 2 principles
Current state: Only atproto adequate
Experiment: Event-sourced personal knowledge store
Technology: Append-only log (SSB-style) or commit model (atproto-style)

HIGH PRIORITY

P2: Contextual Access Control (GAP-3)

Affects: 5 use cases, 12 requirements, 2 principles
Current state: Only Solid and Notion adequate (Notion is vendor-locked)
Experiment: Capability-based sharing for personal knowledge
Technology: UCAN, Macaroons, or custom capability system

MEDIUM PRIORITY

P3: Proactive Surfacing (GAP-4)

Affects: 4 use cases, 6 requirements, 1 principle
Current state: Only Roam adequate
Experiment: Recommendation engine for personal knowledge
Technology: Graph algorithms, pattern matching, LLM embeddings

LOWER PRIORITY (But Important)

P4: Heterogeneous Integration (GAP-5)
P5: Friction Minimization (GAP-7)
P6: Collective Possibility (GAP-6)

Questions for the Working Group

Which gaps matter most to your use cases?
Are there systems we didn't evaluate that address these gaps?
Which experiments should we prioritize?
What existing technologies could be adapted? (e.g., Git for temporal integrity, UCAN for access control)

Cross-References

system-evaluation - Detailed scoring of 5 systems
principles - 15 principles these gaps violate
requirements - Specific requirements unsatisfied
glossary-engineering - Technical terms for solutions
