Onboarding: Personal Data Operations Domain

Welcome! This document helps you understand what we're doing, why it matters, and how to contribute.

Start Here: The Big Picture

What Is This Domain?

Personal Data Operations is the emerging practice of managing individual knowledge at scale - with the rigor of data engineering, the sovereignty of personal computing, and the richness of semantic knowledge representation.

It sits at the intersection of:

Personal Knowledge Management (PKM) - Tools like Obsidian, Roam, Zettelkasten
Decentralized Identity & Data - Protocols like atproto, Solid, DIDs
Data Engineering - Practices from databases, data ops, event sourcing
Semantic Web - Standards like RDF, linked data, ontologies

Why Does This Matter?

The Problem:

We generate massive amounts of personal data (notes, messages, locations, health metrics)
Current tools either lock you in (Notion, Roam) or lack structure (plain files)
No single system handles sovereignty, semantics, scale, and longevity together
Collaboration is bolted on, not designed in
Time and provenance are afterthoughts

The Opportunity:

Build infrastructure that outlives any single tool
Enable true data sovereignty while supporting collaboration
Make decades of personal knowledge queryable and generative
Learn from enterprise data ops but adapt for personal scale
Create a foundation for AI assistants that truly know your context

What Are We Building Toward?

Not a single product, but a domain of practice with:

Shared vocabulary (concepts, principles, patterns)
Requirements and specifications grounded in real use cases
Evaluation criteria for existing and future systems
Experiments and prototypes exploring key tensions
Community knowledge for practitioners

Essential Concepts

These are the conceptual building blocks you need to understand our work.

From Memex Ontology

We ground our work in the Memex Ontology, which defines:

Mnemegram: The shadow of experience cast into information space - the fundamental unit of capture.

Agent: That which interacts with the memex - you, your future self, your family, an AI assistant, etc.

Essential Properties (E1-E6): What something must be to qualify as memex.

Functions (F1-F9): What memex does.

Telos (T1-T11): What memex is for.

Key Insight: Essential properties, functions, and telos co-determine each other. No layer is foundational - they shape each other.

Core Tensions

Personal data operations involves navigating inherent tradeoffs:

Flexibility vs Queryability: More flexible schemas make querying harder
Privacy vs Discoverability: Encryption limits what can be indexed
Portability vs Performance: Generic formats are slower than optimized ones
Immutability vs Right-to-Delete: Append-only conflicts with true deletion
Capture Friction vs Data Quality: Easy capture means messier data

We don't "solve" these tensions - we understand them and make informed tradeoffs.

Adjacent Domains You Should Know

Personal Knowledge Management (PKM):

Zettelkasten method
Tools: Obsidian, Roam, Logseq, Notion
Community: r/Zettelkasten, r/ObsidianMD
Key figures: Niklas Luhmann, Sönke Ahrens

Decentralized Web / Web3:

atproto (Bluesky)
Solid (Tim Berners-Lee)
IPFS, Dat/Hypercore
DIDs (Decentralized Identifiers)

Data Engineering:

Event sourcing
CQRS (Command Query Responsibility Segregation)
CRDTs (Conflict-Free Replicated Data Types)
Data lineage and provenance

Semantic Web:

RDF (Resource Description Framework)
SPARQL query language
Linked data principles
Schema.org, FOAF, Dublin Core

You don't need to be an expert in all of these, but knowing they exist helps you see where ideas come from.

How We Work: The Methodology

We follow W3C working group methodology:

Use Cases - Concrete scenarios from real needs
Requirements - What must systems do to satisfy use cases
Principles - Architectural guidelines derived from requirements
Specifications - (Future) Concrete designs that satisfy principles
Implementations - (Future) Working prototypes and systems

Current Status: We've completed use cases → requirements → principles. Now moving to gap analysis and experiments.

What We've Built So Far

Core Documents

use-cases - 19 scenarios covering individual to collective use

Read UC-1 (Researcher traces evolution) and UC-6 (Activist under threat) for range

requirements - 90 requirements organized by domain

Scan the functional domains, note which resonate with your interests

principles - 15 architectural principles with interdependencies

Start with the "Essential" principles (P1, P6, P8)
Note which principles conflict (these are key tensions)

system-evaluation - How existing systems score against principles

Reveals what's missing from current tools
Shows why no single system is adequate

gap-analysis - Systematic identification of universal weaknesses

GAP-1 (Temporal Integrity) and GAP-2 (Provenance) are highest priority
Shows where innovation is most needed

glossary-engineering - Technical terms that actually matter

Only terms implicated by our analysis, not generic survey
Organized by which gaps/principles they address

Supplemental Documents

storage-models - Content-addressed, mutable, append-only, hybrid
schema-approaches - Ontologies, lexicons, property graphs, emergent
atproto-analysis - Deep dive on Bluesky's architecture
design-tradeoffs - 10 core tensions in the domain

How to Contribute

Phase 1: Understand the Domain (First 1-2 weeks)

Essential Reading (in order):

Read An Ontology of Memex - The foundation
Skim use-cases - Pick 2-3 that resonate with your experience
Read principles - Focus on Essential and Core principles
Skim gap-analysis - Understand what's missing

Optional Depth:

system-evaluation if you use Obsidian/Roam/Notion and want to understand their limitations
glossary-engineering for terms you encounter and don't recognize
atproto-analysis if decentralized protocols interest you

Check Your Understanding:

Can you explain what a "mnemegram" is to someone unfamiliar?
Can you name 3 of the 7 gaps and why they matter?
Can you articulate one core tension (e.g., flexibility vs queryability)?

Phase 2: Find Your Angle (Weeks 2-4)

Identify what draws you:

User perspective: Which use cases match your needs? What's missing?
Technical perspective: Which gaps could you help address?
Domain expertise: What adjacent knowledge do you bring? (PKM, decentralization, databases, semantic web?)
Practical: Do you want to build, research, document, or test?

Exploration Activities:

Try scoring your current tool(s) against the 15 principles
Pick one gap (GAP-1 through GAP-7) and research solutions
Add a use case we missed
Find a system we didn't evaluate and score it

Phase 3: Active Contribution (Ongoing)

Ways to Contribute:

1. Documentation & Analysis

Evaluate additional systems against principles
Write deep-dives on specific technologies (like atproto-analysis)
Expand glossary with terms you encounter
Document case studies from your own practice

2. Requirements & Use Cases

Add use cases from your experience
Refine existing requirements
Map requirements to real-world systems
Identify conflicts between requirements

3. Technical Exploration

Prototype solutions to priority gaps
Research emerging technologies (new protocols, tools)
Benchmark the performance of candidate approaches
Write proof-of-concept code

4. Community Building

Orient new members
Facilitate discussions
Synthesize working group conversations
Connect to adjacent communities

5. Specification Work (Future phase)

Help design concrete architectures
Write technical specifications
Define interfaces and protocols
Create reference implementations

Key Questions We're Exploring

These are open questions where we need thinking and experimentation:

Architectural Questions

Q1: Can event sourcing work for personal knowledge?

It solves temporal integrity and provenance
But query complexity and storage costs are concerns
Need: Prototype and measure
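
As a starting point for the kind of prototype Q1 calls for, here is a minimal sketch of event sourcing applied to notes. It is an illustration under stated assumptions, not a proposed design: the `NoteEvent` shape, the three event kinds, and the `replay` and `ts` helpers are all hypothetical names. It shows how both the current state and any past state fall out of replaying one append-only log.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class NoteEvent:
    """One immutable entry in the append-only log (hypothetical shape)."""
    at: datetime    # when the change happened
    note_id: str    # which note the change applies to
    kind: str       # "created", "edited", or "deleted"
    text: str = ""  # new body for "created" and "edited" events


def replay(log: list, as_of: Optional[datetime] = None) -> dict:
    """Rebuild note state by folding events in time order, stopping at `as_of` if given."""
    state = {}
    for event in sorted(log, key=lambda e: e.at):
        if as_of is not None and event.at > as_of:
            break
        if event.kind in ("created", "edited"):
            state[event.note_id] = event.text
        elif event.kind == "deleted":
            state.pop(event.note_id, None)
    return state


def ts(day: int) -> datetime:
    """Helper for readable example timestamps in January 2025 (UTC)."""
    return datetime(2025, 1, day, tzinfo=timezone.utc)


log = [
    NoteEvent(ts(1), "n1", "created", "Memex idea"),
    NoteEvent(ts(5), "n1", "edited", "Memex idea, refined"),
    NoteEvent(ts(9), "n1", "deleted"),
]

current = replay(log)            # state after every event
past = replay(log, as_of=ts(6))  # state as it stood on 6 January
```

Replaying the whole log on every read is exactly the query-cost concern noted above. A real prototype would likely add snapshots or materialized views, and measuring that overhead is part of the open question.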

Q2: What's the right level of semantic richness?

RDF is highly expressive, but querying and tooling tend to be heavier than simpler models
Plain links are fast but lose meaning
Need: Middle-ground approaches
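
To make the Q2 tradeoff concrete, the sketch below contrasts plain links with typed triples using ordinary Python tuples rather than any RDF library. The note ids, the predicate names (`cites`, `contradicts`), and the `match` helper are invented for illustration only.

```python
# Plain links: cheap to store and follow, but the relation itself is unlabeled.
plain_links = {
    ("note:a", "note:b"),
    ("note:a", "note:c"),
}

# Triples (subject, predicate, object): the same links, plus what each one means.
triples = {
    ("note:a", "cites", "note:b"),
    ("note:a", "contradicts", "note:c"),
    ("note:b", "cites", "note:c"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern, sorted; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# With plain links we can only ask "what does note:a point to?"
linked_from_a = sorted(target for (source, target) in plain_links if source == "note:a")

# With triples we can also ask "what does note:a contradict?"
contradicted_by_a = match(triples, subject="note:a", predicate="contradicts")
```

The linear scan in `match` is a deliberate simplification. How much indexing and schema a personal system needs before typed relations pay for themselves is the middle ground this question is about.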

Q3: How do we balance agent sovereignty with collective memory?

Individuals should control their data
But families/teams need shared memory
Need: Multi-agent models that respect both
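
One ingredient that may be relevant to Q3 is the CRDT, listed under Data Engineering above. The sketch below is a grow-only set (G-Set), about the simplest CRDT there is: each person keeps their own replica, and replicas converge when merged in any order. The `GSet` class and the example items are hypothetical, and this is only one possible building block, not a multi-agent model.

```python
class GSet:
    """Grow-only set CRDT: replicas only add items, and merge is set union."""

    def __init__(self, items=()):
        self.items = frozenset(items)

    def add(self, item):
        """Return a new replica that also contains `item`."""
        return GSet(self.items | {item})

    def merge(self, other):
        # Union is commutative, associative, and idempotent, so replicas
        # converge regardless of the order in which they sync.
        return GSet(self.items | other.items)


# Two family members capture independently on their own devices.
alice = GSet().add("trip photos").add("recipe")
bob = GSet().add("recipe").add("school calendar")

merged_ab = alice.merge(bob)
merged_ba = bob.merge(alice)
```

A G-Set cannot express removal, edits, or access control, so it says nothing yet about sovereignty. It only shows that shared memory does not have to imply a central owner.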

Q4: Can we have comprehensive automation without losing control?

Friction minimization requires automation
But automation risks agency loss
Need: Permission models for automated capture

Domain-Specific Questions

Q5: What should a "personal data server" for knowledge look like?

atproto has PDS for social data
Solid has pods for general data
Need: PDS designed specifically for knowledge graphs

Q6: How do we make proactive surfacing helpful, not intrusive?

Recommendations require background computation
But surprises can be jarring
Need: Context-aware surfacing that respects agency

Q7: What does portable provenance look like?

Provenance chains must survive tool migration
Need: Standards for representing derivation
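
As a rough illustration of what a tool-independent derivation record for Q7 could look like, the sketch below stores provenance as plain data and round-trips it through JSON. The `Derivation` fields and the `lineage` helper are hypothetical. They are loosely modeled on the idea of a "derived from" relation and are not taken from any existing standard.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record: `derived` was produced from `sources` by `agent`."""
    derived: str
    sources: tuple
    agent: str
    activity: str


def lineage(records, item):
    """Walk derivation records backwards and return every ancestor of `item`."""
    by_derived = {r.derived: r for r in records}
    ancestors, stack = [], [item]
    while stack:
        record = by_derived.get(stack.pop())
        if record is None:
            continue  # reached an original capture with no recorded derivation
        for source in record.sources:
            if source not in ancestors:
                ancestors.append(source)
                stack.append(source)
    return ancestors


records = [
    Derivation("summary-1", ("note-1", "note-2"), "agent:me", "summarize"),
    Derivation("essay-1", ("summary-1",), "agent:me", "draft"),
]

# Serializing to plain JSON is what would let the chain travel between tools.
exported = json.dumps([asdict(r) for r in records])
restored = [
    Derivation(**{**d, "sources": tuple(d["sources"])})
    for d in json.loads(exported)
]
```

The chain survives the round trip only because the identifiers (`note-1`, `summary-1`) stay stable. Keeping identifiers stable across tools is probably the harder part of this question.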

Practical Questions

Q8: Can individuals realistically self-host?

Local-first is ideal for sovereignty
But most people won't run servers
Need: Models that work for non-technical users

Q9: How do we handle the "past self embarrassment" problem?

Full history enables reflection
But people want to forget/delete
Need: Graceful forgetting with integrity
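
One hedged reading of "graceful forgetting with integrity" in Q9 is a hash-chained log in which redaction removes an entry's content but keeps its digest, so the rest of the history still verifies. The sketch below assumes that reading. `Entry`, `append`, `redact`, and `verify` are hypothetical names, and SHA-256 is just a convenient choice.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional


def digest(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Entry:
    content: Optional[str]  # None once the entry has been redacted
    content_hash: str       # digest of the original content, kept after redaction
    prev_hash: str          # chain hash of the previous entry ("" for the first)

    @property
    def chain_hash(self) -> str:
        # Depends only on hashes, so it is unchanged when content is removed.
        return digest(self.prev_hash + self.content_hash)


def append(log: list, content: str) -> list:
    prev = log[-1].chain_hash if log else ""
    return log + [Entry(content, digest(content), prev)]


def redact(log: list, index: int) -> list:
    """Drop the content of one entry while leaving its hashes in place."""
    return log[:index] + [replace(log[index], content=None)] + log[index + 1:]


def verify(log: list) -> bool:
    """Check the chain links and that any surviving content matches its digest."""
    prev = ""
    for entry in log:
        if entry.prev_hash != prev:
            return False
        if entry.content is not None and digest(entry.content) != entry.content_hash:
            return False
        prev = entry.chain_hash
    return True


log = append(append(append([], "first draft"), "embarrassing take"), "later revision")
forgotten = redact(log, 1)
tampered = forgotten[:2] + [replace(forgotten[2], content="rewritten history")]
```

Here `verify(forgotten)` still passes while `verify(tampered)` fails. The retained digest does reveal that something once existed and lets anyone confirm a guess at the content, so this approach may not satisfy a strict right-to-delete.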

Q10: What's the migration path from existing tools?

People have years of notes in Obsidian, Roam, Notion
Need: Import tools that preserve structure
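
A small piece of "preserving structure" in Q10 is keeping the link graph intact on import. The sketch below assumes notes are Markdown files that use double-bracket wiki-links (`[[Target]]`, `[[Target|alias]]`, `[[Target#Heading]]`). The `extract_links` helper is hypothetical and ignores embeds, block references, and tool-specific extensions.

```python
import re

# Matches [[Target]], [[Target|alias]], and [[Target#Heading]] style links.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#([^\]|]+))?(?:\|([^\]]+))?\]\]")


def extract_links(markdown: str) -> list:
    """Return (target, heading, alias) for each wiki-link; missing parts are None."""
    return [
        (m.group(1).strip(), m.group(2), m.group(3))
        for m in WIKILINK.finditer(markdown)
    ]


note = "See [[Memex]] and [[Event Sourcing|the log idea]], esp. [[CRDT#Merging]]."
links = extract_links(note)
```

For the sample note this yields three links, with the alias and heading kept separate from the target so that an importer could map each part onto its own representation.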

Resources for Going Deeper

Communities

Personal Knowledge Management:

r/Zettelkasten
r/ObsidianMD

Decentralized Tech:

atproto Discord
Solid Community Forum
IPFS Community

Data Engineering:

Martin Kleppmann's "Designing Data-Intensive Applications"
Event Sourcing patterns (EventStore documentation)

Semantic Web:

W3C Semantic Web Activity
RDF 1.1 Primer

Key Papers & Articles

Local-First Software (Ink & Switch) https://www.inkandswitch.com/local-first/

As We May Think (Vannevar Bush, 1945) - Original memex vision https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/

atproto Documentation https://atproto.com/

Solid Specification https://solidproject.org/

Event Sourcing (Martin Fowler) https://martinfowler.com/eaaDev/EventSourcing.html

Vocabulary Quick Reference

Terms from Memex Ontology:

Mnemegram - Unit of captured experience
Agent - Who/what interacts with memex
Assertion - Claim about mnemegram or relation
Referent - Persistent entity referenced across mnemegrams
Provenance - Origin-tracking chain

Terms from Personal Data Ops:

Temporal Integrity - Preserving time as first-class dimension
Agent Sovereignty - Individual control over data
Schema Pluralism - Multiple concurrent schemas
Contextual Access - Permissions adapted to context
Proactive Surfacing - System-initiated recommendations

Key Technical Terms:

Event Sourcing - State as sequence of events
CRDT - Conflict-free replicated data types
Content Addressing - Reference by hash, not location
Capability-Based Security - Access as unforgeable tokens
RDF - Resource Description Framework (triples)
DID - Decentralized Identifier

See glossary-engineering for complete technical glossary.

Working Group Norms

Communication

Asynchronous first - Document in digital garden, discuss in Discord
Show, don't just tell - Prototypes and examples over pure theory
Cite sources - Link to use cases, requirements, principles that inform your thinking
Be specific - "This violates P8 (Protection)" is better than "This seems insecure"

Decision Making

Requirements-driven - Proposals should satisfy identified requirements
Principle-aligned - Designs should score well against our 15 principles
Tension-aware - Acknowledge tradeoffs explicitly
Experiment-friendly - Try things, measure, learn

What We Value

First principles thinking - Why, not just what
Intellectual humility - "I don't know" is valid
Adjacent expertise - Bringing insights from other domains
Practical grounding - Use cases from lived experience
Long-term thinking - Decades, not quarters

What We Avoid

Technology solutionism - Starting with "what if we used X?" without grounding in requirements
Perfect-is-enemy-of-good - Prototypes > perfection
Bikeshedding - Focus on high-impact gaps, not minutiae
Gatekeeping - All backgrounds welcome, meet people where they are

Your First Contribution

Easiest Onramp:

Add a use case - What do YOU need from personal data operations? Write it up following the UC template in use-cases
Score your current tool - Use Obsidian/Roam/Notion/etc? Score it against our 15 principles in your own doc. What did you learn?
Pick a term - Choose one from glossary-engineering that's new to you. Research it deeper. Write a tutorial or example.
Find a gap - Look at gap-analysis. Pick GAP-1 through GAP-7. Research how it could be solved. What technologies exist? What's missing?
Document your practice - How do you currently manage personal knowledge? What works? What's broken? This lived experience is valuable.

Share your contribution in Discord and we'll help integrate it into the digital garden.

FAQ for Newcomers

Q: Do I need to be technical to contribute?
A: No. Use case documentation, domain analysis, and conceptual work are equally valuable. We need both builders and thinkers.

Q: What if I disagree with a principle or requirement?
A: Great! Document your reasoning, link to which use cases you think are misunderstood, propose alternatives. Productive disagreement strengthens the work.

Q: Can I just lurk and learn?
A: Absolutely. Many people benefit from following along. Contribute when ready.

Q: I use [tool X], is it doomed?
A: No tool is perfect. Our evaluation helps you understand tradeoffs and inform your choices. Use what works for you.

Q: Is this academic or practical?
A: Both. We want rigorous foundations AND working implementations. Theory informs practice; practice tests theory.

Q: What's the timeline?
A: Open-ended. This is a long-term domain-building effort, not a product launch. Contribute at your pace.

Q: Who leads this?
A: The working group is collaborative. Different people drive different efforts. Leadership is earned through contribution.

Next Steps

Read the essentials (memex ontology, use cases, principles)
Introduce yourself in Discord - What brings you here? What's your background?
Pick one contribution from the "Your First Contribution" section
Ask questions - If something's unclear, others probably wonder too

Welcome to the Personal Data Operations working group. We're glad you're here.

Document Status

Last Updated: 2025-02-05
Maintained By: Working group members
Feedback: Suggest improvements in Discord or via pull request

An Ontology of Memex - Foundational ontology
use-cases - 19 scenarios we're addressing
principles - 15 architectural principles
gap-analysis - What's missing from current systems
glossary-engineering - Technical vocabulary