Synchronization Strategies for Personal Data
This document surveys synchronization architectures for personal data operations. Sync strategy directly impacts P1 (Agent Sovereignty), P15 (Graceful Degradation), and local-first viability.
The Synchronization Challenge
Core Problem: Multi-device access to personal knowledge requires sync. How do you keep local copies consistent without sacrificing sovereignty or offline capability?
Requirements Implicated:
- R22: Decadal maintainability (not dependent on company)
- R27: Offline and under-duress protection
- R40: Privacy-preserving local processing
Principles:
- P1 (Agent Sovereignty): Must work without central authority
- P15 (Graceful Degradation): Must work offline
- P9 (Performance): Sync must be efficient at scale
Tensions:
- Offline capability vs real-time collaboration
- Simplicity vs conflict resolution sophistication
- Privacy vs convenience (cloud sync services)
Centralized Server Sync
Description: All devices sync through central server. Server holds authoritative state.
Architecture:
Device A ←→ Central Server ←→ Device B
            (source of truth)
How It Works:
- Devices push changes to server
- Server resolves conflicts (last-write-wins or custom logic)
- Other devices pull changes from server
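The push/resolve/pull cycle above can be sketched as follows. This is a minimal in-memory illustration, not a real service: the `CentralServer` class, its method names, and the device-supplied timestamps are all assumptions made for the example, and last-write-wins is only one of the resolution policies a server might use.

```python
from dataclasses import dataclass, field


@dataclass
class CentralServer:
    """Hypothetical in-memory server holding the authoritative copy of every record."""

    # key -> (timestamp, value); the timestamp is supplied by the writing device
    records: dict = field(default_factory=dict)

    def push(self, key: str, value: str, timestamp: float) -> bool:
        """Accept a write only if it is newer than the stored one (last-write-wins)."""
        current = self.records.get(key)
        if current is not None and current[0] >= timestamp:
            return False  # stale write is dropped, so that device's edit is lost
        self.records[key] = (timestamp, value)
        return True

    def pull(self) -> dict:
        """Return the authoritative state for a device to replace its local copy."""
        return {key: value for key, (_, value) in self.records.items()}


server = CentralServer()
accepted_a = server.push("note-1", "edited on device A", timestamp=10.0)
accepted_b = server.push("note-1", "edited on device B", timestamp=9.0)
state = server.pull()
```

Device B's write is rejected because it carries an older timestamp, which shows why the server, not the user, decides which edit survives.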
Principle Alignment:
- Violates P1 (Agent Sovereignty) - requires trust in server
- Violates P15 (Graceful Degradation) - breaks offline
- Supports P9 (Performance) - simple, well-understood
Requirements Violated:
- R22 (Decadal maintainability) - depends on company
- R27 (Offline operation) - requires connectivity
Strengths:
- Simple mental model
- One source of truth (conflicts are resolved in one place)
- Easy to implement
- Battle-tested (Dropbox, Google Drive)
Weaknesses:
- Single point of failure
- Vendor lock-in
- Privacy concerns (data on server)
- Limited or no offline use (at best a local cache that cannot sync until the server is reachable)
- Requires ongoing service
Examples:
- Notion, Roam Research (fail on P1)
- Google Drive, Dropbox (convenient but not sovereign)
Use for Personal Data Ops: Not recommended for primary sync. Acceptable only as:
- Backup destination (in addition to local)
- Optional convenience (not required)
- Encrypted blob storage (server can't read)
Peer-to-Peer Sync
Description: Devices sync directly with each other, no central server.
Architecture:
Device A ←→ Device B
   ↕           ↕
Device C ←→ Device D
How It Works:
- Devices discover each other (mDNS, DHT, manual)
- Exchange changes directly
- Merge using CRDT or OT algorithm
Principle Alignment:
- Strongly supports P1 (Agent Sovereignty) - no central authority
- Strongly supports P15 (Graceful Degradation) - works offline, syncs when possible
- Moderate P9 (Performance) - discovery overhead, variable network
Requirements Addressed:
- R22 (Decadal maintainability) - no company dependency
- R27 (Offline operation) - fully functional offline
- R40 (Privacy-preserving) - data stays on devices you control
Strengths:
- True agent sovereignty
- No vendor dependency
- Works offline
- Privacy (data on your devices only)
- No recurring costs
Weaknesses:
- Discovery complexity
- NAT traversal challenges
- Sync only when devices can reach each other (both must be online at the same time)
- No always-on peer, so edits made on one device stay there until another device comes online
- Complex conflict resolution
Examples:
- Syncthing (file sync)
- Secure Scuttlebutt (social, append-only)
- Resilio Sync, formerly BitTorrent Sync (file sync, proprietary)
Use for Personal Data Ops: Strong candidate for P1/P15 compliance. Challenges:
- Mobile devices (battery, intermittent connectivity)
- Discovery (how do devices find each other?)
- Initial sync for new device
Enhancement: Hybrid P2P + optional relay server
CRDT-Based Sync (Conflict-Free Replicated Data Types)
Description: Data structures that can be modified concurrently on multiple devices and merged without conflicts.
How It Works:
- Each edit generates CRDT operation
- Concurrent operations commute (their relative order doesn't matter)
- Devices exchange operations
- Merge is automatic and deterministic
CRDT Types:
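- State-based (CvRDT): Send entire state; the merge function is commutative, associative, and idempotent
- Operation-based (CmRDT): Send operations; concurrent operations commute, but delivery must be reliable (typically exactly-once, in causal order)
- Delta-based: Send only the changed part of the state (efficient), merged like a state-based CRDT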
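To make the state-based variant concrete, here is a minimal sketch of a grow-only counter (G-Counter). The function names and replica IDs are illustrative assumptions. Each replica only increments its own slot, and merge takes the per-replica maximum, which is what makes merging safe to repeat and to apply in any order.

```python
def increment(state: dict, replica_id: str, amount: int = 1) -> dict:
    """Return a new state in which this replica's own slot has grown by `amount`."""
    new_state = dict(state)
    new_state[replica_id] = new_state.get(replica_id, 0) + amount
    return new_state


def merge_counters(left: dict, right: dict) -> dict:
    """Per-replica maximum: commutative, associative, and idempotent."""
    return {
        replica: max(left.get(replica, 0), right.get(replica, 0))
        for replica in left.keys() | right.keys()
    }


def counter_value(state: dict) -> int:
    """The counter's value is the sum of all replica slots."""
    return sum(state.values())


phone = increment(increment({}, "phone"), "phone")  # two increments on the phone
laptop = increment({}, "laptop")                    # one increment on the laptop

merged_one_way = merge_counters(phone, laptop)
merged_other_way = merge_counters(laptop, phone)
merged_twice = merge_counters(merged_one_way, laptop)  # re-delivering state is harmless
```

Because merging the same state twice changes nothing, a state-based CRDT tolerates duplicated or reordered sync messages.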
Principle Alignment:
- Strongly supports P1 (Agent Sovereignty) - no central authority needed
- Strongly supports P15 (Graceful Degradation) - offline-first by design
- Good P9 (Performance) - efficient for most operations
Requirements Addressed:
- R27 (Offline operation) - designed for offline-first
- R22 (Decadal maintainability) - algorithm-based, not service
Strengths:
- Automatic conflict resolution
- Mathematical correctness (convergence guaranteed)
- Offline-first by design
- No coordination needed
- Well-understood algorithms
Weaknesses:
- Cannot represent all operations (constraints hard)
- Deletion is complex (tombstones)
- Some CRDTs have large overhead
- Merge can produce unexpected results
- Not intuitive for users
CRDT Flavors:
Last-Write-Wins (LWW):
- Simplest CRDT
- Each field has timestamp
- Latest write wins
- Problem: Concurrent edits lose data
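A minimal sketch of a last-write-wins register shows where the data loss comes from. The `LWWRegister` type, the integer timestamps, and the replica-ID tie-breaker are assumptions chosen for illustration; real systems usually need a more careful clock than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    value: str
    timestamp: int   # assumed logical or wall-clock time of the write
    replica_id: str  # tie-breaker so equal timestamps still resolve deterministically


def merge_lww(a: LWWRegister, b: LWWRegister) -> LWWRegister:
    """Keep whichever write has the higher (timestamp, replica_id) pair."""
    return max(a, b, key=lambda register: (register.timestamp, register.replica_id))


# Two devices edit the same field while disconnected from each other.
laptop = LWWRegister("Sync notes (with laptop additions)", timestamp=100, replica_id="laptop")
phone = LWWRegister("Sync notes v2", timestamp=101, replica_id="phone")

winner = merge_lww(laptop, phone)
```

Both devices converge on the phone's value, and the laptop's additions are silently discarded rather than merged.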
Observed-Remove Set (OR-Set):
- For sets (tags, links)
- Add wins over remove
- Preserves concurrent adds
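The add-wins behavior can be sketched with a small state-based OR-Set. This is one common formulation, written here under assumptions: every add is tagged with a random unique ID, and a remove only cancels the tags the removing device has already seen. The `ORSet` class and its method names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ORSet:
    adds: set = field(default_factory=set)     # {(element, unique_tag)}
    removed: set = field(default_factory=set)  # {unique_tag} cancelled by removes

    def add(self, element: str) -> None:
        """Record the element under a fresh tag no other device can have seen."""
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element: str) -> None:
        """Cancel only the add-tags observed locally for this element."""
        self.removed |= {tag for (item, tag) in self.adds if item == element}

    def elements(self) -> set:
        """An element is present if at least one of its add-tags is not cancelled."""
        return {item for (item, tag) in self.adds if tag not in self.removed}

    def merge(self, other: "ORSet") -> "ORSet":
        """Union both components; the result is independent of merge order."""
        return ORSet(self.adds | other.adds, self.removed | other.removed)


laptop = ORSet()
laptop.add("crdt")
phone = laptop.merge(ORSet())  # phone starts from a copy of the laptop's state

laptop.remove("crdt")          # laptop removes the tag it has observed
phone.add("crdt")              # phone concurrently re-adds the same element
phone.add("sync")

merged = laptop.merge(phone)
```

The concurrent re-add on the phone carries a tag the laptop never saw, so the laptop's remove cannot cancel it and `"crdt"` survives the merge.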
Conflict-Free Replicated JSON (Automerge, Yjs):
- Full document CRDTs
- Handle complex data structures
- Used by collaborative editors
Examples:
- Automerge (CRDT library for JSON)
- Yjs (CRDT for text and rich data)
- Roshi (SoundCloud's CRDT store for timestamped events)
Use for Personal Data Ops: Excellent for:
- Text documents (Yjs handles collaborative editing)
- Sets (tags, links)
- Counters (reference counts)
Challenging for:
- Constraints ("this field must be unique")
- Complex validation
- Operations needing coordination
Recommendation: Use CRDTs for data layer, add validation layer above if needed.
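One way to picture this layering is sketched below: the data layer merges without coordination, and a separate pass afterwards reports constraint violations for the user or application to resolve. Everything here is an assumption for illustration, including the record shape, the `slug` field, and the simple map-union merge, which relies on record IDs being globally unique.

```python
from collections import defaultdict


def merge_records(left: dict, right: dict) -> dict:
    """Stand-in for a CRDT merge: union of records keyed by globally unique ID."""
    merged = dict(left)
    for record_id, record in right.items():
        merged.setdefault(record_id, record)
    return merged


def find_unique_violations(records: dict, field_name: str) -> dict:
    """Validation layer: report field values that more than one record claims."""
    owners = defaultdict(list)
    for record_id, record in records.items():
        owners[record[field_name]].append(record_id)
    return {value: sorted(ids) for value, ids in owners.items() if len(ids) > 1}


# Two devices each create a note with the same slug while offline.
laptop = {"n1": {"slug": "sync"}}
phone = {"n2": {"slug": "sync"}, "n3": {"slug": "crdt"}}

merged = merge_records(laptop, phone)
violations = find_unique_violations(merged, "slug")
```

The merge itself always succeeds; the uniqueness constraint is checked after the fact and surfaces the duplicate instead of blocking either device.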
Operational Transform (OT)
Description: Algorithm for merging concurrent edits by transforming operations relative to each other.
How It Works:
- Device A makes edit op1
- Device B makes concurrent edit op2
- Transform op1 relative to op2 (and vice versa)
- Apply transformed operations
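The transform step can be sketched for the simplest case, two concurrent text insertions. This is a toy illustration under assumptions: only insert operations exist, and ties at the same position are broken by a site ID. Production OT systems handle deletes, many operation types, and longer histories, which is where the complexity arises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    position: int
    text: str
    site: str  # assumed tie-breaker when both devices insert at the same position


def apply_insert(document: str, op: Insert) -> str:
    return document[:op.position] + op.text + document[op.position:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has already been applied."""
    inserted_before = against.position < op.position or (
        against.position == op.position and against.site < op.site
    )
    if inserted_before:
        return Insert(op.position + len(against.text), op.text, op.site)
    return op


base = "sync notes"
op1 = Insert(0, "my ", site="A")   # device A prepends
op2 = Insert(10, " v2", site="B")  # device B appends concurrently

# Each device applies its own edit first, then the transformed remote edit.
on_device_a = apply_insert(apply_insert(base, op1), transform(op2, op1))
on_device_b = apply_insert(apply_insert(base, op2), transform(op1, op2))
```

Device A shifts the remote append by three characters to account for its own prefix, so both devices end with the same text.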
Principle Alignment:
- Supports P1 (Agent Sovereignty) - can work decentralized
- Moderate P15 (Graceful Degradation) - typically needs coordination
- Good P9 (Performance) - efficient for text
Strengths:
- Natural for text editing
- Used by Google Docs (proven at scale)
- Intention-preserving (maintains user intent)
Weaknesses:
- Requires central server (typical implementation)
- Complex algorithm (correctness hard to prove)
- Doesn't naturally generalize beyond text
Examples:
- Google Docs
- ShareDB (OT framework)
- Etherpad
Use for Personal Data Ops: Less suitable than CRDTs because:
- Most implementations require server
- CRDT convergence is easier to prove than OT correctness
- CRDTs handle more data types
Consider OT only if:
- Real-time collaborative editing is priority
- Text is primary data type
- Willing to accept server dependency
Git-Style Sync (Merkle DAG)
Description: Commit-based sync with branching and merging, like version control.
How It Works:
- Changes bundled into content-addressed commits (identified by hash, optionally signed)
- Commits form directed acyclic graph (DAG)
- Devices exchange commits
- Merge creates new commit with multiple parents
- Conflicts handled explicitly (user resolves)
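The commit graph described above can be sketched with content-addressed commits. This is a simplified model, not Git's actual object format: the JSON encoding, the `make_commit` and `ancestors` helpers, and the in-memory `store` are assumptions, and signing is left out entirely.

```python
import hashlib
import json


def make_commit(content: str, parents: list, store: dict) -> str:
    """Store a commit under the SHA-256 of its content and parent IDs."""
    body = {"content": content, "parents": sorted(parents)}
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    commit_id = hashlib.sha256(encoded).hexdigest()
    store[commit_id] = body
    return commit_id


def ancestors(commit_id: str, store: dict) -> set:
    """Walk parent links to collect the commit and its full history."""
    seen = set()
    stack = [commit_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(store[current]["parents"])
    return seen


store = {}
root = make_commit("initial note", [], store)
laptop_head = make_commit("laptop edit", [root], store)
phone_head = make_commit("phone edit", [root], store)
# After the user resolves the conflict, the merge commit records both parents.
merge_head = make_commit("merged by user", [laptop_head, phone_head], store)
```

Because a commit's ID covers its parents' IDs, changing any historical commit changes every ID after it, which is what makes the history tamper-evident.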
Principle Alignment:
- Strongly supports P2 (Temporal Integrity) - full history preserved
- Strongly supports P12 (Provenance) - commits are provenance
- Supports P1 (Agent Sovereignty) - decentralized by design
- Supports P15 (Graceful Degradation) - offline-first
Requirements Addressed:
- R1, R2, R4 (Temporal, provenance, time-travel) - Git excels here
- R23 (Cryptographic verification) - commits are hash-linked and can be signed
- R22 (Decadal maintainability) - Git outlives companies
Strengths:
- Full history (addresses GAP-1)
- Branching and merging
- Cryptographic verification
- Proven at massive scale
- Offline-first
- No vendor dependency
Weaknesses:
- Conflicts require manual resolution
- Complex for non-technical users
- Text-oriented (binary diffs poor)
- Line-based merging conflicts whenever two devices edit the same region concurrently
Examples:
- Git (version control)
- atproto (social, uses MST + commits)
- Fossil (distributed VCS with additional features)
Use for Personal Data Ops: Excellent for:
- Knowledge as text files (Markdown, Org-mode)
- When history is important
- Technical users comfortable with Git
Challenging for:
- Rich data structures (JSON, databases)
- Non-technical users
- Real-time collaboration (conflicts)
Enhancement: Git + CRDT (use CRDT for conflict resolution)
Event Sourcing Sync
Description: Sync events (not state). Replay events to reconstruct state.
How It Works:
- All changes are events in append-only log
- Each device has event log
- Devices exchange events
- Replay events to derive current state
- Events are immutable, totally ordered per device
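A minimal sketch of exchanging and replaying event logs follows. The `Event` shape and the ordering rule are assumptions: events are deduplicated by `(device, sequence)` and replayed in `(sequence, device)` order, which is deterministic but deliberately simplistic. A real system would more likely order events causally, for example with Lamport or vector clocks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device: str
    sequence: int  # position in that device's own append-only log
    kind: str      # "set" or "delete"
    key: str
    value: str = ""


def merge_logs(*logs: list) -> list:
    """Combine logs, dropping duplicates and fixing one replay order for all devices."""
    unique = {(event.device, event.sequence): event for log in logs for event in log}
    return sorted(unique.values(), key=lambda event: (event.sequence, event.device))


def replay_events(events: list) -> dict:
    """Derive current state by applying every event in order."""
    state = {}
    for event in events:
        if event.kind == "set":
            state[event.key] = event.value
        elif event.kind == "delete":
            state.pop(event.key, None)
    return state


laptop_log = [
    Event("laptop", 1, "set", "title", "Sync notes"),
    Event("laptop", 2, "set", "tags", "crdt"),
]
phone_log = [
    Event("phone", 1, "set", "title", "Sync strategies"),
    Event("phone", 2, "set", "status", "reviewed"),
]

state_on_laptop = replay_events(merge_logs(laptop_log, phone_log))
# Receiving the same log twice, or in a different order, yields the same state.
state_on_phone = replay_events(merge_logs(phone_log, laptop_log, laptop_log))
```

Both devices derive the same state because they agree on the replay order, not because either one is authoritative.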
Principle Alignment:
- Strongly supports P2 (Temporal Integrity) - events are history
- Strongly supports P12 (Provenance) - event chains
- Supports P1 (Agent Sovereignty) - decentralized
- Supports P15 (Graceful Degradation) - offline replay
Requirements Addressed:
- GAP-1 (Temporal Integrity) - event sourcing is primary solution
- R1, R2, R4 (Temporal, provenance, time-travel)
Strengths:
- Perfect audit trail
- Time-travel built-in
- Natural for multi-device (events are facts)
- Can rebuild state from events
Weaknesses:
- Storage grows forever (need compaction)
- Query complexity (need materialized views)
- Eventual consistency (not immediate)
- Events cannot be deleted (tombstones only)
Examples:
- Secure Scuttlebutt (social, gossip protocol)
- Apache Kafka (enterprise event streaming)
- EventStore (event sourcing database)
Use for Personal Data Ops: Excellent when:
- Temporal integrity is priority (GAP-1)
- Audit trail needed
- Willing to manage storage
- Can handle eventual consistency
Challenge: Compaction (how do you prune old events without losing history?)
Hybrid Approaches
Real systems often combine strategies:
Git + CRDT:
- Git for commit history
- CRDT for automatic conflict resolution
- Example: Could enhance Obsidian
P2P + Optional Relay:
- P2P when devices can reach each other
- Relay server when direct connection fails
- Example: Syncthing supports relays
Local + Cloud Backup:
- Primary: Local sync (CRDT or Git)
- Backup: Encrypted blobs to cloud
- Example: Keybase file system
Event Sourcing + Snapshots:
- Events for precision
- Periodic snapshots for performance
- Compact old events
- Example: Some CQRS systems
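The event sourcing plus snapshots combination can be sketched as follows. The tuple-based event format and the `compact` helper are assumptions for illustration. Compaction folds the oldest events into a snapshot, so the folded events' individual history is no longer recoverable from the log alone.

```python
def apply_event(state: dict, event: tuple) -> dict:
    """Apply one (kind, key, value) event and return the new state."""
    kind, key, value = event
    new_state = dict(state)
    if kind == "set":
        new_state[key] = value
    elif kind == "delete":
        new_state.pop(key, None)
    return new_state


def replay(events: list, snapshot=None) -> dict:
    """Rebuild state from an optional snapshot plus the events recorded after it."""
    state = dict(snapshot or {})
    for event in events:
        state = apply_event(state, event)
    return state


def compact(events: list, keep_last: int) -> tuple:
    """Fold all but the newest `keep_last` events into a snapshot."""
    cut = max(len(events) - keep_last, 0)
    return replay(events[:cut]), events[cut:]


log = [
    ("set", "title", "Draft"),
    ("set", "tags", "sync"),
    ("set", "title", "Final"),
    ("delete", "tags", None),
]

snapshot, tail = compact(log, keep_last=2)
state_from_snapshot = replay(tail, snapshot)
state_from_full_log = replay(log)
```

A new device can start from the snapshot and the short tail instead of the full log, at the cost of losing time-travel into the compacted range unless the old events are archived elsewhere.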
Comparison Matrix
| Strategy | P1 Sovereignty | P15 Offline | P2 Temporal | Complexity | Conflict Handling |
|---|---|---|---|---|---|
| Centralized | Low | Low | Low | Low | Server decides |
| P2P | High | High | Medium | High | Manual/CRDT |
| CRDT | High | High | Medium | Medium | Automatic |
| OT | Medium | Medium | Low | High | Transformation |
| Git | High | High | High | High | Manual merge |
| Event Sourcing | High | High | High | High | Eventual consistency |
Recommendations by Context
For Solo User, Multiple Devices:
- Git (if technical, text-focused)
- CRDT (if non-technical, rich data)
- Avoid: Centralized (unnecessary dependency)
For Family/Small Group:
- CRDT (automatic conflict resolution)
- P2P + Relay (privacy + convenience)
- Avoid: Manual merge (non-technical users)
For Research Collaboration:
- Git (academic norm, branching useful)
- Event Sourcing (audit trail matters)
- Consider: Hybrid Git+CRDT
For Maximum Sovereignty:
- P2P + CRDT (no servers, automatic merge)
- Git (if willing to manage conflicts)
- Must avoid: Centralized
For Maximum History/Provenance:
- Event Sourcing (perfect audit trail)
- Git (commit history)
- CRDT (less history detail)
Implementation Considerations
Network Assumptions:
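- Reliable, causally ordered delivery: Can use operation-based CRDTs
- Unreliable (lost, duplicated, or reordered messages): Prefer state-based or delta-based CRDTs, whose merge is idempotent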
- Offline-first: CRDT, Git, Event Sourcing
Conflict Philosophy:
- Avoid conflicts: Centralized
- Automatic resolution: CRDT
- Manual resolution: Git
- Eventual consistency: Event Sourcing
Storage Costs:
- Low (state only): Centralized, CRDT (state-based)
- Medium (recent history): CRDT (op-based)
- High (full history): Git, Event Sourcing
Performance:
- Initial sync: Large for full history (Git, Event Sourcing)
- Ongoing: Efficient for ops-based (CRDT, Event)
- Conflict resolution: Expensive for OT, cheap for CRDT
Open Questions
- Can CRDT handle rich knowledge graph operations reliably?
- How do you compact event logs without losing essential provenance?
- What's the right sync granularity (document, assertion, field)?
- Can Git-style sync work for non-technical users with better UX?
- How do you sync across decades (storage implications)?
- What's the migration path from centralized to sovereign sync?
Cross-References
- principles - P1 (Sovereignty), P15 (Graceful Degradation), P2 (Temporal)
- gap-analysis - GAP-1 (Event sourcing addresses temporal integrity)
- glossary-engineering - CRDT, OT, Event Sourcing definitions
- storage-models - Storage affects sync strategy
- atproto-analysis - Git-style sync in practice