Nostr Scraping Project

Core Features Implemented

  • Import from nosdump file
  • NIP-50 Search Functionality
  • Scraping + Job + Worker Functionality
  • Real-time Subscriptions

Nuanced Features

  • Import from nosdump file
  • NIP-50 Search Functionality
  • Scraping + Job + Worker Functionality
    • Recursive Filter Scraping with logging so scraping can be interrupted and restarted
    • Simple Filter Job Management System
  • Realtime Subscriptions
    • Can subscribe to a filter and receive events as they are published to the Nostr relay
  • TODO
    • Export to nosdump file
    • Nostr Kind Whitelisting
    • Nostr Kind Blacklisting
    • NIP05 Scraping and History
    • Nostr Relay Metadata Scraping and History
    • Bot Support
    • Authentication Support
    • Ephemeral Events
    • CRON styled Filter Scraping
    • nkbip-02 AI Embedding Vector Support
  • SQL Table for Indexed tags
  • SQL Table for not so indexed tags
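
Both the NIP-50 search and the real-time subscription features above ride on NIP-01 REQ messages; a minimal sketch of the filter shapes involved (the subscription id and query text are made-up placeholders, not values from this project):

```python
import json

def search_filter(query: str, kinds=None, limit: int = 50) -> dict:
    """Build a NIP-50 search filter (the "search" field extends NIP-01 filters)."""
    f = {"search": query, "limit": limit}
    if kinds:
        f["kinds"] = kinds
    return f

def req_message(sub_id: str, *filters: dict) -> str:
    """Serialize a NIP-01 REQ message: ["REQ", <subscription_id>, <filter>...]."""
    return json.dumps(["REQ", sub_id, *filters])

# Example: a NIP-50 search for kind-1 notes
msg = req_message("search-1", search_filter("nostr scraping", kinds=[1]))
```

The same REQ shape with an open-ended filter (no `limit`, no `until`) is what keeps a real-time subscription streaming events after EOSE.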

TODO

  • Write a blog post about what I want to get out of Scraping Nostr
  • Error logging for nosdump-ingest
  • Add a relay as a data source for nosdump-ingest
  • Write a blog post about the problem of Activities versus Workflows, and relate it to what we were trying to do with CGFS
  • We ought to start using fractal terminology to describe our scraping. Just as CGFS expects us to reference the root of a discussion, Nostr expects us to reference the root event of a discussion. I believe that in the future, raw Nostr events posted without context will be rare and not adopted by default in Nostr clients

Job States

  • My Job States
    • TODO
    • RUNNING
    • COMPLETED
    • ERROR
    • FAILED
  • Temporal Activity States
    • Running
    • Cancelled
    • Completed
    • Failed
    • Terminated
    • Timed Out
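
One way to relate the two vocabularies above is a lookup table; the correspondence below is an assumption of mine (Temporal does not define this mapping), shown as a sketch:

```python
# Hypothetical mapping from this project's job states to the closest
# Temporal activity state. "TODO" has no Temporal counterpart because
# a Temporal activity only exists once it has been scheduled.
JOB_TO_TEMPORAL = {
    "TODO": None,
    "RUNNING": "Running",
    "COMPLETED": "Completed",
    "ERROR": "Failed",      # assumed: retryable error
    "FAILED": "Failed",     # assumed: terminal failure
}

def is_terminal(state: str) -> bool:
    """A job is terminal once it will never be run again."""
    return state in {"COMPLETED", "FAILED"}
```

Note that Temporal's Cancelled, Terminated, and Timed Out states have no equivalent here yet; supporting interruption cleanly might call for adding them.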

As per Nostr Scraping Plan 0.0.1, we have a few different things we need to scrape separately:

  • Events from a User from a Specific Relay
    • scrape.pubkey.from.relay.0.0.1
  • Replies to a thread
    • scrape.replies.to.thread.from.relay.0.0.1
  • Reactions to a Thread
    • scrape.reactions.to.thread.from.relay.0.0.1
  • Follows of an NPUB
    • scrape.follows.of.pubkey.from.relay.0.0.1
  • Badges sent to a User
    • scrape.badges.to.pubkey.from.relay.0.0.1
  • NIP05 Stuff
    • scrape.nip05.0.0.1
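
Each of the job identifiers above can be read as a NIP-01 filter template. The job-to-filter table below is my reading of those names, not a spec: kind 1 is a text note, kind 7 a reaction (NIP-25), kind 3 a follow list (NIP-02), and kind 8 a badge award (NIP-58). The `target` values are placeholders:

```python
def filters_for_job(job: str, target: str) -> dict:
    """Map a scrape job identifier to the NIP-01 filter it would send.
    `target` is a hex pubkey or event id, depending on the job."""
    table = {
        "scrape.pubkey.from.relay.0.0.1":              {"authors": [target]},
        "scrape.replies.to.thread.from.relay.0.0.1":   {"kinds": [1], "#e": [target]},
        "scrape.reactions.to.thread.from.relay.0.0.1": {"kinds": [7], "#e": [target]},
        "scrape.follows.of.pubkey.from.relay.0.0.1":   {"kinds": [3], "authors": [target]},
        "scrape.badges.to.pubkey.from.relay.0.0.1":    {"kinds": [8], "#p": [target]},
    }
    return table[job]
```

NIP-05 scraping is the odd one out: it is an HTTPS fetch of `/.well-known/nostr.json` rather than a relay filter, which is probably why it gets its own versioned job.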
  • We start with a single NPUB of a popular Nostr user
  • We scrape the user's NIP05 identity for other relays they use
  • We scrape all that user's events from every relay they say they publish to
  • We then grab all the
    • events that mention the pubkey using a p tag
    • reactions (NIP-25) to the NPUB
    • replies (NIP-01) to the NPUB
    • Followers (NIP-02) of the NPUB
    • Badges (NIP-58) sent to the NPUB
  • We then look at their follow list
  • We add every NPUB from it to a backlog of NPUBs to scrape
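
The walkthrough above is a breadth-first crawl over the follow graph; a minimal in-memory sketch, with persistence, relay I/O, and the per-pubkey scrape jobs all stubbed out behind a hypothetical `fetch_follows` callback:

```python
from collections import deque

def crawl(seed_pubkey: str, fetch_follows, max_pubkeys: int = 1000) -> set:
    """Breadth-first crawl of the follow graph from a single seed pubkey.
    `fetch_follows(pubkey)` stands in for the real scrape jobs and should
    return the pubkeys found in that user's kind-3 follow list."""
    backlog = deque([seed_pubkey])
    seen = {seed_pubkey}
    while backlog and len(seen) < max_pubkeys:
        pubkey = backlog.popleft()
        # ...here the real scraper would run every scrape.* job for pubkey...
        for follow in fetch_follows(pubkey):
            if follow not in seen:
                seen.add(follow)
                backlog.append(follow)
    return seen
```

The `seen` set is what makes the scrape restartable in spirit: persisting it (and the backlog) is the same idea as the interruptible recursive filter scraping listed under the nuanced features.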

Logs