Post #2 in the Claude Code Toolkit series. Post #1 was nf-agents, the skill I wrote so I would stop forgetting flags when spawning teams. This post is about the opposite problem: cleaning up after weeks of those teams (and solo sessions) saving notes to disk.
Claude Code’s auto-memory feature is wonderful in week one. Every interesting decision, fact, and workflow lands as a markdown file in your autoMemoryDirectory. By week six the folder is a swamp. MEMORY.md is out of date. Aged session-end states sit next to today’s notes. Two files describe the same decision from slightly different angles. A fresh session cannot tell what is load-bearing and what is forgotten.
nf-dream is the cleanup pass.
The problem the skill solves
Plain markdown memory has a property I love: it is cheap, greppable, audit-friendly. You can cat, grep, mv, and version-control it like any other text files. The flip side is that nothing in the system tells the next session which files matter today.
Memory folders rot in predictable ways:
- Session-end states accumulate. Each session ends with a
project_session_end_state_<date>.mdsummary. Within two weeks there are ten of them, only the latest is load-bearing. - MEMORY.md becomes a stale index. Entries are appended chronologically. Old “BLOCKED BY external API” notes sit at the top long after the block was resolved.
- Topic overlap goes unnoticed. Two files describe the same decision because the second was written without the author remembering the first.
- Stale markers go unread. A file says “mostly obsolete, see new doc” but stays because nobody wants to delete it.
After a few months on my own setup, my personal/ scope had about thirty files mixing blog notes, infra tinkering, and portfolio work. A fresh session loading that folder would do the right thing eventually, but “eventually” meant scanning a wall of stale entries. I wanted a way to tell future-me: read these five first, ignore those twenty, the rest is archive.
After nf-dream runs, a fresh session should know within 30 seconds what is done, next, in progress, future, and stale.
What the skill is
Think of it as the Claude Code analogue of sleep: the working set is reorganized so the next session boots into the right context immediately. The bucketing vocabulary (Now, Next, Done, Future, Reference, Stale) is borrowed from memory-consolidation systems I have built for database-backed knowledge bases, applied here to filesystem memory files. The judge is the running Claude session itself. No external LLM API, no second auth dance.
The ten modes
| Mode | Use when | Writes? |
|---|---|---|
preview (default) | You want the report. Classify everything into six buckets, print it, change nothing. Always safe to start with. | No |
reorganize | MEMORY.md is stale. Rewrite it grouped by bucket, other projects’ entries untouched. | Yes |
handoff | You want a one-page “what next” file for a fresh session to read first. | Yes |
consolidate | Aged session-end states have piled up. Move them to archive/YYYY-MM/. | Yes (moves) |
dedupe | Two files describe the same topic. Merge into one canonical, archive the absorbed source. | Yes |
stale | Surface stale candidates, you pick which to archive. | Yes (moves) |
lint | Health check across the whole folder. No writes. | No |
rollback | Undo the last consolidate / dedupe / reorganize / handoff / full via tarball restore or per-batch un-archive. | Yes (restore) |
full | preview plus reorganize plus consolidate plus handoff in one confirmation. | Yes |
backfill | Tag metadata.project on files that lack it. Idempotent. | Yes (frontmatter) |
The honest day-to-day usage in my setup is three of these. I run preview first to see what the folder looks like, then handoff to drop a one-pager that next-week-me can read in 30 seconds, then every few weeks full to move aged session-end states into archive/. The rest are situational: dedupe when I catch myself writing the same fact twice, backfill once per scope folder, rollback when I picked the wrong mode in a hurry, lint as a quick sanity check before a heavier consolidate.
A first run, end to end
Type /nf-dream with no arguments. The skill resolves the scope (reads autoMemoryDirectory from <cwd>/.claude/settings.local.json), detects the project slug from basename "$(git rev-parse --show-toplevel)", and prints something like:
nf-dream preview <scope> project: example-blog
──────────────────────────────────────
Scope: 12 of 30 files (filtered to project 'example-blog' + generic)
MEMORY.md: 18 KB archive/: 4 files
## Now (load-bearing, read these first)
* project_session_end_state_2026_05_17.md (1d ago, START HERE)
* deploy-flow.md (3d ago, Pending: CI gate)
## Next (queued work)
* og-template-phase2.md (5d ago)
## Done (completed, may be archivable)
* feature-x-rollout.md (32d ago, MERGED)
## Stale (candidates for archive)
* old-build-config.md (78d ago, mostly obsolete)
## Health
* MEMORY.md size: 18 KB, Files > 30 KB: 0
* Broken MEMORY.md links: 0, Missing frontmatter: 2
Run /nf-dream reorganize | handoff | consolidate | full to act on this.
The report does not write anything. To act on it you re-run with the mode you want. Every write mode snapshots first, shows a plan, asks for confirmation, and only then mutates files.
The rest of this post walks through the four design decisions inside the skill that took the longest to settle and explains the incidents that drove each one.
Decision 1: Project-scoped by default, no folder-wide escape hatch
A shared scope folder usually mixes multiple projects. My own personal/ scope mixes the blog you are reading, my portfolio site, a couple of infra threads, and a pile of cross-project user-profile facts. Folks I know keep one folder per company that mixes five or six projects.
The first version of nf-dream had no project filter. reorganize rewrote MEMORY.md for the whole folder. Fine on the first weekend with eight files. It broke the day I ran it on about two hundred files spread across six projects: the LLM judge ran out of context, started skimming, and the resulting MEMORY.md had false-positive “Done” entries that were actually mid-flight on another project. Took 30 minutes to untangle from the tarball snapshot.
The fix was to scope every run to one project. The slug resolves in this order:
--project <slug>flag, if you pass one explicitly.basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)", walking up out of worktrees so a worktree branch slug never becomes the project slug.- AskUserQuestion prompt with the list of slugs currently in the cache, plus an “Enter new slug” option.
After resolving, the skill confirms on first run: "Project detected: example-blog. Filter run to this project plus generic memories?" so you can catch a wrong auto-detect early.
There is deliberately no --all flag. Folder-wide use cases are still possible by running once per project, in sequence. The escape hatch is missing on purpose: I do not trust myself to remember “I am about to mutate every project’s index at once” when I am tired.
The one exception is lint. It runs folder-wide because it is read-only, and it is useful precisely because it finds problems that span projects (orphan files, broken cross-links, stale cache entries).
Decision 2: Snapshot then write, never rm
Every write mode does the same thing first, before any mv or Write or Edit:
STAMP=$(date +%Y%m%d-%H%M%S)
tar -czf <scope>/.snapshots/full-<STAMP>.tar.gz -C <scope> \
--exclude='.snapshots' --exclude='archive' .
This captures every memory file at one consistent timestamp. Rollback from the tarball is always possible, even if a per-file move failed mid-batch.
The incident that drove this: I once “tidied up” a folder by hand with a chain of mv commands. One had a typo, two files ended up at the same destination, the second silently overwrote the first. Noticed three days later. No backup. The fact was gone.
Since then the rule is non-negotiable: snapshot first, write second. Memory folders are text, the snapshot for my 30-file scope is under 200 KB compressed. Retention is 14 days; the path is printed after every successful run so the user can save it elsewhere for longer retention.
The complementary rule: nf-dream never runs rm. Files get mv’d into <scope>/archive/YYYY-MM/, with a breadcrumb line appended to <scope>/archive/README.md:
2026-05-17: project_session_end_state_2026_04_12.md, archive/2026-04/project_session_end_state_2026_04_12.md
The archive grows forever. That is fine: text is cheap, archives are excluded from active classification, and rollback can un-archive any batch by reading the breadcrumb log. The few times I want “actually delete this”, I do it by hand. The skill stays write-only-by-move.
Decision 3: The running Claude session is the judge, not an external API
nf-dream does per-file judgment in three places: stale detection, consolidate (“is this still load-bearing?”), and dedupe (“same topic as this other file?”). The obvious implementation is to spin up a separate LLM call per file with an Anthropic or OpenAI API key.
I picked the opposite: the judge is the current Claude session, the one already running when you type /nf-dream. It reads the file in-context and answers a binary question.
Two reasons. No additional cost: you are already paying for the current session, the judge runs inside its context window. No additional auth: no second API key, no account juggle, the skill works in offline or restricted environments as long as the Claude Code CLI itself works.
The trade-off: judgments are limited by the session’s context budget. For very large folders (hundreds of files) the session starts skimming. The skill addresses that bluntly: it refuses to run on folders with fewer than 5 files, and for huge folders asks you to run in batches per project, which the project filter makes natural.
The alternative I rejected: a background daemon with its own API key, cron-scheduled, mutating the folder without an active session. For filesystem memory tied to a user’s workflow, the cost (extra auth, billing, “what did the daemon do while I was asleep” surprises) outweighs the benefit. Manual on purpose.
Decision 4: metadata.project as the authoritative source of truth
For each file in scope, the skill resolves the project tag in this precedence:
metadata.projectin the file’s YAML frontmatter, authoritative.- A
<scope>/.nf-dream/project-tags.jsoncache, invalidated on file mtime. - A 7-step pre-classify pipeline: filename token match, body absolute-path match, body URL match,
originSessionIdreverse-lookup,metadata.type: userfallback to["generic"], in-session LLM judge, user override via AskUserQuestion if confidence is below 0.7.
Once metadata.project is set, every subsequent run is instant: open file, parse YAML, check key, done. The pipeline only runs for files that lack the tag.
This let me decouple two pieces:
- A memory-frontmatter rule in the host setup auto-fills
metadata.project,metadata.cwd,metadata.tags,metadata.relatedwhen Claude saves a new memory. New files arrive correctly tagged. nf-dream backfillruns once per legacy folder to tag old files. After that the filter is essentially free.
The frontmatter nf-dream expects:
---
name: example-blog-deploy-flow
description: Wrangler deploy mechanism for example-blog (Cloudflare Pages)
metadata:
type: reference
project: example-blog
cwd: ~/projects/example-blog
tags: [wrangler, cloudflare, deploy]
related: [example-cloudflare-api-key-conflict]
---
The skill does not invent these fields. It reads whatever the host setup writes. The two pieces are independent: nf-dream works even if you skip the rule (the pre-classify pipeline still resolves project tags), but the combination of “rule writes frontmatter at save” plus “skill reads frontmatter at consolidate” makes the filter near-instant on warm folders.
The reason this took an incident: an earlier version computed (file, project) fresh on every run. The pipeline is good but slow on a 200-file folder, and the LLM judge burns context. Caching helped, but mtime-based invalidation is fiddly. The clean fix was pushing source of truth into the file. Read-once at save, write-once at backfill, every subsequent run is a YAML parse.
The killer feature: HANDOFF-<slug>.md
If you run only one mode regularly, run handoff. It synthesizes a single-page file at <scope>/HANDOFF-<slug>.md that looks like:
# Handoff: example-blog on 2026-05-18
## TL;DR
Blog rebuild is mid-migration. Two posts queued, one stuck on OG template. Wrangler deploy works.
## In progress / Pending
- OG template phase 2 (per-series overrides), source: og-template-phase2.md
## Next session should
- Finish the nf-dream blog draft, context: nf-dream.md
- Write a 2-sentence intro for the toolkit series
## Recently shipped (last 7 days)
- nf-agents post published, source: nf-agents.md
## Pointers
- START HERE: project_session_end_state_2026_05_17.md
- Project rules: .claude/rules/workflow.md
This is what a fresh session opens first. Capped at about 120 lines / 6 KB on purpose. It tells next-session what is queued, what is stuck, and what just shipped, without making it scroll MEMORY.md from the top.
Output is named per-slug (HANDOFF-example-blog.md, etc.) so multiple project handoffs coexist. MEMORY.md is still the full index; HANDOFF-<slug>.md is the read-this-first companion.
What you should keep even if you never use my skill
Four patterns generalize cleanly to any “tool that mutates user data”:
- Project-scoped by default, no folder-wide escape hatch. Shared resources mix projects. The “operate on everything at once” mode is the most dangerous mode. Make it impossible to invoke accidentally. Force per-project, in sequence, with confirmation each time.
- Snapshot then write, never
rm. A tarball before every mutation is cheap; the benefit is that “I screwed up” is reversible every time. Move files toarchive/instead of deleting. Append a breadcrumb log so un-archive is possible. - In-session LLM judgment over external API calls when both work. No extra auth, no extra billing, no “what did the daemon do while I was asleep” surprises. The trade-off (limited context budget) is usually addressable by running in batches.
- Frontmatter metadata as source of truth, with a backfill mode for legacy files. Push the tag in once, read it forever. Pair the skill that reads metadata with a complementary rule that writes metadata at save time.
The skill itself is one way to encode those. The patterns are the value.
Get the skill
White-labeled, ready to drop into ~/.claude/skills/. Published in the claude-skills-toolkit repo alongside nf-agents.
git clone https://github.com/llawliet11/claude-skills-toolkit.git
cp -r claude-skills-toolkit/nf-dream ~/.claude/skills/
Folder reference: claude-skills-toolkit/nf-dream.
Verify in a new Claude Code session: type /nf-dream and the skill should appear in the slash-command list. Start with /nf-dream preview (default, read-only). If the report looks sensible, escalate to /nf-dream handoff, then /nf-dream full once you trust the buckets.
The skill’s own README.md covers required environment (the autoMemoryDirectory setting in .claude/settings.local.json), the optional memory-frontmatter rule, and snapshot retention. If you adopt it and find a failure mode I have not yet hit, open an issue on the repo.