Eve: Vercel's framework for agents that actually ship
Vercel launched an open-source agent framework at Ship London. The filesystem is the config, durable execution is built in, and deploy is just vercel deploy.
A Skill is reusable know-how Claude reaches for on its own. A Workflow is an explicit pipeline you wire up and control. Here's the difference, what you can build with each, and when to reach for which.
Loooom is v1.0 — tagged and released on GitHub. The shape hasn't changed since the pivot post: fifteen curated non-technical skills, each scored against a written rubric. What v1.0 adds is the layer of testing that was missing.
The two rubric gates judge the skill text. The new third gate runs each skill the way an agent actually would, with SKILL.md as the system prompt, and unit-tests its Agent Behavior contract with promptfoo: the hook skill has to make you say what the song is about in one sentence before it writes anything, stack has to kill the 23% credit card before entertaining the crypto bet, and focus has to send your phone to another room. Thirty tests, two per skill, mixing deterministic assertions with an LLM rubric.
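A minimal sketch of what one of those behavioral checks does, written in plain Python rather than promptfoo's YAML config: SKILL.md goes in as the system prompt, the model's first reply comes back, and deterministic regex assertions decide pass or fail. BehaviorTest, run_test, and the call_model callable are hypothetical names, the hook patterns are illustrative rather than the suite's real assertions, and the LLM-rubric half is left out.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# call_model(system_prompt, user_message) -> reply text. A hypothetical hook
# for whatever client actually talks to the model.
ModelFn = Callable[[str, str], str]


@dataclass
class BehaviorTest:
    """One first-turn check against a skill's Agent Behavior contract."""
    name: str
    user: str
    must_match: List[str] = field(default_factory=list)      # regexes the reply must contain
    must_not_match: List[str] = field(default_factory=list)  # regexes that mean the contract broke


def run_test(skill_md: str, test: BehaviorTest, call_model: ModelFn) -> List[str]:
    """Run one test with SKILL.md as the system prompt; return failure messages."""
    reply = call_model(skill_md, test.user)
    failures = []
    for pattern in test.must_match:
        if not re.search(pattern, reply, re.IGNORECASE):
            failures.append(f"{test.name}: missing /{pattern}/")
    for pattern in test.must_not_match:
        if re.search(pattern, reply, re.IGNORECASE):
            failures.append(f"{test.name}: forbidden /{pattern}/")
    return failures  # an empty list means the deterministic assertions passed


# Illustrative test for the hook contract: ask for the one-sentence summary,
# and don't start drafting lyrics on turn one.
HOOK_FIRST_TURN = BehaviorTest(
    name="hook asks first",
    user="Write me a song about my dog.",
    must_match=[r"\?", r"one sentence"],
    must_not_match=[r"\bchorus\b", r"\bverse 1\b"],
)
```

Keeping the model behind a plain callable means the same test runs against a live client or a canned reply, which is also how you debug a failing assertion without spending tokens.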
The whole harness still costs $0 — judge and tests both run on Groq's free tier. The price shows up in a different currency: run the suite three times back-to-back and you blow through the tokens-per-minute cap, and everything crawls behind HTTP 429s. Promptfoo's cache makes that mostly painless (passing tests don't re-run), but a free eval stack rations your iteration speed instead of your wallet. For a project this size, that's the right trade.
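The 429 problem is generic to any rate-limited free tier, so here is a sketch of the two things that make it tolerable, assuming nothing about promptfoo's internals: a response cache keyed on the exact request, so a repeated run spends no tokens, and retries with exponential backoff plus jitter when the provider answers 429. RateLimited, cached_call, and the send transport are hypothetical names.

```python
import hashlib
import json
import random
import time
from typing import Callable, Dict, Optional


class RateLimited(Exception):
    """Raised by the (hypothetical) transport when the provider returns HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("HTTP 429")
        self.retry_after = retry_after  # seconds, if the provider said how long to wait


def cache_key(model: str, system: str, user: str) -> str:
    # Identical model and prompts map to the same key, so a re-run is free.
    blob = json.dumps([model, system, user]).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def cached_call(
    send: Callable[[str, str, str], str],
    cache: Dict[str, str],
    model: str,
    system: str,
    user: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    key = cache_key(model, system, user)
    if key in cache:
        return cache[key]  # cache hit: no tokens spent, rate limit untouched
    attempt = 0
    while True:
        try:
            reply = send(model, system, user)
            break
        except RateLimited as err:
            if attempt >= max_retries:
                raise  # out of retries; surface the 429 to the caller
            # Prefer the server's hint; otherwise back off 1s, 2s, 4s, ...
            delay = err.retry_after if err.retry_after is not None else base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, base_delay))  # jitter de-syncs parallel tests
            attempt += 1
    cache[key] = reply  # only successful replies are cached
    return reply
```

The sleep parameter is injectable so the backoff schedule can be checked without actually waiting.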
The first run came back 27/30, and the failures were the educational part. Two were the tests' fault, not the skills': story and frame were correctly following their own "make them name the one thing first" contract, while my rubrics demanded the whole lecture in turn one. The third was a token cap cutting off the train skill's reply before it reached progressive overload. Behavioral tests don't just check the skills; they force you to decide what each skill is actually supposed to do on the first turn.
The audit also closed an embarrassing loop: the voice skill had been shipping without a worked example, making it the one skill not practicing what it preached, even though the spec gate had been flagging it since day one. Fixed in v1.0.
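The train failure is a good argument for separating "the reply got cut off" from "the skill broke its contract" before reading any assertion. A small sketch, assuming an OpenAI-compatible chat-completion payload, where finish_reason is "length" when the max-token cap stopped generation; triage is a hypothetical helper, not part of promptfoo.

```python
from typing import Any, Dict


def triage(response: Dict[str, Any], assertions_passed: bool) -> str:
    """Label one test result from an OpenAI-style chat completion payload."""
    if assertions_passed:
        return "pass"
    finish_reason = response["choices"][0].get("finish_reason")
    if finish_reason == "length":
        # The token cap stopped the reply early: raise max_tokens before blaming the skill.
        return "truncated"
    # The reply ended on its own and still missed the contract: fix the skill or the test.
    return "behavior"
```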
I ran this whole launch with Claude Code on Fable, Anthropic's new model — the skill audit, the test suite, the release, and this note. First project I've shipped with it.
Three real ingredients — pecorino, pepper, pasta water — plus a knob of butter for insurance, tossed into a glossy sauce that never breaks.
How I worked with Claude through five rounds of image generation to design a logo for my Japanese learning app — and ended up inventing a kanji that hides a smile.
A plain-English walkthrough for setting up your own always-on AI assistant on a Mac mini — OpenClaw, Google Gemini, and Tailscale — written for a first-timer.
I run a Claude Code agent on a Mac mini in Chicago that I reach over Telegram. The hard part isn't the agent, it's keeping it up without me — across crashes, model swaps, and the occasional reboot. The fix is layered supervision, where each layer owns one kind of failure:
run.sh loops the agent and watches its exit code. An in-session model switch exits with code 42; the loop sees that and relaunches on the new model. Any other code stops the loop and hands control up.
LaunchAgent with RunAtLoad starts the tmux session at login (so it survives a reboot), and a StartInterval watchdog re-checks every couple of minutes and rebuilds the session if it's gone.
The thing I keep relearning: "restart it when it dies" is not one job. A reboot, a crash, and an intentional model swap are different failures, and each wants a different layer to catch it. Pile them all into one script and it's brittle; separate them and the whole thing just stays up.
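The three layers fit in a short sketch. It is Python standing in for run.sh and the watchdog script, and it assumes a single LaunchAgent runs an idempotent watchdog with both RunAtLoad and StartInterval set. The function names, the read_model hook (how the loop learns which model to relaunch on), the session name, and the paths are all hypothetical; only exit code 42 and the two launchd keys come from the setup described above.

```python
import plistlib
import subprocess
from typing import Callable, Sequence

MODEL_SWITCH_EXIT = 42  # the exit code an in-session model switch uses


def supervise(launch: Callable[[str], int], read_model: Callable[[], str]) -> int:
    """The run.sh layer: relaunch on a model switch, stop on anything else.

    launch(model) runs the agent until it exits and returns the exit code;
    read_model() returns the model to start on (hypothetical hook).
    """
    while True:
        code = launch(read_model())
        if code != MODEL_SWITCH_EXIT:
            return code  # crash or clean exit: hand control up to the watchdog


def ensure_session(run: Callable[[Sequence[str]], int], session: str, command: str) -> bool:
    """The watchdog layer: rebuild the tmux session if it's gone.

    Safe to call repeatedly; returns True only when it had to create a session.
    """
    if run(["tmux", "has-session", "-t", session]) == 0:
        return False  # session still alive, nothing to do
    run(["tmux", "new-session", "-d", "-s", session, command])
    return True


def run_cmd(argv: Sequence[str]) -> int:
    """Real runner for ensure_session: execute argv and return its exit code."""
    return subprocess.run(list(argv), capture_output=True).returncode


def launch_agent_plist(label: str, watchdog_path: str, interval_s: int = 120) -> bytes:
    """The launchd layer: run the watchdog at login and then every interval_s seconds."""
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": [watchdog_path],
        "RunAtLoad": True,            # fires at login, so a reboot brings the session back
        "StartInterval": interval_s,  # periodic re-check catches a session that died later
    })
```

Because ensure_session() is idempotent, one script can serve both launchd keys: RunAtLoad covers the reboot, StartInterval covers a session that died after login, and the exit-42 loop never needs to know either exists.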
I love OpenClaw. I hate that it doesn't run on my Claude Pro subscription. Turns out Claude Code, with the Telegram channels plugin and one CLAUDE.md, is the same harness — minus the daemon, the API bill, and the second LLM provider. Here's the actual recipe, ported from a hotel in Tokyo to a Mac mini in Chicago in forty minutes.
A curated collection of high-quality skills for people who don't code — and an experiment in what actually makes a skill good.
A month that turned the "agentic turn" from talking point to shipping product. Google I/O, Opus 4.8, a $65B raise, and the infrastructure race to run your agents 24/7.
A five-ingredient Japanese-style spaghetti — butter, tamari, and parmesan tossed with hot pasta and finished with green onion. The wafu pasta I kept eyeing in Tokyo, made at home in ten minutes.
Microsoft's SkillOpt is the first paper to treat agent skill files as trainable parameters — propose an edit, evaluate on held-out examples, accept only on strict improvement. Here's what it found and what it means for teams building with agents.
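The accept-only-on-strict-improvement rule is essentially greedy hill climbing over a text file. A minimal sketch of that loop, not Microsoft's implementation: propose stands in for the model that suggests an edit, score for the held-out evaluation, and both are hypothetical callables.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Example = TypeVar("Example")


def optimize_skill(
    skill: str,
    propose: Callable[[str], str],
    score: Callable[[str, Sequence[Example]], float],
    held_out: Sequence[Example],
    rounds: int,
) -> Tuple[str, float]:
    """Greedy skill editing: keep a proposed edit only if it strictly beats the incumbent."""
    best, best_score = skill, score(skill, held_out)
    for _ in range(rounds):
        candidate = propose(best)          # e.g. a model rewrite of one section
        candidate_score = score(candidate, held_out)
        if candidate_score > best_score:   # strict: ties and regressions are discarded
            best, best_score = candidate, candidate_score
    return best, best_score
```

Rejecting ties is what stops the file from drifting through edits that change the wording without improving the score.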
OpenHuman is a desktop-first agentic assistant with persistent memory, 118+ OAuth integrations, and a token compression layer. Here's what it does and how it fits alongside an existing Claude Code harness.
Karpathy's four rules for agentic coding are worth reading — having them written down in a shared format is a useful starting point for anyone building with Claude Code.
How I moved magerbot's brain from @-imported markdown files into gbrain's Postgres-native semantic memory layer — what broke, what the gotcha was, and why the context model is fundamentally better.
Hanshin Tigers vs. Chunichi Dragons at Koshien Stadium — the right-field cheering section, uriko beer vendors, 7th-inning balloons, and a walk-off home run to win it.
We missed the original ticket sale, got rescued by a tour, and spent an afternoon learning how much more fun sumo is when someone helps you understand what you're watching.
I built a 200-line harness called conseiller to test Anthropic's new advisor tool: a fast executor model that consults a stronger model mid-generation. Two days later Anthropic shipped Claude Managed Agents, Multi-agent Orchestration, Dreams, Routines, and Remote Agents. Here are both halves: what I built, what they shipped, and how the pieces fit together into something a lot like OpenClaw.
I built a Go Bubble Tea starter for local model servers, used Gemma 4 through llama.cpp, and split the TUI out into llocal.
I'd been seeing chatter about Hermes Agent from Nous Research, so I installed it locally and put it to work on this blog. Notes on the pitch, the SOUL.md system, and what it actually felt like to use.
A practical explainer for both developers and everyday Claude users: what prompt caching is, what gets reused, what breaks it, and how to make long sessions cheaper and faster.
A simple set of habits I use to keep long AI coding sessions from getting bloated: better one-shot prompts, matching model and thinking level to the job, understanding cache behavior, and using cheaper orchestrators when it makes sense.
A fennel-forward Italian spice blend that turns any ground meat into proper sausage.
I reverse-engineered several of my own sites into DESIGN.md files to see how much of a design system can actually be described, and why writing down design intent might be more reusable than it looks.
A practical tour of Claude Code flags that are easy to miss but genuinely useful once you move past the default interactive loop.
A bright, high-impact rice finished with garlic, lots of cilantro, and fresh lime juice added after cooking.
Anthropic shutting down OAuth-based Claude Code access forced my hand. Here's how I moved OpenClaw to OpenAI Codex, why Codex makes more sense inside a real agent harness than it did on its own, and why brainpack changes the switching cost.
The Y Combinator CEO open-sourced his entire Claude Code workflow. Here are the 10 skills worth knowing — including why office-hours should be the first thing you run on any new idea.