Blog
Thoughts on AI, problem-solving, and building things.
-
Design it together, then prove it
Tests passing and a clean release don't mean a change works. I handed Claude a permissions change and had it prove itself against real data, from the outside in, before I trusted it.
-
Making the agent prove it worked
I ran a whole change through one agent, end to end, and made every step prove itself against the live system before moving on.
-
How a Language Model Gets Stuck in a Loop
I hit a repetition loop bug in an app: the model entered a loop and kept repeating itself. I went deep with Claude to understand what happened and how to defend against it.
-
Inject shell output into a skill before the model reads it
A !`command` in a SKILL.md injects its output into the prompt before the model ever sees the file. Front-load the context your skill needs and skip the tool-call round-trip.
-
The Prompting Playbook: Prompt vs Harness
A full explanation of Anthropic's prompting playbook: evals, failure modes, tools, harness design, and when not to solve agent problems with more prompt text.
-
When Claude Spawns the Next Claude
A skill that turns the transition between Claude sessions into a one-command relay. The previous session writes the handoff, opens a new pane, launches a fresh Claude in it, and that Claude starts working — no prompt from me, no new terminal, no re-explaining. I review the PRs.
-
Bun.Image
Bun 1.3.14 shipped a built-in image pipeline — zero npm deps, no native build, JPEG/PNG/WebP/AVIF and ThumbHash placeholders in one chainable API. Here's what it does and where it shines.
-
Learning about agents by building one with agents
My agent kept skipping steps and executing actions wrong. Prompt changes didn't help — the fix was adding enforcement through code. Four patterns about wiring tool-using agents — prompt vs toolConfig, routers, phase machines, observability.
-
Learning from Boris Cherny's live session
I asked Claude and Codex to analyze the Bun + Claude Code live coding session, then combined their answers. Here is the synthesis.
-
skillify: The Skill That Builds Skills
I went through the leaked Claude Code source with Claude. We found a skill called skillify and reproduced it. It turns the session you just finished into a reusable skill — interview-driven, with required success criteria on every step.
-
Generating two-host podcasts from app output with Gemini
A podcast generation engine on top of an app, alongside the live voice agent. One ~30-line prompt is the only human authorship; two Gemini models cooperate through a single JSON field. Here's how it works, what I learned, and the template repo and playground I pulled out of it.
-
Deep Modules — and Why They Matter More in the AI Era
I watched Matt Pocock's talk on software fundamentals in the AI era and one idea kept circling back — deep modules. So I sat down with Claude and worked it from the ground up. Here's the walkthrough that came out.
-
Shipping a voice agent on Gemini Live, and fixing the barge-in
Voice agents feel broken instantly if they keep talking after the user interrupts. The Gemini Live API doesn't give barge-in for free. Here's the desync between two clocks that caused it, what we tried, and the ~30-line fix.
-
Cairn: Compile Knowledge, Don't Retrieve It
RAG rediscovers your knowledge on every query. Cairn compiles it once, at ingest — an agent with a router tool, built on Karpathy's LLM-wiki pattern, plus the verification step that makes it hold up.
-
The Cache Is the Conversation
Why AI agents feel natural — and what people actually mean when they say caching in this new world.
-
Compounding Agentic Workflows: When the Skill Runs the Second Time
Claude drove my app today like a user — clicked through a feature, took eleven screenshots, wrote me the Hebrew operator manual, PDF ready to email by lunch. The interesting part isn't the output. It's that this was the second run of a workflow, and the second run was structurally easier than the first.
-
Strict or Loose: Tuning a Citation-First Answer Prompt
My Q&A system refused an answerable question during testing. The router had done its job — the bug was one rule in the answer prompt. A three-line diff shifted the slider.
-
An LLM-Wiki for a 640-Page Book
I used Karpathy's LLM-wiki pattern on my personal notes, then adapted it for Q&A over a 640-page book that stays put. Same pattern, different layer count.
-
Why AI Agents Break on Long Tasks - And the Fix That Changes Everything
Understanding unbounded context growth and the state externalization architecture that solves it
-
Deep Dive: Building Continuous Learning for Claude Code
The methodology and /learn command behind accumulating structured knowledge
-
How Do I Talk to My Agent? Event Delivery
Understanding webhooks, websockets, and building a Telegram bot on AWS with AgentCore
-
Running Claude Code on a Remote Linux Server (VPS)
Setting up a 24/7 development environment with Docker, InfluxDB, and prediction models
-
Web Dev Fundamentals - How Programs Talk to Each Other
CLI, servers, protocols and module systems - the fundamentals you need when building with AI
-
A Skill That Creates Brand Guidelines in One Click
How to turn repetitive processes into skills - a practical example with brand documents
-
Prompt Caching - How to Save 90% on API Costs
The difference between a $100/day agent and a $10/day agent is often just prompt caching. Here's how it works.
-
Context Engineering - The Art of Building Agents That Actually Work
Why do most agents fail? Not the model - the context. A practical guide to context engineering.
-
What Is Bun, and Why Develop with It?
10x faster, fewer dependencies, built-in TypeScript - worth trying
-
I Built a Skill That Teaches Claude How to Teach
Personal tutor for any subject - technical, financial, philosophy, AI
-
Another Spontaneous Lesson with Claude Code - What is Bun?
Why Anthropic acquired Bun and why it matters to me
-
Second Round with AgentCore - Now with Strands and Scheduled Tasks
I built an agent that runs in the cloud, talks on Telegram, and sends scheduled reports by email
-
What I Learned About ESM vs CommonJS - And Built a Real-Time Task Board
A conversation with Claude about JavaScript modules led to an interesting project
-
I Built a Prompt Optimization Tool with GEPA
LLM as both judge and optimizer - learns from data and rewrites the prompt
-
I Built a Research Agent That Learns and Evolves Over Time
How to build persistent memory for an agent - without fine tuning
-
I Built a Unified CLI for Google Services - Gmail, Calendar and Drive
Now Claude Code can summarize emails, manage files, and update the calendar - all from the terminal
-
Building an Agent-Native System for Architectural Renderings
How I helped a fish farming company create AI renderings - without overloading the model
-
How I Deployed My First AI Agent to AWS AgentCore
From local agent to production in under an hour - including Google integration
-
How Claude Built My Mom a New Website
A small project demonstrating Claude's endless capabilities - from design to SEO
-
A Vision Model Running in the Browser - What is WebGPU and Why It Matters
Mistral released a 3B model that runs entirely in the browser - I tested what you can do with it
-
From Article to Production: Building a Continuous Learning System That Actually Works
How reading about 'poor man's continuous learning' led to building a research agent, applying it to real market research, and developing a skill architecture
-