Last post I made a stupidly simple rule: no new tools until the current feature is done. And — shock — it worked. Features shipped. The “after I ship” junk drawer stopped overflowing. I promised you the app reveal and the new name.
You’re not getting either today. Sorry. One more detour — and I swear, this one matters.
Because here’s what I didn’t tell you last time: the rule fixed the feature problem. It did absolutely nothing for the agent problem. And I’d been pretending not to notice for weeks — smiling through the chaos like a man whose house is on fire but whose lawn looks immaculate.
🎭 A Tuesday Morning in February
Let me paint you the scene, because I think you deserve to know what “AI-powered solo development” actually looked like at its worst.
It’s 7:14am. I’m at my desk. Six — six — Claude Code terminals open. I wish I was exaggerating. I am not.
Terminal one is halfway through a refactor I asked for last night before bed. Terminal two is building a new component. Terminal three is writing tests for something terminal one hasn’t finished yet. Terminal four is… actually, I have no idea what terminal four is doing. It’s been running for eleven hours. It seems happy. I’m afraid to look.
Terminal five is Codex, supposedly reviewing yesterday’s output — but I can’t remember which output, because the Notion page where I was “tracking” all this hasn’t been updated since Thursday. And terminal six? Terminal six is me asking Claude to help me figure out what the other five Claudes are doing. I am using AI to manage my AI. This is not the flex I thought it was.
Trae is sitting in the dock, glowing faintly, like a neglected Tamagotchi. I haven’t opened it in a week — not because it’s bad, but because Claude CLI got so good that I stopped leaving the terminal. The results just kept getting better. And when Trae’s renewal came up, I looked at the subscription, looked at what Claude Max would cost instead, and thought: What if I just… redirected that money toward the thing that’s actually doing all the work? Not a breakup. A reallocation. A very sensible, very Scottish reallocation.
There’s a bash script running somewhere. I wrote it at 1am three weeks ago. It does something important. I think. It definitely runs, because I can hear the cron job firing every morning like a loyal dog fetching a newspaper I never asked for and can’t read.
I’m tracking all of this in Notion — in the same way a goldfish tracks its last three laps of the bowl.
The outputs were great — that was the maddening part. Each individual Claude session was producing solid work. Codex was catching things Claude missed. The code was genuinely better than anything I’d written solo. But six terminals, no coordination, no shared memory, and one increasingly frazzled human trying to hold the whole thing together by switching tabs fast enough? That’s not a workflow. That’s air traffic control without the radar.
💸 The API Bill That Changed Everything
And then there was the money.
Oh, the money.
I was on pay-as-you-go API plans for both Claude and Codex. Which sounds sensible and economical right up until you leave an agent running overnight and wake up to discover it’s been having a fascinating conversation with itself about edge cases in your authentication flow for seven hours straight. At premium token rates. While you slept.
The first time it happened I laughed it off. “Cost of learning,” I told myself, like a man who’s just put a tenner in a slot machine and is definitely going to stop after one more go.
The second time I was less amused.
The third time — the one where I blew through both the daily and weekly API allotments on a Claude session that produced approximately four useful lines of code and three hundred lines of “let me reconsider the architectural implications of this approach” — I did the maths.
I was spending more on AI API calls than I was on actual infrastructure. More than the Appwrite server. More than the domain. More than the GitLab subscription I was already questioning. My AI “employees” were running up expenses like consultants at a hotel minibar, and I was the idiot who’d given them the room key.
That was the week I switched to Claude Max and Codex Plus. Flat monthly rate. Unlimited usage. No more waking up to surprise invoices. No more rationing prompts like wartime butter.
Best £40 a month I’ve ever spent. And I don’t say that lightly — I once spent £40 on a keyboard because it sounded like the one I had on the C64. (It didn’t. It sounded like a disappointed accordion. But the sentiment was there.)
🔥 The Day It All Caught Fire
The real breaking point came on a Tuesday — because these moments always come on a Tuesday, it’s like the universe has a standing appointment.
I’d asked Claude to refactor the user profile module. Standard stuff. Meanwhile, Codex was running a review pass on the onboarding flow from the day before. And somewhere in the background, my 1am bash script was doing… whatever it does.
At 9:47am, I opened auth.ts and found something that can only be described as two AIs having an argument inside a file.
Claude in terminal one had refactored it. Meanwhile, Claude in terminal three — blissfully unaware of terminal one’s existence, because why would it be — had been writing tests against the old version. Then Codex had reviewed the old version too, flagged issues, and terminal one had helpfully “fixed” those issues by partially reverting its own refactor while merging in Codex’s suggestions on top. Three agents, one file, zero communication. The result contradicted itself three times, imported a module that no longer existed, and had a function called authenticateUser that — I am not making this up — authenticated nothing. It accepted any input and returned true. Every time. For everyone.
I sat there, staring at a universal authentication bypass I’d accidentally created through sheer administrative incompetence, and I thought: If this had shipped, every user would be logged in as every other user. I would have built the world’s most welcoming security vulnerability.
I closed the laptop. I made a coffee. I opened the laptop. The code was still terrible.
That was the moment. Not a clean epiphany — more like the feeling you get when you step in something wet while wearing socks. You can’t ignore it anymore. Something fundamental has to change.
🪞 Fumbling Toward the Metaphor
I spent a few days trying to frame the problem in ways that might suggest a solution. My first attempt was “pipeline” — maybe I just needed a better pipeline. Like CI/CD, but for agents. Input goes in, output comes out, nothing touches anything it shouldn’t.
That lasted about a day before I realised my agents don’t work like a pipeline. They argue. They iterate. They sometimes need to go back and redo things based on new information. A pipeline assumes linear flow. My reality was more like a pub debate that occasionally produced working software.
Then I tried “orchestra” — I’m the conductor, the agents are the musicians. This metaphor fell apart when I remembered that orchestras rehearse, follow sheet music, and generally don’t wander off to rewrite the bassoon part while the violins are mid-solo.
It was my mate — same one who’d called me out on the tool-chasing — who accidentally nailed it. We were on a call, I was ranting about the auth.ts incident, and he said:
“Mate, it sounds like you’ve got employees. You just haven’t got a company.”
I opened my mouth to argue.
Closed it.
Opened it again.
He was right. And worse, he was right in a way that was so obvious I wanted to crawl under my desk.
Claude isn’t a hammer — she’s a Senior Engineer who needs a job description and a code review process. Codex isn’t a wrench — he’s the slightly cynical Principal Reviewer who finds every flaw but would rather critique than create. The bash scripts are office automation — reliable, dumb, always running, never complaining.

I didn’t have a tool problem. I had a management problem. And I — the self-appointed CEO of this one-man operation — was the worst manager any of them had ever had.
🧰 The Stack Lockdown
Before I could build the company, I had to stop moving the office. So I froze the stack — properly this time, not the “freeze after one more experiment” version I’d been running for a year.
GitLab to GitHub. I’d been loyal to GitLab out of stubbornness, which is the Scottish national sport. But every AI agent integration I tried assumed GitHub first. The GitLab adapters were always community-maintained, slightly behind, and occasionally just wrong. The migration took an afternoon. I lost one webhook and zero sleep. I do not miss GitLab. Sorry, GitLab.
Trae to VSCode. Not because Trae was bad — it genuinely wasn’t. But Claude CLI had quietly become my entire workflow, and Claude CLI lives in VSCode. The AI plugins I rely on daily? VSCode. Codex? VSCode. Six simultaneous terminals of chaos? All VSCode. Trae’s subscription came up for renewal and I did the maths: cancel Trae, redirect the money to Claude Max, get unlimited access to the tool that was actually producing 90% of my output. It was less a breakup and more a portfolio rebalancing. Trae was a solid investment. Claude Max was just a better one.
Claude and Codex to flat-rate plans. Already covered. The API bill intervention. Claude Max, Codex Plus. Promoted both from freelancers to salaried employees. Salaried employees don’t send you invoices at 3am.
The killer move: adversarial reviews. Claude Max writes. Codex Plus reviews everything Claude produces. Not as a suggestion. As policy. Two agents, different training, different blind spots, both staring at the same diff. Claude builds. Codex challenges. I arbitrate.
The number of bugs this catches is, genuinely, embarrassing. Most of them are the kind a human solo dev never spots because they wrote the code five minutes ago and it still feels right. Having two AIs disagree about my code is like having two editors argue about my prose — slightly bruising to the ego, excellent for the output.
📋 AutoMaker (A Breakup Story)
Credit where it’s due: AutoMaker got me halfway here.
AutoMaker is an open-source Kanban-driven agent harness. You describe a feature, drop it into “In Progress,” an agent picks it up in its own git worktree, and hands you a diff. For months, it was how I shipped. And it’s genuinely good at what it does — one feature, one agent, one outcome.
But here’s the thing about tools you outgrow: it’s never their fault, and you feel guilty anyway.
I started asking AutoMaker to do things it was never designed for. “Can you also track what Codex reviewed?” No. “Can you manage my API budget?” No. “Can you remember what we decided about the schema last Thursday?” Absolutely not. It’s a sprint board, not a brain. I was asking my screwdriver to also be a level, a tape measure, and a therapist.
The honest version: I spent two weeks trying to bend AutoMaker into something it wasn’t before admitting I needed a different category of tool entirely. If you’re doing one stream of feature work — genuinely, start with AutoMaker. It’s excellent. But if you wake up one morning and realise you’re managing a portfolio of agents, budgets, and background jobs… you’ve outgrown the sprint board. I had.
🧩 What Was Still Missing
So: GitHub, VSCode, Claude Max, Codex Plus, AutoMaker handling feature delivery, a few bash scripts humming in the background. All pointed in the same direction.
Except I was still the glue. Still switching between six terminals trying to remember which Claude was doing what. Still the only person in the building who could remember what we decided yesterday. CEO, personal assistant, payroll clerk, and IT support — four hats, one tired head, and a growing suspicion that the bottleneck in my one-man company was… the one man.
I started looking for the missing layer. Not to build it myself — I’d done enough yak-shaving for one lifetime — but in the hope that someone, somewhere, had already had the same problem and open-sourced the answer.
Turns out someone had. It’s called Paperclip.
The name is a half-joke nod to the paperclip-maximiser thought experiment — because if an AI ever does take over the world, it won’t be one genius agent going rogue. It’ll be a thousand agents all cheerfully maximising their own little metrics with nobody minding the shop. Paperclip, the tool, is the thing that finally makes them mind the shop.
I installed it. I pointed my existing agents at it. And within about a week, the strangest thing happened. The app started moving faster — not because the tools got better, but because I stopped spending my mornings rebuilding yesterday’s context in my head. Paperclip was doing that for me.
The auth.ts incident? Can’t happen anymore. Atomic file locking. Two agents, same file, same time? One waits. The bug that nearly turned my app into a “log in as literally anyone” service just… can’t exist in the new setup.
I’d tell you more, but that’s what the next post is for. This one’s already long enough, and I promised myself I’d stop writing 3,000-word posts.
checks word count
I have not kept that promise.
👀 Coming Up Next
Next post, I’m doing a proper walkthrough of how I’m using Paperclip day-to-day. Org charts, budgets, heartbeats, atomic execution, and the specific mechanics that finally got my agents to stop stepping on each other’s toes.
And yes — it’s open source. Repo is here if you want a head start.
The app reveal is still coming. I haven’t forgotten. I just had to get the factory floor running before I could show you the product.