There have been two leaps, and you are probably stuck on the first one.
Leap one: you stop writing source code and start talking to an agent that writes it for you.
Leap two, which is happening now: you stop talking to the agent directly. You set up routines and loops that prompt the agent for you. One engineer described running a tree of thousands of agents: one agent prompting agents prompting agents.
If you are still hand-prompting one agent at a time, you are getting maybe 10% of the upside.
Never fix the same mistake twice.
This is the single most important habit the engineers in the conversation mention. When Claude gets something wrong, do not just tell it to do the task differently in the moment.
Instead, tell it to write the correction into your CLAUDE.md, or turn it into a reusable skill.
Do this consistently and the agent compounds. It stops repeating errors and can effectively run for much longer without you babysitting it.
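A minimal sketch of the habit, assuming CLAUDE.md is a plain file of one-line bullets. In practice you would simply ask the agent to make this edit itself. The helper name `record_correction` and the one-bullet-per-rule layout are illustrative assumptions, not part of any Claude Code API.

```python
from pathlib import Path


def record_correction(memory_file: Path, correction: str) -> bool:
    """Append a correction to the memory file unless it is already there.

    Returns True if the file changed, False if the rule was already recorded.
    """
    rule = f"- {correction.strip()}"
    existing = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    if rule in existing.splitlines():
        return False  # already recorded, so the same fix is never stored twice
    # Make sure the new rule starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    memory_file.write_text(existing + separator + rule + "\n", encoding="utf-8")
    return True
```

Calling `record_correction(Path("CLAUDE.md"), "Run the linter before committing")` twice writes the rule once. The duplicate check is what keeps the file short enough to stay useful as it grows.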
Verification is the whole game, and it is not what you think.
Everyone hears verification and pictures unit tests, lint, and type checks. Those were already automated. They are the easy part.
Real verification for agents is one question: can the agent actually run the thing it built?
Not "do the tests pass." Can it launch the app, click through the new feature, hit the edge cases, see what breaks, and fix it? That takes real setup work, and it is where most teams stop short.
Build skills that let the agent test itself.
One concrete example from the conversation: an engineer working on the desktop app uses a desktop development skill. It teaches Claude to spin up the local app, use computer control to click around the new UI, test edge cases, and fix-and-recheck on its own.
When it hits a known environment issue, it can check team context, decide whether staging is down, fix the real bug, and update the skill so it knows next time.
The agent is not just writing code. It is QA-ing its own work in a live app.
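The fix-and-recheck loop can be sketched in a few lines. This is a rough outline under stated assumptions: `run_app_checks` stands in for whatever launches the app and clicks through it, `attempt_fix` stands in for the agent's edit step, and both names are hypothetical. The real skill described in the conversation is not shown in the source.

```python
from typing import Callable, List


def fix_and_recheck(
    run_app_checks: Callable[[], List[str]],
    attempt_fix: Callable[[str], None],
    max_rounds: int = 3,
) -> List[str]:
    """Run the checks, fix each reported failure, and rerun until clean.

    Returns the failures still present after the last round (empty if clean).
    """
    failures = run_app_checks()
    for _ in range(max_rounds):
        if not failures:
            break
        for failure in failures:
            attempt_fix(failure)
        failures = run_app_checks()  # recheck against the running app
    return failures
```

The round limit is a design choice rather than something from the conversation: it stops an agent from looping forever on a failure it cannot fix, and whatever is left over is what a human should look at.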
Drop plan mode. Use auto mode.
Plan mode mattered for older models. With newer ones, the planning step can become dead weight. Auto mode lets you fire off an agent and immediately move to the next one instead of sitting there approving every action.
The trick that makes it safe: permission requests can be routed to a separate model that checks them for security and denies anything sketchy.
The counterintuitive payoff is that this can be safer than manual approval. When you approve almost every permission request by hand, your eyes glaze over and you rubber-stamp the dangerous one. A model screening everything means you only look at what actually matters.
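Here is a toy sketch of that routing idea. Everything in it is an assumption for illustration: `toy_screener` uses a keyword list where the real setup uses a separate model, and the three verdicts ("allow", "deny", "escalate") are invented here, not taken from Claude Code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PermissionRequest:
    tool: str     # e.g. "shell"
    command: str  # what the agent wants to run


def toy_screener(request: PermissionRequest) -> str:
    """Stand-in for the screening model: 'allow', 'deny', or 'escalate'."""
    risky_markers = ("rm -rf", "curl ", "sudo ")
    if any(marker in request.command for marker in risky_markers):
        return "deny"
    if request.tool == "shell":
        return "allow"
    return "escalate"  # unfamiliar tool: leave it for a human


def route(
    request: PermissionRequest,
    screener: Callable[[PermissionRequest], str],
    human_queue: List[PermissionRequest],
) -> bool:
    """Return True if the action may run now; park escalations for a human."""
    verdict = screener(request)
    if verdict == "allow":
        return True
    if verdict == "escalate":
        human_queue.append(request)
    return False  # denied, escalated, or an unknown verdict: fail closed
```

Only the escalated requests ever reach `human_queue`, which is the point: the person reviews a handful of unusual cases instead of every routine command.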
Set up routines that work while you sleep.
The highest-leverage use case the engineers are excited about is routines that listen and act on their own.
- One engineer set up a routine that watches tickets, GitHub issues, and bug reports for his feature, auto-drafts a fix, and pings him the PR.
- Another listens for any bug unanswered for five hours, puts up a fix, and auto-merges the ones that are easy to verify.
The result: people regularly find that their bug was already fixed by someone else's agent before they got to it.
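The triage step of the second routine might look like this. It is a sketch only: the `Bug` fields, the `easy_to_verify` flag, and the action names are hypothetical, and the part that actually drafts and merges the fix is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

STALE_AFTER = timedelta(hours=5)  # the five-hour threshold from the example


@dataclass
class Bug:
    id: str
    opened_at: datetime
    answered: bool
    easy_to_verify: bool  # assumption: an earlier step sets this flag


def plan_actions(bugs: List[Bug], now: datetime) -> Dict[str, str]:
    """Map bug id to an action for every bug unanswered for five hours or more."""
    actions: Dict[str, str] = {}
    for bug in bugs:
        if bug.answered or now - bug.opened_at < STALE_AFTER:
            continue  # someone is on it, or it is still fresh
        if bug.easy_to_verify:
            actions[bug.id] = "fix_and_auto_merge"
        else:
            actions[bug.id] = "fix_and_open_pr"  # a human reviews the PR
    return actions
```

Passing `now` in as an argument, rather than reading the clock inside the function, keeps the routine easy to test and replay.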
Be a context minimalist.
The era of heavy prompt engineering and heavy context engineering is giving way to something simpler: give the model the minimal system prompt, the minimal set of tools, and let it figure out the rest.
Just give it a way to pull in context when it needs to.
Over-stuffing context is micromanaging, and the model often knows a better path to the same outcome than the one you would force on it.
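One way to picture "pull in context when it needs to" is a registry of lookups the agent can call, with only their names in the prompt. The sketch below is illustrative: `ContextTools` and `build_prompt` are made-up names, and a real agent would expose these lookups through its own tool mechanism.

```python
from typing import Callable, Dict, List


class ContextTools:
    """Named lookups the agent can call when it decides it needs more context."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], str]] = {}
        self.calls: List[str] = []  # record of what was actually fetched

    def register(self, name: str, loader: Callable[[], str]) -> None:
        self._loaders[name] = loader

    def names(self) -> List[str]:
        return sorted(self._loaders)

    def fetch(self, name: str) -> str:
        loader = self._loaders[name]  # raises KeyError for unknown lookups
        self.calls.append(name)
        return loader()


def build_prompt(task: str, tools: ContextTools) -> str:
    """Minimal prompt: the task plus lookup names, with none of their content."""
    return f"{task}\nAvailable context lookups: {', '.join(tools.names())}"
```

The prompt stays the same size no matter how much documentation sits behind the lookups, and `calls` shows afterwards which context the agent actually wanted.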
Go multi-agent, and go mobile.
The old setup was six terminal tabs with six git checkouts of the same repo. The new setup is different:
- A single agent view with the desktop app handling worktree cloning automatically.
- Remote control from a phone. Start an agent at your desk, then check in, course-correct, and kick off new ones while you are getting coffee.
- Voice mode for spinning up an agent mid-conversation. Have an idea, talk it into existence, and never open your laptop.
One of the engineers now does roughly half of their engineering from a phone.
This is exactly the workflow DexRelay is built for.
If your agents can run in parallel, the bottleneck moves from typing code to keeping those agents moving: approving the right command, jumping into the right thread, steering a run, checking what changed, and kicking off the next job before momentum dies.
DexRelay puts Codex and Claude Code control on your iPhone, so the agent loop does not stop when you leave your desk. Pair your Mac by QR code, switch project threads, approve commands, dictate the next prompt, and keep multiple runs moving from anywhere.
Put the AI at the center, not on the side.
The best analogy in the conversation comes from a 1990s Harvard Business Review piece asking why computers were not boosting productivity yet.
The answer: companies kept the paper filing cabinet and the pen-and-paper process, then bolted a computer on the side. The gains came only when they threw out the old process and rebuilt around the computer.
Same thing now. Do not keep your old workflow and sprinkle AI on top. Make Claude the default for every question, review, form, and line of code. At Anthropic, new hires do not ask people questions first. They ask Claude.
Trust is what unlocks parallelism.
You cannot run a dozen agents at once if you have to watch each one. The reason they can step away is that they trust the agents to run safely.
That trust was earned deliberately: transcripts classified for safety, red-teamers and internal teams trying to prompt-inject and break the system, and every successful attack turned into an eval the model now defends against.
Security is not a talking point here. It is the thing that makes hands-off scale possible.
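The "every successful attack becomes an eval" loop can be sketched as a growing regression suite. This is a simplified illustration, not Anthropic's tooling: `AttackSuite` is a hypothetical name, and `agent_resists` stands in for running the prompt against the real system and judging the result.

```python
from typing import Callable, Dict, List


class AttackSuite:
    """Every attack that ever worked is kept and replayed on each new version."""

    def __init__(self) -> None:
        self._cases: Dict[str, str] = {}

    def record(self, name: str, prompt: str) -> None:
        """Store a successful attack so it becomes a permanent check."""
        self._cases[name] = prompt

    def regressions(self, agent_resists: Callable[[str], bool]) -> List[str]:
        """Return the names of recorded attacks the agent fails to resist."""
        return sorted(
            name
            for name, prompt in self._cases.items()
            if not agent_resists(prompt)
        )
```

An empty list from `regressions` means nothing that worked before works now, which is the kind of evidence that lets someone stop watching each agent.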
Every role is merging, and taste is the new bottleneck.
PMs, designers, finance teams, and data scientists all code now, because the agent writes the code. Designers ship prototype changes directly instead of waiting on an engineer. Finance runs projections in Claude Code.
Which means the scarce skill is no longer typing the code. It is having the product context, business context, taste, and curiosity to come up with the right idea and own it end to end.
"I do not have a to-do list anymore. Claude just builds everything. My job is to come up with the ideas."
Stop using AI as a faster autocomplete.
Set up agents that test themselves, write your corrections into skills so they never regress, run them in parallel on auto mode, and rebuild your process around them instead of bolting them onto the old one.