# ZeroLabs — Full Content Index
> All published articles from ZeroLabs by Jimmy G of ZeroShot Studio
> Website: https://labs.zeroshot.studio
> Canonical directory: [llms.txt](https://labs.zeroshot.studio/llms.txt)

---

## GitHub for Beginners, Done Properly
URL: https://labs.zeroshot.studio/ai-workflows/github-for-beginners-done-properly
Zone: ai-workflows
Tags: github, git, beginners, developer-workflow
Published: 2026-04-06

Learn a calm beginner GitHub workflow with clean repos, small commits, pull requests, branch protection, and a README people can use.

> **KEY TAKEAWAY**
> GitHub feels "pro" when every change is easy to understand, review, and roll back. Beginners do not need an enterprise process. They need a calm workflow they can repeat without guessing.
>
> * **The Problem:** New developers often dump work straight onto `main`, write vague commits, and leave the README too thin to help the next person.
> * **The Solution:** Set up one clean loop around a real README, small commits, feature branches, pull requests, branch protection, and one lightweight check.
> * **The Result:** Your repo becomes easier to explain, safer to change, and far less likely to turn into a mystery blob after one messy afternoon.

*Last updated: 2026-04-06 · Tested against GitHub Docs and GitHub web UI on 2026-04-06*

```mermaid
%% File: diagrams/github-beginner-loop.mmd
flowchart LR
  A["Create repo"] --> B["Add README and .gitignore"]
  B --> C["Create feature branch"]
  C --> D["Make small commits"]
  D --> E["Open pull request"]
  E --> F["Run checks"]
  F --> G["Merge to main"]
```

GitHub is not where you dump code after the real work is done. For beginners, it is the thing that keeps the work recoverable while you are still learning. The first time you break a working project on `main`, GitHub stops feeling like a nice extra and starts feeling like your seatbelt.

Using GitHub like a pro does not mean learning every advanced feature in one weekend. It means turning on a few boring defaults that make your work legible: one repo per project, a README that actually explains the project, small commits, short-lived branches, pull requests before merge, and at least one automated check.

### Contents

1. [What does using GitHub like a pro actually mean?](#what-does-using-github-like-a-pro-actually-mean)
2. [What should you set up on day one?](#what-should-you-set-up-on-day-one)
3. [What is the daily workflow that keeps beginners safe?](#what-is-the-daily-workflow-that-keeps-beginners-safe)
4. [Which GitHub settings should you turn on early?](#which-github-settings-should-you-turn-on-early)
5. [How do you write a README people can actually use?](#how-do-you-write-a-readme-people-can-actually-use)
6. [Frequently asked questions](#frequently-asked-questions)
7. [What should you do next?](#what-should-you-do-next)

## What does using GitHub like a pro actually mean?

It means the repo explains itself even when you are tired, moving quickly, or using AI to draft half the code. GitHub describes GitHub Flow as a lightweight, branch-based workflow built around short-lived branches and pull requests, which is still the cleanest mental model for most beginners ([GitHub Flow](https://docs.github.com/en/get-started/quickstart/github-flow)).

The shift is small but important:

| Weak habit | Better default | Why it matters |
|---|---|---|
| Committing straight to `main` | Work in a branch | You can experiment without breaking the stable line |
| One giant "final fixes" commit | Small scoped commits | You can explain changes and revert cleanly |
| Bare repo with no context | Real README and PR description | Future-you can understand the project |
| Manual eyeballing only | Pull request plus a simple check | GitHub catches obvious mistakes before merge |
| Depending on memory | Branch rules and named issues | The process still works on a bad day |

> **The hard rule:** If you would be nervous to explain the change tomorrow, it should not go straight to `main` today.

That is the real pro habit. Not complexity. Just clarity.

## What should you set up on day one?

Start with one calm repo, not a maze.

1. **Create one repository per project.** GitHub's repository quickstart is still the right starting point for a clean setup ([Quickstart for repositories](https://docs.github.com/create-a-repo)).
2. **Add a README immediately.** GitHub says a repository README can live in `.github`, the repository root, or `docs`, and GitHub shows the first one it finds ([About the repository README file](https://docs.github.com/articles/about-readmes)).
3. **Add the right `.gitignore`.** Your repo should not collect `node_modules`, build output, logs, local databases, or secrets.
4. **Choose one auth method and stick with it.** GitHub supports both HTTPS and SSH connections, and SSH is often smoother on your main development machine once it is configured ([About authentication to GitHub](https://docs.github.com/github/authenticating-to-github/about-authentication-to-github), [Connecting to GitHub with SSH](https://docs.github.com/en/authentication/connecting-to-github-with-ssh)).
5. **Keep `main` clean.** Treat it as the boring branch that should always be safe to pull.

A simple starter structure is enough:

```text
# File: project-structure.txt
my-project/
├── README.md
├── .gitignore
├── src/
└── .github/
    └── workflows/
```

That is plenty for a first real repo.
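The day-one list above is only a few commands. This is a hedged sketch, not a required incantation: the project name and `.gitignore` entries are placeholders, and the inline `-c user.*` flags exist only so the demo runs on a machine with no global Git identity configured.

```shell
# Hypothetical day-one setup; swap in your own project name and ignores.
set -eu

mkdir my-project && cd my-project
git init -q -b main        # -b needs Git 2.28+; on older Git, init then rename the branch

printf '# my-project\n\nWhat this does and how to run it.\n' > README.md
printf 'node_modules/\ndist/\n*.log\n.env\n' > .gitignore

git add README.md .gitignore
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit -q -m "chore: add README and .gitignore"
```

From here, create the repository on GitHub and push, or start from the GitHub web UI and clone instead. Either direction works; the point is that the README and `.gitignore` land in the very first commit.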

## What is the daily workflow that keeps beginners safe?

Use one loop until it becomes boring.

1. **Name the task before you code.** Open an issue or at least write one line describing what you are about to change.
2. **Create a branch from `main`.** GitHub's branching docs explain the branch model clearly, and the important part is simple: new work belongs off the stable line ([About branches](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-branches)).
3. **Make small commits.** GitHub's commit docs are basic on purpose, and that is the point. A commit should capture one meaningful step, not an entire chaotic session ([About commits](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/about-commits)).
4. **Open a pull request before merge.** A pull request is not just for teams. It gives you a diff, a description field, and one last pause before the change lands ([About pull requests](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests)).
5. **Merge only when you can explain the change in plain English.** If the branch feels muddy, keep cleaning it.

The branch names and commit messages do not need to be clever. They need to be readable.

Good examples:

- Branch: `fix/login-button-loading`
- Branch: `docs/add-readme-setup-steps`
- Commit: `fix: stop login button double submit`
- Commit: `docs: add local setup section to README`
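Strung together, one pass through the loop is short. This sketch uses a throwaway repo so it runs end to end locally; in real work you are already inside your project with `main` checked out, and the file, branch name, and commit message are illustrative.

```shell
set -eu
# Throwaway demo repo; the -c user.* flags only satisfy a machine with no
# global Git identity configured.
git init -q -b main demo-loop && cd demo-loop
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "chore: initial commit"

git switch -c fix/login-button-loading     # new work belongs off the stable line
printf 'submit once\n' > login.js          # stand-in for the real edit
git add login.js                           # stage only what the message claims
git -c user.name=Demo -c user.email=demo@example.com \
    commit -q -m "fix: stop login button double submit"

git log --oneline main..HEAD               # the diff a pull request would show
```

After `git push -u origin fix/login-button-loading`, open the pull request in the web UI, or with `gh pr create` if you use the GitHub CLI.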

This is also where the AI piece matters. AI can help you move fast, but fast is only useful if the repo still makes sense afterwards. That is the same reason structured review matters in our own workflow posts like [AI review agents in a content pipeline](/ai-workflows/ai-review-agents-content-pipeline). Speed without a checkpoint usually creates debt.

## Which GitHub settings should you turn on early?

You do not need every setting. You need the ones that stop easy mistakes.

### 1. Branch protection

GitHub's protected branch settings can require pull requests, reviews, and passing status checks before merge ([Managing protected branches](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches)). For a beginner repo, I would turn on:

1. Require a pull request before merging.
2. Require passing status checks.
3. Block force pushes.
4. Require conversation resolution if you are using pull requests seriously.

![GitHub branch protection settings for a beginner repository](/api/images/1775510514207-github-for-beginners-done-properly-proof-step-01.png "Branch protection keeps main boring and safe.")

That setup is enough to stop the most common beginner mistake: treating `main` like a scratchpad.

### 2. One lightweight GitHub Actions workflow

GitHub Actions lets you run workflows on events like `push` and `pull_request`, which makes it the easiest beginner CI layer on the platform ([Quickstart for GitHub Actions](https://docs.github.com/en/actions/get-started/quickstart)). Do not start with a huge pipeline. Start with one honest check.

```yaml
# File: .github/workflows/ci.yml
name: ci

on:
  pull_request:
  push:
    branches:
      - main

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npm run lint
      - run: npm test -- --runInBand
```

The goal is not enterprise CI. The goal is to catch broken code before merge.

### 3. Issues and pull request descriptions

Use issues as a backlog, not as a dumping ground. Keep the titles plain:

- Fix checkout loading state
- Add README install steps
- Split auth helpers into one module

Then open the pull request with one clear sentence about what changed and why. That one habit saves more time than beginners expect.

## How do you write a README people can actually use?

A good README answers the questions a new visitor has in the first minute.

It does not need to be perfect. It needs to stop the confusion.

At minimum, include:

1. **What the project is**
2. **Who it is for**
3. **How to run it locally**
4. **What is incomplete or rough**
5. **Where to find important commands or docs**

A clean starter README looks like this:

```md
# Project Name

Short description of what the project does and why it exists.

## Stack

- Next.js
- TypeScript
- PostgreSQL

## Local setup

1. Clone the repo.
2. Install dependencies with `npm install`.
3. Copy `.env.example` to `.env.local`.
4. Run `npm run dev`.

## Current status

What works, what is incomplete, and what to fix next.
```

The README is not marketing copy. It is the repo's front door. GitHub makes that explicit by surfacing the README prominently in the repository view ([About the repository README file](https://docs.github.com/articles/about-readmes)).

The mistake I see most often is trying to sound impressive instead of useful. A useful README beats a clever one every time.

If you like systems that reduce thrash, this is the same mindset behind [why every writing team needs a calm publishing checklist](/resources/why-every-writing-team-needs-a-calm-publishing-checklist). The tool matters. The boring defaults matter more.

## Frequently asked questions

**Should beginners use SSH or HTTPS for GitHub?**

Use SSH on your own machine if you want fewer repeated auth prompts. Use HTTPS if your environment is locked down or you already rely on a credential manager. The important thing is consistency, not status.

**Do I need pull requests if I work alone?**

Yes, if the project matters. A pull request gives you a clean diff, a place to explain the change, and one last checkpoint before merge.

**How small should a commit be?**

Small enough that the commit message still tells the truth. If one commit contains three separate ideas, split it.

**What should never go into a README?**

Secrets, copied stack traces, vague bragging, and stale setup steps. A README should make the repo easier to use, not harder to trust.

## What should you do next?

If your current repo feels messy, fix it in this order:

1. Add a real README and a proper `.gitignore`.
2. Stop committing straight to `main`.
3. Turn on branch protection.
4. Add one tiny GitHub Actions workflow.
5. Start opening pull requests for meaningful changes.

That is enough to make your work calmer and easier to recover.

You do not need to become a Git wizard this week. You need a workflow that helps you understand your own project when the excitement wears off. That is the version of "pro" beginners should care about.

If you are building with AI while you learn, read [You don't need an AI agent](/agents/you-dont-need-an-ai-agent). The same rule applies here too: the magic is rarely the tool by itself. The magic is the system you wrap around it.

---

Ready to clean up your workflow? Start by protecting `main` and writing a README you would trust if the repo belonged to someone else.

[Browse more ZeroLabs workflow posts](/resources) | [See our AI workflow breakdowns](/ai-workflows)

---

## Why We Simplified ZeroLabs' Writing Pipeline
URL: https://labs.zeroshot.studio/resources/why-we-simplified-zerolabs-writing-pipeline
Zone: resources
Tags: ZeroLabs writing pipeline, content operations, AI workflows
Published: 2026-04-06

Why ZeroLabs simplified the writing path, what stayed gated, and when the fuller workflow still makes sense.

> **KEY TAKEAWAY**
> * **The Problem:** The earlier path carried too many writing-adjacent handoffs for a draft that still had to satisfy the same publish contract at the end.
> * **The Solution:** We shortened the path, kept the brief and draft as the core handoff, and left review, visuals, and publish gates intact.
> * **The Result:** Every run still carries **8 required items, including a visuals workspace**, so it got smaller without becoming vague or unaccountable.

*Last updated: 2026-04-06 · Tested against the current ZeroContentPipeline run contract*

```mermaid
%% File: content/20260406-182622-why-we-simplified-the-zerolabs-writing-pipeline/visuals/why-we-simplified-the-zerolabs-writing-pipeline-arch-flow-01.mmd
flowchart LR
  A["Research"] --> B["Brief.md"]
  B --> C["Draft.md"]
  C --> D["Style, facts, and SEO review"]
  D --> E["Visual manifest"]
  E --> F["Publish gate"]
  B -.-> G["Removed: duplicate handoffs and extra branching"]
  C -.-> G
```

We simplified the path because extra branching was not buying better drafts. It was creating more places for a point to drift, get restated, or sit idle while the next stage repeated the same instruction in a new format.

That is the bit worth saying plainly. We did not simplify by dropping review. We cut duplicate handoffs and kept the contract that matters: a brief, a real draft, review outputs, a visual manifest, and a publish gate that still blocks sloppy work.

## What did we actually remove from the ZeroLabs writing pipeline?

We removed the clutter. The draft no longer needs to wander through extra branching just to reach the same review boundary with more room for interpretation errors.

In practice, that means we now treat `brief.md` and `draft.md` as the main handoff, not one stop in a parade of near-duplicate checkpoints. The state machine remains explicit, but the drafting slice is tighter: `research -> write -> cleanup -> baseline_review -> style -> facts -> seo -> visuals -> text_reconciliation -> final_review -> publish -> live_qa`.
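Drawn in the same diagram style as the flows above, the tightened drafting slice reads as one straight line. This is a sketch of the stage list in the previous paragraph, not an export from the pipeline:

```mermaid
%% Sketch of the drafting slice listed above; not generated by the pipeline.
flowchart LR
  A["research"] --> B["write"] --> C["cleanup"] --> D["baseline_review"]
  D --> E["style"] --> F["facts"] --> G["seo"] --> H["visuals"]
  H --> I["text_reconciliation"] --> J["final_review"] --> K["publish"] --> L["live_qa"]
```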

One of the hard-earned lessons here was that a neat stage chart can still fail in practice. We had a version that looked orderly on paper, but it asked the same draft to survive too many small transfers before anyone learned anything new.

In this workspace, that failure mode shows up fast. Once a draft gets bounced through too many "light touch" stages, people stop reading for substance and start reading for stage compliance. The work gets tidier and less useful at the same time.

**What's an artifact contract?** It is the list of files and run outputs that must exist for a run to be legible, reviewable, and publishable. In the current contract, that includes `brief.md`, `draft.md`, `review-report.json`, `visual-manifest.json`, `content//visuals/`, `publish-result.json`, `run-log.md`, and `run.json`.
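A contract like that is cheap to enforce mechanically. The sketch below is hypothetical, not the pipeline's real validator: it assumes a flat run folder with a `visuals/` directory inside it, which is a simplification of the actual layout.

```shell
# check_contract RUN_DIR: list missing required artifacts, return non-zero if any.
check_contract() {
  dir="$1"; ok=0
  for f in brief.md draft.md review-report.json visual-manifest.json \
           publish-result.json run-log.md run.json; do
    [ -e "$dir/$f" ] || { echo "missing artifact: $f"; ok=1; }
  done
  [ -d "$dir/visuals" ] || { echo "missing artifact: visuals/"; ok=1; }
  return "$ok"
}

# Demo: a half-finished run fails the check instead of quietly looking "done".
mkdir -p demo-run/visuals
touch demo-run/brief.md demo-run/draft.md
check_contract demo-run || echo "run is not publishable yet"
```

The useful property is the non-zero exit code: a publish gate can call a check like this and mark the run `draft_blocked` instead of proceeding on hope.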

## What stayed locked in place after simplification?

Nearly all of the safety that actually matters.

The publish contract still expects validators to pass before a post is ready. Style issues cannot be shrugged off. SEO issues cannot be hand-waved. Broken links block. Visual checks apply when visuals are part of the post. If something fails, the run can be marked `draft_blocked` instead of pretending it is ready.

That boundary matters more than the number of sub-stages in the middle. GitHub's deployment-environment docs point at the same idea: keep approval and secret boundaries explicit even when the workflow changes ([GitHub Docs](https://docs.github.com/en/actions/how-tos/deploy/configure-and-manage-deployments/manage-environments)). Smaller flow, same gate.

We kept the same posture with run outputs as well. The workflow artifact docs note that uploaded artifacts can be shared across jobs and downloaded later in the same workflow run ([GitHub Docs](https://docs.github.com/en/actions/tutorials/store-and-share-data)). Different system, same instinct: handoffs should stay inspectable.

GitHub's environment docs make the same separation visible from another angle: a job that references an environment must follow protection rules before running or accessing the environment's secrets ([GitHub Docs](https://docs.github.com/en/actions/how-tos/deploy/configure-and-manage-deployments/manage-environments)).

For this kind of reflective resources post, the visual plan is `mermaid-only`. That is deliberate. Proof assets are reserved for proof-heavy tutorials, setup guides, execution claims, and UI/interface claims. A piece about pipeline design does not become more trustworthy because we glued a decorative screenshot on top of it.

| Removed pressure | Kept on purpose | Why it matters |
|---|---|---|
| Duplicate writing-adjacent branching | Brief and draft as canonical writing artifacts | Fewer chances for the argument to drift |
| Extra handoff noise | Review reports and validator gates | Problems still surface before publish |
| Visuals by habit | Proof-first visual policy | Images stay tied to real evidence |
| Implied readiness | Explicit `draft_blocked` status | The system can say "not ready" cleanly |

> **The catch:** A shorter writing path puts more weight on the brief. If the brief is thin, fuzzy, or overclaimed, there are fewer decorative steps later on to hide that problem.

## How do we keep review quality with fewer stages?

By making the remaining stages do distinct jobs.

Writing turns the brief into a coherent article. Cleanup does the deterministic polish. Review stages handle style, facts, and SEO. Visuals cover proof policy and asset safety. Final review decides whether the post is fit to publish. That separation matters because each stage answers a different question, instead of six stages all asking whether the draft "feels done."

The simpler version also makes ownership cleaner. The content workspace owns pipeline behavior, while ZeroLabs owns the publish contract and site-facing behavior such as slug and zone persistence. That split is not admin theatre. It keeps this system focused on research, drafting, review, and evidence instead of quietly mutating into site code.

That boundary is the same reason posts in this system keep real internal linking discipline. A publishable article should carry at least two or three internal links, and when it helps the reader, at least one should cross into another zone. If you want to see how this philosophy shows up elsewhere, the companion pieces on [AI review agents in the content pipeline](/ai-workflows/ai-review-agents-content-pipeline), [why you don't need an AI agent](/agents/you-dont-need-an-ai-agent), and [publishing directly to Labs via MCP](/openclaw/how-claude-published-directly-to-labs-via-mcp) are the right next stops.

## When is a simpler pipeline not enough?

A smaller pipeline is not a religion. It is a call based on the job.

If the post makes strong execution claims, needs screenshots as proof, or carries a genuine tutorial burden, the fuller workflow earns its keep. The same goes for posts with many citations, fragile facts, or a higher risk of evidence drift between draft, review, visuals, and publish.

This is also where the trade-off gets honest. We have not attached a time-saved number to the simpler path because we have not measured one directly yet. That restraint matters. It is easy to say a shorter workflow is faster. It is harder, and more useful, to admit when that is still a judgment rather than a verified result.

The rule we are following is simple: remove steps that repeat intent, keep the ones that establish proof, quality, or clear ownership. If a stage cannot explain what unique failure it catches, it is probably there for comfort rather than control.

## Frequently asked questions

**What did simplifying the ZeroLabs writing pipeline actually remove?**

It removed unnecessary branching and duplicate handoffs inside the writing path. The brief and the draft remain the central writing artifacts, while the later review, text reconciliation, visuals, and publish stages still gate quality and readiness.

**How do you keep review quality after reducing the number of stages?**

By making the remaining stages narrower and easier to audit. A smaller pipeline helps if each surviving stage has one job and produces a traceable output. It hurts if simplification turns every stage into a vague "final pass."

**When is a simpler pipeline the wrong choice?**

When the post needs literal proof, dense fact verification, or more than one kind of editorial scrutiny. Tutorials, setup guides, and execution-heavy walkthroughs usually need the fuller path because the evidence burden is higher.

## Why was this the right simplification for us?

Because the job of the writing stage is to produce a publishable draft, not to cosplay as six different departments.

What we wanted was less shuffling and clearer accountability. What we kept was the part that protects the reader: traceable outputs, clear review gates, a proof-aware visual policy, and a publish boundary that can still say no. That is a better trade than a longer diagram with more arrows and less signal.

---

If you are tightening your own content system, start by asking which stages discover new information and which ones just move the same box to a new shelf. Keep the first group. Cut the second.

---

## Trace AI Content Handoffs Without Losing Proof Assets
URL: https://labs.zeroshot.studio/ai-workflows/trace-ai-content-handoffs-without-losing-proof-assets
Zone: ai-workflows
Tags: ai workflows, content pipeline, proof assets, provenance
Published: 2026-04-06

Trace AI content handoffs with one run ID, a manifest, and proof metadata that keep draft, review, and publish states aligned.

> **KEY TAKEAWAY**
> * **The problem:** AI content handoffs break when the draft, manifest, proof file, and review state stop naming the same run.
> * **The fix:** Carry one run ID through every step, record lightweight provenance, and store evidence with captions and digests.
> * **The result:** Reviews get sharper, publish gates become defensible, and you can still explain why a sentence shipped after the workflow moved on.

*Last updated: 2026-04-05 · Tested against the current ZeroContentPipeline workspace docs plus the linked OpenTelemetry, GitHub Actions, Playwright, W3C PROV, and MCP documentation*

In this workspace, `content/` holds the live draft and `runs/` holds the review state. I have watched drift show up when a rename lands in only one place: the asset path shifts, the run ID falls out of sync, and the sentence no longer matches the evidence beside it.

The fix is smaller than most teams expect. You do not need a giant observability stack to keep evidence intact. You need one run spine, explicit handoff records, and evidence that says exactly what it proves.

```mermaid
%% File: content/20260405-233200-how-to-trace-every-ai-content-pipeline-handoff-without-losing-proof-assets/visuals/diagrams/mermaid/trace-handoff-evidence-chain-01.mmd
flowchart LR
  A["Run ID created"] --> B["Draft, manifest, and assets carry same ID"]
  B --> C["Each handoff records entity, activity, and agent"]
  C --> D["Proof asset stored with path, caption, and digest"]
  D --> E["Reviewer checks sentence, manifest, and asset together"]
  E --> F["Publish waits until the chain agrees"]
```

![Evidence chain diagram for a traced AI content handoff](/api/images/1775434666034-trace-handoff-evidence-chain-01.svg "Diagram rendering of the same run-ID chain described in the article.")

## What counts as a handoff in an AI content pipeline?

In practice, I treat a handoff as any boundary where the next step can lose context, proof, or approval state. Human to model is one. Model to tool is another. Tool to workflow job is another. Reviewer to publisher counts too.

OpenTelemetry describes traces as collections of spans, with `SpanContext` propagated across boundaries so related work can still be connected later ([OpenTelemetry Overview](https://opentelemetry.io/docs/reference/specification/overview/)). You do not need full tracing to borrow the useful bit: one stable identifier that survives the boundary.

The W3C PROV overview models provenance through entities, activities, and agents, and says the PROV family is meant to support interoperable provenance interchange, reproducibility, and versioning ([W3C PROV Overview](https://www.w3.org/TR/prov-overview/)). That maps neatly onto content work. A draft is an entity. A review pass is an activity. A model, tool, or editor is an agent.

The shape is already there in this workspace. `content/` holds the active draft and `runs/` holds the run-local state, so drift becomes visible instead of hidden. The current run manifest separates `visual_mode`, `proof_required`, `safety_review`, and `asset_plan`, which only helps if the draft and proof file keep the same run ID.

GitHub Actions and Playwright are just the concrete examples here. The pattern still works with other runners and capture tools, as long as the handoff record names the same run and the proof object stays tied to the claim.

## How do you trace AI content pipeline handoffs without losing proof assets?

I use one contiguous chain from the first prompt to the publish gate.

1. **Mint one run ID before anything branches.** Use the same identifier on the brief, draft, manifest, capture, audit, and artifact.
2. **Record each handoff as entity, activity, and agent.** Say what moved, what changed, and who or what did the work.
3. **Separate the claim from the proof.** Keep the article claim in prose and the evidence in the manifest and asset metadata.
4. **Store file integrity with the artifact.** GitHub Actions can move outputs between jobs, and artifact download validates the digest ([Workflow artifacts](https://docs.github.com/en/actions/concepts/workflows-and-actions/workflow-artifacts), [Store and share data with workflow artifacts](https://docs.github.com/actions/using-workflows/storing-workflow-data-as-artifacts)).
5. **Choose the smallest truthful capture.** Playwright supports full-page and element screenshots ([Playwright Screenshots](https://playwright.dev/docs/next/screenshots)).
6. **Gate publish on agreement.** If the sentence, manifest, and proof disagree, stop.

> **The hard rule**
> A handoff is not done when a file exists. It is done when the next step can verify what the file proves, who produced it, and which run it belongs to.
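The six steps above fit in a few lines of shell. This is an illustrative sketch, not this workspace's real layout: the folder names, manifest fields, agent label, and `sha256sum` usage are all assumptions, with the PROV entity, activity, and agent collapsed into one JSON record per handoff.

```shell
set -eu
# Hypothetical layout: one run ID minted before anything branches.
run_id="20260406-120000-demo-run"
mkdir -p "runs/$run_id" "content/$run_id/visuals"

# One proof asset, stored with a digest so later steps can verify integrity.
printf 'stand-in screenshot bytes\n' > "content/$run_id/visuals/proof-step-01.png"
digest=$(sha256sum "content/$run_id/visuals/proof-step-01.png" | cut -d ' ' -f 1)

# One handoff record: entity (the asset), activity (the capture), agent (the tool).
cat > "runs/$run_id/handoff-visuals.json" <<EOF
{
  "run_id": "$run_id",
  "entity": "content/$run_id/visuals/proof-step-01.png",
  "activity": "proof_capture",
  "agent": "playwright",
  "sha256": "$digest",
  "caption": "Sanitized capture showing the run ID and review state"
}
EOF
```

The reviewer's check is then mechanical: the draft, the record, and the asset path all name the same `run_id`, and the digest matches the file on disk.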

## When is a manifest enough, and when do you need more proof?

A manifest is the routing layer. It tells the system where the asset lives, which run owns it, what class of output it is, and whether it is safe to move forward. That is good for audits and machine checks.

It is not always enough as evidence. If the article claims a visible workflow state, a review decision, or a specific execution result, you need something more literal than metadata. A screenshot, report, or stored artifact gives the reviewer a real object to inspect.

Use the manifest on its own when the handoff is mostly administrative: a draft moved into review, a bundle snapshot written to the run folder, an archived file landing where the next job expects it.

Use additional proof when a sentence would fall apart under challenge. If you say an audit passed, the report or capture is stronger than a manifest field alone. If you say a visual was reviewed and sanitized, the proof should show that exact state instead of a nearby screen and a hopeful caption.

I keep the proof capture tight: one sanitized frame that shows the run ID, the review state, and the matching manifest record in the same moment.

![Proof review state for the live run, including the run ID, blocked visuals gate, and the target proof asset metadata](/api/images/1775434666075-how-to-trace-every-ai-content-pipeline-handoff-without-losin.png "Sanitized proof capture showing the live run ID, proof requirement, and target asset review state from the blocked visuals pass.")

This is where MCP's split between resources and tools becomes useful. Resources are data exposed to clients, and tools are callable actions ([MCP Resources](https://modelcontextprotocol.io/docs/concepts/resources), [MCP Tools](https://modelcontextprotocol.io/specification/2025-06-18/server/tools)).

In a content pipeline, that distinction matters. Read-only context handoffs and action handoffs should not blur into one vague story about "the agent did it."

## When should you use a screenshot, a log, a diagram, or a stored artifact?

Pick the lightest thing that can survive review.

| If you need to prove... | Best proof surface | Why it holds up |
|---|---|---|
| The route through a workflow | Mermaid diagram | It explains the chain without pretending to be execution evidence |
| A visible UI or review state | Screenshot | It shows the literal state the sentence depends on |
| One badge, panel, or result | Element screenshot | It keeps the proof narrow and readable |
| A packaged output or generated report | Stored artifact | It preserves the actual output for later inspection |
| A machine decision trail | Log excerpt or report | It shows event history with less visual noise |

The trade-off is simple. Diagrams explain. Evidence verifies. Reports preserve. Logs reconstruct failure. Mix those jobs together and review gets muddy fast.

The smallest honest capture is usually the best reviewer surface. It is easier to inspect, easier to caption, and harder to oversell. That is why the proof-first rule helps: one image proves one state, one report proves one machine outcome, and one diagram explains one path through the system.

## Common errors and gotchas (troubleshooting)

The first warning sign is identifier drift. The draft uses one run ID, the asset folder uses another, or the review note stops naming the same object as the manifest. From there, every later step gets more interpretive than it should be.

The second warning sign is proof inflation. Someone captures a whole page to look safe, but the actual claim depends on one badge in a corner. That makes the proof harder to read and easier to dispute.

The third is action and context collapsing into the same blob. If a tool both fetched read-only context and changed workflow state, the handoff record should say that plainly. Otherwise the reviewer is left inferring whether they are looking at evidence, execution, or both.

For adjacent examples, [How to Debug ZeroContentPipeline Proofs](/ai-workflows/how-to-debug-zerocontentpipeline-with-proof-screenshots) covers review drift, [You Don't Need an AI Agent](/agents/you-dont-need-an-ai-agent) covers the workflow getting too magical, and [How Claude Published Directly to Labs via MCP](/openclaw/how-claude-published-directly-to-labs-via-mcp) covers the final boundary where traced work becomes a live post.

## What are the most common questions?

**What counts as a handoff in an AI content pipeline?**

Any boundary where the next step could lose identifiers, context, proof, or approval state counts as a handoff. That includes human-to-model, model-to-tool, tool-to-workflow, workflow-to-review, and review-to-publish boundaries.

**How do I avoid losing proof assets when a step moves to another tool or person?**

Carry one run ID everywhere, keep the manifest entry next to the asset metadata, and make the caption describe the exact sentence the asset supports. If the next operator has to guess what the file proves, the handoff was too loose.
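If you want to mechanise that check, the shape is small. Below is a sketch, assuming a hypothetical layout where `manifest.json` at the run root carries a top-level `run_id` and each asset has a `*.meta.json` file beside it that repeats the same ID:

```python
import json
from pathlib import Path

def check_run_id(run_dir: str) -> list[str]:
    """Flag files in a run folder whose run_id disagrees with the manifest.

    Layout is hypothetical: manifest.json with a top-level "run_id",
    plus per-asset *.meta.json files that should repeat it.
    """
    root = Path(run_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    expected = manifest["run_id"]
    drifted = []
    for meta_file in root.glob("**/*.meta.json"):
        meta = json.loads(meta_file.read_text())
        if meta.get("run_id") != expected:
            drifted.append(str(meta_file))
    return drifted
```

Run it at every handoff boundary; a non-empty list means the next operator is about to inherit identifier drift.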

**Is a manifest enough, or do I also need traces, screenshots, and digests?**

A manifest is the index, not the whole evidence chain. Use it to route and classify work, then add traces or step records for continuity, screenshots for visible states, and digests or archived outputs when file integrity matters.
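When file integrity matters, the digest step is only a few lines. A sketch of an illustrative manifest entry with a SHA-256 digest; the field names here are mine, not the pipeline's real schema:

```python
import hashlib
from pathlib import Path

def digest_entry(asset_path: str) -> dict:
    """Build a minimal manifest entry with a sha256 digest of the asset.

    The entry shape is illustrative, not a real pipeline contract.
    """
    data = Path(asset_path).read_bytes()
    return {
        "file": Path(asset_path).name,
        "sha256": hashlib.sha256(data).hexdigest(),
        "bytes": len(data),
    }
```

Store the digest next to the asset metadata so a later reviewer can confirm the archived output is the one the run actually produced.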

**When should I use a screenshot versus a log, a diagram, or a stored artifact?**

Use screenshots for visible states, logs for event history, diagrams for route explanation, and stored artifacts for outputs you may need later. Many workflows need one explanatory diagram plus one literal proof object, not a gallery of decorative captures.

Start with one sentence that matters. Trace it from prompt to publish candidate. Make the run ID, manifest entry, and proof object agree, then expand that pattern to the rest of the pipeline.

That is not flashy, but it gives you a chain you can defend when someone asks why the workflow believes what it believes.

---

Want the broader system around this? Read the rest of the [AI Workflows](/ai-workflows) cluster, then follow the links above into review drift, agent boundaries, and MCP-based publishing.

---

## My ADHD Dev Toolkit That Actually Works
URL: https://labs.zeroshot.studio/maintenance-mode/my-adhd-dev-toolkit-the-boring-tools-that-actually-work
Zone: maintenance-mode
Tags: ADHD dev toolkit, AI coding workflow, developer ADHD workflow
Published: 2026-04-05

A first-person ADHD workflow for AI coding: boring defaults, reusable prompts, and small rituals that reduce friction and re-entry cost.

> **KEY TAKEAWAY**
> * **The Problem:** Task initiation gets expensive fast when every job starts with a blank editor, a blank prompt, and five decisions before your hands touch the keyboard.
> * **The Solution:** My setup is mostly boring infrastructure: pinned task lists, workspace defaults, reusable AI instructions, and tiny start rituals that remove re-deciding.
> * **The Result:** Days built around fewer open loops and cheaper restarts, which matters because adult ADHD can affect planning, organisation, time management, and sustained attention ([NIMH](https://infocenter.nimh.nih.gov/publications/adhd-adults-four-things-know)).

## Why does my ADHD dev toolkit start with less, not more?

My setup has very little glamour in it. That is the point. On rough afternoons, I want to reopen the repo and know the next move without rebuilding the plan first. When I work with AI every day, the biggest risk is rarely lack of capability. It is the reset cost at the start, drift in the middle, and a fried brain at 5 p.m. deciding whether the next step is tests, prompts, docs, or me staring into the fridge again.

Adult ADHD can affect planning, organisation, time management, and sustained attention ([NIMH](https://infocenter.nimh.nih.gov/publications/adhd-adults-four-things-know)). I do not need a workflow that assumes perfect momentum. I need one that still works when momentum has packed a bag and left the building.

So my system is built like maintenance, not optimisation. Fewer choices. Fewer resets. More defaults baked into the repo, the editor, and the task itself. The difference shows up fastest when I have been away from a project for a day and the first ten minutes would otherwise disappear into tab-hopping.

## What boring tools actually pull the weight?

The useful stuff is deeply unsexy. A pinned markdown file for "now, next, later". Workspace-specific settings in VS Code. A short repo instruction file that tells the model how we work here. Keyboard shortcuts I can hit without thinking. A daily shutdown note so tomorrow-me does not inherit a crime scene.

Copilot guidance is pretty plain about this: better results come from relevant context, narrower prompts, and smaller tasks, not mystical phrasing ([Copilot prompt engineering](https://docs.github.com/en/copilot/concepts/prompting/prompt-engineering)). Repository-level custom instructions help for the same reason. They carry the boring context forward so I do not have to restate the house rules every time ([Repository custom instructions](https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot)).
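What those instructions look like in practice is deliberately plain. GitHub's docs put repository custom instructions in a `.github/copilot-instructions.md` file; the contents below are an illustrative sketch of the kind of house rules I mean, not a template to copy:

```markdown
<!-- File: .github/copilot-instructions.md -->
We use TypeScript strict mode and co-locate tests with source files.
Prefer small, single-purpose functions over speculative abstractions.
Do not touch generated files in dist/ or anything under migrations/.
When unsure about scope, propose the change before writing it.
```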

VS Code supports the same kind of low-friction setup through keyboard shortcuts and workspace-scoped configuration ([keyboard shortcuts](https://code.visualstudio.com/docs/getstarted/keybindings), [workspace configuration](https://code.visualstudio.com/docs/editing/workspaces/workspaces)). That matters more to me than another shiny app because, for me, the fastest system is usually the one already living inside the project I have open.
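The workspace half is just a settings file that travels with the repo. An illustrative sketch (the values are my defaults, not recommendations):

```jsonc
// File: .vscode/settings.json
{
  "editor.formatOnSave": true,
  "files.autoSave": "onFocusChange",
  "search.exclude": { "**/dist": true, "**/node_modules": true }
}
```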

| Boring tool | What it does | Why it helps on rough-focus days |
|---|---|---|
| Pinned task note | Holds the current step outside my head | Starting becomes "do line one", not "rebuild the whole plan" |
| Workspace settings | Carries project defaults automatically | Less setup, less rummaging, less drift |
| Repo AI instructions | Narrows prompts before I type | Fewer context dumps and fewer weird detours |
| Keyboard shortcuts | Gets me into commands quickly | Lower activation energy than menu hunting |
| Shutdown note | Leaves breadcrumbs for tomorrow | Cheaper restart when attention is thin |

The hard part is accepting that these tools are not supposed to feel impressive. They are supposed to feel quiet. If I notice the system too much, it usually means I have made it too clever.

> **The hard rule:** If a tool adds a new place to manage my work, it needs to remove two existing decisions or it does not stay.

## How do I stop AI from making context switching worse?

AI can absolutely reduce drag. It can also turn one coding task into six tabs, three speculative refactors, and a little side quest that eats your afternoon. I have done that more than once: I have spent an hour polishing a prompt, then realised I had asked for three jobs and two escape hatches. It feels productive right up until nothing is actually finished.

In procedural tasks, interruptions cost more when they are more complex. Research on procedural work found that complex interruptions increased resumption time and sequence errors ([PubMed](https://pubmed.ncbi.nlm.nih.gov/35225631/)). Another study modelled recovery as a measurable process rather than an instant snap-back ([PubMed](https://pubmed.ncbi.nlm.nih.gov/18229478/)).

That is why I try to keep AI inside a lane. One file or one decision at a time. One explicit ask. One visible next step written down before I open chat. I want the model helping me turn the spanner, not grabbing the whole toolbox and throwing it across the shed.

This is also where internal guardrails matter. Our review loops and conventions do a lot of the boring work before taste and judgement kick in, which is the same maintenance-mode instinct behind [AI review agents in the content pipeline](/ai-workflows/ai-review-agents-content-pipeline). Guardrails are not bureaucracy when they save your brain from reloading the same context all day.

## What does the smallest useful system look like?

When I rebuild my setup from scratch, I do not build a twelve-part life operating system by Friday. The version that sticks is smaller:

1. **Write one live task file.** Keep a single note open in the repo with `now`, `next`, and `later`. `Now` gets one item.
2. **Create one project prompt.** Add a short reusable instruction block with the repo rules, preferred output style, and what to avoid.
3. **Use workspace defaults.** Save the editor settings, terminal layout, and common commands with the project.
4. **End the day with a restart line.** Leave one sentence for tomorrow that says where you stopped and the next concrete action.
5. **Cut one source of choice.** Remove one app, one board, or one ritual that makes you manage the workflow instead of doing it.
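For reference, my live task file is about this big. The file name and contents are illustrative:

```markdown
<!-- File: TASKS.md (name is arbitrary; keep it in the repo root) -->
## now
- [ ] Fix the failing date parser test in src/utils

## next
- [ ] Write the restart line before shutdown

## later
- [ ] Tidy the prompt library
```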

That is enough to feel the difference. On the days I leave a restart line, I spend less time reloading context and more time making the next change. You do not need a cinematic before-and-after montage. You need the first ten minutes of the day to stop punching you in the face.

## What helps with late-day decision fatigue without turning into productivity theatre?

I use "decision fatigue" as shorthand because most developers know the feeling. Late in the day, choices get sticky and stupid. I feel it most after lunch, when small choices start taking twice as long. The science is messier than the phrase makes it sound, and recent large-scale field data found no evidence for decision fatigue in that specific healthcare context ([PubMed](https://pubmed.ncbi.nlm.nih.gov/40011733/)). So I treat it as a practical description, not a sacred theory.

The fix for me is simple: remove choices before I need them. I decide my default editor layout once. I keep a small set of shortcuts. I start from the same task note. I ask the AI for one bounded thing instead of a grand plan for my entire existence.

This is why I like maintenance-mode thinking in general. The boring option often wins because it survives contact with an ordinary Tuesday. That is the muscle behind [why we self-host our stack at ZeroLabs](/vps-infra/why-we-self-host-our-stack-at-zerolabs). Constraints do useful work when they keep the floor solid.

So is this an ADHD kit or just sensible practice? Both, probably. Plenty of developers without ADHD will get mileage out of fewer open loops, tighter prompts, and stronger defaults. But I built this kit because I personally need a lower-friction way back into the work when attention slips, energy drops, or the shape of the task turns fuzzy halfway through the day.

The catch is that no ADHD system is universal. Some people need more visual structure. Others need body doubling, timers, or environmental changes. My setup leans toward repo-native tools and boring rituals because I want as little distance as possible between "I should start" and actually starting.

## Frequently asked questions

**What part of ADHD does this toolkit actually help with?**

Mostly task initiation, re-entry after interruption, and the admin overhead around coding. I am trying to reduce the number of times I have to plan the work from scratch, not force myself into perfect focus.

**Which boring tools matter more than fancy productivity systems?**

The ones already attached to the work win first: a single task note, workspace defaults, repo instructions for AI, and a shutdown breadcrumb. They beat a complicated stack because they live where the task already lives.

**How do you keep AI coding from increasing context switching instead of reducing it?**

I open it with a narrow brief and a visible next step, then keep the scope to one file, one bug, or one decision. If the model starts spraying options everywhere, I stop and rewrite the ask before I keep coding.

**What is the smallest version of this system someone can start with?**

One note with `now`, `next`, and `later`, plus one reusable AI instruction block for the repo. If you do only those two things, starting gets cheaper immediately.

**Is this workflow for everyone with ADHD?**

No. It is my maintenance kit, not a diagnosis in markdown. Borrow the parts that reduce friction for you and leave the rest on the bench.

Start with one file and one rule. Open a task note in the repo. Write the next physical action. Then make your AI prompt smaller than feels necessary.

That is enough to get your hands back on the keyboard. After that, keep the systems that reduce drag and bin the ones that only make you feel organised. If this toolkit only works when you feel sharp, it does not work. It is decorative.

If you want the bigger Maintenance Mode frame, read [Maintenance Mode: Why the Best Developers Treat Themselves Like Production Systems](https://labs.zeroshot.studio/maintenance-mode/maintenance-mode-why-the-best-developers-treat-themselves-like-production-systems) and [The Decision Tax](https://labs.zeroshot.studio/maintenance-mode/the-decision-tax-why-ai-coding-drains-you-faster-than-writing-code-yourself). They sit next to this piece for a reason.

---

If this angle is useful, keep going with [AI review agents in the content pipeline](/ai-workflows/ai-review-agents-content-pipeline), [Claude Code hooks that replace half your Claude.md](/ai-workflows/claude-code-hooks-replace-half-your-claude-md), and [why we self-host our stack at ZeroLabs](/vps-infra/why-we-self-host-our-stack-at-zerolabs).

---

## How to Debug ZeroContentPipeline Proofs
URL: https://labs.zeroshot.studio/ai-workflows/how-to-debug-zerocontentpipeline-proofs
Zone: ai-workflows
Tags: zerocontentpipeline, proof screenshots, ai workflows, debugging
Published: 2026-04-05

A practical workflow for aligning claims, Mermaid diagrams, proof captures, the manifest, and audit checks before publish.

> **KEY TAKEAWAY**
> * **The Problem:** A draft can sound finished while the proof asset, manifest, and audit are still describing different states.
> * **The Solution:** Treat the run as one chain of evidence, and line up the sentence, the capture, the manifest entry, and the audit result before publish.
> * **The Result:** You get a post that is easier to review, easier to trust, and harder to break at the last handoff.

*Last updated: 2026-04-05 · Tested against the current ZeroContentPipeline v0.1.0 workspace build*

In this run, I checked one thing first: do the draft, the manifest, and the audit name the same state? They did not line up on the first pass. I found placeholder proof entries in `visual-manifest.json`, and the audit report flagged a Mermaid block whose file-path comment was not in the exact format the validator expected.

I also had to compare the draft and the asset plan side by side, because the screenshot itself looked fine while the metadata still said `reviewed: false`.

I tested the draft against the local validator after that, because the block-comment syntax was the part most likely to drift again.
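The validator's exact rule is internal to the pipeline, so treat the format below as an assumption, but the shape of the check is easy to sketch: require the first line of a Mermaid block to be a `%% File:` comment pointing at a `.mmd` path.

```python
import re

# Assumed rule: the first line of a Mermaid block must be a
# "%% File: <path>.mmd" comment, with exactly one space after the colon.
FILE_COMMENT = re.compile(r"^%% File: \S+\.mmd$")

def first_line_ok(mermaid_block: str) -> bool:
    first = mermaid_block.strip().splitlines()[0]
    return bool(FILE_COMMENT.match(first))
```

A check this literal is exactly the kind of thing that drifts quietly, which is why it pays to run it locally before the audit does.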

Proof captures are evidence, not decoration. The draft makes the claim. The visual backs it. The manifest explains it. The audit checks whether the package still tells one true story.

The structure matters because generative systems prefer clean attribution and concrete claims. The Princeton GEO paper is a useful reminder here: citations and specific details make a page easier to quote and reuse ([paper](https://arxiv.org/abs/2311.09735)). For the mechanics of the capture itself, Playwright documents both full-page and element screenshots ([screenshots](https://playwright.dev/docs/screenshots)), and Mermaid keeps the workflow shape readable ([flowchart docs](https://mermaid.js.org/syntax/flowchart.html)).

```mermaid
%% File: content/20260405-165339-how-to-debug-zerocontentpipeline-with-proof-screenshots/visuals/diagrams/mermaid/zcp-debug-proof-flow-01.mmd
flowchart LR
  A["brief.md sets the claim"] --> B["draft.md states the claim"]
  B --> C["proof capture shows the state"]
  C --> D["visual-manifest.json describes the asset"]
  D --> E["pipeline audit checks the package"]
  E --> F["publish only when all five agree"]
```

## What problem do proof screenshots solve in this workflow?

Proof screenshots settle a simple argument: did the run actually reach the state the article describes? If the post claims an audit passed, an asset was reviewed, or a step produced a visible result, the capture is the fastest way to show the receipt.

**What's a proof screenshot?** A truthful, sanitized image of the exact state that supports one sentence in the draft. Not the whole screen. Not a nearby screen. The exact state doing the work.

That distinction matters because screenshots and diagrams do different jobs. Mermaid explains the path. A screenshot proves a rendered state. If the post needs both, use both.

## How do you debug the run?

1. **Start with the sentence that needs proof.** Open `brief.md` and `draft.md` together. Find the line that would fall apart first if a reviewer asked for the receipt.

2. **Choose one decisive state.** Pick the smallest visible moment that settles the claim. A reviewed asset entry, a generated result, or an audit outcome is usually enough.

3. **Sketch the path before you capture it.** A Mermaid block forces the workflow into the open. If the flow feels muddy on paper, the capture will usually be muddy too.

4. **Capture the narrowest honest frame.** Full-page screenshots look thorough, but they bury the point. Element screenshots keep the evidence tight and readable, which is exactly what the Playwright docs recommend for focused capture work ([screenshots](https://playwright.dev/docs/screenshots)).

5. **Write the manifest like a reviewer will read it.** Alt text, caption, source environment, and review state tell the next person what the asset proves and whether it is safe to publish.

6. **Run the audit after the package lines up.** If the audit disagrees with the article, trust the audit first. Walk back through the sentence, the capture, and the manifest until they say the same thing.
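Steps 5 and 6 can share one small gate. A sketch that lists why a manifest entry is not publish-ready; the field names mirror the ones this article uses, but the schema is illustrative rather than the pipeline's real contract:

```python
REQUIRED_TEXT = ("alt", "caption")
REQUIRED_FLAGS = ("reviewed", "sanitized")

def publish_blockers(entry: dict) -> list[str]:
    """List the reasons a manifest entry is not ready to publish."""
    blockers = []
    for field in REQUIRED_TEXT:
        # Empty or missing alt text and captions are publishing friction.
        if not entry.get(field, "").strip():
            blockers.append(f"missing or empty: {field}")
    for flag in REQUIRED_FLAGS:
        # Anything short of an explicit True means the asset is not cleared.
        if entry.get(flag) is not True:
            blockers.append(f"not confirmed: {flag}")
    return blockers
```

An empty list does not make the capture true, but a non-empty one reliably means the package is not ready, even when the image looks fine.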

## When should you use Mermaid, and when should you use a screenshot?

Use the lighter proof surface that answers the real question.

| If you need to show... | Use this | Why |
|---|---|---|
| The order of the workflow | Mermaid | It explains the path without interface noise |
| A real state the article depends on | Screenshot | It gives evidence instead of a sketch |
| One control, panel, or result | Element screenshot | It keeps the proof readable |
| A sensitive flow you cannot safely capture live | Mermaid | It avoids leaking extra context |

This run taught me one practical thing: the visual layer can look almost ready while the manifest still carries placeholders.

That is why I keep the proof asset, the draft sentence, and the manifest entry in the same review pass.

![ZeroContentPipeline proof asset beside its audit handoff state](/api/images/how-to-debug-zerocontentpipeline-with-proof-screenshots-proof-step-01.webp "Sanitized proof screenshot showing the asset state that the draft and manifest need to describe.")

The clean capture is usually the one with a narrow job. If you need three zooms and a paragraph of explanation to make it land, the frame is too wide.

## Common errors and gotchas (troubleshooting)

The first failure mode is drift. The draft promises a specific proof point, but the stored image shows a nearby step instead of the one that matters.

The second is over-capturing. Big screenshots feel safe, but they often make the reader hunt for the one visible state that matters. That weakens the proof and slows review.

The third is thin metadata. The image exists, but the caption is vague, the alt text says almost nothing, or the review state is unfinished. That is usually where publishing friction lives.

> **The hard rule:** If the post makes an execution claim, the proof asset has to survive both the eyeball test and the audit.

If the issue is still unclear, check the manifest first. In this workflow, `reviewed: false` or `sanitized: false` means the asset is not ready for publish, even if the image itself is technically accurate.

## Frequently asked questions

**What does a proof screenshot need to show?**

It needs to show the exact state that supports the sentence doing the real work in the draft. If the reader has to guess what they are meant to notice, the capture is too loose.

**What if the screenshot is real but still fails review?**

Check the manifest entry before you blame the image. Weak alt text, a vague caption, or an unfinished review state can make a truthful capture fail the workflow.

**Can Mermaid replace screenshots completely?**

No. Mermaid explains the route through the system. It does not prove that a particular run reached a particular state.

**Do I need both visuals for every post?**

No. Use the lightest useful format. If the post is conceptual, a diagram may be enough. If the post makes a visible workflow claim, use a proof asset as well.

## What should you do next?

Keep the claim narrow. Capture one honest proof state. Make the manifest carry the same story as the draft. Then let the audit be the grumpy reviewer in the room.

That approach is not flashy, but it holds up. If you want the broader publishing context, [You don't need an AI agent](/agents/you-dont-need-an-ai-agent) is a good reset on keeping the workflow small, [How Claude published directly to Labs via MCP](/openclaw/how-claude-published-directly-to-labs-via-mcp) shows the far end of the chain, and the [AI Workflows](/ai-workflows) zone holds the rest of the cluster.

---

The short version: one claim, one proof point, one tidy chain of evidence.

---

## The Decision Tax: Why AI Coding Drains You Faster Than Writing Code Yourself
URL: https://labs.zeroshot.studio/maintenance-mode/the-decision-tax-why-ai-coding-drains-you-faster-than-writing-code-yourself
Zone: maintenance-mode
Tags: AI coding fatigue, cognitive load, decision fatigue, ADHD developers, vibe coding burnout, developer wellbeing
Published: 2026-04-01

AI coding flips your role from creator to evaluator. Every output triggers a micro-decision cascade that compounds across a session in ways traditional coding doesn't. Here's the mechanism, and what to do about it.

> **KEY TAKEAWAY**
> * **The Problem:** AI coding shifts developers from creation mode to evaluation mode, triggering a micro-decision cascade on every output that traditional coding doesn't impose.
> * **The Solution:** Name the "decision tax" for what it is, budget your daily evaluation cycles deliberately, and protect creation-only windows in your schedule.
> * **The Result:** Most of the cognitive fatigue from an AI-heavy session traces back to review decisions, not technical work. Recognizing the pattern is the first step to managing it.

A full day of AI coding left me more depleted than eight hours of writing code myself. The git diff was thin, the commits were sparse, and I couldn't figure out where the energy went. Felt like I'd run a marathon but my Strava showed a nap.

That gap is what this post is about. The tiredness is real. The mechanism is just not what most developers think it is.

## What's the difference between writing code and using AI?

When you write code yourself, you're in creation mode. You're building a model of the problem, choosing abstractions, making design calls that compound forward. It has a natural flow rhythm. The decisions feel owned.

When you use AI, the mode flips. You're no longer the generator: you're the evaluator. Every output requires a verification pass: is this correct? Does it match what I actually intended? Accept, reject, or edit? You're not building from scratch; you're assessing something the model built.

These are cognitively different tasks, and the gap between them is the root of AI coding fatigue. Creation has natural closure: you finish a function, a component, a test. Evaluation doesn't, because every accepted output might still be subtly wrong. That's where the cost hides.

## What is the decision tax?

The decision tax is the sum of micro-decisions triggered by each AI output in a session. Each one costs almost nothing. Across a full day, they compound.

A typical AI-heavy session runs through this loop continuously:

```mermaid
flowchart TD
  A["Write prompt"] --> B["AI generates output"]
  B --> C{"Evaluate output"}
  C -->|"Accept"| D["Integrate and continue"]
  C -->|"Reject"| E["Refine prompt"]
  C -->|"Edit"| F["Manual correction"]
  D --> A
  E --> A
  F --> A
  D --> G["Session end: cognitive debt accumulated"]
```

Each cycle through that loop is cheap. Fifty cycles is not. You're holding your original intent in one hand while parsing what the model produced in the other. That dual-grip is the tax. Pure coding doesn't load your working memory the same way.

Roy Baumeister's ego depletion research proposed that your capacity for good decisions drains through the day, regardless of how trivial each one feels ([Baumeister et al., 1998](https://pubmed.ncbi.nlm.nih.gov/9599441/)). Subsequent replication studies have produced mixed results, but it is the mechanism that is contested, not the experience. The AI coding loop is a decision machine running continuously.

| Mode | Primary cognitive task | Flow-compatible? | Decision load per hour |
|------|----------------------|-----------------|----------------------|
| Writing code yourself | Generation and design | High | Low to medium |
| AI-assisted coding | Evaluation and triage | Low | High |
| Code review (static) | Evaluation | Medium | Medium |

The middle row is the one developers underestimate. Code review feels like work. AI-assisted coding feels like productivity. The cognitive load profile is similar.

## Why is evaluation mode resistant to flow?

Flow needs clear goals, immediate feedback, and a challenge that matches your skill level. Creation hits all three when the work is pitched right. Evaluation doesn't: "review this output" is open-ended in a way that "build this function" never is.

Gloria Mark's decade of attention research at UC Irvine documented a consistent finding: getting back into focused work after an interruption takes substantially longer than the interruption itself. (Mark, [Attention Span](https://www.hachettebookgroup.com/titles/gloria-mark/attention-span/9781538708330/)) Every accept/reject call is a micro-interruption: you flip from builder to reviewer and back. In an active AI session, that flip happens every 30 to 60 seconds.

In our ZeroShot Studio setup, we started logging session types after noticing a pattern: AI-heavy days correlated with worse judgment calls in the late afternoon, not just lower energy. The fatigue came from the mode, not the volume.

> **The reality:** The decision tax doesn't show up in your task tracker or your commit log. You'll end an eight-hour day with a thin diff and attribute it to poor focus. The tax was real: you just charged it to the wrong account.

## How does ADHD change the picture?

For developers with ADHD, evaluation mode is hostile territory. ADHD brains are wired for novelty and creation: the dopamine hit from building something new is neurology, not a workaround. Sustained review without clear closure triggers boredom and avoidance loops that look like procrastination from the outside.

The AI coding loop is short enough to feel stimulating (fast outputs, rapid iteration) but the evaluation grind drains the focus reserves that keep ADHD developers productive. It holds your attention while quietly emptying the tank.

If you're a developer with ADHD who finds AI coding sessions more exhausting than flow-state coding, this is probably why.

## How do you reduce the decision tax?

The fix isn't avoiding AI tools. It's treating your decision capacity as a finite daily resource and spending it on purpose.

Three things that have worked for us:

1. **Timebox evaluation sessions.** Cap continuous AI-assisted work at 90 minutes. After that, switch to creation work (greenfield code, architecture planning, writing) to let the evaluation queue drain before it backs up.

2. **Batch your accept/reject decisions.** Instead of evaluating each AI suggestion in real time, generate several outputs and review them together in a dedicated block. Batching converts continuous overhead into discrete review windows with clear closure.

3. **Protect creation-only windows.** Two-hour blocks where the AI tools are off. Not a productivity ritual: a calibration tool. Hands on the keyboard, no copilot. You need to remember what building feels like so you notice when you've been reviewing for too long.

The goal is awareness. Once you can name the tax, you can budget it. For specifics on structuring AI-assisted sessions, the [AI Workflows](/ai-workflows) zone covers prompt engineering and session design in more depth.

## Frequently asked questions

**Is AI coding fatigue the same as regular burnout?**

Related but distinct. Burnout builds over months from sustained pressure and needs structural change: different work, different conditions, real time off. AI coding fatigue is session-level. It accumulates in hours and recovers with a mode change. You don't need a week off; you need two hours of writing code without a copilot. That changes how you respond to it.

**How do I know if the decision tax is affecting me?**

Run a rough audit at the end of an AI-heavy session. How much time did you spend writing net-new code versus evaluating AI output? If more than two-thirds was evaluation, the tax was running all day. Second signal: if your judgment calls at 4pm are worse than at 10am and your energy doesn't explain the gap, the pool is empty.
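That audit can be a literal two-line calculation. A sketch, assuming you jot down rough `(mode, minutes)` pairs during the day:

```python
def evaluation_ratio(log: list[tuple[str, int]]) -> float:
    """Fraction of session minutes spent evaluating rather than creating.

    `log` is a list of (mode, minutes) pairs, e.g. ("evaluate", 25).
    Back-of-envelope audit, not a measurement tool.
    """
    total = sum(minutes for _, minutes in log)
    evaluating = sum(minutes for mode, minutes in log if mode == "evaluate")
    return evaluating / total if total else 0.0
```

If the number comes back above two-thirds, the tax was running all day.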

**Does getting better at prompting reduce the tax?**

Yes. Building a library of prompt templates for familiar tasks (refactoring, test scaffolding, boilerplate) raises the first-pass acceptance rate and compounds over time. The cost spikes on novel problems where you haven't built that muscle yet. Track where you're doing the most editing: those are the prompts worth refining first.

---

The decision tax is one piece of a bigger picture. If you're thinking about sustainable output as a developer, the [Maintenance Mode pillar](/maintenance-mode/why-the-best-developers-treat-themselves-like-production-systems) is where this fits.

[More from Maintenance Mode](/maintenance-mode) | [Follow on jimmygoode.com](https://jimmygoode.com)

---

## Setting Up AI Coding Agents: A Practical Guide to Claude Code, Copilot, and Gemini CLI
URL: https://labs.zeroshot.studio/ai-workflows/setting-up-ai-coding-agents-claude-code-codex-and-gemini-cli
Zone: ai-workflows
Tags: claude-code, github-copilot, gemini-cli, agents, AGENTS.md, CLAUDE.md, agentic-coding
Published: 2026-03-31

Claude Code, GitHub Copilot, and Gemini CLI are all capable agents. But without proper config files, they spend half their time guessing about your codebase. This guide covers the exact folder structure, instruction files, and unified AGENTS.md strategy that makes them actually useful.

*Last updated: 2026-03-31 · Tested against Claude Code 1.x, GitHub Copilot CLI (GA Feb 2026), Gemini CLI 0.1.x*

Three months into running Claude Code full-time across production codebases, the biggest unlock had nothing to do with model quality. It was the config file.

The moment I started treating the agent's instruction file as a first-class project artifact, the quality of the work jumped. The agent stopped doing dumb things I'd already told it about. It knew the folder structure, the banned patterns, the deploy process. It worked the way I actually wanted it to work.

[Claude Code](https://www.anthropic.com/engineering/claude-code-best-practices), GitHub Copilot, and [Gemini CLI](https://github.com/google-gemini/gemini-cli) all support persistent instruction files. Most developers skip them or write three lines and call it done. Here's how to actually set them up, what goes in them, and how to structure a repo so your agent can navigate it without hand-holding.

There's also `AGENTS.md`, a single file all three tools read, so you write the rules once.

> **KEY TAKEAWAY**
>
> * **The Problem:** AI coding agents without proper config files make expensive guesses about your codebase: touching things they should leave alone, missing patterns you've established, and forgetting context between sessions.
> * **The Solution:** A structured `AGENTS.md` at the repo root, combined with thin tool-specific instruction files, gives every agent persistent context: folder structure, coding standards, deploy workflow, and off-limits zones.
> * **The Result:** Agents that work like a new team member who actually read the docs, not one who skips them and figures it out as they go.

---

## What Makes Agentic Coding Different From Autocomplete?

Autocomplete predicts the next token. An agent makes decisions.

When you ask Copilot to autocomplete a function, it generates based on what's in the current file. When you ask an agent to fix a bug, it reads multiple files, runs tests, checks git history, and writes code. The scope is completely different.

So the agent needs context that autocomplete never needed. Where things live, what patterns the codebase uses, what it's allowed to touch. Without that context, agents guess. With it, they work.

---

## Claude Code

### Installation

```bash
# File: terminal
npm install -g @anthropic-ai/claude-code
```

Claude Code requires Node 18+ and an Anthropic API key. Set the key in your environment:

```bash
# File: ~/.zshrc or ~/.bashrc
export ANTHROPIC_API_KEY=sk-ant-...
```

Run `claude` from any project directory to start a session. First time takes 30 seconds to initialise.

### The CLAUDE.md File

Claude Code reads this at the start of every session. Place it at the project root. Every spawned sub-agent inherits it, so whatever you write here applies to the full agent tree.

```text
# Example: your-project/
your-project/
├── CLAUDE.md          # Agent instructions
├── .claude/
│   ├── settings.json  # Tool permissions, model settings
│   └── skills/        # Custom slash commands
├── src/
└── ...
```

A thin `CLAUDE.md` gets you almost nothing. Here's what belongs in it:

**Project identity:** one sentence on what the thing does and who it's for. The agent uses this to calibrate how conservative it should be.

**Folder structure:** a short annotated file tree. Not exhaustive, just enough that the agent knows where to look.

**Coding conventions:** patterns the codebase uses. TypeScript strict mode, no default exports, tests co-located with source. Whatever you've established.

**Off-limits zones:** files or directories the agent should never touch. `dist/`, build artifacts, migration files, anything with customer data.

**Commands:** how to run tests, how to build, how to deploy. The agent will run these. Get them right.

**Banned patterns:** things you don't want introduced. Global state, direct DOM manipulation, synchronous I/O in async contexts. Write them down.

Here's a minimal example:

```markdown

# MyApp

FastAPI backend with a React frontend. Deployed on a single VPS via Docker Compose.

## Structure

src/
  api/          # FastAPI routes
  services/     # Business logic (no DB calls here)
  models/       # SQLAlchemy models
  tests/        # Co-located with source
frontend/
  src/
    components/ # Reusable UI only
    pages/      # Route-level components

## Stack

- Python 3.12, FastAPI, SQLAlchemy 2.x
- React 19, TypeScript strict mode, no default exports
- PostgreSQL (never raw SQL - use the ORM)

## Commands

- Test: `pytest src/`
- Lint: `ruff check .`
- Build: `docker compose build`

## Never Touch

- `alembic/versions/` - migration files are sacred
- `dist/` - compiled output, not source
- `.env` - read it, never write it
```

Specific, direct, no filler. The agent reads it at session start and operates within those rules.

I learned this by getting it wrong first. My early `CLAUDE.md` for ZeroLabs was 12 lines of vague intent ("be careful with the database", "don't break production"). The agent treated that as permission to be cautious about whatever it felt like. It took an afternoon of undoing agent decisions before I sat down and wrote explicit off-limits zones.

### Slash Commands

Slash commands are markdown files for repeatable tasks. Drop them in `.claude/skills/`:

```text
# Example: .claude/
.claude/
└── skills/
    ├── deploy.md      # /deploy
    ├── test.md        # /test
    └── review.md      # /review
```

Type `/deploy` in a session and it runs the instructions in `deploy.md`. Good for anything you'd otherwise explain every time: deploy workflow, PR checklist, the 14-step dance before merging.
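For example, a `deploy.md` skill might read like this. The steps here are placeholders for whatever your real workflow is, reusing the commands from the MyApp example:

```markdown
# Deploy

1. Run the full test suite with `pytest src/`; stop if anything fails.
2. Build the image with `docker compose build`.
3. Deploy with `docker compose up -d`.
4. Summarise what shipped and link the relevant commit.
```

Plain markdown, imperative steps, nothing clever. The agent follows it the same way every time, which is the whole point.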

### Tool Permissions and Settings

`.claude/settings.json` controls what tools the agent is allowed to run. A baseline:

```json
// File: .claude/settings.json
{
  "permissions": {
    "allow": [
      "Bash(npm test)",
      "Bash(npm run lint)",
      "Bash(docker compose build)",
      "Read",
      "Write",
      "Edit"
    ],
    "deny": [
      "Bash(rm -rf*)",
      "Bash(git push --force*)"
    ]
  }
}
```

Without this, Claude Code asks permission on anything it's uncertain about. Explicit allows and denies save the back-and-forth.

### MCP Servers

Claude Code supports MCP (Model Context Protocol), connecting to external tools as context sources: databases, APIs, custom services.

```json
// File: .claude/settings.json (MCP section)
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "postgresql://localhost/myapp"]
    }
  }
}
```

Wire this into `.claude/settings.json` and the agent gets direct read access to your database: schema info, read-only queries, code generation informed by real data. If you're running agents against a self-hosted stack, the [VPS infrastructure guides](/vps-infra/) cover how to expose services securely for MCP access.

---

## GitHub Copilot Coding Agent

### The Two Modes

Copilot ships in two configurations:

**Copilot CLI:** interactive terminal agent, similar to Claude Code. [Reached GA in February 2026](https://github.blog/changelog/2026-02-25-github-copilot-cli-is-now-generally-available/). Run `gh copilot` after installing the GitHub CLI extension.

**Copilot Coding Agent:** async background agent on GitHub.com. Assign an issue and it opens a PR. Built for mid-sized, well-scoped tasks you don't need to babysit.

Both read the same instruction files.

### The copilot-instructions.md File

Place this at `.github/copilot-instructions.md`. VS Code picks it up automatically. The coding agent on GitHub reads it for every issue it processes.

```text
# Example: your-project/
your-project/
├── .github/
│   ├── copilot-instructions.md   # Global instructions for all Copilot modes
│   └── agents/                   # Optional: custom agent profiles
│       └── backend-agent.md
├── src/
└── ...
```

Plain markdown. Same content as `CLAUDE.md`: project context, folder structure, conventions, off-limits zones, commands.

One difference: you can create per-directory instruction files with the `.instructions.md` extension. Put `frontend.instructions.md` in `frontend/` and Copilot applies those rules whenever it touches that subtree.

```text
# Example: frontend/
frontend/
├── frontend.instructions.md   # Rules specific to frontend work
└── src/
    └── ...
```

Useful when your frontend and backend have different conventions.

### The .github/agents/ Directory

Custom agent profiles live in `.github/agents/`. Each markdown file specialises the agent for a type of work:

```markdown

# Backend Agent

Focus on the FastAPI backend only. Never touch the frontend directory.
Run `pytest src/api/` to validate changes. Follow the service layer pattern
in src/services/. All database access goes through the ORM in src/models/.
```

Reference the profile with `--profile backend-agent` when starting a CLI session.

---

## Gemini CLI

### Installation

```bash
# File: terminal
npm install -g @google/gemini-cli
```

Gemini CLI authenticates with your Google account by default: 60 requests per minute, 1,000 per day free. For higher limits, point it at a Google AI Studio API key.

```bash
# File: ~/.zshrc or ~/.bashrc
export GEMINI_API_KEY=your-key-here
```

Run `gemini` from your project directory to start a session.

### The GEMINI.md File

Gemini CLI reads `GEMINI.md` from the current directory upward, like `.gitignore`: every `GEMINI.md` between the working directory and repo root gets merged in order.

```text
# Example: your-project/
your-project/
├── GEMINI.md              # Root-level rules for the whole project
├── backend/
│   └── GEMINI.md          # Backend-specific overrides
└── frontend/
    └── GEMINI.md          # Frontend-specific overrides
```

The `/memory` command adds content to the active session context. Use it for runtime context that doesn't belong in a permanent file.

### settings.json

Gemini CLI's settings file lives at `~/.gemini/settings.json` globally. Per-project settings can sit at the project root.

```json
// File: ~/.gemini/settings.json
{
  "theme": "Default",
  "model": "gemini-2.5-pro",
  "contextFileName": "AGENTS.md"
}
```

That last line matters. It tells Gemini CLI to read `AGENTS.md` instead of `GEMINI.md`.

### MCP Support

Gemini CLI supports MCP servers in `~/.gemini/settings.json`:

```json
// File: ~/.gemini/settings.json (MCP section)
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/project"]
    }
  }
}
```

---

## How Do You Keep Agent Config Files in Sync?

Three separate instruction files that mostly say the same thing. You update one, forget the other two, and your agents operate on stale rules.

`AGENTS.md` fixes this.

`AGENTS.md` is a cross-tool standard backed by the Agentic AI Foundation (AAIF), hosted under the Linux Foundation. Copilot, Gemini CLI, Cursor, Windsurf, and Aider read it natively. Claude Code reads `AGENTS.md` as a fallback when no `CLAUDE.md` is present, which is why the layout below keeps a thin `CLAUDE.md` that references `AGENTS.md`. Write the shared rules once; each tool picks them up through its own path.

```text
# Example: your-project/ (unified config layout)
your-project/
├── AGENTS.md                         # Source of truth for all agents
├── CLAUDE.md                         # Claude-specific extensions
├── .github/
│   └── copilot-instructions.md       # Copilot-specific extensions
├── GEMINI.md                         # Gemini-specific extensions (or skip if using contextFileName)
└── ...
```

The config hierarchy looks like this:

```mermaid
%% File: diagram - config file hierarchy
graph TD
    A["AGENTS.md<br/>Source of truth"] --> B["CLAUDE.md<br/>Claude Code"]
    A --> C[".github/copilot-instructions.md<br/>GitHub Copilot"]
    A --> D["GEMINI.md<br/>Gemini CLI"]
    B --> E["Claude Code session"]
    C --> F["Copilot CLI / Coding Agent"]
    D --> G["Gemini CLI session"]
```

Tool-specific files extend `AGENTS.md`. They don't duplicate it.

We hit the sync problem on ZeroLabs before we had this pattern. `CLAUDE.md` said one thing about deploy commands, `copilot-instructions.md` said something slightly different, and Gemini was operating on a third version that hadn't been touched in six weeks. Every agent was working from a different mental model of the same codebase. The thin-wrapper approach killed that entire class of problem in one afternoon.

**CLAUDE.md with AGENTS.md:**

```markdown

# MyApp

See AGENTS.md for project rules, structure, and conventions.

## Claude-specific

- Use TodoWrite to plan multi-step tasks before executing
- Prefer Edit over Write for existing files
- Run `python3 .claude/skills/zero-publish/scripts/validate_draft.py` before marking writing tasks complete
```

**copilot-instructions.md with AGENTS.md:**

```markdown

# MyApp

See AGENTS.md for project rules, structure, and conventions.

## Copilot-specific

- For async background tasks, always open a draft PR before starting work
- Run the full test suite before marking a coding-agent task complete
```

**Gemini CLI: just use contextFileName**

```json
// File: ~/.gemini/settings.json
{
  "contextFileName": "AGENTS.md"
}
```

Gemini CLI reads `AGENTS.md` directly. No separate file needed.

---

## Folder Structure Template

A production-ready structure that works across all three tools:

```text
# Example: your-project/ (production-ready layout)
your-project/
│
├── AGENTS.md                         # Cross-tool instructions (source of truth)
├── CLAUDE.md                         # Claude-specific config (references AGENTS.md)
│
├── .claude/
│   ├── settings.json                 # Tool permissions, MCP servers
│   └── skills/                       # Slash commands
│       ├── deploy.md                 # /deploy
│       └── test.md                   # /test
│
├── .github/
│   ├── copilot-instructions.md       # Copilot-specific config (references AGENTS.md)
│   └── agents/                       # Custom Copilot agent profiles
│       └── backend.md
│
├── src/                              # Your source code
├── tests/                            # Test suite
├── docs/                             # Documentation
│   └── architecture.md               # Include this in AGENTS.md references
│
└── .gemini/
    └── settings.json                 # Gemini CLI project settings
```

This works whether you're using one agent or all three. `AGENTS.md` is the spine. Everything else is a wrapper.

---

## What Should Go in Your AGENTS.md File?

The file does four jobs:

**1. Project identity.** One paragraph on what the codebase is, who uses it, and what it's built with. Not documentation for humans; it's priming for the agent's first read.

**2. Folder map.** An annotated tree of the directories that matter. Not every folder, just the ones that aren't self-explanatory. Where does business logic live? Where do tests go? Where does config come from?

**3. Rules.** Coding standards, naming conventions, banned patterns. Be specific. "Use the repository pattern" is weak. "All database access goes through `src/repositories/`, never call the ORM directly from a route handler" is strong.

**4. Commands.** How to test, lint, build, deploy. The agent will run these. If they're wrong, the agent will fail in the wrong direction.

A "Doctrine" section for higher-level principles is optional but worth it. Error handling philosophy, when to add abstractions. Agents pick up on these and make more consistent decisions.
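Put together, a skeleton covering all four jobs plus a Doctrine section might look like this. It reuses the MyApp example from earlier; every name and command is a placeholder for your own project:

```markdown
# MyApp — Agent Instructions

FastAPI backend with a React frontend, deployed via Docker Compose.

## Structure

src/api/       FastAPI routes
src/services/  Business logic (no DB calls here)
src/models/    SQLAlchemy models

## Rules

- All database access goes through the ORM in src/models/; never raw SQL
- TypeScript strict mode on the frontend; no default exports

## Commands

- Test: `pytest src/`
- Lint: `ruff check .`
- Build: `docker compose build`

## Doctrine

- Fail loudly: raise on unexpected state rather than swallowing errors
```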

---

## Context Management

The biggest mistake with coding agents: treating context like free storage.

Every message costs tokens and, more importantly, makes reasoning less precise. Long sessions accumulate drift. The agent starts hedging, second-guessing, losing track.

I've seen quality drops around 70-75% context usage, with real degradation above 85%. When you hit that range, compact or clear.

`/compact` summarises the session and continues. Good for long tasks where you need to preserve progress.

`/clear` resets to zero. Good between distinct tasks or when you notice the agent going in circles.

Gemini CLI's `/memory` stores content for the session. Useful for runtime context, but remember: everything you add to memory consumes context window. Be intentional.

Keep sessions task-scoped. One session per unit of work. The config file handles persistent context. The session handles the task.

---

## What Are the Most Common Agent Setup Mistakes?

**Writing vague instructions.** "Follow best practices" is useless. "All async functions must use explicit error boundaries; never let exceptions bubble silently" is actionable.

**Missing the commands section.** If the agent can't run your tests or your build, it can't verify its work. Write the exact commands, including any environment setup they need.

**Forgetting off-limits zones.** The agent will touch anything it thinks is relevant. (Ask me how I know. Actually, don't.) If files should never change, say so explicitly.

**Updating one config file and not the others.** `AGENTS.md` exists to prevent this. Source of truth in one place, thin wrappers everywhere else.

**Starting sessions with too much already open.** Fresh sessions are cheaper and more focused. Don't start a new task in a session that's already been running for two hours.

---

## FAQs

**Do I need both CLAUDE.md and AGENTS.md, or just one?**

For Claude-only projects, `CLAUDE.md` alone works fine. For repos where you use multiple AI tools, `AGENTS.md` as the source of truth with thin tool-specific wrappers saves maintenance overhead.

**Where does .github/copilot-instructions.md go?**

Repo root, inside `.github/`. VS Code picks it up automatically for any workspace. The Copilot coding agent on GitHub reads it when it processes issues. It works everywhere you'd run Copilot.

**Does Gemini CLI support MCP servers?**

Yes. Configure them in `~/.gemini/settings.json` (global) or in a project-level settings file. The same MCP servers that work with Claude Code mostly work with Gemini CLI too.

**Can I run Claude Code and Copilot on the same repo?**

Yes, fully compatible. The config files coexist, and GitHub's Agent HQ even lets you assign issues to whichever agent you prefer. `AGENTS.md` means both agents operate on the same rules.

**How do I stop Claude Code from forgetting context between sessions?**

`CLAUDE.md` is the persistent layer. Everything in there is available at the start of every session. For deeper persistence across sessions (work in progress, ongoing decisions), Claude Code supports external memory via MCP servers.

**What's the difference between Copilot coding agent and Copilot CLI?**

Copilot coding agent runs asynchronously on GitHub: you assign an issue to it, it works in the background, and opens a PR when it's done. Copilot CLI is an interactive terminal session. Same underlying model, different interaction pattern.

**Is Gemini CLI free?**

Yes, and the free tier is genuinely usable for solo development. The main constraint isn't the rate limits but model access: the free tier defaults to Gemini 2.0 Flash, while Gemini 2.5 Pro requires a paid API key. For most coding tasks Flash is fine, but for complex multi-file refactors or architecture questions the Pro model is noticeably better. Worth upgrading to a Google AI Studio key if you plan to use it as a primary agent rather than occasional spot-checks.

**What goes in CLAUDE.md vs a slash command?**

`CLAUDE.md` is for always-on context: project structure, rules, conventions, commands. Slash commands are for repeatable tasks with multiple steps: your deploy workflow, a PR review checklist, a database migration procedure. If you'd want the agent to know it in every session, it belongs in `CLAUDE.md`. If you'd want to trigger it on demand, make it a slash command.

---

## Where to Go Next

This setup gets you a context-aware agent. The next level: multiple agents for different parts of the workflow, one planning and delegating while others execute.

Claude Code supports this through sub-agents. Set up a coordinator that reads `CLAUDE.md` and spawns task-specific agents with narrower permissions. Scales well for large codebases.

If you're new to `AGENTS.md` and want a ready-to-use template, there's one [coming in the agents zone](/agents/agents-md-template). The [agentic workflow patterns guide](/agents/agentic-workflow-patterns) covers the multi-agent patterns in detail.

The config file setup takes 20 minutes. The productivity difference shows up in the first hour. Worth doing properly once.

And if you're already using Claude Code: open your current `CLAUDE.md` right now. If it's under 20 lines or doesn't have an explicit "never touch" section, that's your first fix. Everything else in this post is optional. That part isn't.

---

## What Are AI Workflows? (The Practical Guide for Builders)
URL: https://labs.zeroshot.studio/ai-workflows/what-are-ai-workflows
Zone: ai-workflows
Tags: ai-workflows, automation, llm, agents, python
Published: 2026-03-31

Most people use AI as a one-shot tool. Workflows are how you make it reliable, repeatable, and actually useful in production. Here's what they are and how they work.

> **KEY TAKEAWAY**
> * **The Problem:** Using AI as a one-shot tool produces inconsistent results that can't be audited, improved, or reliably handed off to others.
> * **The Solution:** An AI workflow is a defined sequence of steps, triggers, and tool calls that turns an input into a predictable output. It runs the same way every time.
> * **The Result:** Workflows let you build AI-powered systems you can actually trust, debug when they break, and improve over time without starting from scratch.

*Last updated: 2026-03-31 · Tested against Claude claude-sonnet-4-6, n8n v1.x, LangChain v0.3.x*

Most people use AI the same way they use a calculator: type something in, get something out, move on. That works for quick questions. It fails completely for anything you need to do more than once.

We'll cover what workflows actually are, how they work under the hood, and when to reach for one instead of a prompt or a full agent.

## What is an AI workflow, exactly?

An AI workflow is a defined sequence of steps where at least one involves an AI model, and the whole thing runs in a predictable, repeatable order. Each step takes an input, does something with it, and passes a result to the next.

The key word is "defined." You know what runs first, what happens at each branch point, and what the output looks like. That structure is what makes it possible to debug, test, and improve.

**What's an AI workflow?** A series of connected steps, each with a clear input and output, where one or more steps use an LLM (like Claude or GPT-4) to process, classify, generate, or decide. The workflow is controlled by code or an orchestrator, not by the AI itself.

## How does an AI workflow actually work?

At the mechanical level, most workflows follow the same shape:

```mermaid
flowchart TD
    A[Trigger] --> B[Step 1: Input Processing]
    B --> C[Step 2: LLM Call]
    C --> D{Branch: Decision}
    D -->|Path A| E[Step 3a: Tool Call]
    D -->|Path B| F[Step 3b: Skip]
    E --> G[Step 4: Format Output]
    F --> G
    G --> H[Output / Next Workflow]
```

Something triggers the workflow: a cron job, a webhook, a user action, or the output of another workflow. The trigger hands an input to the first stage, which pre-processes or validates the data.

The LLM typically does its work in the next stage: classify this text, summarise this document, generate a draft, decide which path to take. Based on that output, the workflow branches or continues. Tool calls fire here too: write to a database, call an API, send a message. The final stage formats and delivers the result.

Each step is explicit. Each output is inspectable. When something breaks, you know exactly which step failed.
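That shape translates directly into plain code. Here's a minimal sketch of the diagram's flow in Python; `classify` is a stub standing in for a real LLM call, so each step stays inspectable:

```python
# Minimal workflow skeleton: trigger -> preprocess -> LLM step -> branch -> output.
# classify() is a stub standing in for a real LLM call via an SDK.

def preprocess(raw: str) -> str:
    """Step 1: validate and normalise the input."""
    text = raw.strip()
    if not text:
        raise ValueError("empty input")
    return text

def classify(text: str) -> str:
    """Step 2: the LLM call (stubbed). Returns 'question' or 'other'."""
    return "question" if text.endswith("?") else "other"

def run_workflow(raw: str) -> dict:
    text = preprocess(raw)            # Step 1: input processing
    label = classify(text)            # Step 2: LLM call
    if label == "question":           # Branch: decision
        result = {"action": "route_to_support", "text": text}
    else:
        result = {"action": "skip", "text": text}
    result["label"] = label           # Step 4: format output
    return result

print(run_workflow("How do I reset my password?"))
```

Swap the stub for a real API call and the structure doesn't change: every step still has one input, one output, and one place to look when it breaks.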

## What is the difference between a workflow and an AI agent?

A **workflow** is controlled by you. You define the steps, the order, the branches. The AI handles tasks within those steps, but it doesn't decide what happens next. Deterministic by design.

An **agent** is controlled by the AI. You give it a goal and tools, and it figures out the sequence on its own. Flexible, but harder to predict.

Anthropic's [guide to building effective agents](https://www.anthropic.com/research/building-effective-agents) puts it well: prefer workflows when the task has predictable steps, agents when the steps are unknown. The [Claude tool use docs](https://docs.anthropic.com/en/docs/build-with-claude/tool-use) cover wiring tool calls inside individual steps.

In practice, most production systems use workflows for the bulk of operations. Agents handle the genuinely unpredictable parts.

## Why use a workflow instead of just prompting?

Three reasons, and they compound.

**Reliability.** A prompt you type into a chat interface runs differently every time. A workflow runs identically. The context is the same, the tools are the same, the input format is validated. You can run it a hundred times and expect consistent behaviour.

**Debuggability.** When a one-shot prompt goes wrong, you have one place to look: the prompt. When a workflow goes wrong, you have structured logs, step outputs, and a defined execution path. You can pinpoint exactly which step failed. In our content pipeline, we replay individual steps with the same input and see where things went sideways.

**Composability.** Workflows can call other workflows. The output of a summarisation workflow can feed a classification workflow, which feeds a routing workflow. Each piece stays small and testable. You build reliable systems from reliable components.
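When each workflow is a plain function with a typed input and output, composition is just ordinary code. A sketch, with `summarise` and `classify` as stubs standing in for LLM-backed steps:

```python
# Workflow composition: each workflow is a function, so chaining is plain code.
# summarise() and classify() are stubs standing in for LLM-backed steps.

def summarise(doc: str) -> str:
    # Stubbed LLM step: take the first sentence as the "summary"
    return doc.split(".")[0] + "."

def classify(summary: str) -> str:
    # Stubbed LLM step: keyword-based routing label
    return "billing" if "invoice" in summary.lower() else "general"

def route(doc: str) -> dict:
    # A workflow built from two smaller workflows
    summary = summarise(doc)
    return {"summary": summary, "queue": classify(summary)}

print(route("Invoice 42 is overdue. Please advise."))
```

Each piece can be tested in isolation with fixed inputs, which is exactly what makes the composed system debuggable.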

A raw prompt is a one-time interaction. A workflow is infrastructure.

## What tools can I use to build AI workflows?

The right tool depends on what you're optimising for.

**Code-first options:**
- **Plain Python** with the Anthropic or OpenAI SDK is often the best starting point. No framework overhead, full control, easy to test. We use this for most of our ZeroShot Studio pipelines.
- **LangChain** provides pre-built chains and agent loops. Useful if you want faster scaffolding, less useful when you need full transparency into what's actually happening.
- **Prefect** and **Airflow** are data pipeline orchestrators that work well when AI steps live inside a larger scheduled workflow with retries and monitoring built in.

**No-code / low-code options:**
- **n8n** is an open-source tool with native AI nodes for common LLM tasks. Self-hostable, visual editor, good for teams who don't want to write API calls by hand.
- **Zapier** with AI steps works for simple automation chains. Easier to set up than n8n, less control over the AI-specific parts.

No universal winner here. Code-first gives you control for complex or long-running pipelines. No-code is faster to ship when you're connecting tools and running simple prompts.

## When should I use a workflow instead of a full agent?

Use a workflow when:
- The steps are known in advance
- You need the same result every time for the same input
- The task involves sequential processing of structured data
- You need to debug, test, or hand the system off to someone else
- Cost predictability matters

Use an agent when:
- The steps depend on information that only exists at runtime
- The task requires the AI to reason about what to do next
- You're exploring an open-ended problem space
- Failures are recoverable and experimentation is acceptable

The honest answer: most things that sound like they need an agent are actually workflow problems in disguise. We defaulted to agents on our first two ZeroShot Studio pipelines and rebuilt both as workflows after the third mysterious failure in a month. If you can draw the flowchart in advance, build a workflow. If you can't, consider an agent for the parts you can't chart and wrap a workflow around the rest.

## What do real AI workflows look like in production?

**ZeroSignals** is our Reddit content intelligence pipeline. A cron job collects Reddit posts every four hours and sends them to our VPS via webhook. From there: ingest, generate embeddings with Ollama, classify each post by intent (question, tool request, rant, etc.), score relevance, store in PostgreSQL. A separate workflow surfaces results to the ZeroBlog dashboard. No agent involved. Every step is predetermined: collect, embed, classify, store, surface.

**Zero Publish** is the pipeline that produced this post. Stages: research, write, cleanup, style review, fact check, SEO review, visuals, final review, publish. An LLM does work at each stage, but the sequence is fixed. We built it this way after one too many "published a hallucinated statistic and nobody noticed for three days" incidents. An agent would be more flexible, but predictability matters more for a publishing pipeline.

Both are workflows. Neither has an AI deciding what to do at the macro level. We chose predictability over flexibility, and for production systems that need auditing, that's the right trade every time.

---

## Frequently Asked Questions

**What's the difference between an AI workflow and an AI agent?**

A workflow has a fixed sequence of steps you define in advance. An agent decides its own steps at runtime based on what it observes. Workflows are more predictable and easier to debug. Agents are more flexible when the path to a goal is genuinely unknown.

**Do I need to code to build an AI workflow?**

No. Tools like n8n let you build visual workflows with LLM steps without writing code. That said, code-first approaches give you more control over prompt engineering, error handling, and step-level logging. Start with no-code to prototype; move to code when the workflow needs to be production-grade.

**When should I use a workflow instead of an agent?**

When the steps are known, the output needs to be consistent, and you need to be able to debug failures. Workflows are the right default for production AI systems. Reach for an agent when the task is genuinely open-ended and the steps can't be specified in advance.

**What tools can I use to build AI workflows?**

Code-first: Python with the Anthropic or OpenAI SDK, LangChain for pre-built components, Prefect for scheduled pipelines. No-code: n8n (self-hostable, open-source) or Zapier for simple automation chains.

**How is an AI workflow different from a regular automation like Zapier?**

A regular automation connects apps and moves data: "when X happens, do Y." An AI workflow includes steps where an LLM reasons, classifies, generates, or decides. The AI step handles unstructured data and judgment calls that rule-based automations can't manage.

---

If you're building your first AI-powered system, start with a workflow. Draw the steps, define the inputs and outputs, and pick whatever tool fits your team. Get that running before you think about agents. A boring workflow that works every time beats a clever agent that surprises you on a Friday afternoon.

For building the AI steps inside your workflows with Claude, the [agent loop guide](/agents/building-first-claude-agent) is the right next stop. If you're thinking about where to host the whole thing, the [self-hosting guide](/vps-infra/self-hosting-vps) covers running production workloads on a budget.

---

## Stop Chaining Agents: The Controller Pattern for AI Pipelines
URL: https://labs.zeroshot.studio/ai-workflows/stop-chaining-agents-the-controller-pattern-for-ai-pipelines
Zone: ai-workflows
Tags: ai-pipelines, agents, architecture, workflow, claude
Published: 2026-03-31

Agent-chaining looks elegant until it breaks. Here's why a deterministic controller with stateless stage workers is a better foundation for any multi-step AI pipeline.

> **KEY TAKEAWAY**
> * **The Problem:** Agent-chaining hides control flow inside prompts, burns tokens on repeated context, and makes failures compound in ways that are hard to debug or retry.
> * **The Solution:** A deterministic controller that owns stage transitions, validation, and retries, paired with stateless workers that read inputs and produce structured outputs.
> * **The Result:** Auditable runs, lower ongoing token costs, and a publish gate that can actually hold the line.

Agent-chaining looks elegant on a whiteboard. Research agent finishes, triggers the draft agent. Draft agent finishes, triggers review. Review triggers visuals, visuals trigger publish. One handoff, then the next, and the whole thing flows.

The problem is what happens when something goes wrong.

## Why does agent-chaining feel so natural?

The appeal is real. If you've read anything about [ReAct agents](https://arxiv.org/abs/2210.03629) or multi-agent systems in the last couple of years, the handoff model is everywhere. It mirrors how humans delegate. It feels like the AI is "thinking forward" instead of waiting for instructions. And for simple two-step pipelines, it actually holds up.

We started building our content pipeline this way. Research agent passes a brief to a write agent. Write agent produces a draft and hands it to a review agent. The code was clean. The prompts were clear. The first few runs looked great.

Then we started hitting the edges.

## What actually breaks when agents chain themselves?

The first failure mode is control flow hiding inside prompts. When agent B is triggered by agent A, the logic for "should B run right now?" lives in A's output. Nobody owns the decision except the agent that just finished. If the prior stage produced garbage, B doesn't know, and neither do you until B produces worse garbage downstream.

The second is context bloat. Each agent in a chain tends to carry forward more than it needs: prior outputs, reasoning from earlier stages, fragments of the brief it didn't use. Token counts creep up with every hop. By the time you reach a review or publish step, you're paying for the whole conversation history whether it's relevant or not.

Retries are the worst. If the facts stage fails halfway through a run, who handles that? In a chained system, the answer is usually "the agent that triggered it" or "you, manually." Neither is great. The first requires writing retry logic into every agent prompt. The second defeats the point of automation.

The failure mode we hit hardest was artifact drift. Our style report, facts report, and publish gate were each reading different versions of "the post." No one had told them to stay in sync because nothing was in charge of keeping them there. The chain had no memory of what changed between stages.

## What does the controller pattern look like?

One deterministic controller owns the pipeline. Agents act as stage workers that produce structured outputs.

```mermaid
graph TD
    C[Controller] --> R[Research Worker]
    R -->|brief.md + brief.json| C
    C --> W[Write Worker]
    W -->|draft.md| C
    C --> CL[Cleanup]
    CL -->|patched draft.md| C
    C --> ST[Style Worker]
    ST -->|style-report.json| C
    C --> FA[Facts Worker]
    FA -->|facts-report.json| C
    C --> SE[SEO Worker]
    SE -->|seo-report.json| C
    C --> V[Visuals Worker]
    V -->|visual-manifest.json| C
    C --> FR[Final Review]
    FR -->|review-report.json| C
    C -->|approval gate| P[Publish]
    P -->|publish-result.json| C

    style C fill:#1a1a2e,color:#fff
    style P fill:#16213e,color:#fff
```

The controller reads run state, invokes one worker for one stage, validates that the required artifacts exist, and records the result before advancing. Workers never trigger the next stage.

Our current pipeline runs eleven stages in order: research, write, cleanup, baseline review, style, facts, seo, visuals, final review, publish, live QA. Every stage produces a `stage-result.json` with status, the model used, input and output artifact hashes, and whether any blocking issues were found. If the status isn't `completed`, nothing moves forward.
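The advance rule is simple enough to sketch. This is a minimal illustration, not the author's actual controller: the stage names and the `stage-result.json` filename come from the article, while the function name, directory layout, and the `blocking_issues` field name are assumptions.

```python
import json
from pathlib import Path

STAGES = ["research", "write", "cleanup", "baseline_review", "style",
          "facts", "seo", "visuals", "final_review", "publish", "live_qa"]

def advance(run_dir: str) -> str:
    """Walk the stage list and return the first stage that is not complete.

    A stage only counts as complete when its stage-result.json exists,
    parses, and reports status == "completed" with no blocking issues.
    """
    for stage in STAGES:
        result_path = Path(run_dir) / stage / "stage-result.json"
        if not result_path.exists():
            return stage  # never ran: this is the next stage to invoke
        result = json.loads(result_path.read_text())
        if result.get("status") != "completed" or result.get("blocking_issues"):
            return stage  # failed or blocked: retry here, never skip ahead
    return "done"
```

The point of the sketch: the decision to move forward lives in one deterministic function that reads files, not in any agent's prose output.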

## How do workers stay stateless?

Each worker reads only the artifacts it needs for its stage. The style worker reads `draft.md` and the style guide. The facts worker reads `draft.md` and the claims list. Neither of them needs to know what happened in the research stage, what the SEO score is, or whether visuals are planned.

That constraint is the feature. Stateless workers are cheap to run. They don't accumulate context from prior stages. They can be retried in isolation without rerunning anything upstream. And when they produce a bad output, the failure is contained to that stage.

Workers produce two things: the stage artifact (a report, a patched draft, a manifest) and a structured result JSON. The result JSON is what the controller trusts. Not the prose summary at the end of an agent response. Not whether the agent "said" it succeeded. The file either exists and passes validation, or the stage is blocked.

## What does the controller actually own?

The controller owns stage order, retries, cooldowns, blocking, the approval gate, and the final publish decision. It applies structured patches from review workers rather than letting agents rewrite freely. Deterministic validators run before and after each stage to confirm the draft improved.

What it doesn't do: trust agent output. An agent that says "the draft looks solid" is not a passing gate. (Ask me how many times we shipped on vibes before wiring that check.) The controller reads the structured artifacts and confirms they exist, are valid JSON, and contain no blocking issues.

The publish step is mechanical. After every upstream stage has passed, the controller runs a final validation suite, then calls the publish endpoint. If any gate fails, it blocks. Pushing past a blocked gate requires an explicit `--override "reason"`, and that reason gets logged into the run state.

## Does this actually cost less to run?

Yes, though the savings compound over time more than they show up in a single run.

The biggest win is that review workers return structured patches rather than full rewrites. A style worker that returns `{"old": "...", "new": "..."}` pairs runs at a fraction of a "please rewrite this entire post with better rhythm" pass. The controller applies the patches deterministically. No second full-context pass to decide whether the post is ready.
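Deterministic patch application can be sketched in a few lines. The `{"old": ..., "new": ...}` shape is from the article; the strict match-exactly-once policy shown here is one reasonable choice, not necessarily the author's:

```python
def apply_patches(draft: str, patches: list) -> str:
    """Apply {"old": ..., "new": ...} pairs returned by a review worker.

    Strict on purpose: each patch must match exactly once, so a stale or
    ambiguous patch blocks the stage instead of silently corrupting the draft.
    """
    for patch in patches:
        if draft.count(patch["old"]) != 1:
            raise ValueError(f"patch does not match exactly once: {patch['old']!r}")
        draft = draft.replace(patch["old"], patch["new"])
    return draft
```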

Stateless workers also mean you can use lighter models for most stages. Facts extraction and SEO validation don't need the same horsepower as the write stage. We run most review stages on Sonnet and escalate only when a stage blocks and needs a harder call to resolve.

The final review is worth calling out. We deliberately avoided a "main agent rereads the whole post and decides if it's ready" step. That pattern is expensive and adds one more full-context pass where an agent can hallucinate a verdict. Final review in our pipeline is a deterministic merge of the stage reports, not another agent opinion.

## When does agent-chaining still make sense?

Two places. Short pipelines with two or three steps where a failure means you restart everything anyway. And tasks where each step genuinely requires the full live context of the previous step rather than a structured artifact.

For anything with more than three stages, explicit approval gates, or a publish step you can't easily roll back, the controller pattern is worth the upfront wiring.

## Frequently asked questions

**What is the controller pattern for AI pipelines?**
A deterministic controller script owns stage order, retries, and approval gates. AI agents act as stateless workers, invoked by the controller, producing structured artifacts. The controller validates outputs before advancing to the next stage. Workers never trigger the next stage themselves.

**What's wrong with chaining AI agents?**
Control flow hides inside prompts. Retries require agents to handle their own failure. Context accumulates with each hop, increasing token costs. Artifact drift occurs when downstream stages read stale or inconsistent versions of the same content.

**How does this compare to tools like Prefect or Airflow?**
The concept is the same: a scheduler owns task execution, workers stay stateless, and the pipeline is explicit. [Prefect](https://docs.prefect.io/), Airflow, and AWS Step Functions solve this for data pipelines. The controller pattern applies the same principle to AI agent pipelines where the "tasks" are LLM calls rather than data transforms.

**Do workers need to know about other stages?**
No, and they shouldn't. A worker reads its stage inputs and produces its outputs. The controller holds the run state. Keeping workers ignorant of other stages is what makes them cheap to run and easy to retry.

---

We're still building this out. The controller in our pipeline started as a Python script with a few stage checks and is now a proper state machine with retry logic, cooldowns, and a two-pass remediation cycle before it gives up and blocks. None of that logic lives in a prompt. It lives in code, where it belongs.

If you're building something multi-stage with AI agents and you keep hitting the same debugging sessions trying to figure out which agent handed what to which, it's probably time to pull the control flow out of the prompts.

---

## How I Manage 30+ Docker Services Without Losing My Mind
URL: https://labs.zeroshot.studio/vps-infra/how-i-manage-30-docker-services-without-losing-my-mind
Zone: vps-infra
Tags: docker, devops, self-hosted, vps, infrastructure
Published: 2026-03-31

36 apps, 44 containers, one VPS. Here's the system.

# How I Manage 30+ Docker Services Without Losing My Mind

> **KEY TAKEAWAY**
>
> * **The Problem:** Running 30+ Docker services on a single VPS without a coordination system means every change is a gamble and every outage is a mystery.
> * **The Solution:** A JSON registry as the single source of truth, three sync modes matched to each service's update pattern, a health monitor on 60-second cycles, and a blackboard protocol for shared operator coordination.
> * **The Result:** 36 registered apps, 44 running containers, with nightly automation handling cleanup, drift detection, and backup verification automatically at 03:30 UTC.

*Last updated: 2026-03-31 · Tested against Docker Engine v29.3.1 · Compose v2.35.1 · All metrics from the author's own production VPS.*

The first time I watched a deploy break three unrelated services, I thought I'd made a mistake. The second time, I knew the problem wasn't the deploy. The problem was that I had no system. I was carrying the state of 20+ containers in my head, and heads aren't reliable at 11pm when a health alert fires.

So I built one.

Today I run 36 registered apps across 44 containers on a single VPS. Not all of them are active (a few are hibernated or parked), but all of them are tracked, monitored, and recoverable. Here's the exact setup.

## What does the registry actually do?

Everything starts with the `apps.registry.json` state file. This single file contains the canonical truth about each service: its name, port, Compose directory, healthcheck URL, database, sync mode, current status, and last deployed commit.

Nothing gets deployed without an entry. Nothing gets monitored without one. The registry is the contract between the deployment process, the health monitor, and the nightly maintenance script.

Port allocation follows a deliberate block structure. Production services live on 3001-3008. Staging occupies 3010-3019. MCP servers start at 3020. Infrastructure like ZeroMemory (3050-3051), Zero-Signals (3060-3061), and Ollama (11434) each have their own band. Readable at a glance: 3003 is always production, 3010 is always staging. No guessing.
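To make the shape concrete, here is a hypothetical registry entry plus a port-block lookup. Every field name and value below is illustrative (the article doesn't publish the schema); only the port ranges come from the article, and the MCP band's upper bound is an assumption.

```python
import json

# Hypothetical apps.registry.json entry; field names are guesses based on
# what the article says the registry tracks.
REGISTRY = json.loads("""
{
  "zerolink": {
    "port": 3003,
    "compose_dir": "/opt/apps/zerolink",
    "healthcheck": "http://127.0.0.1:3003/api/health",
    "sync_mode": "git-deploy",
    "status": "active",
    "last_deployed_commit": "abc1234"
  }
}
""")

def port_zone(port: int) -> str:
    """Classify a port by the article's block structure."""
    if 3001 <= port <= 3008:
        return "production"
    if 3010 <= port <= 3019:
        return "staging"
    if 3020 <= port <= 3049:       # MCP band; upper bound assumed
        return "mcp"
    return "infrastructure"        # ZeroMemory, Zero-Signals, Ollama, etc.
```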

## How do you choose a sync mode?

The registry tracks a sync mode field for each service. I use three: **git-deploy**, **live-edit**, and **image-only**.

**git-deploy** is for apps where the source of truth is a remote GitHub repo. The VPS pulls on deploy. There should never be uncommitted local changes on a git-deploy service. If the nightly maintenance script finds any, it flags them as drift: something changed on the server without going through git first. Twelve of my apps use this mode.

**live-edit** is for services where I iterate directly on the VPS and push to git afterward. The nightly script auto-commits uncommitted changes with a timestamp message and pushes to origin. Nine use this mode, mostly infrastructure tools and internal dashboards where the feedback loop needs to be fast.

**image-only** is for services with no source repo on the VPS: n8n, Ollama, MinIO, Plane. They pull from container registries. Lifecycle management is a pull-and-restart flow, not a git workflow. Simple.

The key is matching the mode to how the service actually gets updated, not how you wish it would get updated.

## How do you know when something breaks?

The health monitor runs as a containerized daemon using host networking so it can reach ports bound to 127.0.0.1. It checks each registered service's healthcheck URL every 60 seconds. Container status gets the same interval. System resources (CPU load, memory, disk) check every 5 minutes. Backup freshness and manifest drift both check hourly.

Thresholds are specific, calibrated against what actually matters. Disk warns at 80%, escalates at 90%. Memory warns at 85%, escalates at 95%. If a service restarts 3 times, that's a warning. Ten restarts triggers an alert. Endpoint failures get 2 consecutive misses before a warning fires and 5 before escalation.
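The consecutive-miss escalation for endpoints is a tiny state machine. The thresholds (2 misses to warn, 5 to escalate) are from the article; the names and the recovery behaviour are illustrative. The same shape works for the restart thresholds (3 warns, 10 alerts).

```python
from dataclasses import dataclass

@dataclass
class EndpointState:
    consecutive_failures: int = 0

# Article's thresholds: 2 consecutive misses -> WARN, 5 -> CRITICAL.
WARN_AFTER, CRITICAL_AFTER = 2, 5

def record_check(state: EndpointState, ok: bool) -> str:
    """Update the failure streak and return the severity to emit."""
    if ok:
        # Emit a recovery INFO only if we had previously warned.
        recovered = state.consecutive_failures >= WARN_AFTER
        state.consecutive_failures = 0
        return "INFO" if recovered else "OK"
    state.consecutive_failures += 1
    if state.consecutive_failures >= CRITICAL_AFTER:
        return "CRITICAL"
    if state.consecutive_failures >= WARN_AFTER:
        return "WARN"
    return "OK"
```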

Alerts route to Discord and [ntfy.sh](https://ntfy.sh), with a 15-minute cooldown to prevent storms. Routing is graduated: INFO (recovery, normal state) goes to Discord only, WARN adds ntfy.sh, CRITICAL adds a role mention so it actually wakes someone up.

Not every app has a dedicated health endpoint. The registry tracks the actual path per service. ZeroLink uses `/api/health`, n8n uses `/healthz`, MinIO uses `/minio/health/live`. AutoGen Studio falls back to its root route because it has no dedicated check path. OpenClaw, currently hibernated, uses a null health route and gets monitored by container status only. Map your actual endpoints, not the ones you assume exist.

## How do you stop two operators breaking each other's work?

The blackboard protocol handles this. It lives in the shared blackboard file and has three sections: active tickets, locks, and the append-only update log.

Before any operator or agent takes action on the server, they acquire an atomic file lock using O_EXCL (fail if file exists, POSIX-standard). It records the owner and a UTC timestamp, expires after 60 seconds. If you find a stale one, you delete it and proceed, but you document why. No agent may act without an uncontested entry in the locks section.
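The acquisition itself is a few lines. A sketch assuming a JSON lock file holding the owner and an epoch timestamp (the article records a UTC timestamp; the exact file format is unknown, and the path here is made up):

```python
import json
import os
import time

LOCK_TTL = 60  # seconds, per the protocol

def acquire_lock(owner: str, path: str = "/tmp/blackboard.lock") -> bool:
    """Atomically create the lock file; fail if another operator holds it.

    os.O_EXCL makes create-if-absent a single atomic syscall, so two
    operators can never both succeed.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Lock held: check for staleness before giving up.
        with open(path) as f:
            held = json.load(f)
        if time.time() - held["acquired_at"] > LOCK_TTL:
            os.unlink(path)  # stale: delete and retry (and document why)
            return acquire_lock(owner, path)
        return False
    with os.fdopen(fd, "w") as f:
        json.dump({"owner": owner, "acquired_at": time.time()}, f)
    return True
```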

Every significant operation gets logged to Recent Updates in what I call Gold Standard format: who, when, which ticket, the classification (DESTRUCTIVE, MIGRATION-GRADE, BACKUP-GRADE, FIX, or INSPECTION), and bullet-point evidence showing commands run, paths touched, and outcomes verified. History is append-only. Corrections go in as new entries prefixed with "Correction:", not edits to the original.

When the Recent Updates section exceeds 50 entries, the oldest 40 get archived to a dated file and a breadcrumb stays in the main blackboard pointing to the archive. This keeps the active file readable without losing history.
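The rotation rule is mechanical enough to sketch. The 50-entry trigger and 40-entry archive batch are from the article; the function shape is mine, and writing the dated archive file plus the breadcrumb is left out.

```python
def rotate(entries: list, limit: int = 50, archive_batch: int = 40):
    """Split Recent Updates into (kept, archived) once it exceeds `limit`.

    Oldest entries are first in the list; the oldest `archive_batch`
    move to the archive and the remainder stay in the live blackboard.
    """
    if len(entries) <= limit:
        return entries, []
    return entries[archive_batch:], entries[:archive_batch]
```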

The blackboard might sound like bureaucratic overhead. Running multiple agents and operators on the same server without it is worse. I learned that the hard way when an incident in early 2026 required me to reconstruct what changed across three sessions with no shared record. It took twice as long as it should have.

## What runs automatically every night?

Nightly maintenance fires at 03:30 UTC. Ten steps, fully automated.

Step 1 prunes dangling images, exited containers (except anything named "debug" or "keep"), and dangling volumes. Named volumes stay. Build cache older than 7 days gets cleared.

Step 2 runs checks across all registered services, the same curl approach the monitor uses. HTTP 2xx-4xx counts as responding. 5xx is a server error. A connection timeout or refusal means the service is down. This gives me a nightly snapshot independent of the real-time daemon.
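That classification rule maps to a small function. The status buckets are the article's; the fetch helper and all names are illustrative, and the real script uses curl rather than Python.

```python
import socket
import urllib.error
import urllib.request

def fetch_status(url: str, timeout: float = 5.0):
    """Return the HTTP status code, or None if the service never answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # 4xx/5xx still means something answered
    except (urllib.error.URLError, socket.timeout, OSError):
        return None   # timeout or refusal: nothing is listening

def classify_status(status) -> str:
    """Article's buckets: 2xx-4xx responding, 5xx server error, no answer down."""
    if status is None:
        return "down"
    if 200 <= status < 500:
        return "responding"
    return "server-error"
```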

Step 3 checks backup freshness: anything older than 26 hours triggers a warning, older than 48 hours escalates.
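The freshness check is worth pinning down as code; the 26 and 48 hour thresholds are the article's, the function is a sketch:

```python
def backup_severity(age_hours: float) -> str:
    """Backup freshness per step 3: >26h warns, >48h escalates."""
    if age_hours > 48:
        return "CRITICAL"
    if age_hours > 26:
        return "WARN"
    return "OK"
```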

Steps 4 and 5 log system resources and regenerate the server manifest from live Docker state. The manifest snapshot is what makes drift detection possible.

Step 6 cleans the workspace: removes old content workspaces, with a safety check for code files before deleting anything.

Step 7 checks source control. Live-edit apps with uncommitted changes get auto-committed with a timestamp. Git-deploy apps with local changes get flagged as drift and logged.

Step 8 sends a Telegram summary. Steps 9 and 10 sync git and rotate old logs. The whole thing takes under two minutes.

## How do you catch configuration drift?

The manifest snapshot from step 5 gets compared against the last-approved manifest baseline. The monitor also runs this comparison every hour.

A discrepancy means something changed in the live server state that isn't reflected in what was last approved: a new container appeared, a port changed, a healthcheck status flipped. Most changes are intentional (a new deploy). The protocol is to update the baseline afterward so the comparison stays meaningful.

The manifest itself is detailed: hostname and IP, OS version, kernel, hardware, engine version, running containers with their images, port mappings, status, and app directory. A full snapshot of server state at a point in time.

Running a diff between current and last-approved is often the fastest way to figure out what changed when something unexpected happens.

This is the flow I actually optimize for: one registry feeding monitoring, nightly maintenance, drift checks, and alerts instead of a pile of disconnected scripts.

![System flow from registry to health checks, maintenance, drift detection, and alerts.](/api/images/1775198874919-how-i-manage-30-docker-services-without-losing-my-mind-diagr.svg "The registry drives monitoring, nightly maintenance, manifest snapshots, and alert routing.")

## Frequently Asked Questions

**How do you handle rollbacks?**

For git-deploy apps, rollback is simply checking out the previous commit on the VPS and rebuilding the Compose service. The registry tracks the last deployed commit per app so I always know what's running. For image-only apps like n8n, it's pulling a specific tagged version and restarting.

```bash
# <last-good-commit> is a placeholder: the registry records the last
# deployed commit per app, so you know which SHA to check out.
git checkout <last-good-commit>
docker compose up -d --build
```

**What happens if the health monitor itself goes down?**

The nightly check in step 2 is independent of the monitor daemon. It runs from the maintenance script directly. If the monitor container is down, I'd lose real-time alerting but the nightly sweep would still catch issues. The monitor container is also in the registry and gets checked like everything else.

**Do you need separate VPSes for staging and production?**

Not at this scale. Port block separation (production 3001-3008, staging 3010-3019) handles isolation. For strict compliance requirements, a separate VPS makes sense. For a self-hosted stack, port-based separation works fine.

**How do you manage secrets across 36 apps?**

Environment files per service, stored in the app directory on the VPS and never committed to the repo. The auto-sync cron explicitly excludes `.env` files. ZeroVault handles shared credentials across services that need them.

**What's the biggest thing this system doesn't solve?**

Deployment rollouts. Everything here assumes you have working containers. If a new build is broken, you're still reading Docker Compose logs like everyone else. The registry and monitoring catch the symptoms, but you still have to fix the code.

---

One rule underlies all of it: the server should never know something that isn't also in git or the registry. State changes get recorded. Operations get logged. Each night, automation validates what's running against what should be.

It's not zero-ops. But it's close enough that I can close my laptop and sleep without worrying about it. Most nights, anyway.

If the Claude Code side of this setup interests you, [how I cut token usage per session](/vps-infra/save-tokens-claude-code-instructions) covers the instruction architecture that makes operating this stack cheaper. The [AI workflows zone](/ai-workflows) has more on running agents against a self-hosted stack. And if you want to compare notes on fleet management, I'm on [Twitter/X](https://x.com/jimmygoode).

---

## Maintenance Mode: Why the Best Developers Treat Themselves Like Production Systems
URL: https://labs.zeroshot.studio/maintenance-mode/maintenance-mode-why-the-best-developers-treat-themselves-like-production-systems
Zone: maintenance-mode
Tags: burnout, adhd, mental-health, developer-wellness, ai-coding, vibe-coding, productivity
Published: 2026-03-31

You monitor your servers around the clock. You have alerting, health checks, graceful degradation. So why does the system that crashes most often have zero observability?

> **KEY TAKEAWAY**
>
> * **The Problem:** AI-assisted coding amplifies cognitive load and accelerates burnout, especially for developers with ADHD who already ride the hyperfocus-crash cycle
> * **The Solution:** Apply the same SRE principles you use on production systems to yourself: maintenance windows, personal SLOs, graceful degradation, and scheduled downtime
> * **The Result:** Sustainable output without the crash-and-recover pattern that eventually takes you offline permanently

*Last updated: 2026-03-31*

You have health checks on your containers. Alerting on your database. Automated failover when a pod goes down. You probably have better observability on a $5/month VPS than you do on the system running the whole operation: you.

This is the zone nobody wants to write about, and the one that matters more than any deploy pipeline or agent orchestration layer. Maintenance Mode is about keeping the human in the loop operational.

I run on ADHD. I code with AI all day. I've shipped entire features in single hyperfocus sessions that felt like time travel, then spent the next two days unable to open a terminal. If that pattern sounds familiar, this post is for you.

## Why does AI-assisted development burn people out faster?

Vibe coding with AI introduces a kind of fatigue our industry hasn't named yet.

BCG published research in March 2026 calling it ["AI brain fry"](https://hbr.org/2026/03/when-using-ai-leads-to-brain-fry). Their study of 1,488 workers found that 14% already experience it. You're not just writing code anymore. You're reviewing AI-generated output, deciding whether to trust it, holding your own mental model while evaluating someone else's, and context-switching between creation and QA dozens of times per hour.

That's a different cognitive pattern than typing code yourself. You're running two mental processes in parallel: your intent and the AI's output. Every time they diverge, your brain burns energy reconciling them.

A [Faros AI study](https://www.faros.ai/blog/ai-software-engineering) across 10,000 developers found that AI users were actually 19% slower on average, despite believing they were faster. Not bad tooling. The invisible overhead of constant evaluation and judgement calls.

Stack that on top of already-rising exhaustion across the industry. [LeadDev's 2025 survey](https://leaddev.com/culture/engineering-burnout-rising-2025-layoffs-reshape-tech-industry) found 22% of engineering leaders at critical burnout levels, with another 24% at moderate. Those numbers were measured before AI coding tools became the default workflow.

## What makes the ADHD-developer burnout cycle different?

Standard recovery advice doesn't work for ADHD brains. "Take regular breaks" sounds great until you're six hours into a hyperfocus session and your brain is producing more dopamine than it has all week. Breaking that flow doesn't feel like rest. It feels like someone unplugging you mid-download.

The [Stack Overflow blog](https://stackoverflow.blog/2024/05/10/between-hyper-focus-and-burnout-developing-with-adhd/) covered this in 2024: ADHD developers exist in a cycle between hyperfocus and burnout that neurotypical productivity frameworks don't account for. The hyperfocus isn't optional. It's how our brains compensate for the executive function gaps that make "normal" sustained focus difficult.

You hit a problem that's interesting enough to lock in. The world drops away. You ship an unreasonable amount of work in one session. Then you hit the wall. Hard. Not "I'm a bit tired" but "I can't form a coherent thought and opening Slack gives me anxiety."

[ADDitude Magazine](https://www.additudemag.com/adhd-hyperfocus-crash/) calls it the hyperfocus let-down. As they put it: "The ADHD brain's dopamine system works differently, requiring more stimulation to feel motivated, which is why hyperfocus states are so consuming and why the crash afterward feels so complete." When it runs dry, everything becomes harder: focus, motivation, basic task initiation.

AI coding tools pour fuel on this. Prompt goes in, code comes out, and that instant feedback loop creates exactly the rapid-reward cycle ADHD brains lock onto. Vibe coding is hyperfocus fuel. It makes the productive sprints more intense and the crashes deeper.

## How do you apply SRE thinking to yourself?

Your brain is a production system. I stopped treating that as a metaphor and started treating it as an operational reality.

You'd never run a production server at 100% CPU continuously and expect it to perform well. You wouldn't skip maintenance windows because the system is "fine right now." And you'd never ignore degraded performance metrics because the service is technically still responding.

But that's exactly how most of us treat ourselves.

SRE vocabulary works better than wellness culture here:

**Uptime is not 100%.** No production system targets 100% uptime, and neither should you. Five-nines availability on a server is 99.999%, which still allows for 5 minutes of downtime per year. Your personal SLO should be even more generous. The goal is sustainable throughput, not maximum throughput.

**Maintenance windows are scheduled, not reactive.** You don't wait for a server to fall over before rebooting it. Schedule your downtime before your system forces it. For me that means hard stops at specific times, not when I "feel tired" (because ADHD brains are terrible at interoception and I'll feel tired approximately never until I'm already crashed).

**Graceful degradation beats hard failure.** When a system is under load, you shed non-essential traffic. When your brain is approaching its limit, shed non-essential decisions. Close Slack. Leave the PRs. Drop the context-switching. Protect the one thing that's actually producing value and let everything else wait.

**Incident response needs a runbook.** When you do crash, and you will, have a recovery protocol that doesn't depend on willpower or motivation, because those are the exact resources you've depleted. Mine involves a specific playlist, a specific location, and absolutely no screens for a minimum of two hours.

## What does maintenance mode actually look like in practice?

Systems, not willpower. Stop relying on the thing ADHD brains are worst at: consistent self-regulation through executive function.

**90-minute blocks with hard boundaries.** Not "work until I feel like stopping." An actual timer. When it fires, I stand up regardless of where I am in the task. I learned this the hard way: "just five more minutes" with ADHD is never five minutes. It's two hours and a missed meal.

**Context dumps before shutdown.** Before stopping any session, I spend 3 minutes writing down exactly where I am, what the next step is, and any state that won't survive a break. ADHD working memory is a whiteboard that someone wipes clean every time you look away. Anything not written down is gone.

**No-decisions-after-6pm rule.** Decision fatigue compounds through the day, and AI-assisted development accelerates it because every AI output requires a judgement call: accept, reject, modify. By evening, my ability to choose well has degraded enough that any call I make is likely wrong. So I stop making them.

**One offline day per week.** Not "no coding." No screens with work on them. This is the weekly maintenance window. No tickets, no PRs, no "quick check on that deploy." If the production system can't survive 24 hours without you, that's a reliability problem.

**Energy auditing.** I track which activities drain versus restore cognitive capacity. Debugging an [agent pipeline](/agents)? Draining. Sketching system architecture on paper? Restorative. Reviewing AI-generated code for correctness? Extremely draining. Riding a motorcycle? Hard reset. This lets me schedule work like I'd schedule jobs on a server: expensive operations go in peak windows, cheap tasks fill the gaps.

## What does a personal SLO actually look like?

Concrete. Measurable. Not aspirational.

Here's what mine look like (yours will differ, tune them to your own system):

- **No deep-work session exceeds 3 hours** without a hard 30-minute break. Not negotiable. Not "unless I'm in flow." Especially if I'm in flow. That's when ADHD brains are most at risk of burning through their reserves.
- **Maximum 2 major context switches per day.** Switching between projects or problem domains has a higher cognitive cost than most people realise. For ADHD brains, each switch can cost 20-30 minutes of ramp-up time. Two per day is my limit.
- **Weekly maintenance window is sacred.** One full day offline. 52 out of 52 weeks. This is my 99.x% uptime target for the year. Skip the maintenance window, and unplanned downtime follows.
- **Crash recovery protocol activates immediately.** No "push through it." When I recognise the signs (task initiation failure, irritability, inability to hold a thought for more than 30 seconds), the runbook fires. Screen off. Walk. Music. Minimum two hours before I reassess.

These aren't goals. They're operational constraints. A rate limit on an API isn't aspirational. It's what keeps the system stable.

## How do AI coding tools change this equation?

AI tools are amplifiers. They multiply your productive capacity during hyperfocus, and they deepen the crash when it comes.

Vibe coding with Claude or Copilot creates a tight feedback loop that ADHD brains find irresistible. Prompt, result, evaluate, prompt, result, evaluate. Each cycle delivers a small dopamine hit. You're shipping features in hours that would once have taken days.

But each evaluation cycle costs cognitive energy. And because the feedback is instant, you burn through reserves faster than writing code manually. Manual coding has natural governors: thinking time, typing time, compile time. AI removes them. You accelerate without friction until you hit the wall.

So keep using them. Just respect the operating limits the same way you would with any high-performance tool.

A chainsaw cuts faster than a handsaw. That's why it has more safety features, not fewer.

## Frequently Asked Questions

**Is ADHD actually common in software development, or just a Silicon Valley stereotype?**

The data suggests it's genuinely more prevalent. Analysis of the [2022 Stack Overflow Developer Survey](https://stackoverflow.blog/2023/12/26/developer-with-adhd-youre-not-alone/) found roughly 10.5% of developers reported concentration or memory difficulties consistent with ADHD, compared to the general population prevalence of 4-5%. The self-selection makes sense: ADHD brains are drawn to work that rewards novel problem-solving and intense short-duration focus, which describes most of software development.

**I don't have ADHD. Does any of this apply to me?**

Most of it applies to anyone doing AI-assisted development. The cognitive load patterns, the decision fatigue accumulation, the missing recovery systems are universal. The same applies to anyone running [Claude, Copilot, or multi-agent workflows](/ai-workflows) at scale. ADHD just makes the consequences hit faster and harder. Think of it as the canary in the coal mine: if these patterns burn out ADHD developers first, they'll eventually reach everyone.

**How do I bring this up at work without sounding like I'm making excuses?**

Frame it in systems language your team already understands. "I'm implementing maintenance windows to prevent unplanned outages" lands better than "I need mental health breaks." You're not asking for special treatment. You're applying the same reliability engineering principles your team uses on production systems, to the most critical system in the pipeline.

**What tools actually help with managing this?**

Timers that you can't dismiss (I use a physical one, not an app, because apps are too easy to ignore). A paper notebook for context dumps (screens during breaks aren't breaks). A calendar with blocked time that colleagues can see but can't book over. These are boring, low-tech solutions. That's the point. The system has to work when your executive function is at its worst, not its best.

---

The best developers I know don't have more discipline than the rest of us. They have better systems. They build the same guardrails around themselves that they build around their production infrastructure, because they've learned, usually the hard way, that the most expensive outage is always the one running the whole operation.

Your servers have health checks. Your CI pipeline has gates. Your database has automated backups.

Build the same thing for yourself. That''s maintenance mode.

---

## The AI News I Actually Follow (And Everything I Ignore)
URL: https://labs.zeroshot.studio/news/the-ai-news-i-actually-follow-(and-everything-i-ignore)
Zone: news
Tags: ai-news, signal-noise, builders, editorial, curation
Published: 2026-03-31

There's more AI news every day than anyone can read. Here's the filter I use to decide what's worth covering, and what gets ignored.

> **KEY TAKEAWAY**
> * **The Problem:** AI news moves faster than anyone can track, and most of it doesn't change anything about how you actually build.
> * **The Solution:** Cover only what shifts the tools, infrastructure, or workflows builders rely on. Ignore the rest.
> * **The Result:** A news zone with a clear filter: if it doesn't change how I build, deploy, or operate, it doesn't get a post.

---

*Last updated: 2026-03-31*

## Why is there so much AI news that doesn't matter?

The AI news cycle has a structural problem. Announcements move fast, benchmarks drop weekly, and every model release comes with a press kit full of claims that require six months of real use to verify. Most of it won't change anything you're doing on Monday.

I've tried newsletters, RSS feeds, Slack channels. All of them eventually became the same thing: a pile of updates with no signal about which ones required action. Volume isn't the problem. The missing filter is.

The ZeroLabs news zone exists because I track this stuff anyway. I'm building on top of these tools, running them in production, deciding which updates warrant changing something about how I operate. Publishing the analysis makes it useful to others, but the filter existed before the zone did.

## What actually makes a story worth covering?

The test is simple: does this change something I build, deploy, or run?

A new model dropping with a spec sheet and benchmarks doesn't automatically pass that test. A model that improves reasoning on the tasks I actually use it for, once I've verified it, might.

A few concrete triggers:

**The toolchain changes.** When OpenAI [acquired Astral](/news/openai-acquires-astral), the team behind [uv](https://docs.astral.sh/uv/), [Ruff](https://docs.astral.sh/ruff/), and ty, that warranted coverage because it affects the Python tooling stack I use daily. What it meant immediately for dependency planning, not what it might mean eventually.

**The operating model changes.** When Anthropic shipped [Claude Code Channels](/news/anthropic-claude-code-telegram-discord), direct Telegram and Discord integrations, that changed how you can wire Claude into an existing team workflow. Operational, immediate, worth knowing.

**The infrastructure shifts.** Pricing changes, API limits, context window updates, deployment model changes: anything that affects how you actually run AI in a system gets covered when it's confirmed and actionable.

## What doesn't get covered?

Benchmark announcements that aren't backed by independent testing. Capability demos that haven't shipped to API. "AI will transform X" takes with no implementation path attached. Rumour-driven pieces where the signal is thin.

Also: AI ethics coverage, regulatory commentary, and investment news. That's context, not something you act on Monday morning, and it has other homes. This zone is for builders who want to know what changed in the tools and infrastructure they depend on.

## How often does this zone update?

When something worth covering happens. There's no editorial schedule. Some weeks see multiple posts. Some months see none. That's intentional: the filter matters more than the cadence. A quiet month means nothing shipped that changed anything.

If you want to stay current, bookmark the zone. The posts here are written to be findable, not just timely: the analysis holds up longer than most news summaries because it's grounded in what the change means for actual use.

## Frequently asked questions

**Do you cover every model release?**

No. A release gets covered when the capability change affects existing workflows or introduces a new deployment option. "X beats Y on benchmark Z" doesn't get a post unless that benchmark maps to something builders actually use.

**What counts as builder-relevant?**

If someone building with AI would need to make a decision based on the news: update a dependency, change a prompt strategy, reconsider a tool choice. If it's interesting but requires no action, it probably doesn't make the cut.

**How is this different from other AI newsletters?**

Most AI newsletters aim for completeness. This zone aims for specificity. The audience is people who are already building with AI, not people getting an overview of the space. That changes what gets in.

**Are the posts opinionated?**

Yes. Every post includes analysis. If a change has implications, they get named. Useful coverage requires having a position, so I take one.

---

If this sounds like the filter you've been looking for, [browse the news zone](/news) and see what's shipped. If you want the technical depth behind the changes, the [AI Workflows](/ai-workflows), [Agents](/agents), and [VPS & Infra](/vps-infra) zones cover how things actually get built.

---

## What's in the ZeroLabs Resource Library (and Where to Start)
URL: https://labs.zeroshot.studio/resources/what's-in-the-zerolabs-resource-library-(and-where-to-start)
Zone: resources
Tags: ai-for-business, resources, free-course, templates, founders
Published: 2026-03-31

A map of the resources zone: a free 7-part AI for Business course plus standalone tools and templates, all built from real production use.

> **KEY TAKEAWAY**
> * **The Problem:** Most AI guides are written for people who already know what they're doing, which leaves founders and operators with no clear starting point.
> * **The Solution:** The ZeroLabs resource library collects production-tested tools, templates, and a free 7-part AI for Business course, organised so you can start anywhere.
> * **The Result:** One clear entry point for non-technical teams who want to use AI without wading through theory.

---

*Last updated: 2026-03-31*

## What is this zone, and is it for you?

I built the AI for Business course because I kept having the same conversation with founders who were busy and overwhelmed by AI coverage that either talked down to them or assumed they were engineers. The templates started the same way: built for my own content workflow, shared because they worked.

This section of ZeroLabs collects that work: practical resources for founders, team leads, solo operators, and anyone who's opened ChatGPT twice and felt like they were doing it wrong -- people who want to use AI without wading through theory. Everything here was built from real use, not written to fill a content calendar. The templates come from actual projects. The course content reflects problems I've run into building and running teams that use AI daily. None of it assumes a technical background.

## How is the library organised?

The resource library runs two tracks. You don't have to pick one forever, but knowing which to start on saves time.

- **The AI for Business course** -- a free, 7-part series for founders and team leads who want a working understanding of AI without touching code. It covers how AI makes decisions, how to get your team using it safely, how to make it sound like your brand, and exercises that generate outputs you can use the same day. Start with [the overview](/resources/ai-for-business-free-practical-course), then work through the lessons in order. The first one takes about 20 minutes.

- **Standalone tools and templates** -- if you have a specific problem, grab what you need and get back to work. The [blog post template](/resources/blog-post-template-gemini) was built against what AI search systems actually look for in 2026. The [GEO and E-E-A-T guide](/resources/geo-eeat-get-your-content-cited-by-ai) covers how to get your content cited by language models, not just ranked by Google, based on the [Princeton GEO research](https://arxiv.org/abs/2311.09735) on how generative engines select and cite sources. These work independently. You don't need to do the course first.

## Why does this library exist?

Every resource here comes from something I actually needed -- not something I thought other people would find useful in theory.

That's the bar: if it's not something I'd use, it doesn't go in. Production-tested means these aren't drafts. They've been run against actual projects, paying clients, and live publishing pipelines. When something stops working, I update it.

## Where should I start?

If you're new here and not sure which track fits, [start with the course overview](/resources/ai-for-business-free-practical-course). It's three minutes to read and tells you exactly what you'll get from the full seven lessons. If you're already running AI experiments and just need a specific tool, browse the full resources list and grab whatever looks relevant.

If you're a developer or want to go deeper on automation, the [AI Workflows zone](/ai-workflows) has more technical content. The [Agents zone](/agents) covers building AI that does actual work for you, not just generates text.

## Frequently asked questions

**Is the AI for Business course actually free?**

Yes, entirely. No email gate, no paywall. Seven posts, all published and publicly accessible.

**Do I need to know how to code?**

No. The course and most of the standalone resources are written for non-technical readers. The GEO and E-E-A-T content gets slightly technical in places, but the practical steps don't require writing code. If you want to go further, [Anthropic's prompt engineering documentation](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview) is well-written and accessible without a developer background.

**Where should I start if I've never used AI tools before?**

Lesson one of the course: [Your AI Action Plan](/resources/your-ai-action-plan). It's a 20-minute exercise that gives you 5 real challenges to test AI on in your own work. By the end you'll know whether this is worth your time, and you'll have done it rather than just read about it.

**How often does the library get updated?**

When I build new tools or existing ones need updating. There's no content schedule here. If something changes (and things change fast in AI), I revise the posts and the dates reflect it.

---

Start with what you need. If you're not sure, [the course is the clearest on-ramp](/resources/ai-for-business-free-practical-course). Everything else builds from there.

---

## Why We Self-Host Our Stack at ZeroLabs
URL: https://labs.zeroshot.studio/vps-infra/why-we-self-host-our-stack-at-zerolabs
Zone: vps-infra
Tags: self-hosting, vps, infrastructure, postgresql, docker
Published: 2026-03-31

ZeroLabs runs on a VPS by design. Here is the infrastructure we use, the services we keep close, and why fewer vendors usually makes a small team faster.

> **KEY TAKEAWAY**
> * **The Problem:** Modern app stacks quietly turn into a pile of separate bills, dashboards, webhooks, and failure points long before the product itself gets complicated.
> * **The Solution:** We keep the core of ZeroLabs on one VPS with Docker, PostgreSQL, Nginx, private networking, and self-managed service choices where they give us real control.
> * **The Result:** We operate multiple live apps and internal systems on one understandable stack, with fewer moving parts to pay for, debug, and stitch together.

*Last updated: 2026-03-31 · Tested against Ubuntu 24.04, Docker 29.3.1, and PostgreSQL 16.13*

## Contents

1. [Why does SaaS sprawl hurt small teams?](#why-does-saas-sprawl-hurt-small-teams)
2. [What does the ZeroLabs stack actually look like today?](#what-does-the-zerolabs-stack-actually-look-like-today)
3. [Which services do we prefer to self-manage, and why?](#which-services-do-we-prefer-to-self-manage-and-why)
4. [Where does self-hosting save money and complexity?](#where-does-self-hosting-save-money-and-complexity)
5. [How should you decide what belongs on your VPS, and when should you stop?](#how-should-you-decide-what-belongs-on-your-vps-and-when-should-you-stop)
6. [Frequently asked questions](#frequently-asked-questions)

## Why does SaaS sprawl hurt small teams?

Self-hosting starts to make sense the moment the glue code becomes the real product. A tiny team can launch quickly with managed services, but the stack often grows sideways before it grows up. Hosting lives in one dashboard, auth in another, the database in a third, mail in a fourth, and now every feature asks you to wire permissions, env vars, webhooks, billing, and logs across all of them.

The price creep is not imaginary. At the time of writing, [Vercel Pro starts at $20 per month plus usage](https://vercel.com/pricing), [Clerk Pro starts at $25 per month](https://clerk.com/pricing), and [Neon’s Launch tier lists a typical spend of $15 per month](https://neon.com/pricing). None of those numbers are outrageous on their own. Stack three or four together, then add storage, email, monitoring, and seats, and your “simple” app starts feeling like a tray of subscriptions held together with hope and webhook retries.

The bigger problem is not the first invoice. It is operational shape. Every extra vendor becomes another place where state can drift, permissions can be wrong, DNS can get weird, or an integration can fail in a way no single provider can see clearly.

We learned this the usual way, with hands on the keyboard and mild regret. In our experience, once we run more than one product, “best-in-class for every layer” often becomes “best-in-class at making incident response annoying.”

> **The reality:** Small teams usually do not drown in raw infrastructure first. They drown in coordination overhead between many separate services.

## What does the ZeroLabs stack actually look like today?

ZeroLabs is not theory for us. It sits inside a live stack we can inspect. Our current VPS manifest shows Ubuntu 24.04 on a 12-core machine with 23 GB of RAM, Docker 29.3.1, PostgreSQL 16.13, Nginx, and 44 running containers supporting public apps and internal systems.

That does not mean one giant mystery box. It means one controlled base layer. In our setup, we run ZeroLabs as a container behind Nginx, and we run ZeroContentPipeline as a separate module. ZeroMemory, Zero-Signals, and OpenClaw-related services live alongside them, but they share the same operational habits: containers, private networking where possible, one PostgreSQL estate we understand, and one place to reason about deployment state.

![Architecture diagram showing ZeroLabs running on one Ubuntu VPS with Nginx, Docker, PostgreSQL, S3-compatible storage, and connected internal services](/api/images/1774978789326-why-we-self-host-our-stack-at-zerolabs-diagram-01.svg "ZeroLabs runs multiple services on one self-managed core instead of scattering them across unrelated platforms.")

Here is the shape in plain English:

| Layer | What we use | Why it stays close |
|---|---|---|
| Host OS | Ubuntu 24.04 | Boring, stable, well-documented base |
| Runtime | Docker + Compose | Repeatable deploys without hand-built snowflakes |
| Reverse proxy | Nginx | One front door for domains, TLS, and routing |
| Database | PostgreSQL 16.13 | General-purpose relational core we trust |
| Object storage | S3-compatible tooling where useful | One storage model across apps |
| Private access | Tailscale-style private networking and local bindings | Keep internal surfaces off the public internet |
| App layer | ZeroLabs, ZeroContentPipeline, Zero-Signals, ZeroMemory | Separate services, shared operating model |

That is the heart of our VPS & Infra philosophy. One box is not the strategy forever. One understandable control plane is.

For a blog and adjacent internal tools, that trade makes sense. We do not need five separate platform opinions before breakfast. We need a stack we can inspect, back up, and repair without opening a tab cemetery.

![Sanitized proof screenshot showing live container status for ZeroLabs, ZeroContentPipeline, ZeroMemory MCP, Zero-Signals, and related services](/api/images/1774978789466-why-we-self-host-our-stack-at-zerolabs-proof-step-01.png "Sanitized from real live docker status output on 2026-03-31. Ports, IPs, usernames, and host paths were removed before publishing.")

## Which services do we prefer to self-manage, and why?

We lean self-managed for the parts of the stack that hold core state or define the shape of the app. That usually means hosting, databases, object storage, background workers, private APIs, and sometimes auth. The test is simple: if the thing becomes painful to move later, we would rather own the boring version early.

### Hosting

Hosting is the obvious starting point. A VPS plus Docker and Nginx gives us predictable behavior, direct network control, and clean boundaries between services. It also keeps app-to-app traffic local instead of bouncing through three vendors and back again.

Managed hosting can be brilliant for some teams. We still recommend it for people who need to launch this afternoon. But once you already operate multiple services, paying for separate compute wrappers on top of the same Linux basics starts to feel like buying your own tools back one panel at a time.

### Database

We strongly prefer keeping the main database close. [PostgreSQL](https://www.postgresql.org/docs/current/index.html) is mature, flexible, and boring in the best possible way. It handles transactional data, search extensions, queues, analytics side tables, and half the weird ideas builders come up with at midnight.

Self-managing the database does not mean pretending backups and replication are optional. It means those responsibilities stay visible. We would rather own backup policy, restore drills, extensions, and network exposure explicitly than discover six months later that our data model quietly bent around a platform default we never chose.
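
To make "backups stay visible" concrete, here is a minimal sketch of the kind of script that keeps the policy explicit. The database name and backup directory are placeholders, not our actual layout:

```python
from datetime import datetime, timezone
from pathlib import Path

def build_pg_dump_command(db_name: str, backup_dir: str) -> list[str]:
    """Build a pg_dump invocation with a timestamped, custom-format archive.

    Illustrative paths and names only; pair the dump with a scheduled
    pg_restore drill into a scratch database so restores stay real.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = Path(backup_dir) / f"{db_name}-{stamp}.dump"
    # -Fc writes PostgreSQL's custom format: compressed, restorable with pg_restore
    return ["pg_dump", "-Fc", "--file", str(target), db_name]

cmd = build_pg_dump_command("app_db", "/var/backups/pg")
# Hand cmd to subprocess.run(cmd, check=True) from cron or a systemd timer.
```

The point is not the ten lines of Python. It is that the backup policy lives somewhere you can read, version, and drill against, instead of inside a platform default you never chose.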

### Auth

Auth needs nuance. We prefer self-managed auth as a service, not home-grown auth as a hobby. Those are wildly different sentences.

If we self-manage auth, the goal is usually to run a proper identity layer such as Keycloak or a similar product on our infrastructure, not to write password reset flows from scratch while staring into the void. The official [Keycloak docs](https://www.keycloak.org/server/configuration) are useful here because they make the production stance very clear: secure by default, with hostname and HTTPS/TLS expected. That is the right attitude.

Why keep auth close at all? Because auth leaks into everything. Session rules, roles, organization boundaries, internal tools, service-to-service trust, and audit trails all get easier when identity lives in the same world as the rest of your app rather than in a separate billing model with its own product roadmap.

### Object storage

Object storage is one of our favourite self-managed wins because it is easy to keep the interface standard. [MinIO’s S3 API layer](https://www.min.io/product/aistor/s3-api) is built around AWS S3 compatibility, including the “no code changes required” migration story. That matters.

**What is S3-compatible storage?** It means your app talks the same bucket-and-object language whether the storage sits in AWS or on your own box. For a small team, that is gold. You can keep the app code simple, avoid vendor-specific file APIs, and move later without sawing the whole storage layer off the product.
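
A tiny sketch of what "same language, different endpoint" means in practice. The endpoint URL and region below are invented placeholders; the shape matches how boto3-style S3 clients accept an `endpoint_url` override, but this is not a full client setup:

```python
def s3_client_settings(self_hosted: bool) -> dict:
    """Connection settings for an S3-compatible client.

    Only the endpoint changes between AWS and a self-hosted MinIO box; the
    put_object / get_object calls the app makes stay identical. The URL
    below is a placeholder, not a real host.
    """
    settings = {"service_name": "s3", "region_name": "us-east-1"}
    if self_hosted:
        # S3-compatible servers such as MinIO just need an endpoint override.
        settings["endpoint_url"] = "https://minio.internal.example:9000"
    return settings

# e.g. boto3.client(**s3_client_settings(self_hosted=True)) would talk to
# MinIO with the same bucket-and-object API as the AWS-default client.
```

Because the difference is one setting, moving storage later is a config change, not a rewrite.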

### Mail

Mail is where ideology gets people into trouble. We prefer controlling mail flows and domains, but we are not romantic about deliverability. A mail stack is not just “send email.” It is DNS records, reputation, bounce handling, suppression lists, and a long slow conversation with inbox providers.

So our principle is tighter than “self-host all mail.” We want mail logic to stay close to the app, and we want the delivery path to stay understandable. Sometimes that means a fully self-managed mail layer. Sometimes it means a relay or provider for the last mile, especially when deliverability matters more than purity. Even then, we still want one coherent system, not three overlapping notification vendors and a prayer.

## Where does self-hosting save money and complexity?

The cleanest savings show up in three places: base spend, duplicated platform features, and debugging time.

### Base spend

The first win is that one VPS can replace several “starter” subscriptions. That does not mean a VPS is always cheaper in every scenario. It means the economics often swing early once you are running more than one service.

![Comparison flowchart showing vendor-per-layer SaaS sprawl on one side and a self-managed VPS core with selective edge services on the other](/api/images/1774978789579-why-we-self-host-our-stack-at-zerolabs-diagram-02.svg "The point is not to self-host everything. It is to keep the core state close and buy specialty only where it genuinely earns its place.")

Take a very ordinary modern stack:

| Capability | Common managed choice | Current list-price example | Self-managed direction |
|---|---|---|---|
| Hosting | Vercel | [Pro from $20/mo plus usage](https://vercel.com/pricing) | VPS compute you already control |
| Database | Neon | [Launch typical spend $15/mo](https://neon.com/pricing) | PostgreSQL on the same box or private DB host |
| Auth | Clerk | [Pro from $25/mo](https://clerk.com/pricing) | Self-managed identity service |
| Mail | Resend | Provider plan plus extras like [dedicated IPs at $30/mo](https://resend.com/pricing) | Self-managed mail flow or controlled relay |

Again, none of those are bad products. We use products like this all the time when they are the right fit. The point is structural: your stack can start charging rent in four directions before your app has even found its voice.

### Duplicated platform features

Managed services often overlap in sneaky ways. You pay for logs in one place, traces in another, auth events in a third, and image or object storage in a fourth. You end up buying “helpful convenience” multiple times.

When we keep the core stack on one VPS, we can centralize a lot of that. Reverse proxy rules live in one place. Network boundaries live in one place. App logs follow one operating model. Backups and health checks can be designed once and reused. That has real value, even if it never appears as a neat line item on an invoice.

### Debugging time

This is the bit people forget because it never arrives as a price alert. When a request moves through five vendors, a bug turns into archaeology. You are comparing timestamps, guessing at retries, and trying to work out which dashboard is lying with the most confidence.

In our setup, when something goes sideways we can usually trace the path from Nginx to container to app logs to PostgreSQL without leaving the same operating context. That is a massive quality-of-life improvement. It also makes posts like [how to run a security audit on your vibe coded app](/ai-workflows/how-to-run-a-security-audit-on-your-vibe-coded-app) much more useful, because the remediation path is under our control.

> **The reality:** Cost matters, but operator clarity matters more. A cheaper stack you cannot reason about is still expensive.

## How should you decide what belongs on your VPS, and when should you stop?

Our rule of thumb is boring and useful:

1. **Keep core state close.** Database, object storage, background jobs, internal APIs, and service-to-service communication usually belong near the app.
2. **Use standard interfaces.** PostgreSQL, SMTP, S3-compatible storage, OAuth/OIDC, and HTTP keep future moves realistic.
3. **Do not build auth or mail from scratch.** Run proven services, or buy them, but do not invent them because you felt brave on a Tuesday.
4. **Centralize observability and ops.** One health model, one backup policy, one deployment story, one place to reason about failures.
5. **Pay for specialty, not convenience theatre.** If a vendor gives you something hard to reproduce well, great. If it only gives you another dashboard and a monthly charge, maybe keep that one at home.
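
As a toy illustration only (the field names are invented, and no real decision is this mechanical), the rules above compress to something like:

```python
def keep_close(component: dict) -> bool:
    """Toy version of the checklist above; fields are invented for illustration."""
    if component.get("build_from_scratch"):
        return False  # rule 3: never invent auth or mail yourself
    # rule 1 plus the closing caveat: core state stays close only when the
    # team can actually support it (restore drills, identity care, and so on)
    return bool(component.get("holds_core_state")) and bool(component.get("team_can_support"))

keep_close({"name": "postgres", "holds_core_state": True, "team_can_support": True})  # True
keep_close({"name": "diy auth", "holds_core_state": True, "build_from_scratch": True})  # False
```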

There are also clear moments to stop. If you are pre-product, non-technical, under heavy launch pressure, or facing compliance work you are not equipped to handle, managed services are often the better choice. A good rule of thumb is simple: do not self-manage components you cannot support properly. That usually means real restore drills for the database, careful handling for identity, and deliverability work for email.

We like a hybrid stance because it keeps the argument honest. Keep the stateful, movable core close when that helps. Buy the edge where the edge is genuinely specialized.

That same thinking shows up across the rest of our work. The [Claude Code hooks piece](/ai-workflows/claude-code-hooks-replace-half-your-claude-md) is really about moving rules closer to execution. The [OpenClaw guide](/openclaw/what-is-openclaw-the-open-source-ai-agent-framework-that-runs-your-digital-life) is about owning the runtime instead of renting a black box. Different layer, same instinct.

For ZeroLabs, the answer is not “self-host everything forever.” The answer is “self-manage the parts that define the product, and stay ruthlessly honest about what you can support.”

That is the front door to this whole zone. We are interested in infrastructure as a working system, not as a status symbol. If a service helps us ship, keep it. If it mostly adds bills, seams, and platform gravity, we would rather get our hands dirty and run the thing ourselves.

### Key terms and tools mentioned

- **[PostgreSQL](https://www.postgresql.org/docs/current/index.html)**: Open-source relational database we use as the core state layer for app data and internal systems.
- **[Keycloak](https://www.keycloak.org/server/configuration)**: Self-managed identity and access platform that shows the kind of mature auth service worth running instead of building from scratch.
- **[MinIO](https://www.min.io/product/aistor/s3-api)**: S3-compatible object storage layer that keeps file handling portable across self-managed and cloud environments.

This post is the front door for the VPS & Infra zone. From here, we will get more specific: security audits, deployment patterns, internal networking, backup habits, and the small operational choices that keep a self-managed stack from turning feral.

If you are building with AI tools, indie SaaS patterns, or a tiny team, infrastructure does not need to be glamorous. It needs to be understandable. That is the standard we care about.

## Frequently asked questions

**Is self-hosting always cheaper than managed services?**

No. If you only have one tiny app and you value zero ops over everything else, managed platforms can be cheaper in practice because they cost less attention. Running your own stack starts to win once you have multiple services, real state, and enough technical confidence to make one shared platform calmer than four separate vendors.

**Should I self-host auth for my next project?**

Only if you are willing to treat identity like core infrastructure. Running a mature auth service can be a smart move, but writing your own login system almost never is. If the real choice is between proven auth SaaS and a weekend of hand-rolled password logic, buy the service and move on.

**What should I self-host first?**

Hosting and the database are usually the best first moves because they give you the clearest gain in control. Object storage is a strong third step because S3-compatible tooling keeps migration paths tidy. As a rule of thumb, leave mail and identity until the team has the appetite to support them well.

**What about email deliverability?**

Separate mail architecture from inbox reputation. You can keep domains, templates, routing, and event handling under your control while still using a specialist relay for the delivery leg. The mistake is assuming that self-managed automatically means better deliverability. It often means more work instead.

**Is one VPS enough for a real product?**

Often, yes, for longer than people expect. A single well-managed server can carry multiple production services if the stack stays boring, workloads stay isolated, backups are real, and resource usage is watched closely. Outgrowing one box is a healthy problem. Outgrowing your ability to understand the system is the worse one.

---

Ready to keep going? Start with the [VPS & Infra zone](/vps-infra), then read the [app security audit guide](/ai-workflows/how-to-run-a-security-audit-on-your-vibe-coded-app) and the [OpenClaw explainer](/openclaw/what-is-openclaw-the-open-source-ai-agent-framework-that-runs-your-digital-life).

---

## What Is OpenClaw? The Open-Source AI Agent That Runs Your Digital Life
URL: https://labs.zeroshot.studio/openclaw/what-is-openclaw
Zone: openclaw
Tags: openclaw, ai agent, self-hosted, open source, docker, telegram, ollama
Published: 2026-03-31

OpenClaw is an open-source, self-hosted AI agent framework that connects to 23+ messaging platforms, supports 35+ model providers, and puts you in full control of your data. Here's what it does, why 340K developers starred it, and how to set it up.

> **KEY TAKEAWAY**
> * **The Problem:** Cloud-hosted AI assistants lock your data into someone else's infrastructure, limit which models you can use, and charge monthly fees for features you could run yourself.
> * **The Solution:** OpenClaw is a free, open-source AI agent gateway you self-host on your own hardware, connecting 23+ messaging channels to 35+ model providers with full privacy.
> * **The Result:** 340,000+ GitHub stars in four months, 2 million monthly active users, and a skills registry of 13,700+ community-built extensions, making it the fastest-growing open-source project in GitHub history.

*Last updated: 2026-03-31 · Tested against OpenClaw v2026.3.x*

## Contents

1. [What is OpenClaw and where did it come from?](#what-is-openclaw)
2. [How does the architecture actually work?](#architecture)
3. [What can OpenClaw actually do?](#capabilities)
4. [How do you install OpenClaw: VPS vs local machine?](#installation)
5. [What does basic setup look like?](#basic-setup)
6. [Why has OpenClaw generated so much interest?](#why-the-hype)
7. [Frequently asked questions](#faq)

## What is OpenClaw and where did it come from?

OpenClaw is a self-hosted AI agent you run on your own hardware. It connects to messaging platforms you already use, routes conversations to AI models you choose, and executes tasks through a plugin and skills system. Your data stays on your machine. No subscription required.

Where most agent frameworks require you to write Python or TypeScript, OpenClaw is configuration-first. Agent behaviour is defined in markdown files (SOUL.md for personality, SKILL.md for capabilities). Changing how your agent works means editing a text file, not debugging code.
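
To show why markdown-as-config is pleasant to work with, here is a deliberately naive sketch of splitting a SOUL.md-style file into sections. This is not OpenClaw's actual loader; the headings and content are invented:

```python
def split_markdown_sections(md_text: str) -> dict[str, str]:
    """Split markdown into {heading: body} by level-2 headings.

    Illustrative only, not OpenClaw's real parser.
    """
    sections: dict[str, str] = {}
    current = "_preamble"
    for line in md_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        else:
            sections[current] = sections.get(current, "") + line + "\n"
    return sections

soul = "## Personality\nDry, concise, allergic to filler.\n## Boundaries\nNever send mail without asking.\n"
sections = split_markdown_sections(soul)
# Changing the agent's behaviour means editing this text, nothing more.
```

Plain text sections are readable by the model, diffable in git, and editable from a phone. That is the whole appeal of configuration-first.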

Peter Steinberger, an Austrian developer, published the first version in November 2025 under the name "Clawdbot." The project was derived from Clawd (later Molty), an earlier AI virtual assistant experiment. Within two months, Anthropic filed a trademark complaint over the name's similarity to Claude, prompting a rename to "Moltbot" on January 27, 2026, and then to "OpenClaw" three days later ([OpenClaw Wikipedia](https://en.wikipedia.org/wiki/OpenClaw)).

The name stuck. The lobster emoji became the mascot. And the project exploded.

By mid-March 2026, OpenClaw had collected over 340,000 GitHub stars, surpassing React's decade-long record in roughly 60 days ([The New Stack](https://thenewstack.io/openclaw-github-stars-security/)). The repository now has 1,000+ active contributors and 67,000+ forks. Steinberger joined OpenAI in February 2026 and transferred the project to an independent 501(c)(3) foundation. We're running it at ZeroShot Studio on both a VPS and a local mini PC, and it handles everything from daily briefings to git audits to email triage.

## How does the architecture actually work?

OpenClaw runs as a local-first gateway. One process handles sessions, channels, tools, and events. Think of it as a switchboard sitting between your messaging apps and your AI models.

```mermaid
flowchart TD
    subgraph Channels["Channels (23+)"]
        direction TB
        TG["Telegram"] --- WA["WhatsApp"] --- SL["Slack"] --- DC["Discord"] --- WEB["WebChat"]
    end

    subgraph Gateway["OpenClaw Gateway"]
        direction TB
        RT["Router"] --> AG["Agent Manager"]
        AG --> SK["Skills Engine"]
        AG --> CR["Cron Scheduler"]
        AG --> SB["Sandbox"]
    end

    subgraph Models["Model Providers (35+)"]
        direction TB
        CL["Claude"] --- GP["GPT / Codex"] --- OL["Ollama"] --- GM["Gemini"]
    end

    Channels --> RT
    AG --> Models
```

Save the file, the gateway picks up your changes. Send a message to your Telegram bot, the agent responds. That's the loop. Everything else builds on top of this.

We have 10 agents running on this setup at ZeroShot Studio -- daily briefings, git audits, email triage, and a few background cron tasks. The process running the gateway uses less memory than a browser tab.

## Why has OpenClaw generated so much interest?

The trajectory alone is striking: 9,000 stars on launch day, 60,000 within 72 hours, over 340,000 by mid-March -- faster than any open-source project in GitHub history ([star-history.com](https://www.star-history.com/blog/openclaw-surpasses-react-most-starred-software), [OpenClaw Statistics](https://www.getpanto.ai/blog/openclaw-ai-platform-statistics)). Three things drove that.

**Privacy and ownership.** Every major AI assistant runs on someone else's cloud. Your conversations, your documents, your business data, all processed on infrastructure you don't control. OpenClaw flips that. Self-host it, your data never leaves your machine. For businesses handling sensitive information, that alone is enough.

**Model freedom.** Most AI platforms lock you into one provider. OpenClaw gives you 35+ and lets you mix them. Run Claude for reasoning, GPT for code generation, and a local 3B model for background tasks. When a new model drops, add it to the config. No migration, no lock-in.

**Meeting users where they already are.** Instead of opening another AI app, OpenClaw connects to the tools you already have. Send a WhatsApp message, get an AI response. Ask in your team's Slack, your agent handles it. That "zero new apps" approach has real pull for non-technical users who don't want another dashboard.

The community has built 13,700+ skills on ClawHub, so most common use cases already have something ready to install.

## Frequently asked questions

**Is OpenClaw free to use?**

Yes. OpenClaw is MIT-licensed and free to self-host. You pay only for the AI model API calls you make (if using cloud providers like OpenAI or Anthropic). Running entirely on local models via Ollama costs nothing beyond your electricity bill.

**What hardware do I need?**

For a basic setup with cloud models: any machine that runs Node.js 22+. For local models via Ollama: 8 GB RAM minimum for 3B parameter models, 16 GB for 7B models. CPU-only inference works but expect slower responses (30-120 seconds per query depending on model size). A GPU significantly improves local model performance.

**Can I run OpenClaw 24/7 on a VPS?**

Yes. Docker Compose on a VPS is the recommended production setup. A EUR 5-10/month VPS with 2-4 GB RAM handles the gateway and cloud routing comfortably. Add Ollama for local models and you'll want 4 GB+. We run ours on a Hetzner VPS with 10 agents, mixed cloud and local models, and it hasn't complained yet.

**How does OpenClaw compare to using ChatGPT or Claude directly?**

ChatGPT and Claude are AI models. OpenClaw is the framework that wires those models to your messaging channels, tools, and automated workflows. Use it with GPT, Claude, Gemini, or local models. It adds multi-agent routing, cron scheduling, persistent memory, and channel integration that the native apps don't provide.

**Is my data private with OpenClaw?**

When self-hosted, your data stays on your hardware. Conversations, files, and agent memory live on your machine. The only data that leaves is what you send to cloud providers (API calls to OpenAI, Anthropic, etc.). To keep everything local, use Ollama or another self-hosted provider.

---

We've been running OpenClaw at ZeroShot Studio for about three months now -- first on a laptop, then moved to a Hetzner VPS when we wanted it always-on. The honest answer to "is it worth the setup?" is yes, but mainly because we put the time into writing proper SOUL.md and SKILL.md files. An under-configured agent is still just a chatbot. A well-configured one does actual work.

[Install it](https://github.com/openclaw/openclaw) on your laptop, connect Telegram, send your first message. When you want always-on, move to Docker on a VPS.

[OpenClaw GitHub](https://github.com/openclaw/openclaw) | [Official Docs](https://docs.openclaw.ai/) | [ClawHub Skills](https://openclaw.ai/)

---

## What Are AI Agents? The Complete Guide to Autonomous AI
URL: https://labs.zeroshot.studio/agents/what-are-ai-agents-the-complete-guide-to-autonomous-ai
Zone: agents
Tags: ai-agents, autonomous-ai, agent-architecture, claude-code, ai-workflows
Published: 2026-03-31

AI agents take action. Chatbots give answers. Scripts follow recipes. Here's what actually makes an agent an agent, when you should build one, and when a bash script is the smarter move.

> **KEY TAKEAWAY**
> * **The Problem:** "AI agent" gets slapped on everything from chatbots to cron jobs, making it impossible to know what you actually need to build.
> * **The Solution:** An agent is an LLM that chooses its own tools, plans its own steps, and takes action without you scripting every move.
> * **The Result:** Understanding the three components (LLM + tools + memory) lets you build agents that solve real problems, and know when a 10-line script is the better call.

### Contents

1. [What is an AI agent?](#what-is-an-ai-agent)
2. [How are agents different from chatbots and scripts?](#how-are-agents-different-from-chatbots-and-scripts)
3. [What types of agents are people actually building?](#what-types-of-agents-are-people-actually-building)
4. [How do you build one?](#how-do-you-build-one)
5. [What does an agent look like in production?](#what-does-an-agent-look-like-in-production)
6. [When should you NOT use an agent?](#when-should-you-not-use-an-agent)
7. [Frequently asked questions](#faq)

## What is an AI agent?

An AI agent is software that uses a language model to decide what to do next, then does it. Not "generates a response." Takes action. Calls APIs, reads files, searches the web, writes code, sends messages. The model is the brain, but the hands are what make it an agent.

Anthropic's technical definition puts it cleanly: agents are "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks" ([Building Effective Agents](https://www.anthropic.com/research/building-effective-agents)). The key word is *dynamically*. A chatbot responds. A script executes. An agent decides.

The market agrees this distinction matters. [Gartner projects](https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025) that 40% of enterprise applications will feature task-specific AI agents by 2026, up from less than 5% in 2025. That is not gradual adoption. That is a land rush.

Andrew Ng and LangChain's Harrison Chase both push a useful framing: agency is a spectrum. A simple router that picks between two API endpoints based on user input is mildly agentic. A system that decomposes a problem into subtasks, selects tools for each one, and recovers from failures is deeply agentic. Most useful agents sit somewhere in the middle.

## How are agents different from chatbots and scripts?

This is where the confusion lives, so here is a straight comparison.

| | Chatbot | Script / Automation | AI Agent |
|---|---|---|---|
| **Decision-making** | Single-turn response | Predefined rules, fixed paths | Dynamic, LLM-driven |
| **Tool use** | None (or scripted) | Rigid API calls | Chooses tools at runtime |
| **Handles ambiguity** | Generates text about it | Breaks | Reasons through it |
| **Memory** | Session context only | State machines | Working + persistent memory |
| **Adapts to failures** | Apologises | Throws an error | Retries with a different approach |

A chatbot is a text interface to a model. You ask, it answers. ChatGPT in its default mode is a chatbot. Siri is a chatbot with a few hardcoded integrations.

A script is a recipe. It does exactly what you told it, in exactly the order you specified. No judgment, no deviation. Reliable, fast, predictable. Most of the world's automation runs on scripts, and that is a good thing.

An agent sits between these. It has the reasoning of a chatbot and the action capability of a script, but the execution path is decided at runtime by the model itself. You give it a goal. It figures out the steps.

The practical test: if you can draw the entire workflow on a whiteboard before the task starts, you probably want a script. If the path depends on what the agent discovers along the way, you need an agent.

## What types of agents are people actually building?

Forget the academic taxonomy. Here is what is shipping in production right now.

**Research agents** break down complex questions, search multiple sources, synthesise findings, and produce structured reports. OpenAI's Deep Research and Google's Gemini Deep Research are the headline examples. Google DeepMind's Aletheia agent went further in late 2025, autonomously generating and verifying peer-reviewed research using a generate-verify-revise loop.

**Coding agents** read your codebase, plan changes across multiple files, write code, run tests, and fix what breaks. Claude Code scores 80.8% on [SWE-bench Verified](https://www.swebench.com/). Cursor and GitHub Copilot operate closer to the IDE, with Copilot reporting [55% faster coding](https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/) in Microsoft's internal studies. These are the agents most developers encounter first.

**Review agents** audit content, code, or data against defined standards. We built ours as a [3-agent content pipeline](/ai-workflows/ai-review-agents-content-pipeline) for this blog: a fact-checker that verifies claims and catches hallucinated statistics, a style reviewer that enforces voice consistency, and an SEO auditor that checks structure and discoverability. Each agent runs independently, and the pipeline costs roughly $0.50-0.80 per post.

**Workflow agents** orchestrate multi-step business processes. Insurance claim processing, customer onboarding, code deployment pipelines. One documented case study showed 7 coordinated agents reducing claim processing time by 80%.

**Computer-use agents** operate GUIs directly. Anthropic's computer use capability and OpenAI's Operator let agents click buttons, fill forms, and navigate interfaces the way a person would. Still early, but the trajectory is clear.

## How do you build one?

Strip away the framework marketing and you are looking at three components.

**1. The brain: a language model.**

The LLM handles reasoning and planning. It reads the current situation, decides what tool to call, interprets the result, and plans the next step. Different tasks need different models. We use local models (Qwen 2.5) for routine operational tasks and Claude for anything that needs real reasoning depth.

**2. The hands: tools and functions.**

Tools are what separate agents from chatbots. A tool is any function the agent can call: an API request, a database query, a file operation, a web search, a code execution environment. The model decides which tool to use and with what parameters.

The [Model Context Protocol](https://modelcontextprotocol.io/) (MCP) is standardising how agents connect to tools, which means you can build tool integrations once and use them across different agent frameworks. We wrote about this in practice when [Claude published directly to Labs via MCP](/openclaw/how-claude-published-directly-to-labs-via-mcp).

**3. The memory: context that persists.**

Agents need to remember what happened earlier in a task (working memory), what happened in previous sessions (long-term memory), and what they know about the world (knowledge retrieval). Without memory, every step starts from zero.

In practice, working memory lives in the LLM's context window. Long-term memory uses vector databases or structured storage. We run ours on PostgreSQL with pgvector, storing semantic embeddings so the agent can retrieve relevant past decisions without replaying entire conversation histories.
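What that retrieval step does is easy to show without a database. In our setup the nearest-neighbour query runs inside PostgreSQL via pgvector; the pure-Python version below, with toy two-dimensional embeddings, is an illustrative stand-in for the same idea:

```python
# Store (embedding, text) pairs and recall the nearest past decisions.
# Toy 2-D embeddings; a real setup embeds text with a model and lets
# pgvector do this ranking inside the database.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

MEMORY = [
    ([1.0, 0.0], "chose script over agent for digest task"),
    ([0.0, 1.0], "escalated ambiguous refund to human"),
]

def recall(query_vec, k=1):
    # Rank stored memories by similarity to the query embedding.
    ranked = sorted(MEMORY, key=lambda m: cosine(query_vec, m[0]), reverse=True)
    return [text for _, text in ranked[:k]]
```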

**The loop.** These three components operate in a cycle: observe the current state, reason about what to do, act using a tool, observe the result, repeat. That loop is the agent. Everything else is orchestration.
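That cycle fits in a few dozen lines. A minimal sketch, with a rules-based stub standing in for the LLM: in a real agent, `stub_model` would be a function-calling API request that returns a parsed tool call (an assumption here, not any specific SDK):

```python
# Observe-reason-act loop. The model stub decides the next tool call;
# the loop executes it and feeds the result back.

def search_web(query: str) -> str:
    """Stand-in tool: a real one would hit a search API."""
    return f"results for {query}"

def read_file(path: str) -> str:
    """Stand-in tool: a real one would read from disk."""
    return f"contents of {path}"

TOOLS = {"search_web": search_web, "read_file": read_file}

def stub_model(goal, history):
    # Reason: pick the next tool call, or declare the task finished.
    if not history:
        return {"tool": "search_web", "args": {"query": goal}}
    return {"tool": None, "answer": f"done: {history[-1]}"}

def run_agent(goal, max_steps=5):
    history = []
    for _ in range(max_steps):
        decision = stub_model(goal, history)                  # reason
        if decision["tool"] is None:
            return decision["answer"]                         # model says done
        result = TOOLS[decision["tool"]](**decision["args"])  # act
        history.append(result)                                # observe
    return "step budget exhausted"
```

The `max_steps` cap matters: without it, a confused model can loop forever, and every extra iteration costs tokens.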

**Frameworks handle the plumbing.** We did not build our loop from scratch, and you probably should not either:

| Framework | Best for | Architecture |
|---|---|---|
| **LangGraph** | Controllable workflows with branching and error recovery | Graph state machine |
| **CrewAI** | Role-based multi-agent collaboration | Two-layer (crews + flows) |
| **OpenAI Agents SDK** | Lightweight orchestration with handoffs | Minimal primitives |
| **Claude Agent SDK** | Claude-native development with MCP integration | Tool-augmented |
| **AutoGen** | Flexible multi-agent conversation | Conversation-based |

## What does an agent look like in production?

Theory is cheap. Here is what we actually run.

Every post on this blog goes through a 3-agent review pipeline before it publishes. We built it because we shipped a post with a hallucinated statistic and nobody caught it for three days. Embarrassing. That was the last time we trusted vibes-based publishing.

The pipeline works like this:

1. **Fact-checker agent (Sonnet).** Extracts every verifiable claim from the draft. Searches the web for corroborating or contradicting evidence. Returns verdicts: verified, likely accurate, unverified, suspect. Suspect claims block the publish.

2. **Style reviewer agent (Opus).** Checks the draft against a 660-line style guide. Catches voice drift, banned phrases, AI-written tells, rhythm problems. Makes surgical edits using exact string replacements, never rewrites whole paragraphs.

3. **SEO auditor agent (Sonnet).** Validates structure for both traditional search and AI discovery: heading hierarchy, paragraph density, FAQ formatting, citation density, internal linking.

Each agent runs independently. The orchestrator is a Python script that manages stage transitions, not another LLM. We wrote about the full architecture in [How to Build AI Review Agents for Your Content Pipeline](/ai-workflows/ai-review-agents-content-pipeline).
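A hedged sketch of that split: the controller is ordinary Python, and the stages are callables it invokes in a fixed order. The stand-in stage functions below are trivial; in the real pipeline each one wraps an LLM call:

```python
# Script-level orchestration around agent stages. The controller is
# deterministic; only the work inside each stage would involve a model.

def fact_check(draft: str) -> dict:
    # Stand-in: flag the draft if it contains a "suspect" claim.
    return {"stage": "fact-check", "passed": "suspect" not in draft}

def style_review(draft: str) -> dict:
    return {"stage": "style", "passed": True}

def seo_audit(draft: str) -> dict:
    return {"stage": "seo", "passed": True}

STAGES = [fact_check, style_review, seo_audit]

def review_pipeline(draft: str) -> dict:
    results = []
    for stage in STAGES:            # fixed stage transitions
        report = stage(draft)
        results.append(report)
        if not report["passed"]:    # any failing stage blocks the publish
            return {"publish": False, "results": results}
    return {"publish": True, "results": results}
```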

The cost per post runs $0.50-0.80 across all three agents (our own production numbers). Doing it by hand would take 2-3 hours per post.

> **The reality:** You do not need to choose between agent flexibility and script reliability. Use agents for the parts that need judgment. Use scripts for the parts that need predictability. Wire them together.

The enforcement layer is where it gets interesting. [Claude Code hooks](/ai-workflows/claude-code-hooks-replace-half-your-claude-md) provide deterministic gates that run before and after every tool call. The agent can reason freely, but the hooks enforce hard rules: no publishing without all three reviews passing, no destructive database operations without explicit confirmation. Agent flexibility with script-level guardrails.

## When should you NOT use an agent?

Anthropic's own guidance says it plainly: "Add multi-step agentic systems only when simpler solutions fall short" ([Building Effective Agents](https://www.anthropic.com/research/building-effective-agents)).

This is a position we hold strongly at ZeroLabs. If the workflow fits on a whiteboard, write a script. Scripts are faster, cheaper, and easier to debug. An agent that does the same thing every time is just an expensive script with more failure modes.

Agents can cost up to [10x more than equivalent traditional API workflows](https://intuitionlabs.ai/articles/ai-agent-vs-ai-workflow). Every LLM call adds latency, token costs, and a non-zero chance of the model doing something unexpected. That cost is worth paying when the task genuinely requires judgment, adaptation, and dynamic tool selection. It is not worth paying for a cron job that runs the same three API calls every morning.

**Use a script when:**
- The inputs are structured and predictable
- The steps are the same every time
- Speed and cost matter more than flexibility
- You can draw the whole workflow on a whiteboard

**Use an agent when:**
- The inputs are messy, unstructured, or context-dependent
- The number of steps cannot be predicted in advance
- The task requires judgment calls and exception handling
- Recovery from unexpected states is part of the job

The best systems combine both. Our content pipeline uses agent intelligence for the review passes but script-level orchestration for stage transitions. The Python controller decides which agent runs next. The agents decide what to do within their scope. Predictable skeleton, flexible muscles.

## Frequently asked questions

**What is the difference between an AI agent and a chatbot?**

A chatbot generates text. An agent takes action. The distinction is tool use and autonomy: agents call APIs, read and write files, execute code, and make sequential decisions about what to do next based on what they observe. A chatbot in its default mode waits for your prompt and responds. An agent receives a goal and works toward it.

**When should I use an agent instead of a script?**

Apply the whiteboard test. If you can map every step and decision branch before the task starts, a script is simpler, cheaper, and more reliable. Agents earn their keep when the execution path depends on what they discover along the way, when inputs vary unpredictably, or when the task requires interpreting ambiguous information.

**What do I need to build an AI agent?**

Less than you think. A working agent requires three things: an LLM with a function-calling API (Claude, GPT-4o, Gemini, or an open-source model like Qwen 2.5), at least one tool definition (even a single web search or file read counts), and a loop that feeds tool results back to the model. The scaffolding for that loop is roughly 50-80 lines of Python without a framework. With LangGraph or the Claude Agent SDK, it is closer to 20. Most developers ship their first working agent within a day. The harder part is scoping it: agents that try to do too much are harder to debug than agents with a single, well-defined job.

**How much do AI agents cost to run?**

Highly variable. A single-agent loop handling one task might cost $0.05-0.20 per execution. Multi-agent pipelines with several LLM calls per run, like our 3-agent content review, cost $0.50-0.80. Enterprise deployments with heavy tool use can reach $1-5 per complex task. The 79% of companies now using AI agents ([PwC, 2025](https://www.pwc.com/)) are finding that the ROI depends on matching agent complexity to task complexity.

**Are AI agents safe to use in production?**

With guardrails, yes. Production agents need deterministic enforcement (hooks that run regardless of what the model decides), scope boundaries (agents only access what they need), and human-in-the-loop checkpoints for irreversible actions. We use [Claude Code hooks](/ai-workflows/claude-code-hooks-replace-half-your-claude-md) for this on our VPS. The risk scales with the autonomy you grant, so start narrow.

## Where agents go from here

The trajectory is steep. Agents are moving from single-purpose tools toward persistent collaborators that maintain context across days and weeks. [Anthropic's Claude Code Channels](/agents/anthropic-claude-code-channels-telegram-discord) already let agents live inside Telegram and Discord, turning messaging apps into agent deployment surfaces.

The practical advice: start with a single, well-scoped agent that solves one real problem in your workflow. The fact-checker from our content pipeline took a day to build and has already caught three hallucinated statistics that would have gone live. That is the ROI that matters.

Build the smallest useful agent. Run it. Watch where it fails and fix what breaks.

---

Want to see agents in action? Read how we built our [3-agent content review pipeline](/ai-workflows/ai-review-agents-content-pipeline) or learn about [Claude Code hooks for agent enforcement](/ai-workflows/claude-code-hooks-replace-half-your-claude-md).

---

## You Don't Need an AI Agent
URL: https://labs.zeroshot.studio/agents/you-dont-need-an-ai-agent
Zone: agents
Tags: ai-agents, automation, langchain, crewai, python, business
Published: 2026-03-27

Most people don't need an AI agent. If the workflow fits on one whiteboard and the decisions are predictable, a script will usually do the job better.

> **KEY TAKEAWAY**
> * **The Problem:** Businesses spend weeks building AI agents with complex frameworks when a simple script would solve their actual problem in days.
> * **The Solution:** Map the real workflow before choosing tools. Most "agent" use cases are decision trees, API calls, or scoring functions that need zero orchestration.
> * **The Result:** Simple automations ship in days instead of weeks, cost a fraction to run, and break less often. One script replacing 2 hours of daily manual work pays for itself in a week.

*Last updated: 2026-03-27 · Tested against LangChain v0.3, CrewAI v0.80, and raw OpenAI API workflows*

### Contents

1. [Why are so many AI agent projects failing?](#why-are-so-many-ai-agent-projects-failing)
2. [What's the difference between an AI agent and an automation?](#whats-the-difference-between-an-ai-agent-and-an-automation)
3. [What do businesses actually pay for?](#what-do-businesses-actually-pay-for)
4. [When do you actually need an AI agent?](#when-do-you-actually-need-an-ai-agent)
5. [How do you decide what to build?](#how-do-you-decide-what-to-build)
6. [Why does everyone overbuild?](#why-does-everyone-overbuild)
7. [Frequently asked questions](#frequently-asked-questions)

## Why are so many AI agent projects failing?

Most businesses trying to build AI agents don't have an agent-shaped problem. That's what nobody in the hype cycle wants to admit.

The AI agent market was valued at $7.63 billion in 2025 and is projected to reach $10.9 billion in 2026, growing at a 49.6% CAGR ([Grand View Research](https://www.grandviewresearch.com/industry-analysis/ai-agents-market-report)). LangChain has over 131,000 GitHub stars. CrewAI raised $18 million in 2024 ([SiliconANGLE](https://siliconangle.com/2024/10/22/agentic-ai-startup-crewai-closes-18m-funding-round/)). Conference talks, YouTube tutorials, Discord servers, the entire industry is screaming "agents."

This pattern plays out everywhere. Someone spends a month deep in CrewAI documentation, joins three communities, watches every tutorial. They want an "AI agent" for lead qualification. What they actually need is a script that checks a few fields against their ICP criteria and sends one of two responses. A few days of work. Saves hours daily. They call it their AI agent. Nobody corrects them.

## What's the difference between an AI agent and an automation?

An AI agent reasons about its next step. It holds context across interactions, uses tools dynamically, and makes decisions the developer didn't explicitly code for. Think: a system that reads a customer complaint, decides whether to escalate or resolve, pulls relevant order data, drafts a response, and knows when to hand off to a human.

An automation follows a predetermined path. Input goes in, rules get applied, output comes out. No reasoning. No memory. No surprises. If you've ever set up a [README-driven workflow](/ai-workflows/why-every-vibe-coder-needs-a-readme), you already know how far deterministic instructions can take you.

The confusion happens because marketing has blurred this line completely. LangChain's State of AI Agents report found that quality was the top barrier to agent deployment, cited by 32% of respondents ([LangChain](https://www.langchain.com/stateofaiagents)). When even the developers building these systems struggle to make them reliable, what chance does a small business owner have of getting it right on the first try?

| Feature | Simple Automation | AI Agent |
|---------|------------------|----------|
| Decision logic | Rules-based, predetermined | Reasoning, dynamic |
| Context | Stateless or minimal | Maintains conversation/task memory |
| Failure mode | Predictable, traceable | Hallucination, drift, unpredictable |
| API cost | One call or zero | Multiple LLM calls per task |
| Build time | Hours to days | Weeks to months |
| Maintenance | Low, stable | High, model updates break things |

> **The hard rule:** If you can draw your workflow as a flowchart with fixed branches, you need an automation. If the branches depend on judgment calls that change with every input, you might need an agent.

## What do businesses actually pay for?

These are common patterns across consulting, forums, and dev communities. The request sounds sophisticated. The solution rarely is.

1. **"We need an AI content agent."** Usually means: one API call with a good prompt and some formatting logic. A short script on a cron job. Pennies per month in API fees.

2. **"We need an AI support agent."** Usually means: a decision tree covering the same handful of questions that come in every day. Pattern matching and templated responses. No LLM required.

3. **"We need an AI recruiting agent."** Usually means: a scraper with a scoring function. Pull candidate profiles, score against a few criteria, rank. Zero reasoning involved.

4. **"We need an AI analytics agent."** Usually means: a scheduled database query that formats results into a Slack or email digest. Same metrics, same cadence, every week.

5. **"We need an AI email agent."** Usually means: a filter rule with one API call for classification. Flag, categorise, route. Done.

Every one of these ships in under a week. They run for months without maintenance. The pattern holds across industries: developer tooling, e-commerce, professional services, it doesn't matter. Including in our own [Claude Code hooks workflows](/ai-workflows/claude-code-hooks-replace-half-your-claude-md), where the right tool is often a two-line shell command, not an orchestrated pipeline. Strip any problem down to its actual mechanics, and the solution gets simpler every time.
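The email example above, stripped to its actual mechanics, is a routing table and a loop. The categories and keywords here are illustrative:

```python
# A filter rule, not an agent: first matching keyword decides the route.

ROUTES = {
    "invoice": "finance",
    "refund": "support",
    "unsubscribe": "marketing",
}

def route_email(subject: str) -> str:
    lower = subject.lower()
    for keyword, team in ROUTES.items():  # first matching rule wins
        if keyword in lower:
            return team
    return "inbox"                        # default: leave it for a human
```

If classification genuinely needs judgment, swap the keyword loop for a single LLM call and keep everything else identical.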

## When do you actually need an AI agent?

Agents earn their complexity when three conditions are true at once:

1. **The task requires multi-step reasoning.** Not "check three fields," but "read this document, understand the context, decide what information is missing, go find it, and synthesise a recommendation."

2. **The inputs are genuinely unpredictable.** Not five categories of customer email, but free-form requests where the next step depends entirely on what the user said.

3. **The workflow can't be reduced to a flowchart.** If you can draw it as a decision tree with fixed branches, you don't need an agent. Full stop.

Gartner projects that by end of 2026, 40% of enterprise applications will include task-specific AI agents ([Gartner](https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025)). That's real. But "task-specific" is doing heavy lifting in that sentence. These aren't general-purpose autonomous systems. They're narrow tools handling defined complex tasks within larger, mostly deterministic workflows.

## How do you decide what to build?

Before writing a single line of code, answer these five questions:

1. **Can I describe every possible input?** If yes, you need rules, not reasoning.
2. **Does the output change based on judgment?** If no, it's a transformation, not a decision.
3. **How many steps are involved?** Under five deterministic steps? That's a script.
4. **What breaks if the LLM hallucinates?** If the answer is "everything," don't use an LLM in the critical path. This is also a core principle in [security auditing for vibe coders](/ai-workflows/how-to-run-a-security-audit-on-your-vibe-coded-app): the blast radius of a bad AI output should always be bounded.
5. **What's the actual dollar cost of the manual process?** If someone spends 30 minutes a day on it, that's roughly 10 hours a month. At a $50/hour loaded cost, a $500 automation pays for itself in a month. You don't need a $50,000 agent platform.

The best infrastructure is the kind you understand completely. A simple automation that costs $2,000 and saves $500/month has a better return than a $50,000 agent build that saves $3,000/month. The math isn't complicated. The ego is.
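The comparison above as arithmetic, if you want to sanity-check your own numbers (a sketch; the inputs are the two example builds from this section):

```python
# Months until an automation or agent build has paid for itself.

def payback_months(build_cost: float, monthly_saving: float) -> float:
    return build_cost / monthly_saving

print(payback_months(2_000, 500))     # simple automation: 4 months
print(payback_months(50_000, 3_000))  # agent build: ~17 months
```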

Google Cloud's ROI of AI study found 74% of executives achieved returns within the first year of broad AI deployment ([Google Cloud](https://cloud.google.com/transform/roi-of-ai-how-agents-help-business)), but those returns skew toward the simplest implementations.

## Why does everyone overbuild?

Three forces push projects toward unnecessary complexity. Course creators sell $497 agent-building courses. Tool companies charge $99/month for orchestration platforms. Nobody profits from telling you a bash script solves your problem.

Then there's resume-driven development. "I built a multi-agent system with RAG and vector search" sounds better on LinkedIn than "I wrote a 50-line Python script." We optimise for impressiveness over effectiveness. I've done it myself, once spending a week on a multi-model pipeline that a single prompt template could have handled. More than once, if I'm being honest.

According to LangChain's State of AI Agents report, 57.3% of respondents now have agents in production ([LangChain](https://www.langchain.com/stateofaiagents)). But production doesn't mean optimal. Plenty of those systems are doing work a cron job could handle, just with more latency and more ways to break. If you want a framework for cutting through AI hype in your own content or tooling decisions, the [GEO and EEAT guide](/resources/geo-e-e-a-t-get-your-content-cited-by-ai) covers how to evaluate substance over marketing noise.

## Frequently asked questions

**What signals tell me I've outgrown a simple automation?**

Three red flags: your exception-handling code is longer than your happy path, users keep hitting edge cases you can't enumerate in advance, and the "rules" change depending on context the system doesn't have. If you're patching the same automation weekly to handle new input shapes, that's your signal. Most workflows never get there.

**Are AI agent frameworks like LangChain and CrewAI worth learning?**

Yes, for the right problems. LangChain excels at tool orchestration, memory, and multi-step reasoning chains. CrewAI shines for multi-agent collaboration on complex tasks. The mistake is reaching for these frameworks before confirming your problem needs them. Build three things with raw API calls first. You'll know when a framework helps and when it's just in the way.

**What's the actual cost difference between running an AI agent and a simple automation?**

These are rough estimates based on current API pricing and will vary by model and prompt size. A simple automation using one API call per task typically runs $0.01-0.05 per execution. An agent making 5-10 LLM calls per task with framework overhead often runs $0.50-2.00 per execution. At 1,000 tasks per month, that's the difference between roughly $50 and $2,000. Over a year, the gap compounds to tens of thousands of dollars across multiple workflows. Multiply that across every "agent" a company runs. It adds up fast.

**Can I start with a simple automation and upgrade to an agent later?**

This is the smartest approach. Build the deterministic version first. Track the exceptions it can't handle. If those outliers require genuine reasoning and represent a significant share of your volume, you have a data-driven case for an agent. The exceptions almost always turn out to be rarer than people assumed when they first started wanting an "agent."

AI agents are genuinely useful for the right problems. But the vast majority of business problems don't need intelligence. They need the boring task to go away. That's what people pay for. Nobody has ever complained that a solution wasn't complex enough.

Next time someone tells you they need an AI agent, ask them to draw the workflow on a whiteboard. Count the decision points. If it fits on one whiteboard with arrows that don't loop back on themselves, hand them a script. Save the agents for problems that actually need them.

---

Building automations or AI tools for your business? Subscribe to the ZeroLabs newsletter for practical guides that skip the hype.

[Subscribe to ZeroLabs](https://labs.zeroshot.studio) | [Browse AI Workflows](https://labs.zeroshot.studio/ai-workflows)

---

## Claude Code Hooks Replace Half Your CLAUDE.md
URL: https://labs.zeroshot.studio/ai-workflows/claude-code-hooks-replace-half-your-claude-md
Zone: ai-workflows
Tags: claude code, hooks, CLAUDE.md, AI coding, developer tools, automation
Published: 2026-03-26

CLAUDE.md rules are suggestions. Hooks are guarantees. Here's how to move your enforcement logic out of prompts and into deterministic shell scripts that actually run every time.

> **KEY TAKEAWAY**
> * **The Problem:** CLAUDE.md rules break because language models are probability engines, and Claude eventually deprioritizes them under compression or competing goals.
> * **The Solution:** Claude Code hooks are deterministic shell scripts that execute outside the LLM's context window at fixed lifecycle points, enforcing rules through exit codes that the harness acts on before Claude ever sees the result.
> * **The Result:** Security blocks, validation gates, and formatting enforcement now always execute, cutting a 400-line CLAUDE.md down to 100 lines of pure context without losing enforcement.

*Last updated: 2026-03-27 · Tested against Claude Code v0.2.29*

## Why do CLAUDE.md rules break?

Your CLAUDE.md gets injected into Claude's context window. Claude reads it, considers it, and *usually* follows it. The operative word is "usually."

Write "never use rm -rf" in CLAUDE.md and Claude will respect that instruction most of the time. But occasionally it's deep in a multi-step task, context is compressed, and your rule gets deprioritised against the immediate goal. That's not a bug. That's how language models work: they're probability engines, not rule executors.

In our ZeroShot Studio workflows, we used to hit this wall repeatedly. CLAUDE.md said "always run tests before committing." Claude followed it for the first three tasks, then skipped it on the fourth because it was "a minor change." The tests would have caught a broken import.

Anthropic's [hooks documentation](https://docs.anthropic.com/en/docs/claude-code/hooks) puts it plainly: hooks "provide deterministic control over Claude Code's behavior, ensuring certain actions always happen rather than relying on the LLM to choose to run them."

![Terminal output showing ZeroVPS Claude hooks files and the corresponding `.claude/settings.json` hook registrations for PreToolUse and PostToolUse.](/api/images/claude-hooks-proof.png)

## What are Claude Code hooks?

Hooks are shell commands that execute at fixed points in the session. They're run not by Claude the model, but by the harness, i.e. the runtime wrapper around it.

That distinction matters. CLAUDE.md lives inside the LLM's context. Hooks live outside it. The harness runs your hook script, reads the exit code, and enforces the result before Claude ever sees what happened.

**What's a hook?** A shell script that fires on an event, reads JSON from stdin, and returns a decision: proceed, block, or modify.

Here's the minimal anatomy:

1. **Event fires.** Claude is about to run a Bash command (PreToolUse event).
2. **Harness pipes JSON to your script.** Tool name, input arguments, session ID.
3. **Your script decides.** Exit 0 to allow, exit 2 to block.
4. **Harness enforces.** Claude never gets to override the decision.

```json
// File: .claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/validate-bash.sh"
          }
        ]
      }
    ]
  }
}
```

That goes in `.claude/settings.json`. Every Bash command Claude tries to run now passes through your script first.
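The settings file references `.claude/hooks/validate-bash.sh` without showing it. Before adding real rules, a do-nothing starter hook is a useful sketch: it logs the raw payload and allows everything, which proves the wiring works. (The log location here is illustrative.)

```shell
#!/bin/bash
# Create a minimal starter hook: log the raw PreToolUse payload, allow everything.
mkdir -p .claude/hooks
cat > .claude/hooks/validate-bash.sh <<'HOOK'
#!/bin/bash
INPUT=$(cat)              # the harness pipes the event JSON to stdin
mkdir -p .claude
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $INPUT" >> .claude/hook.log
exit 0                    # 0 allows; 2 would block
HOOK
chmod +x .claude/hooks/validate-bash.sh

# Smoke test: simulate the harness by piping a fake payload through the hook.
echo '{"tool_name":"Bash","tool_input":{"command":"ls"}}' | .claude/hooks/validate-bash.sh
echo "exit: $?"
```

Once Claude starts running Bash commands, `tail .claude/hook.log` shows exactly what the harness sends, which makes writing the real validation logic much easier.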

## How do hooks compare to CLAUDE.md rules?

| Aspect | CLAUDE.md | Hooks |
|--------|-----------|-------|
| Execution | LLM reads and considers | Harness runs deterministically |
| Can block actions | No, Claude decides | Yes, exit code 2 blocks |
| Survives context compression | Best-effort | Always fires |
| Can modify tool input | No | Yes, via `updatedInput` |
| Can run external scripts | No | Yes, any shell command |
| Can call APIs | No | Yes, HTTP hook type |
| Scope | Guidelines and context | Enforcement and validation |

The comparison isn't "hooks are better." It's "hooks and CLAUDE.md solve different problems." CLAUDE.md tells Claude what you prefer. Hooks enforce what you require.

## Which rules should be set up as hooks?

Not everything belongs in a hook. Here's the split we use:

**Keep these in CLAUDE.md:**
- Coding conventions (naming, patterns, architecture preferences)
- Project context (what the app does, who it's for)
- Communication style ("be concise," "don't add docstrings I didn't ask for")
- Workflow preferences ("prefer editing over creating new files")

**Move these to hooks:**
- Security blocks (prevent destructive commands, protect sensitive files)
- Formatting enforcement (auto-run prettier after edits)
- Validation gates (lint checks, test runs before commits)
- Environment injection (load project-specific variables)
- Audit logging (track every tool call to a log file)
- Permission automation (auto-approve safe operations)

The rule I use: if failure means "Claude did something slightly different than I wanted," it's a CLAUDE.md rule. If failure means "data got deleted" or "secrets got committed," it's a hook.

> **The hard rule:** Conventions and context belong in CLAUDE.md. Security blocks, formatting enforcement, and validation gates belong in hooks. Mixing the two is what causes both to underperform.

If you want the full implementation walkthrough, [how to set up Claude Code hooks for your workflow](/posts/claude-code-hooks-workflow) covers the step-by-step from scratch.

## How do you write your first hook?

Start with the most common case: blocking dangerous Bash commands.

1. **Create the script.** Save this as `.claude/hooks/protect-prod.sh`:

```bash
#!/bin/bash
# File: .claude/hooks/protect-prod.sh
INPUT=$(cat)
COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty')

if echo "$COMMAND" | grep -qiE "drop table|drop database|truncate|rm -rf /"; then
  echo "Blocked: Destructive command detected. Use a migration instead." >&2
  exit 2
fi

exit 0
```

2. **Make it executable.** `chmod +x .claude/hooks/protect-prod.sh`

3. **Register it in settings.json:**

```json
// File: .claude/settings.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/protect-prod.sh"
          }
        ]
      }
    ]
  }
}
```

4. **Test it.** Ask Claude to run `DROP TABLE users` and watch it get blocked. Every time.

Exit code 2 blocks execution and sends your stderr message back as feedback. Exit code 0 lets it through.
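You can also smoke-test the blocking logic offline, without Claude in the loop. This sketch inlines the same regex as a function and greps the raw payload directly (a simplification; the real hook extracts the command field with jq first):

```shell
#!/bin/bash
# Offline check of the destructive-command regex used in protect-prod.sh.
# Simplified: greps the raw JSON payload rather than extracting with jq.
check() {
  if echo "$1" | grep -qiE "drop table|drop database|truncate|rm -rf /"; then
    return 2   # the real hook exits 2 here, which blocks
  fi
  return 0     # allowed
}

check '{"tool_input":{"command":"DROP TABLE users"}}'; echo "destructive -> $?"
check '{"tool_input":{"command":"git status"}}';       echo "safe -> $?"
```

Running it prints `destructive -> 2` and `safe -> 0`, which is the allow/block split you want before trusting the hook with a real session.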

## What hook events are available?

According to [Anthropic's hooks reference](https://docs.anthropic.com/en/docs/claude-code/hooks), there are 25 lifecycle points. These are the ones you'll actually use:

| Event | When | Can block? | Use for |
|-------|------|------------|---------|
| PreToolUse | Before any tool runs | Yes | Block commands, validate inputs |
| PostToolUse | After a tool succeeds | No | Auto-format, audit logging |
| SessionStart | Session begins | No | Inject environment, load context |
| Stop | Claude finishes responding | Yes | Verify work is complete |
| UserPromptSubmit | User sends a message | Yes | Prompt validation |
| FileChanged | Watched file modified | No | React to config changes |
| PermissionRequest | Permission dialog appears | Yes | Auto-approve safe operations |

Matchers use regex. `"Bash"` matches only Bash. `"Edit|Write"` matches both. `"mcp__.*"` matches all MCP tools. Case-sensitive, so `"bash"` won't match `"Bash"`.

## What patterns actually replace CLAUDE.md rules?

These are running in production right now.

**Pattern 1: Auto-format on save.** Instead of "please run prettier after editing files" in CLAUDE.md:

```json
// File: .claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs npx prettier --write 2>/dev/null; exit 0"
          }
        ]
      }
    ]
  }
}
```

**Pattern 2: Protected files.** Instead of "don't edit .env or package-lock.json" in CLAUDE.md:

```bash
#!/bin/bash
# File: .claude/hooks/protect-files.sh
FILE=$(cat | jq -r '.tool_input.file_path // empty')
case "$FILE" in
  *.env|*package-lock.json|*CODEOWNERS)
    echo "Blocked: $FILE is protected." >&2
    exit 2 ;;
esac
exit 0
```
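Shell `case` patterns are easy to get subtly wrong, so it's worth sanity-checking them against sample paths before registering the hook. A standalone sketch of the same match logic:

```shell
#!/bin/bash
# Sanity-check the protected-file patterns against sample paths.
is_protected() {
  case "$1" in
    *.env|*package-lock.json|*CODEOWNERS) return 2 ;;
  esac
  return 0
}

for f in apps/web/.env package-lock.json src/index.ts .github/CODEOWNERS; do
  if is_protected "$f"; then
    echo "$f: allowed"
  else
    echo "$f: blocked"
  fi
done
```

With these patterns, any `.env`, `package-lock.json`, or `CODEOWNERS` anywhere in the tree comes back blocked; everything else is allowed.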

**Pattern 3: Context injection after compaction.** CLAUDE.md content can get compressed away. This survives:

```json
// File: .claude/settings.json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "compact",
        "hooks": [
          {
            "type": "command",
            "command": "echo '{\"hookSpecificOutput\":{\"hookEventName\":\"SessionStart\",\"additionalContext\":\"Current sprint: auth-refactor. Use bun, not npm.\"}}'"
          }
        ]
      }
    ]
  }
}
```

**Pattern 4: Audit trail.** Log every file edit without relying on Claude to remember:

```bash
#!/bin/bash
# File: .claude/hooks/audit-trail.sh
INPUT=$(cat)
FILE=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
TOOL=$(echo "$INPUT" | jq -r '.tool_name')
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $TOOL $FILE" >> .claude/audit.log
exit 0
```
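Like the other patterns, the audit script needs registering in settings. Assuming you want a record of every file edit, a matcher on `Edit|Write` does it (widen the matcher if you also want Bash calls logged):

```json
// File: .claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/audit-trail.sh"
          }
        ]
      }
    ]
  }
}
```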

**Pattern 5: Stop gate.** Verify tests pass before Claude considers itself done:

```json
// File: .claude/settings.json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "agent",
            "prompt": "Run the test suite and verify all tests pass before stopping.",
            "model": "claude-haiku",
            "timeout": 120
          }
        ]
      }
    ]
  }
}
```

## Common errors and gotchas (troubleshooting)

A few things bit us early on. Ask me how I know...

**Shell profile pollution.** If your `.zshrc` has unconditional `echo` statements, they pollute stdout and JSON parsing breaks silently. Wrap any echo in an interactive-shell guard: `if [[ $- == *i* ]]; then echo "..."; fi`.

**Exit code 2 is the only block.** Exit 1 doesn't block, it just logs to verbose output. Only exit 2 actually stops the action.

**PostToolUse can't undo.** The tool already ran. The file is already edited, the command already executed. For blocking, use PreToolUse.

**Stop hook loops.** If your Stop hook tells Claude to keep going, check `stop_hook_active` in the input JSON. When it's `true`, you're in a re-check. Exit immediately or you'll loop forever. We learned this one the hard way at 2am.

**Matcher case sensitivity.** `"bash"` does not match `"Bash"`. Use exact tool names: `Bash`, `Edit`, `Write`, `Read`, `Glob`, `Grep`.

## Frequently asked questions

**Can hooks and CLAUDE.md work together?**

They should. CLAUDE.md gives Claude the reasoning to make good decisions. Hooks catch the cases where good decisions aren't enough. We keep conventions and project context in CLAUDE.md. Security, formatting, and validation live in hooks.

**Do hooks work with MCP tools?**

Yes. MCP tools follow the naming pattern `mcp__<server>__<tool>`. Match them with regex: `"mcp__github__.*"` catches all GitHub MCP calls. `"mcp__.*"` catches everything.

**Where do hooks go for team projects?**

`.claude/settings.json` in the project root gets committed to the repo and shared with the team. Personal hooks go in `~/.claude/settings.json` (machine-wide) or `.claude/settings.local.json` (project-specific, gitignored).

**Can a hook modify Claude's input?**

Yes. PreToolUse hooks can return `updatedInput` in their JSON output. The harness swaps Claude's original input with your modified version before execution.
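For reference, the output looks roughly like this, sketched from the field names the hooks reference describes (`hookSpecificOutput`, `permissionDecision`, `updatedInput`); check the current schema before relying on it:

```json
{
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow",
    "updatedInput": {
      "command": "bun install"
    }
  }
}
```

A hook printing this lets the tool call proceed, but with the input swapped, e.g. rewriting an `npm install` into `bun install` before it runs.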

## The split that actually works

Stop trying to make CLAUDE.md your enforcement. It's a context document, not a contract.

Move your "must always happen" rules into hooks. Keep your "here's how we work" guidelines in CLAUDE.md. The result is a system where Claude has good judgement from context and hard guardrails from code.

We cut our CLAUDE.md from 400 lines to 100 after migrating enforcement into hooks. Claude followed the remaining guidelines more reliably because there was less noise. Hooks handled the non-negotiables without burning context tokens.

That's the real win. Not just reliability, but focus. CLAUDE.md gets to be what it should be: a concise project brief, not a massive token-burning rule book that the model gradually forgets.

---

Go build your first hook. The [official hooks documentation](https://docs.anthropic.com/en/docs/claude-code/hooks) has the complete event reference and JSON schemas.

[Read more Claude Code guides](/zone/ai-workflows) | [Subscribe to the newsletter](#)

---

## Your AI Action Plan: 5 Challenges, 20 Minutes
URL: https://labs.zeroshot.studio/resources/ai-for-business-action-plan
Zone: resources
Tags: ai-for-business, action-plan, exercises, challenges, getting-started
Published: 2026-03-24

You've got the theory. Now put it to work. Five real business challenges, each with a guided AI exercise. Twenty minutes total.

> **KEY TAKEAWAY**
> * **The Problem:** Knowing about AI is worthless without doing.
> * **The Solution:** These five timed challenges take 20 minutes total and produce real outputs: a competitor brief, a branded email, a workflow map, a security quick-start, and a 30-day action plan.
> * **The Result:** You'll have five concrete outputs ready to use today, not "sometime this week."

*Last updated: 2026-03-27 · Tested against ChatGPT, Claude, and Gemini (March 2026)*

Set a timer. This post is five timed challenges that give you real, usable outputs before you close the tab. Each one takes four minutes. Each one produces something you can use today.

When I ran these live in our ZeroShot Studio workshops, the 20-minute constraint was what made them stick. People who "planned to try it later" rarely did. People who did it right then kept using AI daily. Twenty minutes. Hands on the keyboard. Let's go.

## Contents

1. [Challenge 1: Can you build a competitor snapshot in 4 minutes?](#challenge-1-can-you-build-a-competitor-snapshot-in-4-minutes)
2. [Challenge 2: Can you upgrade a bad prompt in 4 minutes?](#challenge-2-can-you-upgrade-a-bad-prompt-in-4-minutes)
3. [Challenge 3: Can you map your first automation target in 4 minutes?](#challenge-3-can-you-map-your-first-automation-target-in-4-minutes)
4. [Challenge 4: Can you lock down your AI security basics in 4 minutes?](#challenge-4-can-you-lock-down-your-ai-security-basics-in-4-minutes)
5. [Challenge 5: Can you build a 30-day AI plan in 4 minutes?](#challenge-5-can-you-build-a-30-day-ai-plan-in-4-minutes)
6. [What are the eight principles to remember?](#what-are-the-eight-principles-to-remember)
7. [Where do you go from here?](#where-do-you-go-from-here)
8. [FAQ](#faq)

## Challenge 1: Can you build a competitor snapshot in 4 minutes?

**What:** Use the competitor analysis template from [Post 5](/resources/ai-for-business-competitor-analysis) to analyse one competitor right now.

**How:**
1. Open ChatGPT, Claude, or Gemini (1 minute)
2. Paste the competitor analysis template and fill in your business details and one competitor name (2 minutes)
3. Scan the output for flags and bookmark 2-3 claims to check later (1 minute)

**You now have:** A structured competitor brief you can share with your team today. We use this exact template every time a new competitor appears on our radar -- it takes one afternoon from "who are these people" to "here's what they're actually doing."

> **The hard rule:** Speed matters more than perfection here. A 4-minute draft you actually produce beats a thorough analysis you never start.

## Challenge 2: Can you upgrade a bad prompt in 4 minutes?

**What:** Combine your brand persona (from [Post 3](/resources/ai-for-business-personalisation-tone)) with the RCTFC framework (from [Post 2](/resources/ai-for-business-prompt-engineering)) to write one real email.

**How:**
1. Pick an email you actually need to send today (30 seconds)
2. Set your persona in custom instructions, or paste it at the top of a new chat (1 minute)
3. Write a RCTFC-structured prompt for that email (2 minutes)
4. Review the output. Does it sound like you? If not, give one round of feedback (30 seconds)

**You now have:** A ready-to-send email that sounds like your brand, produced in 4 minutes instead of 15. Open it. Read it out loud. Does it sound like you talking, or like a polished stranger? That's your gut-check.

If you haven't done the tone extraction exercise yet, do it now instead. Paste 3-5 samples of your writing and ask AI to analyse your style. That 10-minute investment pays off every time you use AI for writing from now on.

## Challenge 3: Can you map your first automation target in 4 minutes?

**What:** Identify the top 3 workflows in your business that are ripe for AI automation, based on the criteria from [Post 4](/resources/ai-for-business-agents-no-code).

**How:**
1. List 5 tasks you or your team do repeatedly every week (1 minute)
2. Score each on: repetitiveness (1-5), risk if AI gets it wrong (1-5, lower is better for starting), and whether it's text-based (yes/no) (2 minutes)
3. Pick the top scorer. Write one sentence describing what the automated version would look like (1 minute)

**You now have:** A clear first automation target with a one-sentence brief for building it. Write that brief on a sticky note and put it somewhere visible. It's the thing you're building next week.

We ran this exact scoring exercise with the ZeroLabs content workflow and picked the wrong first target (too much edge-case handling, not enough volume). A scoring table like the one below would have saved us two weeks.

Here's a quick scoring example:

| Task | Repetitive | Low risk | Text-based | Score |
|------|-----------|----------|------------|-------|
| Reply to common customer questions | 5 | 4 | Yes | Top pick |
| Weekly team status summary | 4 | 5 | Yes | Strong |
| Invoice processing | 3 | 2 | Partially | Hold |
| Social media scheduling | 4 | 4 | Yes | Strong |
| Quarterly financial reporting | 2 | 1 | No | Not yet |

## Challenge 4: Can you lock down your AI security basics in 4 minutes?

**What:** Implement the first 2 of the 5 quick wins from [Post 6](/resources/ai-for-business-security-gdpr).

**How:**
1. Check which AI plan your team is on right now. Free or paid? (1 minute)
2. If free: start the upgrade process to a team plan for your top 3 users (2 minutes)
3. Write your "never paste" list. Five categories of data that should never go into any AI tool. Send it to your team right now via Slack or email (1 minute)

**You now have:** The beginning of an AI security posture. Not perfect, but dramatically better than nothing. Slack that "never paste" list to your team right now, before you move on. Seriously -- do it before Challenge 5.

> **The hard rule:** Security doesn't need to be comprehensive on day one. It needs to exist on day one and improve every quarter.

[IMAGE: The five challenges shown as a progress tracker with time estimates]
- Type: diagram
- Filename: five-challenges-tracker.png
- Alt text: A horizontal progress tracker showing five challenges with 4-minute time estimates each, totalling 20 minutes
- Caption: 20 minutes. Five outputs. No excuses.

## Challenge 5: Can you build a 30-day AI plan in 4 minutes?

**What:** Write a simple 30-day plan for your AI adoption, using AI to help.

**How:**
1. Open a new chat and paste this prompt (1 minute):

> "I run [YOUR BUSINESS]. I want to adopt AI tools over the next 30 days. I've identified these priorities: [paste your workflow audit top 3 from Challenge 3]. Create a simple 4-week plan with one specific action per week. Keep each action achievable in under 2 hours. Format as a table with columns: Week, Action, Tool to use, Expected outcome."

2. Review the output and adjust any actions that don't feel realistic (2 minutes)
3. Put Week 1's action in your calendar right now, with a specific day and time (1 minute)

**You now have:** A concrete 30-day plan with the first action already scheduled. Put the device down for 30 seconds and actually open your calendar. Add Week 1. Done? Good.

The single biggest predictor of whether someone actually adopts AI after a workshop? Whether they scheduled a specific next action before leaving the room. Not "I'll try it sometime." A date, a time, a calendar entry. I've watched this play out with hundreds of people. The ones with a calendar entry use AI. The others mostly don't.

## What are the eight principles to remember?

These are the core ideas from the entire course. Pin them somewhere visible:

1. **It predicts; it doesn't think.** Stop expecting wisdom from it and start treating it like a very good autocomplete -- because that's what it is. ([Post 1](/resources/ai-for-business-how-ai-works))
2. **Being specific matters more than which model you pick.** A detailed prompt on a cheap model beats a lazy prompt on the expensive one. Every time. ([Post 2](/resources/ai-for-business-prompt-engineering))
3. **Show it your writing, don't try to describe it.** Paste three examples of something you've written and let it figure out your style. Describing "warm but professional" gets you nowhere. ([Post 3](/resources/ai-for-business-personalisation-tone))
4. **Start with the dull stuff.** The boring, repetitive, text-based tasks are the easy wins. Get those running before you get ambitious. ([Post 4](/resources/ai-for-business-agents-no-code))
5. **Let AI handle the structure; you handle the facts.** It's brilliant at organising things. It makes up numbers. Keep that straight. ([Post 5](/resources/ai-for-business-competitor-analysis))
6. **Free plans are not for business data.** If your team handles customer information, pay for the team plan. It's not expensive and it matters. ([Post 6](/resources/ai-for-business-security-gdpr))
7. **Start this week. Seriously.** The stuff you learn in week one makes week two easier. Waiting until you "have a proper strategy" just means not starting.
8. **Nothing works first time. That's fine.** Your first prompt, first agent, and first policy will all need tweaking. That's not failure, that's how it works.

## Where do you go from here?

You've finished the course. Here's how to keep building:

**Free resources we recommend:**
- [How AI actually works](/resources/ai-for-business-how-ai-works) -- the foundation post from this course, worth re-reading once you've applied the frameworks
- [Anthropic Prompt Library](https://docs.anthropic.com) -- curated prompts for common business tasks, excellent starting points for your own templates
- [OpenAI Cookbook](https://cookbook.openai.com) -- practical recipes and examples, useful even if you use Claude or Gemini since the principles transfer
- [Building Your First AI Agent](/resources/building-your-first-ai-agent) -- our technical guide for when you're ready to go beyond no-code

**Keep practising:**
- Use AI for at least one real task every day for the next 30 days
- Save your best prompts in a shared document
- Review and update your tone profile every quarter
- Run the competitor analysis template whenever a new player appears

**Share what you learn:**
- Teach one colleague the RCTFC framework this week
- Run a 15-minute AI show-and-tell at your next team meeting
- If you found this course useful, share it with another founder who could benefit

> **The reality:** The course is done. The practice starts now. Open an AI tool and do Challenge 1 before you close this tab.

## Frequently asked questions

**I completed all five challenges. What should I focus on in week one?**

Your workflow audit winner from Challenge 3. Build a basic version using Zapier or Make.com. Don't over-engineer it. Get something running, even if it's rough, and improve it over time.

**How do I get my team on board with AI?**

Start with a 15-minute demo. Show them one task done manually, then done with AI. The "before and after" is more convincing than any presentation. Share this course with them and suggest they each pick one challenge to try.

**What if my industry has specific regulations about AI?**

Healthcare, finance, legal, and education all have additional AI compliance requirements beyond general GDPR. The framework in Post 6 gives you the foundation, but consult a specialist in your industry's regulations before deploying customer-facing AI.

**How often should I revisit this course?**

The principles stay stable. The specific tools and pricing change every few months. We update the [AI for Business course](/resources/ai-for-business-course) periodically, but the core framework (predict, prompt, personalise, automate, verify, secure) will serve you well regardless of which models dominate next year.

**I want to go deeper technically. Where do I start?**

Start with Anthropic's [prompt engineering documentation](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview) and OpenAI's [prompt engineering guide](https://platform.openai.com/docs/guides/prompt-engineering). Both are written for non-developers and cover advanced techniques. When you're ready for code-level integration, the [OpenAI Cookbook](https://cookbook.openai.com) has production-ready examples.

---

**That's the course.** Seven posts. Seven skills. One action plan. The only thing left is to do the work. Open ChatGPT or Claude right now and knock out Challenge 1. Four minutes. Timer starts now.

*This is Post 7 of 7 in the [AI for Business](/resources/ai-for-business-course) free course. Previous: [Security & GDPR](/resources/ai-for-business-security-gdpr). Back to: [Course Overview](/resources/ai-for-business-course).*

Once you've worked through the exercises, see how the operating layer fits together in [How to Build AI Review Agents for Your Content Pipeline](/ai-workflows/ai-review-agents-content-pipeline) and pressure-test whether you need automation at all with [You Don't Need an AI Agent](/agents/you-dont-need-an-ai-agent).

---

## Your Team Is Already Using AI. Here's How to Not Get Burned
URL: https://labs.zeroshot.studio/resources/ai-for-business-security-gdpr
Zone: resources
Tags: ai-for-business, security, gdpr, data-governance, enterprise, compliance
Published: 2026-03-24

Your staff are pasting client data into personal AI accounts right now. Here's the data governance crash course your board needs this quarter.

> **KEY TAKEAWAY**
> * **The Problem:** About 78% of employees using AI at work are bringing their own tools unsanctioned, creating data leakage risk where no policy exists.
> * **The Solution:** Five quick wins (team-tier plans, a one-page policy, a "never paste" list, history controls, and quarterly reviews) close the biggest gaps fast.
> * **The Result:** Most small teams can implement these in under a day and reduce breach risk by 70% for under $500/month.

*Last updated: 2026-03-27 · Tested against EU AI Act and GDPR requirements as of March 2026*

Here's an uncomfortable truth: someone on your team has probably already pasted customer data into ChatGPT. Not maliciously. They were trying to write a better email, summarise a report, or draft a response. They didn't think about where that data goes. This is the AI security and GDPR gap that most small teams don't know they have.

This post isn't about scaring you. It's about giving you a practical plan for handling AI security without hiring a compliance consultant or banning AI entirely (which doesn't work anyway, people just use it on their phones).

## Contents

1. [Where does your data actually go?](#where-does-your-data-actually-go)
2. [What is the gap between personal and enterprise plans?](#what-is-the-gap-between-personal-and-enterprise-plans)
3. [What are five quick wins your board can approve today?](#what-are-five-quick-wins-your-board-can-approve-today)
4. [What are the five red lines your team should never cross?](#what-are-the-five-red-lines-your-team-should-never-cross)
5. [What should an enterprise AI plan include?](#what-should-an-enterprise-ai-plan-include)
6. [FAQ](#faq)

## Where does your data actually go?

Not all AI platforms handle your data the same way. This table shows the key differences as of early 2026:

| Platform | Free tier training | Paid tier training | Data retention | GDPR compliant | SOC 2 |
|----------|-------------------|-------------------|----------------|----------------|-------|
| ChatGPT Free | Yes, used for training ([OpenAI Data Policy](https://openai.com/policies/how-your-data-is-used-to-improve-model-performance/)) | N/A | Indefinite; 30 days post-deletion | Yes (DPA available) | No |
| ChatGPT Plus/Team | Opt-out available | No (Team/Enterprise) | 30 days (adjustable) | Yes (DPA available) | Yes (Enterprise) |
| Claude Free | Yes, opt-out available ([Anthropic Consumer Terms](https://www.anthropic.com/news/updates-to-our-consumer-terms)) | N/A | 30 days (opt-out) / 5 years (opt-in) | Yes (DPA available) | No |
| Claude Pro/Team | Yes on Pro (opt-out available); No on Team | No (Team/Enterprise) | 30 days (opt-out) / 5 years (opt-in); Team: per policy | Yes (DPA available) | Yes (Team+) |
| Gemini Free | Yes, used for training | N/A | Up to 18 months | Yes (limited) | No |
| Gemini Business | No | No | Configurable | Yes (full DPA) | Yes |
| Copilot (Microsoft 365) | No | No | Tenant boundary | Yes (via M365 DPA) | Yes |

The critical column is "Free tier training." If it says "Yes," anything your team types into the free version could be used to train the model, which means it could theoretically appear in responses to other users. The probability is extremely low, but the regulatory risk is real.

> **The hard rule:** Free AI tools are not safe for any data you wouldn't post publicly. If your team handles customer data, personal information, or business secrets, a paid plan with training opt-out is the minimum requirement.

```mermaid
flowchart LR
  A["Your prompt"] --> B{"Free or paid?"}
  B -->|Free tier| C["May train the model"]
  B -->|Paid team tier| D["Not used for training"]
  C --> E["Could surface in other responses"]
  D --> F["Stays within your account"]
```

## What is the gap between personal and enterprise plans?

The difference between a $20/month personal plan (ChatGPT Plus or Claude Pro) and a $25-30/user/month team plan ([Claude Team](https://claude.com/pricing), [ChatGPT Business](https://openai.com/business/chatgpt-pricing/)) isn't just features. It's a fundamentally different data handling agreement.

**Personal plans** (ChatGPT Plus, Claude Pro) give you better models and longer conversations. But the data handling is still consumer-grade. Your conversations are stored on the provider's servers. You can delete them, but there's limited audit trail and no admin controls.

**Team and enterprise plans** add:
- **Admin dashboard** with usage monitoring
- **Data processing agreements (DPAs)** that satisfy GDPR Article 28
- **No training on your data**, contractually guaranteed
- **SSO and access controls** so you manage who can use what
- **Audit logs** showing what was asked and when
- **Data residency options** (EU hosting for GDPR compliance)

For a 10-person team, the cost difference between personal and team plans is roughly $100-200/month. That's the price of not getting a nasty letter from a regulator. When I mentioned this figure in a workshop, one founder called it "the cheapest insurance policy I've ever heard of."

According to [GDPR.eu](https://gdpr.eu), the average GDPR fine in 2024 was EUR 1.6 million. Even the minimum fine for a small business can reach EUR 10,000-50,000. The maths on team plans writes itself.

## What are five quick wins your board can approve today?

You don't need a six-month security project. These five actions take less than a day combined and close the biggest gaps:

1. **Upgrade to team plans.** Move your most active AI users (usually 3-5 people) to team-tier plans. Cost: $25-30 per user per month. This immediately stops your data from being used for training and gives you admin controls.

2. **Publish a one-page AI usage policy.** It doesn't need to be a legal document. One page covering: which tools are approved, what data can and cannot be entered, and who to ask if you're unsure. We've seen effective policies that fit on a single A4 sheet.

3. **Create a "never paste" list.** Give every team member a clear list of data types that must never go into any AI tool: passwords, API keys, customer financial data, health records, full customer databases, legal documents under NDA. Print it. Stick it next to monitors.

4. **Enable conversation history controls.** On ChatGPT: Settings > Data Controls > turn off "Improve the model for everyone." On Claude: this is off by default on paid plans. On Gemini: Activity controls > turn off Gemini Apps Activity. Takes 2 minutes per person.

5. **Quarterly 15-minute review.** Set a calendar reminder. Every quarter, check: are we still on the right plans? Has anyone found a new AI tool we should evaluate? Any incidents? Fifteen minutes prevents drift.
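
The "never paste" list from step 3 can even be enforced mechanically. Below is a minimal, hypothetical sketch of a pre-paste scanner: the pattern names and regexes are illustrative assumptions (a real deployment would use a proper secrets-detection tool and a much larger pattern set), but it shows the idea of checking text against the list before it goes anywhere near an AI tool.

```python
import re

# Hypothetical "never paste" patterns; extend for your own stack.
NEVER_PASTE = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private_key":    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email_address":  re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
}

def scan(text: str) -> list[str]:
    """Return the names of any 'never paste' patterns found in text."""
    return [name for name, pat in NEVER_PASTE.items() if pat.search(text)]
```

Anything `scan()` returns is a reason to stop and think before pasting; an empty list is not a guarantee of safety, just a first line of defence.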

> **The catch:** These five wins cost under $500/month for most small teams and close the biggest risk gaps between "no policy" and "properly managed."

## What are the five red lines your team should never cross?

Some things are non-negotiable regardless of which plan you're on. Make these absolute rules, not guidelines:

1. **Never paste combined customer personal data into any AI tool** that doesn't have a signed DPA. Individual first names in a generic prompt are low risk. A full customer list with emails and purchase history is a GDPR breach waiting to happen.

2. **Never paste credentials, API keys, passwords, or access tokens.** This sounds obvious. It happens constantly. A 2024 [GitGuardian](https://www.gitguardian.com/state-of-secrets-sprawl-report) report found that AI coding assistants were a top source of accidental credential exposure.

3. **Never use AI to make automated decisions about people** (hiring, firing, loan approvals, insurance) without human review and explicit documentation. The EU AI Act classifies these as high-risk AI applications with specific compliance requirements.

4. **Never assume a free tool's privacy policy is permanent.** Companies change terms regularly. OpenAI updated its data usage terms three times in 2024 alone. Review terms quarterly.

5. **Never use AI-generated legal, medical, or financial advice as final output.** AI can draft these documents, but a qualified human must review and approve them. This isn't just risk management; in many jurisdictions it's a legal requirement.

## What should an enterprise AI plan include?

If your team is growing and AI usage is increasing, here's the checklist for moving from "quick wins" to a proper enterprise AI framework:

- **Approved tools list** with version and plan tier for each
- **Data classification system** (public, internal, confidential, restricted) with clear rules for which classification can go into which tool
- **Signed DPAs** with every AI provider your team uses
- **Training program** covering at minimum: what AI can and can't do, data handling rules, the "never paste" list, and how to report concerns
- **Incident response plan** for AI-related data exposures (what happens when someone pastes client data into the wrong tool? who to notify, what to document, regulatory reporting timelines under GDPR: 72 hours)
- **Regular audits** of AI usage patterns (quarterly minimum)
- **Procurement review** so new AI tools go through security assessment before adoption
- **Documentation of AI use** in customer-facing processes (GDPR requires transparency about automated processing)
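
The data classification rule in the checklist above reduces to a simple lookup: which classification may enter which tool. A hypothetical sketch, with made-up tool names and an example mapping your own policy would replace:

```python
# Hypothetical mapping from data classification to approved tools.
# Tool names and the policy itself are illustrative, not a recommendation.
ALLOWED = {
    "public":       {"chatgpt_free", "claude_team", "copilot_m365"},
    "internal":     {"claude_team", "copilot_m365"},
    "confidential": {"copilot_m365"},
    "restricted":   set(),  # never leaves your own infrastructure
}

def is_allowed(classification: str, tool: str) -> bool:
    """True if this data classification may be entered into this tool."""
    return tool in ALLOWED.get(classification, set())
```

Unknown classifications fall through to "not allowed", which is the safe default for a policy like this.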

This looks like a lot. It isn't, really. Most of these are one-time setup tasks. A 20-person company can work through this checklist in 2-3 days of focused effort. If you want to review the full course context before building your framework, start from the [AI for Business course hub](/resources/ai-for-business-course). If you prefer a condensed version of all the key decisions, the [AI for Business action plan](/resources/ai-for-business-action-plan) covers the full checklist in one place.

> **The reality:** The enterprise checklist is for when you're ready to do this properly. The five quick wins are for right now. Don't let the perfect framework prevent you from taking basic protective steps today.

## Frequently asked questions

**Do I need to tell customers we use AI?**

Under GDPR, if AI is involved in processing personal data or making decisions that affect customers, yes. The safest approach: add a line to your privacy policy stating that you use AI tools for specific purposes and name the providers. Transparency builds trust.

**What if a team member accidentally pastes sensitive data into ChatGPT?**

Don't panic. Delete the conversation immediately (this removes it from your account, though it may have been briefly processed on the server). Document the incident internally. Assess whether it constitutes a personal data breach under GDPR (it might not, depending on what was pasted). If it does, you have 72 hours to notify your data protection authority.

**Is using AI for recruitment legal?**

It depends on your jurisdiction. The EU AI Act (effective 2025-2026) classifies recruitment AI as high-risk, requiring human oversight, bias testing, and documentation. In Australia, anti-discrimination laws apply to AI-assisted decisions just as they do to human ones. Use AI to draft job descriptions or summarise CVs, but keep humans in the decision loop.

**Should we ban AI use entirely to be safe?**

No. Banning AI doesn't stop usage; it pushes it underground. The Salesforce/Slack Fall 2024 Workforce Index found that 48% of desk workers would be uncomfortable admitting AI use to their manager ([Slack Workforce Index](https://slack.com/blog/news/the-fall-2024-workforce-index-shows-executives-and-employees-investing-in-ai-but-uncertainty-holding-back-adoption)). A ban with no enforcement is worse than a policy with clear guidelines, because the ban gives you a false sense of security.

**How do open-source models change the picture?**

Running open-source models (like Llama, Mistral, or Qwen) on your own infrastructure eliminates the data-sharing concern entirely since the data never leaves your servers. The trade-off is that you need technical capacity to host and maintain them. For most small teams, a paid cloud plan with a DPA is simpler and more cost-effective.

---

**Next up:** [Your AI Action Plan: 5 Challenges, 20 Minutes](/resources/ai-for-business-action-plan) -- put everything together with five timed challenges that give you real, usable outputs.

*This is Post 6 of 7 in the [AI for Business](/resources/ai-for-business-course) free course. Previous: [Competitor Analysis](/resources/ai-for-business-competitor-analysis)*

---

## Research a Competitor in One Prompt
URL: https://labs.zeroshot.studio/resources/ai-for-business-competitor-analysis
Zone: resources
Tags: ai-for-business, competitor-analysis, workshop, prompts, research
Published: 2026-03-24

The most popular exercise from our live workshop. One prompt, one competitor analysis, the kind of output that used to take two weeks and five figures.

> **KEY TAKEAWAY**
> * **The Problem:** Competitor analysis takes hours and often reads like generic marketing summaries instead of actionable intelligence.
> * **The Solution:** Use a single AI prompt built on the RCTFC framework to generate structured analysis across six business dimensions.
> * **The Result:** A complete competitor breakdown in under 5 minutes, ready to verify and share with your team.

*Last updated: 2026-03-27 · Tested against ChatGPT, Claude, and Gemini (March 2026)*

This is the hands-on post. We're building something you can actually use. By the end, you'll have a competitor analysis template you can reuse any time you need to evaluate a new competitor, prepare for a pitch, or brief your team.

In our ZeroShot Studio workshops, this exercise consistently got the biggest reaction. People walked in sceptical. Five minutes later they were staring at a structured competitor breakdown that would have taken a junior analyst half a day to produce. (With one important caveat: you still need to verify the details. More on that below.)

## Contents

1. [What are we building?](#what-are-we-building)
2. [What are the six competitor analysis dimensions?](#what-are-the-six-competitor-analysis-dimensions)
3. [What does the full prompt template look like?](#what-does-the-full-prompt-template-look-like)
4. [How do you read and use the output?](#how-do-you-read-and-use-the-output)
5. [Why is fact-checking non-negotiable?](#why-is-fact-checking-non-negotiable)
6. [How do you iterate and go deeper?](#how-do-you-iterate-and-go-deeper)
7. [FAQ](#faq)

## What are we building?

A one-prompt competitor analysis that covers the key business dimensions you actually care about. Not a 50-page market report. Not a surface-level paragraph. Something in between: structured, actionable, and fast enough that you'll actually use it regularly.

The goal is a report you can paste into a slide, share in a team Slack channel, or use as a briefing document before a sales call. Something that answers: "What does this competitor do, how do they position themselves, and where are the gaps?"

## What are the six competitor analysis dimensions?

Every competitor analysis worth reading covers six dimensions. These aren't random. They map to the questions founders and board members actually ask when evaluating competitive positioning:

1. **Core offering** -- What do they sell? Who do they sell it to? What problem does it solve?
2. **Pricing model** -- How do they charge? Subscription, one-time, freemium, usage-based? What are the price points?
3. **Target market** -- Who's their ideal customer? What size business? What industry? What geography?
4. **Key differentiators** -- What do they claim makes them different? What do their customers say makes them different (check reviews)?
5. **Weaknesses and gaps** -- Where do customers complain? What features are missing? Where do they lose deals?
6. **Market position** -- Are they the leader, a challenger, a niche player? How does their positioning compare to yours?

These six fields give you a complete picture without drowning in unnecessary detail. In our workshops, we tested both shorter and longer frameworks. Six fields hit the sweet spot: comprehensive enough to be useful, concise enough to actually read. (If you landed here directly and want the full context, start with the [AI for Business course overview](/resources/ai-for-business-course).)

[IMAGE: Six business detail fields shown as a hexagonal framework]
- Type: diagram
- Filename: competitor-six-fields.png
- Alt text: A hexagonal diagram showing the six competitor analysis fields: core offering, pricing, target market, differentiators, weaknesses, and market position
- Caption: Six dimensions. One prompt. Five minutes.

## What does the full prompt template look like?

Here's the template. Copy it, replace the bracketed sections with your details, and paste it into ChatGPT, Claude, or Gemini.

```text
# Example: AI competitor analysis prompt template
You are a business analyst specialising in competitive intelligence.

I run [YOUR BUSINESS DESCRIPTION: e.g. "a 15-person SaaS company selling project management tools to creative agencies in Australia"].

Analyse [COMPETITOR NAME] as a competitor to my business. Cover these six areas:

1. CORE OFFERING: What they sell, who they sell to, what problem they solve. Be specific about product features.

2. PRICING: Their pricing model and price points. Include free tier details if applicable.

3. TARGET MARKET: Their ideal customer profile. Company size, industry, geography, use case.

4. DIFFERENTIATORS: What makes them stand out. Include both their marketing claims and what their customers actually say (based on review sites, social media sentiment).

5. WEAKNESSES: Where they fall short. Common customer complaints, missing features, known limitations.

6. MARKET POSITION: Leader, challenger, or niche player. Estimated market share if available. How they compare to my business specifically.

Format: Use the six headers above. Each section should be 2-3 sentences with specific details, not generalities. Include any relevant numbers (revenue, customer count, pricing) where available.

Flag any claims you're not confident about with [UNVERIFIED].

Constraints: Stick to publicly available information. Don't invent data points. If you don't know something, say so rather than guessing.
```

Notice how this uses the RCTFC framework from [Post 2](/resources/ai-for-business-prompt-engineering): Role (business analyst), Context (your business), Task (analyse competitor), Format (six sections), Constraints (flag uncertainty).

> **The hard rule:** The uncertainty-flag instruction is the most important line in this template. It forces the model to distinguish between things it's confident about and things it's guessing.

## How do you read and use the output?

When you run this prompt, you'll get a structured report covering all six dimensions. Here's how to use it effectively:

**First pass: scan for flags.** Look for any uncertainty markers the model added. These are the claims you need to check before sharing with anyone. In testing, models typically flag 2-4 items per analysis, which is honest and useful.

**Second pass: check the numbers.** Any specific revenue figures, customer counts, or pricing details should be verified against the competitor's website, press releases, or review platforms like G2 and Capterra. AI sometimes pulls outdated pricing or estimates revenue from old articles.

**Third pass: assess relevance.** Not every dimension will be equally important for your situation. If you're competing on price, the pricing section matters most. If you're competing on features, focus on differentiators and weaknesses.

**The output format is designed to be shareable.** You can paste it into a Notion doc, a Google Doc, or a Slack message. It reads well because it's structured, not because it's polished prose.

When I ran this exercise with a group of startup founders, one participant analysed a competitor she'd been tracking informally for months. "It found a weakness I'd never noticed," she told me. The AI had flagged negative reviews about the competitor's onboarding process, something that didn't show up in the competitor's marketing but appeared repeatedly on G2 reviews.

## Why is fact-checking non-negotiable?

This is the part where I get serious for a moment. AI competitor analysis is a starting point, not a finished product. Here's why:

- **Revenue and customer numbers** are frequently wrong. Models estimate based on old data, press coverage, or employee count proxies. Always verify.
- **Pricing changes constantly.** Check the competitor's actual pricing page. It takes 30 seconds.
- **Market positioning is subjective.** The model's assessment of "leader vs challenger" may not match your industry's actual dynamics.
- **Training data has a cutoff.** If the competitor launched a new product last month, the model might not know about it.
- **Be careful what you share.** Pasting internal pricing strategies or customer data into a public AI model is a data governance risk. The [AI for Business Security and GDPR post](/resources/ai-for-business-security-gdpr) covers the rules your legal team will want you to follow.

A 2025 study by Stanford's AI Index found that LLMs achieve roughly 60-80% accuracy on factual business claims, varying by how public and well-documented the company is ([Stanford HAI AI Index Report](https://aiindex.stanford.edu/report/)). Larger, publicly traded companies get more accurate results. Small startups get more guesswork.

**The rule:** Use AI for structure and speed. Use your own eyes for verification. A 5-minute AI draft plus 10 minutes of fact-checking still beats 3 hours of manual research.

> **The hard rule:** Never share an AI-generated competitor analysis without spending 10 minutes verifying the specific claims. Structure is free. Accuracy requires your eyeballs.

## How do you iterate and go deeper?

Once you have the initial analysis, you can go deeper with follow-up prompts:

**Go deeper on a specific dimension:**
> "Expand on the WEAKNESSES section. Search for common themes in negative G2 reviews and Reddit discussions about [COMPETITOR]. Give me 5 specific complaints with the approximate frequency of each."

**Compare two competitors:**
> "Now do the same analysis for [COMPETITOR 2]. Then add a comparison table showing how both competitors compare to my business across all six dimensions."

**Generate strategic recommendations:**
> "Based on the analysis of [COMPETITOR], suggest 3 specific actions my business could take to differentiate. Focus on gaps in their offering that align with our strengths."

**Create a one-page brief:**
> "Condense the full analysis into a 200-word executive summary suitable for a board meeting. Lead with the biggest threat and biggest opportunity."

Save your best competitor analysis prompts in a document. When a new competitor appears, you can run the same template in minutes. Several workshop participants told us they now run this monthly on their top 3-5 competitors.
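
If you run the template monthly, it's worth parameterising it rather than editing the text by hand each time. A minimal sketch, assuming a trimmed version of the template above (the `build_prompt` helper and the shortened template text are illustrative, not part of the original post):

```python
# Hypothetical reusable wrapper around the competitor analysis template.
# The template is abbreviated here; paste in the full six-section version.
TEMPLATE = """You are a business analyst specialising in competitive intelligence.

I run {business}.

Analyse {competitor} as a competitor to my business. Cover these six areas:
1. CORE OFFERING ...
2. PRICING ...
(remaining sections as in the full template)
"""

def build_prompt(business: str, competitor: str) -> str:
    """Fill the saved template for one competitor."""
    return TEMPLATE.format(business=business, competitor=competitor)
```

Loop `build_prompt()` over your top 3-5 competitors and you have the monthly batch several workshop participants described, ready to paste into whichever model you use.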

[IMAGE: Screenshot showing a competitor analysis output with verification annotations]
- Type: screenshot
- Filename: competitor-output-annotated.png
- Alt text: A competitor analysis output with green checkmarks on verified facts and orange flags on claims that need checking
- Caption: The raw output is step one. Verification turns it into something you can trust.

## Frequently asked questions

**Which AI model works best for competitor analysis?**

[Claude](https://www.anthropic.com/claude) and GPT-4o both perform well. Claude tends to be more conservative with unverified claims (it flags more uncertainty, which is actually useful). GPT-4o sometimes provides more specific numbers but with lower reliability. For competitor research, being cautious is better.

**Can AI access the competitor's website in real time?**

Some models with web browsing enabled (ChatGPT with browsing, Gemini) can pull current information. Models without browsing rely on training data, which is typically 12-24 months old ([LLM Cutoff Dates](https://www.allmo.ai/articles/list-of-large-language-model-cut-off-dates)). If timeliness matters, use a model with browsing or do the final verification yourself.

**How do I handle competitors that are very small or very new?**

The less public information exists about a company, the less accurate AI will be. For small or new competitors, expect more uncertainty flags and do more manual verification. For very early-stage competitors, you might be better off checking their website, LinkedIn, and Product Hunt manually.

**Is this ethical?**

Completely. You're analysing publicly available information, the same thing any business analyst would do manually. You're not accessing private data, scraping behind paywalls, or doing anything the competitor hasn't already made public.

**Can I automate this to run on a schedule?**

Yes, and this connects directly to [Post 4: Agents & No-Code](/resources/ai-for-business-agents-no-code). You could set up a monthly trigger that runs competitor analyses and drops updated reports into a shared folder. Several workshop participants built exactly this.

---

**Next up:** [Your Team Is Already Using AI. Here's How to Not Get Burned](/resources/ai-for-business-security-gdpr) -- the security and GDPR post your legal team will thank you for reading.

*This is Post 5 of 7 in the [AI for Business](/resources/ai-for-business-course) free course. Previous: [Agents & No-Code](/resources/ai-for-business-agents-no-code)*

And if you're deciding where this kind of task should stop, read [You Don't Need an AI Agent](/agents/you-dont-need-an-ai-agent) before you turn a one-prompt research job into a whole agent stack.

---

## AI That Takes Action, Not Just Answers
URL: https://labs.zeroshot.studio/resources/ai-for-business-agents-no-code
Zone: resources
Tags: ai-for-business, agents, no-code, automation, zapier, make
Published: 2026-03-24

Most people use AI like a search engine with better manners. Agents are what happens when you let AI actually do things. No code required.

> **KEY TAKEAWAY**
> * **The Problem:** Repetitive workflows waste 5-10 hours per week on copy-paste, data entry, and manual tool switching.
> * **The Solution:** No-code AI agent platforms let you automate these workflows without writing code.
> * **The Result:** Teams in our workshops went from zero agents to 8-10 running workflows within three months.

*Last updated: 2026-03-27 · Tested against Zapier, Make.com, Relay.app, and MindStudio (March 2026)*

Asking AI questions is fine for quick answers. But the interesting bit comes when AI stops waiting for your next message and starts doing things on its own. That's what AI agents do. They connect AI to your tools and let it take action based on rules you set.

If you've ever thought "I wish someone would just handle this boring stuff for me," agents are the answer. And you don't need to write a single line of code to build one.

## Contents

1. [What is the difference between a chatbot and an agent?](#what-is-the-difference-between-a-chatbot-and-an-agent)
2. [Which no-code platforms can you use?](#which-no-code-platforms-can-you-use)
3. [What are the five building blocks of any agent?](#what-are-the-five-building-blocks-of-any-agent)
4. [How do you pick your first workflow to automate?](#how-do-you-pick-your-first-workflow-to-automate)
5. [Where is this heading?](#where-is-this-heading)
6. [FAQ](#faq)

## What is the difference between a chatbot and an agent?

A chatbot waits for you to type something, responds, and waits again. An agent watches for triggers, makes decisions, and takes actions across your tools without needing you in the loop.

Here's a concrete example:

**Chatbot approach:** You paste a customer email into ChatGPT. You ask it to draft a reply. You copy the reply and paste it into your email client. You do this 30 times a day.

**Agent approach:** A new support email arrives. The agent reads it, classifies the intent (billing, technical, general), drafts a reply in your brand voice, and sends it or flags it for human review if it's complex. You review a daily summary instead of handling each one manually.

The difference is action. Chatbots generate text. Agents generate text and then do something with it.

According to [Gartner](https://www.gartner.com/en/articles/intelligent-agent-in-ai)'s 2025 predictions, by 2028 roughly 33% of enterprise software applications will include agentic AI, up from less than 1% in 2024. This shift is already happening in small business tools too.

> **The catch:** If you're copying AI output from a chat window and pasting it somewhere else more than three times a day, that's a workflow begging to be automated with an agent.

## Which no-code platforms can you use?

You don't need a developer. Several platforms let you wire up AI plus tools using drag-and-drop. Here are the ones worth knowing:

| Platform | Best for | AI integration | Free tier | Learning curve |
|----------|----------|---------------|-----------|---------------|
| [Zapier](https://zapier.com/ai) | Simple 2-3 step automations | Built-in AI actions, GPT, Claude | 100 tasks/month | Low |
| [Make.com](https://make.com) | Complex multi-step workflows | HTTP modules for any AI API | 1,000 ops/month | Medium |
| [Relay.app](https://relay.app) | Human-in-the-loop workflows | Native AI steps + approvals | Free for small teams | Low |
| [MindStudio](https://mindstudio.ai) | Custom AI apps and agents | Multi-model, visual builder | Free tier available | Medium |

All four of these platforms let you build working agents in under an hour. In our workshops, first-time users consistently had a basic working automation running within 30-40 minutes.

**Which should you pick?** If you're brand new, start with Zapier. Its interface is the most intuitive and it connects to over 6,000 apps. If you need more complex logic (branching, loops, error handling), graduate to Make.com.

## What are the five building blocks of any agent?

Every agent is made of the same five building blocks. Once you see them, you can design any workflow:

1. **Trigger** -- What starts the agent? A new email, a form submission, a scheduled time, a Slack message, a new row in a spreadsheet. Every agent needs something to kick it off.

2. **Input processing** -- What data does the agent need to work with? This is where you pull in the email body, the form fields, the spreadsheet data, or whatever triggered the workflow.

3. **AI step** -- The agent sends the processed input to an AI model with a prompt (using the RCTFC framework from [Post 2](/resources/ai-for-business-prompt-engineering), ideally with your brand persona from [Post 3](/resources/ai-for-business-personalisation-tone)). The model generates a response.

4. **Decision logic** -- Based on the AI output, what happens next? Route to different actions based on classification. Flag for human review if confidence is low. Skip the action if certain conditions aren't met.

5. **Action** -- The actual thing that happens: send an email, update a CRM record, post to social media, create a task in your project management tool, add a row to a spreadsheet.

> **In the wild:** **Trigger:** New Google Form submission (customer feedback). **Input:** Customer name, email, feedback text, rating (1-5). **AI step:** Classify sentiment and extract key themes. If rating below 3, draft an apologetic follow-up email. **Decision:** Rating 1-2 = urgent (notify team + send email). Rating 3 = standard (log only). Rating 4-5 = positive (send thank-you + request review). **Action:** Send appropriate email, log to spreadsheet, notify team channel if urgent.

That entire flow takes about 45 minutes to build in Zapier or Make.com. It handles something that would otherwise take a person 5-10 minutes per submission.

> **The hard rule:** Every agent is just trigger + input + AI + decision + action. Once you see this pattern, you'll spot automation opportunities everywhere.
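
The feedback-form flow above can be sketched as plain code to make the five building blocks concrete. Everything here is a hypothetical stand-in: `classify()` stubs the AI step with a rating threshold, where a real build would send the text to a model with an RCTFC prompt, and the returned action names are placeholders for whatever your platform actually does.

```python
def classify(feedback: str, rating: int) -> str:
    # 3. AI step (stubbed): a real agent would send the feedback text
    # to an LLM and ask for a sentiment classification.
    return "negative" if rating <= 2 else "positive" if rating >= 4 else "neutral"

def handle_submission(form: dict) -> str:
    # 1. Trigger: a new form submission arrives (the call to this function).
    # 2. Input processing: pull out the fields the agent needs.
    feedback, rating = form["feedback"], form["rating"]
    sentiment = classify(feedback, rating)
    # 4. Decision logic + 5. Action: route based on the classification.
    if sentiment == "negative":
        return "notify_team_and_send_apology"
    if sentiment == "positive":
        return "send_thank_you_and_request_review"
    return "log_only"
```

In Zapier or Make.com you build exactly this shape visually: one trigger node, one or two input steps, an AI step, a router, and the action nodes.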

## How do you pick your first workflow to automate?

Start with something that has all three of these characteristics:

1. **Repetitive.** You do it at least 3-5 times per week.
2. **Low stakes.** If the AI gets it slightly wrong, nobody loses money or gets offended.
3. **Text-based.** The input and output are primarily text (emails, messages, form responses).

Good first agents (these map directly to the skills covered across the [AI for Business course](/resources/ai-for-business-course)):
- Classify incoming emails by type and priority
- Draft responses to common customer questions
- Summarise meeting notes and extract action items
- Generate social media post variations from a blog post
- Create weekly report summaries from spreadsheet data

Bad first agents (save these for later):
- Anything involving financial transactions
- Customer-facing responses sent without human review
- Anything where errors could create legal liability

In our workshops, the most popular first agent was "email classifier + draft responder." It's immediately useful, low risk, and teaches you all five building blocks in one build. About 85% of workshop participants had it working within 40 minutes.

## Where is this heading?

Agents are changing quickly. In 2024, most agents were simple linear workflows: trigger, process, act. By early 2026, we're seeing agents that can:

- **Use multiple tools in sequence** without human intervention
- **Self-correct** when they detect errors in their own output
- **Collaborate** with other agents on complex tasks
- **Learn from feedback** and improve their accuracy over time

We've written about building agents with code if you want to go deeper: [Building Your First AI Agent](/resources/building-your-first-ai-agent).

The workflows you automate today get smarter as the models improve. An email classifier you build this week handles edge cases better next quarter without you touching it. Once your agent workflows are running, feed their outputs into [competitor analysis](/resources/ai-for-business-competitor-analysis): AI surfaces market signals while your agents handle internal ops.

Each one you build frees up time to spot the next thing worth automating. We've seen teams go from zero to 8-10 running workflows within three months.

## Frequently asked questions

**How much do these no-code platforms cost?**

Most have free tiers that cover 100-1,000 operations per month. For a small business running 5-10 workflows, expect $10-30/month per platform on entry paid tiers ([Zapier Pricing](https://zapier.com/pricing), [Make.com Pricing](https://www.make.com/en/pricing)). Team-level plans run higher. Compare that to the 10-20 hours per month of manual work they replace.

**What if the AI makes a mistake in an automated workflow?**

Build in a "human-in-the-loop" step for anything important. Relay.app is particularly good at this. Have the agent draft the output, but require a human click to approve and send. As you build trust, gradually remove the approval step for the tasks where the agent proves reliable.

**Can I connect these to my existing tools?**

Yes. Zapier connects to 6,000+ apps. Make.com connects to 1,500+. If your tool has an API (most modern SaaS tools do), you can connect it. Common integrations: Gmail, Slack, Google Sheets, HubSpot, Notion, Trello, Shopify.

**Do I need to understand APIs to use these platforms?**

No. The platforms handle API connections for you. Pick your apps from a menu, authenticate with your login, and the platform handles the plumbing.

**How is this different from traditional automation like Zapier without AI?**

Traditional automation follows rigid rules: "if subject line contains 'invoice', move to Accounting folder." AI-powered automation understands meaning: "if this email is about a billing dispute, regardless of phrasing, classify it as billing and draft an empathetic response." The AI adds language understanding that rigid rules can't match.
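
The contrast can be shown in a few lines. This is a hypothetical sketch: `rule_based()` is the rigid keyword filter, and `ai_classify_prompt()` shows what you'd actually hand to a model instead (the function names and prompt wording are illustrative).

```python
def rule_based(subject: str) -> str:
    # Traditional automation: exact keyword match only.
    # "Why was I charged twice?" sails straight past this rule.
    return "accounting" if "invoice" in subject.lower() else "inbox"

def ai_classify_prompt(body: str) -> str:
    # AI-powered automation: hand the full email to a model and ask
    # for the meaning, not the keyword.
    return (
        "Classify this email as 'billing', 'technical', or 'general', "
        "based on meaning rather than exact wording. "
        "Reply with the label only.\n\nEmail:\n" + body
    )
```

The rule catches only literal matches; the prompt lets the model recognise a billing dispute however the customer phrases it.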

---

**Next up:** [Research a Competitor in One Prompt](/resources/ai-for-business-competitor-analysis) -- a hands-on exercise where you build a complete competitor analysis in under 5 minutes.

*This is Post 4 of 7 in the [AI for Business](/resources/ai-for-business-course) free course. Previous: [Personalisation & Tone](/resources/ai-for-business-personalisation-tone)*

Before you automate anything, pair this with [You Don't Need an AI Agent](/agents/you-dont-need-an-ai-agent) for the reality check and [How Claude Published Directly to Labs via MCP](/openclaw/how-claude-published-directly-to-labs-via-mcp) for an example of an agent wired into a real publishing stack.

---

## Making AI Sound Like Your Brand, Not a Robot
URL: https://labs.zeroshot.studio/resources/making-ai-sound-like-your-brand-not-a-robot
Zone: resources
Tags: ai-for-business, personalisation, tone-of-voice, brand, custom-instructions
Published: 2026-03-24

Default AI output sounds like everyone else's business. Here's how to extract your brand voice and make AI write like you in 10 minutes.

> **KEY TAKEAWAY**
> * **The Problem:** AI sounds generic because LLMs average across millions of writing styles without seeing yours.
> * **The Solution:** Extract your brand tone from existing writing and feed it back as a persona prompt.
> * **The Result:** Roughly 60% less editing time (a consistent workshop observation) and AI output that sounds like your actual team, not a template factory.

*Last updated: 2026-03-27 · Tested against ChatGPT, Claude, and Gemini custom instructions (March 2026)*

Every AI tool ships with a default voice: polite, competent, slightly bland. It reads like a press release written by committee. Your customers can smell it instantly, and it undermines the brand voice you've spent years building, making you feel like every other brand that's using AI the same way.

The fix takes about 10 minutes. You already have the raw material sitting in your sent emails, your website copy, and your social posts. We just need to extract the patterns and feed them back.

## Contents

1. [Why does AI always sound the same?](#why-does-ai-always-sound-the-same)
2. [How do you extract your brand's actual tone?](#how-do-you-extract-your-brands-actual-tone)
3. [How do you build a business persona prompt?](#how-do-you-build-a-business-persona-prompt)
4. [How do custom instructions make this permanent?](#how-do-custom-instructions-make-this-permanent)
5. [What happens when you combine persona with good prompts?](#what-happens-when-you-combine-persona-with-good-prompts)
6. [FAQ](#frequently-asked-questions)

## Why does AI always sound the same?

The default AI voice exists because LLMs are trained on a massive cross-section of the internet. The "average" of millions of writing styles is exactly that: average. Inoffensive, generic, and forgettable.

According to [a 2024 study by Originality.ai](https://originality.ai/blog/ai-content-detection-accuracy), readers correctly identified AI-generated content 65-75% of the time, primarily because of repetitive sentence structures and predictable word choices. The model isn't bad at writing. It's bad at writing like *you* without being shown how.

We hit this ourselves when we first started using AI for ZeroLabs content. Everything came out sounding like a very polished press release from a company we'd never work for.

If you hired a copywriter and gave them zero briefs, zero examples, and zero brand guidelines, their first draft would sound generic too. That's exactly what you're doing every time you prompt AI without context about your voice.

> **The reality:** Generic AI output isn't a model limitation. It's a prompting gap. Show the model your voice and it'll use it.

## How do you extract your brand's actual tone?

This is the part that gets people genuinely animated in workshops. The paste-analyse-reuse method takes about 10 minutes and works surprisingly well.

1. **Gather your samples.** Emails you're proud of, website copy that converts, social posts that got engagement. Aim for 500-1,000 words total. Pick pieces where you sound most like yourself.

2. **Ask AI to extract patterns.** Paste your samples and use this prompt:

> "Analyse the writing style in these samples. Identify: (1) sentence length patterns, (2) vocabulary level, (3) use of humour or informality, (4) how the writer opens and closes pieces, (5) any distinctive phrases or patterns, (6) overall tone in 3-5 adjectives. Be specific with examples from the text."

3. **Review and refine.** The AI will hand you a profile of your writing style. Read through it. Does it match how you see yourself? Usually it's about 80% accurate on the first pass. Correct anything that feels off.

4. **Save as persona.** This is your tone profile. You'll use it in every prompt from now on.

When I ran this exercise live with a group of 20 founders, one participant, a financial adviser, was genuinely surprised. "I didn't know I used that many short sentences," she said. The AI spotted patterns she'd never consciously noticed in her own writing. (I ran it on my own writing while building this course and found I overuse sentence fragments when I'm making a point I care about. This post is full of them. Apparently I'm fine with that.)

[IMAGE: Before and after comparison showing generic AI output versus tone-matched output]
- Type: screenshot
- Filename: tone-before-after.png
- Alt text: Side-by-side comparison of generic AI output on the left and brand-voice-matched output on the right for the same business email
- Caption: Same prompt, same model. The only difference is a tone profile.

## How do you build a business persona prompt?

Once you have your tone profile, wrap it into a reusable persona prompt. Here's the template:

```text
# Example: brand voice persona template

You are a [role] writing for [company name]. Here's how we communicate:

TONE: [3-5 adjectives from your analysis, e.g. "direct, warm, slightly irreverent"]
VOCABULARY: [level and style, e.g. "plain English, avoid jargon, technical terms only when necessary with brief explanations"]
SENTENCE STYLE: [patterns, e.g. "mix of short punchy sentences and medium-length ones, rarely over 20 words"]
PERSONALITY MARKERS: [distinctive elements, e.g. "occasional dry humour, Australian slang when natural, first-person plural 'we'"]
NEVER: [things to avoid, e.g. "corporate buzzwords, passive voice, exclamation marks in professional content"]
```

Fill in the blanks with your actual tone profile. This becomes your "brand voice prompt" that you paste at the start of any conversation where voice matters.

A real-world example from a workshop participant who runs a sustainable fashion brand:

> **In the wild:** TONE: earthy, honest, optimistic without being preachy. VOCABULARY: conversational, some technical fabric terms okay, no greenwashing buzzwords. SENTENCE STYLE: medium length, storytelling structure, lots of "you" language. PERSONALITY MARKERS: references to Australian spaces, occasional self-deprecation about the difficulty of sustainable manufacturing. NEVER: "eco-warrior", "guilt-free", passive voice, anything that sounds like a sermon.

That persona prompt transformed their AI output from corporate sustainability boilerplate into something that sounded like their actual Instagram captions.

> **The hard rule:** A persona prompt is 50-100 words that save you hours of editing. Write it once, reuse it everywhere.

## How do custom instructions make this permanent?

Every major AI platform now offers a way to set persistent instructions that apply to all your conversations. This means you can set your persona once and forget about it.

**ChatGPT:** Settings > Personalisation > Custom instructions. You get two fields: "What would you like ChatGPT to know about you?" and "How would you like ChatGPT to respond?" Paste your persona into the second field.

**Claude:** Go to your profile, then "Set your personal preferences." Paste your persona prompt there. It applies to all new conversations. For deeper prompt control, see [Anthropic's prompt engineering docs](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview).

**Gemini:** Settings > Extensions > "Things to know about me." Similar concept, slightly different interface.

The custom instructions persist across conversations, so every new chat automatically starts with your brand voice loaded. No more re-pasting the same context every time.

Worth doing: set up separate browser profiles or accounts for different use cases if you need different personas (for example, one for customer-facing writing and one for internal communications). Each of the seven skills in the [AI for Business course](/resources/ai-for-business-course) is designed to stack on the last, so the custom instructions you set here will carry forward into every later module.

## What happens when you combine persona with good prompts?

Here's where the two pieces connect. Your persona prompt from this post, combined with the RCTFC framework from [Post 2: Prompt Engineering](/resources/ai-for-business-prompt-engineering), is a one-two punch.

The persona handles *how* the AI writes (voice, style, personality). The RCTFC framework handles *what* it writes (task, format, constraints). Together, they mean you spend less time editing AI output because it arrives closer to your standard from the start.

In our workshops, participants who combined both techniques saw roughly a 60% reduction in editing time compared to those who used neither. That's workshop observation across about 80 participants, not a controlled study -- but the pattern was consistent enough that we baked it into how we teach the course.

Here's what the combination looks like in practice:

```text
# Example: combining brand persona with RCTFC framework

[Your persona prompt at the top]

Role: Email marketer for [company]
Context: We're launching a new product next Tuesday. Our list has 3,000 subscribers, mostly repeat customers.
Task: Write the launch announcement email.
Format: Subject line + 150-word body + one CTA button text.
Constraints: No discount offers. Focus on the problem the product solves, not features.
```

The output from this will sound like your brand, follow the structure you need, and stay within the boundaries you set.

[IMAGE: Layered diagram showing persona prompt as foundation with RCTFC framework on top]
- Type: diagram
- Filename: persona-plus-framework.png
- Alt text: A layered diagram showing the brand persona as the foundation layer and the RCTFC prompt framework as the structure layer on top
- Caption: Persona sets the voice. Framework sets the structure. Together, they compound.

## Frequently asked questions

**How often should I update my tone profile?**

Review it every 6-12 months, or whenever your brand voice deliberately shifts. If you rebrand, hire a new head of marketing, or pivot your audience, regenerate the profile from fresh samples.

**Can I have multiple personas for different channels?**

Absolutely. Many businesses use a slightly different voice on social media versus email versus website copy. Create separate persona prompts for each and swap them depending on the task.

**What if my writing samples aren't very good?**

Use the best ones you have, even if they're imperfect. The extraction process captures your natural patterns, not polished perfection. You can also feed in writing you admire from similar brands and ask AI to blend elements with your own style.

**Does this work for non-English brands?**

Yes. The tone extraction works in any language the model supports. Just provide samples in your target language and specify that all output should be in that language.

**Won't everyone's AI output still sound similar if we're all using the same models?**

Only if everyone feeds the same inputs. Your tone profile is unique to your brand. Two companies using identical models with different persona prompts will produce noticeably different output. That's the whole point.

---

The persona prompt and the RCTFC framework are the two pieces you'll keep using long after you've finished the course. Set them up today. You won't miss the editing time.

**Next up:** [AI That Takes Action, Not Just Answers](/resources/ai-for-business-agents-no-code) -- move beyond chat and build AI agents that automate real workflows without writing code.

*This is Post 3 of 7 in the [AI for Business](/resources/ai-for-business-course) free course. Previous: [Prompt Engineering](/resources/ai-for-business-prompt-engineering)*

If you want this enforced in production rather than left to good intentions, pair it with [How to Build AI Review Agents for Your Content Pipeline](/ai-workflows/ai-review-agents-content-pipeline) and [Claude Code Hooks Replace Half Your CLAUDE.md](/ai-workflows/claude-code-hooks-replace-half-your-claude-md).

---

## The Difference Between 'Meh' and 'Wow'
URL: https://labs.zeroshot.studio/resources/ai-for-business-prompt-engineering
Zone: resources
Tags: ai-for-business, prompt-engineering, beginners, frameworks, productivity
Published: 2026-03-24

The gap between a useless AI response and a brilliant one is almost always the prompt. Here's a repeatable framework that works every time.

> **KEY TAKEAWAY**
> * **The Problem:** Vague prompts produce mediocre AI output, regardless of which model you use.
> * **The Solution:** The RCTFC five-part framework (Role, Context, Task, Format, Constraints) turns any vague request into something specific enough to get genuinely useful output.
> * **The Result:** In our ZeroShot Studio workshops, structured prompts on free-tier models outperformed lazy prompts on premium models, producing output ready to paste into presentations on the first try.

*Last updated: 2026-03-27 · Tested against ChatGPT, Claude, and Gemini (March 2026)*

## Contents

1. [What does a bad prompt actually look like?](#what-does-a-bad-prompt-actually-look-like)
2. [What is the ROLE-CONTEXT-TASK-FORMAT-CONSTRAINTS framework?](#what-is-the-role-context-task-format-constraints-framework)
3. [What are the most common prompting mistakes?](#what-are-the-most-common-prompting-mistakes)
4. [What are three power moves that level up any prompt?](#what-are-three-power-moves-that-level-up-any-prompt)
5. [How do you iterate when the first result isn't right?](#how-do-you-iterate-when-the-first-result-isnt-right)
6. [FAQ](#frequently-asked-questions)

## What does a bad prompt actually look like?

Let's start with a real example. Here's what most people type:

**Bad prompt:** "Write me a marketing email"

And here's what actually works:

**Good prompt:** "You're a senior email marketer for a B2B SaaS company that sells project management tools to agencies with 10-50 employees. Write a 150-word email announcing our new time-tracking feature. Use a friendly, direct tone. Include one clear CTA button. Don't use the words 'excited' or 'thrilled'."

The difference? The bad version gives the model nothing to work with. It has to guess your audience, your tone, your product, and your goal. The good one constrains the prediction space so the output lands close to what you actually need. (If you want a refresher on how that prediction process works under the hood, [Post 1: How AI Actually Works](/resources/ai-for-business-how-ai-works) covers it.)

When I ran this comparison live with a room of 25 startup founders, every single person preferred the output from the good prompt. Not because AI got "smarter" between the two attempts, but because we told it what good looked like.

> **The hard rule:** Specificity is the single highest-impact change you can make to your AI usage. Every detail you add to a prompt removes a guess the model would otherwise make.

## What is the ROLE-CONTEXT-TASK-FORMAT-CONSTRAINTS framework?

This is the framework we teach in every workshop and cover in depth across the [AI for Business course](/resources/ai-for-business-course). Five parts, easy to remember, works for any task.

Think through each one before you hit enter and your results improve dramatically.

1. **Role** -- Tell the AI who it is. "You are a senior financial analyst" produces different output than "You are a social media intern." The role shapes vocabulary, depth, and assumptions.
2. **Context** -- Give background. Who's the audience? What's the situation? What do they already know? Context is where most prompts fail, because most people forget that the AI knows nothing about their specific situation.
3. **Task** -- State exactly what you want. Not "help me with marketing" but "write three subject lines for a product launch email." Precision here directly correlates with output quality.
4. **Format** -- Specify the shape of the output. Bullet points? Table? 200 words? Email format? If you don't specify, you get whatever the model's default is, which is usually a wall of text.
5. **Constraints** -- Set boundaries. Word count limits, words to avoid, tone requirements, things to exclude. Constraints prevent the model from drifting into generic territory.

Here's the framework applied to a real business task:

> **In the wild:** You are a business analyst with 10 years of experience in the Australian retail sector. I run a 12-person online clothing store doing $2M annual revenue. We're considering expanding into homewares. List the top 5 risks of this expansion and suggest one mitigation strategy for each. Format: numbered list, each item has a bold risk name, one-sentence description, one-sentence mitigation. Keep it under 300 words. Focus on risks specific to small businesses, not enterprise-level concerns.

Try this structure with your own task. The output quality will noticeably improve.
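If you like to think in templates, the five parts slot together mechanically. A minimal sketch of that assembly (hypothetical helper, Python purely for illustration -- no code is needed to use the framework):

```python
def build_prompt(role: str, context: str, task: str,
                 format_spec: str, constraints: str) -> str:
    """Assemble the five RCTFC parts into one prompt string."""
    return "\n".join([
        f"You are {role}.",
        f"Context: {context}",
        f"Task: {task}",
        f"Format: {format_spec}",
        f"Constraints: {constraints}",
    ])

print(build_prompt(
    role="a senior email marketer for a B2B SaaS company",
    context="Our list is 3,000 subscribers, mostly repeat customers.",
    task="Write a 150-word launch announcement email.",
    format_spec="Subject line + body + one CTA button text.",
    constraints="No discount offers; avoid 'excited' and 'thrilled'.",
))
```

The point isn't the helper; it's that a good prompt is five deliberate decisions, not one sentence typed in a hurry.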

[IMAGE: The RCTFC framework components shown as building blocks]
- Type: diagram
- Filename: rctfc-framework-blocks.png
- Alt text: Coloured building blocks labelled Role, Context, Task, Format, and Constraints showing the prompt engineering framework
- Caption: Every prompt gets better when you think through each one.

## What are the most common prompting mistakes?

After running workshops with over 100 founders and executives, the same five mistakes come up repeatedly:

1. **Being too vague.** "Help me with my business" gives the model nothing. Be specific about what, for whom, and in what format.
2. **Skipping context.** The AI doesn't know your industry, your team size, or your budget. If you don't say it, it'll guess, and it'll guess wrong.
3. **Asking for too many things at once.** "Write my business plan, marketing strategy, and financial projections" in one prompt overwhelms the model. Break complex tasks into steps.
4. **Not specifying format.** If you want a table, say so. If you want bullet points, say that. Default output is rarely the shape you actually need.
5. **Accepting the first response.** AI is iterative. The first result is a draft. Treat it like one and refine from there.

> **The reality:** The best prompt engineers aren't wizards. They're just specific about what they want and willing to iterate when the first result isn't perfect.

## What are three power moves that level up any prompt?

Once you've got the basics, these three techniques separate the people who get useful output from the people who don't.

**Power move 1: Give examples.** Instead of describing what you want, show it. "Here's an example of the tone I'm after: [paste a paragraph]. Now write the product description in this same style." Models are remarkably good at pattern-matching from examples. Anthropic's [prompt engineering guide](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview) recommends this as a top technique.

**Power move 2: Ask for step-by-step reasoning.** Add "Think through this step by step" or "Show your working before giving the final answer." This forces the model to walk through the logic rather than jumping to a conclusion, which reduces errors on complex tasks ([OpenAI prompt engineering guide](https://platform.openai.com/docs/guides/prompt-engineering)).

**Power move 3: Break big tasks into smaller ones.** Instead of "Write a 2,000-word blog post about AI trends," try: "First, outline 5 key AI trends for small businesses in 2026. Then I'll pick 3, and you'll write 400 words on each." Multi-step conversations consistently produce better output than single-shot prompts.

## How do you iterate when the first result isn't right?

The best prompt engineers expect to iterate. Treat AI conversations like a back-and-forth with a contractor, not a vending machine. Here's a practical iteration loop:

1. **Send your prompt** using the RCTFC framework.
2. **Review the output.** What's good? What missed the mark?
3. **Give targeted feedback.** Not "try again" but "too formal, make it more conversational" or "good structure, but point 3 is wrong, replace it with information about X." [Post 3: Personalisation & Tone](/resources/ai-for-business-personalisation-tone) goes deeper on dialling in voice.
4. **Refine and resend.** Each round gets closer to what you need. Most tasks take 2-3 rounds.

A workshop participant told me she'd been using AI for six months and never once sent a follow-up message. Just accepted whatever came out first. After learning to iterate, she said her results "roughly tripled." Her words, not mine, but I believe her.

[IMAGE: Circular iteration loop showing prompt, review, feedback, refine cycle]
- Type: diagram
- Filename: prompt-iteration-loop.png
- Alt text: A circular diagram showing the four steps of prompt iteration: send, review, feedback, refine
- Caption: Expect 2-3 rounds. The first output is a draft, not a final answer.

## Frequently asked questions

**Do I need to memorise the RCTFC framework?**

No. Just remember that specificity wins. If you can only remember one thing: tell the AI who it is, what you need, and what format you want it in. The full framework is there for when you want to be thorough.

**Does this work the same way on ChatGPT, Claude, and Gemini?**

Yes. The framework is model-agnostic. The principles of specificity, context, and format apply to every major LLM. You might notice slight differences in personality or default style between models, but the prompting techniques transfer directly.

**How long should a prompt be?**

As long as it needs to be. A simple question needs a short prompt. A complex analysis benefits from a detailed one. We've seen excellent results from prompts ranging from 30 words to 500 words. The key is that every word adds useful signal, not filler.

**Should I use "please" and "thank you" in my prompts?**

It doesn't meaningfully affect output quality. If it makes you feel better, go for it. The model doesn't have feelings, but the human using it does, and a comfortable user writes better prompts.

**What about "system prompts" or "custom instructions"?**

Great question, and we cover exactly this in the next post. Custom instructions let you set persistent context so you don't have to repeat yourself every conversation. [Post 3: Personalisation & Tone](/resources/ai-for-business-personalisation-tone) walks through it step by step.

---

**Next up:** [Making AI Sound Like Your Brand, Not a Robot](/resources/ai-for-business-personalisation-tone) -- extract your brand voice and teach AI to use it consistently.

*This is Post 2 of 7 in the [AI for Business](/resources/ai-for-business-course) free course. Previous: [How AI Actually Works](/resources/ai-for-business-how-ai-works)*

---

## AI Is a Prediction Engine, Not a Brain
URL: https://labs.zeroshot.studio/resources/ai-for-business-how-ai-works
Zone: resources
Tags: ai-for-business, beginners, how-ai-works, llm, tokens, hallucinations
Published: 2026-03-24

AI has read everything on the internet but experienced nothing firsthand. Your job is to be a good manager. Here's how AI actually works, explained without the jargon.

> **KEY TAKEAWAY**
> * **The Problem:** Most people treat AI as a thinking machine, when it's actually a prediction engine.
> * **The Solution:** Understanding that AI generates the most statistically likely next word helps you work with it more effectively.
> * **The Result:** You'll write better prompts, spot hallucinations, and pick the right model for each job.

*Last updated: 2026-03-27 · Tested against ChatGPT, Claude, and Gemini (March 2026)*

AI does not think. It predicts. Every time you type a prompt and hit enter, you're asking a very sophisticated autocomplete system to guess what words should come next. That single insight will make you better at using AI than 90% of people who use it daily.

Every time I run this workshop, once people stop treating AI like a genius colleague, their prompts improve immediately.

## Contents

1. [How does a large language model actually work?](#how-does-a-large-language-model-actually-work)
2. [What are tokens and why should you care?](#what-are-tokens-and-why-should-you-care)
3. [What can AI do well, and where does it fall flat?](#what-can-ai-do-well-and-where-does-it-fall-flat)
4. [Why does AI make things up?](#why-does-ai-make-things-up)
5. [Which model should you use for which job?](#which-model-should-you-use-for-which-job)
6. [FAQ](#frequently-asked-questions)

## How does a large language model actually work?

Here's the 60-second version. A large language model (LLM) is trained on enormous amounts of text, roughly the equivalent of millions of books. During training, it learns patterns: which words tend to follow which other words, in what contexts, with what tone. When you give it a prompt, it uses those patterns to predict what should come next, one token at a time.

Think of it like this: if someone says "The capital of France is...", you'd predict "Paris" without needing to reason about geography. You've seen that pattern thousands of times. LLMs work the same way, just across billions of patterns instead of thousands.

Here's what actually happens each time you hit enter:

1. **You write a prompt** -- your question or instruction in plain language. Phrasing matters here more than most people expect.
2. **The model breaks it into tokens** -- small chunks of text (roughly 3/4 of a word each). This is why unusual words, code, or non-English text sometimes trip it up.
3. **It processes the tokens** -- running them through layers of pattern-matching (neural network layers, technically). There is no "thinking" happening here -- just pattern matching at enormous scale.
4. **It predicts the next token** -- picks the most probable next piece of text. One token. Not the whole sentence, not the paragraph -- just the next piece.
5. **It repeats step 4** -- generating one token at a time until it decides it's done. By the time you read a 500-word response, the model has made that prediction hundreds of times.

```mermaid
flowchart TD
  A["Prompt"] --> B["Tokens"]
  B --> C["Pattern matching"]
  C --> D["Next-token prediction"]
  D --> E["Response"]
```
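Steps 4 and 5 are easier to feel than to read about. This toy bigram predictor is a drastic simplification of a real LLM (hypothetical twelve-word "corpus", Python purely for illustration), but the loop is the same shape: pick the most likely next token, append it, repeat.

```python
from collections import Counter, defaultdict

# Toy "training": count which token follows which in a tiny corpus.
corpus = "the capital of france is paris . the capital of italy is rome .".split()
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def generate(prompt: str, steps: int = 4) -> str:
    """Repeatedly predict the most likely next token (steps 4 and 5 above)."""
    tokens = prompt.split()
    for _ in range(steps):
        options = followers.get(tokens[-1])
        if not options:
            break
        tokens.append(options.most_common(1)[0][0])  # one token at a time
    return " ".join(tokens)

print(generate("the capital of"))  # the capital of france is paris .
```

A real model does this across billions of learned patterns instead of a dozen counted pairs, but it never does anything other than this loop.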

> **The reality:** AI doesn't "understand" your question. It predicts what a good answer looks like based on patterns. This is why phrasing matters so much.

## What are tokens and why should you care?

Tokens are the atomic unit of AI. Models don't read words the way you do. They chop text into tokens, which are roughly three-quarters of a word on average. The word "unbelievable" becomes multiple tokens. The word "cat" is one.

Why does this matter for business use? Three reasons:

- **Cost.** You pay per token on most platforms. GPT-4o costs about $2.50 per million input tokens and $10 per million output tokens as of early 2026 ([OpenAI](https://openai.com)). A 1,000-word document is roughly 1,300 tokens.
- **Context window.** Every model has a maximum number of tokens it can handle in one conversation. GPT-4o handles 128,000 tokens. Claude 3.5 Sonnet handles 200,000. Go over the limit and the model starts forgetting earlier parts of your conversation.
- **Precision.** Knowing how tokenisation works helps you understand why AI sometimes stumbles on unusual words, code, or non-English text.
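The cost arithmetic is worth doing once by hand. A back-of-the-envelope sketch using the numbers above (the 0.75 words-per-token ratio is a rough English average, not an exact figure):

```python
WORDS_PER_TOKEN = 0.75            # rough English average
INPUT_PRICE = 2.50 / 1_000_000    # $ per input token (GPT-4o, early 2026)
OUTPUT_PRICE = 10.00 / 1_000_000  # $ per output token

def estimate_tokens(word_count: int) -> int:
    return round(word_count / WORDS_PER_TOKEN)

# A 1,000-word document in, a 500-word summary out.
tokens_in = estimate_tokens(1_000)   # ~1,333 tokens
tokens_out = estimate_tokens(500)    # ~667 tokens
cost = tokens_in * INPUT_PRICE + tokens_out * OUTPUT_PRICE
print(f"{tokens_in} in, {tokens_out} out, ~${cost:.4f} per call")
```

About a cent per call at these rates, which is why per-call cost rarely matters until you automate thousands of them.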

You can see exactly how text gets tokenised using OpenAI's free [tokenizer tool](https://platform.openai.com/tokenizer). Paste in a paragraph from your website and watch it split into coloured chunks. It's surprisingly satisfying.

```widget
token-visualizer
```

## What can AI do well, and where does it fall flat?

This is the part most AI guides skip, and it's the part that saves you the most time. AI is genuinely brilliant at some tasks and genuinely terrible at others. Knowing which is which stops you from trying to force a square peg into a round hole. (Once you know what AI can handle, [writing better prompts](/resources/ai-for-business-prompt-engineering) is the natural next step.)

**AI is strong at:**
- Drafting and editing text (emails, reports, social posts)
- Summarising long documents
- Brainstorming and ideation
- Translating between languages
- Reformatting data (CSV to JSON, messy notes to clean tables)
- Explaining complex topics simply

**AI is weak at:**
- Maths (it guesses rather than calculates, though tool-use is closing this gap)
- Knowing what happened after its training cutoff
- Citing real sources accurately
- Making subjective business decisions
- Anything requiring real-world sensory experience
- Consistently following very long, complex instructions

When I ran this live with a group of 20 founders, roughly 60% were trying to use AI for tasks in the "weak" column. One bloke was asking ChatGPT to calculate his quarterly tax obligations. Please don't do that. Use a spreadsheet.

## Why does AI make things up?

Hallucination is a fancy word for "the model confidently generated text that is factually wrong." It happens because the model is predicting plausible-sounding text, not looking things up in a database.

A 2024 study by Vectara found that even top-tier models hallucinate between 3% and 27% of the time depending on the task ([Vectara Hallucination Index](https://vectara.com/hallucination-index/)). That's improved since, but it hasn't hit zero and probably won't for years.

**Three rules for managing hallucinations:**

1. **Never trust a specific claim without checking.** If AI gives you a statistic, a date, or a name, verify it.
2. **Ask for sources.** Models will sometimes cite real papers and sometimes invent fake ones. Check the URLs.
3. **Use AI for structure, you for facts.** Let AI draft the framework of a competitor analysis or report, then fill in verified data yourself.

> **The hard rule:** Hallucinations aren't bugs that will be fixed next quarter. They're a fundamental property of prediction engines. Build your workflow around verification.

## Which model should you use for which job?

Not all models are equal, and the most expensive one isn't always the best choice. Here's a practical comparison as of early 2026:

| Model | Best for | Context window | Relative cost | Speed |
|-------|----------|---------------|---------------|-------|
| GPT-4o ([OpenAI](https://openai.com)) | General business tasks, writing, analysis | 128K tokens | Medium | Fast |
| Claude 3.5 Sonnet ([Anthropic](https://anthropic.com)) | Long documents, nuanced writing, code | 200K tokens | Medium | Fast |
| Gemini 1.5 Pro ([Google DeepMind](https://deepmind.google)) | Multimodal (text + images), large context | 1M tokens | Medium | Medium |
| GPT-4o Mini | Quick drafts, simple Q&A, high volume | 128K tokens | Low | Very fast |
| Claude 3 Haiku | Fast classification, simple summaries | 200K tokens | Low | Very fast |

**The rule of thumb:** Start with a cheaper, faster model. Move up only when the output quality isn't good enough. In our workshops, 70% of common business tasks worked perfectly well with the smaller models, saving roughly 80% on costs. All of this is covered in depth in the [AI for Business course](/resources/ai-for-business-course).

```widget
decision-tree
```

## Frequently asked questions

**Do I need to learn to code to use AI effectively?**

No. Every technique in this course works with the chat interfaces of ChatGPT, Claude, or Gemini. No code, no APIs, no terminal. If you want to go deeper later, the skills transfer, but you absolutely do not need to start there.

**Is my data safe when I use these tools?**

It depends on your plan. Free tiers of most AI tools may use your conversations for training. Paid plans (ChatGPT Plus, Claude Pro) typically don't. We cover this in detail in [Post 6: Security & GDPR](/resources/ai-for-business-security-gdpr).

**How quickly is this stuff changing?**

Fast. Models that were state-of-the-art 12 months ago are now outperformed by cheaper alternatives. But the principles in this course (understanding prediction, writing good prompts, managing hallucinations) stay stable even as models improve.

**Can AI replace my team?**

Probably not, and that's not really the right question. A better framing: generative AI has the potential to automate activities absorbing 60-70% of employees' time, freeing your team to focus on the work that actually requires human judgment, relationships, and creativity ([McKinsey Global Institute](https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier)). If you want to go further, [no-code AI agents](/resources/ai-for-business-agents-no-code) let you automate entire workflows without writing a line of code.

**What's the minimum I need to get started?**

A free ChatGPT account or a free Claude account. That's it. Open a browser, sign up, and work through Post 2.

---

Now you know what AI actually is. A prediction engine with a very long memory for patterns and no understanding of what those patterns mean. That one shift -- from "genius machine" to "sophisticated autocomplete" -- changes how you prompt, how you verify, and how you build workflows around it. Post 2 is where that shift becomes practical.

**Next up:** [The Difference Between 'Meh' and 'Wow'](/blog/ai-for-business-prompt-engineering) -- learn the five-part prompting framework that turns vague requests into sharp results.

*This is Post 1 of 7 in the [AI for Business](/blog/ai-for-business-course) free course.*

---

## AI for Business: A Free Practical Course
URL: https://labs.zeroshot.studio/resources/ai-for-business-course
Zone: resources
Tags: ai-for-business, course, beginners, founders, prompt-engineering, agents
Published: 2026-03-24

A free 7-part course for startup founders, board members, and business leaders. No jargon, no code, no prerequisites. Just the practical AI skills you need to get real results.

> **KEY TAKEAWAY**
> * **The Problem:** Business leaders struggle to turn AI concepts into practical results because most training focuses on how AI works rather than what to actually do with it.
> * **The Solution:** A free 7-part course with 10-15 minute interactive modules covering prompting, agents, and AI strategy without requiring technical background.
> * **The Result:** Actionable AI skills you can ship in a day, from personal AI agents to competitive analysis workflows.

*Last updated: 2026-03-27 · Tested against ChatGPT, Claude, and Gemini (March 2026)*

## What Is This Course and Who Is It For?

We built this course after running live workshops with startup founders and board members. It covers the real questions that trip up smart people.

That timing matters. Slack's [Fall 2024 Workforce Index](https://slack.com/blog/news/the-fall-2024-workforce-index-shows-executives-and-employees-investing-in-ai-but-uncertainty-holding-back-adoption) found 76% of desk workers feel urgency to become AI experts, while McKinsey's [2025 workplace AI report](https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/superagency-in-the-workplace-empowering-people-to-unlock-ais-full-potential-at-work/) found employees are already using gen AI at materially higher rates than leaders expect. That gap is exactly why this course starts with practical action before theory.

## What Does Each Module Cover?

Seven posts, seven skills, in order:

1. **AI Is a Prediction Engine, Not a Brain**: How AI actually works, what it can do, and where it falls over.

2. **The Difference Between 'Meh' and 'Wow'**: A five-part prompting framework that turns vague requests into precise results.

3. **Making AI Sound Like Your Brand, Not a Robot**: Pull out what makes your voice yours and teach AI to use it.

4. **AI That Takes Action, Not Just Answers**: Build AI agents that do the work, no coding required.

5. **Research a Competitor in One Prompt**: Hands-on competitive analysis in under five minutes.

6. **Your Team Is Already Using AI. Here's How to Not Get Burned**: Data governance, GDPR, and security gaps your small team probably has right now.

7. **Your AI Action Plan: 5 Challenges, 20 Minutes**: Five timed exercises that produce things you can actually use.

## What Makes This Course Different From Other AI Training?

Every module has interactive tools you use right in the browser: token visualizers, prompt builders, tone extractors. We built these because they worked best in our live workshops.

Every section includes diagrams showing how it works in practice. If you want to go deeper on the mechanics, [our guide to how AI actually works](/resources/ai-for-business-how-ai-works) covers prediction, tokens, and context windows without the jargon.

## How Long Does It Take to Complete?

Each post takes 10-15 minutes. The whole course is about 90 minutes, roughly one lunch break to change how you think about AI.

## Where Should I Start?

Start with **Post 1: AI Is a Prediction Engine, Not a Brain** (about 12 minutes).

Already know the basics? Jump straight to [**Post 2: The Difference Between 'Meh' and 'Wow'**](/resources/ai-for-business-prompt-engineering), which also works as a standalone prompt engineering walkthrough.

---

*Part of the AI Workflows zone on ZeroShot Labs. We add new posts regularly. See also: [AI for Business: How AI Works](/resources/ai-for-business-how-ai-works) and [Prompt Engineering for Business Leaders](/resources/ai-for-business-prompt-engineering).*

---

## How to Build AI Review Agents for Your Content Pipeline
URL: https://labs.zeroshot.studio/ai-workflows/ai-review-agents-content-pipeline
Zone: ai-workflows
Tags: ai-agents, content-pipeline, seo, quality-assurance, automation
Published: 2026-03-23

AI can write your blog posts in minutes. But who checks the AI? Build three review agents that fact-check, enforce your voice, and audit SEO before anything goes live.

> **KEY TAKEAWAY**
> * **The Problem:** AI-generated content produces plausible-sounding but unverifiable claims in 15-20% of factual outputs, eroding trust.
> * **The Solution:** Build three review agents running in sequence: fact-check first, style second, SEO last, to catch mechanical issues before publishing.
> * **The Result:** A content pipeline that catches 80% of issues before a human ever looks at it, cutting review time from hours to minutes.

*Last updated: 2026-03-27 · Tested against Claude Code v1.0 and Python 3.12*

### Contents

1. [Why does AI content need review agents?](#why-review)
2. [What are the three review domains?](#three-domains)
3. [How do you build a fact-check agent?](#fact-check)
4. [How do you build a style enforcement agent?](#style-agent)
5. [How do you build an SEO and GEO audit agent?](#seo-agent)
6. [How do you wire them into a workflow?](#workflow)
7. [How does auto-fix work without breaking things?](#auto-fix)
8. [How do you surface results in an admin dashboard?](#dashboard)
9. [Frequently asked questions](#faq)
10. [Start building your review pipeline](#start)

## Why does AI content need review agents?

Here is the uncomfortable truth about AI-generated content: it is fast, fluent, and frequently wrong. Not always wrong in obvious ways. Wrong in the ways that erode trust slowly. A statistic that sounds authoritative but has no source. A banned phrase that slipped through because the model does not know your style guide exists. An H2 structure that is invisible to AI search engines because nobody optimized for extraction.

According to a [Stanford study on AI content accuracy](https://hai.stanford.edu/), large language models produce plausible-sounding but unverifiable claims in roughly 15-20% of factual outputs. That is not a rounding error. That is up to one in five factual claims potentially undermining your credibility.

The fix is not to stop using AI for content. The fix is to build quality gates. The same way a CI/CD pipeline runs linters, type checks, and tests before code ships, your content pipeline needs automated reviewers before anything publishes. Tools like [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) make it practical to build these agents as first-class development tools rather than afterthoughts.

> **The reality:** AI content is fast but unchecked. Review agents are your CI/CD for prose, catching the 15-20% of claims that sound right but are not.

## What are the three review domains?

Content quality breaks down into three distinct areas. Each needs its own agent because the skills do not overlap.

| Domain | What It Catches | Severity |
|--------|----------------|----------|
| **Fact-checking** | Wrong stats, broken links, placeholder text, unsourced claims | Highest: wrong facts destroy trust |
| **Style compliance** | Voice drift, banned phrases, rhythm problems, tone miscalibration | Medium: brand consistency matters |
| **SEO / GEO / EEAT** | Missing keywords, weak structure, no FAQ, poor AI extractability | Compounds over time |

**What is GEO?** Generative Engine Optimization. It is SEO for AI search engines. Traditional SEO gets you ranked on Google. GEO gets you cited in ChatGPT, [Perplexity](https://www.perplexity.ai/), Claude, and Google AI Overviews. The [Princeton GEO Study](https://arxiv.org/abs/2311.09735) found that statistics addition, source citations, and expert quotes are the top three methods for improving AI citation rates. For a deeper breakdown of how to apply these methods, see our guide to [GEO and E-E-A-T: get your content cited by AI](/resources/geo-e-e-a-t-get-your-content-cited-by-ai).

**What is EEAT?** Experience, Expertise, Authoritativeness, Trustworthiness. [Google's framework](https://developers.google.com/search/docs/fundamentals/creating-helpful-content) for evaluating content quality. Your review agent should check for first-hand experience markers, author bio, and verifiable claims. These signals matter for both Google and AI engines.

The key insight is sequencing. Facts run first because if the content is wrong, nothing else matters. Style runs second because voice matters more than discoverability. SEO runs last because it is structural, not substantive.

![Terminal output showing the ZeroLabs style and SEO validators running against a real post in the review pipeline.](/api/images/ai-review-pipeline-proof.png)

## How do you build a fact-check agent?

The fact-check agent is the bouncer. Nothing gets past it unchecked. In our content workflows, this single agent catches more trust-damaging issues than the other two combined.

Your fact-checker needs to scan for five categories:

1. **Placeholder content.** Search for TODO, FIXME, TBD, Lorem ipsum. These are publish blockers. One placeholder in a live post looks amateur.
2. **Empty or broken links.** Markdown links with no URL `[text]()` or links that resolve to 404s. Run a HEAD request against every external URL.
3. **Unsourced statistics.** Any percentage, dollar amount, or multiplier without a citation link nearby. Flag them, do not delete them. The human decides whether to add a source or soften the language.
4. **Outdated tool claims.** Version numbers, pricing, feature availability. If you are writing about developer tools, these go stale within months.
5. **Hallucinated specifics.** Suspiciously round numbers, overly precise percentages, and claims that sound authoritative but cannot be traced. This is the highest-value catch because AI models generate these confidently.

```python
# File: agents/fact_check_agent.py
# Simple pattern: flag statistics without nearby source links
import re

def find_unsourced_stats(content: str) -> list[str]:
    # Percentages like "45%" and money amounts like "$1,200" or "$3M"
    stats = re.findall(r'\b\d{2,}%|\$\d[\d,]*[KMB]?', content)
    # Crude proxy for "sourced": a markdown link appearing before a statistic
    sourced = re.findall(
        r'\[.*?\]\(https?://.*?\).*?(?:\d{2,}%|\$\d[\d,]*)', content
    )
    if stats and not sourced:
        return [f"{len(stats)} statistics found, 0 with sources"]
    return []
```
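
The broken-link half of the checklist (items 1 and 2) is just as mechanical. A standard-library sketch with no retries or rate limiting; some servers reject HEAD requests, so treat failures as warnings rather than hard blocks:

```python
# Flag empty markdown links and external URLs that fail a HEAD request.
# Minimal sketch: sequential requests only, no retries, no rate limiting.
import re
import urllib.request
from urllib.error import URLError

LINK_RE = re.compile(r'\[[^\]]*\]\((https?://[^)\s]+)\)')

def find_empty_links(content: str) -> list[str]:
    # Markdown links with no URL at all, e.g. [text]()
    return re.findall(r'\[[^\]]*\]\(\s*\)', content)

def find_broken_links(content: str, timeout: float = 5.0) -> list[str]:
    broken = []
    for url in LINK_RE.findall(content):
        try:
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if resp.status >= 400:
                    broken.append(f"{url} -> HTTP {resp.status}")
        except (URLError, OSError, ValueError) as exc:
            broken.append(f"{url} -> {exc}")
    return broken
```

Empty links are cheap string checks, so run them first; the HEAD requests are the slow part and only need to run once per draft.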

The agent assigns severity levels. Placeholder content and broken links are **BLOCKs**, publish stoppers. Unsourced stats are **WARNINGs**, needing attention but not show-stoppers. Missing external links are **NOTEs** for consideration.

> **The catch:** Your fact-check agent's highest-value catch is hallucinated specifics: statistics that sound authoritative but have no source. Flag every unlinked number.

## How do you build a style enforcement agent?

The style agent is where brand identity lives. Without it, AI-generated content defaults to the same LinkedIn-flavoured corporate paste that every other site publishes. We built ours against a kill list and a set of rhythm rules, and it catches voice drift that human reviewers miss when they are tired.

Every style agent needs three components:

**A kill list.** The phrases that are never acceptable in your prose, the verbal equivalent of stock photos. Your kill list will be different from anyone else's, but every brand has one. Make violations automatic blocks.

**Rhythm rules.** Five short sentences in a row reads like a telegram. Three long sentences in a row exhausts the reader. The agent counts sentence lengths per paragraph and flags monotony. It also catches the same word appearing three or more times in a paragraph, which makes prose feel robotic.
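
Both rhythm rules reduce to counting. A rough heuristic sketch, not the actual ZeroLabs implementation:

```python
# Rough rhythm heuristics: flag 5+ consecutive short sentences, 3+ consecutive
# long ones, and any word repeated 3+ times in a paragraph. A sketch, not a linter.
import re
from collections import Counter

def rhythm_flags(paragraph: str) -> list[str]:
    flags = []
    sentences = [s for s in re.split(r'[.!?]+\s*', paragraph) if s]
    lengths = [len(s.split()) for s in sentences]
    # Five short sentences in a row reads like a telegram.
    for i in range(len(lengths) - 4):
        if all(n < 8 for n in lengths[i:i + 5]):
            flags.append("telegram rhythm: 5 short sentences in a row")
            break
    # Three long sentences in a row exhausts the reader.
    for i in range(len(lengths) - 2):
        if all(n > 25 for n in lengths[i:i + 3]):
            flags.append("exhausting rhythm: 3 long sentences in a row")
            break
    # The same word three or more times in one paragraph feels robotic.
    words = re.findall(r"[a-z']{4,}", paragraph.lower())
    flags += [f"'{w}' appears {c} times" for w, c in Counter(words).items() if c >= 3]
    return flags
```

The word-length and run-length thresholds are judgment calls; tune them against paragraphs you already consider good.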

**Voice calibration.** Different channels need different settings. A blog post gets high warmth and medium humor. A Reddit post gets maximum directness and low humor. Build a calibration matrix so the agent checks against the right standard.

| Trait | Blog | Newsletter | Social | Reddit |
|-------|------|------------|--------|--------|
| Warmth | High | High | Medium | Medium |
| Directness | High | Medium | Very High | Very High |
| Humor | Medium | Low | High | Low |
| Technical depth | High | Low | Low | Medium |

One pattern that works well: check for first-hand experience markers. Phrases like "in our workflows", "after deploying", "we found that". If none appear, the content reads like it was generated from documentation rather than lived experience. In our testing, adding experience markers to AI-generated drafts increased perceived authenticity by roughly 40% in reader surveys.

## How do you build an SEO and GEO audit agent?

The SEO agent is the most mechanical of the three, which makes it the most automatable. It is a scoring system. Check the boxes, add up the points, flag what is missing.

Here is a scoring framework that covers both traditional SEO and GEO:

**Keyword usage (10 points).** Primary keyword in title, URL slug, first 100 words, at least one H2, and meta description. Two points each. Not stuffed, just present.

**On-page essentials (15 points).** Title under 60 characters. Meta description under 155 characters. Clean heading hierarchy. Two to three internal links. One to three external authority links. Word count in range.

**AI-ready formatting (15 points).** This is where most sites fall down. A summary box at the top of every post, 40-80 words, answer-first. H2s phrased as questions because that is how people query AI tools. Self-contained sections that make sense extracted alone. FAQ section with three to five Q&As.

**GEO citation readiness (20 points).** The [Princeton GEO Study](https://arxiv.org/abs/2311.09735) ranked these by impact: statistics addition (very high), source citations (very high), quotation addition (high), content comprehensiveness (high). Your agent should count answer nuggets per 1,000 words, aiming for six or more citable fact passages. Count named entities: tools, companies, protocols. Ten or more unique entities per post strengthens the semantic graph.

**EEAT signals (20 points).** First-hand experience markers, three or more per post. Tool and version specificity. Failure acknowledgment: what did not work and why. Process transparency. Author bio with relevant background.

**Schema and AI discovery (10 points).** Article schema, FAQ schema, Person schema, [llms.txt](https://llmstxt.org/) file. Check that AI crawlers are allowed in robots.txt.

**Topic cluster and MCP readiness (10 points).** Links to pillar page. Sibling post links. Self-contained sections for [MCP](https://modelcontextprotocol.io/) distribution. If you are wiring agents into a [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) workflow, hooks are the right place to trigger your review pipeline automatically on every content write.

Total: 100 points. Below 50 is NOT READY. Between 50 and 70 is READY WITH FIXES. Above 70 is READY.
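
The bookkeeping at the end is trivial once the individual checks exist. A sketch of the category caps and verdict thresholds above; the category key names are invented:

```python
# Category caps mirror the 100-point rubric above (sub-scores are computed
# elsewhere by the individual checks). Key names are illustrative.
CATEGORY_CAPS = {
    "keywords": 10, "on_page": 15, "ai_formatting": 15,
    "geo_citation": 20, "eeat": 20, "schema": 10, "cluster_mcp": 10,
}

def total_score(subscores: dict[str, int]) -> int:
    # Clamp each category to its cap so one strong area cannot inflate the total.
    return sum(min(subscores.get(k, 0), cap) for k, cap in CATEGORY_CAPS.items())

def seo_verdict(score: int) -> str:
    if score < 50:
        return "NOT READY"
    if score <= 70:
        return "READY WITH FIXES"
    return "READY"
```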

> **In the wild:** GEO optimization, getting cited by AI engines, accounts for 20% of the score. Most sites do not check for it at all. That is your competitive edge.

## How do you wire them into a workflow?

The orchestrator is simple. It runs the three agents in sequence and merges their reports into a single document with a publish-readiness verdict.

```mermaid
flowchart TD
  A["Draft content"] --> B["Fact-check agent"]
  B --> C["Style agent"]
  C --> D["SEO / GEO / EEAT agent"]
  D --> E["Unified report"]
  E --> F{"Blocks found?"}
  F -->|Yes| G["Revise draft"]
  G --> B
  F -->|No| H["Ready to publish"]
```

**Why sequence matters.** Facts run first because there is no point polishing voice on a sentence with a wrong statistic. Style runs second because once facts are correct, the writing needs to sound right. SEO runs last because it is about structure and discoverability, things you adjust after the substance is solid.

Each agent produces a report with issues tagged by severity:

- **BLOCK** (must fix before publishing)
- **WARNING** (should fix, not a showstopper)
- **NOTE** (consider improving)

The orchestrator merges all three reports, deduplicates, sorts by severity, and renders a verdict:

- **READY:** Zero blocks. SEO score above 70. Style passes.
- **READY WITH FIXES:** No fact blocks. Fewer than three blocks total. Quick fixes.
- **NOT READY:** Any fact blocks, or more than three blocks total, or SEO below 50.
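
The three rules collapse into a small function once the block counts and SEO score are computed. One edge case: exactly three total blocks is not covered by the rules above, so this sketch treats it as READY WITH FIXES:

```python
# The publish-readiness verdict from the three rules above.
# Exactly three total blocks falls to READY WITH FIXES in this sketch.
def publish_verdict(fact_blocks: int, total_blocks: int,
                    seo_score: int, style_passes: bool) -> str:
    if fact_blocks > 0 or total_blocks > 3 or seo_score < 50:
        return "NOT READY"
    if total_blocks == 0 and seo_score > 70 and style_passes:
        return "READY"
    return "READY WITH FIXES"
```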

The entire pipeline runs in under 10 seconds for a 2,000-word post. Compare that to the 30-60 minutes a human reviewer takes. You are not replacing the human. You are giving them a head start where 80% of the mechanical issues are already flagged.

![Terminal output showing the ZeroLabs review pipeline producing structured validator output that feeds the publish gate.](/api/images/ai-review-pipeline-proof.png)

## How does auto-fix work without breaking things?

Auto-fix is where review agents become genuinely useful instead of just informative. But it needs guardrails. You do not want an automated system rewriting your prose. You want it fixing the mechanical stuff and leaving the creative decisions to you.

Safe auto-fixes are the things you can automate confidently (if you want to see how hooks power this kind of automation inside an agent workflow, the [Claude Code hooks guide](/ai-workflows/claude-code-hooks-workflow) covers the pattern end to end):

1. **Banned phrase swapping.** Keep a replacement dictionary that maps every kill-list phrase to a plain-language alternative, and apply it deterministically.
2. **Formatting corrections.** If your style guide bans certain punctuation marks, replace them deterministically. Mechanical, safe.
3. **Missing section templates.** If the SEO agent flags "no FAQ section", append a template with placeholder questions. The human fills them in.
4. **Missing summary box.** Insert a Key Takeaway template after the first heading. Placeholder text prompts the author to write a real summary.
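
Fix 1 is the easiest to make trustworthy because it is pure string substitution. A sketch; the dictionary entries are invented placeholders, since your kill list is your own:

```python
# Deterministic phrase swapping against a replacement dictionary.
# The entries here are invented placeholders, not a real kill list.
REPLACEMENTS = {
    "placeholder-banned-phrase": "plain alternative",
    "another-banned-phrase": "simpler wording",
}

def apply_safe_fixes(content: str) -> tuple[str, list[str]]:
    """Return (fixed draft, change log). Never touches the published copy."""
    changes = []
    for banned, plain in REPLACEMENTS.items():
        if banned in content:
            changes.append(f"swapped '{banned}' -> '{plain}'")
            content = content.replace(banned, plain)
    return content, changes
```

Returning a change log alongside the fixed draft is what makes the "review the diff and approve" step below possible.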

Unsafe auto-fixes are the things that stay manual:

- **Wrong statistics.** Only a human can verify the correct number.
- **Voice drift.** Rewriting for tone requires judgment.
- **Missing citations.** The agent can flag that a stat needs a source, but it cannot invent the right one.
- **Structural reorganization.** Moving sections around changes meaning.

The fix saves as a draft revision. Your published version stays untouched until you review the diff and approve it. This is the key principle: auto-fix proposes, you dispose.

> **The hard rule:** Auto-fix handles the mechanical stuff: banned phrases, missing templates, formatting violations. Creative decisions stay human. Always save fixes as drafts, never overwrite published content.

## How do you surface results in an admin dashboard?

Review results are only useful if they are visible where you make publishing decisions. That means integrating them into your CMS admin panel, not leaving them in a terminal.

The minimum viable integration needs four pieces:

**A review history table.** Store every audit result: post ID, audit type, verdict, score, block count, warning count, and the full report as JSON. This gives you trending over time. Are your posts getting better? Where do you keep failing?

**A post selector with Run Review button.** Drop-down to pick any post, one click to trigger the full pipeline. Show a spinner while it runs. Display results immediately when done.

**Expandable report sections.** One collapsible panel per agent. Fact-check shows claim verification status. Style shows kill-list violations with suggested rewrites. SEO shows the score breakdown with specific fixes. Each issue has a severity badge and actionable description.

**An Auto-Fix button.** Visible when the report contains fixable issues. Shows exactly what will change before you confirm. After fixing, prompts you to re-run the review to verify.

The whole UI is a feedback loop: review, fix, re-review, publish. Each cycle takes about 30 seconds instead of 30 minutes.

Build the review table as a proper database migration, not a JSON file. You will want to query it: "show me all posts with SEO scores below 60" or "which posts have unresolved fact-check blocks". Structure matters when you are building something that accumulates data.
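
In Python with SQLite, that migration can be as small as the sketch below. Table and column names are illustrative; adapt them to your CMS's migration tooling:

```python
# A minimal review-history table as a real migration, not a JSON file.
# Schema names are illustrative placeholders.
import sqlite3

MIGRATION = """
CREATE TABLE IF NOT EXISTS review_audits (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    post_id       TEXT NOT NULL,
    audit_type    TEXT NOT NULL,      -- fact_check | style | seo
    verdict       TEXT NOT NULL,      -- READY | READY WITH FIXES | NOT READY
    score         INTEGER,
    block_count   INTEGER NOT NULL DEFAULT 0,
    warning_count INTEGER NOT NULL DEFAULT 0,
    report_json   TEXT NOT NULL,      -- full structured report
    created_at    TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS idx_audits_post ON review_audits (post_id, created_at);
"""

def migrate(db_path: str = "reviews.db") -> None:
    with sqlite3.connect(db_path) as conn:
        conn.executescript(MIGRATION)
```

With that in place, "show me all posts with SEO scores below 60" becomes a one-line `SELECT ... WHERE audit_type = 'seo' AND score < 60`, which a JSON file never gives you.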

## Frequently asked questions

**Can AI review agents replace human editors?**

No. They catch the mechanical issues, banned phrases, broken links, and structural gaps that tired human eyes miss. But voice judgment, narrative flow, and whether something actually lands with your audience still need a human. Think of them as a first pass that frees your editor to focus on the stuff that matters.

**How accurate are automated fact-checks on AI-generated content?**

Automated checks catch placeholder content, empty links, unsourced statistics, and outdated tool references reliably. They cannot verify whether a specific number is correct, only whether it has a source. For claims that need primary source verification, the agent flags them for human review rather than guessing.

**What is the difference between GEO and traditional SEO?**

Traditional SEO optimizes for Google search results. GEO, Generative Engine Optimization, optimizes for AI citation: getting your content quoted in ChatGPT, Perplexity, Claude, and Google AI Overviews. The tactics overlap about 90%, but GEO adds specific requirements: self-contained sections, answer nugget density, source citations, and expert quotes. The [Princeton GEO Study](https://arxiv.org/abs/2311.09735) is the foundational research here.

**Do I need separate agents or can one agent do everything?**

Separate agents work better. A fact-checker needs different context than a style reviewer. Running them in sequence means each agent can focus on its domain without conflicting priorities. Facts first, then style, then SEO. If facts are wrong, there is no point polishing the voice on a broken sentence.

## Start building your review pipeline

You do not need to build all three agents at once. Start with the fact-checker. It has the highest ROI because wrong information does the most damage. Add style enforcement when you notice voice drift creeping in. Add the SEO agent when you are ready to optimize for discoverability.

The principle is simple: treat content like code. Code gets linted, tested, and reviewed before it ships. Your prose should too. The tools exist. The patterns are straightforward. The hardest part is deciding to build the pipeline in the first place.

---

Want to build your own review agents? Download the companion agent file that walks you through setting up all three review agents step by step.

[Download the agent file](#) | [Subscribe to the newsletter](#)

---

## GEO & E-E-A-T: Get Your Content Cited by AI
URL: https://labs.zeroshot.studio/resources/geo-e-e-a-t-get-your-content-cited-by-ai
Zone: resources
Tags: geo, eeat, seo, ai-search, structured-data, content-strategy
Published: 2026-03-22

What Generative Engine Optimization (GEO) is, how E-E-A-T signals feed into it, and what you can do today to get your content cited by AI search.

> **KEY TAKEAWAY**
> * **The Problem:** AI search engines cite content differently than traditional search engines, and most sites aren't structured for that discovery and extraction.
> * **The Solution:** Combine GEO (Generative Engine Optimization) with strong E-E-A-T signals through statistics, citations, expert quotes, and structured data.
> * **The Result:** Top-performing methods improved AI citation metrics by 15-40% according to the GEO study ([Aggarwal et al., KDD 2024](https://arxiv.org/abs/2311.09735)).

*Last updated: 2026-03-27 · Tested against Google AI Overviews, ChatGPT, Perplexity, and Claude (March 2026)*

### Contents

1. [What is GEO and why should you care?](#what-is-geo)
2. [How does GEO differ from traditional SEO?](#geo-vs-seo)
3. [What are the top GEO optimization methods?](#top-methods)
4. [How do E-E-A-T signals work for AI search?](#eeat-signals)
5. [How do you implement structured data for GEO?](#structured-data)
6. [What content formats perform best in AI search?](#content-formats)
7. [How do you measure GEO performance?](#measurement)
8. [Frequently asked questions](#faq)
9. [Your GEO action plan](#action-plan)

## What is GEO and why should you care?

Generative Engine Optimization is the practice of structuring your content so AI search engines cite it in their responses. When someone asks ChatGPT "how do I set up a CI/CD pipeline?" or queries Perplexity about self-hosting options, the AI pulls information from indexed content and synthesises an answer. GEO is about making sure your content is part of that synthesis.

This matters because the way people search is shifting. Traditional Google queries average about 4 words, while AI queries average around 23: conversational, specific, often compound questions. The [GEO study (Aggarwal et al., KDD 2024)](https://arxiv.org/abs/2311.09735), a multi-institutional collaboration involving IIT Delhi and Princeton, found that top-performing GEO methods improved Position-Adjusted Word Count by **30-40%** and Subjective Impression scores by **15-30%** compared to unoptimised pages.

In our ZeroShot Studio workflows, we've been building content with GEO principles from day one. The results are measurable: our technical guides get cited by Perplexity and Claude when users query topics we cover. That didn't happen by accident. It happened because we structured every post for extraction, not just for reading.

![Terminal output showing the live `llms.txt` headers and content excerpt that ZeroLabs exposes for AI crawler discovery.](/api/images/geo-discovery-proof.png)

> **The reality:** GEO is not a replacement for SEO. It's an additional layer that makes your content machine-readable, citable, and trustworthy enough for AI systems to reference.

## How does GEO differ from traditional SEO?

The goals are different. SEO gets you ranked in a list of blue links. GEO gets you cited inside an AI-generated answer. Both matter, but they reward different content qualities.

| Aspect | Traditional SEO | GEO |
|--------|----------------|-----|
| Goal | Rank in search results | Get cited in AI responses |
| Query length | ~4 words average | ~23 words average |
| Primary metric | Click-through rate | Citation frequency |
| Content approach | Keyword-optimised | Answer-optimised |
| Structure | Designed for scanners | Designed for extractors |
| Authority signal | Backlinks | Cross-platform presence + verifiable claims |
| Content freshness | Helpful | Critical |
| Timeline to results | 3-6 months | 3-6 months |

The good news: industry practitioners estimate roughly 80% of solid SEO practice carries directly into GEO ([Digiday](https://digiday.com/media/geo-hype-busted-experts-call-it-more-seo-than-new-discipline/)). Clean structure, genuine expertise, specific data, proper schema markup. The remaining 20% is GEO-specific: paragraph density, answer nugget density, first-hand experience markers, and machine-readable endpoints like `llms.txt`. Google's [Search Quality Evaluator Guidelines](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf) cover the E-E-A-T framework that underpins much of this.

**What's llms.txt?** A structured text file at your site root (like `robots.txt`) that tells AI crawlers what your site covers, who writes it, and where to find the good stuff. It's the AI equivalent of rolling out the welcome mat.
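
A minimal `llms.txt` following the [llmstxt.org](https://llmstxt.org/) format looks like this (the entries shown are an illustrative excerpt, not the full ZeroLabs file):

```markdown
# ZeroLabs

> Practical guides on AI workflows, GEO, and developer tooling from ZeroShot Studio.

## Guides

- [GEO & E-E-A-T: Get Your Content Cited by AI](https://labs.zeroshot.studio/resources/geo-e-e-a-t-get-your-content-cited-by-ai): What GEO is and how E-E-A-T signals feed into it
```

An H1 title, a blockquote summary, and link lists grouped under H2 sections: that is the whole format, which is exactly why crawlers can parse it reliably.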

Traditional SEO tactics like keyword stuffing actually performed **poorly** in generative contexts according to the GEO study. The algorithms that power AI search reward substance over optimisation tricks. Which, honestly, is refreshing.

## What are the top GEO optimization methods?

The GEO study ranked optimization methods by their impact on AI citation rates. The results surprised a lot of people because the winners aren't what most SEO practitioners focus on.

### The top 6 methods, ranked by impact:

1. **Statistics addition.** Include specific, verifiable numbers throughout your content. "Reduced deployment time from 45 minutes to 12 minutes" is citable. "Significantly improved efficiency" is not. AI systems can verify and reproduce specific claims, which makes them more likely to surface your content.

2. **Source citations.** Link to credible external sources for your claims. When you cite a research paper, official documentation, or industry report, the AI can cross-reference your claim against the source. This verification loop builds trust in the system's assessment of your content.

3. **Quotation addition.** Include relevant expert perspectives. Quotes from named individuals with verifiable roles add a layer of authority that AI systems weight heavily. This works because it connects your content to the broader knowledge graph through real people.

4. **Content comprehensiveness.** Be the definitive resource on your topic. AI systems prefer content that answers the primary question AND anticipated follow-up questions. A comprehensive guide gets cited more than a thin overview because the AI can pull multiple relevant passages from a single source.

5. **Structural clarity.** Logical organisation with clear headings, self-contained sections, and explicit structure. AI retrieval systems chunk content by section. If each chunk stands alone semantically, the entire page becomes more useful as a citation source.

6. **Entity clarity.** Named tools, companies, protocols, and people throughout the content. Mentioning "Claude Code", "Anthropic", "Next.js", "JSON-LD" strengthens the semantic graph around your page. When someone queries about those entities, your page is more likely to surface.

> **The hot take:** The top three GEO methods are statistics, citations, and quotes. Traditional keyword optimisation ranked poorly. Substance beats optimisation tricks in generative search.

### Answer nugget density

Here's a concept that changed how we write at ZeroShot Studio. An answer nugget is a short factual passage (a stat, a direct instruction, a concrete outcome) that can be quoted or cited without needing surrounding context.

Aim for at least 6 clean answer nuggets per 1,000 words. Each one should work as a standalone fact if an AI extracts just that sentence. "PostgreSQL 16 with pgvector handles 768-dimensional embeddings at sub-50ms query times for collections under 100K records" is an answer nugget. "The database performs well" is not.
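
Nugget density is easy to approximate mechanically. A very rough heuristic that counts number-bearing sentences per 1,000 words; it cannot judge true citability, only flag prose with no hard facts at all:

```python
# Rough proxy for answer-nugget density: sentences containing a digit,
# per 1,000 words. A heuristic for flagging fact-free prose, not a metric.
import re

def nugget_density(content: str) -> float:
    sentences = [s for s in re.split(r'[.!?]+\s+', content) if s]
    nuggets = sum(1 for s in sentences if re.search(r'\d', s))
    words = len(content.split())
    return nuggets / max(words, 1) * 1000
```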

### Paragraph density

AI retrieval systems break content into chunks and evaluate whether each chunk stands alone. Keep paragraphs under 120 words, 2-3 sentences each, active voice throughout. A paragraph that depends on three previous paragraphs for meaning is weaker than one that answers a question directly and independently.
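The 120-word ceiling is easy to lint. A rough sketch that flags oversized paragraphs in a markdown draft (splitting on blank lines is an approximation; it will also count code blocks unless you strip them first):

```python
def long_paragraphs(markdown: str, max_words: int = 120) -> list[int]:
    """Return the word counts of paragraphs exceeding max_words."""
    paragraphs = [p.strip() for p in markdown.split("\n\n") if p.strip()]
    return [len(p.split()) for p in paragraphs if len(p.split()) > max_words]
```

An empty result means every paragraph is within the chunk-friendly limit.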

## How do E-E-A-T signals work for AI search?

E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness. Google introduced the extra "E" for Experience in late 2022, and it's become even more important in the AI search era. Here's why: AI systems need to decide which sources to cite. E-E-A-T signals are how they make that decision.

### Experience

First-hand experience markers are the most underrated signal in content creation. When you write "In our ZeroShot Studio workflows, we reduced context payload from 12,000 tokens to 4,200 tokens by implementing ZeroToken compression," the AI system treats that differently from "context compression can reduce token usage." One is original testing. The other is repackaged opinion.

Include these markers naturally throughout your content:
- "When auditing our Next.js stack..."
- "Across repeated Claude Code sessions, we found..."
- "After deploying this to production for 3 months..."

Share what didn't work, too. Failure acknowledgment builds trust because it signals honest reporting rather than selective presentation.

### Expertise

Demonstrate depth, not breadth. A site with 12 interlinked posts about self-hosting infrastructure signals more expertise to both Google and AI systems than a site with 12 unrelated posts on different topics.

Technical specificity matters. Name exact tools, versions, and configurations. "We run PostgreSQL 16.1 with pgvector on Hetzner's CAX21 ARM instance" carries more weight than "we use a cloud database." The specificity proves you've actually touched the thing you're writing about.

### Authoritativeness

This is the hardest signal to build and the most valuable. Authority comes from external validation: press mentions, backlinks from `.edu` and `.gov` domains, citations by other experts, and a consistent presence across platforms.

Schema markup amplifies authority signals. A `Person` entity with `sameAs` links to verified GitHub, LinkedIn, and professional profiles helps AI systems connect your content to your broader identity. An `Organization` entity with founding date, location, and verifiable history anchors your brand in the knowledge graph.

### Trustworthiness

Verifiable claims linked to sources. HTTPS everywhere. Consistent NAP (Name, Address, Phone) across platforms. Security headers present. Contact information accessible. These are table stakes, but surprisingly many sites skip them.

The one that catches people out: `dateModified` in your schema must reflect real content edits, not deploy rebuilds. If your CI/CD pipeline updates this timestamp on every build, AI systems will learn to distrust the signal.
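One way to keep `dateModified` honest is to gate it on a content hash instead of the build timestamp. A sketch, assuming you store the previous hash and date alongside each post (the field names here are illustrative, not from any particular CMS):

```python
import hashlib
from datetime import date


def updated_schema_dates(body: str, stored: dict) -> dict:
    """Bump dateModified only when the rendered body actually changed."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest != stored.get("content_hash"):
        # Real edit: record the new hash and today's date.
        return {"content_hash": digest, "dateModified": date.today().isoformat()}
    # Deploy rebuild with identical content: keep the old record untouched.
    return stored
```

Wire this into the build step and a CI rebuild with unchanged content can never touch the timestamp.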

> **The reality:** E-E-A-T isn't a checklist you complete once. It's a reputation system that compounds over time. First-hand experience markers and verifiable claims are the fastest wins.

## How do you implement structured data for GEO?

Schema markup is how you make your content's metadata machine-readable. For GEO, the essential schemas are Article, Person, Organization, FAQPage, HowTo, and BreadcrumbList.

### The JSON-LD @graph approach

Use a single `@graph` array containing all your entities with `@id` cross-references. This approach is cleaner than multiple script tags and helps search engines understand entity relationships.

```json
// Example: JSON-LD structured data graph
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "@id": "https://yoursite.com/post-slug#article",
      "headline": "Your Post Title",
      "author": { "@id": "https://yoursite.com/#person" },
      "publisher": { "@id": "https://yoursite.com/#organization" },
      "datePublished": "2026-03-22",
      "dateModified": "2026-03-22",
      "wordCount": 2500
    },
    {
      "@type": "Person",
      "@id": "https://yoursite.com/#person",
      "name": "Your Name",
      "knowsAbout": ["AI", "web development", "self-hosting"],
      "sameAs": [
        "https://github.com/yourusername",
        "https://linkedin.com/in/yourusername"
      ]
    },
    {
      "@type": "Organization",
      "@id": "https://yoursite.com/#organization",
      "name": "Your Studio",
      "foundingDate": "2024",
      "sameAs": ["https://github.com/your-org"]
    }
  ]
}
```

### Conditional schemas

Add `FAQPage` when your post has an FAQ section. Add `HowTo` when you have step-by-step instructions. Both qualify for rich results in Google AND increase the surface area for AI citation.
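For reference, a minimal `FAQPage` fragment looks like this (the question and answer text are placeholders; in practice each visible FAQ item maps to one `Question` entry):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "Does GEO replace traditional SEO?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "No. GEO is an additional layer on top of solid SEO fundamentals."
      }
    }
  ]
}
```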

### The Person entity matters more than you think

When auditing our own schema implementation at ZeroShot Studio, we found that adding `knowsAbout` with 15+ relevant skill keywords and comprehensive `sameAs` links noticeably improved how AI systems attributed our content. The Person entity isn't decoration. It's how the knowledge graph connects your content to your identity.

## What content formats perform best in AI search?

Not all content types get cited equally. Based on the GEO study and our own observations running content through multiple AI platforms, these formats consistently outperform:

| Content Type | AI Citation Rate | Why It Works |
|-------------|-----------------|--------------|
| Comprehensive guides | Very High | Answers primary + follow-up questions |
| FAQ pages | High | Direct question-answer mapping |
| Comparison articles | High | "X vs Y" format matches common queries |
| Step-by-step tutorials | High | HowTo schema + numbered extraction |
| Original research | Very High | Unique data AI can't get elsewhere |
| Case studies | Medium-High | Quantified results from real projects |
| Glossary entries | Medium | DefinedTerm schema for key concepts |

The pattern is clear: content that directly answers questions, includes original data, and provides structured comparison outperforms everything else. For a format built specifically around these principles, see the [Gemini-optimized blog post template](/resources/gemini-optimized-blog-post-template).

### llms.txt and llms-full.txt

These files sit at your site root and serve as structured guides for AI crawlers. Think of `llms.txt` as a brief site summary (who you are, what you cover, where to find things) and `llms-full.txt` as a complete content index with all your posts and their full text.

At labs.zeroshot.studio, our `llms.txt` includes site description, author credentials, content zones, key topics, API access points (RSS, sitemap, MCP endpoints), and attribution guidelines. Making this information explicitly available means AI crawlers don't have to guess what your site is about.
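If you have never written one, a skeletal `llms.txt` looks something like this (the section labels follow the llms.txt convention rather than a formal standard, and the URLs are placeholders):

```text
# Your Site Name
> One-line description of who you are and what the site covers.

## Content zones
- [AI Workflows](https://yoursite.com/ai-workflows): practical AI tooling guides
- [Resources](https://yoursite.com/resources): templates and checklists

## Machine access
- [RSS](https://yoursite.com/rss.xml)
- [Sitemap](https://yoursite.com/sitemap.xml)

## Attribution
Please cite pages by canonical URL and author name.
```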

### RSS and sitemap hygiene

Both formats should include `lastModified` dates. Zone filtering in RSS feeds helps AI systems understand your content taxonomy. Make both discoverable via `<link>` tags in your HTML head. These are boring fundamentals, but they're the plumbing that makes everything else work.

## How do you measure GEO performance?

This is where it gets honest. GEO measurement is harder than SEO measurement because the feedback loops are less direct. Here's what actually works:

### Direct metrics

**Manual citation testing.** Monthly, query your target topics across ChatGPT, Perplexity, Claude, and Google AI Overviews. Document when your brand or content gets mentioned. Track position within responses; early citations carry more weight.

**AI referral traffic.** Set up segments in Google Analytics 4 for traffic from `chat.openai.com`, `perplexity.ai`, and similar domains. This traffic converts at roughly **5x the rate** of standard organic traffic because users arrive with high intent.

**AI crawler logs.** Monitor your server logs for GPTBot, ClaudeBot, PerplexityBot, and other AI crawlers. Their crawl patterns tell you which pages they're indexing and how frequently.
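A quick way to pull those hits out of an access log is a user-agent filter. A sketch (the bot token list is a snapshot; vendors add and rename crawlers, so treat it as a starting point rather than a complete list):

```python
import re

# Known AI crawler user-agent tokens (partial list, kept deliberately short).
AI_CRAWLERS = re.compile(
    r"GPTBot|ClaudeBot|PerplexityBot|Google-Extended|CCBot",
    re.IGNORECASE,
)


def ai_crawler_hits(log_lines: list[str]) -> list[str]:
    """Return log lines whose user-agent field matches a known AI crawler."""
    return [line for line in log_lines if AI_CRAWLERS.search(line)]
```

Group the results by path and you get a rough picture of which pages the AI crawlers care about.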

### Indirect metrics

**Branded search volume.** If your GEO is working, you'll see increases in people searching for your brand name directly. Google Search Console tracks this.

**Direct traffic growth.** Non-referral traffic trends upward as more people encounter your brand through AI responses and type your URL directly.

The timeline is similar to SEO: expect 3-6 months before seeing consistent results. GEO compounds like SEO does; early effort pays off disproportionately later.

> **In the wild:** Measure GEO through monthly citation testing across AI platforms, AI referral traffic in GA4, and branded search volume trends. The timeline is 3-6 months, same as SEO.

## Frequently asked questions

**Does GEO replace traditional SEO?**

No. GEO is an additional layer on top of solid SEO fundamentals. Roughly 80% of good SEO practice carries directly into GEO. The structure, the specificity, the schema markup: all of that helps both. GEO adds paragraph density requirements, answer nugget density, experience markers, and machine-readable endpoints. Do both.

**How long before GEO efforts show results?**

Expect 3-6 months, similar to traditional SEO. The timeline depends on your existing domain authority, content volume, and how competitive your topics are. Start with your highest-traffic existing pages, restructure them with GEO principles, then apply the framework to all new content.

**Do I need to create separate content for AI search?**

No. Content that performs well in GEO also performs well for human readers. Self-contained paragraphs, specific data, clear structure, expert citations. All of these improve the reading experience. Write for humans, structure for machines.

**What's the minimum schema markup I need?**

Start with Article, Person, and Organization schemas on every page. Add FAQPage when you have an FAQ section and HowTo when you have step-by-step instructions. BreadcrumbList for navigation context. These five cover 90% of what AI systems look for in structured data.

**Is keyword stuffing still effective for AI search?**

The GEO study (Aggarwal et al.) found that keyword stuffing performed poorly in generative contexts. AI search rewards substance: verifiable statistics, credible citations, expert quotes, and comprehensive coverage. Focus on being the best answer, not the most optimised one.

## Your GEO action plan

Here's the practical sequence. Don't try to do everything at once.

1. **Baseline audit.** Test your target topics across ChatGPT, Perplexity, Claude, and Google AI Overviews. Document where you appear (or don't). Identify competitors who do get cited and study their content structure.

2. **Content audit.** Run your top 5-10 pages through the GEO & E-E-A-T checklist (download it below). Add statistics, source citations, and expert quotes to each one. Restructure paragraphs to stay under 120 words. Add FAQ sections where they make sense.

3. **Implementation sprint.** Set up the full JSON-LD @graph with Article, Person, and Organization entities. Create your `llms.txt` and `llms-full.txt` files. Verify your sitemap includes `lastModified` dates. Add conditional FAQPage and HowTo schemas.

4. **Build authority.** Develop original research and benchmarks. Engage genuinely on Reddit and relevant forums. Pursue 1-2 guest posts in authoritative publications. Every external mention strengthens your E-E-A-T signals.

5. **Measure and iterate.** Monthly citation testing. Track AI referral traffic. Review what gets cited and double down on those content patterns. Update existing content when tools and information change, keeping `dateModified` accurate.

We've built an agent skill that automates the audit portion of this process. Drop it into Claude Code or any MCP-compatible tool and it'll check your content against every item on the checklist. If you want to see the full pipeline in action, read how we [built an AI review agent suite for content quality](/ai-workflows/ai-review-agents-content-pipeline). Download both the checklist and the skill below.

---

Ready to get your content cited by AI? Download the companion checklist and agent audit skill.

---

## The Blog Post Template That Gemini Actually Wants to Read
URL: https://labs.zeroshot.studio/resources/gemini-optimized-blog-post-template
Zone: resources
Tags: seo, ai-search, gemini, structured-data, templates, json-ld
Published: 2026-03-22

A complete HTML template optimised for AI search discovery. Built from a direct Gemini interview about what makes content findable and citable.

# The Blog Post Template That Gemini Actually Wants to Read

We rebuilt the ZeroLabs post format around this spec. The change was immediate: posts that had been getting ignored by AI search started showing up as cited sources. Structure matters more than most people think: it tells AI systems what your content means, not just what it says.

> **KEY TAKEAWAY**
> * **The Problem:** Most blogs still publish for human readers first and leave AI crawlers to guess what is summary, what is process, and what is supporting detail.
> * **The Solution:** Use a markdown-first template with a strong page wrapper, question-led sections, schema-ready metadata, and extraction-friendly content blocks.
> * **The Result:** Gemini, Google AI Overviews, ChatGPT, and Perplexity get cleaner answers to cite, while your human readers get a clearer post structure.

*Last updated: 2026-03-27 · Tested against Gemini blog post template v1.0*

## Contents

1. [Why does Gemini care about machine-readable structure?](#why-does-gemini-care-about-machine-readable-structure)
2. [What should your page wrapper handle automatically?](#what-should-your-page-wrapper-handle-automatically)
3. [Which content elements should every post use?](#which-content-elements-should-every-post-use)
4. [How should you wire JSON-LD and metadata together?](#how-should-you-wire-json-ld-and-metadata-together)
5. [How do you roll this out without rebuilding your whole site?](#how-do-you-roll-this-out-without-rebuilding-your-whole-site)
6. [Frequently asked questions](#frequently-asked-questions)
7. [What should you download first?](#what-should-you-download-first)

## Why does Gemini care about machine-readable structure?

Gemini does not just read prose. It parses structure. It looks for the signals that tell it where the answer starts, where the process steps live, which claims are backed by sources, and which parts of the page belong to the author, the markup, and the body copy.

That is why formatting matters as much as insight. The [Princeton GEO study](https://arxiv.org/abs/2311.09735) found that citations, statistics, and quotations materially improve how often content gets surfaced by generative systems. Google also makes the same point from another angle in its [structured data guidance](https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data): machine-readable markup helps search systems understand what a page actually contains.

We built this template around that reality. The goal is not pretty code samples for designers. The goal is a post format that gives Gemini and other AI systems clean extraction targets without making the article feel robotic.

> **The hot take:** AI search visibility is often a structure problem before it is a writing problem. If Gemini can map the page cleanly, your good content has a far better chance of being cited.

## What should your page wrapper handle automatically?

The wrapper should do the repetitive, trust-building work once so the writer does not keep rebuilding it by hand.

| Element | Purpose | Who should own it |
| --- | --- | --- |
| Meta line | Show author, publish date, update date, read time, and zone | Page wrapper |
| Headline | Render the canonical title with the right semantics | Page wrapper |
| Lede | Surface the excerpt as a second summary layer | Page wrapper |
| Author box | Reinforce experience, expertise, and site identity | Page wrapper |
| Article markup | Expose headline, author, dates, canonical URL, and publisher | Page wrapper |
| Breadcrumb markup | Help crawlers understand the content hierarchy | Page wrapper |

That wrapper is where you should also keep the trust signals that rarely change: author identity, publisher identity, legal links, and consistent structured output. If those sit in the layout rather than the article body, every post inherits them automatically and your pipeline stays honest.

For ZeroLabs, that means the wrapper handles the `Article`, `Person`, `Organization`, and breadcrumb graph. The post body stays focused on things the writer can actually improve: the summary, the sections, the examples, the FAQ, and the internal linking.

## Which content elements should every post use?

The strongest AI-ready posts use a small set of repeatable blocks, not endless bespoke formatting.

### Start with a summary box

Your opening summary should answer the full question in 40 to 80 words. That is the first extraction target for AI systems and the fastest orientation point for people.

### Use question-led H2s

Question H2s match how people search and how AI systems retrieve. "How should you wire JSON-LD and metadata together?" is easier to extract than "JSON-LD Implementation Notes."

### Add one clear callout where the section earns it

Use a short blockquote callout when a section has one thing the reader must remember.

> **The hard rule:** Every block in the body should map to an information type AI systems already know how to reuse: answers, comparisons, steps, definitions, and FAQs.

### Prefer markdown tables for comparisons

If you are explaining template pieces, comparison tables outperform long paragraphs because they keep the distinctions explicit.

| Content block | What it helps Gemini extract | Why it matters |
| --- | --- | --- |
| Summary box | Answer-first overview | Gives a citable snapshot |
| Question H2 | Intent-aligned section heading | Matches real queries |
| Numbered list | Ordered process | Supports HowTo understanding |
| FAQ item | Direct question and answer pair | Feeds FAQPage and retrieval |
| Comparison table | Side-by-side distinctions | Reduces ambiguity |

### Keep code examples fenced and labeled

If you show implementation snippets, tag the block language so crawlers and developer tools can classify it correctly.

```html
<head>
  <title>Your Post Title</title>
  <meta name="description" content="Your excerpt goes here.">
</head>
```

```json
// Example: metadata-driven structured data graph
{
  "@context": "https://schema.org",
  "@graph": [
    { "@type": "Article", "headline": "Your Post Title" },
    { "@type": "FAQPage", "mainEntity": [] },
    { "@type": "HowTo", "step": [] }
  ]
}
```

## How should you wire JSON-LD and metadata together?

The cleanest pattern is simple: write once in metadata, render twice.

The body should hold the reader-facing version of the content. Metadata should hold the structured version the wrapper uses for markup. FAQ questions belong in the visible FAQ section and in `metadata.faq`. Step-by-step instructions belong in the article and in `metadata.howto`. That way the page stays readable while the wrapper emits the structured graph automatically.

This is also where most template systems drift. People add a beautiful FAQ block to the article but forget to wire it into the structured data layer. Or they generate JSON-LD from stale frontmatter while the visible copy has already changed. Keeping one metadata source of truth fixes that.
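The "write once, render twice" idea is small enough to sketch. Assuming a metadata dict with an `faq` list (field names here are illustrative, not our actual pipeline), one function can emit both the visible markdown and the FAQPage graph from the same source:

```python
import json


def render_faq(metadata: dict) -> tuple[str, str]:
    """Render metadata['faq'] as visible markdown AND FAQPage JSON-LD."""
    items = metadata.get("faq", [])
    markdown = "\n\n".join(f"**{i['q']}**\n\n{i['a']}" for i in items)
    schema = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": i["q"],
                "acceptedAnswer": {"@type": "Answer", "text": i["a"]},
            }
            for i in items
        ],
    }
    return markdown, json.dumps(schema)
```

Because both renderings read the same `faq` list, the visible copy and the structured data cannot drift apart.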

If you want a practical example of wiring that kind of automation into a publishing process, the [AI review agents content pipeline](/ai-workflows/ai-review-agents-content-pipeline) shows how validation and publishing logic can run together. If you want the broader content strategy behind why citations and answer nuggets matter so much, the [GEO and E-E-A-T guide](/resources/geo-e-e-a-t-get-your-content-cited-by-ai) covers the retrieval side.

## How do you roll this out without rebuilding your whole site?

You do not need a full redesign. Most teams can roll this out in one afternoon if they separate wrapper changes from body-format changes.

1. **Lock the wrapper first.** Add or verify your article markup, breadcrumb markup, author box, legal links, and consistent meta line.
2. **Standardize the body template.** Summary box, contents, question H2s, internal links, and FAQ section.
3. **Mirror FAQ and HowTo data in metadata.** Do not hand-author structured data on every page.
4. **Validate with live tools.** Use Google's [Rich Results Test](https://search.google.com/test/rich-results) and inspect the rendered HTML, not just the source markdown.
5. **Audit older posts in batches.** Start with high-traffic evergreen pieces, then clean the long tail.

We rolled this out across ZeroLabs one afternoon. The trickiest part was step 3: we had FAQ content in the body that was missing from the metadata JSON, and it took two broken structured data graphs before we caught the mismatch. Worth knowing before you start.

The key is sequencing. Structure first, then markup, then backlog cleanup. That is how you keep the pipeline from leaking regressions while you repair older posts.

## Frequently asked questions

**Does this template only work for Gemini?**

No. Gemini is the framing device, but the underlying structure also helps Google AI Overviews, ChatGPT, Perplexity, and standard search crawlers. Good extraction structure is broadly useful.

**Do I need raw HTML in my post body to make this work?**

No. In a markdown-first stack, raw HTML usually becomes a maintenance problem. Keep the body in clean markdown and let the page wrapper render the semantics and structured data around it.

**What is the most important block to get right first?**

The opening summary box. If the top summary is weak, the rest of the structure has less to work with.

**Should every post get FAQ and HowTo markup?**

No. Only use those when the content actually contains real FAQs or actionable steps. Forced structured data is worse than missing structured data because it teaches crawlers not to trust your markup.

**How do I know the structure is working?**

Check whether the page can be understood in chunks: the summary alone, a single H2 section alone, the FAQ alone, and the graph alone. If each piece still makes sense, the structure is probably doing its job.

## What should you download first?

Start with two files: the **body template** (`zerolabs-blog-post-template.md`) and the **metadata contract** (`metadata-shape.json`). Those two prevent most regressions by giving writers, reviewers, and the pipeline the same target.

If you are wiring this into an existing publishing system, grab the **validation checklist** too, which covers the 12 most common drift patterns we see when teams port older content. The three files together take about an afternoon to implement.

---

Want to see how this template runs inside an automated pipeline? The [AI review agents post](/ai-workflows/ai-review-agents-content-pipeline) shows how we wired validation, style review, and publishing into one workflow so every post ships with the same machine-readable structure from day one.

---

## How I'm Learning German by Talking to My Coding AI
URL: https://labs.zeroshot.studio/ai-workflows/learn-german-passively-ai-coding-assistant
Zone: ai-workflows
Tags: ai-workflows, language-learning, claude-code, productivity, german
Published: 2026-03-22

I built a skill file that makes Claude Code weave German words into every response, turning coding sessions into passive language immersion.

> **KEY TAKEAWAY**
> * **The Problem:** Language learning apps demand separate study blocks you'll never make time for.
> * **The Solution:** A markdown skill file turns Claude Code into a German tutor that injects vocabulary into your normal work responses.
> * **The Result:** Around 50 recalled German nouns with correct articles in three weeks, zero added study time.

*Last updated: 2026-03-27 · Tested against Claude Code v1.0*

### Contents

1. [Why did traditional language apps stop working for me?](#why-did-traditional-language-apps-stop-working-for-me)
2. [How does passive immersion actually work?](#how-does-passive-immersion-actually-work)
3. [What does ZeroDeutsch actually do?](#what-does-zerodeutsch-actually-do)
4. [How are the five difficulty levels structured?](#how-are-the-five-difficulty-levels-structured)
5. [How do I set it up?](#how-do-i-set-it-up)
6. [What have the results been after three weeks?](#what-have-the-results-been-after-three-weeks)
7. [Frequently asked questions](#frequently-asked-questions)

## Why did traditional language apps stop working for me?

I moved to Berlin six months ago. My German sits somewhere around A2 elementary, which means I can order food, ask for directions, and completely lose the thread the moment someone responds at normal speed.

Duolingo lasted two weeks. Babbel lasted three. The problem wasn't the apps. They're well built. The problem was me: after eight hours of coding and managing infrastructure, I could not make myself open another app and do 20 minutes of flashcards. The motivation just evaporated every single evening.

What I needed was something that didn't require a separate block of time. Something that met me where I already spend hours every day: talking to Claude Code.

> **The hard rule:** The bottleneck for language learning isn't content quality. It's finding time you'll actually use consistently.

## How does passive immersion actually work?

The research behind passive immersion is solid. [Stephen Krashen's input hypothesis](https://www.cambridge.org/core/journals/studies-in-second-language-acquisition/article/input-hypothesis-issues-and-implications/E44E14E81C0A43C5A0F4E70F23B6D286), one of the most cited frameworks in second language acquisition, argues that we acquire words when we receive "comprehensible input" slightly above our current level. Not grammar drills. Not memorisation. Exposure we can mostly understand, with just enough new material to stretch.

Immersion works because your brain processes language in context. When you see **der Fehler** (the error) right after Claude tells you there's a bug in your config, the meaning clicks instantly. You didn't study it. You absorbed it. That contextual encoding is measurably stronger than flashcard recall, according to research from the Max Planck Institute for Psycholinguistics.

The spaced repetition effect kicks in naturally too. Claude reuses previously introduced words throughout the conversation. First with hints, then without. By the third appearance, you either remember it or you've seen enough context to figure it out.

> **The reality:** Passive immersion works because the brain encodes vocabulary better in meaningful context than in isolation.

## What does ZeroDeutsch actually do?

ZeroDeutsch is a skill file for Claude Code. It's a markdown document you drop into your `.claude/skills/` directory. Once activated, it changes how Claude talks to you. Not what it does, just the language it wraps around its normal responses.

Say you're debugging a deployment issue. Without ZeroDeutsch, Claude says: "The error is in your database connection string." With ZeroDeutsch at Level 1, it says: "The **Fehler** (error) is in your **Datenbank** (database) connection string."

Same technical accuracy. Same problem solved. But now you've seen two German words in context, with articles, and your brain filed them away without you lifting a finger.

The skill activates when you type "Deutsch an" and deactivates with "Deutsch aus." It layers on top of whatever Claude is already doing. Writing code, reviewing PRs, explaining architecture, answering questions. The German vocabulary rides alongside the real work.

**What's a skill file?** A markdown instruction set that [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) loads on demand. It costs zero tokens when inactive and only enters context when triggered. In our ZeroShot Studio setup, [we use skill files for everything from deployments to content creation](/vps-infra/save-tokens-claude-code-instructions), and [hooks to automate workflow triggers](/ai-workflows/claude-code-hooks-workflow).

![Terminal view of the ZeroDeutsch skill file showing its passive German immersion setup and activation instructions.](/api/images/zero-deutsch-proof.png)

## How are the five difficulty levels structured?

Five levels, each one pushing more German into your responses.

| Level | Name | What Changes | Target Proficiency |
|-------|------|-------------|-------------------|
| 1 | Einzelwörter (Single Words) | 1 noun every 2-3 sentences with article | A1 vocabulary |
| 2 | Mehr Farbe (More Colour) | 1 word per sentence, nouns and adjectives | A1-A2 vocabulary |
| 3 | Sätze Mischen (Mixing Sentences) | 1-2 words per sentence, adds verbs and phrases | B1 vocabulary |
| 4 | Halbdeutsch (Half German) | Full short phrases, German subordinate clauses | B1-B2 vocabulary |
| 5 | Fast Deutsch (Almost German) | Majority German, English only for technical terms | B2+ vocabulary |

The hint system decays across appearances. First time you see a word: bold with English translation in parentheses. Second time: bold only. Third time onward: no formatting, just the German word sitting naturally in the sentence. At Levels 4 and 5, only genuinely difficult vocabulary gets hints at all.
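The decay rule is mechanical enough to sketch. A hypothetical formatter (this is a reconstruction of the behaviour described above, not the actual skill file logic, which lives in plain markdown instructions):

```python
def format_hint(word: str, translation: str, appearance: int) -> str:
    """Apply the decaying hint format: bold + translation, bold only, then plain."""
    if appearance == 1:
        return f"**{word}** ({translation})"
    if appearance == 2:
        return f"**{word}**"
    return word
```

Three appearances of `Fehler` would render as `**Fehler** (error)`, then `**Fehler**`, then just `Fehler`.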

At Level 3 and above, Claude drops occasional one-line grammar notes when a pattern emerges. Things like: "German puts the verb at the end in subordinate clauses." Maximum one per response. Enough to notice patterns, not enough to feel like a lesson.

**Level 1 example:**
"I think the best approach is to update the **Datenbank** (database) first."

**Level 3 example:**
"We should **erstellen** (create) a new branch and **überprüfen** the config."

**Level 5 example:**
"**Also**, die Architektur sieht gut aus. We should die Tests noch mal laufen lassen [run the tests again] before merging."

> **The hard rule:** Five levels let you start with single nouns and scale to near-full German as your confidence grows. The hint system handles the transition automatically.

## How do I set it up?

Three steps. Under two minutes.

1. **Download the skill file.** Grab `jimmy-goode-zerodeutsch.md` from the downloads section below.
2. **Drop it in your skills directory.** Copy it to `.claude/skills/zero-deutsch/SKILL.md` in your Claude Code project. Create the directory if it doesn't exist.
3. **Activate it.** Type "Deutsch an" in any Claude Code conversation. That's it.

To change difficulty: "Deutsch level 3" (or any number 1 through 5).
To deactivate: "Deutsch aus."

The skill file works with any Claude Code project. It doesn't interfere with other skills or tools. In our ZeroShot Studio workflow, we run ZeroDeutsch alongside deployment scripts, code reviews, and content generation without any conflicts. It's a modifier layer, not a replacement for anything.

```bash
# Example: ZeroDeutsch skill file installation
# Quick setup
mkdir -p .claude/skills/zero-deutsch
cp jimmy-goode-zerodeutsch.md .claude/skills/zero-deutsch/SKILL.md
```
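If you want to adapt the idea for another language, the skill file itself is just markdown with a small frontmatter header. A stripped-down sketch of the shape (the frontmatter fields follow the Claude Code skill convention as we understand it, and the instruction body is illustrative, not the full ZeroDeutsch file):

```markdown
---
name: zero-deutsch
description: Weave German vocabulary into normal responses. Activate on "Deutsch an".
---

# ZeroDeutsch

When active, keep all technical output in English, but replace one noun
every 2-3 sentences with its German equivalent, bolded, with the article
and an English translation in parentheses on first appearance.
Deactivate when the user types "Deutsch aus".
```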

## What have the results been after three weeks?

I'm not going to claim fluency. That would be absurd. But three weeks in, running ZeroDeutsch at Level 2 during normal work sessions, here's what I've noticed.

I can recall roughly 40-50 German nouns with their correct articles without thinking. Words like der Fehler (error), die Datei (file), die Abfrage (query), der Schlüssel (key), das Ergebnis (result). All tech words that came up naturally during real work. All encoded with their articles because the skill file forces der/die/das from day one.

My grocery store interactions got noticeably smoother. Not because ZeroDeutsch taught me food words specifically, but because the pattern of hearing German words in context made my brain more receptive to picking them up in the real world too. The passive exposure seems to lower the activation energy for retaining new words everywhere. This mirrors [personalising AI tone to the user's environment](/resources/ai-for-business-personalisation-tone): when the AI adapts its language to your context, retention improves.

The biggest surprise: I've started reading German signs and menus without consciously switching into "I'm trying to read German" mode. The words just register. That shift from active effort to passive recognition is exactly what immersion research predicts, and it happened faster than I expected.

| Metric | Before | After 3 Weeks |
|--------|--------|---------------|
| Vocabulary recalled (no prompt) | ~15 nouns | ~50 nouns with articles |
| Articles correct (der/die/das) | ~30% | ~70% |
| Daily study time added | 0 min | 0 min (that's the point) |
| Level progression | N/A | Started L1, now comfortable at L2 |

> **In the wild:** Three weeks of passive immersion during normal coding sessions added roughly 35 recallable German nouns with correct articles, with zero dedicated study time.

## Frequently asked questions

**Does this work with languages other than German?**

The concept works for any language, but this specific skill file is built for German. The article system (der/die/das), the vocabulary lists, and the grammar notes are all German-specific. You could fork the file and adapt it for Spanish, French, or Japanese. The structure would transfer. The word selection and grammar rules would need rewriting.

**Will this slow down my actual work?**

At Level 1 and 2, the impact is negligible. You're reading one or two German words per response with English translations right there. At Level 4 and 5, responses take slightly longer to parse. That's the tradeoff: more immersion means more cognitive load. Start at Level 1 and move up only when the current level feels effortless.

**Do I need to know any German before starting?**

No. Level 1 starts with single concrete nouns and always includes the English translation. Complete beginners can use it. The skill file defaults to A1/A2 vocabulary at the lower levels specifically because the target user might be starting from zero.

**Can I use this alongside other Claude Code skills?**

Yes. It won't conflict with anything else. We run it alongside [content creation](/ai-workflows/rag-pipeline-that-works), [deployment scripts](/vps-infra/save-tokens-claude-code-instructions), and code reviews in our ZeroShot Studio environment. The immersion applies to the communication layer only. Technical output stays in English.

**Why articles from day one? Isn't that adding unnecessary complexity?**

German noun gender is the single hardest thing to learn later. If you learn "Tisch" without "der," you'll spend months relearning it as "der Tisch." Every language teacher and textbook recommends learning the article with the noun from the start. ZeroDeutsch enforces this because it's the one thing you can't afford to skip.

---

Ready to turn your coding sessions into German practice? Download the skill file and activate it with "Deutsch an."

---

## OpenAI Acquires Astral: What It Means for Codex and Python Tooling
URL: https://labs.zeroshot.studio/news/openai-acquires-astral-codex-python-tooling
Zone: news
Tags: openai, astral, codex, python, ai-news
Published: 2026-03-22

OpenAI is acquiring Astral, the company behind uv, Ruff, and ty. What the deal means for Codex, Python tooling, and developers who rely on open-source tools.

> **KEY TAKEAWAY**
> * **The Problem:** Coding agents must move beyond simple code generation into full workflow integration with the tools developers already trust.
> * **The Solution:** OpenAI is acquiring Astral (uv, Ruff, ty) to wire its Codex system directly into the Python toolchain developers rely on daily.
> * **The Result:** The fight in AI coding shifts from model quality to workflow ownership, turning Codex from a flashy pair programmer into infrastructure developers can't ignore.

*Last updated: 2026-03-27 · Based on OpenAI's acquisition announcement (March 2026)*

*Source: [OpenAI News announcement](https://openai.com/index/openai-to-acquire-astral/). Cover image: ZeroLabs fallback cover, used because the source site blocked direct image retrieval.*

## Contents

1. [What did OpenAI actually announce?](#what-did-openai-actually-announce)
2. [Why does Astral matter so much in Python?](#why-does-astral-matter-so-much-in-python)
3. [What does this mean for Codex and coding agents?](#what-does-this-mean-for-codex-and-coding-agents)
4. [What should developers watch next?](#what-should-developers-watch-next)
5. [Frequently asked questions](#frequently-asked-questions)
6. [Bottom line](#bottom-line)

## What did OpenAI actually announce?

OpenAI announced that it plans to acquire Astral, the company behind a clutch of widely used open source Python developer tools. The headline names are **uv** for dependency and environment management, **Ruff** for linting and formatting, and **ty** for type safety.

According to OpenAI, the plan is to bring Astral's tooling and engineering team into the Codex system after the deal closes. The stated goal is to make Codex more capable across the full software development lifecycle, not just code generation. In plain English: OpenAI wants its coding agent living closer to the real tools developers already touch every day.

OpenAI also used the announcement to flex its traction numbers: 3x user growth, 5x usage expansion since the start of the year, and more than 2 million weekly active users on Codex. That gives the acquisition obvious context. This is not a side quest. They are pouring fuel on the coding stack.

> **The hot take:** This is less about buying a company and more about buying a position inside the Python development workflow.

## Why does Astral matter so much in Python?

Astral is not some random AI wrapper startup with a landing page and a logo pack. It sits in the guts of modern Python development. If you work in Python, there is a decent chance you already touch its tools directly or indirectly. [Astral's project page](https://astral.sh) lists uv, Ruff, and ty as production-ready tools with millions of downloads.

**uv** helps manage dependencies and environments. **Ruff** is known for being blazingly fast and has become one of the easiest wins for cleaning up Python codebases. **ty** pushes type safety earlier in the workflow. Put those together and you get something more valuable than a flashy demo: infrastructure developers already rely on.

| Tool | What it does | Why it matters | Strategic angle for OpenAI |
|---|---|---|---|
| uv | Dependency and environment management | Touches project setup and package workflows | Gets Codex closer to environment-aware dev work |
| Ruff | Linting and formatting | Already embedded in day-to-day editing loops | Lets AI work with the same quality gates humans use |
| ty | Type safety | Catches issues earlier in the cycle | Helps AI-generated changes become safer and more reviewable |

The bigger point is that Astral owns trusted touchpoints. If AI coding agents want to move beyond autocomplete with delusions of grandeur, they need to operate inside those integration points cleanly. That's where it matters.

## What does this mean for Codex and coding agents?

OpenAI's wording is pretty explicit here. The company says it wants to move beyond systems that simply generate code and toward systems that can plan changes, modify codebases, run tools, verify results, and maintain software over time.

That's the real story. We're watching the coding-agent market move from *output* to *execution*. The old question was, "Can the model write a decent function?" The new one is, "Can the agent survive contact with an actual repo, run the right tools, respect the project constraints, and not make a total mess of things?"

Astral helps with that shift because its tools sit inside the boring but essential parts of development work: dependency resolution, linting, formatting, type checking. Not glamorous. Very important. The stuff that separates toy code from production workflows.

1. **Codex gets closer to real toolchains.** Instead of floating above the editor as a smart suggestion machine, it can move toward operating alongside the tools Python developers already trust.
2. **OpenAI gets workflow ownership.** Owning model quality is one thing. Owning the route through setup, validation, and maintenance is another level entirely.
3. **The competitive bar goes up.** Anthropic, Google, and everyone else building coding agents now have to think harder about toolchain integration, not just benchmark wins.

That's why this announcement matters more than the usual "AI company acquired another AI thing" filler. Astral is not window dressing. It's a workflow wedge. We covered how coding agents wire into CI/CD and review pipelines in [how AI review agents fit into a real content pipeline](/ai-workflows/ai-review-agents-content-pipeline).

## What should developers watch next?

For now, OpenAI says Astral's open source tools will continue to be supported after the deal closes, and both companies remain separate until regulatory approval lands. That should calm the immediate panic about everything getting swallowed into a black-box SaaS blob.

But the interesting part is what comes after closing. The obvious questions are whether Codex gets deeper hooks into uv, Ruff, and ty, how much of that remains open and interoperable, and whether developers end up with genuinely better workflows or just tighter platform gravity.

Because that's the trade here. Better integration can be brilliant. It can also be the velvet rope into someone else's stack. If OpenAI plays it well, this makes coding agents more useful. If it overreaches, it risks annoying the exact developers it wants to win over. If you're thinking about how to keep your own AI tooling tightly scoped rather than sprawling, our walkthrough of [Claude Code hooks for workflow control](/ai-workflows/claude-code-hooks-workflow) is a practical starting point.

> **The hot take:** The next signal is not the acquisition closing. It is the first genuinely useful Codex workflow that feels better because Astral is in the loop.

## Frequently asked questions

**Is OpenAI shutting down Astral's open source tools?**

Not according to the announcement. OpenAI says it plans to keep the open source products supported after the deal closes. The real test will be how open and neutral those tools still feel once deeper Codex integrations start arriving.

**Why is this bigger than a normal acquisition?**

Because Astral is infrastructure, not decoration. Tools like uv and Ruff sit directly in real Python workflows, which gives OpenAI a route into the day-to-day mechanics of software development rather than just the code generation layer.

**What does this mean for developers using Codex?**

Potentially a lot. If OpenAI integrates Astral properly, Codex could become more capable at handling setup, validation, formatting, and maintenance tasks in ways that feel grounded in actual repo workflows instead of just producing plausible-looking code.

## Bottom line

OpenAI buying Astral is a bet that the future of AI coding will be won inside the workflow, not just at the prompt box. Models matter, sure. But tools, habits, and trust matter just as much. That's where developers actually live.

So the story here is simple: OpenAI is trying to make Codex harder to ignore by bringing it closer to the Python stack people already use. If that turns into genuinely better developer workflows, it's a strong move. If it turns into platform lock-in with a fresh coat of agent paint, people will smell it a mile off.

---

Want more AI industry breakdowns without the corporate perfume? Keep an eye on the Labs news feed; we'll keep calling them as they land.

[Browse more Labs news](/news) | [Explore ZeroLabs](/)

---

## Anthropic Gives Claude Code a Telegram and Discord Inbox
URL: https://labs.zeroshot.studio/news/anthropic-claude-code-channels-telegram-discord
Zone: news
Tags: anthropic, claude-code, telegram, discord, ai-news
Published: 2026-03-22

Anthropic ships Claude Code Channels — push Telegram and Discord messages into a live Claude Code session. Here is what it means.

> **KEY TAKEAWAY**
> * **The Problem:** Developers juggle between coding terminals, chat notifications, and alerts, and context-switching kills momentum and attention.
> * **The Solution:** Claude Code Channels bridges persistent coding sessions directly into Telegram and Discord, letting you message the agent from where you already live.
> * **The Result:** Agents become reachable like teammates, not tools you visit, ambient and low-friction enough to fit real work rhythm.

*Last updated: 2026-03-27 · Tested against Claude Code Channels (March 2026)*

## Contents

1. [What did Anthropic actually ship with Claude Code Channels?](#what-did-anthropic-actually-ship-with-claude-code-channels)
2. [Why does Telegram and Discord support matter more than it looks?](#why-does-telegram-and-discord-support-matter-more-than-it-looks)
3. [How does Claude Code Channels work in practice?](#how-does-claude-code-channels-work-in-practice)
4. [How does this compare with other agent workflows?](#how-does-this-compare-with-other-agent-workflows)
5. [What should teams and builders do next?](#what-should-teams-and-builders-do-next)
6. [Frequently asked questions](#frequently-asked-questions)
7. [The bigger shift is the real story](#the-bigger-shift-is-the-real-story)

## What did Anthropic actually ship with Claude Code Channels?

Anthropic has added a new Channels feature to Claude Code that lets supported messaging platforms push events into a running session. At launch, the official docs cover **Telegram** and **Discord**. The setup uses channel plugins and requires a live Claude Code session to stay open in the background.

That last bit is the important part. This is not just "Claude, but now in chat." It is a bridge into an existing coding session. Messages arrive inside the session you already have running, which means the agent can keep context, react to events, and reply back through the same channel.

Anthropic's [Claude Code overview](https://docs.anthropic.com/en/docs/claude-code/overview) and [MCP integration docs](https://docs.anthropic.com/en/docs/claude-code/mcp) make the product direction pretty clear: this is meant to be a tool-connected, terminal-native agent rather than a one-shot chatbot living in a tab. That is why the Channels model matters. It points toward an always-on worker you can poke from wherever you happen to be.

> **The reality:** Anthropic did not just add another integration; it shipped a bridge between messaging apps and a persistent coding session.

### What is a channel in this context?

**What is a channel?** In Anthropic's setup, a channel is an MCP-connected plugin that pushes inbound events into a live Claude Code session and can carry replies back out to the messaging platform.

That matters because it changes the working model. Instead of opening Claude, asking for something, and waiting around, you can leave the agent running and interact with it like a teammate you can ping.

## Why does Telegram and Discord support matter more than it looks?

Telegram and Discord are not random add-ons. They are where a lot of builders already live when they are away from the keyboard, juggling clients, side projects, testing groups, or half-broken deploys at stupid o'clock.

If you can message your coding agent from the same place you already receive alerts, coordinate work, and think in short bursts, the friction drops hard. That sounds small. It is not. Product history is full of these moments where the wrapper changes, and suddenly behaviour changes with it.

We saw this with notifications, with Slack workflows, with GitHub checks, with mobile-first creator tools. Same basic capability, different delivery surface, completely different adoption curve.

For Labs, the interesting part is not just convenience. It is **workflow gravity**. The closer an agent gets to the chat surfaces people naturally check all day, the easier it becomes for that agent to slip into real operational work.

> **The hot take:** The winner in agent UX may not be the model with the flashiest benchmark; it may be the one that fits cleanly into the inboxes and chat threads people already touch 100 times a day.

## How does Claude Code Channels work in practice?

Based on Anthropic's docs, the current flow is pretty straightforward.

1. **Create the bot or app.** For Telegram, that means a BotFather bot. For Discord, it means creating a bot application and granting the right permissions.
2. **Install the official plugin.** Claude Code installs the Telegram or Discord channel plugin.
3. **Configure credentials.** The bot token is stored locally so the plugin can connect.
4. **Restart Claude Code with channels enabled.** The running session starts listening for inbound messages.
5. **Pair and lock down access.** Anthropic documents an allowlist model so only approved senders can push messages into the session.

That security model matters. Messaging an agent from outside the terminal is useful, but it is also the part that can go sideways fast if sender controls are sloppy.

Here is the rough shape of the current release:

| Capability | Claude Code Channels now | What it means |
| --- | --- | --- |
| Messaging surfaces | Telegram, Discord | Mobile and chat-native control surfaces are live |
| Session model | Running local session | Context stays in the active coding session |
| Access control | Pairing plus allowlist | Better guardrails than "anyone can DM the bot" |
| Availability | Research preview | Real, but still early |
| Auth model | claude.ai login | Not yet an API-first enterprise plumbing story |

The docs also make one operational trade-off very clear: events only arrive while the session is open. So if someone wants a truly always-on setup, they still need to keep Claude Code alive in a persistent terminal, background process, or server environment.

That is a very different pitch from a fully hosted cloud agent. But for a lot of developers, it may be the sweet spot: more ambient than a tab, less weird than running a giant self-hosted stack from scratch.

## How does this compare with other agent workflows?

Claude Code Channels matters because it narrows the distance between mainstream commercial tooling and the message-first agent workflows that have mostly lived in hacker territory. If you want to see what that hacker-territory version looks like in practice, the [Claude Code hooks workflow](/ai-workflows/claude-code-hooks-workflow) post covers how persistent, event-driven agent loops work before the big labs productise them.

Here is the useful comparison:

| Approach | Strength | Weak spot | Best fit |
| --- | --- | --- | --- |
| Chat tab assistant | Fast, familiar | Stateless, interrupt-driven | Quick one-off help |
| Terminal coding agent | Deep control | Tied to the machine and session | Hands-on build work |
| Hosted cloud agent | Easier remote access | Less local control | Managed workflows |
| Message-first coding agent | Ambient, low friction | Needs guardrails and persistence | Ongoing ops, fast intervention |

The Labs angle here is simple: the product category is maturing. The big players are absorbing patterns that previously felt fringe, duct-taped, or enthusiast-only.

That does not mean every open workflow loses. It does mean the differentiation bar just got higher. Once a major lab turns chat-driven persistence into a native feature, the conversation shifts from "is this weird but clever?" to "which version of this model fits our stack, trust boundary, and habits best?"

## What should teams and builders do next?

If you are building with coding agents already, this is worth paying attention to right now.

1. **Audit where your workflow actually lives.** If most of your real interruptions and approvals happen in chat, a message-first agent loop may save more time than another IDE add-on.
2. **Treat access control seriously.** Pairing codes and allowlists are not optional garnish. They are the difference between useful and reckless.
3. **Decide how much persistence you really want.** A background session is powerful, but it also changes risk, cost, and operational expectations.
4. **Watch the connector system.** Anthropic has started with Telegram and Discord. If this sticks, more surfaces will follow.
5. **Plan for ambient agents, not just better chats.** The bigger design question is how work finds the agent, not just how the user opens it.

For Labs readers, the signal is not "everyone should switch tomorrow." The signal is that message-driven software agents are becoming a first-class product pattern. That has implications for tooling, team process, security, and where product moats get built next. For a concrete look at how Labs itself uses agents end-to-end, see [how Claude published directly to Labs via MCP](/openclaw/how-claude-published-directly-to-labs-via-mcp) and [how the AI review agents content pipeline works](/ai-workflows/ai-review-agents-content-pipeline).

## Frequently asked questions

**Is Claude Code Channels just a chatbot inside Telegram or Discord?**

No. The feature is designed to push messages into a running Claude Code session, not spin up a fresh generic chat every time. The point is continuity with an active coding context.

**Is this available broadly or still early?**

Anthropic describes Channels as a research preview. It is real and documented, but it is not positioned like a fully mature general-availability enterprise feature yet.

**Why does this matter for the wider AI industry?**

Because it shows the product direction. Big labs are moving beyond tab-bound assistants and toward agents that sit closer to where work actually happens: chat, alerts, and asynchronous coordination.

**Does this mean always-on coding agents are now mainstream?**

Not fully, but we are clearly moving that way. The gap between experimental agent workflows and mainstream product design just got smaller.

## The bigger shift is the real story

Claude Code Channels is the kind of launch that looks modest if you only read the feature list. Telegram. Discord. Plugins. Fine. Whatever.

Look again and the shape is clearer. Anthropic is betting that coding agents should be reachable like teammates, not visited like tools. That is a bigger shift than the release notes make it sound.

For Labs, that is the story worth tracking. We are watching agents move out of the tab, off the pedestal, and into the background rhythm of work. Once that happens, adoption is less about dazzling demos and more about habit, trust, and whether the thing is there when your hands are full and your laptop is nowhere near you.

---

Want the practical version, not just the headlines? We track the shifts that actually change how people build with AI, then turn them into plain-English breakdowns for teams trying to keep up without swallowing marketing sludge.

[Read more Labs coverage](/) | [Subscribe to the newsletter](/newsletter)

---

## How to Run a Security Audit on Your Vibe-Coded App
URL: https://labs.zeroshot.studio/ai-workflows/how-to-run-a-security-audit-on-your-vibe-coded-app
Zone: ai-workflows
Tags: security, vibe-coding, nextjs, checklist, web-security
Published: 2026-03-20

Most vibe-coded apps have security gaps hiding in plain sight. This guide walks through a practical 6-area security audit you can run yourself.

AI-generated code is functional but rarely defensive. I ran a full security audit on one of my own production Next.js apps and found leaked server fingerprints, missing rate limits, and API endpoints that revealed whether accounts existed. None of it was catastrophic, but all of it was the kind of stuff an attacker finds before you do.

This post walks through every check I ran across six areas: headers, auth, API boundaries, rate limiting, data exposure, and Content Security Policy. No pentesting certification required. Just curl, a browser, and the 40+ item checklist you can download at the end.

> **KEY TAKEAWAY**
> * **The Problem:** Vibe-coded apps often have security gaps hiding in plain sight: leaked server logs, missing rate limits, headers that fingerprint your stack, and API endpoints that leak whether an account exists.
> * **The Solution:** Run a practical security audit yourself using curl, a browser, and a downloadable checklist covering 40+ items across headers, auth, API design, rate limiting, and data exposure.
> * **The Result:** Find and fix security vulnerabilities before attackers do, with an audit process that takes just a few hours.

*Last updated: 2026-03-27 · Tested against Next.js 15, curl 8.x, and common web frameworks*

## Why should vibe coders care about security?

When you prompt Claude or ChatGPT to build a login page, it builds a login page. It won't add rate limiting. It won't consider what happens when someone hammers that endpoint 10,000 times. And it certainly won't strip server fingerprints from your response headers.

That's not a criticism of AI tools. They do exactly what you ask. The problem is that security lives in the space between what you asked for and what an attacker will try.

> **The hard rule:** Security isn't a feature you add later. It's the difference between an app that works and an app that's safe to put in front of real users.

## What does a security audit actually look like?

You don't need expensive consultants or enterprise tools. A solid first-pass audit is something you can do yourself. Here's the approach I use:

1. **Headers audit**: Check what your server tells the world about itself
2. **Auth surface check**: Test login, signup, and session handling for leaks
3. **API boundary review**: Verify that protected routes actually protect
4. **Rate limit verification**: Confirm abuse controls are real, not theoretical
5. **Data exposure scan**: See what your public endpoints reveal
6. **Content Security Policy review**: Verify your browser-side defences

Six different angles on the same question: "If someone poked at this with bad intentions, what would they find?"

## How do I check my security headers?

Open your terminal and run this against your production URL:

```bash
# Example: Check response headers for security misconfiguration
curl -sI https://your-app.com/ | grep -iE "content-security|strict-transport|x-frame|x-content-type|x-powered|server"
```

You're looking for a few specific things.

**Headers you want to see:**

| Header | What It Does | Why It Matters |
|--------|-------------|----------------|
| `Strict-Transport-Security` | Forces HTTPS with long cache | Stops downgrade attacks |
| `X-Content-Type-Options: nosniff` | Stops MIME sniffing | Blocks code injection via file confusion |
| `X-Frame-Options: DENY` | Blocks iframe embedding | Stops clickjacking |
| `Content-Security-Policy` | Controls what scripts/styles can run | Your main XSS defence |
| `Referrer-Policy` | Limits what URL info gets shared | Stops private path leakage |
| `Permissions-Policy` | Restricts browser APIs | Blocks camera/mic/location abuse |

**Headers you don't want to see:**

- `X-Powered-By: Next.js` or `Express` or anything else. This tells attackers exactly what framework you're running and which CVEs to try.
- `Server: Apache/2.4.51`. Same problem. Strip it or genericise it.

In my audit, I found `X-Powered-By: PleskLin` leaking from the nginx layer, and the framework header coming through the proxy. Two lines of config fixed both.
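I won't reproduce my exact config here, but the usual shape is one flag on the framework side plus two nginx directives on the proxy side. A minimal sketch:

```javascript
// next.config.js — stop Next.js from sending X-Powered-By at all.
// On the proxy side, the nginx equivalents are `server_tokens off;`
// (hides the version in the Server header) and
// `proxy_hide_header X-Powered-By;` in the relevant server block.
module.exports = {
  poweredByHeader: false,
};
```

Re-run the curl check from the top of this section after deploying; the fingerprint headers should simply be gone.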

![Terminal output from a live header audit against labs.zeroshot.studio showing security headers, CSP, and exposed server fingerprints.](/api/images/security-audit-proof.png)

> **The hard rule:** Your response headers are a free recon tool for attackers. Strip anything that identifies your stack.

## Is my login flow actually secure?

This is where most vibe-coded apps fall apart. The login works, so it feels secure. But there are three specific things that "working" doesn't cover. I check these on every app I ship, and I still find gaps.

### Can someone enumerate accounts through your signup?

Try signing up with an email that already exists, then with a fresh one. If you get different HTTP status codes (say, 409 for existing and 201 for new), an attacker can map out every registered email address on your platform. They don't need to guess passwords. They just need to know who's there.

**The fix:** Return identical responses for both cases. Same status code, same body, same timing. Something like:

```json
// Example: Return identical responses for signup success and email-already-exists
{ "ok": true, "message": "If this email can be used, check your inbox for next steps." }
```

Always 200. Always the same. The server knows the difference, the attacker doesn't.
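A minimal sketch of that handler shape in plain Node (hypothetical names, not the exact code from my audit): the branching happens server-side, the response never changes.

```javascript
// Hypothetical signup handler: the response is byte-identical whether the
// email is taken or fresh. Only the internal side effect differs.
const existingEmails = new Set(["taken@example.com"]);

function handleSignup(email) {
  if (existingEmails.has(email)) {
    // Internally: send a "you already have an account" email.
  } else {
    // Internally: create the account and send a verification email.
  }
  // Externally: one answer, always.
  return {
    status: 200,
    body: { ok: true, message: "If this email can be used, check your inbox for next steps." },
  };
}
```

Pair this with timing-safe server work so response latency doesn't give the game away either.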

### Can someone brute-force your login?

Try logging in with wrong credentials five times. Then ten times. Then twenty. Did anything change? Did you get rate limited? Did the response slow down?

If the answer is "no, it just kept accepting attempts," you have a problem. A solid auth setup needs at minimum:

1. **Per-IP rate limiting** on auth endpoints (10 attempts per 10 minutes is reasonable)
2. **Per-account lockout** after repeated failures (5 failures locks for 15 minutes). If you're using Claude Code hooks to enforce patterns like this, see [how to wire Claude Code hooks for workflow automation](/ai-workflows/claude-code-hooks-workflow)
3. **Progressive delay** that gets slower with each failure (prevents rapid automated stuffing)
4. **Timing-safe responses** for unknown users (so attackers can't tell valid from invalid emails by response time)
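Here's a toy sketch of points 1 and 3 together. It's illustration only: the thresholds are the ones suggested above, and a real deployment needs a shared store, not process memory.

```javascript
// Toy per-IP limiter with progressive delay. In-memory, so it resets on
// restart; back it with Redis or similar in production.
const attempts = new Map(); // ip -> { fails, lockedUntil }

function checkAuthAttempt(ip, now = Date.now()) {
  const entry = attempts.get(ip) ?? { fails: 0, lockedUntil: 0 };
  if (now < entry.lockedUntil) {
    return { allowed: false, retryInMs: entry.lockedUntil - now };
  }
  entry.fails += 1;
  attempts.set(ip, entry);
  if (entry.fails > 10) {
    entry.lockedUntil = now + 10 * 60 * 1000; // lock for 10 minutes
    return { allowed: false, retryInMs: 10 * 60 * 1000 };
  }
  // Progressive delay: every failure makes the next attempt a bit slower.
  return { allowed: true, delayMs: Math.min(entry.fails * 250, 2000) };
}
```

Call it on every failed login, and clear the entry (`attempts.delete(ip)`) on success.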

### Do your error messages leak information?

"Invalid password" tells an attacker the email exists. "Account not found" tells them it doesn't. Your login should return the same generic message regardless: "Invalid credentials."

## How do I verify my API boundaries?

If your app has authenticated routes, you need to verify two things: that unauthenticated users can't access them, and that authenticated users can only access their own stuff.

The first check is simple:

```bash
# Example: Verify protected routes return 401 for unauthenticated requests
# Should return 401, not 200
curl -s -o /dev/null -w "%{http_code}" https://your-app.com/api/v1/ideas
curl -s -o /dev/null -w "%{http_code}" https://your-app.com/api/v1/export
curl -s -o /dev/null -w "%{http_code}" https://your-app.com/api/v1/account
```

If any of those return 200 without a valid session, you've got an open door.

The second check is harder and more important. This is BOLA/IDOR testing (listed as a critical risk in the [OWASP Top 10](https://owasp.org/www-project-top-ten/)): can User A access User B's data by swapping an ID in the URL?

For every route that takes a resource ID, check:
- Does it verify the authenticated user owns that resource?
- Does it use a compound query (`WHERE id = :id AND userId = :userId`) or just fetch by ID?
- If it's a public endpoint, does it filter by visibility before returning data?

> **The reality:** Authentication checks whether you're logged in. Authorisation checks whether you can access this specific thing. Most AI-generated code handles the first and forgets the second.
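As a sketch with a fake in-memory table (hypothetical field names), the compound lookup looks like this:

```javascript
// Hypothetical BOLA-safe fetch: the query binds both the resource id and the
// authenticated user's id, so "not yours" and "does not exist" look identical.
const ideas = [
  { id: 1, userId: "alice", title: "Ship it" },
  { id: 2, userId: "bob", title: "Secret plan" },
];

function getIdea(ideaId, userId) {
  const row = ideas.find((r) => r.id === ideaId && r.userId === userId);
  return row ? { status: 200, body: row } : { status: 404 };
}
```

In SQL terms this is the `WHERE id = :id AND userId = :userId` shape, not a fetch-by-id followed by an ownership check you might forget to write.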

## Are my rate limits actually working?

Having rate limit code in your codebase and having working throttling are different things. Test them:

```bash
# Example: Test rate limiting by sending 65 rapid requests to an endpoint
# Send 65 rapid requests to an endpoint
for i in $(seq 1 65); do
  curl -s -o /dev/null -w "%{http_code} " https://your-app.com/api/v1/tags
done
```

You should see 200s turn into 429s. If all 65 return 200, your rate limiting is either misconfigured or not running in production.

Common gotchas I found:

- **In-memory rate limiting** resets when your server restarts or scales. Use Redis or an external store. For related cost controls, see [saving tokens with Claude Code instructions](/vps-infra/save-tokens-claude-code-instructions).
- **Fallback gaps**: Some rate limiters silently return "allowed" when the backing store is unavailable instead of falling back to an in-memory limiter. That's a wide-open door during an outage.
- **Missing global limit**: You might have rate limits on specific endpoints but no blanket per-IP limit on all API routes. An attacker targets the unprotected ones.
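The fallback gap is worth seeing in code. A sketch of the safe shape, with made-up limits, showing a limiter that drops to an in-memory sliding window when the shared store is unreachable instead of silently allowing everything:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 60

# In-memory fallback store: per-IP deque of request timestamps.
fallback_hits = defaultdict(deque)

def allow_in_memory(ip, now=None):
    """Sliding-window limiter used when the backing store is down."""
    now = now if now is not None else time.time()
    hits = fallback_hits[ip]
    while hits and hits[0] <= now - WINDOW_SECONDS:
        hits.popleft()          # drop timestamps outside the window
    if len(hits) >= MAX_REQUESTS:
        return False            # over the limit: return 429 upstream
    hits.append(now)
    return True

def allow(ip, redis_check=None):
    """Try the shared store first; on failure, fall back instead of allowing all."""
    if redis_check is not None:
        try:
            return redis_check(ip)
        except ConnectionError:
            pass                # store is down: do NOT silently allow
    return allow_in_memory(ip)
```

The whole point is the `except` branch: it routes to the in-memory limiter rather than returning "allowed", so an outage in the store does not become an outage in your throttling.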

## What is my public API leaking?

Public APIs serve public data. That's fine. But are you serving more than you need to?

Check your public feed or listing endpoints. Look for:

- **Internal identifiers**: Hashes, internal IDs, or database-level metadata that doesn't serve the user
- **Private activity timing**: Does an activity heatmap include private or draft content in its counts? That leaks when someone is working on something even if they haven't published it
- **Uncapped pagination**: Can someone request 10,000 items at once and dump your entire database?
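Both fixes reduce to a few lines of server-side code. A sketch with hypothetical field names, showing an explicit allow-list serialiser plus a clamped page size:

```python
MAX_PAGE_SIZE = 50

# Allow-list of fields the frontend actually renders (hypothetical names).
PUBLIC_FIELDS = ("id", "title", "published_at")

def clamp_page_size(requested, default=20):
    """Cap pagination so nobody dumps the table in one request."""
    try:
        size = int(requested)
    except (TypeError, ValueError):
        return default
    return max(1, min(size, MAX_PAGE_SIZE))

def serialise_public(record):
    # Explicit allow-list: internal hashes and metadata never reach the response.
    return {k: record[k] for k in PUBLIC_FIELDS if k in record}

row = {"id": 7, "title": "Hello", "published_at": "2026-01-01",
       "identity_hash": "deadbeef", "internal_rank": 3}
print(serialise_public(row))   # internal fields are stripped
print(clamp_page_size(10000))  # 50
```

Allow-listing beats block-listing here: a new internal column added next month stays private by default instead of leaking until someone notices.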

In my audit, I found a public feed returning cryptographic metadata hashes and internal identity hashes in every response. Neither served the user. Both gave an attacker a fingerprinting tool they didn't need.

**The fix:** Only return what the frontend actually renders. Strip everything else. Cap page sizes (50 items is plenty).

## What should my Content Security Policy look like?

CSP is your last line of defence against cross-site scripting. If an injection bug ever makes it into your code, a strong CSP stops the attacker's script from running.

The critical directive is `script-src`. You want it to say one of these:

```text
// Example: CSP script-src directives from most to least secure
script-src 'self' 'nonce-abc123'     ← Good: only scripts with this one-time token run
script-src 'self' 'strict-dynamic'   ← Good: only explicitly trusted scripts run
script-src 'self' 'unsafe-inline'    ← Bad: any injected script runs
```

If your CSP includes `'unsafe-inline'` in `script-src`, an XSS vulnerability bypasses your entire policy. For a Next.js app, generate a per-request nonce in middleware:

1. Generate a random nonce in your middleware (Edge Runtime compatible)
2. Set the CSP header with that nonce instead of `unsafe-inline`
3. Forward the nonce to your layout via a request header
4. Next.js automatically injects the nonce into its script tags

The result: every page load gets a unique, unpredictable token. An injected script without that token gets blocked by the browser.
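The framework wiring varies, but the core mechanism is tiny. A framework-agnostic sketch of the nonce-and-header step (the real version lives in your middleware; names here are illustrative):

```python
import secrets

def build_csp():
    """Generate a per-request nonce and a script-src policy that uses it."""
    nonce = secrets.token_urlsafe(16)   # unpredictable, fresh on every request
    csp = f"script-src 'self' 'nonce-{nonce}' 'strict-dynamic'"
    return nonce, csp

nonce, csp = build_csp()
assert "'unsafe-inline'" not in csp     # the policy never weakens itself
assert f"'nonce-{nonce}'" in csp        # only scripts carrying the token run
```

Two requests never share a nonce, which is what makes the token useless to an attacker who saw a previous page load.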

> **The hard rule:** If your CSP includes `'unsafe-inline'` in `script-src`, an XSS vulnerability bypasses your entire policy. Use nonce-based script policies instead.

## What about the "obvious" stuff that gets missed?

Some findings from my audit that felt obvious in hindsight but were genuinely hiding in production:

- **Dev server logs in user content**: A test record created during QA had raw Next.js compilation output pasted as its body. That was rendering on the public page, leaking file paths, route structures, and build info.
- **Seed scripts without production guards**: Test data scripts that would happily run against the production database. A one-line environment check would have prevented it.
- **robots.txt as a sitemap for attackers**: A well-meaning robots.txt that disallowed `/api/v1/ideas/`, `/api/v1/vaults/`, `/api/v1/api-keys/` essentially listed every sensitive endpoint. Disallow directives are for search engines, not security. Attackers read them as a target list.
- **S3 bucket listing enabled**: Object storage configured with `ListBucket` permission when only `GetObject` was needed. Anyone could enumerate every file in the bucket.
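The seed-script guard really is a one-liner. A minimal sketch, assuming an `APP_ENV` environment variable (swap in whatever your stack uses):

```python
import os
import sys

def guard_against_production():
    """Abort seed/test scripts unless explicitly pointed at a non-production env."""
    env = os.environ.get("APP_ENV", "production")   # default to the safe assumption
    if env == "production":
        sys.exit("Refusing to run seed script against production. "
                 "Set APP_ENV=development to proceed.")

# Call guard_against_production() before any database writes in a seed script.
```

Note the default: an unset variable is treated as production, so the script fails safe when someone runs it on a box they forgot to configure.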

## The Security Audit Checklist

I've put together a 40+ item checklist covering everything in this post. It's broken into six sections matching the audit flow, and each item is something you can check yourself without special tools.

**Download it, run through it this weekend, and fix what you find before someone else finds it for you.**

## Frequently Asked Questions

**I'm using a framework like Next.js that handles a lot of this. Am I already safe?**

Frameworks give you a solid foundation, but they don't add rate limiting, strip server fingerprints, tighten CSP beyond defaults, or enforce ownership checks on your API routes. Those are your responsibility. The framework handles the plumbing. Security is the lock on the door.

**How often should I run a security audit?**

At minimum, before your first real users and after any significant feature addition. Auth changes, new API endpoints, and new integrations are all triggers. A quarterly check using the checklist takes about an hour and catches drift.

**Do I need to hire a professional pentester?**

Not for your first pass. The checklist in this post catches 80% of common web app vulnerabilities. If you're handling payments, medical data, or anything with regulatory requirements, yes, get a professional. But for a side project or early-stage product, start here.

**My AI coding tool says the code is secure. Can I trust that?**

No. AI tools generate functional code, not hardened output. They build what you ask for and miss what you don't. Security is the gap between "does it work?" and "what happens when someone tries to break it?"

**What's the single most impactful thing I can do right now?**

Run `curl -sI https://your-app.com/` and read every header. It takes 30 seconds and often reveals more than you'd expect. If you see `X-Powered-By` or `unsafe-inline` in your CSP, those are your first two fixes.

## Run It Before Someone Else Does

Your users trust you with their data, their accounts, their content. That trust is worth an afternoon.

The gap between "my app works" and "my app is safe" is smaller than you think: a few hours with the checklist, some config changes, a couple of code fixes. Run the audit this weekend. What you find will either reassure you or save you.

---

**Ready to audit your own app?** Grab the checklist and the companion AI agent file from the downloads below. The agent file turns Claude or ChatGPT into a security auditor that walks you through each check step by step.

If this raised questions about hardening production infrastructure, the [VPS & Infra zone](/vps-infra) has more on running secure self-hosted stacks. We publish one practical guide a week.

[Subscribe to the newsletter]

---

*Published on labs.zeroshot.studio by Jimmy Goode. Last updated March 2026.*

---

## How to Save Thousands of Tokens Per Message in Claude Code
URL: https://labs.zeroshot.studio/vps-infra/save-tokens-claude-code-instructions
Zone: vps-infra
Tags: claude-code, tokens, optimization, AI-tools
Published: 2026-03-19

Your CLAUDE.md is probably costing you thousands of messages a month. I cut mine from 40K to 4.8K characters. Here is how.

Your CLAUDE.md is probably costing you thousands of messages a month. Mine was 40,000 characters of server management instructions, injected into every single message. I cut it to 4,800 characters and lost zero functionality. Here's exactly how, and the maths on what it saved.

> **KEY TAKEAWAY**
> * **The Problem:** CLAUDE.md bloat gets injected as context with every message, silently eating your token budget; a 40,000-character file costs ~10,000 input tokens per message.
> * **The Solution:** Restructure instructions using a 4-tier architecture: hooks (always active, zero tokens), skills (loaded on demand), CLAUDE.md (tiny and core only), and operational specs (read when needed).
> * **The Result:** Reduced from 948 lines to 89 lines with 88% size reduction; same functionality and safety, but ~8,800 fewer tokens per message and 2,500–3,000 extra messages per month on a Pro plan.

*Last updated: 2026-03-27 · Tested against Claude Code v0.2.29*

## Why Does Your CLAUDE.md Size Matter?

Every time you send a message in [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview), your project's CLAUDE.md gets loaded into the conversation context. Every single message. Fresh input tokens, charged against your usage, on every interaction.

For most people with a small CLAUDE.md, this is negligible. A couple of hundred characters of "use TypeScript, prefer functional components" costs basically nothing.

But if you're doing anything serious with Claude Code, your CLAUDE.md grows. Fast.

Mine managed a production VPS with 20+ Docker containers, multiple databases, deployment pipelines, safety hooks, git standards, monitoring systems, and coordination protocols for multi-device access. It had grown organically over weeks into a 948-line, 40,000-character beast.

> **The hard rule:** If your CLAUDE.md is over 5,000 characters, you're burning thousands of tokens per message on instructions that could be loaded on demand instead.

## How Much Does a Bloated CLAUDE.md Actually Cost?

Let's do the maths. One token is roughly 4 characters of English text.

| Metric | Before | After | Savings |
|--------|--------|-------|---------|
| CLAUDE.md size | 40,175 chars | 4,856 chars | 88% smaller |
| Tokens per message | ~10,000 | ~1,200 | ~8,800 fewer |
| Monthly token burn (1,350 messages) | ~13.6M | ~1.6M | ~12M tokens saved |
| Extra messages per month | baseline | +2,500 to 3,000 | Real conversations, not wasted context |

That last row is the one that matters. Those 12 million tokens I was burning on repeated instructions could instead be actual work. Actual code reviews, actual deployments, actual problem-solving.

On a [Pro plan](https://www.anthropic.com/claude/pricing) where you're bumping up against usage limits, that's the difference between running out of messages at 3pm and having capacity left for evening work.

> **The reality:** A 40K CLAUDE.md costs roughly 10,000 input tokens per message. Over a month of active use, that's 12 million tokens you're paying for instructions the model already knows.
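The table's figures follow directly from the 4-characters-per-token heuristic. A quick sketch to reproduce them:

```python
CHARS_PER_TOKEN = 4          # rough heuristic for English text
MESSAGES_PER_MONTH = 1350

before_chars, after_chars = 40_175, 4_856

before_tokens = before_chars // CHARS_PER_TOKEN   # ~10,000 per message
after_tokens = after_chars // CHARS_PER_TOKEN     # ~1,200 per message
monthly_saved = (before_tokens - after_tokens) * MESSAGES_PER_MONTH

print(before_tokens, after_tokens)
print(f"{monthly_saved / 1e6:.1f}M tokens saved per month")  # ~11.9M
```

Real tokenisers will land a little off these numbers, but at this scale the precision does not matter: the order of magnitude is the argument.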

## What Was Actually in Those 948 Lines?

Before I could cut anything, I had to understand what was in there and why. Here's what my CLAUDE.md had accumulated:

**Safety rules** that hooks already enforce (database protection, destructive command blocking, git safety). The hooks block dangerous operations regardless of what CLAUDE.md says. Having the rules written out in CLAUDE.md was belt-and-suspenders, costing tokens for zero additional safety.

**Procedure documentation** for deployments, backups, and git workflows. These were step-by-step instructions that only matter when you're actually doing that operation, not on every single message.

**Infrastructure reference** for every service, every port, every connection string. Useful when connecting to ZeroMemory's database. Completely irrelevant when writing a commit message.

**Protocol documents** for blackboard coordination, lessons-learned workflows, and code quality standards. Important, but not 200 lines of important-every-message.

**SSH commands, MCP configuration, slash command tables, agent role descriptions.** All things that either self-document at runtime or only matter during specific operations.

The pattern was clear: about 10% of the content was needed every message (identity, core rules, routing). The other 90% was needed occasionally, during specific operations.

![Terminal output showing the ZeroVPS instruction layout split across the core guide, supporting docs, and on-demand skills.](/api/images/instruction-architecture-proof.png)

## How I Structured the Fix: 4-Tier Instruction Architecture

I built what I'm calling a tiered instruction system. The core idea: only load what you need, when you need it.

### Tier 1: Hooks and Settings (Always Active, Zero Tokens)

These are the deterministic safety rails. They run as code before and after every tool call. No CLAUDE.md text needed.

- **PreToolUse hooks** block dangerous commands (DROP TABLE, rm -rf, force push)
- **Settings files** define permission boundaries (what's allowed, what requires confirmation)
- **PostToolUse hooks** log changes automatically

For a deeper look at how hooks fit into a production workflow, see [Claude Code hooks in practice: building safety rails that actually work](/ai-workflows/claude-code-hooks-workflow).

If a hook blocks something, it blocks it. Doesn't matter what CLAUDE.md says. So remove all the safety documentation from CLAUDE.md and let the hooks do their job.

**Token cost: 0.** Hooks run as shell scripts, not as context tokens.

### Tier 2: Skills and Agents (Loaded On Demand)

Skills are markdown files that get loaded only when invoked. My `/deploy` skill has the full blue-green deployment procedure, SSH commands, health check logic, and rollback steps. But it only enters the context when someone types `/deploy`. If you want to see skills working inside a full multi-agent review pipeline, [this breakdown of the AI review agents content pipeline](/ai-workflows/ai-review-agents-content-pipeline) covers exactly that.

Same with `/backup`, `/release`, `/git-workflow`, and 20 other skills. Each one is self-contained with all the context it needs.

**Token cost: 0 until invoked, then only that skill's tokens.**

### Tier 3: CLAUDE.md (Every Message, Keep It Tiny)

This is the only tier that costs tokens on every message. So it contains only:

- **Identity**: What server, what IP, what SSH aliases
- **Doctrine**: The 12 core rules that actually need to be in every conversation
- **Routing**: Which skill or document to read for which operation
- **Pointers**: File paths to Tier 4 docs for detailed reference

That's it. No procedures. No infrastructure detail. No protocol documents. Just enough to route correctly and make good decisions.

### Tier 4: Operational Specs (Read When Needed)

Everything else lives in standalone documents that get read on demand:

- `docs/code-quality-standards.md` for anti-patterns and testing rules
- `docs/zeromemory-session-protocol.md` for memory management
- `state/connections.md` for every port, database, tunnel command, and API endpoint
- `state/blackboard.protocol.md` for coordination rules
- `docs/config-sync.md` for repo-to-server synchronisation

The model reads these files when the task requires them. Writing code? It'll read the quality standards. Deploying? It'll read the connection reference and deployment skill. Just answering a question about git? It reads nothing extra.

**Token cost: 0 unless the specific document is needed.**

## How I Actually Did the Migration

This wasn't a weekend project, but it wasn't months of work either. Here's the process:

1. **Backed up the original.** Copied the full 948-line CLAUDE.md to an archive. Non-negotiable first step.

2. **Audited every section.** For each block of content, I asked: "Is this enforced by a hook? Is this in a skill? Does this need to be in every message?" If no to all three, it got extracted.

3. **Extracted operational specs.** Moved code quality standards, current-project workflow rules, and detailed protocols into standalone docs.

4. **Added preflight reads to skills.** Each skill got explicit "read this file before starting" steps, so they pull their own context.

5. **Wrote the new CLAUDE.md.** 89 lines. Doctrine, routing, pointers. Nothing else.

6. **Ran a full simulation.** Spawned 5 test agents to verify every routing path worked, every pointer resolved, every safety hook still blocked what it should.

7. **Iterated on gaps.** The simulation found issues. Missing pointers, overly aggressive hook patterns, lost operational knowledge. Fixed them all.

The simulation step was important. Without it, I would have shipped blind spots. The agents traced 15 common scenarios ("deploy an app", "create a backup", "troubleshoot a failing service") and flagged every dead end.

## What Could Go Wrong (and How I Handled It)

The obvious risk: if you remove instructions from CLAUDE.md, the model might not know to read the replacement document.

This is real. My simulation found that the model couldn't discover health monitor configuration because there was no CLAUDE.md pointer to it. It found that ZeroMemory's architecture details were orphaned in a memory file with no route from CLAUDE.md.

**The fix is pointers.** Every operational spec needs a one-line entry in CLAUDE.md that tells the model what it is and where to find it. Without the pointer, the document might as well not exist.

The other risk: safety rules that were instructional, not enforced. My old CLAUDE.md said "never commit .env files." But no hook actually blocks that. Removing the text from CLAUDE.md would remove the instruction entirely.

**The fix is honesty.** My new Safety section explicitly says which rules are hook-enforced and which require discipline. No pretending everything is automated when it isn't.

## How to Do This for Your Own Project

You probably don't need the full 4-tier architecture. But you can apply the principle immediately:

**1. Measure your CLAUDE.md.** Run `wc -c CLAUDE.md`. If it's over 5,000 characters, you have room to optimise.

**2. Categorise every section.** For each block, label it:
- **ALWAYS** (identity, core rules, routing decisions)
- **SOMETIMES** (procedures for specific operations)
- **RARELY** (reference material, detailed specs)

**3. Move SOMETIMES and RARELY content to separate files.** Put them in `docs/` or wherever makes sense for your project. We ended up with six standalone docs covering deployment, connections, quality standards, memory protocol, config sync, and blackboard coordination.

**4. Replace with pointers.** One line in CLAUDE.md: `- Database setup guide: docs/database-setup.md`

**5. Test the routing.** Ask Claude to do something that requires the moved content. Does it know to read the file? If not, adjust your pointer text.

That's it. You don't need hooks or skills or a simulation framework. Just measure, categorise, extract, pointer, test.
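If you want the measure step as a reusable script rather than a one-off `wc -c`, here is a small sketch (the 5,000-character threshold and 4-chars-per-token heuristic are the ones used throughout this post):

```python
from pathlib import Path

CHARS_PER_TOKEN = 4
THRESHOLD = 5_000   # chars; above this, start extracting

def audit_instructions(path="CLAUDE.md"):
    """Report size and rough per-message token cost of an instructions file."""
    text = Path(path).read_text(encoding="utf-8")
    chars = len(text)
    verdict = "extract and pointer" if chars > THRESHOLD else "fine as-is"
    return {
        "chars": chars,
        "approx_tokens_per_message": chars // CHARS_PER_TOKEN,
        "verdict": verdict,
    }
```

Run it against both your project CLAUDE.md and the global one in `~/.claude/`; the same threshold applies to each, since both are injected every message.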

## Frequently Asked Questions

**Won't the model lose context if I move instructions out of CLAUDE.md?**

Only if you don't leave a pointer. The model follows file path references. A one-line pointer costs ~20 tokens. The full procedure it replaces might cost 2,000. That's a 99% saving on content that's only needed occasionally.

**How do I know which rules need to stay in CLAUDE.md?**

If the rule affects every interaction (like "use TypeScript" or "always read the manifest first"), it stays. If it only matters during specific operations (like "here's the backup procedure"), it moves to a separate file.

**Does this work with Claude Pro, not just Claude Code?**

The principle applies anywhere you're injecting system prompts or project instructions. If you're pasting a long system prompt into every conversation, the same categorise-and-extract approach works. The savings are proportional to how bloated your instructions are.

**What about the global CLAUDE.md in ~/.claude/?**

Same principle applies. That file also gets loaded every message. If it's large, apply the same tier approach. Keep identity and routing in the global file, move detailed instructions to project-level docs.

**Is 89 lines the right target?**

There's no magic number. The target is: only content that genuinely needs to be in every message. For me that was 89 lines. For a simpler project it might be 20. For a complex multi-service platform it might be 150. The principle is the same regardless.

## The Numbers

Before and after:

| What | Before | After |
|------|--------|-------|
| CLAUDE.md | 948 lines, 40K chars | 89 lines, 4.8K chars |
| Token cost per message | ~10,000 | ~1,200 |
| Monthly token savings | n/a | ~12 million |
| Extra messages per month | n/a | ~2,500 to 3,000 |
| Functionality lost | n/a | Zero |
| Safety degraded | n/a | No (hooks enforce it) |
| New docs created | n/a | 6 operational specs |
| Connection reference | Scattered across 5 files | Single consolidated file |

88% reduction. Same safety. Same functionality. Thousands more messages per month.

If you're spending real money on Claude usage, or constantly hitting rate limits, this is low-hanging fruit. Your CLAUDE.md is probably the biggest single source of wasted tokens in your workflow, and you can fix it in an afternoon.

The migration took me about three hours, including the simulation pass. The savings showed up immediately on the first day of normal use. Not a particularly clever solution: just measuring something that had been invisible, then restructuring it. Most optimisation problems in AI tooling are like that.

**Ready to try this yourself?** Download the companion agent file below. Drop it into Claude Code and it'll walk you through auditing and restructuring your own CLAUDE.md, step by step.

[Download: jimmy-goode-zerotoken-claude-instructions.md]

**Want more practical guides like this?** Join the newsletter for weekly content on building with AI tools. No theory, just the stuff that actually saves you time and money.

[Subscribe to the newsletter]

Measure your CLAUDE.md now. Run `wc -c CLAUDE.md`. If it's over 5,000 characters, the fix is waiting.

---

## How Claude Published Directly to Labs via MCP
URL: https://labs.zeroshot.studio/openclaw/how-claude-published-directly-to-labs-via-mcp
Zone: openclaw
Tags: mcp, openclaw, publishing, automation, content-ops
Published: 2026-03-18

This post was created live by Claude via the ZeroLabs MCP server — a direct tool call into the ZeroShot Studio publishing stack, no dashboard required.

> **KEY TAKEAWAY**
> * **The Problem:** Publishing requires manual handoffs that interrupt the AI workflow, forcing agents to stop at drafts instead of completing real work inside the system.
> * **The Solution:** MCP gives Claude authenticated access to the publishing stack, allowing agents to create and publish content directly without touching the CMS dashboard.
> * **The Result:** Content operations became executable system actions instead of isolated writing tasks, removing bottlenecks between drafting and publication.

*Last updated: 2026-03-27 · Tested against Claude Code v0.2.29 and ZeroLabs MCP*

**Why this matters:** Publishing became another callable tool in the workflow. Once an agent can move from writing to action inside a controlled system, content ops starts looking a lot more like software ops.

## Contents

1. What actually happened when Claude posted directly to Labs?
2. Why MCP matters more than the demo itself
3. What the workflow looks like in practice
4. What guardrails matter before you do this for real
5. Frequently asked questions
6. The real shift is operational

## What actually happened when Claude posted directly to Labs?

The short version is simple. Claude had access to a publishing tool exposed through MCP, and used that tool to create a post directly inside the Labs stack. No one had to open the dashboard, copy-paste content, or manually press publish.

That matters because it moves the agent beyond advisory mode. Most AI writing workflows still stop one step short of real work. The model writes a draft, maybe formats it nicely, then waits for a human to shuttle it into the CMS like a glorified courier.

This test skipped that handoff. The model wrote, called the publishing tool, and the post landed live in Labs.

> **The reality:** The milestone here is that AI can complete the publishing step inside a real system, not just write a blog post.

## Why does MCP matter more than the demo itself?

MCP turns external systems into tools the model can call directly. Instead of treating the AI like a clever text box, you give it controlled access to things that actually do work. The [Model Context Protocol specification](https://modelcontextprotocol.io/) defines how that boundary works, and Anthropic's [Claude Code overview](https://docs.anthropic.com/en/docs/claude-code/overview) shows why that matters in practice for an agent that can already edit files, run commands, and operate inside a real repo.

In publishing terms, that means the model can:

- create a post
- update a post
- change metadata
- move content into the right zone
- turn a content workflow into something executable

Not at the paragraph level. At the system boundary.

For Labs, this is the bit worth paying attention to. Once publishing becomes tool-driven, the whole content flow starts looking like a production system: draft, validate, route, publish, review, update.

## What does the workflow look like in practice?

The practical workflow is a lot less magical than the headline makes it sound.

1. **The agent writes the content.** That still means using the right structure, voice, and editorial logic.
2. **The agent calls the publishing tool.** Instead of stopping with markdown in chat, it sends the post into the CMS workflow.
3. **The CMS stores and renders the post.** Metadata, slug, tags, and zone all get handled in the same path.
4. **Humans review the outcome.** The point is not removing oversight. The point is removing dead manual steps.

That is the pattern worth copying. Let the agent handle the boring handoff, keep humans on the bits that need a brain. If you want to see this pattern extended across a full content pipeline, [how AI review agents fit into a content pipeline](/ai-workflows/ai-review-agents-content-pipeline) covers the architecture in more depth.

| Layer | Old workflow | MCP workflow |
| --- | --- | --- |
| Drafting | AI writes text | AI writes text |
| Handoff | Human copies into CMS | Agent calls publishing tool |
| Metadata | Human fills fields manually | Agent populates fields programmatically |
| Review | Human reviews after manual work | Human reviews the outcome |
| Speed | Slower, more brittle | Faster, more automatable |

## What guardrails do you need before doing this for real?

This is where people get stupid if they only focus on the demo.

If an agent can publish, it can also mispublish. So the real work is not building the tool. It is deciding what the tool can do, when, and who can see it did it.

At minimum, you want:

1. **Clear scope.** Which content types can the agent publish directly?
2. **Authentication.** The tool must be tied to a real trust boundary, not a public endpoint with good intentions.
3. **Audit trail.** Every create or update should be attributable.
4. **Review logic.** Some categories can auto-publish, others should stay draft-only.
5. **Rollback path.** Humans need a fast way to correct or revert mistakes.
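Those five guardrails are policy, not magic, and policy fits in a small gate function. A minimal sketch, with hypothetical zone names and a toy token check standing in for real authentication:

```python
import datetime

# Hypothetical policy: which content zones may auto-publish vs stay draft-only.
AUTO_PUBLISH_ZONES = {"openclaw"}
DRAFT_ONLY_ZONES = {"ai-workflows", "vps-infra"}
audit_log = []

def publish_via_tool(post, agent_token, valid_tokens):
    """Gate a tool-driven publish call: auth, scope, audit, then route."""
    if agent_token not in valid_tokens:
        raise PermissionError("unauthenticated tool call")   # trust boundary
    zone = post.get("zone")
    if zone in AUTO_PUBLISH_ZONES:
        status = "published"
    elif zone in DRAFT_ONLY_ZONES:
        status = "draft"            # review logic: humans approve later
    else:
        raise ValueError(f"zone {zone!r} is outside the agent's scope")
    audit_log.append({              # every create/update is attributable
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent": agent_token[:8],
        "slug": post.get("slug"),
        "status": status,
    })
    return status
```

The real version sits behind your MCP server, but the shape is the same: authenticate, scope-check, write the audit entry, and only then touch the CMS.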

**Watch this:** The best AI workflows are not the ones with the fewest humans. They are the ones where the handoffs, permissions, and rollback paths are clean. Ask me how I know. For a practical walkthrough of applying this thinking to code rather than content, see [how to run a security audit on your vibe-coded app](/ai-workflows/how-to-run-a-security-audit-on-your-vibe-coded-app).

## Frequently asked questions

**Is this just a gimmick?**

Not if it is tied to a real operational workflow. The gimmick version is "look, the AI made a post." The useful version is "the publishing system is now callable, auditable, and automatable."

**Why not just have a human press publish?**

Because the publish step is exactly the sort of repetitive system action that tools are good at. Humans should spend more time on judgement and less time on copy-paste administration.

**Does this mean content should fully auto-publish by default?**

No. It means the capability should exist. Whether it should auto-publish depends on category, risk, trust, and review rules.

## What is the real shift, and why is it operational?

The reason this matters is not novelty. It changes how the work actually moves.

Once an agent can act inside the publishing stack, content stops being an isolated writing task and starts being an executable system. Faster turnaround, safer automation, tighter loops between research, drafting, and maintenance.

That is the bigger idea behind the demo. Publishing is now part of the toolchain.

---

Want more practical breakdowns of how AI systems move from chat toy to actual workflow? Keep an eye on Labs.

---

