Beyond Prompts: Building a Codex Playbook for Real Application Development

April 13, 2026

One-off prompting is weak engineering infrastructure.
It can be useful for getting unstuck, drafting code, or exploring ideas, but it does not produce repeatable engineering behavior on its own. The same prompt that works acceptably on one backend feature often degrades into generic advice or inconsistent implementation on the next one. That happens because a prompt is usually trying to do too many jobs at once: repository policy, architectural philosophy, feature workflow, review heuristics, and implementation detail.
This repository was built to separate those concerns and make them durable. 
JordiCorbilla/codex-engineering-playbook
The Core Problem
Most teams experimenting with coding agents start with giant instruction blobs. They pour style preferences, architecture opinions, code review heuristics, framework conventions, and edge-case warnings into a single file or prompt. That usually fails in predictable ways:
durable guidance gets buried under temporary workflow detail
the instructions become too large to scan and too vague to route reliably
project structure becomes secondary to the agent's last prompt
implementation becomes inconsistent across languages and application layers
The problem is not that the model needs more words. The problem is that the repository needs a better operating model.
What This Repository Builds Instead
This playbook uses four layers:
AGENTS.md for durable repository-wide rules
.agents/skills for reusable workflows
templates and scripts for deterministic implementation support
examples for concrete, inspectable reference projects
That split is the point of the repository.
The top-level AGENTS.md stays small. It says what the repository is for, when to prefer skills, what "done" means, and what kinds of validation are expected. It does not try to teach every implementation pattern directly. That restraint is important. Durable instructions should be hard to invalidate.
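One way to keep that restraint honest is a small layout check. The sketch below is a hypothetical helper, not the repository's scripts/validate_playbook.py: the function name, the 80-line budget for AGENTS.md, and the exact path list are assumptions made for illustration. It only confirms that the layers named above exist and that the top-level file has not grown past a chosen size.

```python
from pathlib import Path

# Layer paths taken from the post; treating each as a required path is an assumption.
REQUIRED_LAYERS = ("AGENTS.md", ".agents/skills", "templates", "scripts", "examples")


def check_playbook_layout(root: Path, max_agents_lines: int = 80) -> list[str]:
    """Return human-readable problems; an empty list means the layout passes."""
    problems = []
    for relative in REQUIRED_LAYERS:
        if not (root / relative).exists():
            problems.append(f"missing layer: {relative}")

    # Hypothetical size budget: durable rules should stay short enough to scan.
    agents_file = root / "AGENTS.md"
    if agents_file.is_file():
        line_count = len(agents_file.read_text(encoding="utf-8").splitlines())
        if line_count > max_agents_lines:
            problems.append(
                f"AGENTS.md has {line_count} lines; budget is {max_agents_lines}"
            )
    return problems
```

Calling check_playbook_layout(Path(".")) from the repository root would return an empty list when every layer is present and AGENTS.md fits the budget.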
The detailed workflow logic moves into focused skills such as:
csharp-backend-feature
python-backend-review
react-typescript-feature
api-contract-review
layered-architecture-recon
This is better than a giant style blob because routing matters. Reviewing a Python backend diff is not the same task as implementing a React feature or mapping an unfamiliar layered architecture. Putting those workflows into separate skills makes the instructions sharper and more reusable.
Why Durable Repo Instructions Matter
Agent behavior improves when the repository teaches stable defaults instead of relying on the operator to restate them every time.
In this repository, the durable defaults are simple:
inspect existing structure first
keep boundaries clear
put logic in the right layer
validate the touched area before calling the work done
explain trade-offs when architecture changes
Those rules are broad enough to matter across C#, Python, and React, but narrow enough to remain useful. They establish an operating model without pretending the languages are interchangeable.
Why Skills Beat Giant Prompt Files
A single giant prompt tends to flatten all engineering work into "follow best practices." That is almost always too soft. Real engineering work has modes, and the quality bar changes with the mode.
Feature work needs placement discipline. Review work needs risk detection. Contract review needs attention to caller-visible behavior. Architecture reconnaissance needs an accurate map, not rewrite advice.
That is why the skills in this repository are intentionally separate:
csharp-backend-feature cares about thin controllers, service-led behavior, async correctness, DTO boundaries, and targeted tests.
csharp-backend-review cares about fat controllers, sync-over-async mistakes, leaky entity boundaries, and weak exception flows.
python-backend-feature emphasizes typed boundaries, thin request handling, explicit services, and restrained abstraction.
frontend-ux-polish is about hierarchy, spacing, labels, state clarity, and affordances, not broad frontend architecture.
Separating those workflows does two things. It makes routing more reliable, and it avoids making every task carry the weight of every other task's instructions.
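A rough sketch of what routing by mode could look like, in Python. The skill names are the ones mentioned in this post; the (stack, mode) keys, the "any" fallback, and the select_skill function are hypothetical and do not describe how Codex actually selects a skill.

```python
from typing import Optional

# Skill names come from the post; the lookup keys are illustrative assumptions.
SKILLS = {
    ("csharp", "feature"): "csharp-backend-feature",
    ("csharp", "review"): "csharp-backend-review",
    ("python", "feature"): "python-backend-feature",
    ("python", "review"): "python-backend-review",
    ("react", "feature"): "react-typescript-feature",
    ("react", "ux-polish"): "frontend-ux-polish",
    ("any", "contract-review"): "api-contract-review",
    ("any", "recon"): "layered-architecture-recon",
}


def select_skill(stack: str, mode: str) -> Optional[str]:
    """Prefer a stack-specific skill, then fall back to a stack-agnostic one."""
    return SKILLS.get((stack, mode)) or SKILLS.get(("any", mode))
```

The point of the table is that each task resolves to one narrow workflow: select_skill("python", "review") picks the review skill and nothing else, while a request with no matching entry returns None instead of falling back to a catch-all prompt.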
The C# Guidance
The C# portion of the playbook favors a layered ASP.NET Core style, but it is not blindly enterprise. The guidance in conventions/csharp.md pushes toward controllers, services, and repositories when those boundaries provide real value. It also says something many style guides avoid saying clearly: repository abstractions can become ceremony.
That trade-off matters. In the C# example app, the repository boundary exists because it helps keep persistence concerns separate and makes the service easy to test. But it stays intentionally small and in-memory. The goal is to demonstrate responsibility boundaries and async flow, not to perform ORM theater.
The example includes:
OrdersController.cs for HTTP concerns
OrderService.cs for validation, orchestration, and business flow
InMemoryOrderRepository.cs for isolated storage behavior
ExceptionMappingMiddleware.cs for centralized error translation
That is a good example of the repository's overall stance: use structure to clarify responsibilities, not to multiply files for their own sake.
The Python Guidance
Python code often swings between two bad extremes when teams talk about architecture. One extreme is route handlers full of business logic and IO. The other is an imported enterprise pattern that creates classes, abstract base types, and indirection that the codebase has not earned.
The Python guidance in conventions/python.md tries to avoid both.
It emphasizes:
thin request handling
typed boundaries
explicit validation
clear service or module ownership
simplicity over cleverness
In the Python example app, the route module translates HTTP concerns, the service owns the business path, and the data-access layer stores records in memory. The example is intentionally small, but the boundaries are inspectable, with a separate orders.py for each responsibility:
orders.py keeps request handling thin
orders.py owns the business behavior
orders.py isolates storage concerns
orders.py defines typed request and response models
That structure is useful because it makes future changes cheaper. A new validation rule or storage implementation has an obvious home.
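As a minimal sketch of those boundaries, assuming nothing about the framework the example app uses: the class and function names below (CreateOrderRequest, OrderService, handle_create_order, and so on) are illustrative, the validation rules are invented for the example, and all layers are collapsed into one listing for brevity.

```python
from dataclasses import asdict, dataclass
from itertools import count


# Typed request and response models (the "schemas" role).
@dataclass(frozen=True)
class CreateOrderRequest:
    customer_id: str
    quantity: int


@dataclass(frozen=True)
class OrderResponse:
    order_id: int
    customer_id: str
    quantity: int


class OrderValidationError(ValueError):
    """Raised by the service when a request breaks a business rule."""


# Storage concerns only: no validation, no HTTP knowledge.
class InMemoryOrderRepository:
    def __init__(self) -> None:
        self._orders: dict[int, OrderResponse] = {}
        self._ids = count(1)

    def add(self, customer_id: str, quantity: int) -> OrderResponse:
        order = OrderResponse(next(self._ids), customer_id, quantity)
        self._orders[order.order_id] = order
        return order


# Business behavior: validation and orchestration live here.
class OrderService:
    def __init__(self, repository: InMemoryOrderRepository) -> None:
        self._repository = repository

    def create_order(self, request: CreateOrderRequest) -> OrderResponse:
        if not request.customer_id.strip():
            raise OrderValidationError("customer_id is required")
        if request.quantity <= 0:
            raise OrderValidationError("quantity must be positive")
        return self._repository.add(request.customer_id, request.quantity)


# Thin request handling: parse, delegate, translate the outcome to a status code.
def handle_create_order(service: OrderService, payload: dict) -> tuple[int, dict]:
    try:
        request = CreateOrderRequest(
            str(payload["customer_id"]), int(payload["quantity"])
        )
        order = service.create_order(request)
    except (KeyError, TypeError, ValueError) as error:
        return 400, {"error": str(error)}
    return 201, asdict(order)
```

A new validation rule would go into OrderService.create_order, and a different storage backend would replace InMemoryOrderRepository, without touching the handler in either case.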
The React + TypeScript Guidance
Frontend guidance is often too aesthetic or too abstract. This playbook is trying to be engineering guidance, so the React conventions focus on code structure, state handling, and contract discipline.
The priorities in conventions/react-typescript.md are:
keep components focused
separate API access from rendering
make loading, error, and empty states explicit
type props and async data honestly
keep hooks disciplined
avoid business logic hidden in JSX
The React example app is not a design system demo. It is a small application slice showing the separation between transport, state, and presentation:
orders.ts owns data access behavior
useOrders.ts models async state transitions
OrderSummaryList.tsx renders typed data
App.tsx composes the UI and makes loading, empty, and error states visible
That is deliberate. Many generated React examples hide the interesting part of the problem inside a single file. This one does not.
Why The Examples Matter
Example apps are not filler in a repository like this. They are operational anchors.
Without examples, convention documents drift toward abstract advice. Skills become more likely to overfit to generic framework knowledge. Templates become harder to evaluate because there is no concrete reference point for what "good" looks like in the repository.
The examples in this playbook make the guidance testable. They show the relationship between the documents and actual code. They also give Codex stable insertion points when exploring an unfamiliar but similarly structured codebase.
The practical run instructions live in the repository README. That split is intentional: the blog explains the design, while the README acts as the operational starting point.
Scripts And Templates As Engineering Tools
Another failure mode in agent-oriented repositories is over-explaining deterministic work in prose. If a C# controller shape is predictable, a template is better than another paragraph. If you need to discover likely insertion points or summarize changed files, a small script is usually better than asking the model to infer everything from scratch every time.
That is why this repository includes:
templates/csharp
templates/python
templates/react
scripts/scan_repo.py
scripts/find_feature_insertion_points.py
scripts/discover_tests.py
scripts/summarize_changed_files.py
scripts/validate_playbook.py
These are small on purpose. They exist to support repeatability, not to introduce another platform inside the repository.
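To give a sense of scale, here is a hypothetical helper in the spirit of scripts/summarize_changed_files.py. It is not the repository's implementation; the function signature, the "(root)" and "(none)" labels, and the output shape are assumptions.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize_changed_files(paths: list[str]) -> dict[str, dict[str, int]]:
    """Group changed paths by top-level directory and count them per extension."""
    summary: dict[str, Counter] = {}
    for raw in paths:
        path = PurePosixPath(raw)
        # Files directly in the repository root have a single path part.
        area = path.parts[0] if len(path.parts) > 1 else "(root)"
        extension = path.suffix or "(none)"
        summary.setdefault(area, Counter())[extension] += 1
    return {area: dict(counts) for area, counts in sorted(summary.items())}
```

Fed the output of something like `git diff --name-only`, a function this size produces a deterministic summary the agent can read directly, which is the kind of work the post argues should not be re-derived in prose each time.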
Strengths
The repository is structured for reuse, not just demonstration.
The top-level instructions are durable enough to survive ordinary growth.
The skills are narrow enough to route reliably.
The examples make the guidance concrete instead of rhetorical.
The validation bar is lightweight but real.
Limitations
The playbook is intentionally opinionated, which means it will not fit every team culture.
The examples are small; they demonstrate boundaries, not the full complexity of production systems.
The repository does not solve framework-specific edge cases beyond the patterns it chooses to illustrate.
Skills can become stale if a team's real architecture evolves and the playbook does not.
There is also a rigidity trade-off. If a team treats the playbook as law rather than guidance, it can become a local maximum. Not every codebase needs controller/service/repository layering. Not every React feature needs another hook. Not every Python service needs another module boundary. Good playbooks create defaults, not dogma.
Where This Helps Most
This approach helps most when:
a team wants more repeatable coding-agent behavior
the codebase already values clear boundaries
multiple languages need a shared architectural vocabulary
implementation and review workflows should be routable and explicit
It helps less when:
the codebase is intentionally experimental
architecture changes weekly
framework-specific constraints dominate general engineering structure
the team wants maximal freedom and minimal convention
Pros And Cons
Pros:
better repeatability than one-off prompts
clearer separation between policy, workflow, and implementation detail
easier extension through focused skills
stronger grounding through examples and validation
Cons:
more repository structure to maintain
more upfront design work than dropping a prompt file into a repo
some duplication between conventions, skills, and examples is intentional and needs upkeep
can become too rigid if teams stop exercising judgment
Final View
The point of this repository is not to prove that Codex can follow instructions. The point is to make those instructions worth following.
That requires durable repository guidance, focused workflow skills, deterministic helpers, and concrete examples. It also requires saying no to the usual temptation to dump everything into a single mega-prompt and hope the agent will sort it out.
That temptation is understandable. It is also a weak substitute for engineering structure.