How OpenAI Used Codex to Build a Million-Line Codebase
OpenAI’s engineering team recently published an article that is well worth reading for any engineer:
Original: https://openai.com/zh-Hans-CN/index/harness-engineering/
The article describes an experiment they ran:
Build a real software product in which not a single line of code is written by humans.
All code, including:
- application logic
- tests
- CI/CD
- documentation
- ops scripts
- internal tools
was generated entirely by Codex.
The human engineers did only one thing:
Design the system, its constraints, and its feedback loops.
In the end, with 3 engineers, they built a system approaching 1 million lines of code in 5 months.
The article points to an important trend:
Software engineering is shifting from “writing code” to “designing systems that AI can work in.”
Below I’ll break down the core content of the article and analyze what it means for future engineering practices.
1. The experiment: a product whose code is generated entirely by AI
The OpenAI team started from an empty Git repository.
The initial architecture was generated automatically by Codex + GPT-5, including:
- project structure
- CI configuration
- code formatting rules
- package management
- application framework
- AGENTS.md
Even the documentation that tells the AI how to work in the repository was written by AI.
Results after 5 months:
- about 1 million lines of code
- 1,500+ pull requests
- only 3 engineers on the team
- an average of 3.5 PRs per engineer per day
The system already had:
- internal daily active users
- external alpha users
In other words:
This wasn’t a demo; it was a real, running product.
Their core principle was:
Humans steer; agents execute.
2. The engineer’s role changed
In this setup, engineers write almost no code.
Instead, their work becomes:
1️⃣ design the system structure
2️⃣ define task goals
3️⃣ design feedback loops
4️⃣ build tools that AI can use
In other words:
Before: engineers wrote the code.
Now: engineers design the environment, and agents write the code.
In the typical flow, an engineer defines the task, and an agent implements it and opens a pull request.
Quite often, the PR review is also done by AI.
3. The system must be “readable” to agents
A very important lesson:
Knowledge the agent can’t see effectively doesn’t exist.
For example:
- Slack discussions
- Google Docs
- human memory
are all invisible to the AI.
So they made a key decision:
Turn the code repository into the single source of truth.
Everything is written into the repo:
- architecture docs
- design docs
- product specs
- execution plans
- technical debt
- decision records
All of this lives in a structured documentation tree inside the repository.
Through this structure, AI can:
- find information
- reason about the system
- modify the code
They call this approach:
Agent-readable repository
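As a minimal sketch of how this could be checked mechanically, assume each knowledge category listed above lives at a fixed path in the repository. The paths and the function name `missing_knowledge` are hypothetical, chosen for illustration rather than taken from OpenAI’s setup:

```python
from pathlib import Path

# Hypothetical mapping from knowledge category to its expected location.
# These paths are illustrative assumptions, not OpenAI's actual layout.
REQUIRED_KNOWLEDGE = {
    "architecture docs": "docs/ARCHITECTURE.md",
    "design docs": "docs/design-docs",
    "product specs": "docs/product-specs",
    "execution plans": "docs/exec-plans",
    "technical debt": "docs/tech-debt.md",
    "decision records": "docs/decisions",
}


def missing_knowledge(repo_root: Path) -> list[str]:
    """Return the knowledge categories that have no file or directory in the repo."""
    return [
        category
        for category, rel_path in REQUIRED_KNOWLEDGE.items()
        if not (repo_root / rel_path).exists()
    ]
```

A CI job could call `missing_knowledge(Path("."))` and fail the build when the result is non-empty, so a gap in the repo’s knowledge is caught by a machine instead of being discovered by an agent mid-task.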
4. Don’t write 1000-line prompts
They first tried putting all the instructions into one huge file, essentially a 1000-line prompt.
It failed completely.
Why it failed:
1) Context windows are a scarce resource
A long instruction file crowds out:
- code
- the task itself
- other important context
2) Too many rules stop working
When everything is important, nothing is.
3) Docs rot
If a document is too expensive to maintain, it becomes:
a graveyard of outdated rules
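One way to catch doc rot early, sketched here under the assumption that the docs are Markdown files that reference each other with relative links, is a check that flags links whose targets no longer exist. The function name `broken_links` is hypothetical:

```python
import re
from pathlib import Path

# Matches Markdown links like [text](target). This pattern is a simplification
# that ignores edge cases such as nested brackets or quoted link titles.
LINK_PATTERN = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def broken_links(docs_root: Path) -> list[tuple[str, str]]:
    """Return (doc, target) pairs for relative links whose target file is missing."""
    broken = []
    for doc in sorted(docs_root.rglob("*.md")):
        for target in LINK_PATTERN.findall(doc.read_text(encoding="utf-8")):
            # Skip external URLs and in-page anchors; only check repo-relative paths.
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue
            # Drop any "#section" suffix before checking that the file exists.
            path_part = target.split("#", 1)[0]
            if not (doc.parent / path_part).exists():
                broken.append((doc.relative_to(docs_root).as_posix(), target))
    return broken
```

Running a check like this on every PR turns a stale reference into a build failure instead of a slow, silent drift.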
What they ended up doing:
AGENTS.md is only a table of contents that points to the detailed docs in the repository.
That is:
Give AI a map, not a manual.
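A minimal sketch of “a map, not a manual”, assuming the detailed docs are Markdown files under a `docs/` directory and that each one starts with a `# Title` line (both assumptions, as is the function name `build_agents_index`): generate an AGENTS.md that only lists where things are.

```python
from pathlib import Path


def doc_title(path: Path) -> str:
    """Use the first top-level Markdown heading as the title, else the file name."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return path.stem


def build_agents_index(repo_root: Path, docs_dir: str = "docs") -> str:
    """Render an AGENTS.md body that links to each doc instead of inlining it."""
    lines = [
        "# AGENTS.md",
        "",
        "This file is a map. Read the linked doc for the area you are working on.",
        "",
    ]
    for doc in sorted((repo_root / docs_dir).rglob("*.md")):
        rel = doc.relative_to(repo_root).as_posix()
        lines.append(f"- [{doc_title(doc)}]({rel})")
    return "\n".join(lines) + "\n"
```

Regenerating the index from the docs tree keeps AGENTS.md short and in sync with the repository, while the detail stays in the linked files where it costs context only when an agent actually opens it.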
5. Architecture must be strict
Agents have a notable trait in this environment:
they copy existing patterns.
If the architecture isn’t strict, the code drifts quickly.
So they enforced a strict layered architecture, with rules such as:
- code can only depend downward
- cross-layer calls are not allowed
All of these rules are enforced automatically through:
- custom linters
- structural tests
That is:
Architectural constraints are enforced by machines.
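A structural test of this kind can be sketched with Python’s standard `ast` module. The layer names, the package layout (`app.<layer>.<module>`), and the reading of the rules as “a module may import from its own layer or any layer below, never from a layer above” are all assumptions for illustration:

```python
import ast

# Hypothetical layers, ordered from lowest (index 0) to highest.
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}


def imported_layers(source: str) -> set[str]:
    """Collect the app layers referenced by `import app.x...` or `from app.x... import`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue  # relative imports like `from . import x` have no module name
        for name in names:
            parts = name.split(".")
            if len(parts) >= 2 and parts[0] == "app" and parts[1] in RANK:
                found.add(parts[1])
    return found


def layer_violations(layer: str, source: str) -> list[str]:
    """Return an error message for every import that points to a higher layer."""
    return [
        f"{layer} must not depend on {target} (higher layer)"
        for target in sorted(imported_layers(source))
        if RANK[target] > RANK[layer]
    ]
```

Run over every module in CI, a non-empty result fails the build, and the message itself tells the agent what to fix, which is what makes the constraint enforceable without a human in the loop.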
6. Throughput changed engineering culture
Agents write code far faster than humans.
That creates a new problem:
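code throughput far exceeds humans’ ability to review it.
So they changed the traditional engineering culture:
Before: every change waited for careful human review before it could merge.
Now: changes merge quickly, and problems are fixed afterward.
The reason:
The cost of waiting is higher than the cost of fixing.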
7. AI can automatically complete the entire development workflow
As the tooling matured, Codex became able to complete the entire development workflow on its own:
1️⃣ validate the repository state
2️⃣ reproduce a bug
3️⃣ record a video of the bug
4️⃣ implement the fix
5️⃣ run the app to verify the fix
6️⃣ record a video of the fix
7️⃣ open a PR
8️⃣ handle review feedback
9️⃣ fix CI failures
🔟 auto-merge
Humans only step in when necessary.
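The workflow above can be sketched as an ordered pipeline that escalates to a human only when a step keeps failing. The retry budget, the function name `run_workflow`, and the idea of each step as a plain callable are hypothetical placeholders, not Codex’s actual internals:

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical step type: a name plus a callable that returns True on success.
# In a real harness each callable would drive the agent and its tools.
Step = Tuple[str, Callable[[], bool]]


def run_workflow(steps: List[Step], max_attempts: int = 3) -> Tuple[List[str], Optional[str]]:
    """Run steps in order, retrying each up to max_attempts times.

    Returns the names of completed steps and the step that needs a human,
    or None if the whole workflow finished without intervention.
    """
    completed: List[str] = []
    for name, action in steps:
        for _ in range(max_attempts):
            if action():
                completed.append(name)
                break
        else:
            # The retry budget is exhausted: stop and escalate to a human.
            return completed, name
    return completed, None
```

The design choice worth noting is that a human is the fallback for a specific step, not a gate on every step: a flaky CI run that passes on retry never reaches anyone.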
8. A new problem: AI will create “technical debt”
Fully AI-driven development brings one issue:
AI copies bad patterns too.
For example:
- inconsistent implementations
- unreasonable abstractions
- duplicated tooling
At first, the team cleaned up AI-generated code every Friday.
They soon found this didn’t scale.
Their final solution was:
Write engineering principles into the code.
For example:
- ban YOLO parsing (using data without first validating its shape)
- enforce typed SDKs
- prefer shared utility libraries
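To make “ban YOLO parsing” concrete: one common reading is that external data must be parsed into a typed structure, with its shape checked where it enters the system, instead of being accessed through unchecked dictionary lookups deep inside the code. A minimal sketch using only the standard library; the `User` type and its fields are hypothetical:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    email: str


def parse_user(raw: str) -> User:
    """Parse and validate at the boundary; reject anything with the wrong shape."""
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    user_id, email = data.get("id"), data.get("email")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ValueError(f"invalid id: {user_id!r}")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError(f"invalid email: {email!r}")
    return User(id=user_id, email=email)


# YOLO style, for contrast: assumes the shape and fails far from the source.
# email = json.loads(raw)["user"]["email"]
```

Because every caller receives a `User` rather than a raw dict, an agent copying the surrounding code copies the validated pattern, which is exactly the kind of pattern worth having it copy.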
Then they run a refactoring agent on a regular schedule,
much like automatic garbage collection (GC).
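A refactoring agent needs signals to act on. As a hypothetical sketch of one such signal, the code below finds functions with structurally identical arguments and bodies across files, a crude proxy for “duplicated tooling”, so they could be queued as cleanup tasks. The detection approach and function names are assumptions, not the team’s actual tooling:

```python
import ast
from collections import defaultdict


def body_fingerprint(func: ast.FunctionDef) -> str:
    """Fingerprint a function's arguments and body, ignoring its name and docstring."""
    body = func.body
    first = body[0] if body else None
    # Drop a leading docstring so copies that differ only in docs still match.
    if (
        isinstance(first, ast.Expr)
        and isinstance(first.value, ast.Constant)
        and isinstance(first.value.value, str)
    ):
        body = body[1:]
    # ast.dump omits line/column attributes by default, so position is ignored.
    return ast.dump(func.args) + "".join(ast.dump(stmt) for stmt in body)


def find_duplicates(files: dict[str, str]) -> list[list[str]]:
    """Group "file:function" names with matching fingerprints; keep groups of 2+."""
    groups: dict[str, list[str]] = defaultdict(list)
    for filename, source in sorted(files.items()):
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                groups[body_fingerprint(node)].append(f"{filename}:{node.name}")
    return [names for names in groups.values() if len(names) > 1]
```

Each returned group becomes a small, well-scoped task (“merge these into the shared utility”), which is the kind of work a scheduled agent can pick up without human planning.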
9. The truly scarce resource: human attention
The article’s conclusion comes down to one sentence:
The bottleneck in software engineering is no longer writing code, but human attention.
The core questions of future engineering systems become:
- how to design environments
- how to design feedback loops
- how to design constraint systems
rather than how to write code.
10. What this implies for the future of software engineering
Software engineering is entering the era of:
Agent-first engineering
Future engineers will look more like system designers than code laborers.
Core capabilities in the future may be:
- architecture design
- agent workflow design
- engineering automation
- feedback loop design
- engineering knowledge modeling
rather than writing CRUD code.
Closing
This article makes one thing clear:
AI is not just a code-writing tool.
It is changing the structure of software engineering itself.
Future software teams may look like the one in this experiment: a few engineers steering agents.
And the core value of engineers will become:
Designing systems in which AI can work efficiently.