Train an AI to Write Code Without Docker: An In-Depth Look at SWE-MiniSandbox
SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents (arXiv:2602.11210)
In recent years, software engineering (SWE) agents, AI systems that automatically fix bugs and edit code, have become a hot research direction. Representative systems and benchmarks include:
- SWE-agent
- OpenHands
- Devin-like systems
- SWE-bench and other evaluation suites
But behind these systems there is a very practical engineering problem:
How do we provide an isolated environment for an AI agent where it can execute code, run tests, and modify a repository?
Almost all existing solutions rely on Docker containers.
However, containers introduce new problems:
- Slow environment setup
- Very large images
- High training cost at high concurrency
- Requires Docker / Kubernetes infrastructure
A recent paper, SWE-MiniSandbox, proposes a very interesting idea:
Use no containers at all, and instead use Linux kernel mechanisms to implement a lightweight isolated environment.
The result is:
- Storage usage reduced by 85%+
- Environment preparation time reduced by about 75%
- RL training performance is almost identical
In this article, we’ll systematically analyze this system.
I. Background: Why Docker Becomes a Bottleneck
In an SWE agent system, a typical task involves these steps:
1. Clone the GitHub repo
2. Check out a specific commit
3. Install dependencies
4. Run the tests
5. Let the agent modify the code
6. Run the tests again
7. Determine whether the patch succeeded
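The seven steps above can be sketched as a small Python plan builder. SWETask, setup_commands, and is_resolved are hypothetical names, not the paper's code, and the pass/fail rule in is_resolved is a simplified stand-in for the per-task test lists (FAIL_TO_PASS / PASS_TO_PASS) that benchmarks such as SWE-bench define.

```python
from dataclasses import dataclass, field


@dataclass
class SWETask:
    """One SWE task instance (hypothetical schema, for illustration only)."""
    repo_url: str
    commit: str
    test_cmd: list[str] = field(default_factory=lambda: ["python", "-m", "pytest", "-q"])


def setup_commands(task: SWETask, workdir: str) -> list[list[str]]:
    """Commands for steps 1-4: clone, check out, install, run the baseline tests."""
    return [
        ["git", "clone", task.repo_url, workdir],
        ["git", "-C", workdir, "checkout", task.commit],
        # Real repos often need task-specific install commands;
        # an editable install is only a common default.
        ["python", "-m", "pip", "install", "-e", workdir],
        # Steps 4 and 6 both run this command with the working directory set to workdir.
        task.test_cmd,
    ]


def is_resolved(failed_before: set[str], failed_after: set[str], target_tests: set[str]) -> bool:
    """Step 7 (simplified): target tests now pass and no previously passing test broke."""
    fixed = not (target_tests & failed_after)
    no_regressions = failed_after <= failed_before
    return fixed and no_regressions
```

Step 5 (the agent editing code) happens between the two test runs; the sandbox's job is to make every one of these commands safe and cheap to execute.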
To prevent:
- Different tasks contaminating each other
- Dependency conflicts
- The agent executing dangerous commands
Usually each task runs inside an independent Docker container.
A typical flow:
task -> docker container
-> code repo
-> python env
-> run tests
But this approach has three problems:
1 Huge image sizes
Many SWE-bench task images are 1 GB to 3 GB each.
With dozens of tasks, total storage can easily exceed 100 GB.
2 Slow startup time
Container startup plus environment initialization takes about 90 seconds per task.
3 Very high concurrency in RL training
Reinforcement learning training commonly runs 128 to 512 environments in parallel.
At that scale, Docker's per-environment overhead becomes very noticeable.
II. Core Idea: Replace Containers with the Linux Kernel
The key idea of SWE-MiniSandbox is:
Containers are not strictly required.
The isolation they provide comes from Linux kernel features, which a sandbox can use directly.
The paper combines two core kernel mechanisms, the mount namespace and chroot, to implement an isolated sandbox.
III. Key Technique 1: mount namespace
Linux namespaces are the foundation of container technology.
Docker's isolation is built on a combination of namespaces, including:
- PID namespace
- Network namespace
- Mount namespace
- User namespace
- ...
SWE-MiniSandbox uses only a subset of these, chiefly the mount namespace.
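What the mount namespace does
It gives a process its own view of the filesystem mount table: mounts created inside the namespace are not visible to processes outside it (when mount propagation is set to private).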
For example:
Host directories:
/
├── home
├── usr
└── data
Inside the sandbox it sees:
/
├── repo
├── venv
└── tmp
Each sandbox therefore gets its own independent view of the filesystem.
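To make the mount namespace concrete, here is a minimal Python sketch, not the paper's code. It assumes Linux, Python 3.12+ (for os.unshare), and root privileges for the function that actually enters a new namespace; mount_namespace_id and enter_private_mount_namespace are hypothetical helper names.

```python
import ctypes
import ctypes.util
import os

# Mount flags from <sys/mount.h> (Linux).
MS_REC = 16384
MS_PRIVATE = 1 << 18


def mount_namespace_id(pid: str = "self") -> int:
    """Return the inode number that identifies a process's mount namespace."""
    link = os.readlink(f"/proc/{pid}/ns/mnt")  # e.g. "mnt:[4026531841]"
    return int(link[link.index("[") + 1 : link.index("]")])


def enter_private_mount_namespace() -> None:
    """Move the calling process into a new, private mount namespace.

    Needs root (CAP_SYS_ADMIN) and Python 3.12+ for os.unshare.
    """
    os.unshare(os.CLONE_NEWNS)
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    # Same as `mount --make-rprivate /`: mounts made from now on stay inside
    # this namespace instead of propagating back to the host (and vice versa).
    if libc.mount(b"none", b"/", None, MS_REC | MS_PRIVATE, None) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Called as root in a child process, enter_private_mount_namespace() changes the value that mount_namespace_id() reports, and any bind mounts made afterwards stay invisible to the host.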
IV. Key Technique 2: chroot
chroot is a very old but powerful Unix feature that Linux still supports.
It changes the root directory of a process (and of the processes it starts).
If you run:
chroot /sandbox/task_001
then what this process sees is:
/ == /sandbox/task_001
So /repo, /venv, and /tmp inside the sandbox are really directories under /sandbox/task_001 on the host, and together they form an isolated environment.
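The path mapping that chroot performs can be sketched in Python. host_path and enter_chroot are hypothetical helpers, not the paper's code; host_path is a purely lexical approximation that ignores symlinks, and enter_chroot needs root (CAP_SYS_CHROOT).

```python
import os
import posixpath


def host_path(sandbox_root: str, inside_path: str) -> str:
    """Map a path as seen inside a chroot to the corresponding host path.

    Lexical only: symlinks inside the sandbox are not resolved.
    """
    # Resolve against "/" first: as for a chrooted process, ".." at the
    # new root stays at the new root instead of climbing out of it.
    normalized = posixpath.normpath(posixpath.join("/", inside_path))
    relative = normalized.lstrip("/")
    return posixpath.join(sandbox_root, relative) if relative else sandbox_root


def enter_chroot(new_root: str) -> None:
    """Make new_root the calling process's "/" (requires root)."""
    os.chroot(new_root)
    # chroot does not change the working directory; without this chdir the
    # process could still reach host files through its old cwd.
    os.chdir("/")
```

For example, host_path("/sandbox/task_001", "/repo") gives /sandbox/task_001/repo, and even "/../../etc/passwd" stays under the sandbox root.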
Combining the mount namespace with chroot gives each task what is effectively its own Linux root directory.
But all tasks still share:
kernel
system libraries
some resources
which is why the approach is so lightweight.
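One way to combine the two mechanisms from the command line, sketched here in Python, is to start a shell in a new private mount namespace with util-linux unshare, bind-mount shared host directories into the sandbox read-only, and then exec chroot. This is an assumption about how such a sandbox could be wired, not the paper's code: sandbox_argv and its default shared_dirs=("/usr",) are hypothetical, running the resulting command requires root, and many distributions will also need /lib, /lib64, or /bin shared (as would a shared interpreter installation) depending on what the sandboxed programs load.

```python
import shlex


def sandbox_argv(sandbox_root: str, command: list[str],
                 shared_dirs: tuple[str, ...] = ("/usr",)) -> list[str]:
    """argv that runs `command` chrooted into sandbox_root inside a new mount namespace."""
    root = sandbox_root.rstrip("/")
    steps = []
    for host_dir in shared_dirs:
        target = shlex.quote(root + host_dir)
        steps += [
            f"mkdir -p {target}",
            # Share the host directory with the sandbox, then make that view read-only.
            f"mount --bind {shlex.quote(host_dir)} {target}",
            f"mount -o remount,bind,ro {target}",
        ]
    # exec replaces the shell, so the command becomes the sandbox's main process.
    steps.append("exec chroot " + shlex.join([root, *command]))
    # unshare --mount gives the shell its own copy of the mount table; with private
    # propagation the bind mounts above never reach the host and go away with the sandbox.
    return ["unshare", "--mount", "--propagation", "private", "sh", "-c", " && ".join(steps)]
```

For example, subprocess.run(sandbox_argv("/sandbox/task_001", ["/venv/bin/python", "-m", "pytest", "-q"]), check=True) would run a task's tests inside such a sandbox.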
V. Key Technique 3: Reusing Python Environments
Another key optimization in SWE-MiniSandbox:
Avoid repeatedly creating Python environments
In the traditional Docker setup, each image bundles its own:
OS
Python
dependencies
which makes images large.
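MiniSandbox’s approach:
Step 1: Share miniconda
Install once on the host:
miniconda3
Step 2: Create a venv
For each task, create a virtual environment from the shared interpreter:
python -m venv venv
venv structure:
venv/
├── bin/
│   └── python -> symlink to the shared miniconda3 interpreter
├── lib/python3.X/site-packages/
└── pyvenv.cfg
The interpreter is only a symlink, and the standard library is reused from the shared installation instead of being copied,
so each venv mainly holds the task’s own packages and is only about 100 MB.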
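A minimal sketch of Step 2, assuming the shared interpreter lives at some host path such as /opt/miniconda3/bin/python (a hypothetical location). create_task_venv is a hypothetical helper around the standard python -m venv command; --without-pip only keeps the demo fast, and a real setup would still need a way to install the task's dependencies. If the venv is later used inside a chroot, the symlink target must also be visible there, for example through a bind mount.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def create_task_venv(base_python: str, venv_dir: Path) -> Path:
    """Create a per-task venv whose interpreter is a symlink to base_python."""
    subprocess.run(
        [base_python, "-m", "venv", "--symlinks", "--without-pip", str(venv_dir)],
        check=True,
    )
    return venv_dir / "bin" / "python"


if __name__ == "__main__":
    # Demo: the current interpreter stands in for the shared miniconda3 python.
    with tempfile.TemporaryDirectory() as tmp:
        python = create_task_venv(sys.executable, Path(tmp) / "venv")
        print(python, "->", os.readlink(python))
```

Because bin/python is only a link, creating many such venvs costs little beyond the packages each task installs.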
VI. Key Technique 4: Environment Caching
The slowest steps in environment construction are:
pip install
repo setup
test run
MiniSandbox therefore prebuilds the environment.
The first time a task is prepared, it sets up:
repo
+ venv
+ dependencies
+ tests
Once everything is ready, the whole environment is packed into a tar.gz archive and cached.
After that, each task only needs to untar the archive to use the environment directly.
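The cache-then-untar pattern maps onto Python's standard tarfile module. pack_env and unpack_env are hypothetical helpers, not the paper's code. Extraction uses the "tar" filter where available (Python 3.12, also backported to some earlier patch releases), which rejects member paths that would escape the destination but still allows absolute symlink targets such as a venv's link to the shared interpreter; the stricter "data" filter would refuse those links.

```python
import tarfile
import tempfile
from pathlib import Path


def pack_env(env_dir: Path, archive: Path) -> None:
    """Snapshot a fully prepared environment (repo + venv + deps) as tar.gz."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores paths relative to env_dir; symlinks are kept as links.
        tar.add(env_dir, arcname=".")


def unpack_env(archive: Path, dest: Path) -> Path:
    """Restore a cached environment into a fresh per-task directory."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "tar_filter"):
            tar.extractall(dest, filter="tar")
        else:  # older Python without extraction filters
            tar.extractall(dest)
    return dest
```

The archive should live outside env_dir; otherwise tar.add would try to include the archive in itself.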
VII. Key Technique 5: I/O Concurrency Control
Untarring a tarball is disk I/O intensive.
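If 100 tasks untar at the same time, they can saturate the disk immediately.
The authors propose a simple model.
Let $B$ be the disk bandwidth and $b_j$ the bandwidth that untar task $j$ needs.
The set $\mathcal{A}$ of untar tasks running concurrently must then satisfy:
$$\sum_{j \in \mathcal{A}} b_j \le B$$
If every task needs the same bandwidth $b$, at most $\lfloor B / b \rfloor$ untars can run at once.
The implementation combines Ray resource tags with a semaphore to cap how many untar tasks run simultaneously.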
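The cap from this model can be sketched with a plain thread semaphore. This stands in for the paper's Ray resource tags, which are not reproduced here (in Ray, a comparable cap can be expressed by declaring a custom resource on each node and requesting one unit of it per untar task). max_concurrent_untars and UntarThrottle are hypothetical names, and the bandwidth numbers in the demo are made up.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def max_concurrent_untars(disk_bw: float, per_task_bw: float) -> int:
    """Largest k with k * per_task_bw <= disk_bw (at least 1 so work still progresses)."""
    return max(1, int(disk_bw // per_task_bw))


class UntarThrottle:
    """Lets at most `limit` untar jobs run at once and records the observed peak."""

    def __init__(self, limit: int):
        self._sem = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def run(self, untar_fn, *args):
        with self._sem:  # blocks while `limit` untars are already running
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return untar_fn(*args)
            finally:
                with self._lock:
                    self.active -= 1


if __name__ == "__main__":
    # Hypothetical numbers: 1000 MB/s disk, 200 MB/s per untar, so at most 5 at once.
    throttle = UntarThrottle(max_concurrent_untars(1000, 200))
    with ThreadPoolExecutor(max_workers=32) as pool:
        list(pool.map(lambda _: throttle.run(time.sleep, 0.05), range(64)))
    print("peak concurrent untars:", throttle.peak)
```

In practice untar_fn would be something like an unpack function for the cached tar.gz; time.sleep only simulates the I/O.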
VIII. System Architecture
MiniSandbox is not a standalone system; it is embedded into the SWE Agent ecosystem.
Overall structure:
+-------------------+
| SkyRL |
| RL distributed |
+---------+---------+
|
|
+----------v-----------+
| SWE-agent |
| solve issues |
+----------+-----------+
|
|
+---------v---------+
| SWE-Rex |
| terminal control |
+---------+---------+
|
|
+---------v---------+
| MiniSandbox |
| mount + chroot |
+-------------------+
Each sandbox contains:
repo
venv
terminal
IX. Experimental Results
The paper's experiments use two datasets:
SWE-smith
SWE-bench Verified
1 Storage savings
| Dataset | Docker | MiniSandbox |
|---|---|---|
| SWE-smith | 295 GB | 13.5 GB |
| SWE-bench | 605 GB | 89 GB |
Savings: about 95% on SWE-smith and 85% on SWE-bench, so 85%+ in both cases.
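The savings follow directly from the storage table; a quick check in Python:

```python
# Storage numbers (GB) from the table: (Docker, MiniSandbox).
storage = {"SWE-smith": (295.0, 13.5), "SWE-bench": (605.0, 89.0)}


def savings(docker_gb: float, mini_gb: float) -> float:
    """Fraction of storage saved by MiniSandbox relative to Docker."""
    return 1 - mini_gb / docker_gb


for name, (docker_gb, mini_gb) in storage.items():
    print(f"{name}: {savings(docker_gb, mini_gb):.1%} saved")
# SWE-smith: 95.4% saved
# SWE-bench: 85.3% saved
```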
2 Environment preparation time
| Method | Time |
|---|---|
| Docker | 88.9 s |
| MiniSandbox | 23.6 s |
Speedup: about 3.8x (88.9 s / 23.6 s), i.e. roughly 4x.
3 Rollout time
Average time per task:
Docker: about 350 s
MiniSandbox: about 250 s
That is roughly 30% less time per task.
4 Training performance
The paper introduces a metric:
Reward MD (Mean Deviation)
to measure:
reward differences between the two frameworks
Result:
≈ 0
This indicates:
The RL training signal is almost identical.
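The article does not give the exact formula for Reward MD, so the sketch below assumes one plausible reading: the mean per-step difference between the reward curves of a Docker run and a MiniSandbox run. reward_mean_deviation and reward_mean_abs_deviation are hypothetical names. The absolute variant is included because a signed mean can be close to 0 even when individual steps differ.

```python
from statistics import fmean


def _check_lengths(rewards_a: list[float], rewards_b: list[float]) -> None:
    if len(rewards_a) != len(rewards_b):
        raise ValueError("reward curves must have the same length")


def reward_mean_deviation(rewards_a: list[float], rewards_b: list[float]) -> float:
    """Mean signed per-step difference between two reward curves (assumed definition)."""
    _check_lengths(rewards_a, rewards_b)
    return fmean(a - b for a, b in zip(rewards_a, rewards_b))


def reward_mean_abs_deviation(rewards_a: list[float], rewards_b: list[float]) -> float:
    """Mean absolute per-step difference; above 0 whenever any step differs."""
    _check_lengths(rewards_a, rewards_b)
    return fmean(abs(a - b) for a, b in zip(rewards_a, rewards_b))
```

For toy curves [0.2, 0.4, 0.6] and [0.1, 0.5, 0.6], the signed deviation is about 0 while the absolute deviation is about 0.067, so checking both gives a fuller picture.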
X. Why This Work Matters
This paper does not propose a new AI model.
But it solves a very critical systems problem:
How to run SWE agents at low cost.
Its potential impact:
1 Lowering the research barrier
Many teams don’t have:
a Docker cluster
Kubernetes
With MiniSandbox:
a normal Linux server
+
Python
is enough.
2 Large-scale RL becomes more feasible
Reinforcement learning heavily depends on:
number of parallel environments
The lighter the environment:
the faster the rollout
and the lower the training cost.
3 Easier enterprise deployment
Enterprise scenarios include:
CI
testing
automatic bug fixing
Many of these don’t actually need:
full containers
so MiniSandbox’s lighter approach is a good fit.
XI. Limitations
The paper also candidly acknowledges several limitations.
1 Security isolation is weaker than containers
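MiniSandbox only has:
mount namespace
chroot
Docker additionally uses:
PID and network namespaces
cgroups (resource limits)
seccomp
user namespaces (optional)
capabilities
So security isolation is noticeably weaker; for example, a process running as root can escape a chroot.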
2 Mainly Python projects
Experiments focus primarily on:
Python repos
For projects in:
C++
Rust
Java
or with complex build systems,
the approach still needs validation.
3 I/O is still a bottleneck
Even after optimization:
repo copy
venv copy
untar
still consumes a lot of time.
In the future, it could use:
overlayfs
(copy-on-write layering)
to avoid most of these copies.
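overlayfs is only mentioned as possible future work, so the following is an illustration, not something the paper implements. The idea: extract each cached environment once into a shared, read-only lower directory, then give each task a thin writable upper layer, so starting a task copies nothing up front and only files the task modifies get copied. overlay_mount_argv and all paths are hypothetical, and running the mount requires root.

```python
def overlay_mount_argv(lower: str, upper: str, work: str, merged: str) -> list[str]:
    """argv for mounting an overlayfs view of a shared, read-only environment.

    lower:  an already-extracted cached environment, shared by many tasks
    upper:  per-task directory that receives every file the task writes
    work:   empty scratch directory overlayfs needs, on the same filesystem as upper
    merged: mount point the sandbox would chroot into
    """
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]
```

Deleting a task then means unmounting merged and removing its small upper and work directories, while the shared lower layer stays untouched.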
XII. Some Thoughts
The most interesting part of this paper is:
Many bottlenecks in AI systems lie not in the model but in systems engineering.
The development of SWE agents requires:
model capability
+
toolchain
+
environment system
All three are indispensable.
MiniSandbox is essentially an AI infrastructure optimization, similar in spirit to RL environment acceleration.
If in the future:
- SWE agents
- Devin-like systems
- automatic code repair
are deployed at scale,
this kind of lightweight sandbox is very likely to become new infrastructure.
Summary
SWE-MiniSandbox’s core contribution can be summarized in one sentence:
Replace Docker containers with Linux-native isolation + Python environment caching to build a lightweight runtime environment for SWE agents.
Benefits:
- Storage reduced by 85%+
- Environment preparation 4x faster
- RL training performance almost identical
This shows:
In AI systems engineering, better system design often matters more than a bigger model.