SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents (arXiv:2602.11210)

In recent years, software engineering agents (AI systems that automatically fix bugs and edit code) have become an active research direction. Examples include:

SWE-agent
OpenHands
Devin-like systems
SWE-bench and other evaluation suites

But behind these systems there is a very practical engineering problem:

How do we provide an isolated environment for an AI agent where it can execute code, run tests, and modify a repository?

Traditional solutions almost all rely on Docker containers.

However, containers introduce new problems:

Slow environment setup
Very large images
High training cost at high concurrency
Requires Docker / Kubernetes infrastructure

A recent paper, SWE-MiniSandbox, proposes a very interesting idea:

Use no containers at all, and instead use Linux kernel mechanisms to implement a lightweight isolated environment.

The result is:

Storage usage reduced by 85%+
Environment preparation time reduced by about 75% (88.9 s down to 23.6 s)
RL training performance is almost identical

In this article, we’ll systematically analyze this system.

I. Background: Why Docker Becomes a Bottleneck

In an SWE Agent system, a task is usually:

1. clone GitHub repo
2. checkout a certain commit
3. install dependencies
4. run tests
5. agent modifies code
6. run tests again
7. determine whether the patch succeeded
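
The seven steps above can be sketched as a small driver. This is only an illustration, not code from the paper: run_task, the run callback, and the pip / pytest commands are assumptions chosen for a typical Python repo, and "succeeded" is simplified to "tests failed before the patch and pass after it".

```python
import subprocess
from typing import Callable, Sequence

# A runner executes one command in `cwd` and returns its exit code.
Runner = Callable[[Sequence[str], str], int]


def shell_runner(cmd: Sequence[str], cwd: str) -> int:
    """Default runner: execute the command on the local machine."""
    return subprocess.run(list(cmd), cwd=cwd).returncode


def run_task(repo_url: str, commit: str, workdir: str,
             apply_patch: Callable[[str], None],
             run: Runner = shell_runner) -> bool:
    """Walk through steps 1-7 and report whether the patch fixed the tests."""
    repo = f"{workdir}/repo"
    run(["git", "clone", repo_url, repo], workdir)             # 1. clone
    run(["git", "checkout", commit], repo)                     # 2. checkout
    run(["python", "-m", "pip", "install", "-e", "."], repo)   # 3. dependencies
    before = run(["python", "-m", "pytest", "-q"], repo)       # 4. tests before
    apply_patch(repo)                                          # 5. agent edits code
    after = run(["python", "-m", "pytest", "-q"], repo)        # 6. tests after
    return before != 0 and after == 0                          # 7. failing -> passing
```

Passing the runner in as a parameter is a deliberate choice: the same driver can execute commands in a Docker container, in a lightweight sandbox, or against a fake runner in tests.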

To prevent:

Different tasks contaminating each other
Dependency conflicts
The agent executing dangerous commands

Usually each task runs inside an independent Docker container.

A typical flow:

task -> docker container
     -> code repo
     -> python env
     -> run tests

But this approach has three problems:

1 Huge image sizes

Many SWE-bench task images are:

1GB ~ 3GB

With dozens of tasks you might end up with:

> 100GB

2 Slow startup time

Container startup + environment initialization:

~90 seconds

3 Very high concurrency in RL training

Reinforcement learning training commonly runs:

128 - 512 parallel environments

The overhead introduced by Docker becomes very noticeable.

II. Core Idea: Replace Containers with the Linux Kernel

The key idea of SWE-MiniSandbox is:

Containers are not actually required.

Many isolation capabilities are provided by the Linux kernel itself.

The paper chooses two core mechanisms:

mount namespace
+
chroot

to implement an isolated sandbox.

III. Key Technique 1: mount namespace

Linux namespaces are the foundation of container technology.

Docker is essentially a combination of:

PID namespace
Network namespace
Mount namespace
User namespace
...

SWE-MiniSandbox uses only one of these, the mount namespace.

What mount namespace does

It allows a process to:

See a completely different filesystem mount view

For example:

Host directories:

/
├── home
├── usr
└── data

Inside the sandbox it sees:

/
├── repo
├── venv
└── tmp

Each sandbox has:

its own independent view of the filesystem
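
One way to try this out is the unshare command from util-linux, which starts a process in a new mount namespace. The sketch below only builds the command lines. The helper names are hypothetical, and the paper's implementation may well call the underlying system call directly instead of shelling out.

```python
import shutil
import subprocess
from typing import List, Sequence


def in_mount_namespace(command: Sequence[str]) -> List[str]:
    """Wrap `command` so it starts in a fresh, private mount namespace."""
    return ["unshare", "--mount", "--propagation", "private", "--", *command]


def bind_mount(source: str, target: str) -> List[str]:
    """argv that makes `source` visible at `target` (run inside the namespace)."""
    return ["mount", "--bind", source, target]


def run_isolated(command: Sequence[str]) -> int:
    """Run the wrapped command; needs root (or CAP_SYS_ADMIN) and util-linux."""
    if shutil.which("unshare") is None:
        raise RuntimeError("unshare(1) from util-linux is not installed")
    return subprocess.run(in_mount_namespace(command)).returncode
```

Mounts made inside the namespace, such as a bind mount of a task's repo, are invisible to the host and to other sandboxes, which is what gives each task its own view.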

IV. Key Technique 2: chroot

chroot is a very old but powerful Unix feature that Linux also provides.

It can:

Change a process’s root directory

If you run:

chroot /sandbox/task_001

then what this process sees is:

/ == /sandbox/task_001

So:

/repo
/venv
/tmp

all become part of an isolated environment.
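
The same mapping can be written down in a few lines of Python. This is a sketch under assumptions: host_path and enter_sandbox are hypothetical helper names, /sandbox/task_001 is just the example directory from above, and os.chroot only works when the process runs as root.

```python
import os
from pathlib import PurePosixPath


def host_path(sandbox_root: str, inner_path: str) -> PurePosixPath:
    """Map a path as seen inside the chroot back to its location on the host."""
    inner = PurePosixPath(inner_path)
    if not inner.is_absolute():
        raise ValueError("expected an absolute path inside the sandbox")
    # Drop the leading "/" and re-anchor the rest under the sandbox root.
    return PurePosixPath(sandbox_root).joinpath(*inner.parts[1:])


def enter_sandbox(sandbox_root: str) -> None:
    """Make `sandbox_root` the new `/` for this process (requires root)."""
    os.chroot(sandbox_root)
    os.chdir("/")  # otherwise the working directory still points outside the new root
```

For example, host_path("/sandbox/task_001", "/repo") gives /sandbox/task_001/repo: the agent works with /repo, while the host sees the files under the task directory.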

Combining mount namespace + chroot:

each task is like having its own Linux root directory.

But it still shares:

kernel
system libraries
some resources

so it’s very lightweight.

V. Key Technique 3: Reusing Python Environments

Another key optimization in SWE-MiniSandbox:

Avoid repeatedly creating Python environments

Traditional Docker:

each image includes
OS
Python
dependencies

which makes it large.

MiniSandbox’s approach:

Step 1: Install once on the host:

miniconda3

Step 2: Create a venv

For each task:

python -m venv

venv structure (on Linux):

venv/
  bin/
    python -> symlink to the host interpreter
  lib/pythonX.Y/site-packages/
  pyvenv.cfg

Many files are actually:

symlinks

so the size is only:

~100MB
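
To see the symlink effect directly, the standard library's venv module can create such an environment. This is a minimal sketch, assuming a POSIX host. create_task_venv and dir_size_bytes are hypothetical helpers, not names from the paper.

```python
import os
import venv
from pathlib import Path


def create_task_venv(env_dir: str) -> Path:
    """Create a venv whose interpreter is a symlink to the host Python."""
    # symlinks=True links to the base interpreter instead of copying it;
    # with_pip=False skips bootstrapping pip, which keeps creation fast.
    venv.create(env_dir, symlinks=True, with_pip=False)
    return Path(env_dir)


def dir_size_bytes(root: Path) -> int:
    """Sum file sizes under `root` without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total
```

Calling dir_size_bytes on a fresh environment shows that the interpreter itself costs almost nothing. The space goes to whatever is later installed into site-packages.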

VI. Key Technique 4: Environment Caching

The slowest steps in environment construction are:

pip install
repo setup
test run

MiniSandbox does the following:

Prebuild the environment

The first time:

repo
+ venv
+ dependencies
+ tests

After everything is ready:

tar.gz

is cached.

After that, each task only needs:

untar

to use it directly.
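
The build-once, untar-afterwards pattern might look like the following. It is a sketch, not the paper's code: restore_or_build and the build callback are hypothetical, and the archive layout is an assumption.

```python
import tarfile
from pathlib import Path
from typing import Callable


def restore_or_build(cache_file: Path, env_dir: Path,
                     build: Callable[[Path], None]) -> bool:
    """Populate `env_dir`; return True on a cache hit, False if it was built."""
    env_dir.mkdir(parents=True, exist_ok=True)
    if cache_file.exists():
        with tarfile.open(cache_file, "r:gz") as tar:
            # The archive is one we produced ourselves, so it is treated as
            # trusted; newer Python versions want that stated explicitly.
            if hasattr(tarfile, "fully_trusted_filter"):
                tar.extractall(env_dir, filter="fully_trusted")
            else:
                tar.extractall(env_dir)
        return True

    build(env_dir)  # slow path: repo setup, venv, pip install, test run
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(cache_file, "w:gz") as tar:
        tar.add(env_dir, arcname=".")
    return False
```

The first call for a task pays the full build cost and writes the tarball. Every later call for the same task only pays for decompression and disk writes.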

VII. Key Technique 5: I/O Concurrency Control

Untarring a tarball is disk I/O intensive.

If 100 tasks untar at the same time:

you can saturate the disk immediately

The authors propose a simple model:

Assume:

disk bandwidth = B
untar bandwidth of task j = bj

The untars running at the same time must satisfy:

∑ bj ≤ B

Implementation:

Ray resource tags
+
semaphore

to control how many untar tasks run simultaneously.
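
The semaphore half of this can be sketched with the standard library alone. The Ray resource tags are left out here, every untar is assumed to use the same bandwidth b, and both function names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def max_concurrent_untars(disk_bandwidth: float, per_task_bandwidth: float) -> int:
    """Largest n with n * b <= B, but never less than one so work can proceed."""
    return max(1, int(disk_bandwidth // per_task_bandwidth))


def run_throttled(jobs: Iterable[Callable[[], object]], limit: int,
                  workers: int = 32) -> List[object]:
    """Run all jobs on a thread pool, but let at most `limit` hold the disk."""
    gate = threading.BoundedSemaphore(limit)

    def guarded(job: Callable[[], object]) -> object:
        with gate:  # blocks while `limit` untars are already in flight
            return job()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(guarded, jobs))
```

With B = 1000 MB/s and b = 300 MB/s, for instance, max_concurrent_untars returns 3, so a fourth untar waits until one of the first three finishes.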

VIII. System Architecture

MiniSandbox is not a standalone system; it is embedded into the SWE Agent ecosystem.

Overall structure:

 +-------------------+
 |      SkyRL        |
 |  RL distributed   |
 +---------+---------+
           |
           |
+----------v-----------+
|      SWE-agent       |
|   solve issues       |
+----------+-----------+
           |
           |
 +---------v---------+
 |      SWE-Rex      |
 | terminal control  |
 +---------+---------+
           |
           |
 +---------v---------+
 |   MiniSandbox     |
 | mount + chroot    |
 +-------------------+

Each sandbox contains:

repo
venv
terminal

IX. Experimental Results

The paper uses:

SWE-smith
SWE-bench Verified

1 Storage savings

Dataset      Docker    MiniSandbox
SWE-smith    295 GB    13.5 GB
SWE-bench    605 GB    89 GB

Savings:

about 95% on SWE-smith and about 85% on SWE-bench

2 Environment preparation time

Method         Time
Docker         88.9 s
MiniSandbox    23.6 s

Speedup:

~3.8x (about 73% less time)
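
The ratios for storage and preparation time follow directly from the reported figures. The short script below recomputes them. The helper names are only for illustration.

```python
def reduction(before: float, after: float) -> float:
    """Fraction saved when going from `before` to `after`."""
    return 1.0 - after / before


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is than `before`."""
    return before / after


# Figures copied from the storage and preparation-time tables.
storage_gb = {"SWE-smith": (295, 13.5), "SWE-bench": (605, 89)}
prep_seconds = (88.9, 23.6)

for name, (docker, mini) in storage_gb.items():
    print(f"{name}: {reduction(docker, mini):.1%} less storage")
print(f"prep: {speedup(*prep_seconds):.2f}x faster, "
      f"{reduction(*prep_seconds):.1%} less time")
```

This prints 95.4% and 85.3% for storage, and 3.77x (73.5% less time) for environment preparation.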

3 Rollout time

Average task time:

Docker       ~350s
MiniSandbox  ~250s

That is roughly 30% less time per task.

4 Training performance

The paper introduces a metric:

Reward MD (Mean Deviation)

to measure:

reward differences between the two frameworks

Result:

≈ 0

This indicates:

The RL training signal is almost identical.

X. Why This Work Matters

This paper does not propose a new AI model.

But it solves a very critical systems problem:

How to run SWE agents at low cost.

Its potential impact:

1 Lowering the research barrier

Many teams don’t have:

Docker cluster
Kubernetes

With MiniSandbox:

a normal Linux server
+
Python

is enough.

2 Large-scale RL becomes more feasible

Reinforcement learning heavily depends on:

number of parallel environments

The lighter the environment:

the faster the rollout

and the lower the training cost.

3 Easier enterprise deployment

Within enterprises:

CI
testing
automatic bug fixing

Many scenarios don’t actually need:

full containers

MiniSandbox is lighter.

XI. Limitations

The paper is also candid about its limitations.

1 Security isolation is weaker than containers

MiniSandbox only has:

mount namespace
chroot

While Docker also has:

cgroup
seccomp
user namespace
capabilities

So isolation is weaker: MiniSandbox has no resource limits, no syscall filtering, and no privilege restrictions of its own.

2 Mainly Python projects

Experiments focus primarily on:

Python repos

For:

C++
Rust
Java
complex build systems

it still needs validation.

3 I/O is still a bottleneck

Even after optimization:

repo copy
venv copy
untar

still consume a lot of time.

In the future, it might use:

overlayfs
copy-on-write

to further optimize.
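
To make the overlayfs idea concrete: a shared, read-only copy of the prepared environment could serve as the lower layer, with each task writing into its own small upper layer. The sketch below only builds the mount command. It is not something the paper implements, the function name and directory roles are assumptions, and the mount itself requires root.

```python
from typing import List


def overlay_mount_argv(lower: str, upper: str, work: str, merged: str) -> List[str]:
    """argv for an overlayfs mount: read-only `lower`, per-task writes in `upper`."""
    # `work` is a scratch directory that must live on the same filesystem as `upper`.
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]
```

With such a layout nothing is copied or untarred per task. Only the files the agent actually modifies are written to the upper layer.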

XII. Some Thoughts

The most interesting part of this paper is:

Many bottlenecks in AI systems are not the model, but systems engineering.

The development of SWE agents requires:

model capability
+
toolchain
+
environment system

None of the three can be missing.

MiniSandbox is essentially doing:

AI infra optimization

similar to:

RL environment acceleration

If in the future:

SWE agents
Devin-like systems
automatic code repair

are deployed at scale,

this kind of lightweight sandbox is very likely to become new infrastructure.

Summary

SWE-MiniSandbox’s core contribution can be summarized in one sentence:

Replace Docker containers with Linux-native isolation + Python environment caching to build a lightweight runtime environment for SWE agents.

Benefits:

Storage reduced by 85%+
Environment preparation roughly 4x faster
RL training performance almost identical

This shows:

In AI systems engineering, better system design often matters more than a bigger model.

Train an AI to Write Code Without Docker: An In-Depth Look at SWE-MiniSandbox