The run_script Pattern: How AWS Gave AI Agents Sandboxed Python Without Shell Access

Intermediate · 15m read · Full-stack developers

A deep look at run_script — the AWS MCP Server tool that lets agents chain multiple AWS API calls in one server-side Python execution, with the IAM, network, and debugging tradeoffs developers should understand before turning it on in production.

Primary Focus: development tools

AI Tools Covered: AWS, MCP, AI Agents

Guide Curriculum

Module 1: What run_script Actually Is
  • The One-Sentence Description
  • Where the Script Runs
  • Why This Tool Exists

Module 2: The Sandbox
  • What the Sandbox Allows
  • The IAM Caveat: How Does It Call AWS APIs Without Network Access?
  • What the Sandbox Does Not Constrain

Module 3: Tradeoffs and Risks
  • The IAM Blast Radius
  • Debugging Opaque Failures
  • Before vs. After: A Concrete Example

Module 4: When to Use It (And When Not To)
  • Good Fits for run_script
  • Bad Fits for run_script
  • A Decision Heuristic

About the Author

@hiram-clark

Hiram Clark is the founder and managing editor of vybecoding.ai and sets editorial direction for the guides and news published here. Articles are drafted with AI assistance and edited before publication. He works hands-on with the AI development tools, workflows, and infrastructure covered on the site.

Module 1: What run_script Actually Is

1.1 The One-Sentence Description

From the AWS announcement, verbatim:

The run_script tool lets the agent write a short Python script that runs server-side in a sandboxed environment.

Three things matter in that sentence:

  1. The agent writes the script. You don't write it. The LLM emits Python code as a single tool-call argument.
  2. It runs server-side. Not on your laptop. Not in your shell. AWS hosts the execution.
  3. It is sandboxed. Some operations the script could normally do are blocked.

Read that again before you turn this on. The AI is generating Python code. AWS is running that Python code. And the IAM permissions used by that script are yours.

1.2 Where the Script Runs

The AWS post says only "server-side in a sandboxed environment." It does not name a specific service — no mention of Lambda, Fargate, or any other compute primitive. Treat the runtime as opaque: an AWS-managed sandbox you cannot SSH into, attach a debugger to, or read logs from in the usual way.

If you are coming from a mental model of "I'll just run this on my dev box," run_script is not that. It is closer to a serverless function with a single entry point: agent emits Python, AWS runs it, AWS returns the result. The script's lifetime is the duration of one MCP tool call.
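A minimal sketch of that single entry point, seen from the client side, follows. It assumes the standard MCP JSON-RPC tools/call method and assumes the tool's argument is named script, matching this guide's run_script(script=...) example later on; the announcement does not publish the tool schema, and run_script_request is a hypothetical helper name. What it shows is the shape of the exchange: the entire program travels as one string inside one request.

```python
import json


def run_script_request(script: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC message asking an MCP server to run one script."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # MCP's standard method for invoking a tool
        "params": {
            "name": "run_script",
            # Assumed argument name, mirroring run_script(script=...) in this guide
            "arguments": {"script": script},
        },
    }
    return json.dumps(message)


example = run_script_request("print('hello from the sandbox')")
```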

1.3 Why This Tool Exists

Look at the alternative. To answer "Which of my S3 buckets have versioning disabled and were created in the last 30 days?", an agent without run_script does this:

  1. Call s3:ListBuckets via call_aws. Get back potentially hundreds of buckets.
  2. For each bucket, call s3:GetBucketVersioning via call_aws. That's N round-trips.
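  3. Filter on the CreationDate field that ListBuckets already returns. No extra call is needed for the date; s3:GetBucketLocation, which an agent might reach for, returns only the bucket's Region and would cost another N round-trips.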
  4. Combine, filter, return.
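For comparison, here is a sketch of the whole question as one script the agent might send to run_script. It assumes boto3 is importable in the sandbox (this guide's own example later relies on the same thing) and takes the S3 client as a parameter so it can be exercised with a stub; recent_unversioned_buckets is a hypothetical name. The date filter costs nothing extra because ListBuckets already carries CreationDate, and old buckets are skipped before any versioning call is made.

```python
from datetime import datetime, timedelta, timezone


def recent_unversioned_buckets(s3, days=30, now=None):
    """Names of buckets created in the last `days` days whose versioning
    Status is anything other than 'Enabled' (absent or 'Suspended')."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    names = []
    # One ListBuckets call; every entry already includes CreationDate.
    for bucket in s3.list_buckets()["Buckets"]:
        if bucket["CreationDate"] < cutoff:
            continue  # too old: skip the versioning call entirely
        status = s3.get_bucket_versioning(Bucket=bucket["Name"]).get("Status")
        if status != "Enabled":
            names.append(bucket["Name"])
    return names


# Inside run_script the agent would finish with something like:
#   import boto3, json
#   print(json.dumps(recent_unversioned_buckets(boto3.client("s3"))))
```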

Every round-trip adds model token cost (the agent has to think about each response), a latency hit (sequential tool calls block on each other), and a failure surface (any one call can time out and break the chain).

With run_script, the agent emits one Python script that does all of the above and returns the filtered list. One tool call. One round-trip. The model only sees the final answer, not hundreds of intermediate API responses.

AWS frames the benefit this way:

With run_script, the agent chains API calls, filters responses, and computes results in a single round-trip, which is both faster and more context-efficient.

"Context-efficient" is the underrated word. Token budget is a bottleneck for many agent tasks. Keeping hundreds of API responses out of the conversation history means the agent can keep working on a task it would otherwise abandon for context-window reasons.


Module 2: The Sandbox

2.1 What the Sandbox Allows

The post states the sandbox's permission model in a single sentence:

The sandbox inherits your IAM permissions but has no network access, so you can give an agent the ability to process data without giving it access to your local file system or a shell.

Unpack that:

  • IAM inheritance. The Python script runs with the same AWS permissions you (or the role the MCP Server is configured with) have. If your role can call s3:GetObject on a bucket, so can the script. If it can call iam:CreateUser, so can the script.
  • No network access. No outbound HTTP to arbitrary endpoints. No pip install of new packages at runtime. No exfil to a Discord webhook.
  • No local filesystem access. The script cannot read files from your machine — it runs server-side in the sandbox, so "local" simply does not apply.
  • No shell. The announcement rules out shell access, so plan on pure Python: do not expect the script to spawn a subprocess that runs aws s3 cp, curl, or any other binary.

Notice the asymmetry. The sandbox is locked down on the outside (no internet) but wide open on the inside (your full IAM scope).

2.2 The IAM Caveat: How Does It Call AWS APIs Without Network Access?

There is an apparent contradiction in the announcement that AWS does not explain: the sandbox has "no network access" but is also expected to "chain API calls" to AWS services, and AWS API calls go over the network.

The post does not document this. The most likely explanation is that "no network access" means no general internet access — outbound to arbitrary internet endpoints is blocked — while AWS API access is routed through internal infrastructure that is not classified as "network" from the script's point of view. This is consistent with how VPC-isolated Lambda functions can still hit AWS service endpoints via VPC endpoints without being attached to a NAT gateway.

Until AWS publishes more detail, treat the practical rule as: AWS APIs work, everything else does not. If you wanted the agent to cross-reference your S3 inventory against an external SaaS API, run_script will not do that — the agent will have to fall back to multiple tool calls, with the external call happening outside the sandbox.
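If you would rather check that rule than take it on faith, a probe along these lines could be sent through run_script. It is a sketch under assumptions: sts is a boto3 STS client the caller supplies (boto3.client("sts") inside the sandbox), https://example.com stands in for any non-AWS endpoint, and check_boundary is a hypothetical name. STS GetCallerIdentity requires no IAM permissions, so a failure there is about connectivity or credentials rather than policy, and the returned ARN shows which identity the script inherited.

```python
import urllib.error
import urllib.request


def check_boundary(sts, url="https://example.com", timeout=3.0):
    """Report whether an AWS API call and a plain internet request succeed."""
    report = {}
    try:
        # GetCallerIdentity needs no IAM permission; the ARN is the
        # identity the script is running as.
        report["aws_api"] = sts.get_caller_identity()["Arn"]
    except Exception as exc:  # any failure means the AWS path did not work
        report["aws_api"] = None
        report["aws_error"] = repr(exc)
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            report["internet"] = True
    except urllib.error.HTTPError:
        report["internet"] = True  # an HTTP error status still means the host answered
    except OSError:  # URLError, timeouts, refused connections
        report["internet"] = False
    return report
```

If the announcement's description holds, you would expect aws_api to contain an ARN and internet to be False.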

2.3 What the Sandbox Does Not Constrain

Three things the sandbox does not block:

  1. Mutating AWS APIs. If your IAM role can call iam:DeleteRole or s3:DeleteBucket, the agent's Python script can call them too. The sandbox is a network/filesystem boundary, not a permission filter.
  2. The agent's script logic. Nothing in the announcement describes a static analyzer or any other review of the Python before it executes. If the model emits a script with a subtle bug, say deleting the wrong bucket because of an off-by-one in a list comprehension, the bug runs.
  3. Combining sensitive reads with destructive writes. The script can read PII from one service and write it to another, all in one round-trip, with no human review between steps.

These risks are real, and they are by design. The sandbox does exactly what it advertises: it keeps the agent off your laptop. It is not, and was never designed to be, a guardrail on what the agent does within AWS.


Module 3: Tradeoffs and Risks

3.1 The IAM Blast Radius

This is the single most important sentence in this guide:

The script inherits every permission of the calling role.

If the AWS MCP Server is configured with an IAM role that has AdministratorAccess, every run_script invocation runs as administrator. If the agent — driven by an LLM that occasionally hallucinates — emits a script that walks every region and deletes every untagged resource, that is a valid, well-formed action under that IAM role. Nothing in the sandbox stops it.

Compare with call_aws: a destructive call is one tool invocation, and a human-in-the-loop approval flow (or an MCP client policy) can intercept it before it fires. With run_script, the destructive call is buried inside arbitrary Python code that the human reviewer has to read and understand, in real time, before approving execution. That is a much harder review problem.

The mitigation is not to disable run_script. It is to ensure the IAM role driving the AWS MCP Server is scoped to the minimum permissions needed for the agent's task. Read-only roles for read-only tasks. Mutating permissions only when the agent's job requires mutation, and then only on the specific resources that are in scope. AWS also automatically adds two IAM context keys — aws:CalledViaAWSMCP and aws:ViaAWSMCPService — to every request that flows through a managed MCP server. You can use these in a standard IAM policy to constrain what any MCP-driven action can do, regardless of whether it came from call_aws or run_script. See Understanding IAM for Managed AWS MCP Servers for the policy patterns.
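As an illustration of that pattern, the sketch below builds an identity policy that denies a few destructive actions only when a request arrives through a managed MCP server. It assumes aws:ViaAWSMCPService is a Boolean key set to true on MCP-originated requests; confirm the key's type and values in the IAM documentation before relying on it. The action list and deny_via_mcp_policy are hypothetical. Because a Bool condition does not match when the key is absent, the same actions taken outside MCP (from the console or your own CLI session) are not caught by this statement.

```python
import json

# Hypothetical set of actions no MCP-driven request should ever perform.
DESTRUCTIVE_ACTIONS = ["iam:CreateUser", "iam:DeleteRole", "s3:DeleteBucket"]


def deny_via_mcp_policy(actions):
    """Identity policy denying `actions` when a request comes through a
    managed MCP server, whether it originated in call_aws or run_script."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDestructiveViaMCP",
                "Effect": "Deny",
                "Action": list(actions),
                "Resource": "*",
                # Assumes aws:ViaAWSMCPService is a Boolean key that is "true"
                # on MCP-originated requests; an absent key does not match Bool.
                "Condition": {"Bool": {"aws:ViaAWSMCPService": "true"}},
            }
        ],
    }


policy_json = json.dumps(deny_via_mcp_policy(DESTRUCTIVE_ACTIONS), indent=2)
```

The resulting JSON would sit on the role the AWS MCP Server uses, alongside its normal allow policies; an explicit Deny wins over any Allow.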

3.2 Debugging Opaque Failures

When run_script fails, you get back a Python exception or error message from the sandbox. That is your entire debugging signal. There are no logs you can tail. No CloudWatch group named /aws/mcp/run_script. No sandbox console you can SSH into.

For trivial errors — a missing IAM permission, a syntax bug in the agent's script — this is enough. For non-trivial errors — a script that completes successfully but returns the wrong answer because the model misunderstood your request — you have nothing. The script ran, the API calls happened, the return value looks plausible, and you do not know it's wrong until the downstream consumer notices.

The practical implication: when you are designing an agent workflow that will use run_script, build a verification step. After the script returns, have the agent (or a separate tool) double-check the result against a second source of truth. For high-stakes tasks, do not let run_script be the last word.
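One way to build that step, sketched here with hypothetical names: fetch versioning status for the same buckets through a separate path (a call_aws spot check, or an inventory source such as AWS Config if you already run it) and diff it against what run_script returned. verify_report assumes that second source arrives as a dict of bucket name to Status, with None where the field was absent.

```python
def verify_report(reported, statuses):
    """Diff run_script's answer against an independent source.

    `statuses` maps bucket name -> versioning Status (None if absent).
    Returns (missing, unexpected): buckets that should have been reported
    but were not, and reported buckets that should not have been.
    """
    expected = {name for name, status in statuses.items() if status != "Enabled"}
    got = set(reported)
    return sorted(expected - got), sorted(got - expected)


missing, unexpected = verify_report(
    ["bucket-1", "bucket-2"],
    {"bucket-1": "Enabled", "bucket-2": "Suspended", "bucket-3": None},
)
# missing == ['bucket-3'], unexpected == ['bucket-1']
```

A non-empty list on either side is the signal to stop and look before anything downstream acts on the result.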

3.3 Before vs. After: A Concrete Example

Before run_script (48 sequential call_aws invocations):
Tool call 1: call_aws("s3api", "list-buckets")
  Returns: 47 buckets
Tool call 2: call_aws("s3api", "get-bucket-versioning", bucket="bucket-1")
  Returns: { Status: "Enabled" }
Tool call 3: call_aws("s3api", "get-bucket-versioning", bucket="bucket-2")
  Returns: { Status: "Suspended" }
... (44 more calls)
Tool call 48: call_aws("s3api", "get-bucket-versioning", bucket="bucket-47")
  Returns: {}  (no Status field: versioning was never enabled)
Agent: "Buckets without versioning: bucket-2, bucket-19, bucket-31, bucket-47"

48 tool calls. 48 model thinking steps. 48 chances for one of them to time out.

After run_script (1 invocation):
Tool call 1: run_script(script="""
import json
import boto3

s3 = boto3.client('s3')
result = []
for b in s3.list_buckets()['Buckets']:
    v = s3.get_bucket_versioning(Bucket=b['Name'])
    # 'Status' is missing if versioning was never enabled, 'Suspended' if it was turned off
    if v.get('Status') != 'Enabled':
        result.append(b['Name'])
# Print JSON so the agent gets a parseable list back
print(json.dumps(result))
""")
  Returns: ["bucket-2", "bucket-19", "bucket-31", "bucket-47"]
Agent: "Buckets without versioning: bucket-2, bucket-19, bucket-31, bucket-47"

One tool call. One round-trip. The agent only sees four bucket names, not 48 individual API responses.

That is the pattern, and that is the win.


Module 4: When to Use It (And When Not To)

4.1 Good Fits for run_script

Use run_script when:

  • The task requires combining results from multiple AWS API calls before the agent can answer
  • The intermediate API responses are large enough to bloat the conversation context
  • Latency matters and sequential call_aws is too slow
  • The work is purely computational — list, filter, join, count — and does not need internet access or local files
  • The IAM role driving the agent is scoped to read-only or to a narrow mutation surface

4.2 Bad Fits for run_script

Avoid run_script when:

  • The task involves a single AWS API call — call_aws is simpler and easier to audit
  • The task needs data from outside AWS — external SaaS, databases not exposed via AWS APIs, web scraping
  • The task involves destructive mutations — keep those in call_aws so a human reviewer can intercept each one
  • The IAM role has broad write permissions and you have not yet scoped it down: the blast radius is too large for a script that nobody may review before it runs
  • You need rich logs or step-by-step debugging — the sandbox does not give you that
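For the destructive-mutation case, a middle ground is to let run_script do only the read-heavy planning and hand back commands for a human to approve one at a time through call_aws. The sketch below is hypothetical (the prefix filter and the function name are illustrative, not part of any AWS tool): it finds empty buckets and returns aws s3api delete-bucket commands without deleting anything itself.

```python
def plan_empty_bucket_deletions(s3, prefix):
    """Read-only planning: return one delete command per empty bucket whose
    name starts with `prefix`, for a human to approve via call_aws."""
    plan = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        if not name.startswith(prefix):
            continue
        # KeyCount counts current objects only; a versioned bucket that still
        # holds old versions or delete markers will refuse delete-bucket.
        if s3.list_objects_v2(Bucket=name, MaxKeys=1)["KeyCount"] == 0:
            plan.append(f"aws s3api delete-bucket --bucket {name}")
    return plan
```

The script's IAM needs shrink to list and read permissions, and every deletion stays a separate, reviewable call.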

4.3 A Decision Heuristic

Ask three questions before letting an agent use run_script:

  1. Could this task be done with one or two call_aws calls? If yes, do that instead. The simplicity is worth the extra round-trips.
  2. Does the role driving the MCP Server have any mutating permissions the agent does not strictly need for this task? If yes, scope the role down before enabling run_script.
  3. If the LLM hallucinates and produces a script that runs cleanly but does the wrong thing, what is the worst that could happen? If the answer is "delete production resources," do not enable run_script for this role.

If all three checks pass, run_script is a good fit. If any one fails, fall back to call_aws or scope the role tighter first.
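The three checks can be written down as a small function, mainly to make their order explicit: the simpler alternative first, then role scope, then worst case. TaskProfile and choose_tool are hypothetical names, and the answers are yours to supply; nothing here inspects a real IAM role.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Your answers to the three questions (hypothetical structure)."""
    fits_in_few_call_aws: bool      # Q1: one or two call_aws calls would do it
    role_has_unneeded_writes: bool  # Q2: the role can mutate beyond the task's needs
    worst_case_is_prod_loss: bool   # Q3: a wrong script could delete production resources


def choose_tool(task: TaskProfile) -> str:
    if task.fits_in_few_call_aws:
        return "call_aws"
    if task.role_has_unneeded_writes:
        return "scope the role down, then re-ask"
    if task.worst_case_is_prod_loss:
        return "call_aws (no run_script for this role)"
    return "run_script"
```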


Summary

run_script is a small tool with a big tradeoff. It collapses N sequential AWS API calls into one server-side Python execution, which is faster, cheaper in tokens, and easier on the model's context window. The sandbox keeps the script off your laptop and off the public internet. But the sandbox does not narrow your IAM permissions, does not validate the agent's script before running it, and does not give you a debugger when something goes wrong.

The right mental model is not "the agent has Python." It is "the agent has whatever your IAM role has, plus the ability to chain calls." Scope the role first. Add a verification step for high-stakes tasks. Then run_script is a clear win.

For the broader picture of the AWS MCP Server and how to wire it into your agent of choice, see the AWS Agent Toolkit setup guide.

Source: The AWS MCP Server is now generally available — AWS News Blog, May 6, 2026.