Why production agents are different from demos
In demos, an agent runs once, with perfect conditions, and everyone is watching. In production:
users give messy instructions
inputs include untrusted content (docs, chats, web pages)
tools can have real consequences (sending emails, editing records)
retries happen, and costs add up
failures must be explainable
So a production agent needs the same mindset as any system that can change data: clear boundaries, controlled permissions, and strong observability.
1) Choose the simplest agent that works
“Agent” doesn’t have to mean “autonomous robot.” Start with the lowest-risk pattern:
Pattern A: Assistant + tools (recommended start)
model suggests actions
app decides what tools can run
sensitive actions require approval
Pattern B: Planner → Executor
the model writes a short plan (2–6 steps)
each step is executed with constraints
execution stops when success criteria are met
Pattern C: Fully autonomous loop
Use only when you have:
mature guardrails
strong tracing + evals
strict tool permissions
clear business justification
Rule: If you can’t explain the agent’s behavior to a teammate in 60 seconds, it’s too complex.
2) Tools are permissions (treat them like admin access)
Tool calling is powerful because it lets models interact with external systems.
But that power is also the risk: an agent with broad tools is like an admin account with no limits.
Minimum safety rules for tools
Least privilege: tools can only do what’s necessary
Hard allowlists: restrict actions to allowed operations
Human approval: required for irreversible/high-risk actions
Schema validation: tool inputs/outputs must match strict schemas
Examples of “approval required” actions
initiating payments/refunds
deleting records
exporting user data
changing permissions/roles
sending messages to customers
A great design is “read-only by default,” then graduate to write actions with controls.
3) Prompt injection is not theoretical
Agents often read untrusted content (webpages, emails, documents). Prompt injection tries to hide instructions inside that content so the model follows the attacker’s intent instead of yours.
OWASP lists prompt injection and insecure output handling among key LLM application risks—meaning it’s common enough to be a standard security concern.
Practical defenses that work
A) Treat retrieved content as data, not instructionsnever allow retrieved text to override system rules
only extract facts from it
keep it clearly separated (quoted blocks / structured fields)
store system rules separately
place user content in a distinct section
never concatenate raw web text into your system prompt
Even if a prompt injection tries to force actions, your system should:
require approval
enforce policy checks
limit tool permissions
4) Build observability: make agents debuggable
Without tracing, agent failures look like “the AI is random.” With tracing, you can see:
what it retrieved
what it planned
what tool calls happened
where it failed
Agent observability tools commonly emphasize tracing, monitoring, and evaluation for agent behavior.
What to log (minimum)
request_id, user_id/tenant_id
model + version
retrieved sources (IDs, not full private content)
tool calls (name, parameters, result)
safety decisions (why it refused / why it proceeded)
latency + token/cost estimates
Important: redact secrets (tokens, passwords, sensitive IDs).
5) Evaluation: ship with tests, not hope
You don’t ship a payment system without tests. Agents also need tests—just different types:
Evals you should run
Accuracy evals: known questions with expected answers
Safety evals: disallowed actions, sensitive data requests
Tool reliability evals: tool failures, timeouts, retries
Regression evals: what got worse after prompt/tool changes
Make evals part of deployment. If score drops, you pause rollout.
A production agent “baseline” template (copy)
Agent Baseline (Production)
Scope:
- Allowed tasks: <list>
- Disallowed tasks: <list>
- Stop conditions: max steps, max time, max cost
Tools:
- Read-only tools by default
- Write tools require approval
- Tool allowlists + strict schemas
Security:
- Treat inputs as untrusted
- No instruction-following from retrieved content
- Policy checks before actions
Observability:
- Trace every step + tool call
- Log request_id + redacted inputs
- Monitor cost/latency/errors
Evaluation:
- Golden tests for accuracy
- Safety tests for refusal/constraints
- Regression checks on updates
Common mistakes (and quick fixes)
Too much autonomy too early → start tool-assisted, then graduate
Broad tool permissions → least privilege + allowlists
No tracing → add step-by-step logs and tool traces
Mixing retrieved content into system prompt → isolate content and extract facts only
No evals → build a small test set and run it every release
Closing
Production agents succeed when they’re treated like a controlled system—not a chat demo. Define boundaries, control tools, defend against prompt injection, and add observability + evals so improvements are measurable and failures are fixable.
If you want, OSCORP can help you ship a safe agent stack:
agent architecture selection (right pattern)
tool permission design + approvals
prompt injection defenses and policies
tracing + eval harness for production reliability