The Detection Boundary of Agent Skills

Agent security is often discussed at the level of prompts. That is useful, but it is not where the system ends. An agent becomes dangerous, useful, or merely annoying through the capabilities it is allowed to exercise: files, network, shell commands, browsers, credentials, databases, and all the small wrappers we call tools or skills.

That is why the detection boundary of a skill matters.

By detection boundary, I mean the line between what can be inferred before the skill runs and what only becomes visible at runtime. Static analysis lives on the left side of that line. It can inspect imports, call sites, command templates, file paths, network clients, environment variable access, permission surfaces, and obvious data flows. It can say: this skill appears to read from the filesystem; this one shells out; this one may transmit data; this one depends on a remote service.
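To make the left side of that line concrete, here is a minimal sketch in Python, assuming the skill is itself Python source. It uses the standard `ast` module to walk the syntax tree without executing anything, and records which capability-bearing modules are imported and where environment variables are read. The module-to-capability table and the function name `capability_map` are hypothetical and deliberately tiny; a real analyzer would need a curated table, call-site analysis, and support for skills written as shell scripts or config files.

```python
import ast

# Hypothetical, deliberately tiny table from top-level module to capability.
# A real analyzer would need a curated and much larger table.
CAPABILITY_BY_MODULE = {
    "os": "file",          # coarse: os also covers processes and env vars
    "pathlib": "file",
    "shutil": "file",
    "subprocess": "shell",
    "socket": "network",
    "urllib": "network",
    "requests": "network",
    "smtplib": "network",
    "sqlite3": "storage",
}


def capability_map(source: str) -> dict[str, list[int]]:
    """Map each visible capability to the source lines that suggest it.

    The source is parsed, never executed.
    """
    found: dict[str, list[int]] = {}

    def record(capability: str, line: int) -> None:
        found.setdefault(capability, []).append(line)

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                cap = CAPABILITY_BY_MODULE.get(alias.name.split(".")[0])
                if cap:
                    record(cap, node.lineno)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports like "from . import x".
            cap = CAPABILITY_BY_MODULE.get(node.module.split(".")[0])
            if cap:
                record(cap, node.lineno)
        elif (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == "os"
            and node.attr in {"environ", "getenv"}
        ):
            # Environment reads are a common place for API keys and tokens.
            record("credential", node.lineno)

    # ast.walk is breadth-first, so sort line numbers for stable output.
    return {cap: sorted(lines) for cap, lines in found.items()}
```

Mapping `os` wholesale to "file" is a coarse choice that overcounts. That kind of imprecision is itself worth reporting: an import is evidence that a capability is reachable, not proof that it is used.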

It cannot see everything.

The interesting part is admitting this instead of pretending otherwise. Dynamic string construction, runtime configuration, plugin loading, hidden side effects, opaque binaries, and model-generated parameters all push behavior past the static boundary. In an agent system, the skill is rarely the whole program. It is a capability exposed to another reasoning process. The real behavior is produced by the interaction between the skill, the model, the state of the environment, and the user’s task.

So a static analyzer for agent skills should not be judged only by whether it finds “bad” skills. It should also describe what kind of confidence it has and where that confidence stops.

There are at least three useful outputs:

A visible capability map: file, network, shell, browser, credential, storage.
A confidence boundary: direct evidence, inferred evidence, unknown behavior.
A review queue: small regions where human inspection is cheaper than sandboxing everything at runtime.
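The three outputs can be held in one small report structure. The sketch below is one possible shape, assuming Python and hypothetical names (`Evidence`, `Finding`, `Report`); it is not a standard format. Each finding carries a capability label, an evidence level, and a line number, and the report derives the capability map, the confidence boundary, and the review queue from the same list of findings.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    DIRECT = "direct"      # the capability is visible in the code as written
    INFERRED = "inferred"  # likely, but rests on an assumption (e.g. a wrapper's behavior)
    UNKNOWN = "unknown"    # decided at runtime: dynamic strings, config, plugins, model input


@dataclass(frozen=True)
class Finding:
    capability: str  # e.g. "file", "network", "shell", "credential"
    evidence: Evidence
    line: int
    note: str = ""


@dataclass
class Report:
    findings: list[Finding]

    def capability_map(self) -> set[str]:
        """Every capability the analyzer saw, at any evidence level."""
        return {f.capability for f in self.findings}

    def confidence_boundary(self) -> dict[Evidence, list[Finding]]:
        """Findings grouped by how much the analyzer actually knows."""
        grouped: dict[Evidence, list[Finding]] = {level: [] for level in Evidence}
        for f in self.findings:
            grouped[f.evidence].append(f)
        return grouped

    def review_queue(self) -> list[Finding]:
        """Unresolved regions for a human, ordered top to bottom by line."""
        return sorted(
            (f for f in self.findings if f.evidence is Evidence.UNKNOWN),
            key=lambda f: f.line,
        )
```

Here the review queue holds only unknown findings. Whether inferred findings also go to a human is a policy choice: including them lengthens the queue but catches wrappers whose behavior the analyzer had to guess.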

This is more modest than a universal detector, but probably more useful.

Security tools are often evaluated as if the ideal result is a clean binary answer. Safe or unsafe. Malicious or benign. Allowed or blocked. Agent skills do not fit that shape very well. A skill that can send email is not automatically malicious. A skill that reads local files is not automatically wrong. A skill that calls curl may be ordinary deployment glue or a data exfiltration path.

The question is not only “is this skill dangerous?”

The better question is: “what would this skill be capable of doing if the model decided to use it that way?”

That question is exactly where static analysis becomes useful. It gives a map of capabilities before execution. The map will be incomplete, but incompleteness is still information. If the analyzer can say “I can see the filesystem access but not the destination of the network request,” that is a meaningful boundary. It tells the reviewer where the uncertain part begins.
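That boundary can be made explicit in the analyzer's output. A minimal sketch, again assuming a Python skill and a hypothetical set of network call names (`urlopen`, `requests.get`, `requests.post`): the analyzer reports every network call it can see, but reports a destination only when the URL is a string literal. Anything built at runtime, whether an f-string, a config lookup, or a parameter the model fills in, is marked `unknown`, which is where the reviewer's attention should start.

```python
import ast

# Hypothetical set of call names treated as network sends; illustrative only.
NETWORK_CALLS = {"urlopen", "requests.get", "requests.post"}


def call_name(func: ast.expr) -> str:
    """Render `f` or `a.b` call targets as a dotted string; '' otherwise.

    Deeper chains such as a.b.c are not resolved in this sketch.
    """
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def network_destinations(source: str) -> list[tuple[int, str]]:
    """For each visible network call, return (line, literal URL or 'unknown')."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and call_name(node.func) in NETWORK_CALLS:
            # Take the URL from the first positional argument or a url= keyword.
            if node.args:
                target = node.args[0]
            else:
                target = next(
                    (kw.value for kw in node.keywords if kw.arg == "url"), None
                )
            if isinstance(target, ast.Constant) and isinstance(target.value, str):
                results.append((node.lineno, target.value))
            else:
                # The call is visible; its destination is past the static line.
                results.append((node.lineno, "unknown"))
    return sorted(results)
```

For example, `requests.post('https://api.example.com/v1/upload', data=data)` yields a concrete destination, while `requests.post(config['endpoint'], data=data)` yields `unknown` on the same capability. Both are network findings; only the second belongs in the review queue.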

For agent safety, this may be the right level of ambition. Not total knowledge. Not perfect prevention. A clear account of what is visible, what is inferred, and what remains beyond the static line.