SKILL PLAYGROUND · SAFETY SCANNER

Skillcheck

Skillcheck verifies third-party skills before you deploy. Run tasks in a controlled sandbox, capture evidence, and scan for safety issues before anything touches production.

Skillcheck v0 is a static concept page. The runtime and scanner are still being built.

Example Skill Report

Skill: Supply Chain Alert Bot
Task: Monitor API drift and notify Slack
Risk Scan: 2 warnings · 0 blocks
Evidence: Trace log + artifacts bundle

Every run ships with a reproducible trace, inputs, outputs, and a safety summary.

Bring any skill

Point to a SKILL.md or bundle. Skillcheck pulls inputs, prompts, and constraints into a single run sheet.
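
To make the run sheet idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `RunSheet` type, the `build_run_sheet` helper, and the shape of the parsed `sections` dictionary are hypothetical, since the Skillcheck runtime does not exist yet and no SKILL.md schema is defined on this page.

```python
from dataclasses import dataclass, field


@dataclass
class RunSheet:
    """Hypothetical run sheet: everything one run needs, in one place."""
    skill_path: str
    inputs: list[str] = field(default_factory=list)
    prompts: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)


def build_run_sheet(skill_path: str, sections: dict[str, list[str]]) -> RunSheet:
    """Collect the known sections of a parsed skill into a RunSheet.

    `sections` stands in for whatever a real SKILL.md parser would return
    (assumed shape: section name -> list of entries). Unknown keys are
    ignored and missing keys default to empty lists.
    """
    return RunSheet(
        skill_path=skill_path,
        inputs=list(sections.get("inputs", [])),
        prompts=list(sections.get("prompts", [])),
        constraints=list(sections.get("constraints", [])),
    )


# Illustrative values only; the path and entries are made up.
sheet = build_run_sheet(
    "skills/vendor-risk/SKILL.md",
    {"inputs": ["fixtures/vendors.json"], "constraints": ["read-only APIs"]},
)
```

Keeping the run sheet as one flat record is a design guess: it would let the same object be logged alongside the trace so a run can be replayed from its inputs.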

Run with guardrails

Execute in a controlled environment with explicit tool policies and logged side effects.
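
One way to picture "explicit tool policies and logged side effects" is an allowlist wrapper that records every attempted tool call before deciding whether to run it. The sketch below is hypothetical: `ToolPolicy`, `PolicyViolation`, and the tool names are invented for illustration and do not describe a shipped Skillcheck API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class PolicyViolation(Exception):
    """Raised when a skill tries to use a tool outside its policy."""


@dataclass
class ToolPolicy:
    """Hypothetical allowlist policy that logs every attempted call."""
    allowed: frozenset[str]
    log: list[dict[str, Any]] = field(default_factory=list)

    def call(self, name: str, tool: Callable[..., Any], *args: Any) -> Any:
        permitted = name in self.allowed
        # Log first, so denied attempts also show up in the evidence.
        self.log.append({"tool": name, "args": args, "allowed": permitted})
        if not permitted:
            raise PolicyViolation(f"tool {name!r} is not permitted")
        return tool(*args)


# Illustrative read-only policy; the tool name is made up.
policy = ToolPolicy(allowed=frozenset({"read_vendor"}))
record = policy.call("read_vendor", lambda vendor_id: {"id": vendor_id}, 7)
```

Logging before enforcing is deliberate in this sketch: a blocked call is often the most interesting line in the trace.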

Ship evidence

Each result includes the trace, outputs, and a structured safety report you can audit.

Why now

Five signals that make Skillcheck urgent.

  • Model choices are exploding; teams need cross-vendor evaluation to pick the right stack.
  • Safety expectations are rising; audit-ready evidence is becoming a baseline requirement.
  • LLM features are shipping weekly; repeatable skill tests catch regressions before they reach users.
  • Safety tooling is fragmented; a unified skill + safety score reduces decision friction.
  • Cost pressure is real; measurable skill performance is required to justify spend.

Example Skillcheck

A concrete snapshot of what a skill review looks like before it ships.

Scenario

Vendor risk monitor

Scan new vendors weekly, flag anomalies, and notify the risk queue.

Inputs
  • SKILL.md + config bundle
  • Fixture dataset (100 vendors)
  • Tool policy: read-only APIs
Outputs
  • Trace log + JSON bundle
  • Safety scan summary
  • Risk alerts with evidence

How Skillcheck works

Three steps, one truth: prove the skill before you deploy it.

01 · Ingest

Import a skill definition, configuration, and target task.

02 · Execute

Run the skill against a safe fixture with bounded tools and explicit permissions.

03 · Verify

Review the evidence pack: trace, outputs, and safety annotations.
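
As a rough illustration of what an evidence pack could contain, the Python sketch below bundles a trace, outputs, and safety findings, derives a one-line summary in the style of "2 warnings · 0 blocks", and adds a content hash so two runs can be compared. The function names, the `severity` field, and the choice of SHA-256 over canonical JSON are all assumptions, not a defined Skillcheck format.

```python
import hashlib
import json
from collections import Counter
from typing import Any


def summarize_findings(findings: list[dict[str, Any]]) -> str:
    """Count findings by an assumed `severity` field ("warning" or "block")."""
    counts = Counter(f["severity"] for f in findings)
    return f"{counts['warning']} warnings · {counts['block']} blocks"


def build_evidence_pack(
    trace: list[str], outputs: dict[str, Any], findings: list[dict[str, Any]]
) -> dict[str, Any]:
    """Bundle trace, outputs, and findings with a summary and a digest.

    The digest is taken over canonical JSON (sorted keys, fixed separators),
    so identical evidence always hashes to the same value.
    """
    body = {"trace": trace, "outputs": outputs, "findings": findings}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "summary": summarize_findings(findings), "sha256": digest}


# Illustrative data only; rule names and outputs are made up.
pack = build_evidence_pack(
    trace=["load fixture", "call read_vendor"],
    outputs={"alerts": 3},
    findings=[
        {"severity": "warning", "rule": "unpinned-dependency"},
        {"severity": "warning", "rule": "broad-network-scope"},
    ],
)
print(pack["summary"])  # prints "2 warnings · 0 blocks"
```

A reviewer could then check two things at a glance: whether any finding is a block, and whether a rerun on the same fixture reproduces the same digest.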

Preview console (static)

Sketch what you want to test. The runtime button is wired to a placeholder for now.

Coming soon.

Now

Visual + narrative v0

Clear story, honest scope, and a preview layout ready for runtime wiring.

Next

Execution harness

Run skills in a sandbox with tool policies, evidence capture, and trace bundling.

Later

Safety scanner

Automatic policy checks, diff reports, and deploy-ready confidence scores.