# Guide to Writing Skills

This guide explains how to create well-structured, benchmarkable skills for the AI Research Skills library. Whether you're writing a simple instructional skill or a complex benchmarked workflow, this document covers the complete anatomy of a skill.
## Skill Anatomy

Every skill lives in a directory under `skills/{category}/{skill-name}/` and consists of:

```
skills/research/your-skill-name/
├── SKILL.md               # Required: skill definition + instructions
└── rubrics/               # Optional: evaluation criteria
    └── skill_rubric.json  # General rubric for judging outputs
```
## SKILL.md Structure

The SKILL.md file has two parts: YAML frontmatter and a Markdown body.

```markdown
---
id: research.your-skill-name
title: Your Skill Title
summary: One-line description of what the skill does.
category: research
version: "0.1.0"
tags:
  - tag1
  - tag2
workflow_stage: discover
inputs:
  - Description of required inputs
outputs:
  - Description of what the skill produces
tools:
  - optional.tool.reference
author: Your Name
license: CC-BY-4.0
last_updated: 2026-01-18
---

# Your Skill Title

## What you produce

...

## Procedure

...
```
### Required Fields

| Field | Description | Example |
|---|---|---|
| `id` | Unique identifier: `{category}.{kebab-name}` | `research.read-journal-article` |
| `title` | Human-readable title | Read a Journal Article |
| `summary` | One-line description | A workflow to read papers and produce notes |
| `category` | `research` or `analysis` | `research` |
| `tags` | List of keywords | `[reading, literature-review]` |
| `author` | Creator name | AI Research Skills Contributors |
| `license` | Default `CC-BY-4.0` | `CC-BY-4.0` |
| `last_updated` | ISO date | `2026-01-18` |
### Optional Fields

| Field | Description |
|---|---|
| `version` | Semantic version (e.g., `"0.1.0"`) |
| `workflow_stage` | Stage in the research workflow: `discover`, `analyze`, `write`, or `publish` |
| `inputs` | List of required inputs |
| `outputs` | List of what the skill produces |
| `tools` | List of MCP tools the skill uses |
| `creator` | Original creator (if different from `author`) |
| `amenders` | List of contributors who modified the skill |
## Benchmarking Artifacts

Skills can be benchmarked to measure quality. This requires additional artifacts.

### Overview

```
skills/research/your-skill-name/
└── rubrics/
    └── skill_rubric.json        # How to judge outputs (general)

data/benchmarks/datasets/{category}/{skill-id}.{locale}/
├── dataset.json                 # Index of test cases
└── cases/
    ├── case-001/
    │   ├── case_rubric.json     # Case-specific judging criteria
    │   └── expected_spec.json   # Deterministic patterns to match
    └── case-002/
        └── ...
```
### Three Evaluation Layers

| Layer | File | Purpose | When to Use |
|---|---|---|---|
| Skill Rubric | `skill_rubric.json` | General criteria for all outputs | Always (defines quality dimensions) |
| Case Rubric | `case_rubric.json` | Case-specific requirements/overrides | When cases need custom criteria |
| Expected Spec | `expected_spec.json` | Deterministic pattern matching | When outputs have predictable structure |
## DatasetBundle Structure

A DatasetBundle contains test cases with SHA-256-pinned file references.

`dataset.json`:

```json
{
  "dataset_id": "reading/read-journal-article.en-US",
  "version": "0.1.0",
  "name": "Read Journal Article",
  "description": "Test cases for reading skill evaluation.",
  "skill_id": "research.read-journal-article",
  "governance": "public",
  "cases": [
    {
      "case_id": "sample-001",
      "name": "Neural Network Compression",
      "description": "Synthetic paper about pruning techniques.",
      "input": {
        "paper": { "title": "...", "abstract": "...", "body": "..." },
        "question": "What compression ratio does this achieve?",
        "constraints": { "depth": "triage" }
      },
      "case_rubric_ref": {
        "path": "cases/sample-001/case_rubric.json",
        "sha256": "70e34bf880e76534df7630d57e068674e901ddbbae744cdfdacf25e49e9c5260",
        "size_bytes": 474
      },
      "expected_spec_ref": {
        "path": "cases/sample-001/expected_spec.json",
        "sha256": "4cfcbb7af3e024fce48dbff6a15be9c14722d4447028a8814d8768150a620cfc",
        "size_bytes": 1390
      },
      "tags": ["reading", "machine-learning"]
    }
  ],
  "metadata": {
    "note": "Synthetic papers for testing.",
    "judge_topology": {
      "default": "pointwise",
      "description": "Pointwise: LLM judge evaluates single output. Pairwise: for A/B regression testing."
    }
  }
}
```
### Hash Pinning

Every referenced file must have a SHA-256 hash. This ensures:

- Reproducibility: Same hash = same content
- Integrity: Tampered files fail verification
- Fail-closed: Missing or mismatched files error immediately

Generate hashes:

```bash
# Single file
sha256sum cases/sample-001/case_rubric.json

# All case files
find cases -name "*.json" -exec sha256sum {} \;
```
## Rubric-Writing Guidance

### Skill Rubric (`skill_rubric.json`)

Defines general quality dimensions that apply to all outputs from this skill.

```json
{
  "rubric_id": "read-journal-article",
  "version": "0.1.0",
  "name": "Read a Journal Article — Skill Rubric",
  "skill_id": "research.read-journal-article",
  "evaluation_type": "absolute",
  "criteria": [
    {
      "id": "faithfulness",
      "name": "Faithfulness",
      "description": "Claims match the paper; no invented results.",
      "weight": 3,
      "scale": {
        "type": "numeric",
        "min": 1,
        "max": 5,
        "anchors": [
          { "value": 1, "label": "Hallucinated or contradicts the paper" },
          { "value": 3, "label": "Mostly faithful with minor inaccuracies" },
          { "value": 5, "label": "Fully faithful and careful" }
        ]
      },
      "required": true
    }
  ],
  "scoring": { "aggregation": "weighted_average", "normalize": true }
}
```
Best practices:
- Use 3-5 criteria (not too many)
- Assign weights to indicate importance
- Provide clear anchor descriptions for each score level
- Make criteria independent (avoid overlap)
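To make the `scoring` block concrete, here is one plausible reading of `weighted_average` with `normalize: true`: each criterion score is mapped from its `[min, max]` scale onto 0–1 before the weighted average. The harness's exact formula is not documented here, and the `clarity` criterion below is hypothetical:

```javascript
// Sketch: one plausible interpretation of
//   "scoring": { "aggregation": "weighted_average", "normalize": true }
// Assumes each criterion score is normalized from its [min, max] scale
// to [0, 1] before averaging. Illustrative only; not the harness's code.
function aggregate(criteria, scores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const c of criteria) {
    const normalized = (scores[c.id] - c.scale.min) / (c.scale.max - c.scale.min);
    weighted += c.weight * normalized;
    totalWeight += c.weight;
  }
  return weighted / totalWeight;
}

// "clarity" is a hypothetical second criterion for illustration.
const criteria = [
  { id: 'faithfulness', weight: 3, scale: { min: 1, max: 5 } },
  { id: 'clarity', weight: 1, scale: { min: 1, max: 5 } },
];
console.log(aggregate(criteria, { faithfulness: 5, clarity: 3 })); // → 0.875
```

Weighting this way is why `weight` matters: a weight-3 criterion moves the aggregate three times as much as a weight-1 criterion.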
### Case Rubric (`case_rubric.json`)

Adds case-specific requirements that override or extend the skill rubric.

```json
{
  "case_rubric_id": "case-sample-001",
  "base_rubric_id": "read-journal-article",
  "case_id": "sample-001",
  "description": "Case-level rubric for neural network compression paper.",
  "criteria_overrides": [
    {
      "id": "completeness",
      "required_elements": [
        "Must mention the 8x compression ratio",
        "Must identify the ImageNet benchmark"
      ]
    }
  ]
}
```
When to use:
- When specific content must be present (figures, tables, numbers)
- When the case has unique constraints
- When testing specific edge cases
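A harness might merge `criteria_overrides` into the base rubric by criterion `id`, shallow-merging the override's fields onto the base entry. The merge strategy is an assumption, sketched here only to show how the two files relate:

```javascript
// Sketch: merge case-level criteria_overrides into the base rubric's
// criteria by id. The shallow-merge strategy is an assumption, not
// documented harness behavior.
function applyOverrides(baseCriteria, overrides) {
  const byId = new Map(baseCriteria.map((c) => [c.id, { ...c }]));
  for (const o of overrides) {
    byId.set(o.id, { ...(byId.get(o.id) || {}), ...o });
  }
  return [...byId.values()];
}

const base = [{ id: 'completeness', name: 'Completeness', weight: 2 }];
const overrides = [
  { id: 'completeness', required_elements: ['Must mention the 8x compression ratio'] },
];
console.log(applyOverrides(base, overrides));
```

The base criterion keeps its `name` and `weight`; the case adds its `required_elements` on top.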
### Expected Spec (`expected_spec.json`)

Defines deterministic patterns for offline evaluation (no LLM judge needed).

```json
{
  "expected_spec_id": "exp-sample-001",
  "case_id": "sample-001",
  "description": "Pattern-only spec for reading note headings.",
  "expected_patterns": [
    { "pattern": "^##\\s+Thesis", "flags": "mi", "description": "Thesis heading" },
    { "pattern": "^##\\s+Key\\s+Claims", "flags": "mi", "description": "Key Claims heading" }
  ],
  "forbidden_patterns": [
    { "pattern": "\\bTODO\\b", "flags": "i", "description": "No TODO placeholders" }
  ],
  "metadata": {
    "pass_threshold": 0.7,
    "note": "6/8 headings must be present."
  }
}
```
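A minimal sketch of applying such a spec offline. The pass rule used here, that the fraction of `expected_patterns` matched must reach `pass_threshold` and no `forbidden_patterns` may match, is an assumption about the harness rather than documented behavior:

```javascript
// Sketch: evaluate an output against an expected_spec offline.
// Each pattern is a JS-style regex with flags (e.g. "mi").
// Pass rule (assumed): matched fraction >= pass_threshold AND no
// forbidden pattern matches.
function checkSpec(spec, output) {
  const matched = spec.expected_patterns.filter(
    (p) => new RegExp(p.pattern, p.flags).test(output)
  ).length;
  const forbidden = (spec.forbidden_patterns || []).some(
    (p) => new RegExp(p.pattern, p.flags).test(output)
  );
  const ratio = matched / spec.expected_patterns.length;
  return !forbidden && ratio >= (spec.metadata?.pass_threshold ?? 1);
}

const spec = {
  expected_patterns: [
    { pattern: '^##\\s+Thesis', flags: 'mi' },
    { pattern: '^##\\s+Key\\s+Claims', flags: 'mi' },
  ],
  forbidden_patterns: [{ pattern: '\\bTODO\\b', flags: 'i' }],
  metadata: { pass_threshold: 0.7 },
};
console.log(checkSpec(spec, '## Thesis\n...\n## Key Claims\n...')); // → true
```

Because everything is regex-based, this check runs in milliseconds with no judge model.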
Best practices:
- Use for structural validation (headings, required sections)
- Combine with rubrics for full evaluation
- Set realistic `pass_threshold` values
### Case Rubric vs Expected Spec

| Aspect | Case Rubric | Expected Spec |
|---|---|---|
| Evaluation | LLM judge (subjective) | Pattern matching (deterministic) |
| Use for | Quality, completeness, accuracy | Structure, format, required content |
| Requires | Judge model at runtime | No LLM (runs offline) |
| Best for | Open-ended outputs | Predictable output structure |
## Authoring Workflow

### 1. Create Skill Directory

```bash
cd ai-research-skills
mkdir -p skills/research/your-skill-name/rubrics
```

### 2. Write SKILL.md

Copy the template and fill in your content:

```bash
cp skills/_template/SKILL.md skills/research/your-skill-name/
```

### 3. Add Skill Rubric (Optional)

Create `rubrics/skill_rubric.json` with your evaluation criteria.

### 4. Create DatasetBundle (Optional)

For benchmarkable skills:

```bash
mkdir -p data/benchmarks/datasets/research/your-skill-name.en-US/cases/case-001
```

Then create `dataset.json` and the case-level rubrics/specs.
### 5. Verify Locally

```bash
# Sync and build docs
cd packages/airs/docs
npm run build

# Run tests (from the bars package)
cd packages/bars
npm test

# Verify dataset integrity
node -e "
const fs = require('fs');
const crypto = require('crypto');
const dataset = JSON.parse(fs.readFileSync('data/benchmarks/datasets/...'));
// ... verify hashes
"
```
### 6. Commit and Submit

```bash
git add .
git commit -m "Add skill: your-skill-name"
git push
# Create PR on GitHub
```
## Contributing Methods

### Option 1: Add Skill Portal (Recommended)

The Add Skill Portal provides a guided form:

1. Fill out the skill details (title, summary, content)
2. Preview the generated SKILL.md
3. Submit; a PR is created automatically

Best for: Simple skills without benchmarking artifacts.
### Option 2: Manual PR

For skills with rubrics and datasets:

1. Fork the repository
2. Create your skill directory structure
3. Add all required files
4. Run local proofs (`npm run build`, `npm test`)
5. Submit a PR
### Option 3: Issue Submission

For non-technical contributors:

1. Open a GitHub Issue
2. Fill in the skill template
3. A maintainer converts it to a PR
## Example Skills

Reference these public skills for patterns:

| Skill | Type | Benchmarking |
|---|---|---|
| `research.read-journal-article` | Non-deterministic | Rubric + expected_spec patterns |
| `research.citation-formatting-apa-7-cne` | Deterministic | Full expected_spec |
| `research.reference-formatting` | Mixed | Rubric-based |
| `analysis.text-transform` | Simple | Minimal |
## Local Proof Commands

Before submitting, verify that your changes work:

```bash
# Build the docs site (includes skill sync)
cd packages/airs/docs
npm run build
# Expect: "Sync successful" + "64 pages generated" (or more)

# Run bars tests (if you added datasets)
cd packages/bars
npm test
# Expect: "375 tests passed" (or more)

# Verify dataset hashes (if applicable)
node -e "
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');
const datasetPath = 'data/benchmarks/datasets/...';
const dataset = JSON.parse(fs.readFileSync(path.join(datasetPath, 'dataset.json')));
for (const c of dataset.cases) {
  for (const ref of [c.case_rubric_ref, c.expected_spec_ref].filter(Boolean)) {
    const content = fs.readFileSync(path.join(datasetPath, ref.path));
    const hash = crypto.createHash('sha256').update(content).digest('hex');
    console.log(hash === ref.sha256 ? '✓' : '✗', ref.path);
  }
}
"
```
## Questions?

- Open an issue for help
- See the Taxonomy for field specifications
- Review the Benchmark Method for evaluation details