# Guide to Writing Skills

This guide explains how to create well-structured, benchmarkable skills for the AI Research Skills library. Whether you're writing a simple instructional skill or a complex benchmarked workflow, this document covers the complete anatomy.

## Skill Anatomy

Every skill lives in a directory under `skills/{category}/{skill-name}/` and consists of:

```
skills/research/your-skill-name/
├── SKILL.md              # Required: skill definition + instructions
└── rubrics/              # Optional: evaluation criteria
    └── skill_rubric.json # General rubric for judging outputs
```

## SKILL.md Structure

The SKILL.md file has two parts: YAML frontmatter and a Markdown body.

```markdown
---
id: research.your-skill-name
title: Your Skill Title
summary: One-line description of what the skill does.
category: research
version: "0.1.0"
tags:
  - tag1
  - tag2
workflow_stage: discover
inputs:
  - Description of required inputs
outputs:
  - Description of what the skill produces
tools:
  - optional.tool.reference
author: Your Name
license: CC-BY-4.0
last_updated: 2026-01-18
---

# Your Skill Title

## What you produce
...

## Procedure
...
```

### Required Fields

| Field | Description | Example |
| --- | --- | --- |
| `id` | Unique identifier: `{category}.{kebab-name}` | `research.read-journal-article` |
| `title` | Human-readable title | Read a Journal Article |
| `summary` | One-line description | A workflow to read papers and produce notes |
| `category` | `research` or `analysis` | `research` |
| `tags` | List of keywords | `[reading, literature-review]` |
| `author` | Creator name | AI Research Skills Contributors |
| `license` | Default `CC-BY-4.0` | `CC-BY-4.0` |
| `last_updated` | ISO date | `2026-01-18` |
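Before submitting, it can help to sanity-check that the frontmatter carries every required field. The sketch below is a hypothetical, line-based check (not part of the repo's tooling); a real validator would use a YAML parser:

```javascript
// Minimal required-field check for SKILL.md frontmatter.
// Illustrative only: naive line-based parsing, no YAML library.
const REQUIRED = [
  'id', 'title', 'summary', 'category',
  'tags', 'author', 'license', 'last_updated',
];

function missingFields(skillMd) {
  const m = skillMd.match(/^---\n([\s\S]*?)\n---/);
  if (!m) return REQUIRED; // no frontmatter block at all
  const keys = m[1]
    .split('\n')
    .map((line) => (line.match(/^(\w+):/) || [])[1])
    .filter(Boolean);
  return REQUIRED.filter((field) => !keys.includes(field));
}

const doc = [
  '---', 'id: research.demo', 'title: Demo', 'summary: One-liner.',
  'category: research', 'tags:', '  - demo', 'author: A. Author',
  'license: CC-BY-4.0', 'last_updated: 2026-01-18', '---', '# Demo',
].join('\n');
console.log(missingFields(doc)); // → []
```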

### Optional Fields

| Field | Description |
| --- | --- |
| `version` | Semantic version (e.g., `"0.1.0"`) |
| `workflow_stage` | Stage in the research workflow: `discover`, `analyze`, `write`, or `publish` |
| `inputs` | List of required inputs |
| `outputs` | List of what the skill produces |
| `tools` | List of MCP tools the skill uses |
| `creator` | Original creator (if different from `author`) |
| `amenders` | List of contributors who modified the skill |

## Benchmarking Artifacts

Skills can be benchmarked to measure quality. This requires additional artifacts.

### Overview

```
skills/research/your-skill-name/
└── rubrics/
    └── skill_rubric.json     # How to judge outputs (general)

data/benchmarks/datasets/{category}/{skill-id}.{locale}/
├── dataset.json              # Index of test cases
└── cases/
    ├── case-001/
    │   ├── case_rubric.json  # Case-specific judging criteria
    │   └── expected_spec.json # Deterministic patterns to match
    └── case-002/
        └── ...
```

### Three Evaluation Layers

| Layer | File | Purpose | When to Use |
| --- | --- | --- | --- |
| Skill Rubric | `skill_rubric.json` | General criteria for all outputs | Always (defines quality dimensions) |
| Case Rubric | `case_rubric.json` | Case-specific requirements/overrides | When cases need custom criteria |
| Expected Spec | `expected_spec.json` | Deterministic pattern matching | When outputs have a predictable structure |

### DatasetBundle Structure

A DatasetBundle contains test cases with SHA-256-pinned file references.

#### dataset.json

```json
{
  "dataset_id": "reading/read-journal-article.en-US",
  "version": "0.1.0",
  "name": "Read Journal Article",
  "description": "Test cases for reading skill evaluation.",
  "skill_id": "research.read-journal-article",
  "governance": "public",
  "cases": [
    {
      "case_id": "sample-001",
      "name": "Neural Network Compression",
      "description": "Synthetic paper about pruning techniques.",
      "input": {
        "paper": { "title": "...", "abstract": "...", "body": "..." },
        "question": "What compression ratio does this achieve?",
        "constraints": { "depth": "triage" }
      },
      "case_rubric_ref": {
        "path": "cases/sample-001/case_rubric.json",
        "sha256": "70e34bf880e76534df7630d57e068674e901ddbbae744cdfdacf25e49e9c5260",
        "size_bytes": 474
      },
      "expected_spec_ref": {
        "path": "cases/sample-001/expected_spec.json",
        "sha256": "4cfcbb7af3e024fce48dbff6a15be9c14722d4447028a8814d8768150a620cfc",
        "size_bytes": 1390
      },
      "tags": ["reading", "machine-learning"]
    }
  ],
  "metadata": {
    "note": "Synthetic papers for testing.",
    "judge_topology": {
      "default": "pointwise",
      "description": "Pointwise: LLM judge evaluates single output. Pairwise: for A/B regression testing."
    }
  }
}
```

### Hash Pinning

Every referenced file must carry a SHA-256 hash (and its size in bytes). This ensures that case files cannot change silently: if a rubric or spec is edited, its hash no longer matches the reference in `dataset.json`, and the stale reference can be detected.

Generate hashes:

```bash
# Single file
sha256sum cases/sample-001/case_rubric.json

# All case files
find cases -name "*.json" -exec sha256sum {} \;
```

## Rubric-Writing Guidance

### Skill Rubric (`skill_rubric.json`)

Defines the general quality dimensions that apply to all outputs from this skill.

```json
{
  "rubric_id": "read-journal-article",
  "version": "0.1.0",
  "name": "Read a Journal Article — Skill Rubric",
  "skill_id": "research.read-journal-article",
  "evaluation_type": "absolute",
  "criteria": [
    {
      "id": "faithfulness",
      "name": "Faithfulness",
      "description": "Claims match the paper; no invented results.",
      "weight": 3,
      "scale": {
        "type": "numeric",
        "min": 1,
        "max": 5,
        "anchors": [
          { "value": 1, "label": "Hallucinated or contradicts the paper" },
          { "value": 3, "label": "Mostly faithful with minor inaccuracies" },
          { "value": 5, "label": "Fully faithful and careful" }
        ]
      },
      "required": true
    }
  ],
  "scoring": { "aggregation": "weighted_average", "normalize": true }
}
```
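To make the scoring block concrete, here is one plausible reading of `"weighted_average"` with `"normalize": true`: scale each criterion score to [0, 1] by its `min`/`max`, then average by `weight`. This is an illustration of the rubric fields, not the benchmark runner's actual code, and the second criterion is hypothetical:

```javascript
// Hypothetical aggregation for { aggregation: "weighted_average", normalize: true }.
function aggregate(criteria, scores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const c of criteria) {
    // Normalize each raw score into [0, 1] using the criterion's scale.
    const norm = (scores[c.id] - c.scale.min) / (c.scale.max - c.scale.min);
    weighted += norm * c.weight;
    totalWeight += c.weight;
  }
  return weighted / totalWeight;
}

const criteria = [
  { id: 'faithfulness', weight: 3, scale: { min: 1, max: 5 } },
  { id: 'clarity', weight: 1, scale: { min: 1, max: 5 } }, // hypothetical extra criterion
];
console.log(aggregate(criteria, { faithfulness: 5, clarity: 3 })); // → 0.875
```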

Best practices:

- Keep the criteria list short; each criterion should measure a single quality dimension.
- Use `weight` to emphasize the criteria that matter most (e.g., faithfulness over style).
- Provide scale `anchors` so a judge can score consistently across outputs.
- Mark must-pass criteria with `"required": true`.

### Case Rubric (`case_rubric.json`)

Adds case-specific requirements that override or extend the skill rubric.

```json
{
  "case_rubric_id": "case-sample-001",
  "base_rubric_id": "read-journal-article",
  "case_id": "sample-001",
  "description": "Case-level rubric for neural network compression paper.",
  "criteria_overrides": [
    {
      "id": "completeness",
      "required_elements": [
        "Must mention the 8x compression ratio",
        "Must identify the ImageNet benchmark"
      ]
    }
  ]
}
```
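A runner could resolve overrides by merging each entry onto the base-rubric criterion with the matching `id`. The shallow-merge rule sketched below is an assumption for illustration, not documented behavior:

```javascript
// Merge case-level criteria_overrides onto base criteria by matching "id".
// Shallow merge: override fields win, unspecified base fields are kept.
function applyOverrides(baseCriteria, overrides) {
  return baseCriteria.map((c) => {
    const o = overrides.find((x) => x.id === c.id);
    return o ? { ...c, ...o } : c;
  });
}

const base = [{ id: 'completeness', name: 'Completeness', weight: 2 }];
const overrides = [
  { id: 'completeness', required_elements: ['8x compression ratio', 'ImageNet benchmark'] },
];
const merged = applyOverrides(base, overrides);
console.log(merged[0]); // keeps weight 2, gains required_elements
```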

When to use:

- A case must contain specific facts (list them in `required_elements`).
- The skill rubric's general criteria need case-level overrides or extensions.

### Expected Spec (`expected_spec.json`)

Defines deterministic patterns for offline evaluation (no LLM judge needed).

```json
{
  "expected_spec_id": "exp-sample-001",
  "case_id": "sample-001",
  "description": "Pattern-only spec for reading note headings.",
  "expected_patterns": [
    { "pattern": "^##\\s+Thesis", "flags": "mi", "description": "Thesis heading" },
    { "pattern": "^##\\s+Key\\s+Claims", "flags": "mi", "description": "Key Claims heading" }
  ],
  "forbidden_patterns": [
    { "pattern": "\\bTODO\\b", "flags": "i", "description": "No TODO placeholders" }
  ],
  "metadata": {
    "pass_threshold": 0.7,
    "note": "6/8 headings must be present."
  }
}
```
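Because the spec is plain regex data, checking an output needs no LLM. The sketch below shows one way a checker could apply it; the scoring rule (fraction of expected patterns matched, with every forbidden pattern absent) follows the `pass_threshold` note but is otherwise an assumption:

```javascript
// Apply an expected spec to a candidate output, offline.
const spec = {
  expected_patterns: [
    { pattern: '^##\\s+Thesis', flags: 'mi' },
    { pattern: '^##\\s+Key\\s+Claims', flags: 'mi' },
  ],
  forbidden_patterns: [{ pattern: '\\bTODO\\b', flags: 'i' }],
  metadata: { pass_threshold: 0.7 },
};

function checkOutput(output, spec) {
  const matched = spec.expected_patterns
    .filter((p) => new RegExp(p.pattern, p.flags).test(output)).length;
  const score = matched / spec.expected_patterns.length;
  const clean = spec.forbidden_patterns
    .every((p) => !new RegExp(p.pattern, p.flags).test(output));
  return { score, pass: clean && score >= (spec.metadata.pass_threshold ?? 1) };
}

const note = '## Thesis\nPruning preserves accuracy.\n\n## Key Claims\n8x compression.';
const result = checkOutput(note, spec);
console.log(result); // → { score: 1, pass: true }
```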

Best practices:

- Anchor structural patterns with `^` and use the `m` (multiline) and `i` (case-insensitive) flags, as in the example above.
- Use `forbidden_patterns` to reject placeholders such as `TODO`.
- Set a `pass_threshold` in `metadata` when not every pattern is mandatory.

### Case Rubric vs Expected Spec

| Aspect | Case Rubric | Expected Spec |
| --- | --- | --- |
| Evaluation | LLM judge (subjective) | Pattern matching (deterministic) |
| Use for | Quality, completeness, accuracy | Structure, format, required content |
| Requires | Judge model at runtime | Nothing at runtime (offline, no LLM) |
| Best for | Open-ended outputs | Predictable output structure |

## Authoring Workflow

### 1. Create Skill Directory

```bash
cd ai-research-skills
mkdir -p skills/research/your-skill-name/rubrics
```

### 2. Write SKILL.md

Copy the template and fill in your content:

```bash
cp skills/_template/SKILL.md skills/research/your-skill-name/
```

### 3. Add Skill Rubric (Optional)

Create `rubrics/skill_rubric.json` with your evaluation criteria.

### 4. Create DatasetBundle (Optional)

For benchmarkable skills:

```bash
mkdir -p data/benchmarks/datasets/research/your-skill.en-US/cases/case-001
```

Create `dataset.json` and the case-level rubrics/specs.

### 5. Verify Locally

```bash
# Sync and build docs
cd packages/airs/docs
npm run build

# Run tests (from the bars package)
cd packages/bars
npm test

# Verify dataset integrity
node -e "
const fs = require('fs');
const crypto = require('crypto');
const dataset = JSON.parse(fs.readFileSync('data/benchmarks/datasets/...'));
// ... verify hashes
"
```

### 6. Commit and Submit

```bash
git add .
git commit -m "Add skill: your-skill-name"
git push
# Create PR on GitHub
```

## Contributing Methods

### Option 1: Add Skill Portal (Recommended)

The Add Skill Portal provides a guided form:

  1. Fill out skill details (title, summary, content)
  2. Preview the generated SKILL.md
  3. Submit; a PR is created automatically

Best for: Simple skills without benchmarking artifacts.

### Option 2: Manual PR

For skills with rubrics and datasets:

  1. Fork the repository
  2. Create your skill directory structure
  3. Add all required files
  4. Run local proofs (`npm run build`, `npm test`)
  5. Submit a PR

### Option 3: Issue Submission

For non-technical contributors:

  1. Open a GitHub Issue
  2. Fill in the skill template
  3. A maintainer converts it to a PR

## Example Skills

Reference these public skills for patterns:

| Skill | Type | Benchmarking |
| --- | --- | --- |
| `research.read-journal-article` | Non-deterministic | Rubric + `expected_spec` patterns |
| `research.citation-formatting-apa-7-cne` | Deterministic | Full `expected_spec` |
| `research.reference-formatting` | Mixed | Rubric-based |
| `analysis.text-transform` | Simple | Minimal |

## Local Proof Commands

Before submitting, verify your changes work:

```bash
# Build the docs site (includes skill sync)
cd packages/airs/docs
npm run build

# Expect: "Sync successful" + "64 pages generated" (or more)

# Run bars tests (if you added datasets)
cd packages/bars
npm test

# Expect: "375 tests passed" (or more)

# Verify dataset hashes (if applicable)
node -e "
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const datasetPath = 'data/benchmarks/datasets/...';
const dataset = JSON.parse(fs.readFileSync(path.join(datasetPath, 'dataset.json')));

for (const c of dataset.cases) {
  for (const ref of [c.case_rubric_ref, c.expected_spec_ref].filter(Boolean)) {
    const content = fs.readFileSync(path.join(datasetPath, ref.path));
    const hash = crypto.createHash('sha256').update(content).digest('hex');
    console.log(hash === ref.sha256 ? '✓' : '✗', ref.path);
  }
}
"
```

## Questions?