# Guide to Writing Skills

This guide explains how to create well-structured, benchmarkable skills for the AI Research Skills library. Whether you're writing a simple instructional skill or a complex benchmarked workflow, this document covers the complete anatomy.

## Skill Anatomy

Every skill lives in a directory under `skills/{category}/{skill-name}/` and consists of:

```
skills/research/your-skill-name/
├── SKILL.md              # Required: skill definition + instructions
└── rubrics/              # Optional: evaluation criteria
    └── skill_rubric.json # General rubric for judging outputs
```

## SKILL.md Structure

The SKILL.md file has two parts: YAML frontmatter and a Markdown body.

```markdown
---
id: research.your-skill-name
title: Your Skill Title
summary: One-line description of what the skill does.
category: research
version: "0.1.0"
tags:
  - tag1
  - tag2
workflow_stage: discover
inputs:
  - Description of required inputs
outputs:
  - Description of what the skill produces
tools:
  - optional.tool.reference
author: Your Name
license: CC-BY-4.0
last_updated: 2026-01-18
---

# Your Skill Title

## What you produce
...

## Procedure
...
```

### Required Fields

| Field | Description | Example |
| --- | --- | --- |
| `id` | Unique identifier: `{category}.{kebab-name}` | `research.read-journal-article` |
| `title` | Human-readable title | Read a Journal Article |
| `summary` | One-line description | A workflow to read papers and produce notes |
| `category` | `research` or `analysis` | `research` |
| `tags` | List of keywords | `[reading, literature-review]` |
| `author` | Creator name | AI Research Skills Contributors |
| `license` | Default `CC-BY-4.0` | `CC-BY-4.0` |
| `last_updated` | ISO date | `2026-01-18` |
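Before submitting, it can help to sanity-check that the frontmatter carries every required field. The sketch below is a hypothetical, line-based check (not part of the repo's tooling); a real validator would use a YAML parser:

```javascript
// Minimal required-field check for SKILL.md frontmatter.
// Illustrative only: naive line-based parsing, no YAML library.
const REQUIRED = [
  'id', 'title', 'summary', 'category',
  'tags', 'author', 'license', 'last_updated',
];

function missingFields(skillMd) {
  const m = skillMd.match(/^---\n([\s\S]*?)\n---/);
  if (!m) return REQUIRED; // no frontmatter block at all
  const keys = m[1]
    .split('\n')
    .map((line) => (line.match(/^(\w+):/) || [])[1])
    .filter(Boolean);
  return REQUIRED.filter((field) => !keys.includes(field));
}

const doc = [
  '---', 'id: research.demo', 'title: Demo', 'summary: One-liner.',
  'category: research', 'tags:', '  - demo', 'author: A. Author',
  'license: CC-BY-4.0', 'last_updated: 2026-01-18', '---', '# Demo',
].join('\n');
console.log(missingFields(doc)); // → []
```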

### Optional Fields

| Field | Description |
| --- | --- |
| `version` | Semantic version (e.g., `"0.1.0"`) |
| `workflow_stage` | Stage in the research workflow: `discover`, `analyze`, `write`, or `publish` |
| `inputs` | List of required inputs |
| `outputs` | List of what the skill produces |
| `tools` | List of MCP tools the skill uses |
| `creator` | Original creator (if different from `author`) |
| `amenders` | List of contributors who modified the skill |

## Benchmarking Artifacts

Skills can be benchmarked to measure quality. This requires additional artifacts.

### Overview

```
skills/research/your-skill-name/
└── rubrics/
    └── skill_rubric.json     # How to judge outputs (general)

data/benchmarks/datasets/{category}/{skill-id}.{locale}/
├── dataset.json              # Index of test cases
└── cases/
    ├── case-001/
    │   ├── case_rubric.json  # Case-specific judging criteria
    │   └── expected_spec.json # Deterministic patterns to match
    └── case-002/
        └── ...
```

### Three Evaluation Layers

| Layer | File | Purpose | When to Use |
| --- | --- | --- | --- |
| Skill Rubric | `skill_rubric.json` | General criteria for all outputs | Always (defines quality dimensions) |
| Case Rubric | `case_rubric.json` | Case-specific requirements/overrides | When cases need custom criteria |
| Expected Spec | `expected_spec.json` | Deterministic pattern matching | When outputs have a predictable structure |

### DatasetBundle Structure

A DatasetBundle contains test cases with SHA-256-pinned file references.

#### dataset.json

```json
{
  "dataset_id": "reading/read-journal-article.en-US",
  "version": "0.1.0",
  "name": "Read Journal Article",
  "description": "Test cases for reading skill evaluation.",
  "skill_id": "research.read-journal-article",
  "governance": "public",
  "cases": [
    {
      "case_id": "sample-001",
      "name": "Neural Network Compression",
      "description": "Synthetic paper about pruning techniques.",
      "input": {
        "paper": { "title": "...", "abstract": "...", "body": "..." },
        "question": "What compression ratio does this achieve?",
        "constraints": { "depth": "triage" }
      },
      "case_rubric_ref": {
        "path": "cases/sample-001/case_rubric.json",
        "sha256": "70e34bf880e76534df7630d57e068674e901ddbbae744cdfdacf25e49e9c5260",
        "size_bytes": 474
      },
      "expected_spec_ref": {
        "path": "cases/sample-001/expected_spec.json",
        "sha256": "4cfcbb7af3e024fce48dbff6a15be9c14722d4447028a8814d8768150a620cfc",
        "size_bytes": 1390
      },
      "tags": ["reading", "machine-learning"]
    }
  ],
  "metadata": {
    "note": "Synthetic papers for testing.",
    "judge_topology": {
      "default": "pointwise",
      "description": "Pointwise: LLM judge evaluates single output. Pairwise: for A/B regression testing."
    }
  }
}
```

### Hash Pinning

Every referenced file must carry a SHA-256 hash (and its size in bytes). This ensures that case files cannot change silently: if a rubric or spec is edited, its hash no longer matches the reference in `dataset.json`, and the stale reference can be detected.

Generate hashes:

```bash
# Single file
sha256sum cases/sample-001/case_rubric.json

# All case files
find cases -name "*.json" -exec sha256sum {} \;
```

## Rubric-Writing Guidance

### Skill Rubric (`skill_rubric.json`)

Defines the general quality dimensions that apply to all outputs from this skill.

```json
{
  "rubric_id": "read-journal-article",
  "version": "0.1.0",
  "name": "Read a Journal Article — Skill Rubric",
  "skill_id": "research.read-journal-article",
  "evaluation_type": "absolute",
  "criteria": [
    {
      "id": "faithfulness",
      "name": "Faithfulness",
      "description": "Claims match the paper; no invented results.",
      "weight": 3,
      "scale": {
        "type": "numeric",
        "min": 1,
        "max": 5,
        "anchors": [
          { "value": 1, "label": "Hallucinated or contradicts the paper" },
          { "value": 3, "label": "Mostly faithful with minor inaccuracies" },
          { "value": 5, "label": "Fully faithful and careful" }
        ]
      },
      "required": true
    }
  ],
  "scoring": { "aggregation": "weighted_average", "normalize": true }
}
```
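To make the scoring block concrete, here is one plausible reading of `"weighted_average"` with `"normalize": true`: scale each criterion score to [0, 1] by its `min`/`max`, then average by `weight`. This is an illustration of the rubric fields, not the benchmark runner's actual code, and the second criterion is hypothetical:

```javascript
// Hypothetical aggregation for { aggregation: "weighted_average", normalize: true }.
function aggregate(criteria, scores) {
  let weighted = 0;
  let totalWeight = 0;
  for (const c of criteria) {
    // Normalize each raw score into [0, 1] using the criterion's scale.
    const norm = (scores[c.id] - c.scale.min) / (c.scale.max - c.scale.min);
    weighted += norm * c.weight;
    totalWeight += c.weight;
  }
  return weighted / totalWeight;
}

const criteria = [
  { id: 'faithfulness', weight: 3, scale: { min: 1, max: 5 } },
  { id: 'clarity', weight: 1, scale: { min: 1, max: 5 } }, // hypothetical extra criterion
];
console.log(aggregate(criteria, { faithfulness: 5, clarity: 3 })); // → 0.875
```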

Best practices:

- Keep the criteria list short; each criterion should measure a single quality dimension.
- Use `weight` to emphasize the criteria that matter most (e.g., faithfulness over style).
- Provide scale `anchors` so a judge can score consistently across outputs.
- Mark must-pass criteria with `"required": true`.

### Case Rubric (`case_rubric.json`)

Adds case-specific requirements that override or extend the skill rubric.

```json
{
  "case_rubric_id": "case-sample-001",
  "base_rubric_id": "read-journal-article",
  "case_id": "sample-001",
  "description": "Case-level rubric for neural network compression paper.",
  "criteria_overrides": [
    {
      "id": "completeness",
      "required_elements": [
        "Must mention the 8x compression ratio",
        "Must identify the ImageNet benchmark"
      ]
    }
  ]
}
```
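A runner could resolve overrides by merging each entry onto the base-rubric criterion with the matching `id`. The shallow-merge rule sketched below is an assumption for illustration, not documented behavior:

```javascript
// Merge case-level criteria_overrides onto base criteria by matching "id".
// Shallow merge: override fields win, unspecified base fields are kept.
function applyOverrides(baseCriteria, overrides) {
  return baseCriteria.map((c) => {
    const o = overrides.find((x) => x.id === c.id);
    return o ? { ...c, ...o } : c;
  });
}

const base = [{ id: 'completeness', name: 'Completeness', weight: 2 }];
const overrides = [
  { id: 'completeness', required_elements: ['8x compression ratio', 'ImageNet benchmark'] },
];
const merged = applyOverrides(base, overrides);
console.log(merged[0]); // keeps weight 2, gains required_elements
```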

When to use:

- A case must contain specific facts (list them in `required_elements`).
- The skill rubric's general criteria need case-level overrides or extensions.

### Expected Spec (`expected_spec.json`)

Defines deterministic patterns for offline evaluation (no LLM judge needed).

```json
{
  "expected_spec_id": "exp-sample-001",
  "case_id": "sample-001",
  "description": "Pattern-only spec for reading note headings.",
  "expected_patterns": [
    { "pattern": "^##\\s+Thesis", "flags": "mi", "description": "Thesis heading" },
    { "pattern": "^##\\s+Key\\s+Claims", "flags": "mi", "description": "Key Claims heading" }
  ],
  "forbidden_patterns": [
    { "pattern": "\\bTODO\\b", "flags": "i", "description": "No TODO placeholders" }
  ],
  "metadata": {
    "pass_threshold": 0.7,
    "note": "6/8 headings must be present."
  }
}
```
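Because the spec is plain regex data, checking an output needs no LLM. The sketch below shows one way a checker could apply it; the scoring rule (fraction of expected patterns matched, with every forbidden pattern absent) follows the `pass_threshold` note but is otherwise an assumption:

```javascript
// Apply an expected spec to a candidate output, offline.
const spec = {
  expected_patterns: [
    { pattern: '^##\\s+Thesis', flags: 'mi' },
    { pattern: '^##\\s+Key\\s+Claims', flags: 'mi' },
  ],
  forbidden_patterns: [{ pattern: '\\bTODO\\b', flags: 'i' }],
  metadata: { pass_threshold: 0.7 },
};

function checkOutput(output, spec) {
  const matched = spec.expected_patterns
    .filter((p) => new RegExp(p.pattern, p.flags).test(output)).length;
  const score = matched / spec.expected_patterns.length;
  const clean = spec.forbidden_patterns
    .every((p) => !new RegExp(p.pattern, p.flags).test(output));
  return { score, pass: clean && score >= (spec.metadata.pass_threshold ?? 1) };
}

const note = '## Thesis\nPruning preserves accuracy.\n\n## Key Claims\n8x compression.';
const result = checkOutput(note, spec);
console.log(result); // → { score: 1, pass: true }
```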

Best practices:

- Anchor structural patterns with `^` and use the `m` (multiline) and `i` (case-insensitive) flags, as in the example above.
- Use `forbidden_patterns` to reject placeholders such as `TODO`.
- Set a `pass_threshold` in `metadata` when not every pattern is mandatory.

### Case Rubric vs Expected Spec

| Aspect | Case Rubric | Expected Spec |
| --- | --- | --- |
| Evaluation | LLM judge (subjective) | Pattern matching (deterministic) |
| Use for | Quality, completeness, accuracy | Structure, format, required content |
| Requires | Judge model at runtime | Nothing at runtime (offline, no LLM) |
| Best for | Open-ended outputs | Predictable output structure |

## Authoring Workflow

### 1. Create Skill Directory

```bash
cd ai-research-skills
mkdir -p skills/research/your-skill-name/rubrics
```

### 2. Write SKILL.md

Copy the template and fill in your content:

```bash
cp skills/_template/SKILL.md skills/research/your-skill-name/
```

### 3. Add Skill Rubric (Optional)

Create `rubrics/skill_rubric.json` with your evaluation criteria.

### 4. Create DatasetBundle (Optional)

For benchmarkable skills:

```bash
mkdir -p data/benchmarks/datasets/research/your-skill.en-US/cases/case-001
```

Create `dataset.json` and the case-level rubrics/specs.

### 5. Verify Locally

```bash
# Sync and build docs
cd packages/airs/docs
npm run build

# Run tests (from the bars package)
cd packages/bars
npm test

# Verify dataset integrity
node -e "
const fs = require('fs');
const crypto = require('crypto');
const dataset = JSON.parse(fs.readFileSync('data/benchmarks/datasets/...'));
// ... verify hashes
"
```

### 6. Commit and Submit

```bash
git add .
git commit -m "Add skill: your-skill-name"
git push
# Create PR on GitHub
```

## Contributing Methods

### Option 1: Add Skill Portal (Recommended)

The Add Skill Portal provides a guided form:

  1. Fill out skill details (title, summary, content)
  2. Preview the generated SKILL.md
  3. Submit; a PR is created automatically

Best for: Simple skills without benchmarking artifacts.

### Option 2: Manual PR

For skills with rubrics and datasets:

  1. Fork the repository
  2. Create your skill directory structure
  3. Add all required files
  4. Run local proofs (`npm run build`, `npm test`)
  5. Submit a PR

### Option 3: Issue Submission

For non-technical contributors:

  1. Open a GitHub Issue
  2. Fill in the skill template
  3. A maintainer converts it to a PR

## Example Skills

Reference these public skills for patterns:

| Skill | Type | Benchmarking |
| --- | --- | --- |
| `research.read-journal-article` | Non-deterministic | Rubric + `expected_spec` patterns |
| `research.citation-formatting-apa-7-cne` | Deterministic | Full `expected_spec` |
| `research.reference-formatting` | Mixed | Rubric-based |
| `analysis.text-transform` | Simple | Minimal |

## Local Proof Commands

Before submitting, verify your changes work:

```bash
# Build the docs site (includes skill sync)
cd packages/airs/docs
npm run build

# Expect: "Sync successful" + "64 pages generated" (or more)

# Run bars tests (if you added datasets)
cd packages/bars
npm test

# Expect: "375 tests passed" (or more)

# Verify dataset hashes (if applicable)
node -e "
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const datasetPath = 'data/benchmarks/datasets/...';
const dataset = JSON.parse(fs.readFileSync(path.join(datasetPath, 'dataset.json')));

for (const c of dataset.cases) {
  for (const ref of [c.case_rubric_ref, c.expected_spec_ref].filter(Boolean)) {
    const content = fs.readFileSync(path.join(datasetPath, ref.path));
    const hash = crypto.createHash('sha256').update(content).digest('hex');
    console.log(hash === ref.sha256 ? '✓' : '✗', ref.path);
  }
}
"
```

## Questions?