ci: add AI-powered auto-labeling for new issues and PRs

Uses actions/ai-inference with GPT-5 to classify newly opened issues
and pull requests against the area/*, integration/*, db/*, and
concern/* label namespaces.

The system prompt is rendered at runtime from the live repo label list
plus descriptions, so GitHub label state is the single source of truth
for the taxonomy. Suggested labels are re-validated against the live
list before being applied, capped at 6 per item.
This commit is contained in:
kolaente 2026-04-11 17:41:05 +02:00
parent 2796fffbc1
commit c4cc6d34f6
2 changed files with 249 additions and 0 deletions

47
.github/workflows/auto-label.prompt.md vendored Normal file
View File

@ -0,0 +1,47 @@
You are a triage assistant for the Vikunja repository. Your job is to classify a single issue or pull request using the label taxonomy below, and return ONLY a JSON array of chosen label names — nothing else.
# Output format
Return exactly a JSON array of strings, e.g.:
["area/kanban", "area/recurring-tasks", "concern/regression"]
No prose, no markdown fences, no explanation. If you cannot confidently classify, return an empty array: []
# Rules
1. Every well-formed item gets at least one `area/*` label. If you truly cannot pick one, return [].
2. Multi-label is the norm. 24 labels is typical, occasionally up to 6.
3. `concern/*` is additive — it describes a cross-cutting quality (UX polish, performance, a11y, regression) on top of the feature area.
4. `integration/*` applies only when the item is about connecting to a *specific third-party system* (Slack, Gotify, Apprise, external webhooks, WeKan import, Todoist import, add-task-from-email, MCP, etc.).
- CalDAV is its own `area/caldav` — do NOT also tag `integration/*`.
- Generic webhook infrastructure is `area/webhooks`; a PR adding Slack delivery is `area/webhooks` + `integration/outbound`.
5. `db/mysql`, `db/postgres`, `db/sqlite` ONLY when the item is explicitly engine-specific (e.g. "fails on MySQL 8"). General DB issues get `area/database` with no engine tag.
6. `concern/regression` ONLY if the body explicitly says it worked in a prior version and is broken now.
7. Do NOT invent labels. Only use names from the taxonomy below — anything else will be discarded.
# Taxonomy
The following labels are available. Each line is `label-name — description`. Pick only from this list.
{{TAXONOMY}}
# Examples
Input:
TITLE: SQL syntax error on MySQL due to CAST in is_archived computation
BODY: After upgrading to 2.3.0 I get SQL syntax errors on MySQL 8. Worked fine on 2.2.x.
Output:
["area/database", "db/mysql", "concern/regression"]
Input:
TITLE: feat: add Slack webhook support
BODY: Adds outbound Slack notifications when tasks change.
Output:
["area/webhooks", "area/notifications", "integration/outbound"]
Input:
TITLE: Mobile: "Mark task done" should be easier to find
BODY: The checkbox is too small on phones.
Output:
["area/mobile", "area/task-editor", "concern/ux"]

202
.github/workflows/auto-label.yml vendored Normal file
View File

@ -0,0 +1,202 @@
name: Auto-label new issues and PRs
on:
issues:
types: [opened]
pull_request_target:
types: [opened]
permissions:
contents: read
issues: write
pull-requests: write
models: read
concurrency:
group: auto-label-${{ github.event.issue.number || github.event.pull_request.number }}
cancel-in-progress: false
jobs:
classify:
runs-on: ubuntu-latest
steps:
- name: Checkout (for prompt template)
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
sparse-checkout: |
.github/workflows/auto-label.prompt.md
sparse-checkout-cone-mode: false
- name: Render system prompt from live labels
id: render
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
env:
PROMPT_TEMPLATE_PATH: .github/workflows/auto-label.prompt.md
with:
script: |
const fs = require('fs');
const path = require('path');
// Fetch every label in the repo, keep only the managed namespaces.
const managedPrefixes = ['area/', 'integration/', 'db/', 'concern/'];
const all = await github.paginate(
github.rest.issues.listLabelsForRepo,
{ owner: context.repo.owner, repo: context.repo.repo, per_page: 100 }
);
const managed = all
.filter(l => managedPrefixes.some(p => l.name.startsWith(p)))
.sort((a, b) => a.name.localeCompare(b.name));
if (managed.length === 0) {
core.setFailed('No managed labels found on the repo — cannot build taxonomy.');
return;
}
// Warn about labels without descriptions — they confuse the classifier.
const undescribed = managed.filter(l => !l.description || !l.description.trim());
if (undescribed.length > 0) {
core.warning(
`Labels without descriptions will be skipped: ${undescribed.map(l => l.name).join(', ')}`
);
}
// Group by namespace for readability in the prompt.
const groups = {};
for (const l of managed) {
if (!l.description || !l.description.trim()) continue;
const prefix = managedPrefixes.find(p => l.name.startsWith(p));
(groups[prefix] ||= []).push(l);
}
const sections = [];
for (const prefix of managedPrefixes) {
const entries = groups[prefix] || [];
if (entries.length === 0) continue;
sections.push(`## ${prefix}*\n`);
for (const l of entries) {
sections.push(`- \`${l.name}\` — ${l.description.trim()}`);
}
sections.push('');
}
const taxonomy = sections.join('\n');
// Expand the template.
const templatePath = process.env.PROMPT_TEMPLATE_PATH;
const template = fs.readFileSync(templatePath, 'utf8');
if (!template.includes('{{TAXONOMY}}')) {
core.setFailed(`Template ${templatePath} is missing the {{TAXONOMY}} placeholder.`);
return;
}
const rendered = template.replace('{{TAXONOMY}}', taxonomy);
const outPath = path.join(process.env.RUNNER_TEMP, 'system-prompt.md');
fs.writeFileSync(outPath, rendered);
core.setOutput('system_prompt_path', outPath);
core.info(`Rendered ${managed.length} labels into ${outPath}`);
- name: Build user prompt
id: prep
env:
TITLE: ${{ github.event.issue.title || github.event.pull_request.title }}
BODY: ${{ github.event.issue.body || github.event.pull_request.body }}
KIND: ${{ github.event_name == 'issues' && 'issue' || 'pull request' }}
run: |
mkdir -p "$RUNNER_TEMP/ai"
python3 - <<'PY' > "$RUNNER_TEMP/ai/user-prompt.txt"
import os
title = os.environ.get("TITLE", "").strip()
body = (os.environ.get("BODY", "") or "").strip() or "(no description)"
kind = os.environ.get("KIND", "issue")
# Truncate very long bodies to keep token usage predictable
if len(body) > 8000:
body = body[:8000] + "\n\n[... truncated ...]"
print(f"Classify the following {kind}. Return ONLY a JSON array of labels.\n")
print("--- TITLE ---")
print(title)
print()
print("--- BODY ---")
print(body)
print("--- END ---")
PY
echo "prompt_path=$RUNNER_TEMP/ai/user-prompt.txt" >> "$GITHUB_OUTPUT"
- name: Classify with AI
id: classify
uses: actions/ai-inference@e09e65981758de8b2fdab13c2bfb7c7d5493b0b6 # v2.0.7
with:
model: openai/gpt-5
# GPT-5 is a reasoning model: output tokens include reasoning, so budget generously.
# Temperature is ignored by reasoning models and intentionally omitted.
max-completion-tokens: 2000
system-prompt-file: ${{ steps.render.outputs.system_prompt_path }}
prompt-file: ${{ steps.prep.outputs.prompt_path }}
- name: Apply labels
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8
env:
AI_RESPONSE: ${{ steps.classify.outputs.response }}
with:
script: |
const raw = (process.env.AI_RESPONSE || '').trim();
core.info(`Raw AI response:\n${raw}`);
// Extract the first JSON array from the response (tolerates stray prose or code fences)
const match = raw.match(/\[[\s\S]*\]/);
if (!match) {
core.warning('No JSON array found in AI response — skipping labeling.');
return;
}
let parsed;
try {
parsed = JSON.parse(match[0]);
} catch (e) {
core.warning(`Failed to parse JSON array: ${e.message}`);
return;
}
if (!Array.isArray(parsed)) {
core.warning('AI response JSON is not an array — skipping.');
return;
}
// Re-validate against live repo labels. Same source of truth as the prompt renderer,
// so drift is impossible — any label the model picks MUST exist in the repo.
const managedPrefixes = ['area/', 'integration/', 'db/', 'concern/'];
const allRepoLabels = await github.paginate(
github.rest.issues.listLabelsForRepo,
{ owner: context.repo.owner, repo: context.repo.repo, per_page: 100 }
);
const allowed = new Set(
allRepoLabels
.map(l => l.name)
.filter(n => managedPrefixes.some(p => n.startsWith(p)))
);
const valid = [...new Set(parsed)].filter(
l => typeof l === 'string' && allowed.has(l)
);
const rejected = parsed.filter(l => !valid.includes(l));
if (rejected.length > 0) {
core.warning(`Ignored unknown labels: ${JSON.stringify(rejected)}`);
}
// Cap at 6 labels — our taxonomy rule says 24 is typical, 6 is the ceiling.
const toApply = valid.slice(0, 6);
if (toApply.length === 0) {
core.info('No valid labels selected — leaving item unlabeled for human triage.');
return;
}
const number =
context.payload.issue?.number ?? context.payload.pull_request.number;
await github.rest.issues.addLabels({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: number,
labels: toApply,
});
core.info(`Applied labels to #${number}: ${toApply.join(', ')}`);