Kodecraft
Tech Enrichment

Lance Alexander Ventura · AI Engineer Intern
Problem Context

Tech links scroll into oblivion.

Kodecraft developers regularly post tech and AI links into Mattermost as a way to share discoveries with the team. The links live in a communication medium, not a knowledge base.

Past links become practically unfindable — they scroll out of view, mix with unrelated chat, and cannot be referenced systematically. A link posted six months ago that would solve today's problem is effectively lost.

Volume is high enough that even recently shared links get missed. There is no filter for what matters to Kodecraft, no record of past evaluations, and no signal when a tool is superseded.

What This Project Does

A continuous pipeline from Mattermost links to a curated knowledge base.

  • Ingestion — polls scoped Mattermost channels, canonicalizes URLs, dedupes
  • Two-stage LLM evaluation — cheap classifier short-circuits a capable evaluator
  • Outline knowledge base — workflow, dependency, general, and history docs
  • Mattermost discovery hub — top-5 ranking per category, edited in place
  • HITL signaling — 🟢 / 🔴 vote reactions feed lifecycle decisions
  • Lifecycle & freshness — disposition + freshness axes, scheduled re-evaluation
System Overview · Pipeline Architecture
Architecture

Six stages, two surfaces.

A periodic poll fetches new posts from scoped Mattermost channels. URLs are canonicalized and deduped, then a heuristic filter rejects obvious non-tech links before any LLM call.

A cheap classifier (Stage 1) labels surviving links and narrows project context. A capable evaluator (Stage 2) produces the structured record that drives writes to Outline (full record) and the Mattermost Hub (top-5 ranking).

[Pipeline diagram] Mattermost #tech-links → Ingest (canonicalize · dedup) → Heuristic Filter (domain · URL rules) → Stage 1 (classify · gpt-oss-120) → Stage 2 (evaluate · gpt-oss-120) → Outline KB (workflow · dep · general) + MM Hub (top-5 per category)
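A minimal orchestration sketch of that flow, with every stage passed in as a callable; all function and key names here are illustrative placeholders, not the pipeline's real module API.

```python
# Hypothetical orchestration of the six stages; every callable is a placeholder.
from typing import Callable, Iterable


def run_pipeline(
    urls: Iterable[str],
    canonicalize: Callable[[str], str],
    is_duplicate: Callable[[str], bool],
    passes_heuristics: Callable[[str], bool],
    classify: Callable[[str], dict],            # Stage 1 (cheap classifier)
    evaluate: Callable[[str, list], dict],      # Stage 2 (capable evaluator)
    write_outline: Callable[[dict], None],
    update_hub: Callable[[dict], None],
) -> None:
    for url in urls:
        link = canonicalize(url)
        if is_duplicate(link) or not passes_heuristics(link):
            continue                            # Stage 0 reject: no LLM call spent
        s1 = classify(link)
        if s1["classification"] == "general" and s1["confidence"] != "low":
            record = s1                         # medium+ confidence "general" short-circuits Stage 2
        else:
            record = evaluate(link, s1["candidate_slugs"])
        write_outline(record)                   # full record (source of truth)
        update_hub(record)                      # re-rank the top-5 pinned post
```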
Where It Starts

Kodecraft already shares links — in #tech-links.

Developers post AI tools, libraries, frameworks, and articles in Mattermost as fast as they encounter them. This is the input surface, and it already exists.

The pipeline does not change how people share. It changes what happens after a link is posted: every URL is captured, canonicalized, deduped, and routed through evaluation — without anyone needing to do anything new.

Tech Enrichment Hub · Outline side

Outline holds the full record.

Every evaluated link lives in Outline as a structured entry: classification, relevance score, pros/cons, project routing, freshness state. Multiple docs split the surface — Workflow, per-project Dependencies, General, and an append-only Evaluation History.

This is the source of truth. Search, audit, freshness — all here.

(Demo: switching to Outline)
Tech Enrichment Hub · Mattermost side

Mattermost Hub is the top-5 triage surface.

One pinned post per category in #tech-enrichment-hub: Workflow, Dependency (per project), General. Each post is edited in place on every dirty-flag dispatch — no scrollback, no growing thread.

Outline = full record. Mattermost Hub = "what should I look at next?"
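A hedged sketch of the edit-in-place update for a pinned Hub post via the Mattermost REST API (v4); the base URL, token, and Markdown payload are placeholders.

```python
# Sketch only: rewrites the pinned category post instead of appending new messages.
import requests

BASE = "https://mattermost.example.com/api/v4"       # placeholder host
HEADERS = {"Authorization": "Bearer BOT_TOKEN"}      # placeholder bot token


def update_hub_post(post_id: str, top5_markdown: str) -> None:
    # PUT /posts/{post_id} replaces the message body in place,
    # so the channel shows no scrollback and no growing thread.
    resp = requests.put(
        f"{BASE}/posts/{post_id}",
        headers=HEADERS,
        json={"id": post_id, "message": top5_markdown},
    )
    resp.raise_for_status()
```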

(Demo: switching to Mattermost)
Human-in-the-Loop · Signaling
HITL Signaling

The LLM curates. Humans signal.

Each new entry gets a threaded reply in #tech-links with two seed reactions: 🟢 ("I used this and it worked") and 🔴 ("I tried it; it didn't fit"). There is deliberately no third "haven't used it" reaction — it would create social pressure and a noisy signal.

Non-engagement is computed, not declared. Reactions are retractable; current state is authoritative.

Reaction model
  • 🟢 I used it · positive signal
  • 🔴 didn't fit · negative signal
  • silence · computed, never asked
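A hedged sketch of seeding those reactions on a threaded reply through the Mattermost REST API (v4); the host, token, IDs, and emoji names are assumptions, not confirmed values from the project.

```python
# Sketch only: post a threaded reply under the link post, then add 🟢 / 🔴 seeds.
import requests

BASE = "https://mattermost.example.com/api/v4"       # placeholder host
HEADERS = {"Authorization": "Bearer BOT_TOKEN"}      # placeholder bot token


def seed_reply(channel_id: str, root_id: str, bot_user_id: str, message: str) -> None:
    reply = requests.post(f"{BASE}/posts", headers=HEADERS, json={
        "channel_id": channel_id,
        "root_id": root_id,                          # threads the reply under the link post
        "message": message,
    }).json()
    for emoji in ("large_green_circle", "red_circle"):   # assumed emoji names for 🟢 / 🔴
        requests.post(f"{BASE}/reactions", headers=HEADERS, json={
            "user_id": bot_user_id,
            "post_id": reply["id"],
            "emoji_name": emoji,
        })
```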
Evaluation · Stage 0 · Heuristic Pre-Filter
Stage 0

Reject the obvious before spending an LLM token.

URLs are canonicalized — tracking parameters stripped, redirects resolved, GitHub URLs collapsed to owner/repo. Duplicates against the link registry are dropped here.

A small YAML rule set (config/heuristic_rules.yaml) rejects domains and URL patterns that aren't worth evaluating: pure social media, internal Kodecraft URLs, unrelated content.

Reject reasons
  1. duplicate of canonical URL
  2. blocklisted domain
  3. internal Kodecraft URL
  4. known non-artifact pattern
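A minimal sketch of the canonicalize-and-filter step described above; the rule-file keys (blocked_domains, blocked_patterns) and the tracking-parameter list are assumptions, and redirect resolution is omitted for brevity.

```python
# Stage 0 sketch: canonicalize, then apply YAML heuristic rules (keys assumed).
import re
import yaml
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

TRACKING = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content", "ref"}


def canonicalize(url: str) -> str:
    p = urlparse(url)
    query = urlencode([(k, v) for k, v in parse_qsl(p.query) if k not in TRACKING])
    path = p.path.rstrip("/")
    if p.netloc.lower() == "github.com":             # collapse GitHub URLs to owner/repo
        path = "/".join(path.split("/")[:3])
    return urlunparse((p.scheme, p.netloc.lower(), path, "", query, ""))


def load_rules(path: str = "config/heuristic_rules.yaml") -> dict:
    with open(path) as fh:
        return yaml.safe_load(fh) or {}


def passes_heuristics(url: str, rules: dict) -> bool:
    host = urlparse(url).netloc.lower()
    if any(host.endswith(d) for d in rules.get("blocked_domains", [])):
        return False
    return not any(re.search(pat, url) for pat in rules.get("blocked_patterns", []))
```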
Evaluation · Stage 1 · Cheap Classifier
Stage 1

Cheap model labels, narrows context, short-circuits.

A small model (gpt-oss-120) sees the link, fetched content, the project taxonomy, and all per-project PRDs. It labels the link as workflow / dependency / both / general / neither.

Output also carries candidate_slugs — projects most likely affected — which Stage 2 will use to narrow its expensive context. general with medium+ confidence short-circuits Stage 2 entirely.

Stage 1 output
  1. classification
  2. candidate_slugs[]
  3. confidence (low/med/high)
  4. rationale
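A sketch of that output as a pydantic model; the field names mirror the list above, while the exact literal strings the pipeline emits are assumptions.

```python
# Stage 1 output record (field names from the list above; value strings assumed).
from typing import List, Literal

from pydantic import BaseModel


class Stage1Output(BaseModel):
    classification: Literal["workflow", "dependency", "both", "general", "neither"]
    candidate_slugs: List[str]                 # projects most likely affected
    confidence: Literal["low", "medium", "high"]
    rationale: str


def short_circuits(out: Stage1Output) -> bool:
    # "general with medium+ confidence short-circuits Stage 2 entirely"
    return out.classification == "general" and out.confidence in {"medium", "high"}
```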
Evaluation · Stage 2 · Decision Rules
Stage 2

Full evaluation with Decision Rules.

A capable model (gpt-oss-120) consumes per-project PRD + dependency context, narrowed by Stage 1's candidate_slugs within a 4000-token budget per axis.

Output is a strict JSON schema. matches_existing filters per project while preserving the global adopted signal: an already-adopted match excludes that project, not the link. Fetched content is treated as untrusted data, not instructions.

Output schema (strict)
  1. classification
  2. relevance_score · 0.0–1.0
  3. matches_existing
  4. affected_projects[]
  5. pros / cons
  6. confidence
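A sketch of the strict schema as a pydantic model; the per-project shape of matches_existing is an assumption based on "filters per project", and the literal strings echo Stage 1.

```python
# Stage 2 output record (fields from the list above; matches_existing shape assumed).
from typing import Dict, List, Literal

from pydantic import BaseModel, Field


class Stage2Output(BaseModel):
    classification: Literal["workflow", "dependency", "both", "general", "neither"]
    relevance_score: float = Field(ge=0.0, le=1.0)
    matches_existing: Dict[str, bool]          # project slug -> already adopted there?
    affected_projects: List[str]
    pros: List[str]
    cons: List[str]
    confidence: Literal["low", "medium", "high"]
```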
Operations · Cost & Token Usage Evaluation
Cost & Token Usage

gpt-oss-120 on both stages — tokens observed, cost estimated.

Langfuse captured 950 observations across 575 traces (Apr 26 – May 8), but model pricing wasn't configured — cost_details are null for all calls.

Figures are estimated at $0.15 / 1M input tokens and $0.60 / 1M output tokens, applied to the observed token counts from 51 clean two-stage runs.

Token & cost (est.) — 51 runs
  • Stage 1 / run · ~3,330 tok · ~$0.00065
  • Stage 2 / run · ~6,043 tok · ~$0.00125
  • Pipeline / run · ~9,373 tok · ~$0.0019
  • 51-run total · ~$0.097 est.
  • Avg latency · ~5.35 s / run
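A back-of-envelope check of those numbers; the per-run input/output token split is not in the observed data, so an ~88/12 split (chosen to be consistent with the figures above) is assumed purely for illustration.

```python
# Re-deriving the estimate; the 88/12 input/output split is an assumption.
IN_RATE, OUT_RATE = 0.15 / 1e6, 0.60 / 1e6          # $ per token
tokens_per_run, runs = 9_373, 51
input_tok = int(tokens_per_run * 0.88)              # assumed split
output_tok = tokens_per_run - input_tok
cost_per_run = input_tok * IN_RATE + output_tok * OUT_RATE
print(f"per run ≈ ${cost_per_run:.4f}, {runs} runs ≈ ${runs * cost_per_run:.3f}")
# → per run ≈ $0.0019, 51 runs ≈ $0.098, in line with the ~$0.097 estimate above
```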
Lifecycle · Disposition + Freshness
Two-axis state model

Disposition answers "what's our stance?" Freshness answers "how trustworthy is the analysis?"

Disposition (mutually exclusive, pipeline-managed)
State · Meaning
🟢 active · Current recommendation or informational entry.
✅ adopted · Team signals indicate Kodecraft is using it.
🔴 rejected · Team or pipeline determined it's not useful.
⚫ obsolete · Tool itself is no longer viable (archived, deprecated).
🟡 superseded · Another candidate replaced it as the recommendation.
Freshness (overlay, separate from disposition)
State · Meaning
✓ fresh · Latest evaluation is recent enough to trust.
◷ due_for_review · Still usable, but the check window has expired.
⏳ awaiting_evaluation · Queued for a re-check.
⚠ reevaluation_failed · Last re-check failed; prior analysis remains visible.
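A minimal sketch of the two axes as independent enums; the value strings mirror the tables above, and the pairing into one entry state is illustrative.

```python
# Two-axis state model: disposition and freshness evolve independently.
from dataclasses import dataclass
from enum import Enum


class Disposition(str, Enum):
    ACTIVE = "active"
    ADOPTED = "adopted"
    REJECTED = "rejected"
    OBSOLETE = "obsolete"
    SUPERSEDED = "superseded"


class Freshness(str, Enum):
    FRESH = "fresh"
    DUE_FOR_REVIEW = "due_for_review"
    AWAITING_EVALUATION = "awaiting_evaluation"
    REEVALUATION_FAILED = "reevaluation_failed"


@dataclass
class EntryState:
    disposition: Disposition      # "what's our stance?"
    freshness: Freshness          # "how trustworthy is the analysis?"
```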
Lifecycle · Re-evaluation Cadence
Scheduled re-checks

Re-evaluation cadence by record type.

Default review windows
Record type · Review cadence
active recommendation · 30 days
alternative candidate · 14 days
adopted · 60 days
general current-state item · 60 days
rejected · no scheduled review
obsolete or superseded · no scheduled review unless manually targeted

Failed re-evaluations preserve the prior payload; obsolete requires an evidence-backed reason; superseded requires a successor reference. Age alone never makes a tool obsolete.
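A sketch of the cadence lookup implied by the table; the record-type keys are illustrative spellings, not the pipeline's actual identifiers, and None means no scheduled review.

```python
# Review-window lookup from the table above (keys are assumed identifiers).
from datetime import datetime, timedelta
from typing import Optional

REVIEW_WINDOW_DAYS = {
    "active_recommendation": 30,
    "alternative_candidate": 14,
    "adopted": 60,
    "general_current_state": 60,
    "rejected": None,
    "obsolete": None,
    "superseded": None,
}


def next_review(record_type: str, last_evaluated: datetime) -> Optional[datetime]:
    days = REVIEW_WINDOW_DAYS.get(record_type)
    return last_evaluated + timedelta(days=days) if days else None
```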

Lifecycle · Low-Engagement Follow-Up
Gentle prompting, never silent removal

Low-signal entries get a follow-up nudge, not a deletion.

Entry signal lifecycle
Phase · Trigger · System action
Newly surfaced · Entry just created · Threaded reply with 🟢 / 🔴 seed reactions
Active · Entry visible, accumulating signals · Counts tracked; no notification
Low-engagement · Age > 6 weeks, low reaction rate · Follow-up prompt in channel
Persistent silence · Long quiet period after prompts · Marked "awaiting evaluation"; remains visible
Adopted / Rejected · Sufficient 🟢 / 🔴 signals · Disposition transition; appended to history
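A sketch of the low-engagement trigger from the table; the six-week age threshold is stated above, while the specific reaction-count cutoff is an assumption standing in for "low reaction rate".

```python
# Low-engagement follow-up trigger (reaction-count cutoff assumed).
from datetime import datetime, timedelta


def needs_follow_up(created_at: datetime, green: int, red: int, now: datetime) -> bool:
    # Table trigger: age > 6 weeks and a low reaction rate; "< 2 total reactions"
    # is an assumed stand-in for "low".
    return (now - created_at) > timedelta(weeks=6) and (green + red) < 2
```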
Future Direction

RAG, when retrieval actually starts to hurt.

Today's retrieval is direct: PRD bodies and dependency-doc sections are loaded from Outline and injected into prompts within token budgets. This is simple and sufficient for the current corpus.
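A sketch of that budgeted direct injection; the 4-characters-per-token approximation stands in for whatever tokenizer the pipeline actually uses, and the default budget echoes the per-axis figure from Stage 2.

```python
# Direct retrieval sketch: concatenate Outline doc bodies until the budget runs out.
from typing import List

TOKEN_BUDGET = 4000          # per axis, echoing the Stage 2 budget above
CHARS_PER_TOKEN = 4          # rough heuristic, not the real tokenizer


def fit_to_budget(doc_bodies: List[str], budget_tokens: int = TOKEN_BUDGET) -> str:
    out, remaining = [], budget_tokens * CHARS_PER_TOKEN
    for body in doc_bodies:
        if remaining <= 0:
            break
        out.append(body[:remaining])
        remaining -= len(body)
    return "\n\n".join(out)
```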

RAG enters when two signals appear: the LLM misses semantic matches (adjacent concepts not surfacing because queries target specific names), or token cost from full-document loads becomes a material fraction of evaluation spend. Either signal earns RAG its place — neither earns it preemptively.

Parallel future input: claude-mem workflow telemetry, captured from real Claude Code / OpenCode sessions, will feed workflow-track context once that project ships.

Thank you

Questions?

Lance Alexander Ventura · AI Engineer Intern · lance@kodecraft.dev